Florence-2 is an advanced vision foundation model developed by Microsoft, designed to handle a wide range of computer vision and vision-language tasks through a unified, prompt-based approach.
Configuration Options
Model
- Path: Provide the full path to the model file or a HuggingFace model name (e.g., hf:microsoft/Florence-2-large).
- Available Models:
  - Florence-2-large: High accuracy, slower performance.
  - Florence-2-base: Balanced accuracy and speed.
  - Florence-2-large-ft: Fine-tuned for specific applications, offering enhanced results with a slight performance trade-off.
  - Florence-2-base-ft: Fine-tuned base version, faster with moderate accuracy.
- Models Directory: Specify the storage location for models.
  - Default: Data/Models/Florence-2
  - Save and refresh to update the models list.
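As a rough illustration of how the Path setting can be interpreted, the sketch below separates a HuggingFace model name from a local file path. The function name `resolve_model_source` and the returned labels are hypothetical; only the `hf:` prefix comes from the example above, and the application's actual parsing may differ.

```python
from pathlib import Path

# Prefix taken from the example "hf:microsoft/Florence-2-large".
HF_PREFIX = "hf:"


def resolve_model_source(path: str):
    """Classify a Path setting (hypothetical helper, not the application's code).

    Returns ("huggingface", repo_id) for values starting with "hf:",
    otherwise ("local", Path) for a path on disk.
    """
    value = path.strip()
    if value.startswith(HF_PREFIX):
        # Drop the prefix to get the repository name, e.g. "microsoft/Florence-2-large".
        return "huggingface", value[len(HF_PREFIX):]
    return "local", Path(value)


kind, source = resolve_model_source("hf:microsoft/Florence-2-large")
print(kind, source)
```

A value without the prefix, such as a file under the models directory, is returned unchanged as a local path.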
Beam Search
Adjust the number of beams used for text generation. Higher values explore more candidate sequences, which can improve the quality of the generated text but increases processing time.
- Default: 3
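To show why the beam count matters, here is a minimal beam search over a toy next-token table. The probabilities in `TOY_MODEL` are invented for illustration and have nothing to do with Florence-2's real outputs; the point is only that a single beam (greedy decoding) can miss a sequence that a wider search finds.

```python
import math

# Toy next-token probabilities keyed by the tokens generated so far.
# These numbers are made up for the example.
TOY_MODEL = {
    (): {"a": 0.5, "b": 0.4, "c": 0.1},
    ("a",): {"a": 0.4, "b": 0.3, "c": 0.3},
    ("b",): {"a": 0.9, "b": 0.05, "c": 0.05},
    ("c",): {"a": 0.4, "b": 0.3, "c": 0.3},
}


def beam_search(model, num_beams, length):
    """Return the best (tokens, log_probability) found with `num_beams` beams."""
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for tokens, score in beams:
            for token, prob in model[tokens].items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


greedy_tokens, greedy_score = beam_search(TOY_MODEL, num_beams=1, length=2)
wide_tokens, wide_score = beam_search(TOY_MODEL, num_beams=3, length=2)

print(greedy_tokens, round(math.exp(greedy_score), 2))  # ('a', 'a') 0.2
print(wide_tokens, round(math.exp(wide_score), 2))      # ('b', 'a') 0.36
```

With one beam the search commits to "a" after the first step and ends at probability 0.2. With three beams it keeps "b" alive and finds the sequence with probability 0.36, at the cost of scoring three times as many candidates per step.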
Flash Attention
Flash attention speeds up inference on supported GPUs. Disable it if you encounter errors or your hardware does not support it.
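One way such a switch could be wired up is sketched below. It assumes the model is loaded through the Hugging Face `transformers` library, whose `from_pretrained` accepts an `attn_implementation` argument with values such as `"flash_attention_2"` and `"eager"`. The helper name `pick_attn_implementation` is hypothetical, and whether this application works the same way is not stated in the source.

```python
import importlib.util


def pick_attn_implementation(use_flash_attention: bool) -> str:
    """Choose an attention backend name (hypothetical helper).

    Falls back to the plain "eager" implementation when flash attention is
    disabled in the settings or the `flash_attn` package is not installed.
    """
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if use_flash_attention and flash_installed:
        return "flash_attention_2"
    return "eager"


# With the setting disabled, the fallback is always selected.
print(pick_attn_implementation(False))  # eager
```

The returned string would then be passed as `attn_implementation=...` when loading the model, under the assumption stated above.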
Replacements
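Define regex-based replacements to clean up generated answers. Each rule is a quoted pattern followed by = and the replacement text. In the examples below the replacement is left empty, which appears to remove the matched text.
Examples: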
"The image is a screenshot from a video call." =
"is looking directly at the camera" =
"expression on ((his)|(her)) face" =