Florence-2 Vision

Florence-2 is an advanced vision foundation model developed by Microsoft, designed to handle a wide range of computer vision and vision-language tasks through a unified, prompt-based approach.

Configuration Options

Model

Path: Provide the full path to the model file or a HuggingFace model name (e.g., hf:microsoft/Florence-2-large).
Available Models:
- Florence-2-large: High accuracy, slower performance.
- Florence-2-base: Balanced accuracy and speed.
- Florence-2-large-ft: Fine-tuned for specific applications, offering enhanced results with a slight performance trade-off.
- Florence-2-base-ft: Fine-tuned base version, faster with moderate accuracy.
Models Directory: Specify the storage location for models.
- Default: Data/Models/Florence-2
- After changing the directory, save and refresh to update the models list.
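
The two accepted forms of the Path setting can be told apart by the hf: prefix. The sketch below is illustrative only: the helper name parse_model_setting and its return values are hypothetical and are not taken from the application's code.

```python
from pathlib import Path

HF_PREFIX = "hf:"  # prefix used in the example above (hf:microsoft/Florence-2-large)


def parse_model_setting(value: str):
    """Classify a Path setting as a HuggingFace model name or a local path.

    Hypothetical helper: the real application may resolve paths differently.
    """
    value = value.strip()
    if value.startswith(HF_PREFIX):
        # Strip the prefix to get the repository name, e.g. microsoft/Florence-2-large
        return ("huggingface", value[len(HF_PREFIX):])
    # Anything else is treated as a path on disk
    return ("local", str(Path(value).expanduser()))


print(parse_model_setting("hf:microsoft/Florence-2-large"))
```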

Beam Search

Adjust the number of beams for text generation. Higher values keep more candidate sequences in play at each step, which can improve output quality but increases processing time and memory use.

Default: 3
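
To illustrate what the beam count controls, here is a minimal beam search over a toy model. The probability table TOY_MODEL is made up for the example and has nothing to do with Florence-2's actual decoder; it is chosen so that a single beam (greedy decoding) misses the most probable sequence.

```python
import math

# Toy next-token probabilities keyed by the prefix generated so far (invented values).
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, num_beams=3, steps=2):
    """Return the highest-scoring token sequence found with num_beams beams."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the num_beams best prefixes for the next step
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]


print(beam_search(TOY_MODEL, num_beams=1))
print(beam_search(TOY_MODEL, num_beams=3))
```

With one beam the search commits to "a" early and cannot recover; with three beams it also tracks "b" and finds the more probable sequence, at the cost of scoring more candidates per step.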

Flash Attention

Disable flash attention if you encounter errors or hardware limitations.
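
As a rough sketch of what such a switch might do, the helper below picks a value for the attn_implementation argument that the transformers library accepts in from_pretrained. Whether the application passes the setting on this way is an assumption, and pick_attn_implementation is a hypothetical name.

```python
import importlib.util


def pick_attn_implementation(use_flash_attention: bool) -> str:
    """Choose an attention backend name for transformers' attn_implementation.

    Assumption: falling back to "eager" is acceptable when flash attention
    is disabled or the flash_attn package is not installed.
    """
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if use_flash_attention and flash_installed:
        return "flash_attention_2"
    return "eager"


print(pick_attn_implementation(False))
```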

Replacements

Define regex-based replacements to clean up generated answers. Each rule is a quoted pattern followed by =, as in these examples:

"The image is a screenshot from a video call." =
"is looking directly at the camera" =
"expression on ((his)|(her)) face" =