Florence-2 Vision

Florence-2 is an advanced vision foundation model developed by Microsoft, designed to handle a wide range of computer vision and vision-language tasks through a unified, prompt-based approach.

Configuration Options

Model

  • Path: Provide the full path to a local model, or a Hugging Face model name prefixed with hf: (e.g., hf:microsoft/Florence-2-large).

  • Available Models:

    • Florence-2-large:

      Larger model (0.77B parameters): higher accuracy, slower inference.

    • Florence-2-base:

      Smaller model (0.23B parameters): faster inference, somewhat lower accuracy.

    • Florence-2-large-ft:

      Florence-2-large fine-tuned on a collection of downstream tasks. Often gives better results on those tasks, at the same size and speed as Florence-2-large.

    • Florence-2-base-ft:

      Florence-2-base fine-tuned on the same downstream tasks. As fast as Florence-2-base, with moderate accuracy.

  • Models Directory: Specify the storage location for models.

    • Default: Data/Models/Florence-2

      Save and refresh to update the models list.
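The Path setting accepts either a local model or a Hugging Face name carrying the hf: prefix. The sketch below shows one way such a value could be classified, and how a refresh might list the models directory. The function names, and the rule that a bare name is looked up inside the models directory, are hypothetical assumptions for illustration, not the application's documented behavior.

```python
from pathlib import Path

HF_PREFIX = "hf:"  # prefix that marks a Hugging Face model name in the Path setting
DEFAULT_MODELS_DIR = "Data/Models/Florence-2"


def resolve_model_source(path_setting: str, models_dir: str = DEFAULT_MODELS_DIR) -> tuple[str, str]:
    """Hypothetical helper: classify the Path setting.

    Returns ("hf", "owner/name") for Hugging Face names, else ("local", path).
    """
    value = path_setting.strip()
    if value.startswith(HF_PREFIX):
        repo_id = value[len(HF_PREFIX):]
        if "/" not in repo_id:
            raise ValueError(f"expected 'owner/name' after {HF_PREFIX!r}, got {repo_id!r}")
        return "hf", repo_id
    candidate = Path(value)
    # Assumption: a bare folder name is resolved inside the models directory.
    if not candidate.is_absolute() and len(candidate.parts) == 1:
        candidate = Path(models_dir) / candidate
    return "local", str(candidate)


def list_local_models(models_dir: str = DEFAULT_MODELS_DIR) -> list[str]:
    """Hypothetical 'refresh': sorted names of the subfolders in the models directory."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    print(resolve_model_source("hf:microsoft/Florence-2-large"))  # → ('hf', 'microsoft/Florence-2-large')
    print(list_local_models())
```

Keeping the classification separate from loading makes it easy to validate the setting when it is saved, before any download starts.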

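Number of Beams

Adjust the number of beams used for beam search during text generation. Higher values keep more candidate outputs in play, which can improve answer quality but increases processing time and memory use.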

  • Default: 3
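Beam search keeps the highest-scoring partial sequences at each step (as many as the beam count) instead of committing to the single most likely next token. In Hugging Face Transformers the corresponding knob is the `num_beams` argument of `generate`; presumably this setting is passed there, though the documentation does not say so. The toy sketch below uses a hypothetical four-word vocabulary to show why more beams can find a higher-probability output than greedy decoding (one beam).

```python
import math
from typing import Callable

EOS = "</s>"  # end-of-sequence token


def beam_search(
    next_probs: Callable[[tuple], dict],
    num_beams: int = 3,
    max_len: int = 5,
) -> tuple[tuple, float]:
    """Return the best finished sequence and its total log-probability.

    next_probs(prefix) returns the model's distribution over the next token.
    """
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                if p > 0:
                    candidates.append((tokens + (tok,), score + math.log(p)))
        # Keep only the num_beams best extensions across all beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished beams still count if max_len is hit
    return max(finished, key=lambda c: c[1])


def toy_model(prefix: tuple) -> dict:
    """Hypothetical next-token distributions for a two-word caption."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    table = {
        (): {"a": 0.55, "the": 0.45},
        ("a",): {"cat": 0.6, "dog": 0.4},
        ("the",): {"cat": 0.9, "dog": 0.1},
    }
    return table[prefix]


if __name__ == "__main__":
    print(beam_search(toy_model, num_beams=1)[0])  # → ('a', 'cat', '</s>')
    print(beam_search(toy_model, num_beams=3)[0])  # → ('the', 'cat', '</s>')
```

With one beam the decoder commits to "a" because it is the likelier first word and ends at probability 0.55 × 0.6 = 0.33. With three beams it also keeps "the" alive and finds "the cat" at 0.45 × 0.9 = 0.405, at the cost of scoring more candidates per step.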

Flash Attention

Flash attention speeds up inference and reduces memory use on supported GPUs. Disable it if you encounter errors or your hardware does not support it.
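When loading a model through Hugging Face Transformers, the attention backend is selected with the `attn_implementation` argument of `from_pretrained` (for example `"flash_attention_2"` or `"eager"`). Assuming this setting maps onto that argument, which the documentation does not confirm, the fallback could look like the sketch below. FlashAttention 2 also needs the `flash_attn` package and a supported CUDA GPU; the helper name is hypothetical.

```python
import importlib.util


def choose_attn_implementation(use_flash_attention: bool, cuda_available: bool) -> str:
    """Hypothetical helper returning a value for Transformers' `attn_implementation`.

    Uses FlashAttention 2 only when the setting is on, a CUDA GPU is present,
    and the `flash_attn` package is installed (checked without importing it).
    """
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if use_flash_attention and cuda_available and flash_installed:
        return "flash_attention_2"
    # "eager" is the plain PyTorch attention path: slower, but works everywhere.
    return "eager"


if __name__ == "__main__":
    # Flash attention switched off in the settings: always the fallback.
    print(choose_attn_implementation(False, cuda_available=True))  # → eager
```

A caller would typically pass `torch.cuda.is_available()` as `cuda_available`. Falling back to `"eager"` trades speed for compatibility, which is the same trade-off as unticking the setting by hand.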

Replacements

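Define regular-expression replacements to clean up generated answers. Use the format:

  "pattern" = replacement

Leave the replacement empty to delete the matched text, as in the examples below.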

Examples:

  • "The image is a screenshot from a video call." =
  • "is looking directly at the camera" =
  • "expression on ((his)|(her)) face" =
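Each rule pairs a regular expression with its replacement, applied in order to the generated answer. The sketch below parses rules written as `"pattern" = replacement` and applies them with Python's `re` module. The parsing details (how quotes are handled, the whitespace cleanup afterwards) and the function names are assumptions for illustration, not the application's documented behavior.

```python
import re

# Hypothetical parser for lines of the form: "pattern" = replacement
RULE_LINE = re.compile(r'^\s*"(?P<pattern>.*)"\s*=\s*(?P<replacement>.*?)\s*$')


def parse_rules(text: str) -> list[tuple[re.Pattern, str]]:
    """Turn rule lines into (compiled pattern, replacement) pairs."""
    rules = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = RULE_LINE.match(line)
        if match is None:
            raise ValueError(f'line {line_no}: expected "pattern" = replacement, got {line!r}')
        rules.append((re.compile(match["pattern"]), match["replacement"]))
    return rules


def apply_rules(answer: str, rules: list[tuple[re.Pattern, str]]) -> str:
    """Apply each rule in order, then tidy whitespace left by deletions."""
    for pattern, replacement in rules:
        answer = pattern.sub(replacement, answer)
    answer = re.sub(r"\s+([.,;:!?])", r"\1", answer)  # no space before punctuation
    answer = re.sub(r"\s{2,}", " ", answer)  # collapse runs of whitespace
    return answer.strip()


if __name__ == "__main__":
    rules = parse_rules('''
"The image is a screenshot from a video call." =
"is looking directly at the camera" =
"expression on ((his)|(her)) face" =
''')
    caption = ("The image is a screenshot from a video call. "
               "A woman with glasses is looking directly at the camera.")
    print(apply_rules(caption, rules))  # → A woman with glasses.
```

Because rules run in order, a later pattern sees the output of earlier ones; putting broad deletions last avoids one rule removing text another rule was meant to match.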