Florence2Sam2Segmenter

A specialized engine to find objects in images and predict masks.

The Florence2Sam2Segmenter is a Glif-hosted service that uses Florence2 and SAM2 under the hood. It's inspired by this Hugging Face space.

Within the Image to Text Block, the Florence2Sam2Segmenter model can be selected from the dropdown under Model.

You can then add an image URL under Image. For example, if the block is run with the following image:

We will get:

This format is specialized for use in scripts, which can come in handy when building artifacts!
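
For example, a downstream script might overlay the predictions on the source image. Here is a minimal sketch in Python, where the objects list and its label/polygon fields are purely hypothetical stand-ins for the real output format:

```python
from PIL import Image, ImageDraw

# Hypothetical output shape: the "label" and "polygon" field names
# are assumptions for illustration only, not the documented schema.
objects = [
    {"label": "a large rock", "polygon": [(312, 196), (540, 180), (708, 420), (400, 564)]},
]

image = Image.open("rock_on_table.jpg").convert("RGBA")
overlay = Image.new("RGBA", image.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)
for obj in objects:
    # Semi-transparent fill, solid outline, and the label text.
    draw.polygon(obj["polygon"], fill=(255, 0, 0, 96), outline=(255, 0, 0, 255))
    draw.text(obj["polygon"][0], obj["label"], fill=(255, 255, 255, 255))
Image.alpha_composite(image, overlay).save("segmented.png")
```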

Continue reading for the advanced settings and how it works under the hood.

Advanced Settings

  • Output mask: instead of polygons, the output contains mask_url, a URL to a black-and-white bitmask of the segmentation. Since there is only one export layer, multiple object segmentations are flattened into a single mask.

  • Provide custom caption: instead of letting Florence2 predict a caption, you can also supply a caption yourself if you know what's in the image or if you want to specify an object to cut out.

  • Area threshold: after the Florence2 grounding step, you can filter out very large bounding boxes. Every bounding box with an area higher than this threshold is discarded. The area is expressed as a fraction of the total image size.

  • Confidence threshold: disregard any SAM2 mask predictions with a confidence score lower than this threshold. Note: do not confuse this with the bounding box confidence.

  • NMS threshold: uses the SAM2 scores to filter out overlapping bounding boxes with Non-Max Suppression.

  • Polygon precision: lower values create a more detailed polygon shape.
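
For intuition on that last setting: one common way to turn a bitmask into polygons is contour extraction followed by simplification, where a smaller epsilon keeps more vertices. A sketch with OpenCV (Glif's actual implementation is an assumption):

```python
import cv2
import numpy as np

def mask_to_polygons(mask: np.ndarray, precision: float = 0.01):
    """Convert a binary mask to simplified polygons.

    Lower `precision` means a smaller epsilon, hence more vertices
    and a more detailed polygon shape.
    """
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    polygons = []
    for contour in contours:
        epsilon = precision * cv2.arcLength(contour, closed=True)
        approx = cv2.approxPolyDP(contour, epsilon, closed=True)
        polygons.append(approx.reshape(-1, 2).tolist())
    return polygons
```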

Under the hood

Florence2 is a versatile model that can do multiple tasks, but here we have chained together two tasks: MORE_DETAILED_CAPTIONING and CAPTION_TO_PHRASE_GROUNDING.
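
As a rough sketch of what this chaining looks like with the open-source weights, following the sample usage published for microsoft/Florence-2-large (the Glif-hosted service may be set up differently; note the open-source task tokens are spelled <MORE_DETAILED_CAPTION> and <CAPTION_TO_PHRASE_GROUNDING>):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def run_task(image, task, extra_text=""):
    """Run one Florence2 task prompt and parse its output."""
    inputs = processor(text=task + extra_text, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(raw, task=task, image_size=image.size)

image = Image.open("rock_on_table.jpg")

# Task 1: predict a detailed caption.
caption = run_task(image, "<MORE_DETAILED_CAPTION>")["<MORE_DETAILED_CAPTION>"]

# Task 2: ground the caption's phrases to bounding boxes.
grounding = run_task(image, "<CAPTION_TO_PHRASE_GROUNDING>", caption)
# -> {"<CAPTION_TO_PHRASE_GROUNDING>": {"bboxes": [...], "labels": [...]}}
```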

MORE_DETAILED_CAPTIONING predicts a detailed caption. For the rock-on-the-table image, we would get something like:

The image shows a large rock on a wooden table. The rock appears to be made of a light-colored material, possibly stone or concrete, and has a rough texture. It is resting on the table with its head slightly tilted to the side. The background is blurred, but it seems to be a room with a window and a wall. The lighting is soft and natural, creating a warm and cozy atmosphere.

We then pass the caption to the CAPTION_TO_PHRASE_GROUNDING task. This will produce a list of found objects and corresponding bounding boxes. This step could output something like:
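
For the rock-on-the-table image, that output might look like this (illustrative values only; the labels and pixel coordinates are made up, assuming a 1024x1024 image):

```python
{
    "labels": ["a large rock", "a wooden table"],
    "bboxes": [
        [312.0, 196.0, 708.0, 564.0],   # rock:  [x1, y1, x2, y2] in pixels
        [0.0, 420.0, 1024.0, 944.0],    # table: covers about half the image
    ],
}
```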

The first filter checks the sizes of all bounding boxes and removes those with a relative area higher than the Area threshold setting. In this example, the table bbox is removed because its area is 0.51, which is higher than the 0.3 threshold.
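
A minimal sketch of this first filter, reusing the illustrative boxes from above (the helper name is hypothetical):

```python
def filter_by_area(bboxes, labels, image_w, image_h, area_threshold=0.3):
    """Keep only boxes whose area, relative to the image, is at or below the threshold."""
    kept = []
    for bbox, label in zip(bboxes, labels):
        x1, y1, x2, y2 = bbox
        rel_area = ((x2 - x1) * (y2 - y1)) / (image_w * image_h)
        if rel_area <= area_threshold:
            kept.append((bbox, label))
    return kept

# With the illustrative boxes above (1024x1024 image):
#   table: 1024 * (944 - 420) / 1024**2 ≈ 0.51 > 0.3, discarded
#   rock:  (708 - 312) * (564 - 196) / 1024**2 ≈ 0.14 <= 0.3, kept
```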

Then, SAM2 is used to transform each bounding box into a mask. It will return a bitmask and a confidence score per object.
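
This step could look roughly like the following with the open-source SAM2 image predictor (model ID and call pattern per the sam2 repository; the Glif-hosted setup is an assumption):

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = Image.open("rock_on_table.jpg")
predictor.set_image(np.array(image.convert("RGB")))

# The rock bbox that survived the area filter above.
box = np.array([312.0, 196.0, 708.0, 564.0])

# multimask_output=False returns the single best mask for the box prompt.
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]                # (H, W) bitmask
confidence = float(scores[0])  # SAM2's confidence score for this mask
```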

The second filter is based purely on the confidence score. Everything lower than the set Confidence threshold will be removed.

The last filter removes overlapping objects with Non-Max Suppression. Since the bounding boxes themselves don't have confidence scores, we use the corresponding SAM2 confidence scores.
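
A sketch of these last two filters combined: drop low-confidence masks, then run a standard IoU-based Non-Max Suppression that ranks boxes by their SAM2 scores (helper names and default values are hypothetical; the real thresholds come from the block settings):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_predictions(boxes, masks, scores, conf_threshold=0.5, nms_threshold=0.7):
    # Second filter: drop any mask whose SAM2 score is below the threshold.
    items = [(b, m, s) for b, m, s in zip(boxes, masks, scores) if s >= conf_threshold]
    # Last filter: Non-Max Suppression. Visit boxes from highest to lowest
    # SAM2 score and keep a box only if it doesn't overlap one already kept.
    items.sort(key=lambda it: it[2], reverse=True)
    kept = []
    for box, mask, score in items:
        if all(iou(box, k[0]) <= nms_threshold for k in kept):
            kept.append((box, mask, score))
    return kept
```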

This results in the final prediction:
