Florence-2 MCP Server

An MCP server for processing images using Florence-2.

jkawamoto

Image & Video Processing

README

You can process images or PDF files stored locally or on a web server, either extracting their text with OCR (Optical Character Recognition) or generating descriptive captions that summarize the image content.

Installation

For Claude Desktop

To configure this server for Claude Desktop, add the following entry under mcpServers in the claude_desktop_config.json file:

{
  "mcpServers": {
    "florence-2": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jkawamoto/mcp-florence2",
        "mcp-florence2"
      ]
    }
  }
}

After editing, restart the application. For more information, see: For Claude Desktop Users - Model Context Protocol.
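
The entry above can also be merged into an existing configuration programmatically. The following is a minimal sketch and not part of the project: the helper names `add_florence2_entry` and `update_config_file` are hypothetical, and because the location of claude_desktop_config.json depends on your platform, the file path is left to the caller.

```python
import copy
import json

# Server entry copied from the configuration shown above.
FLORENCE2_ENTRY = {
    "command": "uvx",
    "args": [
        "--from",
        "git+https://github.com/jkawamoto/mcp-florence2",
        "mcp-florence2",
    ],
}


def add_florence2_entry(config: dict) -> dict:
    """Return a copy of config with the florence-2 server under mcpServers.

    Hypothetical helper: other servers are preserved, and an existing
    "florence-2" entry is overwritten. The input dict is not modified.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["florence-2"] = copy.deepcopy(FLORENCE2_ENTRY)
    return merged


def update_config_file(path: str) -> None:
    """Read the JSON config at path, add the entry, and write it back."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(add_florence2_entry(config), f, indent=2)
```

Calling `update_config_file` with the path to your configuration file keeps any servers you have already registered; restart Claude Desktop afterwards as described above.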

For Goose CLI

To enable the Florence-2 extension in Goose CLI, edit the configuration file ~/.config/goose/config.yaml to include the following entry:

extensions:
  florence-2:
    name: Florence-2
    cmd: uvx
    args: [ --from, git+https://github.com/jkawamoto/mcp-florence2, mcp-florence2 ]
    enabled: true
    type: stdio

For Goose Desktop

Add a new extension with the following settings:

  • Type: Standard IO
  • ID: florence-2
  • Name: Florence-2
  • Description: An MCP server for processing images using Florence-2
  • Command: uvx --from git+https://github.com/jkawamoto/mcp-florence2 mcp-florence2

For more details on configuring MCP servers in Goose Desktop, refer to the documentation: Using Extensions - MCP Servers.
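
The single command string entered in Goose Desktop is the same invocation that the Claude Desktop and Goose CLI configurations express as a command plus an argument list. As an illustration only (the variable names are hypothetical), Python's standard `shlex` module shows how the string splits:

```python
import shlex

# Command string from the Goose Desktop settings above.
command_line = "uvx --from git+https://github.com/jkawamoto/mcp-florence2 mcp-florence2"

# The first token is the executable; the rest are its arguments.
command, *args = shlex.split(command_line)

print(command)
print(args)
```

Here `command` corresponds to the `command` (or `cmd`) field and `args` to the `args` list in the other two configurations.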

Tools

ocr

Process an image file or URL using OCR to extract text.

Arguments:

  • src: A file path or URL to the image file that needs to be processed.

caption

Process an image file or URL and generate captions for the image.

Arguments:

  • src: A file path or URL to the image file that needs to be processed.
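
MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. A client such as Claude Desktop or Goose builds this message for you, so the sketch below only illustrates what a call to either tool might look like on the wire. The helper name `build_tool_call`, the request ID, and the example URL are assumptions for illustration.

```python
import json


def build_tool_call(tool: str, src: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call request for the ocr or caption tool.

    Hypothetical helper: src is a file path or URL, passed through unchanged.
    """
    if tool not in ("ocr", "caption"):
        raise ValueError(f"unknown tool: {tool}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"src": src}},
    }
    return json.dumps(request)


# Example: request OCR for an image at a placeholder URL.
print(build_tool_call("ocr", "https://example.com/scan.png"))
```

Both tools take the same single `src` argument, so only the tool name changes between an OCR request and a caption request.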

License

This application is licensed under the MIT License. See the LICENSE file for more details.
