Gemini OCR MCP Server

Gemini OCR MCP Server

Provides OCR services powered by Google's Gemini API to extract text from images via file paths or base64 strings. It enables high-accuracy text recognition and CAPTCHA processing through simple MCP tools.

Category
Visit Server

README

Gemini OCR MCP Server

This project provides a simple yet powerful OCR (Optical Character Recognition) service through a FastMCP server, leveraging the capabilities of the Google Gemini API. It allows you to extract text from images either by providing a file path or a base64 encoded string.

Objective

Extract the text from the following image:

CAPTCHA

and convert it to plain text, e.g., fbVk

Features

  • File-based OCR: Extract text directly from an image file on your local system.
  • Base64 OCR: Extract text from a base64 encoded image string.
  • Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
  • Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.

Prerequisites

  • Python 3.8 or higher
  • A Google Gemini API Key. You can obtain one from Google AI Studio.

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/WindoC/gemini-ocr-mcp
    cd gemini-ocr-mcp
    
  2. Create and activate a virtual environment:

    # Install uv standalone if needed
    
    ## On macOS and Linux.
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    ## On Windows.
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
  3. Install the required dependencies:

    uv sync
    

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Linux/macOS Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Note: Remember to replace the placeholder paths with the absolute path to your project directory.

Tools Provided

ocr_image_file

Performs OCR on a local image file.

  • Parameter: image_file (string): The absolute or relative path to the image file.
  • Returns: (string) The extracted text from the image.

ocr_image_base64

Performs OCR on a base64 encoded image.

  • Parameter: base64_image (string): The base64 encoded string of the image.
  • Returns: (string) The extracted text from the image.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
E2B

E2B

Using MCP to run code via e2b.

Official
Featured