yandex-vision-ocr-mcp

Provides OCR capabilities for images and PDFs using the Yandex Vision API. Supports multiple recognition models and output formats.

Yandex Vision OCR MCP

A Model Context Protocol (MCP) server that exposes Yandex Vision OCR as tools, so any MCP-compatible client — opencode, Claude Desktop, Cursor, Cline — can extract text from images and PDFs.

Features

  • recognize_text — synchronous OCR for images (JPEG/PNG/WEBP/HEIC/HEIF) and single-page PDFs.
  • recognize_pdf — asynchronous OCR for PDFs (single- or multi-page) and large files, via recognizeTextAsync + getRecognition polling.
  • Recognition models — printed text, multi-column, handwritten, tables, Markdown, and math formulas (LaTeX), selectable per call.
  • Accepts a local file path or raw base64 content.
  • Recognition languages: ru and en (default ru; pass ["ru","en"] for mixed text). Automatic language detection is not supported by this endpoint.
  • Three output formats: text (default), markdown, or full json (the raw textAnnotation with blocks/lines/words/tables/entities).
  • Zero-touch error handling — API failures are returned as isError tool results, never crashes.
  • Lazy credentials — the server boots and lists tools even before YANDEX_* env vars are set, surfacing a clear error only on the first call.
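
The error-handling behaviour listed above can be pictured with a small wrapper. This is a minimal sketch and not the server's actual code: `ToolResult` is a local type that mirrors only the text-content part of the MCP tool-result shape, and `errorResult` and `safeCall` are hypothetical helper names chosen for illustration.

```typescript
// Local stand-in for the MCP tool-result shape (text content only).
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Convert any thrown value into an error result instead of letting it propagate.
function errorResult(err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return {
    content: [{ type: "text", text: `OCR failed: ${message}` }],
    isError: true,
  };
}

// Hypothetical wrapper: run a handler and map any failure to an isError result.
async function safeCall(handler: () => Promise<string>): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await handler() }] };
  } catch (err) {
    return errorResult(err);
  }
}
```

With a wrapper of this kind, a missing credential or a failed API call reaches the client as a readable tool result, and the server process keeps running.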

Prerequisites

  1. Node.js ≥ 20.
  2. A Yandex Cloud account with the Vision/OCR API enabled.
  3. Either an API key (recommended) or an IAM token. A folder ID is optional (see Configuration). See the authentication docs.

Quick start

# Run directly with npx (no install needed)
npx -y yandex-vision-ocr-mcp

Then wire it into your MCP client (see Configuration).

Configuration

The server reads credentials from environment variables:

| Variable | Required | Description |
| --- | --- | --- |
| YANDEX_FOLDER_ID | optional | Yandex Cloud folder ID. Only sent as x-folder-id when set; the API key already scopes requests, so you can usually leave this unset. If set, it must match the key's folder. |
| YANDEX_API_KEY | one of | API key (recommended for long-lived usage). |
| YANDEX_IAM_TOKEN | one of | Short-lived IAM token (~12h). Use instead of an API key. |

See .env.example for a template.
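
As a rough illustration of how these variables could map to request headers, the sketch below builds the headers from an environment object. It is an assumption-laden example, not the server's implementation: `buildAuthHeaders` is a hypothetical name, the `Api-Key` and `Bearer` authorization schemes follow Yandex Cloud's usual header conventions, and preferring the API key when both credentials are set is a choice made here for illustration.

```typescript
type Env = Record<string, string | undefined>;

// Build auth headers from environment variables.
// Throws only when called, which fits the lazy-credentials behaviour described earlier.
function buildAuthHeaders(env: Env): Record<string, string> {
  const apiKey = env["YANDEX_API_KEY"];
  const iamToken = env["YANDEX_IAM_TOKEN"];
  const folderId = env["YANDEX_FOLDER_ID"];

  const headers: Record<string, string> = {};
  if (apiKey) {
    // Assumption: the API key takes precedence when both credentials are present.
    headers["Authorization"] = `Api-Key ${apiKey}`;
  } else if (iamToken) {
    headers["Authorization"] = `Bearer ${iamToken}`;
  } else {
    throw new Error("Set YANDEX_API_KEY or YANDEX_IAM_TOKEN.");
  }
  // The folder ID is optional and only sent when set.
  if (folderId) headers["x-folder-id"] = folderId;
  return headers;
}
```

In a Node process this would typically be called as `buildAuthHeaders(process.env)` inside the tool handler, so that a missing credential surfaces on the first call and not at startup.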

Models

Pass model to any tool to pick the recognition behaviour:

| Model | Best for |
| --- | --- |
| page (default) | Single-column printed text. |
| page-column-sort | Multi-column printed text. |
| handwritten | Mixed handwritten + printed text (Russian, English). |
| table | Tables (Russian, English). |
| markdown | Printed text, also returned as Markdown. |
| math-markdown | Math formulas, returned as Markdown with LaTeX (e.g. $a^2 + b^2$). |

Tip: use format: "markdown" together with the markdown / math-markdown models to receive the model's Markdown output directly.

Tools

Both tools accept the same input shape:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| path | string | | Local file to OCR. Provide this or base64. |
| base64 | string | | Base64 content (data: URIs accepted). Provide this or path. |
| mimeType | string | inferred | Explicit MIME type override. |
| languages | string[] | ["ru"] | Recognition languages: ru, en (e.g. ["ru","en"] for mixed text). |
| model | string | page | Recognition model; see Models. |
| format | text / markdown / json | text | Output format. |

Supported formats: JPEG, PNG, WEBP, HEIC, HEIF (images) and PDF. The mimeType sent to the API is derived automatically (you can pass a standard MIME type via mimeType if needed). BMP/TIFF are not supported by the service.
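
The input rules above (exactly one of path or base64, an optional mimeType override, and inference otherwise) could be sketched as follows. All names here (`inferMimeType`, `parseBase64Input`, `resolveMimeType`) are hypothetical, and the behaviour for raw base64 without a data: prefix is an assumption; the real server may infer the type differently.

```typescript
// Extension-to-MIME map for the supported formats (BMP/TIFF deliberately absent).
const MIME_BY_EXTENSION = new Map<string, string>([
  ["jpg", "image/jpeg"], ["jpeg", "image/jpeg"], ["png", "image/png"],
  ["webp", "image/webp"], ["heic", "image/heic"], ["heif", "image/heif"],
  ["pdf", "application/pdf"],
]);

// Infer a MIME type from a file path; throws for unsupported or missing extensions.
function inferMimeType(path: string): string {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  const mime = MIME_BY_EXTENSION.get(ext);
  if (!mime) throw new Error(`Unsupported file type: "${ext || "none"}"`);
  return mime;
}

// Split a base64 argument into its payload and, for data: URIs, the declared MIME type.
function parseBase64Input(input: string): { data: string; mimeType?: string } {
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(input);
  if (!match) return { data: input };
  return { data: match[2] ?? "", mimeType: match[1] };
}

type OcrInput = { path?: string; base64?: string; mimeType?: string };

// Enforce "path or base64, not both" and pick the MIME type to use.
function resolveMimeType(args: OcrInput): string {
  const hasPath = args.path !== undefined;
  const hasBase64 = args.base64 !== undefined;
  if (hasPath === hasBase64) throw new Error("Provide exactly one of path or base64.");
  if (args.mimeType) return args.mimeType; // an explicit override wins
  if (args.path !== undefined) return inferMimeType(args.path);
  const fromUri = parseBase64Input(args.base64 ?? "").mimeType;
  // Assumption: raw base64 without a data: prefix needs an explicit mimeType.
  if (!fromUri) throw new Error("Cannot infer MIME type from raw base64; pass mimeType.");
  return fromUri;
}
```

Rejecting unknown extensions up front gives the caller a clear local error for BMP or TIFF files before any request is made.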

  • recognize_text — synchronous. Best for images and single-page PDFs.
  • recognize_pdf — asynchronous (submit + poll). Best for multi-page PDFs and large files. Requires the input to be a PDF.
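
The submit-and-poll pattern behind recognize_pdf can be sketched generically. This is an illustrative helper, not the server's code: `pollUntilDone`, `PollOutcome`, the interval, and the attempt limit are all hypothetical, and how a "not ready yet" response from getRecognition is detected is left to the `check` callback.

```typescript
// Result of one polling attempt: either finished with a value, or not ready yet.
type PollOutcome<T> = { done: true; value: T } | { done: false };

// Call `check` repeatedly until it reports completion or the attempt budget runs out.
async function pollUntilDone<T>(
  check: () => Promise<PollOutcome<T>>,
  opts: { intervalMs: number; maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const outcome = await check();
    if (outcome.done) return outcome.value;
    // Wait between attempts, but not after the final one.
    if (attempt < opts.maxAttempts) await sleep(opts.intervalMs);
  }
  throw new Error(`Recognition not ready after ${opts.maxAttempts} attempts`);
}
```

Passing `sleep` as a parameter keeps the loop testable offline: a test can supply a no-op sleep and a fake `check` without touching the network.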

Example result (text format)

Hello World
Yandex OCR
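
To show how output like the example above might be produced from the raw annotation, here is a minimal sketch covering only the text and json formats. The `TextAnnotation` type is an assumed subset of the response (blocks containing lines with a text field), and `formatAnnotation` is a hypothetical name; the markdown format is left out because its source field is not described here.

```typescript
// Assumed subset of the textAnnotation shape: blocks -> lines -> text.
type TextAnnotation = {
  blocks?: { lines?: { text: string }[] }[];
};

// Render an annotation as plain text (one line per recognized line) or as pretty JSON.
function formatAnnotation(annotation: TextAnnotation, format: "text" | "json"): string {
  if (format === "json") return JSON.stringify(annotation, null, 2);
  const lines: string[] = [];
  for (const block of annotation.blocks ?? []) {
    for (const line of block.lines ?? []) {
      lines.push(line.text);
    }
  }
  return lines.join("\n");
}
```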

Connect to opencode

Add the server to your opencode.json under mcp:

{
  "mcp": {
    "yandex-vision-ocr": {
      "type": "local",
      "command": ["npx", "-y", "yandex-vision-ocr-mcp@latest"],
      "enabled": true,
      "environment": {
        "YANDEX_FOLDER_ID": "b1g...",
        "YANDEX_API_KEY": "your-api-key"
      }
    }
  }
}

If you cloned the repo instead, replace the command with ["node", "/absolute/path/to/yandex-vision-ocr-mcp/build/index.js"].

Connect to Claude Desktop / Cursor / Cline

<details> <summary>Claude Desktop — <code>claude_desktop_config.json</code></summary>

{
  "mcpServers": {
    "yandex-vision-ocr": {
      "command": "npx",
      "args": ["-y", "yandex-vision-ocr-mcp@latest"],
      "env": {
        "YANDEX_FOLDER_ID": "b1g...",
        "YANDEX_API_KEY": "your-api-key"
      }
    }
  }
}

</details>

<details> <summary>Cursor — <code>.cursor/mcp.json</code></summary>

{
  "mcpServers": {
    "yandex-vision-ocr": {
      "command": "npx",
      "args": ["-y", "yandex-vision-ocr-mcp@latest"],
      "env": {
        "YANDEX_FOLDER_ID": "b1g...",
        "YANDEX_API_KEY": "your-api-key"
      }
    }
  }
}

</details>

Local development

git clone https://github.com/chupre/yandex-vision-ocr-mcp.git
cd yandex-vision-ocr-mcp
npm install
npm run build      # type-check + compile to build/
npm test           # run the vitest suite
npm run dev        # run the server from source via tsx
npm run inspector  # open the MCP Inspector UI against the build

Useful scripts:

| Script | Description |
| --- | --- |
| npm run build | Compile TypeScript to build/. |
| npm run typecheck | Type-check without emitting. |
| npm test | Run the offline test suite. |
| npm run dev | Run the server from source (tsx). |
| npm run inspector | Launch the MCP Inspector for manual testing. |

Testing

The offline suite covers input handling, MIME inference, response formatting, the HTTP client (via a fake transport, no network), tool wiring, and a full MCP round-trip over an in-memory transport.

Live integration tests hit the real Yandex OCR API and are skipped unless credentials and sample files are provided:

YANDEX_FOLDER_ID=... YANDEX_API_KEY=... \
YOCR_LIVE_IMAGE=./sample.png \
YOCR_LIVE_PDF=./sample.pdf \
npx vitest run tests/live.test.ts

Docker

docker build -t yandex-vision-ocr-mcp .
docker run --rm -i \
  -e YANDEX_FOLDER_ID=b1g... \
  -e YANDEX_API_KEY=... \
  yandex-vision-ocr-mcp

API coverage

This server targets the Yandex Cloud Vision OCR REST API (ocr.api.cloud.yandex.net/ocr/v1):

| Route | Method | Used for |
| --- | --- | --- |
| /recognizeText | POST | Synchronous recognition (recognize_text). |
| /recognizeTextAsync | POST | Start async recognition (recognize_pdf). |
| /getRecognition | GET | Poll for the async result. |

Concepts: OCR overview · image · PDF · handwritten.
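
For orientation, a synchronous call to the /recognizeText route could be assembled roughly as below. This is a sketch under assumptions: `buildRecognizeTextCall` is a hypothetical helper, and the body field names (`content`, `mimeType`, `languageCodes`, `model`) reflect the public OCR API reference as understood here, so check them against the official documentation before relying on them.

```typescript
const OCR_BASE_URL = "https://ocr.api.cloud.yandex.net/ocr/v1";

// Assumed request body for /recognizeText.
type RecognizeRequest = {
  content: string; // base64-encoded file content
  mimeType: string;
  languageCodes: string[];
  model: string;
};

// Build the URL and fetch-style options for a synchronous recognition call.
function buildRecognizeTextCall(
  request: RecognizeRequest,
  authHeaders: Record<string, string>,
) {
  return {
    url: `${OCR_BASE_URL}/recognizeText`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", ...authHeaders },
      body: JSON.stringify(request),
    },
  };
}
```

The returned pair can be passed to the global `fetch` available in Node.js 20, for example `fetch(call.url, call.init)`. Keeping request construction separate from the network call is also what makes a fake transport practical in offline tests.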

License

MIT
