handelsregister-mcp

handelsregister-mcp

Enables AI agents to search and retrieve structured data from the German Commercial Register, including company details, filed documents, and shareholder lists.

Category
Visit Server

README

handelsregister-mcp

<!-- mcp-name: io.github.lexdoudkin/handelsregister-mcp -->

An open-source Model Context Protocol server for the German Commercial Register — handelsregister.de.

Give your AI agents first-class access to official German company data: search the register, read register extracts with management and capital, list filed documents, and pull the shareholder list (Gesellschafterliste) — all returned as structured data and tables, not raw HTML.

CI License: Apache 2.0 Python 3.10+ MCP Built in


„Das Internet ist für uns alle Neuland.“

In 2013, a German Federal Chancellor stood next to the US President and described the internet as Neuland — uncharted new territory. The line became a national meme.

More than a decade later, the official Handelsregister is a living monument to that Neuland era: a JavaServer Faces application that threads a server-side ViewState through every click, a lazy-loaded PrimeFaces document tree, a strict 60-requests-per-hour limit baked into its terms of use, and no public API whatsoever. Since 2022 the data is free and public by law — yet it stays locked behind a portal that no machine was ever meant to talk to.

That gap is exactly why MCP servers need to exist. An LLM agent can reason brilliantly about a company — if something hands it the company's data. This project is that something: a small, deterministic bridge from a relic of Neuland to the agents of today.


Features

  • 🔎 Company search — by name/keywords, with exact, all-keyword, and phonetic matching, plus fuzzy resolution that returns ranked suggestions when the name isn't exact.
  • 📄 Register extracts (AD / CD / HD) — parsed into structured fields: name, register number, seat, address, capital, business purpose, dates, and a management table (Geschäftsführer / Vorstand with birthdates).
  • 🧬 Structured XJustiz data (SI) — the machine-readable register payload, parsed.
  • 👥 Shareholders as a table — finds and downloads the filed Gesellschafterliste and extracts {shareholder, type, city, register, date_of_birth, shares, nominal_total_eur, percent}. Handles complex, multi-page, bilingual cap tables.
  • 🗂️ Filed-document register — list everything on file (shareholder lists, articles of association, annual accounts, …) and download any of it.
  • 🧾 OCR fallback — scanned/image-only PDFs are run through Tesseract so text still comes out.
  • ⏱️ Polite by design — a shared 60 req/hour limiter, on-disk caching, descriptive User-Agent.
  • 🧱 Fully deterministic — no LLM lives inside the server (see Design).

How it works

handelsregister.de exposes no REST API — only a JSF/PrimeFaces web form. Every request must carry a session cookie and a server-generated ViewState through a multi-step submission. This server reproduces that browser flow with mechanize and turns the rendered HTML/PDF/XML into clean structured data.

Since 1 August 2022 (the DiRUG law), both searches and document downloads are free of charge.

⚠️ Legal & rate limits — read this first

The Handelsregister is public, but not a free-for-all to scrape:

  • Its Nutzungsordnung (terms of use, per §9 HGB) forbids more than 60 retrievals per hour, and the portal FAQ warns that automated mass querying may be treated as a criminal offence (§§303a, b StGB).
  • This server enforces a shared 60 requests/hour limit by default. Do not raise it to abuse the portal. For high-volume or commercial use, use a licensed data provider (e.g. OpenRegister, handelsregister.ai) instead of scraping.
  • Register data contains personal data (managing directors, shareholders, birthdates). Handle it under the GDPR and use it only for the informational purposes the register is intended for.

This is a tool for legitimate, measured lookups — diligence, research, journalism, compliance — not bulk harvesting.

Install

git clone https://github.com/lexdoudkin/handelsregister-mcp.git
cd handelsregister-mcp
python -m venv .venv && source .venv/bin/activate
pip install -e .

# optional: OCR for scanned documents
pip install -e ".[ocr]"
brew install tesseract tesseract-lang   # macOS; provides the Tesseract binary + German pack

License note on the OCR extra: [ocr] pulls in PyMuPDF, which is AGPL-3.0 / commercial dual-licensed. The core package (without [ocr]) is fully permissive. If you redistribute a product built on the OCR path, mind AGPL's terms or obtain a commercial PyMuPDF license.

Connect it to an MCP client

Claude Code

claude mcp add handelsregister -- handelsregister-mcp

Claude Desktop — add to your MCP config (see examples/claude_desktop_config.json):

{
  "mcpServers": {
    "handelsregister": {
      "command": "handelsregister-mcp",
      "env": { "HANDELSREGISTER_MAX_PER_HOUR": "60" }
    }
  }
}

Then just ask your agent things like "Who are the shareholders of Trade Republic Bank GmbH?" or "Get the register details and managing directors of GASAG AG."

Tools

Every tool returns structured data — search rows, parsed company fields, and shareholder/management tables — plus a ready-to-render markdown table for document tools.

Tool What it does
search_company(keywords, match="all", similar=False, max_results=20) Search by name/keywords. matchall | min | exact. similar=True enables phonetic matching. * and ? wildcards.
get_company(name) Name lookup. Exact name → the record; inexact/ambiguous → found: false + ranked suggestions.
get_shareholders(company, which="latest") Shareholders as a table — resolves the name, downloads the filed Gesellschafterliste, and extracts structured rows.
list_filed_documents(company) List everything filed in the DK register, grouped by category, with dates.
fetch_filed_document(company, category, which="latest") Download any filed document by category (+ shareholder table for share lists).
fetch_document(keywords, document_type="AD", ...) Download a register extract. AD/CD/HD → company fields + management table; SI → XJustiz data.
rate_limit_status() Remaining requests in the current hour.

Document types: AD current extract · CD chronological extract · HD historical extract · SI structured XML (XJustiz) · announcements · UT holder data. (DK, the filed-documents register, is reached via list_filed_documents / get_shareholders / fetch_filed_document.)

Example: get_shareholders

// get_shareholders("Trade Republic Bank GmbH")
{
  "company": { "name": "Trade Republic Bank GmbH", "register_number": "HRB 244347" },
  "source_document": { "label": "List of shareholders … on 15/06/2026", "date": "2026-06-15" },
  "method": "pdfplumber",
  "confidence": "high",
  "stammkapital_eur": 96293600.0,
  "shareholders": [
    { "shareholder": "Accel Holdings-TR LLC", "type": "company", "city": "Palo Alto, USA",
      "nominal_total_eur": 13008200.0, "percent": "12,7936%" },
    { "shareholder": "Creandum V, L.P.", "type": "company", "city": "St. Peter Port",
      "nominal_total_eur": 12936200.0, "percent": "12,7228%" }
    // … 74 shareholders total
  ]
}

Fuzzy name resolution

get_company and get_shareholders don't need the exact registered name. Exact match → they proceed; otherwise they fall back to keyword + phonetic search, rank the candidates, and return suggestions — e.g. get_shareholders("Trade Republic")["Trade Republic Bank GmbH", "Trade Republic Service GmbH", …].

Shareholder extraction (layered & deterministic)

The Gesellschafterliste has no standard layout, so extraction is layered — strongest engine first:

  1. Coordinate-aware table parsing (pdfplumber) — rebuilds columns from the PDF's ruling lines / text positions. Handles complex and bilingual cap tables (e.g. Trade Republic's 74-shareholder German/English list) that flat text extraction garbles.
  2. Text heuristic — the standard single-language notarial template.

The method field tells you which engine produced the table.

Shareholders are not in the register extract for a GmbH/UG — they exist only in this separately filed list, which get_shareholders downloads and parses.

Design: a deterministic tool

An MCP server is consumed by an LLM agent — so it would be redundant (and non-deterministic) to call another LLM inside it. This server never calls an LLM. When neither parser is confident (confidence: "low"), it doesn't guess: it returns the raw_text and the downloaded PDF path, and the calling agent — which is already an LLM — reads those and extracts the table itself. Intelligence lives in the caller; the tool just fetches data, faithfully.

OCR is the one heavyweight step, and it's deterministic too: scanned PDFs are rasterised and run through Tesseract (German) so text still comes out.

Configuration

Env var Default Purpose
HANDELSREGISTER_MAX_PER_HOUR 60 Hourly request cap. Do not exceed the portal limit.
HANDELSREGISTER_DOWNLOAD_DIR system temp dir Where downloaded documents are written.
HANDELSREGISTER_OCR auto OCR for scanned PDFs: auto (only when the text layer is empty), always, or off.
HANDELSREGISTER_OCR_LANG deu Tesseract language(s), e.g. deu+eng.

Use as a library

from handelsregister_mcp import HandelsregisterClient

client = HandelsregisterClient()
hits = client.search("GASAG AG", match="exact")
doc = client.fetch_document(hits[0]["row_index"], "AD")
print(doc["structured"]["management"])

Project layout

src/handelsregister_mcp/
├── server.py      # MCP tools (FastMCP)
├── client.py      # the JSF/PrimeFaces portal client (search + documents)
├── parsers.py     # register extract, XJustiz SI, shareholder-list parsing
├── ocr.py         # optional Tesseract OCR fallback
└── ratelimit.py   # shared 60/hour limiter

Known limitations

  • Scanned tables: OCR recovers text but not table structure — those parse low-confidence and return raw_text for the caller. Coordinate-based OCR table reconstruction (Tesseract TSV) is a possible future improvement.
  • Pagination: fuzzy suggestions are drawn from the first results page; a very common partial query may miss a match that sits on a later page.
  • Portal markup drift: the document-register flow depends on the live HTML; if the portal changes, parsing may need a tweak. PRs welcome.

Contributing

Issues and pull requests are welcome — especially for parser robustness across the many notarial Gesellschafterliste templates, and for additional register document types. Please keep changes deterministic and respectful of the portal's rate limits.

Credits

The portal search flow is adapted from the excellent bundesAPI/handelsregister / deutschland project (Apache-2.0), part of the bund.dev effort to document and open up Germany's public APIs.

License

Apache-2.0.

Disclaimer

Not affiliated with the Handelsregister, the German Federal States, or any official body. Use it lawfully and within the portal's terms of use. The maintainers accept no liability for how the data is obtained or used.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured