mcp-arabic-toolkit

MCP server exposing Arabic text utilities: normalisation, tashkeel stripping, transliteration, heuristic dialect detection, and token counting.

A small Model Context Protocol (MCP) server exposing practical Arabic text utilities. Built with the official mcp Python SDK (FastMCP).

Demonstrates: MCP server authoring / tool development

Every tool is fully implemented: deterministic string processing plus one clearly labelled heuristic. The pure logic lives in arabic_tools.py, which has no mcp dependency and can therefore be unit-tested on its own; server.py is a thin MCP wrapper around it.

Tools

| Tool | Description | Example input | Example output |
| --- | --- | --- | --- |
| normalise_arabic | NFC-normalises, removes diacritics (harakat/tashkil) and tatweel, and optionally unifies letter variants (alef/yeh/teh-marbuta). | الْعَرَبِيَّةُ | العربية |
| strip_tashkeel | Removes only the diacritics (and, by default, the tatweel); leaves letters as-is. | كــــتاب | كتاب |
| transliterate | Documented, deterministic Arabic→Latin romanisation (simplified DIN 31635 / ALA-LC, ASCII digraphs). | كَتَبَ | {"transliteration": "kataba", "scheme": "din31635-simplified-ascii"} |
| detect_dialect | Heuristic dialect guess (Egyptian/Levantine/Gulf/Maghrebi/MSA) from marker words. Not a trained classifier; see limits below. | شو بدك هلق؟ | {"dialect": "levantine", "confidence": 1.0, ...} |
| count_tokens | Whitespace-token count plus character and Arabic-character statistics. | مرحبا يا عالم | {"tokens": 3, "characters": 13, ...} |
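
To make the first two rows concrete, here is a minimal sketch of how diacritic stripping and normalisation could be written with the standard library alone. It is not the code in arabic_tools.py: the parameter names (remove_tatweel, unify_letters), the exact diacritic range, and the letter-variant table are assumptions made for illustration.

```python
import re
import unicodedata

# Arabic combining marks: tanween, harakat, shadda, sukun (U+064B-U+0652)
# plus the superscript (dagger) alef U+0670. Assumed range, not the tool's.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"

# Hypothetical variant-unification table; the real tool's choices may differ.
LETTER_VARIANTS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def strip_tashkeel(text: str, remove_tatweel: bool = True) -> str:
    """Remove diacritics and, by default, the tatweel; letters are untouched."""
    text = DIACRITICS.sub("", text)
    if remove_tatweel:
        text = text.replace(TATWEEL, "")
    return text


def normalise_arabic(text: str, unify_letters: bool = False) -> str:
    """NFC-normalise, strip diacritics and tatweel, optionally unify variants."""
    text = unicodedata.normalize("NFC", text)
    text = strip_tashkeel(text, remove_tatweel=True)
    if unify_letters:
        text = text.translate(LETTER_VARIANTS)
    return text
```

With unify_letters left off, normalise_arabic applied to the table's example input yields the undiacritised word shown in the table; turning it on additionally folds the teh marbuta, which is why unification is kept optional in this sketch.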

About detect_dialect (read this)

detect_dialect is an honest heuristic, not a machine-learning model. It counts hand-picked marker words/particles per dialect and returns the highest scorer. Known limits:

  • Only five coarse groups (Egyptian, Levantine, Gulf, Maghrebi, MSA).
  • Unreliable on short input, mixed-dialect text, and code-switching.
  • confidence is a crude ratio (winning hits / total hits), not a calibrated probability.
  • Falls back to MSA with confidence: 0.0 when no markers are found.
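
The marker-counting approach described above can be sketched in a few lines. The marker words below are hypothetical placeholders (the tool's real lists are not reproduced in this README), the tokenisation is an assumption, and MSA is treated purely as the fallback here for simplicity.

```python
import re
from collections import Counter

# Hypothetical marker lists for illustration only; the real tool's lists
# are not shown in this README and are likely larger.
MARKERS = {
    "egyptian": {"ازيك", "عايز", "دلوقتي", "مش"},
    "levantine": {"شو", "بدك", "هلق", "كتير"},
    "gulf": {"شلونك", "وايد", "ابي"},
    "maghrebi": {"بزاف", "واش", "دابا"},
}


def detect_dialect(text: str) -> dict:
    """Count marker hits per dialect and return the highest scorer."""
    # Keep runs of Arabic letters only, so punctuation such as the Arabic
    # question mark does not stick to a token. Assumes undiacritised input.
    tokens = re.findall("[\u0621-\u064A]+", text)
    hits = Counter()
    for token in tokens:
        for dialect, words in MARKERS.items():
            if token in words:
                hits[dialect] += 1
    total = sum(hits.values())
    if total == 0:
        # No markers found: fall back to MSA with zero confidence.
        return {"dialect": "msa", "confidence": 0.0}
    dialect, count = hits.most_common(1)[0]
    # Crude ratio (winning hits / total hits), not a calibrated probability.
    return {"dialect": dialect, "confidence": count / total}
```

The sketch also shows why the limits listed above exist: a single marker word in a short input gives a confidence of 1.0, and a tie between two dialects is resolved arbitrarily by whichever marker appeared first.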

For production-grade detection, train a supervised classifier (e.g. fastText or a fine-tuned transformer) on a labelled corpus such as MADAR or NADI.

About transliterate

The romanisation is deterministic and documented but intentionally simple:

  • No vowel inference — short vowels are produced only from explicit harakat.
  • No context-sensitive rules — the article ال is always al- (no sun-letter assimilation), and hamzat al-wasl is not elided.
  • Shadda doubles the preceding consonant; sukun emits no vowel.
  • One-way (Arabic → Latin); not round-trippable.
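
A minimal sketch of these rules follows. It covers only a small, illustrative subset of consonants and omits alef and article handling entirely; the mapping tables and the pass-through of unknown characters are assumptions, not the tool's documented scheme. Only the scheme label is taken from the example output in the Tools table.

```python
# Illustrative subset only; the real mapping table is larger.
CONSONANTS = {
    "\u0628": "b",   # beh
    "\u062A": "t",   # teh
    "\u062F": "d",   # dal
    "\u0631": "r",   # reh
    "\u0633": "s",   # seen
    "\u0634": "sh",  # sheen (ASCII digraph)
    "\u0643": "k",   # kaf
    "\u0644": "l",   # lam
    "\u0645": "m",   # meem
    "\u0646": "n",   # noon
}
SHORT_VOWELS = {"\u064E": "a", "\u0650": "i", "\u064F": "u"}  # fatha, kasra, damma
SHADDA = "\u0651"
SUKUN = "\u0652"


def transliterate(text: str) -> dict:
    """Romanise character by character, with no vowel inference or context rules."""
    pieces: list[str] = []
    last_consonant = None  # index in `pieces` of the most recent consonant
    for ch in text:
        if ch in CONSONANTS:
            pieces.append(CONSONANTS[ch])
            last_consonant = len(pieces) - 1
        elif ch in SHORT_VOWELS:
            # Short vowels come only from explicit harakat.
            pieces.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            # Double the preceding consonant in place, so the result is the
            # same whether the vowel mark comes before or after the shadda.
            if last_consonant is not None:
                pieces.insert(last_consonant, pieces[last_consonant])
                last_consonant = None
        elif ch == SUKUN:
            continue  # sukun emits no vowel
        else:
            pieces.append(ch)  # unknown characters pass through unchanged
            last_consonant = None
    return {
        "transliteration": "".join(pieces),
        "scheme": "din31635-simplified-ascii",
    }
```

Because nothing is inferred, the same three consonants produce "kataba" when fully vocalised and "ktb" when written without harakat, which is also why the output cannot be mapped back to Arabic.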

Install

Requires Python 3.10+.

# Clone, then install the package (editable for local development):
pip install -e .

This pulls in the mcp SDK and registers an mcp-arabic-toolkit console script.

The tests themselves need only pytest (no mcp SDK):

pip install pytest

Run

# Option A: run the module directly (stdio transport)
python server.py

# Option B: run the installed console script
mcp-arabic-toolkit

Register with an MCP client

To use it from Claude Desktop (or any MCP client), add an entry to the client's MCP server config:

{
  "mcpServers": {
    "arabic-toolkit": {
      "command": "python",
      "args": ["/absolute/path/to/mcp-arabic-toolkit/server.py"]
    }
  }
}
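
Because the config needs an absolute path, it can be convenient to generate the entry rather than type it. The helper below is a hypothetical convenience script, not part of this package. As a deliberate deviation from the entry above, it uses the current interpreter (sys.executable) as the command instead of a bare python, on the assumption that you run it from the environment where the mcp SDK is installed.

```python
import json
import sys
from pathlib import Path


def build_client_config(server_path: str, name: str = "arabic-toolkit") -> dict:
    """Return an mcpServers entry pointing at an absolute server.py path."""
    script = Path(server_path).expanduser().resolve()
    return {
        "mcpServers": {
            name: {
                # Assumption: pin the interpreter that has the mcp SDK installed.
                "command": sys.executable,
                "args": [str(script)],
            }
        }
    }


if __name__ == "__main__":
    # Usage: python make_config.py path/to/server.py
    path = sys.argv[1] if len(sys.argv) > 1 else "server.py"
    print(json.dumps(build_client_config(path), indent=2))
```

The printed JSON can be merged into the client's existing config file; the script does not check that server.py actually exists at the resolved path.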

Test

python -m pytest tests/ -v

The suite (tests/test_tools.py) imports the pure logic directly and covers every tool with concrete examples (diacritic/tatweel removal, letter unification, transliteration with and without harakat, each dialect, and token counting).

Quick local check

python -c "import arabic_tools; print(arabic_tools.normalise_arabic('الْعَرَبِيَّةُ'))"
# -> العربية

Publishing to the MCP registry

This package ships a server.json manifest compatible with the official MCP registry.

Exact metadata (server.json)

{
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-07-09/server.schema.json",
  "name": "io.github.benjiscollector/mcp-arabic-toolkit",
  "description": "MCP server exposing Arabic text utilities: normalisation, tashkeel stripping, transliteration, a heuristic dialect detector, and token counting.",
  "status": "active",
  "repository": {
    "url": "https://github.com/BenjisCollector/mcp-arabic-toolkit",
    "source": "github"
  },
  "version": "0.2.0",
  "packages": [
    {
      "registryType": "pypi",
      "registryBaseUrl": "https://pypi.org",
      "identifier": "mcp-arabic-toolkit",
      "version": "0.2.0",
      "transport": { "type": "stdio" }
    }
  ]
}

The server name uses the io.github.<owner>/<repo> namespace, which the registry verifies against GitHub ownership during publish.
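
Before publishing, a quick local sanity check of the manifest can catch obvious mismatches. The sketch below is a hypothetical helper, not something the registry or mcp-publisher provides. It assumes that the owner in the name should match the owner in the repository URL case-insensitively, and that each package version should equal the top-level version; the second rule is this sketch's own convention rather than a documented registry requirement.

```python
import json
from urllib.parse import urlparse


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; empty means nothing obvious."""
    problems = []
    name = manifest.get("name", "")
    repo_url = manifest.get("repository", {}).get("url", "")
    prefix = "io.github."

    if name.startswith(prefix) and "/" in name:
        owner = name[len(prefix):].split("/", 1)[0]
        repo_owner = urlparse(repo_url).path.strip("/").split("/")[0]
        # GitHub account names are case-insensitive, so compare lowercased.
        if owner.lower() != repo_owner.lower():
            problems.append(
                f"name owner {owner!r} does not match repository owner {repo_owner!r}"
            )
    else:
        problems.append("name is not in the io.github.<owner>/<repo> form")

    # Assumed convention: package versions track the top-level version.
    for pkg in manifest.get("packages", []):
        if pkg.get("version") != manifest.get("version"):
            problems.append(
                f"package {pkg.get('identifier')!r} version differs from manifest version"
            )
    return problems


def check_manifest_file(path: str = "server.json") -> list[str]:
    """Load a manifest from disk and run the same checks."""
    with open(path, encoding="utf-8") as fh:
        return check_manifest(json.load(fh))
```

Run against the manifest shown above, the check reports no problems; the actual ownership verification still happens on the registry side during publish.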

Steps

  1. Build and publish the PyPI package so the registry has something to point at:
    python -m build
    twine upload dist/*
    
  2. Install the registry publisher CLI (mcp-publisher) — see the registry publishing guide.
  3. Authenticate with GitHub so the CLI can verify the io.github.* namespace:
    mcp-publisher login github
    
  4. Publish from the directory containing server.json:
    mcp-publisher publish
    

To list this server on the community modelcontextprotocol/servers README as well, see SUBMISSION.md for the exact entry text and PR steps.

License

MIT — see LICENSE.
