Võro MCP Server
An MCP server for working with the Võro language: local dictionary and corpus lookup, GiellaLT-backed analysis/spellcheck/grammar tools, and Neurotõlge translation.
What it's for
This MCP server gives language models practical tools for working with the Võro language. Võro is a lower-resource language, so general-purpose language models may make more mistakes with it than with widely supported languages such as English, or even Estonian.
By connecting the model to dictionaries, corpus search, morphological analysis, spellchecking, grammar checking, translation, and form generation, this server helps improve the model’s ability to understand, generate, correct, and translate Võro text.
It can be useful for tasks such as Võro translation, checking and improving generated text, exploring real usage examples, generating word forms, detecting unknown words, and supporting people who are learning or working with the Võro language.
Install
On Debian/Ubuntu, one command does the whole setup:

    make setup   # scripts/run_local_ubuntu.sh; see `make help` for every task

You need `make`, Python, and the usual shell tools installed first. The setup
target installs the HFST/Divvun system binaries (adding the Apertium package
repo first if your system can't already find `divvun-gramcheck`), downloads the
SQLite datasets and the prebuilt Giella models, creates `.venv`, and installs
the package. Smoke-test it with `make test`.
Manual setup
For other platforms, or to see the pieces:
- System binaries the Giella tools shell out to: `hfst-optimized-lookup`, `hfst-ospell`, `cg3`, `divvun-checker`. On Debian/Ubuntu they come from the Apertium nightly apt repo:

      curl -fsSL https://apertium.projectjj.com/apt/install-nightly.sh | sudo bash
      sudo apt-get install -y hfst hfst-ospell cg3 divvun-gramcheck perl gawk bash

- The package, into a virtualenv (`make install`):

      python3 -m venv .venv
      . .venv/bin/activate
      pip install -e .

- Data and models, which are pulled from GitHub releases:

      scripts/fetch_data.sh
      scripts/fetch_giella.sh

- Verify that the external tools resolved (prints JSON and exits):

      vro-mcp-check
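The verification step can also be scripted, for example in a CI job. The following is a minimal sketch that relies only on what is stated above: `vro-mcp-check` prints JSON and exits. The keys of that JSON are not documented here, so the helper returns the parsed value without interpreting it. `run_json_check` is a hypothetical helper name, not part of this package.

```python
import json
import subprocess


def run_json_check(command):
    """Run a command that prints JSON on stdout and return the parsed value.

    Hypothetical helper for illustration; not part of the vro-mcp package.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        # Surface the exit code so a failed setup is easy to diagnose.
        raise RuntimeError(
            f"{command[0]} did not print valid JSON (exit code {result.returncode})"
        ) from exc
```

With the virtualenv active, `run_json_check(["vro-mcp-check"])` should return the parsed report; inspect it once by hand before asserting on specific keys.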
Tools
| Tool | What it does |
|---|---|
| `lookup_word` | Dictionary lookup (en↔vro). |
| `find_usage_examples` | Full-text corpus search for real usage. |
| `word_exists_in_bag` | Fast check whether a word form has been seen. |
| `find_unknown_words` | List word forms in a text absent from the word bag. |
| `analyze_word` | GiellaLT morphological analysis. |
| `generate_forms` | GiellaLT generation for one exact lemma + tag analysis. |
| `spellcheck_vro` | Token-level spellcheck with suggestions. |
| `grammar_check_vro` | Sentence-level grammar check. |
| `lint_estonian_leakage` | Flag Estonian-looking endings in Võro text. |
| `suggest_correction` | Analyzer-verified fixes for a bad/unknown form. |
| `translate_vro` | Neurotõlge/TartuNLP translation. |
| `check_setup` | Report database and external Giella tool availability. |
Most lookup tools accept a single word or a list for batched queries.
The open dictionary currently covers English↔Võro only.
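An MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message. The tool name comes from the table above, but the argument name `word` is an assumption made for illustration, since the tool schemas are not listed here; a client can discover the real schemas with `tools/list`.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` message in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "word" key is hypothetical; check the schema returned by `tools/list`.
single = tool_call_request(1, "lookup_word", {"word": "example"})
# Batched form, assuming the same (hypothetical) key also accepts a list.
batched = tool_call_request(2, "lookup_word", {"word": ["first", "second"]})

print(json.dumps(single, ensure_ascii=False))
```

In practice an MCP client library builds and sends these messages for you; the sketch only shows what travels over the wire.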
Resources
Two Markdown grammar references are exposed over MCP:
- `vro://grammar/noun-cases`: noun/adjective/numeral/pronoun declension.
- `vro://grammar/verb-conjugation`: verb conjugation, moods, tenses, voice.
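Clients fetch these references with MCP's `resources/read` method, passing the resource URI. A minimal sketch of the two requests follows; the URIs are the ones listed above, while the helper name and the request ids are illustrative.

```python
import json

GRAMMAR_RESOURCES = [
    "vro://grammar/noun-cases",
    "vro://grammar/verb-conjugation",
]


def resource_read_request(request_id, uri):
    """Build a JSON-RPC 2.0 `resources/read` message for one resource URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


# One request per grammar reference, with ids starting at 1.
messages = [
    resource_read_request(i, uri)
    for i, uri in enumerate(GRAMMAR_RESOURCES, start=1)
]
for message in messages:
    print(json.dumps(message))
```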
Configuration
The data lives under `data/` and is fetched for you. Running `make setup` (or `make data` plus `make giella`) downloads everything, so you normally configure nothing.
The fetch scripts honour VRO_DATA_REPO, VRO_DATA_TAG, and VRO_GIELLA_TAG
to point at a different release.
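To see how the three variables combine, the sketch below resolves them to download URLs using GitHub's standard release-asset URL pattern. The defaults and asset names (`vro-data.tar.xz`, `giella-share.tar.xz`) are the ones documented in this README, but the fetch scripts themselves may download the assets differently (for example through the GitHub API), so treat this as an illustration rather than a description of the scripts. Both function names are hypothetical.

```python
import os


def release_asset_url(repo, tag, asset):
    """Return the standard GitHub download URL for a release asset."""
    return f"https://github.com/{repo}/releases/download/{tag}/{asset}"


def resolve_release_urls(env=None):
    """Resolve dataset and Giella asset URLs from the environment.

    Hypothetical helper; defaults mirror the values documented in this README.
    """
    env = os.environ if env is None else env
    repo = env.get("VRO_DATA_REPO", "Leo-Martin-Pala/voro-mcp")
    data_tag = env.get("VRO_DATA_TAG", "data-v1")
    giella_tag = env.get("VRO_GIELLA_TAG", "giella-v1")
    return {
        "data": release_asset_url(repo, data_tag, "vro-data.tar.xz"),
        "giella": release_asset_url(repo, giella_tag, "giella-share.tar.xz"),
    }
```

Passing a plain dict instead of `os.environ` makes it easy to preview what an override would change before exporting it.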
All path defaults are repo-local; override any with environment variables or a
local .env where the deploy script supports it.
| Variable | Default | Description |
|---|---|---|
| `VRO_DICTIONARY_DB` | `./data/vro_dictionary.sqlite` | Dictionary SQLite path used by `lookup_word` and correction suggestions. |
| `VRO_CORPUS_DB` | `./data/vro_corpus.sqlite` | Corpus SQLite path used by `find_usage_examples`. |
| `VRO_WORD_BAG_DB` | `./data/vro_word_bag.sqlite` | Word-bag SQLite path used by seen/unknown word checks. |
| `VRO_NEUROTOLGE_BASE_URL` | `https://api.tartunlp.ai/translation/v2` | Neurotõlge/TartuNLP translation API base URL. |
| `VRO_ANALYZER_CMD` | `./tools/giella/bin/analyze-vro` | Command used for GiellaLT morphological analysis. |
| `VRO_GENERATOR_CMD` | `./tools/giella/bin/generate-vro` | Command used for one-analysis GiellaLT form generation. |
| `VRO_SPELLER_CMD` | `./tools/giella/bin/spellcheck-vro` | Command used for token spellchecking. |
| `VRO_GRAMMAR_CMD` | `./tools/giella/bin/grammar-check-vro` | Command used for sentence grammar checking. |
| `VRO_ANALYZER_MODEL` | `./data/giella-share/giella/vro/analyser-gt-desc.hfstol` | Model path used by `tools/giella/bin/analyze-vro`. |
| `VRO_GENERATOR_MODEL` | `./data/giella-share/giella/vro/generator-gt-norm.hfstol` | Model path used by `tools/giella/bin/generate-vro`. |
| `VRO_SPELLER_MODEL` | `./data/giella-share/voikko/3/vro.zhfst` | Speller archive path used by `tools/giella/bin/spellcheck-vro`. |
| `VRO_GRAMMAR_MODEL` | `./data/giella-share/voikko/4/vro.zcheck` | Grammar checker archive path used by `tools/giella/bin/grammar-check-vro`. |
| `VRO_SPELLER_MAX_SUGGESTIONS` | `10` | Maximum spelling suggestions returned per unknown token. |
| `VRO_DATA_REPO` | `Leo-Martin-Pala/voro-mcp` | GitHub repository used for dataset and Giella release downloads. |
| `VRO_DATA_TAG` | `data-v1` | GitHub release tag fetched for `vro-data.tar.xz` by `scripts/fetch_data.sh` and Modal release hydration. |
| `VRO_GIELLA_TAG` | `giella-v1` | GitHub release tag fetched for `giella-share.tar.xz` by `scripts/fetch_giella.sh` and Modal release hydration. |
| `VRO_GIELLA_BUILD_DIR` | `./.cache/giella-build` | Temporary build directory for `make giella-build`. |
| `VRO_GIELLA_ARTIFACT_DIR` | `./data/giella-share` | Output directory for locally built Giella artifacts. |
| `MCP_PATH` | `/mcp` locally; generated in `.env` for Modal deploys | Secret hosted HTTP path segment for Modal; local stdio clients do not need it. |
| `DATA_SOURCE` | `release` | Modal deploy data source: `release`, `local`, or `none`. |
| `FORCE_DATA` | `0` | Set to `1` to overwrite existing Modal Volume data. |
| `DATA_DIR` | `./data` | Local data directory used when `DATA_SOURCE=local`. |
| `NEW_SECRET` | `0` | Set to `1` to rotate `MCP_PATH` during deploy and save it to `.env`. |
| `LOCAL_SECRET` | `0` | Set to `1` to push the `MCP_PATH` from `.env`/environment as-is and fail if it is empty (used by `make deploy-local-secret`). |
| `MODAL_APP_NAME` | `vro-mcp` | Modal app name used by deploy/undeploy scripts. |
| `MODAL_VOLUME_NAME` | `vro-data` | Modal Volume name for SQLite data and Giella artifacts. |
| `MODAL_SECRET_NAME` | `vro-mcp-secret` | Modal secret name storing `MCP_PATH`. |
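The override pattern behind these variables is the usual "environment value, else repo-local default". The sketch below shows that pattern for a handful of the settings. It is an illustration under stated assumptions, not the server's actual configuration code: `VroSettings` and `load_settings` are hypothetical names, and only the variable names and defaults are taken from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VroSettings:
    """Hypothetical container for a few of the documented settings."""

    dictionary_db: Path
    corpus_db: Path
    word_bag_db: Path
    translation_base_url: str
    speller_max_suggestions: int


def load_settings(env=None):
    """Read settings from the environment, falling back to documented defaults."""
    env = os.environ if env is None else env
    return VroSettings(
        dictionary_db=Path(env.get("VRO_DICTIONARY_DB", "./data/vro_dictionary.sqlite")),
        corpus_db=Path(env.get("VRO_CORPUS_DB", "./data/vro_corpus.sqlite")),
        word_bag_db=Path(env.get("VRO_WORD_BAG_DB", "./data/vro_word_bag.sqlite")),
        translation_base_url=env.get(
            "VRO_NEUROTOLGE_BASE_URL", "https://api.tartunlp.ai/translation/v2"
        ),
        # Environment values are strings, so numeric settings need converting.
        speller_max_suggestions=int(env.get("VRO_SPELLER_MAX_SUGGESTIONS", "10")),
    )
```

Calling `load_settings({"VRO_CORPUS_DB": "/srv/vro/corpus.sqlite"})` overrides one path and leaves every other value at its default.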
Point a client at it
You don't run the server yourself. The MCP client launches the vro-mcp-server
binary and talks to it over stdio, so all a client needs is the binary's
absolute path. Run make local-url to print that path and ready-to-paste config
for Claude Code and Codex.
Claude Code:

    claude mcp add vro -- /absolute/path/to/vro-mcp-server/.venv/bin/vro-mcp-server

Codex:

    codex mcp add vro -- /absolute/path/to/vro-mcp-server/.venv/bin/vro-mcp-server

Generic JSON MCP client configuration:

    {
      "mcpServers": {
        "vro": {
          "command": "/absolute/path/to/vro-mcp-server/.venv/bin/vro-mcp-server",
          "cwd": "/absolute/path/to/vro-mcp-server"
        }
      }
    }
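If you prefer to generate the generic configuration yourself rather than copy it, the sketch below builds the same structure from a checkout path. `client_config` is a hypothetical helper, and it assumes a POSIX-style layout with the virtualenv at `.venv` inside the checkout, as in the setup steps; `make local-url` remains the supported way to get ready-to-paste config.

```python
import json
from pathlib import Path


def client_config(repo_dir, server_name="vro"):
    """Build the generic `mcpServers` entry for a local checkout.

    Hypothetical helper; assumes the binary lives at .venv/bin/vro-mcp-server.
    """
    repo = Path(repo_dir)
    if not repo.is_absolute():
        # Clients launch the binary themselves, so relative paths are fragile.
        raise ValueError("repo_dir must be an absolute path")
    return {
        "mcpServers": {
            server_name: {
                "command": str(repo / ".venv" / "bin" / "vro-mcp-server"),
                "cwd": str(repo),
            }
        }
    }


print(json.dumps(client_config("/absolute/path/to/vro-mcp-server"), indent=2))
```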
Deployment
To host the server in the cloud so Claude or ChatGPT (the web apps) can reach it, see DEPLOY.md.
License
Code is MIT (LICENSE). The SQLite datasets are CC-BY-SA-4.0 and the prebuilt
GiellaLT models are GPL-3.0. Both are distributed as separate release assets,
each bundling its own license and attribution. See NOTICE.md for scope
and source summaries.