zh-dict-mcp
Provides Chinese figurative language lookup backed by CC-CEDICT, helping to verify whether a word or phrase has recorded figurative usage or is a one-off creative expression.
MCP server for Chinese figurative language lookup, backed by CC-CEDICT.
What it does: given a Chinese word or phrase, tells you whether its figurative usage has been lexicalized (recorded in the dictionary as an independent sense) or is a one-off creative expression.
Why it exists: LLMs writing Chinese dialogue, fiction, or roleplay tend to invent purple-prose figurative expressions that no real person would say, e.g., "他把心锁进铁盒里" ("he locked his heart in an iron box") or "墙比夜更厚" ("the wall is thicker than the night"). This tool gives you an objective, dictionary-backed check.
Install
Pick your MCP-aware client. Across all of them the runtime command is the same — uvx zh-dict-mcp — but the wrapping config differs.
Claude Code
claude mcp add zh-dict-mcp -- uvx zh-dict-mcp
Codex CLI
codex mcp add zh-dict-mcp -- uvx zh-dict-mcp
Or edit ~/.codex/config.toml directly:
[mcp_servers.zh-dict-mcp]
command = "uvx"
args = ["zh-dict-mcp"]
Cursor
In Cursor: Settings → MCP → Add new server (UI), or edit ~/.cursor/mcp.json:
{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}
Claude Desktop
Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\):
{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}
Restart Claude Desktop to load the server.
Windsurf / Zed / other MCP-aware clients
The JSON block above is universal — find your client's MCP config file (search for "MCP" in its settings docs) and paste it in.
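If the client's config file already lists other servers, pasting by hand risks overwriting them. A minimal sketch of merging the entry programmatically, assuming the client keeps its servers under a top-level "mcpServers" key as in the blocks above (check your client's docs, since some use a different key). The `add_server` helper is hypothetical and not part of zh-dict-mcp:

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str = "zh-dict-mcp") -> dict:
    """Merge the zh-dict-mcp entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Assumption: servers live under "mcpServers", as in the JSON block above.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "uvx", "args": ["zh-dict-mcp"]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

For Cursor, for instance, this could be called as add_server(Path.home() / ".cursor" / "mcp.json"). Back up the file first if it contains anything you care about.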
With an optional project whitelist
If you have a project-level whitelist of "approved dead metaphors" the dictionary happens to miss, point the server at it:
"args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/your_whitelist.yaml"]
Or set the environment variable ZH_DICT_WHITELIST=/abs/path/to/your_whitelist.yaml.
After install, the lookup_dictionary tool is exposed to your AI client. uvx pulls the package from PyPI on first run, caches it locally, then launches the stdio MCP server. No pip install needed.
What you get
A single MCP tool:
lookup_dictionary(word: string) → JSON
Example: lookup_dictionary("看见") returns:
{
  "word": "看见",
  "found_in_cedict": true,
  "simplified": "看见",
  "traditional": "看見",
  "pinyin": "kan4 jian4",
  "definitions": ["to see", "to catch sight of"],
  "tags": {
    "has_figurative": false,
    "is_neologism": false,
    "is_slang": false,
    "has_idiom_marker": false
  }
}
Example: lookup_dictionary("内卷") returns:
{
  "word": "内卷",
  "found_in_cedict": true,
  "definitions": [
    "(embryology) to involute; involution",
    "(neologism, attested by 2017) (of a society) to become more and more involuted..."
  ],
  "tags": { "is_neologism": true, ... }
}
Example: lookup_dictionary("锁进铁盒里") (a creative one-off) returns:
{
  "word": "锁进铁盒里",
  "found_in_cedict": false,
  "found_in_whitelist": false,
  "definitions": []
}
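The three results above suggest a simple triage on the caller's side. The sketch below maps a result to a coarse label using only the fields shown in the examples. The `classify` function and its label strings are hypothetical and not part of zh-dict-mcp:

```python
def classify(result: dict) -> str:
    """Map a lookup_dictionary result (parsed JSON) to a coarse review label."""
    if result.get("found_in_whitelist"):
        return "whitelisted"
    if not result.get("found_in_cedict"):
        # Not recorded anywhere: a candidate one-off creative expression.
        return "unrecorded"

    # "tags" may be partial or absent, so read it defensively.
    tags = result.get("tags", {})
    if tags.get("has_figurative") or tags.get("has_idiom_marker"):
        return "lexicalized-figurative"

    # Recorded, but none of its senses carries a figurative marker.
    return "recorded"
```

A "recorded" or "lexicalized-figurative" label only says the word is in the dictionary. Whether the intended sense matches one of the returned definitions still has to be judged by the LLM or a human reviewer.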
Use cases
- AI-generated dialogue review: catch live metaphors that an LLM invents but no real speaker would use
- AI writing lint: pipeline filter for game NPC dialogue / interactive fiction / chatbot scripts
- Lexicalization research: check whether a figurative expression has been recorded in standard dictionaries
- New word verification: confirm neologisms / slang with (neologism, attested by YEAR) attribution
- Idiom / allusion (典故) lookup: get the figurative sense for idioms like "滑铁卢" → "(fig.) a defeat"
Data source
CC-CEDICT: an open Chinese-English dictionary with about 125,000 entries, community-maintained and updated weekly.
License: CC BY-SA 4.0. Bundled in package. See LICENSE-CC-CEDICT.
Why CC-CEDICT rather than 现代汉语词典 (XDHYCD, the Contemporary Chinese Dictionary) or other sources:
| Source | Coverage on AI-writing test set | Notes |
|---|---|---|
| chinese-xinhua (GitHub data) | 46% | Heavy bias toward Classical Chinese (古汉语) |
| 现代汉语词典, 7th edition (XDHYCD7th) | 56% | Doesn't list literal compound words (放下, 抓住, etc.) |
| CC-CEDICT | ~95% | Modern usage + neologisms + (fig.) / (slang) / (neologism) markers |
CC-CEDICT explicitly tags figurative senses, neologisms with attestation years, slang, and idioms — exactly the structure needed for figurative-language analysis.
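To make the marker structure concrete, here is a rough sketch of parsing one CC-CEDICT line and deriving the four tags from its glosses. The line layout (Traditional Simplified [pinyin] /gloss/gloss/) is the standard CC-CEDICT format. The marker patterns and the `parse_entry` function are assumptions for illustration and may differ from how zh-dict-mcp computes its tags. The sample line is reconstructed from the 滑铁卢 example output in this README, not copied from the data file:

```python
import re
from typing import Optional

# CC-CEDICT line layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")

# Assumed marker patterns; the package's real tagging rules may differ.
MARKERS = {
    "has_figurative": re.compile(r"\bfig\."),
    "is_neologism": re.compile(r"\(neologism\b"),
    "is_slang": re.compile(r"\([^)]*\bslang\b[^)]*\)"),
    "has_idiom_marker": re.compile(r"\(idiom\)"),
}


def parse_entry(line: str) -> Optional[dict]:
    """Parse one dictionary line; return None for comments and malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    definitions = [g for g in glosses.split("/") if g]
    # A tag is set when any gloss of the entry carries the marker.
    tags = {
        name: any(pattern.search(d) for d in definitions)
        for name, pattern in MARKERS.items()
    }
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "definitions": definitions,
        "tags": tags,
    }


# Sample line reconstructed from the README's 滑铁卢 example output.
SAMPLE = "滑鐵盧 滑铁卢 [Hua2 tie3 lu2] /Waterloo (Belgium)/Battle of Waterloo (1815)/(fig.) a defeat/"
entry = parse_entry(SAMPLE)
print(entry["definitions"])             # ['Waterloo (Belgium)', 'Battle of Waterloo (1815)', '(fig.) a defeat']
print(entry["tags"]["has_figurative"])  # True
```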
Optional: project whitelist
For project-specific overrides (e.g., words CC-CEDICT happens to miss):
# my_whitelist.yaml
allowed:
  - word: 凛然
    note: Standard literary usage, CC-CEDICT misses it
  - word: 头疼
    note: Override to include "annoyance" figurative sense
Pass via CLI:
{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/my_whitelist.yaml"]
    }
  }
}
Or via env var ZH_DICT_WHITELIST=/path/to/file.yaml.
When a word is in the whitelist, the result includes "found_in_whitelist": true and the note.
Python API (no MCP needed)
Use the lookup library directly without launching a server:
from zh_dict_mcp import DictionaryLookup
lookup = DictionaryLookup() # bundled CC-CEDICT loads in ~200ms
result = lookup.lookup("滑铁卢")
print(result.found) # True
print(result.definitions) # ['Waterloo (Belgium)', 'Battle of Waterloo (1815)', '(fig.) a defeat']
print(result.tags.has_figurative) # True
print(result.pinyin) # 'Hua2 tie3 lu2'
With custom whitelist:
from pathlib import Path
from zh_dict_mcp import DictionaryLookup
lookup = DictionaryLookup(whitelist_path=Path("my_whitelist.yaml"))
lookup.py has zero external dependencies (stdlib only). The mcp dependency is only needed for the MCP server.
Install standalone (no MCP, just Python library)
pip install zh-dict-mcp
Or with uv:
uv add zh-dict-mcp
Limitations
- English-language definitions (CC-CEDICT is a Chinese-English dictionary). Works well with LLMs that handle cross-lingual judgment (Claude, GPT-4+, Gemini). For monolingual Chinese consumers you'd need a translation layer.
- Sense matching is on the caller — this tool returns all senses; deciding whether the speaker's intended sense matches a returned sense is left to the LLM or human reviewer.
- Single-word / single-phrase lookup — doesn't parse full sentences. Wrap with your own NLP layer for sentence-level work.
- 9.4 MB data bundle — CC-CEDICT data is included in the wheel for offline use.
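As one possible shape for the "own NLP layer" mentioned above, the sketch below runs a greedy forward longest-match scan over a clause, splitting it into the longest dictionary-recorded spans. The `longest_match_segments` function and the `is_known` callback are hypothetical. Real pipelines would more likely use a proper word segmenter or let the LLM extract the expressions:

```python
from typing import Callable, List


def longest_match_segments(
    text: str,
    is_known: Callable[[str], bool],
    max_len: int = 4,
) -> List[str]:
    """Greedy forward longest-match: at each position take the longest known span."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or is_known(piece):
                segments.append(piece)
                i += size
                break
    return segments
```

With the Python API, `is_known` could be lambda w: lookup.lookup(w).found. Runs of single-character segments then hint at spans the dictionary does not record, which are the places worth a closer look. This is a heuristic, not a metaphor detector.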
How it fits with broader writing-quality pipelines
This tool is one piece of a larger "AI-generated text quality" framework. Typical usage flow:
LLM generates Chinese dialogue
↓
Scan for figurative expressions (metaphor 比喻 / metonymy 借代 / euphemism 委婉 / ...)
↓
For each: lookup_dictionary(expression)
↓
├── found + sense matches intent → pass
└── not found or sense mismatch → flag for rewrite
A reference review prompt for this flow is documented in Forgewright (the project that spawned this tool).
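The flow above can be sketched as a small review loop. Both callbacks are supplied by the caller and are assumptions of this sketch: `lookup` returns a dict shaped like the lookup_dictionary JSON, and `sense_matches` stands in for the LLM or human judgment on whether a returned sense fits the intended meaning. The `review` function is hypothetical and not part of zh-dict-mcp:

```python
from typing import Callable, Dict, Iterable, List


def review(
    expressions: Iterable[str],
    lookup: Callable[[str], dict],
    sense_matches: Callable[[str, List[str]], bool],
) -> Dict[str, str]:
    """Return 'pass' or 'flag' for each figurative expression."""
    verdicts: Dict[str, str] = {}
    for expr in expressions:
        result = lookup(expr)
        found = bool(result.get("found_in_cedict") or result.get("found_in_whitelist"))
        # Pass only when the expression is recorded AND a sense fits the intent.
        if found and sense_matches(expr, result.get("definitions", [])):
            verdicts[expr] = "pass"
        else:
            verdicts[expr] = "flag"  # not found, or sense mismatch: rewrite
    return verdicts
```

Keeping the sense check as a separate callback mirrors the limitation noted earlier: the dictionary lookup is objective, while matching the intended sense is a judgment call left to the caller.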
Project status
v0.1.0 — initial release. Validated on a 39-case test set covering 6 categories (dead metaphors / live metaphors / literal words / boundary cases / idioms / neologisms) with 100% accuracy.
Bug reports and PRs welcome.
License
- Code: MIT (see LICENSE)
- CC-CEDICT data: CC BY-SA 4.0 (see LICENSE-CC-CEDICT)