claude-orator-mcp
An MCP server for deterministic prompt optimization in Claude Code. Score prompts across 7 quality dimensions, auto-select from 11 Anthropic techniques, and return a structural scaffold.
<img align="right" src="claude-orator.svg" alt="claude-orator-mcp" width="220">
claude-orator-mcp
A Model Context Protocol (MCP) server for deterministic prompt optimization in Claude Code. Score prompts across 7 quality dimensions, auto-select from 11 Anthropic techniques, and return a structural scaffold. No LLM calls, no network, sub-millisecond.
<br clear="right">

install
Requirements:
From shell:
claude mcp add claude-orator-mcp -- npx claude-orator-mcp
From inside Claude (restart required):
Add this to our global mcp config: npx claude-orator-mcp
Install this mcp: https://github.com/Vvkmnn/claude-orator-mcp
From any manually configurable mcp.json: (Cursor, Windsurf, etc.)
{
  "mcpServers": {
    "claude-orator-mcp": {
      "command": "npx",
      "args": ["claude-orator-mcp"],
      "env": {}
    }
  }
}
There is no npm install required -- no external dependencies or databases, only deterministic heuristics.
However, if npx resolves the wrong package, you can force resolution with:
npm install -g claude-orator-mcp
skill
Optionally, install the skill to teach Claude when to proactively optimize prompts:
npx skills add Vvkmnn/claude-orator-mcp --skill claude-orator --global
# Optional: add --yes to skip interactive prompt and install to all agents
This makes Claude automatically optimize prompts before dispatching subagents, writing system prompts, or crafting any prompt worth improving. The MCP works without the skill, but the skill improves discoverability.
plugin
For automatic prompt optimization with hooks and commands, install from the claude-emporium marketplace:
/plugin marketplace add Vvkmnn/claude-emporium
/plugin install claude-orator@claude-emporium
The claude-orator plugin provides:
Hooks (fires before subagent dispatch):
- Before Task -- Suggest prompt optimization before launching agents
Commands: /reprompt-orator
Requires the MCP server installed first. See the emporium for other Claude Code plugins and MCPs.
features
MCP server with a single tool. Prompt in, optimized prompt out.
orator_optimize
Analyze a prompt across 7 quality dimensions, auto-select from 11 Anthropic techniques, and return a structurally optimized scaffold with before/after scores.
orator_optimize prompt="Write a function that sorts users"
> Returns optimized scaffold with XML tags, output format, examples section
orator_optimize prompt="You are a helpful assistant" intent="system"
> Returns role-assigned system prompt with structure and constraints
orator_optimize prompt="Extract all emails from this text" techniques=["xml-tags", "few-shot"]
> Force-applies specific techniques regardless of auto-selection
Score meter (gradient fill bar):
🪶 3.2 ███▓▓▓▓▓░░ 7.8
+xml-tags +few-shot +structured-output · 3 issues
Wrapped in XML tags, added examples, specified output format
Three-zone bar: ███ (baseline) ▓▓▓▓▓ (improvement) ░░ (headroom to 10).
Minimal case (already well-structured):
🪶 ── already well-structured (8.4)
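The three-zone bar can be sketched as a small pure function. This is a minimal illustration, not the project's renderer: the function name `renderMeter`, the glyphs, the 10-cell width, and the rounding rule are all assumptions.

```typescript
// Hypothetical renderer for the three-zone score meter.
// Glyphs, width, and rounding are assumptions made for illustration.
const BASELINE = "█";
const IMPROVEMENT = "▓";
const HEADROOM = "░";

function renderMeter(before: number, after: number, width = 10): string {
  // Clamp both scores to the 0-10 range; never let `after` drop below `before`.
  const clamp = (n: number): number => Math.min(10, Math.max(0, n));
  const lo = clamp(before);
  const hi = Math.max(lo, clamp(after));

  // Convert each score to a whole number of cells.
  const baseCells = Math.round((lo / 10) * width);
  const filledCells = Math.round((hi / 10) * width);

  return (
    BASELINE.repeat(baseCells) +
    IMPROVEMENT.repeat(filledCells - baseCells) +
    HEADROOM.repeat(width - filledCells)
  );
}

console.log(`3.2 ${renderMeter(3.2, 7.8)} 7.8`); // 3.2 ███▓▓▓▓▓░░ 7.8
```

With scores of 3.2 and 7.8 this yields three baseline cells, five improvement cells, and two headroom cells, matching the zone description above.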
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The raw prompt to optimize |
| `intent` | enum | No | `code`, `analysis`, `creative`, `extraction`, `conversation`, `system` (auto-detected) |
| `target` | enum | No | `claude-code`, `claude-api`, `claude-desktop`, `generic` (default: `claude-code`) |
| `techniques` | string[] | No | Force-apply specific technique IDs |
Output:
| Field | Type | Description |
|---|---|---|
| `optimized_prompt` | string | Rewritten prompt scaffold (primary output) |
| `score_before` | number | Quality score of original (0-10) |
| `score_after` | number | Quality score after optimization (0-10) |
| `summary` | string | 1-line explanation of improvements |
| `detected_intent` | string | Auto-detected intent category |
| `applied_techniques` | string[] | Technique IDs applied |
| `issues` | string[] | Detected problems |
| `suggestions` | string[] | Actionable fixes |
The optimized_prompt is a structural scaffold. Claude refines it with domain knowledge, codebase context, and conversation history.
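For callers working in TypeScript, the input and output tables can be mirrored as types. The sketch below is illustrative only: the field names come from the tables, while the type names and the helpers `withDefaults` and `scoreDelta` are hypothetical and not part of the server.

```typescript
// Illustrative types mirroring the Input and Output tables.
// Type and helper names are hypothetical; field names come from the tables.
type Intent = "code" | "analysis" | "creative" | "extraction" | "conversation" | "system";
type Target = "claude-code" | "claude-api" | "claude-desktop" | "generic";

interface OratorInput {
  prompt: string;
  intent?: Intent; // auto-detected when omitted
  target?: Target; // documented default: claude-code
  techniques?: string[];
}

interface OratorOutput {
  optimized_prompt: string;
  score_before: number;
  score_after: number;
  summary: string;
  detected_intent: string;
  applied_techniques: string[];
  issues: string[];
  suggestions: string[];
}

// Fill in the documented default for `target`; leave `intent` unset for auto-detection.
function withDefaults(input: OratorInput): OratorInput & { target: Target } {
  return { ...input, target: input.target ?? "claude-code" };
}

// Improvement between the two scores, rounded to one decimal place.
function scoreDelta(result: Pick<OratorOutput, "score_before" | "score_after">): number {
  return Number((result.score_after - result.score_before).toFixed(1));
}

const request = withDefaults({
  prompt: "Extract all emails from this text",
  techniques: ["xml-tags", "few-shot"],
});
console.log(request.target); // claude-code
console.log(scoreDelta({ score_before: 3.2, score_after: 7.8 })); // 4.6
```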
methodology
How claude-orator-mcp works:
🪶 claude-orator-mcp
════════════════════

orator_optimize
───────────────

                 PROMPT
                    │
       ┌────────────┴────────────┐
       ▼                         ▼
 ┌───────────┐            ┌────────────┐
 │  Detect   │            │  Measure   │
 │  Intent   │            │ Complexity │
 └─────┬─────┘            └──────┬─────┘
       │                         │
 system > code >           word count +
 extraction >              clause depth
 analysis >                      │
 creative >                      │
 conversation                    │
 + disambiguation                │
 + fallback heuristics           │
       │                         │
       └────────────┬────────────┘
                    │
                    ▼
          ┌───────────────────┐
          │   Score Before    │
          │                   │
          │ clarity      20%  │  strong verbs, single task
          │ specificity  20%  │  named tech, constraints
          │ structure    15%  │  XML tags, headers, lists
          │ examples     15%  │  input/output pairs
          │ constraints  10%  │  scope, edge cases
          │ output_fmt   10%  │  format specification
          │ efficiency   10%  │  no filler, no redundancy
          │                   │
          │ ███░░░░░░░  3.2   │
          └─────────┬─────────┘
                    │
                    ▼
          ┌───────────────────┐  techniques?
          │ Select Techniques │◄──── (force override)
          │                   │
          │ when_to_use() ×   │  11 predicates
          │ intent match  ×   │  filtered
          │ score gaps    ×   │  sorted by impact
          │ cap at 4          │
          └─────────┬─────────┘
                    │
                    ▼
          ┌───────────────────┐
          │ Template Assembly │
          │                   │
          │ role preamble     │  expert identity
          │ → <context>       │  grounding data first
          │ → <task>          │  XML-wrapped prompt
          │ → <requirements>  │  constraints + gaps
          │ → <examples>      │  multishot I/O pairs
          │ → output format   │  format specification
          └─────────┬─────────┘
                    │
                    ▼
          ┌───────────────────┐
          │    Score After    │
          │                   │
          │ ████████░░  7.8   │
          └─────────┬─────────┘
                    │
                    ▼
                 OUTPUT
            optimized_prompt
          + scores + techniques
          + issues + suggestions

score meter (gradient fill bar):
─────────────────────────────────
🪶 3.2 ███▓▓▓▓▓░░ 7.8
+xml-tags +few-shot +structured-output
Wrapped in XML, added examples, format
███ baseline  ▓▓▓ improvement  ░░ headroom
7 quality dimensions (weighted scoring, deterministic):
| Dimension | Weight | Measures |
|---|---|---|
| Clarity | 20% | Strong verbs, single task, no hedging |
| Specificity | 20% | Named tech, numbers, constraints |
| Structure | 15% | XML tags, headers, lists |
| Examples | 15% | Input/output pairs, demonstrations |
| Constraints | 10% | Negative constraints, scope, edge cases |
| Output Format | 10% | Format spec, structure definition |
| Token Efficiency | 10% | No filler, no redundancy |
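The weights in the table sum to 100%, so the overall score is a plain weighted sum of the per-dimension scores. A minimal sketch follows, assuming each dimension is scored 0-10; the key names, the function name `overallScore`, and the one-decimal rounding are illustrative assumptions.

```typescript
// Hypothetical weighted-sum scorer; weights are taken from the table above.
type Dimension =
  | "clarity"
  | "specificity"
  | "structure"
  | "examples"
  | "constraints"
  | "output_format"
  | "token_efficiency";

const WEIGHTS: Record<Dimension, number> = {
  clarity: 0.2,
  specificity: 0.2,
  structure: 0.15,
  examples: 0.15,
  constraints: 0.1,
  output_format: 0.1,
  token_efficiency: 0.1,
};

function overallScore(scores: Record<Dimension, number>): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    total += WEIGHTS[dim] * scores[dim];
  }
  // Rounding to one decimal place is an assumption, chosen to match scores like 3.2 and 7.8.
  return Math.round(total * 10) / 10;
}

const sample: Record<Dimension, number> = {
  clarity: 5,
  specificity: 4,
  structure: 2,
  examples: 0,
  constraints: 3,
  output_format: 2,
  token_efficiency: 8,
};
console.log(overallScore(sample)); // 3.4
```

In this sample, clarity and specificity together contribute 1.8 of the 3.4 total, which shows why the two 20% dimensions dominate the result.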
11 Anthropic techniques (auto-selected based on intent, scores, and complexity):
| ID | Name | Auto-selected when |
|---|---|---|
| `chain-of-thought` | Let Claude Think | Analysis intent, complex tasks |
| `xml-tags` | Use XML Tags | Long prompt + low structure score |
| `few-shot` | Multishot Examples | Low example score + extraction/code |
| `role-assignment` | System Prompts & Roles | System intent or low specificity |
| `structured-output` | Control Output Format | Low output format score |
| `prefill` | Structured Output Format | API target + extraction/code |
| `prompt-chaining` | Chain Complex Tasks | Complex + multiple subtasks |
| `uncertainty-permission` | Say "I Don't Know" | Analysis or extraction intent |
| `extended-thinking` | Extended Thinking | Complex + analysis/code intent |
| `long-context-tips` | Long Context | Long prompt (>2000 chars or >50 lines) |
| `tool-use` | Tool Use | Prompt mentions tool/function calling |
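The selection rule (a predicate per technique, sorted by impact, capped at 4) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: only six of the eleven techniques are shown, and the `impact` values, the `LOW` threshold, the 200-character "long prompt" cutoff, and the behaviour of returning forced IDs unchanged are all hypothetical.

```typescript
// Hypothetical sketch of predicate-based technique selection.
interface PromptState {
  intent: string;
  length: number; // prompt length in characters
  scores: { structure: number; examples: number; output_format: number }; // 0-10 each
}

interface Technique {
  id: string;
  impact: number; // assumed ranking value; higher sorts first
  whenToUse: (state: PromptState) => boolean;
}

const LOW = 5; // assumed threshold for a "low" dimension score

const TECHNIQUES: Technique[] = [
  { id: "chain-of-thought", impact: 3, whenToUse: (s) => s.intent === "analysis" },
  { id: "xml-tags", impact: 5, whenToUse: (s) => s.length > 200 && s.scores.structure < LOW },
  {
    id: "few-shot",
    impact: 4,
    whenToUse: (s) => s.scores.examples < LOW && (s.intent === "extraction" || s.intent === "code"),
  },
  { id: "structured-output", impact: 4, whenToUse: (s) => s.scores.output_format < LOW },
  {
    id: "uncertainty-permission",
    impact: 2,
    whenToUse: (s) => s.intent === "analysis" || s.intent === "extraction",
  },
  { id: "long-context-tips", impact: 1, whenToUse: (s) => s.length > 2000 },
];

function selectTechniques(state: PromptState, forced: string[] = [], cap = 4): string[] {
  // Assumption: forced technique IDs bypass auto-selection entirely.
  if (forced.length > 0) return [...forced];
  return TECHNIQUES.filter((t) => t.whenToUse(state))
    .sort((a, b) => b.impact - a.impact) // stable sort keeps declaration order on ties
    .map((t) => t.id)
    .slice(0, cap);
}

const state: PromptState = {
  intent: "extraction",
  length: 2500,
  scores: { structure: 2, examples: 0, output_format: 1 },
};
console.log(selectTechniques(state));
// [ 'xml-tags', 'few-shot', 'structured-output', 'uncertainty-permission' ]
```

Five predicates match this state, so the cap drops the lowest-impact match (`long-context-tips`).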
Core algorithms:
- Intent detection (`detectIntent`): Priority-ordered regex patterns across 6 categories: `system > code > extraction > analysis > creative > conversation`. Includes disambiguation (e.g., `system` + `code` signals resolve to `code`) and fallback heuristics for code blocks, "build me" patterns, and debugging language.
- Heuristic scoring (`scorePrompt`): 7-dimension weighted analysis. Each dimension is scored 0-10; the overall score is the weighted sum. Also generates flat `issues[]` and `suggestions[]` arrays.
- Technique selection (`selectTechniques`): Each technique has a `when_to_use()` predicate. Auto-selected based on intent + scores + complexity. Sorted by impact, capped at 4.
- Template assembly (`optimize`): Builds a structural scaffold from the selected techniques. Context-first ordering: role → `<context>` → `<task>` → `<requirements>` → `<examples>` → output format.
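Priority-ordered intent detection with disambiguation and a fallback can be sketched like this. The regex patterns below are illustrative assumptions, not the patterns the server ships; only the category order, the system + code rule, and the "build me" fallback come from the description above.

```typescript
// Hypothetical sketch of priority-ordered intent detection.
type Intent = "system" | "code" | "extraction" | "analysis" | "creative" | "conversation";

// Listed in priority order; the patterns themselves are assumptions.
const INTENT_PATTERNS: Array<[Intent, RegExp]> = [
  ["system", /\byou are an?\b|\bsystem prompt\b/i],
  ["code", /\b(function|refactor|implement|debug|bug|compile)\b/i],
  ["extraction", /\b(extract|parse|pull out)\b/i],
  ["analysis", /\b(analy[sz]e|compare|evaluate)\b/i],
  ["creative", /\b(story|poem|brainstorm|slogan)\b/i],
];

function detectIntent(prompt: string): Intent {
  const matches = INTENT_PATTERNS.filter(([, re]) => re.test(prompt)).map(([intent]) => intent);

  // Disambiguation: system + code signals resolve to code.
  if (matches.includes("system") && matches.includes("code")) return "code";

  // Otherwise the highest-priority match wins.
  const first = matches[0];
  if (first !== undefined) return first;

  // Fallback heuristic: "build me ..." reads as a code request.
  if (/\bbuild me\b/i.test(prompt)) return "code";
  return "conversation";
}

console.log(detectIntent("You are a helpful assistant")); // system
console.log(detectIntent("You are a senior engineer. Debug this function")); // code
console.log(detectIntent("Build me a landing page")); // code
```

Because the patterns carry no global flag, each `test` call is stateless, which keeps repeated calls deterministic.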
Design principles:
- Single tool: one entry point, minimal cognitive overhead
- Deterministic: same input, same output. No LLM calls, no network
- Scaffold, not final: the optimized prompt is structural; Claude adds substance
- Lean output: flat string arrays for issues/suggestions, no nested objects
- Weighted dimensions: clarity and specificity matter most (20% each)
- Technique cap: max 4 techniques per optimization (diminishing returns beyond)
- Anti-pattern detection: 12 Claude-specific anti-patterns + 20 industry patterns from 34 production AI tools
- Zero dependencies: only `@modelcontextprotocol/sdk` + `zod`
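The context-first ordering used by template assembly, together with the "scaffold, not final" principle, can be sketched as a function that emits sections in a fixed order and skips any that are absent. The `ScaffoldParts` shape, the function name `assembleScaffold`, and the inner `<example>`, `<input>`, and `<output>` tags are assumptions made for illustration.

```typescript
// Hypothetical sketch of context-first scaffold assembly.
interface ScaffoldParts {
  role?: string;
  context?: string;
  task: string;
  requirements?: string[];
  examples?: Array<{ input: string; output: string }>;
  outputFormat?: string;
}

function assembleScaffold(parts: ScaffoldParts): string {
  const sections: string[] = [];

  // Fixed order: role, context, task, requirements, examples, output format.
  if (parts.role) sections.push(parts.role);
  if (parts.context) sections.push(`<context>\n${parts.context}\n</context>`);
  sections.push(`<task>\n${parts.task}\n</task>`);

  const requirements = parts.requirements ?? [];
  if (requirements.length > 0) {
    const items = requirements.map((r) => `- ${r}`).join("\n");
    sections.push(`<requirements>\n${items}\n</requirements>`);
  }

  const examples = parts.examples ?? [];
  if (examples.length > 0) {
    const items = examples
      .map((e) => `<example>\n<input>${e.input}</input>\n<output>${e.output}</output>\n</example>`)
      .join("\n");
    sections.push(`<examples>\n${items}\n</examples>`);
  }

  if (parts.outputFormat) sections.push(`Output format: ${parts.outputFormat}`);
  return sections.join("\n\n");
}

const scaffold = assembleScaffold({
  role: "You are a data engineer.",
  task: "Extract all emails from this text",
  requirements: ["Return unique addresses only"],
  outputFormat: "JSON array of strings",
});
console.log(scaffold);
```

The function has no hidden state, so the same parts always produce the same string, in line with the determinism principle.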
alternatives
Each of the prompt optimization tools compared below requires LLM calls, labeled datasets, or evaluation infrastructure. When you need structural improvement at near-zero latency (CI/CD, subagent dispatch, offline), they are a poor fit.
| Feature | orator | DSPy | promptfoo | TextGrad | OPRO | LLMLingua | Anthropic Generator |
|---|---|---|---|---|---|---|---|
| Zero latency | Yes (<1ms) | No (LLM calls) | No (eval runs) | No (LLM calls) | No (LLM calls) | No (LLM calls) | No (LLM call) |
| Offline/airgapped | Yes | No | Partial | No | No | No | No |
| Deterministic | Yes | No | No | No | No | Partial | No |
| No labeled data | Yes | No (examples) | No (test cases) | No (feedback) | No (examples) | Yes | Yes |
| Claude-specific | Yes (anti-patterns) | No | No | No | No | No | Yes |
| MCP native | Yes | No | No | No | No | No | No |
| Structural scoring | 7 dimensions | None | Custom metrics | None | None | None | None |
| Dependencies | 0 (pure TS) | PyTorch + LLM | Node + LLM | PyTorch + LLM | LLM | PyTorch + LLM | LLM API |
DSPy: Stanford's framework for compiling LM programs with automatic prompt optimization. Requires labeled examples, LLM calls for optimization, and PyTorch. Optimizes for task accuracy, not structural quality. Latency: seconds to minutes per optimization. Use DSPy when you have labeled data and want to tune for a specific metric.
promptfoo: Test-driven prompt evaluation framework. Requires test cases, LLM calls for evaluation, and an evaluation dataset. Measures output quality, not prompt structure. Complementary: use Orator for structural scaffolding, then promptfoo to evaluate output quality.
TextGrad: Automatic differentiation via text feedback from LLMs. Requires LLM calls for both forward and backward passes. Research-oriented, PyTorch dependency. Latency: minutes. Use when iterating on prompt wording with measurable objectives.
OPRO: DeepMind's optimization by prompting. Uses an LLM to iteratively rewrite prompts. Requires examples of good/bad outputs, multiple LLM calls per iteration. Latency: minutes. Use when exploring creative prompt variations with evaluation feedback.
LLMLingua: Microsoft's prompt compression via perplexity-based token removal. Reduces token count by 2-20x but requires a local LLM for perplexity scoring. Different goal: compression, not structural improvement. Use when context window is the bottleneck.
Anthropic Prompt Generator: Anthropic's own tool that generates prompts via Claude. Excellent quality but requires an LLM call, non-deterministic, and not available offline or via MCP. Use when you want Claude to write your prompt from scratch.
Orator's approach is deliberately different: structural analysis via deterministic heuristics. No LLM calls means no API keys, no latency variance, no cost per optimization, and identical results every run. The trade-off is that Orator optimizes prompt structure (clarity, specificity, constraints, format) rather than prompt wording. It can't tell you if your prompt produces good output, only that it's well-formed for Claude. This makes it complementary to evaluation tools like promptfoo: scaffold with Orator, then validate with eval.
development
git clone https://github.com/Vvkmnn/claude-orator-mcp && cd claude-orator-mcp
npm install && npm run build
npm test
Package requirements:
- Node.js: >=20.0.0 (ES modules)
- Runtime: `@modelcontextprotocol/sdk`, `zod`
- Zero external databases: works with `npx`
Development workflow:
npm run build # TypeScript compilation with executable permissions
npm run dev # Watch mode with tsc --watch
npm run start # Run the MCP server directly
npm run lint # ESLint code quality checks
npm run lint:fix # Auto-fix linting issues
npm run format # Prettier formatting (src/)
npm run format:check # Check formatting without changes
npm run typecheck # TypeScript validation without emit
npm run test # Lint + type check + vitest (25 tests)
npm run prepublishOnly # Pre-publish validation (build + lint + format:check)
Git hooks (via Husky):
- pre-commit: Auto-formats staged `.ts` files with Prettier and ESLint
Contributing:
- Fork the repository and create feature branches
- Follow TypeScript strict mode and MCP protocol standards
Learn from examples:
- Official MCP servers for reference implementations
- TypeScript SDK for best practices
- Creating Node.js modules for npm package development
- Anthropic prompt engineering docs for technique details
acknowledgments
Industry pattern data derived from deep analysis of system prompts from 34 AI coding tools collected in system-prompts-and-models-of-ai-tools, including Claude Code, Cursor, Windsurf, v0, Devin, Cline, Lovable, Replit, Amp, Gemini, and 25 others. Patterns are curated with prevalence data and embedded; no external dependency or installation is required. Cross-referenced with research from the Prompt Report (1,500 papers surveyed) and Anthropic's prompt engineering documentation.
license
<hr>
<a href="https://en.wikipedia.org/wiki/Cicero_Denounces_Catiline"><img src="logo/maccari-cicero.jpg" alt="Cicero Denounces Catiline -- Cesare Maccari" width="100%"></a>
<p align="center">
Cicero Denounces Catiline by Cesare Maccari (1889). "Quo usque tandem abutere, Catilina, patientia nostra?" (How long, Catiline, will you abuse our patience?) - Claudius.
</p>