Clerk Chat MCP Server
Enables autonomous prompt improvement for voice AI agents through feedback analysis, test generation, and iterative testing.
MCP server for Clerk Chat voice AI tools and skills.
Features
Prompt Improvement
Autonomous prompt improvement loop for voice AI agents:
- Analyze call transcripts and feedback
- Generate improved prompts
- Create regression tests with LLM evaluation
- Track improvement runs in database
- Iterate until tests pass
Setup
1. Install Dependencies
npm install
2. Configure API Access
The server requires a Clerk Chat API key for full functionality. You have two options:
Option A: Using .env file (Development)
- Copy the example configuration:
cp .env.example .env
- Edit .env and add your API key:
CLERK_CHAT_API_KEY=your_api_key_here
Option B: Using Claude Desktop config (Production)
Add environment variables directly to your Claude Desktop config at:
~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "clerk-chat": {
      "command": "node",
      "args": ["/absolute/path/to/clerk-chat-mcp/src/index.js"],
      "env": {
        "CLERK_CHAT_API_KEY": "07fe6d3f658d65d7fd906068e21eef5f5182fd2438e66c78e6786b661e668b2e",
        "CLERK_CHAT_API_BASE_URL": "https://puxgxqdkizwdzqyuaitm.supabase.co/functions/v1"
      }
    }
  }
}
Note: The API key shown above is a testing key for the Technical Life Care company. Each company has its own API key hash.
3. Start the Server
npm start
The server will validate your API configuration on startup and report if the API is enabled or disabled.
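As a rough illustration of that startup check, the sketch below reads the two environment variables named above and reports whether API-backed features can be enabled. The function name `resolveApiConfig` and the shape of its return value are hypothetical and are not taken from src/config.js. Treating the base URL as optional is also an assumption.

```javascript
// Hypothetical helper: decide whether API-backed tools can be enabled.
// Only the two variable names come from this README; the rest is illustrative.
function resolveApiConfig(env) {
  const apiKey = (env.CLERK_CHAT_API_KEY || "").trim();
  const rawBaseUrl = (env.CLERK_CHAT_API_BASE_URL || "").trim();

  if (!apiKey) {
    return { enabled: false, reason: "CLERK_CHAT_API_KEY is not set" };
  }
  // Strip trailing slashes so endpoint paths can be appended safely.
  // Assumption: a missing base URL is allowed and handled elsewhere.
  const baseUrl = rawBaseUrl ? rawBaseUrl.replace(/\/+$/, "") : null;
  return { enabled: true, apiKey, baseUrl };
}

const config = resolveApiConfig(process.env);
// A stdio-based MCP server keeps stdout for protocol messages, so log to stderr.
console.error(config.enabled ? "API enabled" : `API disabled: ${config.reason}`);
```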
Skills
Skills are exposed as MCP resources. Available skills:
prompt-improvement/
- feedback-analysis - Structure user feedback into actionable specs
- prompt-improvement - Generate improved prompts
- test-generation - Create regression tests
- test-analysis - Diagnose test failures
- improvement-loop - Orchestrate the full improvement cycle
Tools
Tools Management
list_tools
Get all available tools for a company. Use this when generating tests to determine if tool calls should be tested.
Parameters:
- company_name (string): Company name in kebab-case (e.g., 'technical-life-care', 'tetrix')
Returns: List of tool definitions with IDs, names, descriptions, and parameters.
Example:
list_tools(company_name: "tetrix")
// Returns:
[
  {
    "id": "tool-uuid-123",
    "name": "search_knowledge_base",
    "description": "Search the knowledge base",
    "parameters": [{"name": "query", "type": "string"}]
  }
]
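Since company_name must be kebab-case, a caller that starts from a display name needs to normalize it first. The helper below is a minimal sketch of that conversion. `toKebabCase` is a hypothetical name and is not part of the server.

```javascript
// Hypothetical helper: convert a display name to the kebab-case form
// that the company_name parameter expects.
function toKebabCase(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // drop leading and trailing hyphens
}

console.log(toKebabCase("Technical Life Care")); // technical-life-care
```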
get_tool
Get details of a specific tool by ID.
Parameters:
- company_name (string): Company name in kebab-case
- tool_id (string): UUID of the tool
Returns: Tool definition with full details.
Test Management
list_test_cases
Get all test cases from the database for a specific company.
Parameters:
- company_name (string): Company name in kebab-case
Returns: List of test cases with IDs, names, conversations, and expected outputs.
create_test_case
Create a new test case in the database.
Parameters:
- name (string): Test case name
- conversation (array): Array of {role, content} messages
- expected_output (string): Expected AI response
- tool_mocks (array, optional): Tool call mocks
- expected_tool_call (object, optional): Expected tool call
Returns: Created test case with UUID.
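To make the required fields concrete, here is a dependency-free sketch that checks create_test_case arguments before they are sent. The Security section says the server validates input with Zod schemas. This plain-JavaScript version is only an approximation of that idea, and both `validateTestCase` and its error wording are made up for illustration.

```javascript
// Illustrative validation for create_test_case arguments.
// Field names follow the parameter list above; error wording is invented.
function validateTestCase(input) {
  const errors = [];
  if (typeof input.name !== "string" || input.name.trim() === "") {
    errors.push("'name' is required");
  }
  if (!Array.isArray(input.conversation) || input.conversation.length === 0) {
    errors.push("'conversation' must be a non-empty array");
  } else {
    input.conversation.forEach((msg, i) => {
      if (!msg || typeof msg.role !== "string" || typeof msg.content !== "string") {
        errors.push(`conversation[${i}] needs string 'role' and 'content'`);
      }
    });
  }
  if (typeof input.expected_output !== "string") {
    errors.push("'expected_output' must be a string");
  }
  return errors; // an empty array means the input looks valid
}

const testCase = {
  name: "Confirm existing email",
  conversation: [{ role: "user", content: "Hi, I have a question" }],
  expected_output: "AI should confirm the email on file",
};
console.log(validateTestCase(testCase)); // []
```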
run_tests
Run actual LLM evaluation against test cases. Supports two modes: direct prompt text or saved prompt ID (faster).
Parameters (use one of the first two):
- system_prompt (string, optional): Direct prompt text to test
- prompt_id (string, optional): UUID of saved draft prompt (from save_draft_prompt) - Preferred for performance
- test_case_ids (array of strings, optional): Test case UUIDs to run. If omitted, runs all tests.
- test_model (string, optional): Model to use for testing
- tools (array, optional): Tool definitions
Returns: Test results with summary (total, passed, failed) and individual results.
Example (with prompt_id - faster):
{
  "prompt_id": "550e8400-e29b-41d4-a716-446655440000",
  "test_case_ids": ["uuid-1", "uuid-2"]
}
Example (with direct text - slower):
{
  "system_prompt": "You are a helpful assistant...",
  "test_case_ids": ["uuid-1", "uuid-2"]
}
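The two modes can be wrapped in a small helper that prefers prompt_id when one is available. The sketch below is hypothetical client-side code and is not part of the server. It also assumes each individual result carries a boolean `passed` field, mirroring the test_results shape used by save_improvement_run.

```javascript
// Build run_tests arguments, preferring a saved prompt_id over raw prompt text.
function buildRunTestsArgs({ promptId, systemPrompt, testCaseIds }) {
  if (!promptId && !systemPrompt) {
    throw new Error("Provide either prompt_id or system_prompt");
  }
  const args = promptId ? { prompt_id: promptId } : { system_prompt: systemPrompt };
  // Leaving test_case_ids out runs every test case.
  if (Array.isArray(testCaseIds) && testCaseIds.length > 0) {
    args.test_case_ids = testCaseIds;
  }
  return args;
}

// Recompute the summary from individual results
// (assumes each result has a boolean `passed`).
function summarize(results) {
  const passed = results.filter((r) => r.passed === true).length;
  return { total: results.length, passed, failed: results.length - passed };
}

console.log(buildRunTestsArgs({ promptId: "550e8400-e29b-41d4-a716-446655440000" }));
console.log(summarize([{ passed: true }, { passed: false }]));
```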
Prompt Management
save_draft_prompt
Save a draft prompt to the database for later use in testing. Returns a prompt_id that avoids streaming full prompts during iterations.
Parameters:
- prompt (string): The full prompt text to save
- label (string, optional): Label like "iteration-1" or "baseline"
Returns: Saved prompt with UUID and timestamp.
Benefits:
- Faster performance (no streaming of full prompts)
- Automatic version history
- Can re-run tests against old versions
Example:
{
  "prompt": "You are a helpful assistant...",
  "label": "iteration-1"
}
// Returns: { id: "550e8400-...", created_at: "2026-02-02T14:30:00Z" }
get_draft_prompt
Retrieve a previously saved draft prompt by ID.
Parameters:
- prompt_id (string): UUID of the draft prompt
Returns: Prompt text, label, and metadata.
list_draft_prompts
List all saved draft prompts with their IDs and labels.
Returns: Array of draft prompts.
Improvement Tracking
save_improvement_run
Store a complete improvement cycle with prompts, analysis, and test results.
Parameters:
- company_name (string): Company name in kebab-case
- original_prompt (string): Starting system prompt text
- new_prompt (string): Final improved system prompt text
- client_feedback (string): User's description of what went wrong
- analysis (object): Structured feedback analysis
  - what_went_wrong (string): Specific behavior that failed
  - why_it_went_wrong (string): Root cause analysis
  - recommended_fix (string): What changes were made to fix it
- model_used (string, optional): Model used for testing (e.g., 'google/gemini-2.5-flash')
- test_results (array): Test execution results with full details
  - test_name (string): Test case name
  - passed (boolean): Whether the test passed
  - is_generated (boolean): true for new tests from feedback, false for existing tests
  - expected (string): What the response should be
  - response (string): What the AI actually responded
  - conversation (array): Full conversation for this test
- metadata (object, optional): Additional context (iterations, timestamps, etc.)
Returns: Saved improvement run with UUID and timestamp.
Example:
save_improvement_run(
company_name: "tetrix",
original_prompt: "You are a helpful assistant...",
new_prompt: "You are a helpful assistant. Always confirm existing data...",
client_feedback: "AI keeps re-asking for customer email even when on file",
analysis: {
what_went_wrong: "AI re-requests known customer information",
why_it_went_wrong: "System prompt didn't specify to confirm existing data",
recommended_fix: "Added explicit instruction to confirm rather than re-request"
},
model_used: "google/gemini-2.5-flash",
test_results: [
{
test_name: "Confirm existing email",
passed: true,
is_generated: true,
expected: "AI should confirm existing email",
response: "I have john@example.com on file — is that current?",
conversation: [
{ role: "user", content: "Hi, I have a question" },
{ role: "assistant", content: "I have john@example.com on file — is that current?" }
]
}
]
)
Skills
list_skills
List all available skills (filesystem-only, no API required).
Usage Flow
Basic Improvement Loop
- Provide Claude with: transcript + feedback + current prompt
- Claude reads relevant skills
- Claude runs the improvement loop:
  - Analyze feedback (feedback-analysis skill)
  - Improve prompt (prompt-improvement skill)
  - Generate tests (test-generation skill → create_test_case tool)
  - Save prompt version (save_draft_prompt tool → returns prompt_id)
  - Run tests (run_tests tool with prompt_id - fast, no streaming)
  - Analyze failures (test-analysis skill)
  - Iterate until pass or stop condition
  - Save run (save_improvement_run tool)
Example Workflow
User: "Here's a transcript where the AI was too verbose. Current prompt: [...]"
Claude:
1. Uses feedback-analysis skill to structure the feedback
2. Uses prompt-improvement skill to generate new prompt
3. Uses test-generation skill to create test cases
4. Calls create_test_case for each test
5. Calls save_draft_prompt(new_prompt, "iteration-1") → gets prompt_id
6. Calls run_tests(prompt_id, test_ids) → fast, no streaming
7. If failures: uses test-analysis skill, improves prompt, repeats from step 5
8. If success: calls save_improvement_run to persist results
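The control flow of steps 5 to 8 can be sketched as a bounded loop. In practice Claude drives this through MCP tool calls, so the code below is only a model of the logic. `saveDraftPrompt`, `runTests`, and `improvePrompt` are injected stand-ins, `maxIterations` is an assumed stop condition, and the `summary.failed` field follows the run_tests return description, but its exact nesting is a guess.

```javascript
// Sketch of the iterate-until-pass loop with an assumed iteration cap.
// The three callbacks are stand-ins for the MCP tools and skills.
async function improvementLoop({
  prompt, testIds, saveDraftPrompt, runTests, improvePrompt, maxIterations = 5,
}) {
  let current = prompt;
  for (let i = 1; i <= maxIterations; i++) {
    // Save first so run_tests can reference the prompt by ID.
    const { id } = await saveDraftPrompt(current, `iteration-${i}`);
    const result = await runTests(id, testIds);
    if (result.summary.failed === 0) {
      return { success: true, iterations: i, prompt: current };
    }
    // Only revise when another test run will follow, so the returned
    // prompt is always one that was actually tested.
    if (i < maxIterations) {
      current = await improvePrompt(current, result);
    }
  }
  return { success: false, iterations: maxIterations, prompt: current };
}
```

On success the caller would then persist the outcome with save_improvement_run.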
Architecture
src/
├── index.js # Main MCP server
├── config.js # Configuration management
├── api/
│ ├── client.js # HTTP client with auth
│ ├── test-cases.js # Test CRUD operations
│ ├── test-runner.js # Test execution API
│ ├── prompts.js # Draft prompt management
│ ├── tools.js # Tool definitions API
│ └── improvement-runs.js # Improvement tracking
└── tools/
├── test-tools.js # Test management tools
├── improvement-tools.js # Improvement tracking tools
├── prompt-tools.js # Draft prompt tools
├── tools-management.js # Tool fetching tools
└── skill-tools.js # Skill listing tools
skills/
└── prompt-improvement/ # Markdown skills for Claude
├── feedback-analysis.md
├── prompt-improvement.md
├── test-generation.md
├── test-analysis.md
└── improvement-loop.md
Error Handling
The server fails gracefully with clear error messages:
- Missing API key: "API authentication failed. Set CLERK_CHAT_API_KEY in .env or Claude Desktop config."
- Network error: "Unable to reach API. Check internet connection."
- 404 Not Found: "Test case 'abc123' not found. Use list_test_cases to see available tests."
- 422 Validation: "Invalid test case: 'name' is required."
- 500 Server Error: "API error. Try again or check API status."
The MCP server never crashes - all errors are returned as tool results to Claude.
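One way to get this behavior is to wrap every tool handler so that exceptions become error results. The sketch below assumes thrown errors carry an HTTP `status` and an optional `detail` string, and it maps a missing API key to status 401. Those are illustrative assumptions, as are the helper names. The result shape (a `content` array plus `isError`) is the standard MCP tool result format.

```javascript
// Translate a failure into an MCP tool result instead of letting it propagate.
// Assumed error shape: { status?: number, detail?: string }.
function toToolError(err) {
  let text;
  if (err.status === undefined) {
    text = "Unable to reach API. Check internet connection.";
  } else if (err.status === 401) {
    text = "API authentication failed. Set CLERK_CHAT_API_KEY in .env or Claude Desktop config.";
  } else if (err.status === 404 || err.status === 422) {
    // These messages are request-specific, so pass through the detail if present.
    text = err.detail || `Request failed with status ${err.status}.`;
  } else {
    text = "API error. Try again or check API status.";
  }
  return { content: [{ type: "text", text }], isError: true };
}

// Wrap a tool handler so a thrown error never escapes to the transport.
async function safeCall(handler, args) {
  try {
    return await handler(args);
  } catch (err) {
    return toToolError(err);
  }
}
```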
Development
Running Without API
The server can run without API configuration for skill-only functionality:
- Skills will still be available as resources
- list_skills tool will work
- API-dependent tools (test management, improvement tracking) will not be registered
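Conditional registration along these lines could look like the sketch below. The tool names come from the Tools section. Grouping the tools-management and draft-prompt tools with the API-dependent set is an assumption, and `toolsToRegister` is a hypothetical function rather than the server's own code.

```javascript
// Hypothetical registry: which tool names get registered for a given config.
const SKILL_TOOLS = ["list_skills"]; // filesystem-only, no API required
const API_TOOLS = [
  "list_tools", "get_tool",
  "list_test_cases", "create_test_case", "run_tests",
  "save_draft_prompt", "get_draft_prompt", "list_draft_prompts",
  "save_improvement_run",
];

function toolsToRegister(apiEnabled) {
  return apiEnabled ? [...SKILL_TOOLS, ...API_TOOLS] : [...SKILL_TOOLS];
}

console.log(toolsToRegister(false)); // only list_skills
```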
Testing API Integration
- Configure API key in .env or Claude Desktop config
- Restart Claude Desktop (if using config option)
- Test each tool:
list_test_cases → Should return test cases from database
create_test_case → Should create test with UUID
run_tests → Should execute with real LLM evaluation
save_improvement_run → Should persist to database
API Endpoints
The server integrates with these Supabase Edge Functions:
Tool Definitions:
- GET /api-tools - List all tools for authenticated company
- GET /api-tools?id=uuid - Get specific tool by ID
Test Execution:
POST /api-run-tests- Execute tests with LLM evaluation (supports prompt_id or system_prompt)
Test Cases:
- GET /api-test-cases - List all test cases
- POST /api-test-cases - Create new test case
- PUT /api-test-cases?id=uuid - Update test case
- DELETE /api-test-cases?id=uuid - Delete test case
Draft Prompts:
- POST /api-prompts - Save draft prompt (returns prompt_id)
- GET /api-prompts/:id - Get specific draft prompt
- GET /api-prompts - List all draft prompts
Improvement Tracking:
- POST /api-improvement-runs - Save improvement run
- GET /api-improvement-runs - List improvement runs
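For orientation, the sketch below shows how these paths combine with the configured base URL. `endpointUrl` is a hypothetical helper and the base URL is a placeholder. Authentication is left out because this README does not specify how the API key is sent.

```javascript
// Build a full URL for one of the endpoints listed above.
function endpointUrl(baseUrl, path, query = {}) {
  // Normalize the trailing slash so the path can be appended directly.
  const url = new URL(baseUrl.replace(/\/+$/, "") + path);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

const base = "https://example.test/functions/v1"; // placeholder base URL
console.log(endpointUrl(base, "/api-test-cases"));
// https://example.test/functions/v1/api-test-cases
console.log(endpointUrl(base, "/api-tools", { id: "tool-uuid-123" }));
// https://example.test/functions/v1/api-tools?id=tool-uuid-123
```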
Security
- API keys are never logged
- .env is gitignored
- All credentials use environment variables
- Input validation with Zod schemas
- Sanitized error messages
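As an illustration of the sanitizing idea, the sketch below strips a known secret from a message before it is logged or returned. `redactSecret` is a hypothetical name, and the server's real sanitization may work differently.

```javascript
// Illustrative redaction: remove every occurrence of a secret from a message.
function redactSecret(message, secret) {
  if (!secret) return message; // nothing to redact
  return message.split(secret).join("[REDACTED]");
}

console.log(redactSecret("Request with key abc123 failed", "abc123"));
// Request with key [REDACTED] failed
```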
License
MIT