MCP Servers

Clerk Chat MCP Server

Enables autonomous prompt improvement for voice AI agents through feedback analysis, test generation, and iterative testing.

README

Clerk Chat MCP Server

MCP server for Clerk Chat voice AI tools and skills.

Features

Prompt Improvement

Autonomous prompt improvement loop for voice AI agents:

Analyze call transcripts and feedback
Generate improved prompts
Create regression tests with LLM evaluation
Track improvement runs in database
Iterate until tests pass

Setup

1. Install Dependencies

npm install

2. Configure API Access

The server requires a Clerk Chat API key for full functionality. You have two options:

Option A: Using .env file (Development)

Copy the example configuration:
```
cp .env.example .env
```
Edit .env and add your API key:
```
CLERK_CHAT_API_KEY=your_api_key_here
```

Option B: Using Claude Desktop config (Production)

Add environment variables directly to your Claude Desktop config at: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "clerk-chat": {
      "command": "node",
      "args": ["/absolute/path/to/clerk-chat-mcp/src/index.js"],
      "env": {
        "CLERK_CHAT_API_KEY": "07fe6d3f658d65d7fd906068e21eef5f5182fd2438e66c78e6786b661e668b2e",
        "CLERK_CHAT_API_BASE_URL": "https://puxgxqdkizwdzqyuaitm.supabase.co/functions/v1"
      }
    }
  }
}

Note: The API key shown above is for Technical Life Care company (testing). Each company has its own API key hash.

3. Start the Server

npm start

The server will validate your API configuration on startup and report if the API is enabled or disabled.

Skills

Skills are exposed as MCP resources. Available skills:

prompt-improvement/

feedback-analysis - Structure user feedback into actionable specs
prompt-improvement - Generate improved prompts
test-generation - Create regression tests
test-analysis - Diagnose test failures
improvement-loop - Orchestrate the full improvement cycle

Tools

Tools Management

`list_tools`

Get all available tools for a company. Use this when generating tests to determine if tool calls should be tested.

Parameters:

company_name (string): Company name in kebab-case (e.g., 'technical-life-care', 'tetrix')

Returns: List of tool definitions with IDs, names, descriptions, and parameters.

Example:

list_tools(company_name: "tetrix")
// Returns:
[
  {
    "id": "tool-uuid-123",
    "name": "search_knowledge_base",
    "description": "Search the knowledge base",
    "parameters": [{"name": "query", "type": "string"}]
  }
]

`get_tool`

Get details of a specific tool by ID.

Parameters:

company_name (string): Company name in kebab-case
tool_id (string): UUID of the tool

Returns: Tool definition with full details.

Test Management

`list_test_cases`

Get all test cases from the database for a specific company.

Parameters:

company_name (string): Company name in kebab-case

Returns: List of test cases with IDs, names, conversations, and expected outputs.

`create_test_case`

Create a new test case in the database.

Parameters:

name (string): Test case name
conversation (array): Array of {role, content} messages
expected_output (string): Expected AI response
tool_mocks (array, optional): Tool call mocks
expected_tool_call (object, optional): Expected tool call

Returns: Created test case with UUID.

`run_tests`

Run actual LLM evaluation against test cases. Supports two modes: direct prompt text or saved prompt ID (faster).

Parameters (use one of the first two):

system_prompt (string, optional): Direct prompt text to test
prompt_id (string, optional): UUID of saved draft prompt (from save_draft_prompt) - Preferred for performance
test_case_ids (array of strings, optional): Test case UUIDs to run. If omitted, runs all tests.
test_model (string, optional): Model to use for testing
tools (array, optional): Tool definitions

Returns: Test results with summary (total, passed, failed) and individual results.

Example (with prompt_id - faster):

{
  "prompt_id": "550e8400-e29b-41d4-a716-446655440000",
  "test_case_ids": ["uuid-1", "uuid-2"]
}

Example (with direct text - slower):

{
  "system_prompt": "You are a helpful assistant...",
  "test_case_ids": ["uuid-1", "uuid-2"]
}

Prompt Management

`save_draft_prompt`

Save a draft prompt to the database for later use in testing. Returns a prompt_id that avoids streaming full prompts during iterations.

Parameters:

prompt (string): The full prompt text to save
label (string, optional): Label like "iteration-1" or "baseline"

Returns: Saved prompt with UUID and timestamp.

Benefits:

Faster performance (no streaming of full prompts)
Automatic version history
Can re-run tests against old versions

Example:

{
  "prompt": "You are a helpful assistant...",
  "label": "iteration-1"
}
// Returns: { id: "550e8400-...", created_at: "2026-02-02T14:30:00Z" }

`get_draft_prompt`

Retrieve a previously saved draft prompt by ID.

Parameters:

prompt_id (string): UUID of the draft prompt

Returns: Prompt text, label, and metadata.

`list_draft_prompts`

List all saved draft prompts with their IDs and labels.

Returns: Array of draft prompts.

Improvement Tracking

`save_improvement_run`

Store a complete improvement cycle with prompts, analysis, and test results.

Parameters:

company_name (string): Company name in kebab-case
original_prompt (string): Starting system prompt text
new_prompt (string): Final improved system prompt text
client_feedback (string): User's description of what went wrong
analysis (object): Structured feedback analysis
- what_went_wrong (string): Specific behavior that failed
- why_it_went_wrong (string): Root cause analysis
- recommended_fix (string): What changes were made to fix it
model_used (string, optional): Model used for testing (e.g., 'google/gemini-2.5-flash')
test_results (array): Test execution results with full details
- test_name (string): Test case name
- passed (boolean): Whether the test passed
- is_generated (boolean): true for new tests from feedback, false for existing tests
- expected (string): What the response should be
- response (string): What the AI actually responded
- conversation (array): Full conversation for this test
metadata (object, optional): Additional context (iterations, timestamps, etc.)

Returns: Saved improvement run with UUID and timestamp.

Example:

save_improvement_run(
  company_name: "tetrix",
  original_prompt: "You are a helpful assistant...",
  new_prompt: "You are a helpful assistant. Always confirm existing data...",
  client_feedback: "AI keeps re-asking for customer email even when on file",
  analysis: {
    what_went_wrong: "AI re-requests known customer information",
    why_it_went_wrong: "System prompt didn't specify to confirm existing data",
    recommended_fix: "Added explicit instruction to confirm rather than re-request"
  },
  model_used: "google/gemini-2.5-flash",
  test_results: [
    {
      test_name: "Confirm existing email",
      passed: true,
      is_generated: true,
      expected: "AI should confirm existing email",
      response: "I have john@example.com on file — is that current?",
      conversation: [
        { role: "user", content: "Hi, I have a question" },
        { role: "assistant", content: "I have john@example.com on file — is that current?" }
      ]
    }
  ]
)

Skills

`list_skills`

List all available skills (filesystem-only, no API required).

Usage Flow

Basic Improvement Loop

Provide Claude with: transcript + feedback + current prompt
Claude reads relevant skills
Claude runs the improvement loop:
- Analyze feedback (feedback-analysis skill)
- Improve prompt (prompt-improvement skill)
- Generate tests (test-generation skill → create_test_case tool)
- Save prompt version (save_draft_prompt tool → returns prompt_id)
- Run tests (run_tests tool with prompt_id - fast, no streaming)
- Analyze failures (test-analysis skill)
- Iterate until pass or stop condition
- Save run (save_improvement_run tool)

Example Workflow

User: "Here's a transcript where the AI was too verbose. Current prompt: [...]"

Claude:
1. Uses feedback-analysis skill to structure the feedback
2. Uses prompt-improvement skill to generate new prompt
3. Uses test-generation skill to create test cases
4. Calls create_test_case for each test
5. Calls save_draft_prompt(new_prompt, "iteration-1") → gets prompt_id
6. Calls run_tests(prompt_id, test_ids) → fast, no streaming
7. If failures: uses test-analysis skill, improves prompt, repeats from step 5
8. If success: calls save_improvement_run to persist results

Architecture

src/
├── index.js              # Main MCP server
├── config.js             # Configuration management
├── api/
│   ├── client.js         # HTTP client with auth
│   ├── test-cases.js     # Test CRUD operations
│   ├── test-runner.js    # Test execution API
│   ├── prompts.js        # Draft prompt management
│   ├── tools.js          # Tool definitions API
│   └── improvement-runs.js # Improvement tracking
└── tools/
    ├── test-tools.js     # Test management tools
    ├── improvement-tools.js # Improvement tracking tools
    ├── prompt-tools.js   # Draft prompt tools
    ├── tools-management.js # Tool fetching tools
    └── skill-tools.js    # Skill listing tools

skills/
└── prompt-improvement/   # Markdown skills for Claude
    ├── feedback-analysis.md
    ├── prompt-improvement.md
    ├── test-generation.md
    ├── test-analysis.md
    └── improvement-loop.md

Error Handling

The server fails gracefully with clear error messages:

Missing API key: "API authentication failed. Set CLERK_CHAT_API_KEY in .env or Claude Desktop config."
Network error: "Unable to reach API. Check internet connection."
404 Not Found: "Test case 'abc123' not found. Use list_test_cases to see available tests."
422 Validation: "Invalid test case: 'name' is required."
500 Server Error: "API error. Try again or check API status."

The MCP server never crashes - all errors are returned as tool results to Claude.

Development

Running Without API

The server can run without API configuration for skill-only functionality:

Skills will still be available as resources
list_skills tool will work
API-dependent tools (test management, improvement tracking) will not be registered

Testing API Integration

Configure API key in .env or Claude Desktop config
Restart Claude Desktop (if using config option)

Test each tool:

list_test_cases → Should return test cases from database
create_test_case → Should create test with UUID
run_tests → Should execute with real LLM evaluation
save_improvement_run → Should persist to database

API Endpoints

The server integrates with these Supabase Edge Functions:

Tool Definitions:

GET /api-tools - List all tools for authenticated company
GET /api-tools?id=uuid - Get specific tool by ID

Test Execution:

POST /api-run-tests - Execute tests with LLM evaluation (supports prompt_id or system_prompt)

Test Cases:

GET /api-test-cases - List all test cases
POST /api-test-cases - Create new test case
PUT /api-test-cases?id=uuid - Update test case
DELETE /api-test-cases?id=uuid - Delete test case

Draft Prompts:

POST /api-prompts - Save draft prompt (returns prompt_id)
GET /api-prompts/:id - Get specific draft prompt
GET /api-prompts - List all draft prompts

Improvement Tracking:

POST /api-improvement-runs - Save improvement run
GET /api-improvement-runs - List improvement runs

Security

API keys are never logged
.env is gitignored
All credentials use environment variables
Input validation with Zod schemas
Sanitized error messages

License

MIT

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured