Clerk Chat MCP Server

Clerk Chat MCP Server

Enables autonomous prompt improvement for voice AI agents through feedback analysis, test generation, and iterative testing.

Category
Visit Server

README

Clerk Chat MCP Server

MCP server for Clerk Chat voice AI tools and skills.

Features

Prompt Improvement

Autonomous prompt improvement loop for voice AI agents:

  • Analyze call transcripts and feedback
  • Generate improved prompts
  • Create regression tests with LLM evaluation
  • Track improvement runs in database
  • Iterate until tests pass

Setup

1. Install Dependencies

npm install

2. Configure API Access

The server requires a Clerk Chat API key for full functionality. You have two options:

Option A: Using .env file (Development)

  1. Copy the example configuration:

    cp .env.example .env
    
  2. Edit .env and add your API key:

    CLERK_CHAT_API_KEY=your_api_key_here
    

Option B: Using Claude Desktop config (Production)

Add environment variables directly to your Claude Desktop config at: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "clerk-chat": {
      "command": "node",
      "args": ["/absolute/path/to/clerk-chat-mcp/src/index.js"],
      "env": {
        "CLERK_CHAT_API_KEY": "07fe6d3f658d65d7fd906068e21eef5f5182fd2438e66c78e6786b661e668b2e",
        "CLERK_CHAT_API_BASE_URL": "https://puxgxqdkizwdzqyuaitm.supabase.co/functions/v1"
      }
    }
  }
}

Note: The API key shown above is for Technical Life Care company (testing). Each company has its own API key hash.

3. Start the Server

npm start

The server will validate your API configuration on startup and report if the API is enabled or disabled.

Skills

Skills are exposed as MCP resources. Available skills:

prompt-improvement/

  • feedback-analysis - Structure user feedback into actionable specs
  • prompt-improvement - Generate improved prompts
  • test-generation - Create regression tests
  • test-analysis - Diagnose test failures
  • improvement-loop - Orchestrate the full improvement cycle

Tools

Tools Management

list_tools

Get all available tools for a company. Use this when generating tests to determine if tool calls should be tested.

Parameters:

  • company_name (string): Company name in kebab-case (e.g., 'technical-life-care', 'tetrix')

Returns: List of tool definitions with IDs, names, descriptions, and parameters.

Example:

list_tools(company_name: "tetrix")
// Returns:
[
  {
    "id": "tool-uuid-123",
    "name": "search_knowledge_base",
    "description": "Search the knowledge base",
    "parameters": [{"name": "query", "type": "string"}]
  }
]

get_tool

Get details of a specific tool by ID.

Parameters:

  • company_name (string): Company name in kebab-case
  • tool_id (string): UUID of the tool

Returns: Tool definition with full details.

Test Management

list_test_cases

Get all test cases from the database for a specific company.

Parameters:

  • company_name (string): Company name in kebab-case

Returns: List of test cases with IDs, names, conversations, and expected outputs.

create_test_case

Create a new test case in the database.

Parameters:

  • name (string): Test case name
  • conversation (array): Array of {role, content} messages
  • expected_output (string): Expected AI response
  • tool_mocks (array, optional): Tool call mocks
  • expected_tool_call (object, optional): Expected tool call

Returns: Created test case with UUID.

run_tests

Run actual LLM evaluation against test cases. Supports two modes: direct prompt text or saved prompt ID (faster).

Parameters (use one of the first two):

  • system_prompt (string, optional): Direct prompt text to test
  • prompt_id (string, optional): UUID of saved draft prompt (from save_draft_prompt) - Preferred for performance
  • test_case_ids (array of strings, optional): Test case UUIDs to run. If omitted, runs all tests.
  • test_model (string, optional): Model to use for testing
  • tools (array, optional): Tool definitions

Returns: Test results with summary (total, passed, failed) and individual results.

Example (with prompt_id - faster):

{
  "prompt_id": "550e8400-e29b-41d4-a716-446655440000",
  "test_case_ids": ["uuid-1", "uuid-2"]
}

Example (with direct text - slower):

{
  "system_prompt": "You are a helpful assistant...",
  "test_case_ids": ["uuid-1", "uuid-2"]
}

Prompt Management

save_draft_prompt

Save a draft prompt to the database for later use in testing. Returns a prompt_id that avoids streaming full prompts during iterations.

Parameters:

  • prompt (string): The full prompt text to save
  • label (string, optional): Label like "iteration-1" or "baseline"

Returns: Saved prompt with UUID and timestamp.

Benefits:

  • Faster performance (no streaming of full prompts)
  • Automatic version history
  • Can re-run tests against old versions

Example:

{
  "prompt": "You are a helpful assistant...",
  "label": "iteration-1"
}
// Returns: { id: "550e8400-...", created_at: "2026-02-02T14:30:00Z" }

get_draft_prompt

Retrieve a previously saved draft prompt by ID.

Parameters:

  • prompt_id (string): UUID of the draft prompt

Returns: Prompt text, label, and metadata.

list_draft_prompts

List all saved draft prompts with their IDs and labels.

Returns: Array of draft prompts.

Improvement Tracking

save_improvement_run

Store a complete improvement cycle with prompts, analysis, and test results.

Parameters:

  • company_name (string): Company name in kebab-case
  • original_prompt (string): Starting system prompt text
  • new_prompt (string): Final improved system prompt text
  • client_feedback (string): User's description of what went wrong
  • analysis (object): Structured feedback analysis
    • what_went_wrong (string): Specific behavior that failed
    • why_it_went_wrong (string): Root cause analysis
    • recommended_fix (string): What changes were made to fix it
  • model_used (string, optional): Model used for testing (e.g., 'google/gemini-2.5-flash')
  • test_results (array): Test execution results with full details
    • test_name (string): Test case name
    • passed (boolean): Whether the test passed
    • is_generated (boolean): true for new tests from feedback, false for existing tests
    • expected (string): What the response should be
    • response (string): What the AI actually responded
    • conversation (array): Full conversation for this test
  • metadata (object, optional): Additional context (iterations, timestamps, etc.)

Returns: Saved improvement run with UUID and timestamp.

Example:

save_improvement_run(
  company_name: "tetrix",
  original_prompt: "You are a helpful assistant...",
  new_prompt: "You are a helpful assistant. Always confirm existing data...",
  client_feedback: "AI keeps re-asking for customer email even when on file",
  analysis: {
    what_went_wrong: "AI re-requests known customer information",
    why_it_went_wrong: "System prompt didn't specify to confirm existing data",
    recommended_fix: "Added explicit instruction to confirm rather than re-request"
  },
  model_used: "google/gemini-2.5-flash",
  test_results: [
    {
      test_name: "Confirm existing email",
      passed: true,
      is_generated: true,
      expected: "AI should confirm existing email",
      response: "I have john@example.com on file — is that current?",
      conversation: [
        { role: "user", content: "Hi, I have a question" },
        { role: "assistant", content: "I have john@example.com on file — is that current?" }
      ]
    }
  ]
)

Skills

list_skills

List all available skills (filesystem-only, no API required).

Usage Flow

Basic Improvement Loop

  1. Provide Claude with: transcript + feedback + current prompt
  2. Claude reads relevant skills
  3. Claude runs the improvement loop:
    • Analyze feedback (feedback-analysis skill)
    • Improve prompt (prompt-improvement skill)
    • Generate tests (test-generation skill → create_test_case tool)
    • Save prompt version (save_draft_prompt tool → returns prompt_id)
    • Run tests (run_tests tool with prompt_id - fast, no streaming)
    • Analyze failures (test-analysis skill)
    • Iterate until pass or stop condition
    • Save run (save_improvement_run tool)

Example Workflow

User: "Here's a transcript where the AI was too verbose. Current prompt: [...]"

Claude:
1. Uses feedback-analysis skill to structure the feedback
2. Uses prompt-improvement skill to generate new prompt
3. Uses test-generation skill to create test cases
4. Calls create_test_case for each test
5. Calls save_draft_prompt(new_prompt, "iteration-1") → gets prompt_id
6. Calls run_tests(prompt_id, test_ids) → fast, no streaming
7. If failures: uses test-analysis skill, improves prompt, repeats from step 5
8. If success: calls save_improvement_run to persist results

Architecture

src/
├── index.js              # Main MCP server
├── config.js             # Configuration management
├── api/
│   ├── client.js         # HTTP client with auth
│   ├── test-cases.js     # Test CRUD operations
│   ├── test-runner.js    # Test execution API
│   ├── prompts.js        # Draft prompt management
│   ├── tools.js          # Tool definitions API
│   └── improvement-runs.js # Improvement tracking
└── tools/
    ├── test-tools.js     # Test management tools
    ├── improvement-tools.js # Improvement tracking tools
    ├── prompt-tools.js   # Draft prompt tools
    ├── tools-management.js # Tool fetching tools
    └── skill-tools.js    # Skill listing tools

skills/
└── prompt-improvement/   # Markdown skills for Claude
    ├── feedback-analysis.md
    ├── prompt-improvement.md
    ├── test-generation.md
    ├── test-analysis.md
    └── improvement-loop.md

Error Handling

The server fails gracefully with clear error messages:

  • Missing API key: "API authentication failed. Set CLERK_CHAT_API_KEY in .env or Claude Desktop config."
  • Network error: "Unable to reach API. Check internet connection."
  • 404 Not Found: "Test case 'abc123' not found. Use list_test_cases to see available tests."
  • 422 Validation: "Invalid test case: 'name' is required."
  • 500 Server Error: "API error. Try again or check API status."

The MCP server never crashes - all errors are returned as tool results to Claude.

Development

Running Without API

The server can run without API configuration for skill-only functionality:

  • Skills will still be available as resources
  • list_skills tool will work
  • API-dependent tools (test management, improvement tracking) will not be registered

Testing API Integration

  1. Configure API key in .env or Claude Desktop config
  2. Restart Claude Desktop (if using config option)
  3. Test each tool:
    list_test_cases → Should return test cases from database
    create_test_case → Should create test with UUID
    run_tests → Should execute with real LLM evaluation
    save_improvement_run → Should persist to database
    

API Endpoints

The server integrates with these Supabase Edge Functions:

Tool Definitions:

  • GET /api-tools - List all tools for authenticated company
  • GET /api-tools?id=uuid - Get specific tool by ID

Test Execution:

  • POST /api-run-tests - Execute tests with LLM evaluation (supports prompt_id or system_prompt)

Test Cases:

  • GET /api-test-cases - List all test cases
  • POST /api-test-cases - Create new test case
  • PUT /api-test-cases?id=uuid - Update test case
  • DELETE /api-test-cases?id=uuid - Delete test case

Draft Prompts:

  • POST /api-prompts - Save draft prompt (returns prompt_id)
  • GET /api-prompts/:id - Get specific draft prompt
  • GET /api-prompts - List all draft prompts

Improvement Tracking:

  • POST /api-improvement-runs - Save improvement run
  • GET /api-improvement-runs - List improvement runs

Security

  • API keys are never logged
  • .env is gitignored
  • All credentials use environment variables
  • Input validation with Zod schemas
  • Sanitized error messages

License

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured