MCP Gatekeeper

Enables users to validate MCP servers, skills, extensions, and packages for schema, security, functional, and semantic quality directly from their MCP client.

<p align="center"> <h1 align="center">MCP Gatekeeper</h1> <p align="center"> <strong>The QA framework for the MCP ecosystem</strong> <br /> 46 validators · 8 gates · Schema · Security · Functional · AI Eval · Human Review </p> </p>

<p align="center"> <a href="#quick-start">Quick Start</a> · <a href="#what-it-catches">What It Catches</a> · <a href="#gates">Gates</a> · <a href="#ai-evaluation">AI Evaluation</a> · <a href="#use-as-mcp-server">Use as MCP Server</a> </p>

MCP has 200+ server implementations but zero quality gates before publishing. MCP Gatekeeper fills that gap.

Run it before you publish to the MCP Registry, npm, or Smithery. It catches schema violations, security vulnerabilities, functional bugs, and quality issues that other tools miss — including an LLM-powered semantic evaluation that compares what your tools claim to do against what they actually do.

npx mcpqa run --server-url https://your-mcp-server.com/mcp

What It Catches

Real findings from running against a production Zoho MCP server (65 tools):

| Finding | Gate | What It Means |
| --- | --- | --- |
| 125 tools accept invalid inputs silently | Functional | Server returns `isError: false` for empty/wrong-type args |
| 107 parameters have no description | Schema | LLMs can't use tools with undocumented params |
| 26 write tools lack rate limit info | Security | Consumers don't know if there's a throttle |
| SQL injection payloads reflected in output | Security | Dynamic injection testing caught real vulnerability |
| Tool descriptions don't match actual behavior | AI Eval | Claude compared descriptions vs invocation results |
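
The first finding is usually the cheapest to fix on the server side. A minimal sketch, assuming a hand-written tool handler: `ToolResult` mirrors the MCP tool result shape (`content` plus `isError`), while `createInvoice`, `checkArgs`, and the argument names are hypothetical and exist only for illustration. They are not taken from the Zoho server or from MCP Gatekeeper.

```typescript
// Shape of an MCP tool result: a content array plus an optional isError flag.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Hypothetical tool arguments, used only for illustration.
type CreateInvoiceArgs = { customerId: string; amount: number };

// Returns a list of problems; an empty list means the arguments are usable.
function checkArgs(args: unknown): string[] {
  if (typeof args !== "object" || args === null) {
    return ["arguments must be an object"];
  }
  const problems: string[] = [];
  const a = args as Partial<Record<keyof CreateInvoiceArgs, unknown>>;
  if (typeof a.customerId !== "string" || a.customerId.length === 0) {
    problems.push("customerId must be a non-empty string");
  }
  if (typeof a.amount !== "number" || !Number.isFinite(a.amount)) {
    problems.push("amount must be a finite number");
  }
  return problems;
}

// Hypothetical handler: reject bad input explicitly instead of reporting success.
function createInvoice(args: unknown): ToolResult {
  const problems = checkArgs(args);
  if (problems.length > 0) {
    return {
      content: [{ type: "text", text: `Invalid arguments: ${problems.join("; ")}` }],
      isError: true,
    };
  }
  const { customerId, amount } = args as CreateInvoiceArgs;
  return {
    content: [{ type: "text", text: `Invoice created for ${customerId}: ${amount}` }],
    isError: false,
  };
}
```

With a handler like this, empty or wrong-type arguments produce `isError: true` and a message the calling LLM can act on.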

Quick Start

# Validate an MCP server (Gates 1-3: schema + security + functional)
npx mcpqa run --server-url https://your-server.com/mcp

# Add AI semantic evaluation (Gate 4 — requires Anthropic API key)
ANTHROPIC_API_KEY=... npx mcpqa run --server-url https://your-server.com/mcp --gates 1,2,3,4

# Validate a Claude Code skill
npx mcpqa run --skill-path ./.claude/skills/my-skill --gates 6

# Validate an extension
npx mcpqa run --extension-path ./my-extension --gates 7

# Validate your npm package before publishing
npx mcpqa run --package-path . --gates 8

# Server and package checks in one run (lenient mode keeps going after a failure)
npx mcpqa run --server-url https://server.com/mcp --package-path . --gates 1,2,3,4,8 --mode lenient

Gates

| Gate | What It Validates | Validators | Speed |
| --- | --- | --- | --- |
| 1. Schema | Protocol conformance, tool names, JSON Schema, descriptions, parameter types | 6 | < 1s |
| 2. Security | Prompt injection, SSRF, secrets, permissions, dangerous patterns, dynamic injection with real payloads | 9 | 2-10s |
| 3. Functional | Handshake, invocation, boundary testing, error format, timeouts, idempotency | 9 | 2-5min |
| 4. AI Semantic | Description accuracy vs behavior, param clarity, response completeness, integration readiness, tool chains | 7 | 2-5min |
| 5. Human Review | Dashboard with approve/reject/escalate, auto-approve for high scores, audit trail | 1 | On-demand |
| 6. Skills | SKILL.md structure, frontmatter, description quality, content length, references, scripts | 6 | < 1s |
| 7. Extensions | Manifest, permissions, bundled MCP configs, security scanning | 4 | < 1s |
| 8. Package | package.json, server.json, LICENSE, dependency audit, secrets in source | 5 | < 1s |

AI Evaluation

Gate 4 uses an LLM-as-judge (Claude or OpenAI-compatible) to evaluate things code can't:

Description accuracy: Does the description match what the tool actually does? (compares against Gate 3 invocation results)
Parameter clarity: Could a developer use this tool correctly on the first try?
Integration readiness: Can someone build an integration from metadata alone?
Tool chain analysis: Do the tools form coherent workflows, or are there gaps?

Gate 4 applies patterns from Anthropic's harness design and eval framework:

Skeptical evaluator — system prompt forces adversarial QA, not praise
pass@k / pass^k — multi-trial evaluation for reliability metrics
Actionable fixes — every finding includes specific developer guidance
Transcript recording — full LLM conversations saved for debugging
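
For intuition on the two reliability metrics: pass@k asks whether at least one of k trials succeeds, while pass^k asks whether all k succeed. A minimal sketch, assuming independent trials and a per-trial success rate estimated from observed outcomes. The function names are hypothetical, the data is illustrative, and this is not necessarily how Gate 4 computes its scores.

```typescript
// Fraction of trials that passed (the estimated per-trial success rate).
function successRate(trials: boolean[]): number {
  if (trials.length === 0) throw new Error("need at least one trial");
  return trials.filter(Boolean).length / trials.length;
}

// pass@k: probability that at least one of k independent trials passes.
function passAtK(p: number, k: number): number {
  return 1 - Math.pow(1 - p, k);
}

// pass^k: probability that all k independent trials pass.
function passHatK(p: number, k: number): number {
  return Math.pow(p, k);
}

// Illustrative outcomes, not real evaluation results.
const observedTrials = [true, true, false, true];
const estimatedRate = successRate(observedTrials); // 3 of 4 passed
console.log("pass@3:", passAtK(estimatedRate, 3).toFixed(3));
console.log("pass^3:", passHatK(estimatedRate, 3).toFixed(3));
```

The gap between the two numbers is the point: a check that passes three times out of four looks nearly perfect under pass@3 but succeeds on all three trials less than half the time.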

Use as MCP Server

MCP Gatekeeper is itself an MCP server. Add it to Claude Desktop:

{
  "mcpServers": {
    "mcpqa": {
      "command": "npx",
      "args": ["-y", "mcpqa-server"]
    }
  }
}

Then ask Claude: "Validate the MCP server at https://example.com/mcp"

6 tools exposed: validate_mcp_server, validate_skill, validate_extension, validate_package, get_report, list_validators
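
When Claude picks one of these tools, the client sends an MCP `tools/call` message. A sketch of what such a message might look like: the envelope follows JSON-RPC 2.0 as used by MCP, but the argument names (`serverUrl`, `gates`) are assumptions, since the tools' input schemas are not listed here. `buildToolCall` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper that fills in the fixed parts of the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names are assumed; the real ones come from the schema in tools/list.
const validateRequest = buildToolCall(1, "validate_mcp_server", {
  serverUrl: "https://example.com/mcp",
  gates: [1, 2, 3],
});
console.log(JSON.stringify(validateRequest, null, 2));
```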

CLI Reference

mcpqa run [options]          # Run validation pipeline
mcpqa validate-config <path> # Validate YAML config
mcpqa diff <before> <after>  # Compare two reports
mcpqa dashboard [--port N]   # Launch human review UI

Options:
  --server-url <url>         MCP server URL (http/sse)
  --server-cmd <cmd>         MCP server command (stdio)
  --skill-path <path>        Claude Code skill directory
  --extension-path <path>    Extension directory
  --package-path <path>      npm package directory
  --gates <nums>             Gate numbers (e.g., 1,2,3,4,6,7,8)
  --mode <strict|lenient>    strict stops at the first failure; lenient runs all checks
  --trials <number>          LLM evaluation trials for pass@k
  --save-baseline            Save results for regression tracking
  --check-regression         Compare against saved baseline
  --verbose / --debug        Control log verbosity
  --dry-run                  Show what would run without executing

Reports

| Format | File | Use Case |
| --- | --- | --- |
| Console | stdout | Real-time progress with colored badges and progress bars |
| JSON | `reports/latest.json` | CI/CD integration, programmatic access |
| HTML | `reports/latest.html` | Dark-themed visual report for stakeholders |
| SARIF | `reports/latest.sarif` | GitHub Code Scanning, VS Code SARIF Viewer |
| Transcripts | `reports/transcripts/` | Gate 4 LLM conversation debugging |
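
The SARIF report is plain JSON, so a CI step can summarize it without extra tooling. A minimal sketch, assuming the file follows SARIF 2.1.0 (`runs[].results[]` with `level` and `message.text`). The sample log and its rule IDs are made up for illustration, `countByLevel` is a hypothetical helper, and the exact contents of MCP Gatekeeper's output may differ.

```typescript
type SarifLevel = "none" | "note" | "warning" | "error";
type SarifResult = { ruleId?: string; level?: SarifLevel; message: { text?: string } };
type SarifLog = { version: string; runs: { results?: SarifResult[] }[] };

// Count results per level. A missing level is treated as "warning" here,
// which is the SARIF default when no rule configuration overrides it.
function countByLevel(log: SarifLog): Record<SarifLevel, number> {
  const counts: Record<SarifLevel, number> = { none: 0, note: 0, warning: 0, error: 0 };
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      counts[result.level ?? "warning"] += 1;
    }
  }
  return counts;
}

// Made-up sample; in CI this would be the parsed contents of reports/latest.sarif.
const sampleLog: SarifLog = {
  version: "2.1.0",
  runs: [
    {
      results: [
        { ruleId: "example/missing-description", level: "warning", message: { text: "Parameter has no description" } },
        { ruleId: "example/reflected-payload", level: "error", message: { text: "Payload reflected in output" } },
        { ruleId: "example/no-level", message: { text: "Level omitted" } },
      ],
    },
  ],
};
console.log(countByLevel(sampleLog));
```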

CI/CD Integration

# GitHub Actions
- name: Validate MCP Server
  run: npx mcpqa run --server-cmd "node dist/server.js" --gates 1,2,3,8
  # Exit code 0 = pass, 1 = fail

# Docker
docker build -t mcpqa .
docker run mcpqa run --server-url https://server.com/mcp

Configuration

# configs/default.yaml
pipeline:
  mode: strict
  enabledGates: [1, 2, 3]

server:
  transport: http
  url: https://your-server.com/mcp
  headers:
    Authorization: "Bearer your-token"

semantic:
  trials: 3                    # pass@k evaluation
  autoApproveThreshold: 4.5   # Gate 5 auto-approve

reporting:
  formats: [console, json, sarif]
  outputDir: ./reports
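
The YAML above maps naturally onto a typed object. A sketch of how the same settings could be typed and sanity-checked in TypeScript: the `GatekeeperConfig` type and `checkConfig` function are hypothetical and cover only the keys shown here, not the tool's actual schema.

```typescript
// Hypothetical type mirroring the keys in the sample YAML.
type GatekeeperConfig = {
  pipeline: { mode: "strict" | "lenient"; enabledGates: number[] };
  server: { transport: string; url: string; headers?: Record<string, string> };
  semantic?: { trials: number; autoApproveThreshold: number };
  reporting?: { formats: string[]; outputDir: string };
};

// Returns human-readable problems; an empty list means the config looks sane.
function checkConfig(cfg: GatekeeperConfig): string[] {
  const errors: string[] = [];
  if (cfg.pipeline.enabledGates.length === 0) {
    errors.push("pipeline.enabledGates must list at least one gate");
  }
  for (const gate of cfg.pipeline.enabledGates) {
    // The README documents gates 1 through 8.
    if (!Number.isInteger(gate) || gate < 1 || gate > 8) {
      errors.push(`unknown gate: ${gate}`);
    }
  }
  if (cfg.semantic && (!Number.isInteger(cfg.semantic.trials) || cfg.semantic.trials < 1)) {
    errors.push("semantic.trials must be a positive integer");
  }
  return errors;
}

// Same values as the sample YAML.
const sampleConfig: GatekeeperConfig = {
  pipeline: { mode: "strict", enabledGates: [1, 2, 3] },
  server: {
    transport: "http",
    url: "https://your-server.com/mcp",
    headers: { Authorization: "Bearer your-token" },
  },
  semantic: { trials: 3, autoApproveThreshold: 4.5 },
  reporting: { formats: ["console", "json", "sarif"], outputDir: "./reports" },
};
console.log(checkConfig(sampleConfig));
```

For real files, `mcpqa validate-config <path>` remains the authoritative check.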

Development

npm install
npm test              # 172 tests
npm run lint          # type-check
npm run build         # compile to dist/

Architecture

CLI (mcpqa) ─┐                    ┌─ MCP Server (mcpqa-server)
              │                    │
              ▼                    ▼
         PipelineOrchestrator (shared core)
              │
    ┌─────────┼──────────┬──────────┐
    ▼         ▼          ▼          ▼
 Gate 1-3  Gate 4     Gate 5     Gate 6-8
 Code      LLM Judge  Dashboard  File
 (24 val)  (7 val)    (htmx)    (15 val)
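
The difference between the two pipeline modes can be sketched in a few lines. This illustrates the "stop on first fail vs run all" behavior from the CLI reference using hypothetical `Gate` and `runPipeline` names and synchronous stub gates. It is not the actual `PipelineOrchestrator` code, and it assumes that strict mode stops at gate granularity.

```typescript
type GateResult = { gate: number; passed: boolean };

// Hypothetical gate: real gates would presumably be asynchronous.
type Gate = { id: number; run: () => boolean };

// strict: stop after the first failing gate; lenient: run every gate regardless.
function runPipeline(gates: Gate[], mode: "strict" | "lenient"): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const passed = gate.run();
    results.push({ gate: gate.id, passed });
    if (!passed && mode === "strict") break;
  }
  return results;
}

// Stub gates standing in for real validators; gate 2 fails.
const stubGates: Gate[] = [
  { id: 1, run: () => true },
  { id: 2, run: () => false },
  { id: 3, run: () => true },
];

console.log(runPipeline(stubGates, "strict").map((r) => r.gate));
console.log(runPipeline(stubGates, "lenient").map((r) => r.gate));
```

In strict mode gate 3 never runs, which saves time in CI; lenient mode trades that time for a complete list of findings.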

Contributing

PRs welcome. Each validator is a single file implementing IValidator. To add a new check:

1. Create src/gates/gateN-name/validators/your-check.ts
2. Implement IValidator interface (name, description, validate)
3. Register in the gate's index.ts
4. Add tests in tests/unit/gateN/
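
As an illustration of these steps, a validator could look roughly like this. The `IValidator` shape below is inferred from the three members named above (`name`, `description`, `validate`). The `validate` signature, the `ToolDefinition` and `Finding` types, and the check itself are assumptions, so consult the interface in the repository before copying.

```typescript
// Assumed shapes, declared locally so the sketch is self-contained.
type ToolDefinition = {
  name: string;
  inputSchema?: { properties?: Record<string, { description?: string }> };
};
type Finding = { tool: string; message: string };

// Assumed interface: only the member names come from the README.
interface IValidator {
  name: string;
  description: string;
  validate(tools: ToolDefinition[]): Finding[];
}

// Flags every tool parameter that has no (or only a blank) description.
const paramDescriptionValidator: IValidator = {
  name: "param-description",
  description: "Every tool parameter should have a non-empty description",
  validate(tools) {
    const findings: Finding[] = [];
    for (const tool of tools) {
      const props = tool.inputSchema?.properties ?? {};
      for (const [param, schema] of Object.entries(props)) {
        if (!schema.description || schema.description.trim() === "") {
          findings.push({ tool: tool.name, message: `Parameter "${param}" has no description` });
        }
      }
    }
    return findings;
  },
};
```

A unit test for such a validator would pass in a handful of tool definitions and assert on the returned findings.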

License

MIT
