MCP Debat

Orchestrates sequential debates between multiple AI models across four phases (constructive, challenge, closing, synthesis) with host intervention and anti-sycophancy enforcement.

MCP Debat — Multi-Model AI Debate Server

MCP Debat is an MCP (Model Context Protocol) server that orchestrates sequential debates among up to 6 AI models. Models present approaches, challenge each other, and synthesize a final verdict, all through structured phases with anti-sycophancy enforcement and evidence-based validation.

Built for offensive security workflows but usable for any domain requiring multi-perspective AI analysis.

How It Works

You (Host)                    MCP Debat Server                    6 AI Models
    │                              │                                  │
    ├── mcp_debate(topic) ────────>│                                  │
    │<── session_id + briefing ────│                                  │
    │                              │                                  │
    ├── mcp_respond(continue) ────>│── call Model 1 ─────────────────>│
    │<── Model 1 response ─────────│<── response ─────────────────────│
    │                              │                                  │
    ├── mcp_respond(continue) ────>│── call Model 2 (sees Model 1) ──>│
    │<── Model 2 response ─────────│<── response ─────────────────────│
    │         ...                  │         ...                      │
    │                              │                                  │
    ├── mcp_respond(evidence) ────>│── inject into transcript ────────│
    │                              │                                  │
    ├── mcp_respond(synthesize) ──>│── Synthesizer produces verdict ──>│
    │<── verdict + conclusions ────│                                  │
    │                              │                                  │
    ├── mcp_respond(close) ───────>│── save conclusions ──────────────│

Each model receives a rebuilt transcript of all prior responses (shared canonical transcript replay), so every model sees what came before it. The host (you) can inject evidence, corrections, or decisions between any model's turn.
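
The replay described above can be sketched roughly as follows. This is an illustration only, not the server's code: the entry shape ({ speaker, phase, text }), the message layout, and the name buildReplayMessages are all assumptions.

```javascript
// Hypothetical sketch of shared transcript replay: on every turn the prompt
// is rebuilt from the canonical transcript, so each model sees the same
// ordered history, including host injections.
// The entry shape and function name are assumptions, not the server's API.
function buildReplayMessages(topic, transcript) {
  const history = transcript
    .map((entry) => `[${entry.phase}] ${entry.speaker}: ${entry.text}`)
    .join("\n\n");

  return [
    { role: "system", content: `Debate topic: ${topic}` },
    {
      role: "user",
      content: history.length > 0
        ? `Transcript so far:\n\n${history}\n\nIt is your turn.`
        : "You are the first speaker. It is your turn.",
    },
  ];
}

const transcript = [
  { speaker: "Model 1", phase: "constructive", text: "Approach A." },
  { speaker: "Host", phase: "constructive", text: "Evidence: HTTP 500." },
];
const messages = buildReplayMessages("Is the injection exploitable?", transcript);
```

Rebuilding the prompt from one stored transcript, rather than keeping per-model conversation state, is what lets a host injection reach every later speaker.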

Features

  • 6 AI Models: DeepSeek, Gemini, Mistral, Claude, Qwen, GPT — configurable and extensible
  • 4-Phase Debate: Constructive → Challenge → Closing → Synthesis
  • Sequential Model-by-Model: one model per tool call, which keeps each call short and avoids most client-side timeouts (see Known Limitations)
  • Host Intervention: inject info, corrections, evidence, or decisions mid-debate
  • Anti-Sycophancy: per-response critique enforcement in Challenge phase — models must provide weakness + steel man + counterargument or get re-prompted
  • Evidence Gate: Synthesis claims tagged [VERIFIED] must reference actual evidence or get flagged
  • Command Executor: AI models can run read-only commands on the server to verify claims
  • Runbook Integration: auto-load target-specific context into debates (for security workflows)
  • Multi-User: session ownership, isolation between clients, transfer mechanism
  • Session Persistence: all sessions saved to disk, resume anytime
  • Cost Tracking: per-model, per-session, and daily cost tracking
  • Debate Memory: conclusions from past debates auto-injected into relevant new debates
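
As an illustration of the Evidence Gate idea, the sketch below flags [VERIFIED] claims that do not cite a known piece of evidence. The citation format ([E1], [E2], ...) and the function name are assumptions made for this example; the README does not document how the server matches claims to evidence.

```javascript
// Hypothetical evidence gate: a [VERIFIED] claim passes only if it cites an
// evidence ID (assumed format: [E<number>]) that exists in the session.
function checkEvidenceGate(synthesisText, evidenceIds) {
  const known = new Set(evidenceIds);
  const flagged = [];
  for (const line of synthesisText.split("\n")) {
    if (!line.includes("[VERIFIED]")) continue;
    // Collect every evidence ID cited on this line.
    const cited = [...line.matchAll(/\[(E\d+)\]/g)].map((m) => m[1]);
    const hasKnownRef = cited.some((id) => known.has(id));
    if (!hasKnownRef) flagged.push(line.trim());
  }
  return { passed: flagged.length === 0, flagged };
}

const gate = checkEvidenceGate(
  "[VERIFIED] Port 443 is open [E1]\n[VERIFIED] Admin panel is exposed",
  ["E1"]
);
```

Here the second claim is flagged because it carries the tag without any reference.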

Requirements

  • Node.js >= 18.0.0
  • API Keys for at least 1 provider (more = better debate quality)
  • A server or machine to run on (can be localhost or remote)

Installation

# Clone the repository
git clone <your-repo-url> mcp-debat
cd mcp-debat

# Install dependencies
npm install

# Create environment file
cp .env.example .env
# Edit .env with your API keys (see Configuration below)

# Start the server
node src/index.js

# Or use PM2 for production (recommended)
pm2 start ecosystem.config.cjs

Configuration

Environment Variables (.env)

Create a .env file in the project root:

# Required: at least 1 provider API key
DEEPSEEK_API_KEY=sk-your-deepseek-key
GEMINI_API_KEY=AIza-your-gemini-key
MISTRAL_API_KEY=your-mistral-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
OPENAI_API_KEY=sk-your-openai-key
QWEN_API_KEY=your-qwen-key

# Server
PORT=3900
AUTH_TOKEN=your-random-secret-token

Providers without API keys are skipped at startup — the server works with as few as 1 model.
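
A minimal sketch of that startup filter, assuming each provider entry names its key variable in an apiKeyEnv field (as in the config.json example in the next section). The function name is hypothetical.

```javascript
// Sketch: keep only providers that are enabled and whose API key variable
// (named by apiKeyEnv) is set to a non-empty value in the environment.
function selectActiveProviders(providers, env) {
  return Object.entries(providers)
    .filter(([, cfg]) => cfg.enabled !== false)
    .filter(([, cfg]) => Boolean(env[cfg.apiKeyEnv]))
    .map(([id]) => id);
}

const active = selectActiveProviders(
  {
    deepseek: { enabled: true, apiKeyEnv: "DEEPSEEK_API_KEY" },
    gemini: { enabled: true, apiKeyEnv: "GEMINI_API_KEY" },
    mistral: { enabled: false, apiKeyEnv: "MISTRAL_API_KEY" },
  },
  { DEEPSEEK_API_KEY: "sk-example", MISTRAL_API_KEY: "example" }
);
```

In this example only deepseek remains: gemini has no key, and mistral is disabled even though its key is present.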

Provider Configuration (config.json)

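Each provider in config.json has the fields below. The // comments are annotations for this README; standard JSON does not allow comments, so remove them from a real config.json unless the loader is known to strip them: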

{
  "providers": {
    "deepseek": {
      "enabled": true,           // set false to disable
      "name": "DeepSeek V4 Pro", // display name
      "model": "deepseek-v4-pro",// model ID sent to API
      "baseURL": "https://api.deepseek.com",
      "apiKeyEnv": "DEEPSEEK_API_KEY",
      "sdk": "openai",           // SDK type: openai | google | anthropic | openai-responses | ollama
      "timeout": 120000,         // per-call timeout in ms
      "maxTokens": 2048,         // max output tokens per response
      "costPer1M": { "input": 0.435, "output": 0.87 }
    }
    // ... more providers
  },
  "debate": {
    "defaultRounds": 3,
    "maxRounds": 5,
    "modelResponseCap": 3000     // compact mode truncation limit (chars)
  },
  "executor": {
    "enabled": true,
    "allowedPaths": ["/your/server/path/runbooks", "/your/server/path/src"]
  }
}
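
To show how the costPer1M rates could feed into cost tracking, here is a small sketch. It assumes the rates are US dollars per 1,000,000 tokens and that token usage is reported as inputTokens and outputTokens; both the field names and the function name are assumptions.

```javascript
// Sketch: cost of one call from token counts and the costPer1M rates
// (assumed to be USD per 1,000,000 tokens).
function estimateCallCost(usage, costPer1M) {
  const inputCost = (usage.inputTokens / 1_000_000) * costPer1M.input;
  const outputCost = (usage.outputTokens / 1_000_000) * costPer1M.output;
  return inputCost + outputCost;
}

// 10,000 input tokens at 0.435 per 1M, plus 2,000 output tokens at 0.87 per 1M.
const cost = estimateCallCost(
  { inputTokens: 10_000, outputTokens: 2_000 },
  { input: 0.435, output: 0.87 }
);
```

With the DeepSeek rates above, this call comes to roughly 0.006, so a single response is a small fraction of a cent.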

Adding a New Provider

  1. Create src/providers/yourprovider.js extending BaseProvider
  2. Implement the chat(messages, opts) method
  3. Register it in src/providers/index.js
  4. Add config entry in config.json

SDK types already supported: openai (OpenAI-compatible), google (Gemini), anthropic (Claude), openai-responses (GPT Responses API), ollama (local models).
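
Steps 1 and 2 might look roughly like the sketch below. The BaseProvider defined here is only a stand-in so the example runs on its own; the real class in src/providers/base.js is not shown in this README, so its constructor, helpers, and the expected return shape of chat() may differ. EchoProvider is a made-up provider that calls no API.

```javascript
// Stand-in for the project's BaseProvider so this sketch is self-contained.
// The real class lives in src/providers/base.js and may look different.
class BaseProvider {
  constructor(config) {
    this.config = config;
  }
  async chat(_messages, _opts) {
    throw new Error("chat() must be implemented by the provider");
  }
}

// Hypothetical provider: echoes the last message instead of calling an API.
// The { text } return shape is an assumption.
class EchoProvider extends BaseProvider {
  async chat(messages, _opts = {}) {
    const last = messages[messages.length - 1];
    return { text: `echo: ${last.content}` };
  }
}

const provider = new EchoProvider({ name: "Echo", model: "echo-1" });
```

A real provider would replace the echo with a call to its SDK and honor options such as the timeout and maxTokens values from config.json.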

Connecting to Claude Code

Add to your Claude Code MCP config (~/.claude.json or project .mcp.json):

{
  "mcpServers": {
    "mcp-debat": {
      "type": "http",
      "url": "https://your-server.com/mcp-debat",
      "headers": {
        "Authorization": "Bearer your-auth-token"
      }
    }
  }
}

For local development (no auth):

{
  "mcpServers": {
    "mcp-debat": {
      "type": "http",
      "url": "http://localhost:3900"
    }
  }
}

When AUTH_TOKEN is not set, the server binds to 127.0.0.1 only (localhost) for safety. With AUTH_TOKEN set, it binds to 0.0.0.0 (all interfaces).
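
The bind rule can be expressed as a small function. This is a sketch of the behavior described above, not the server's code; treating a whitespace-only token as unset is an assumption.

```javascript
// Sketch of the bind rule: without AUTH_TOKEN the server stays on loopback;
// with it, all interfaces are used.
// Assumption: a token that is empty or only whitespace counts as "not set".
function resolveBindHost(env) {
  const token = (env.AUTH_TOKEN || "").trim();
  return token.length > 0 ? "0.0.0.0" : "127.0.0.1";
}

const localHost = resolveBindHost({});
const publicHost = resolveBindHost({ AUTH_TOKEN: "your-random-secret-token" });
```

Defaulting to loopback means a forgotten token results in an unreachable server rather than an open one.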

Usage

6 MCP Tools

| Tool | Purpose | Blocking? |
| --- | --- | --- |
| mcp_debate | Start new debate or resume existing session | Per-model |
| mcp_respond | Inject content + control debate flow | Instant or per-model |
| mcp_quick | Quick parallel opinions from all models | ~30-60s |
| mcp_review | Multi-model code review with verdict | Per-model |
| mcp_sessions | List, get, delete, transfer sessions | Instant |
| mcp_health | Server + provider status check | Instant |

Starting a Debate

// Start a new debate
mcp_debate({
  topic: "Is this SQL injection exploitable given prepared statements?",
  style: "exploit-review",       // freeform | redteam | socratic | exploit-review
  context: "Target runs PHP 8.2 + MySQL 8.0, PDO with prepared statements",
  history_mode: "full",          // "compact" (3K chars/response) or "full" (32K)
  rounds: 1                      // number of debate rounds (default: 3, max: 5)
})
// Returns: { session_id, phase: "briefing", briefing: {...} }

Advancing the Debate

Each continue call makes exactly 1 model respond:

// Model 1 speaks
mcp_respond({ session_id: "xxx", action: "continue" })

// Model 2 speaks (sees Model 1's response)
mcp_respond({ session_id: "xxx", action: "continue" })

// ... repeat for all models, then next phase starts
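
Driving a whole phase from a script could look like the sketch below. callTool is a hypothetical stand-in for however your MCP client invokes tools, and the { model, text } reply shape is assumed; neither is defined by this README. The fake client exists only so the sketch runs without a server.

```javascript
// Sketch: advance one full phase by issuing one "continue" per model.
async function runPhase(callTool, sessionId, modelCount) {
  const responses = [];
  for (let i = 0; i < modelCount; i += 1) {
    // Calls are awaited one at a time: each model must see the previous reply.
    const reply = await callTool("mcp_respond", {
      session_id: sessionId,
      action: "continue",
    });
    responses.push(reply);
  }
  return responses;
}

// Fake client used only to make the sketch runnable without a server.
function makeFakeClient() {
  let turn = 0;
  return async (_tool, _args) => {
    turn += 1;
    return { model: `Model ${turn}`, text: `response ${turn}` };
  };
}

const phaseRun = runPhase(makeFakeClient(), "xxx", 3);
```

The calls must not be issued in parallel (for example with Promise.all), because the sequential order is what lets each model read the replies before it.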

Host Intervention

Inject your own input between any model's turn:

// Add evidence
mcp_respond({
  session_id: "xxx",
  type: "evidence",
  response: "I ran the exploit and got: HTTP 500 Internal Server Error",
  action: "continue"
})

// Correct a model's mistake
mcp_respond({
  session_id: "xxx",
  type: "correct",
  response: "That's not Apache, the target runs Nginx 1.25",
  action: "continue"
})

// Make a decision and skip to synthesis
mcp_respond({
  session_id: "xxx",
  type: "decision",
  response: "Use approach B from DeepSeek's analysis",
  action: "synthesize"
})

// Close the debate
mcp_respond({ session_id: "xxx", action: "close" })

Quick Opinions (No Debate)

// All models answer in parallel — fast cross-check
mcp_quick({
  question: "Is CVE-2024-1234 applicable to nginx 1.25.3?",
  context: "Target has default config with proxy_pass"
})

Code Review

mcp_review({
  code: "function login(user, pass) { ... }",
  focus: ["security", "bypass"],
  language: "javascript"
})
// Returns: per-model findings + synthesized verdict

Debate Phases

Phase 0: BRIEFING
  → Host opens with topic + context
  → Server loads target runbook + past conclusions

Phase 1: CONSTRUCTIVE
  → Each model presents their approach (no critique allowed)
  → Output: approach, reasoning, evidence, confidence

Phase 2: CHALLENGE
  → Models critique each other (reverse order)
  → Steel Man required before any criticism
  → Server enforces: must include weakness + counterargument
  → If missing → auto re-prompt with anti-sycophancy warning

Phase 3: CLOSING
  → Each model: final position + what changed + confidence

Phase 4: SYNTHESIS
  → Rotating synthesizer (different from majority)
  → Evidence gate: [VERIFIED] claims must have references
  → Output: recommendation, dissent, unresolved, action items
  → Host decides: accept → close | reject → retry from Challenge
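
The Challenge-phase enforcement can be pictured as a check followed by a re-prompt. The sketch below assumes responses label their parts as "Steel man:", "Weakness:", and "Counterargument:"; those labels, the function names, and the re-prompt wording are illustrative assumptions, not the detection logic in src/engine/collapse.js.

```javascript
// Hypothetical critique check for the Challenge phase.
// The section labels are assumptions about the response format.
const REQUIRED_PARTS = {
  steelMan: /steel\s*man\s*:/i,
  weakness: /weakness\s*:/i,
  counterargument: /counter-?argument\s*:/i,
};

function checkCritique(responseText) {
  const missing = Object.entries(REQUIRED_PARTS)
    .filter(([, pattern]) => !pattern.test(responseText))
    .map(([name]) => name);
  return { ok: missing.length === 0, missing };
}

function buildReprompt(missing) {
  return `Your critique is missing: ${missing.join(", ")}. ` +
    "Agreeing without critique is not accepted in this phase.";
}

const weak = checkCritique("I agree with everything Model 1 said.");
const solid = checkCritique(
  "Steel man: the payload is valid.\nWeakness: it ignores PDO.\nCounterargument: test second-order paths."
);
```

A response like the first one would be sent back with the re-prompt; the second passes as written.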

Debate Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| topic | string | required | Question or problem to debate |
| target | string | null | Target name; auto-loads runbook if available |
| style | string | "freeform" | Debate style: freeform, redteam, socratic, exploit-review |
| rounds | number | 3 | Max debate rounds (1-5) |
| history_mode | string | "compact" | compact: 3K chars/response in transcript. full: 32K cap |
| runbook_mode | string | "summary" | summary: 8K briefing. full: entire runbook in briefing |
| require_full_runbook | boolean | false | Auto-read entire runbook into transcript before debate starts |
| executor_mode | string | "safe" | safe: AI can run read-only commands. off: disabled |
| models | array | all 6 | Override which models participate |
| client_id | string | null | Your instance ID for multi-user session ownership |
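
The effect of history_mode on what later models see can be sketched as a per-response cap. The 3000-character compact limit matches modelResponseCap in the sample config.json; reading "32K" as 32000 characters, the truncation marker, and the function name are assumptions.

```javascript
// Sketch of per-response truncation by history mode.
// compact: 3000 chars (modelResponseCap in the sample config.json).
// full: assumed here to mean 32000 chars; the exact value is not documented.
const HISTORY_CAPS = { compact: 3000, full: 32000 };

function capResponse(text, historyMode = "compact") {
  const cap = HISTORY_CAPS[historyMode];
  if (cap === undefined) {
    throw new Error(`Unknown history_mode: ${historyMode}`);
  }
  if (text.length <= cap) return text;
  // Hypothetical marker so readers can tell the text was cut.
  return text.slice(0, cap) + "\n[truncated]";
}

const longReply = "a".repeat(5000);
const compactView = capResponse(longReply, "compact");
const fullView = capResponse(longReply, "full");
```

A 5000-character reply is cut in compact mode but passes through unchanged in full mode, which is the trade-off the Known Limitations table describes.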

Command Executor

When executor_mode: "safe", AI models can request commands during debate to verify their claims:

Available tools for AI:

  • bash — read-only shell commands (cat, grep, find, ls, head, tail, diff, wc, sort, uniq, cut, awk, sed -n, jq, file, stat, md5sum, sha256sum, strings, git log/diff/show/status/blame)
  • read_runbook_chunk(offset, limit) — read target runbook in batches
  • read_runbook_section(section) — read specific runbook section
  • search_runbook(query) — search keywords in target runbook

Security (3-layer):

  1. Path whitelist — only configured directories (default: runbooks/, src/)
  2. Command blacklist — blocks rm, dd, kill, sudo, eval, write operations
  3. Command whitelist — allows: cat, grep, find, ls, head, tail, diff, wc, sort, uniq, cut, awk, sed -n, jq, file, stat, md5sum, sha256sum, strings, git (read-only)

Commands are target-locked: AI can only read the runbook belonging to the current session's target.
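
The three layers can be illustrated with a deliberately conservative check. Everything here is an assumption for the sake of the example (the function name, the shortened lists, rejecting all shell metacharacters, and treating relative paths as already confined to an allowed directory); the real logic lives in src/engine/executor.js and is not shown in this README.

```javascript
// Hypothetical three-layer check for an executor request.
const ALLOWED_COMMANDS = new Set(["cat", "grep", "find", "ls", "head", "tail", "wc"]);
const BLOCKED_TOKENS = new Set(["rm", "dd", "kill", "sudo", "eval"]);

function checkCommand(commandLine, allowedPaths) {
  // Extra guard (assumption): refuse chaining, pipes, substitution, redirects.
  if (/[;&|`$<>]/.test(commandLine)) {
    return { allowed: false, reason: "shell metacharacters are not permitted" };
  }
  const tokens = commandLine.trim().split(/\s+/);

  // Layer: command blacklist.
  const blocked = tokens.find((t) => BLOCKED_TOKENS.has(t));
  if (blocked) return { allowed: false, reason: `blocked command: ${blocked}` };

  // Layer: command whitelist.
  if (!ALLOWED_COMMANDS.has(tokens[0])) {
    return { allowed: false, reason: `command not in allowlist: ${tokens[0]}` };
  }

  // Layer: path whitelist.
  const badPath = tokens.slice(1).find((arg) => {
    if (arg.includes("..")) return true; // no parent-directory traversal
    if (!arg.startsWith("/")) return false; // relative: assumed confined elsewhere
    return !allowedPaths.some((p) => arg === p || arg.startsWith(p + "/"));
  });
  if (badPath) return { allowed: false, reason: `path not allowed: ${badPath}` };

  return { allowed: true, reason: "ok" };
}

const allowedPaths = ["/srv/runbooks", "/srv/src"];
const okCheck = checkCommand("cat /srv/runbooks/RUNBOOK_example.com.md", allowedPaths);
```

Comparing against the path plus a trailing slash prevents a sibling directory such as /srv/runbooks-evil from passing a plain prefix test.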

Multi-User Support

Sessions have an owner_client field. When client_id is provided:

  • List: only shows your sessions
  • Get: non-owners see summary only (topic, target, status)
  • Respond: blocked for non-owners
  • Transfer: owner can transfer session to another client
  • Delete: owner-only

// List only my sessions
mcp_sessions({ action: "list", client_id: "my-instance-123" })

// Transfer a session
mcp_sessions({
  action: "transfer",
  session_id: "xxx",
  to_client: "colleague-456",
  client_id: "my-instance-123"
})
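
The ownership rules above could be modeled as in the following sketch. The session shape, the function names, and the choice to treat sessions without an owner as open to everyone are assumptions; only the owner_client field and the per-action rules come from this README.

```javascript
// Sketch of the ownership rules for get, respond, transfer, and delete.
// Assumption: a session with no owner_client is accessible to any client.
function canAccess(session, clientId, action) {
  const isOwner = !session.owner_client || session.owner_client === clientId;
  switch (action) {
    case "get":
      return { allowed: true, view: isOwner ? "full" : "summary" };
    case "respond":
    case "transfer":
    case "delete":
      return { allowed: isOwner, view: isOwner ? "full" : "none" };
    default:
      return { allowed: false, view: "none" };
  }
}

// Non-owners only see these three fields.
function summarize(session) {
  const { topic, target, status } = session;
  return { topic, target, status };
}

const session = {
  owner_client: "my-instance-123",
  topic: "SQLi review",
  target: "example.com",
  status: "active",
  transcript: ["..."],
};
const visitorView = canAccess(session, "colleague-456", "get");
```

After a transfer, owner_client would change to the new client, and the same checks then apply in the other direction.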

Runbook Integration (Optional)

For security workflows, you can sync target-specific runbooks to the server. Place markdown files in the runbooks/ directory:

runbooks/
  RUNBOOK_target-name.md
  RUNBOOK_example.com.md

When you start a debate with target: "example.com", the server loads RUNBOOK_example.com.md and injects relevant sections into the debate context.

With require_full_runbook: true, the entire runbook is read into the transcript before the debate starts, so all models have complete context.
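
The naming convention maps a target to a file name as in this sketch. Rejecting any target outside letters, digits, dots, underscores, and hyphens is an assumption added here to rule out path traversal; the server's own validation is not documented in this README.

```javascript
// Sketch: map a target name to its runbook file name.
// The character restriction is an assumption, not documented behavior.
function runbookFileName(target) {
  if (!/^[A-Za-z0-9._-]+$/.test(target) || target.includes("..")) {
    throw new Error(`Invalid target name: ${target}`);
  }
  return `RUNBOOK_${target}.md`;
}

const fileName = runbookFileName("example.com");
```

For the target "example.com" this yields RUNBOOK_example.com.md, matching the directory listing above.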

Production Deployment

With PM2

# Create ecosystem.config.cjs
cat > ecosystem.config.cjs << 'EOF'
const { readFileSync } = require('fs');
const { join } = require('path');

const envPath = join(__dirname, '.env');
const env = {};
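try {
  readFileSync(envPath, 'utf-8').split('\n').forEach(line => {
    // Skip blank lines and # comments
    if (!line.trim() || line.trim().startsWith('#')) return;
    // Split on the first '=' only, so values may contain '='
    const [key, ...val] = line.split('=');
    if (key && val.length) env[key.trim()] = val.join('=').trim();
  });
} catch (e) {
  // .env is optional; continue without it
}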

module.exports = {
  apps: [{
    name: 'mcp-debat',
    script: 'src/index.js',
    cwd: __dirname,
    env: { NODE_ENV: 'production', ...env },
    instances: 1,
    autorestart: true,
    max_restarts: 10,
    restart_delay: 5000,
    max_memory_restart: '500M',
    log_date_format: 'YYYY-MM-DD HH:mm:ss',
    error_file: 'logs/error.log',
    out_file: 'logs/out.log',
    merge_logs: true
  }]
};
EOF

# Create directories
mkdir -p logs sessions memory runbooks

# Start
pm2 start ecosystem.config.cjs
pm2 save

With Nginx (HTTPS reverse proxy)

server {
    listen 443 ssl;
    server_name your-server.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location /mcp-debat {
        # Strip the /mcp-debat prefix; keep a leading slash so the rewritten URI is never empty
        rewrite ^/mcp-debat/?(.*)$ /$1 break;
        proxy_pass http://127.0.0.1:3900;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}

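Set proxy_read_timeout to at least 120s: individual model responses can take up to 60s, and the per-call provider timeout in the sample config.json is 120000 ms. The 300s used above leaves headroom.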

Health Check

curl https://your-server.com/mcp-debat/health
# {"status":"ok","uptime":12345,"version":"1.0.0"}

Or via MCP tool:

mcp_health({ deep: true })
// Returns: provider status, session count, memory usage, daily cost

Project Structure

mcp-debat/
├── config.json              # Provider configs, debate settings, executor paths
├── package.json
├── ecosystem.config.cjs     # PM2 production config
├── .env                     # API keys (not in repo)
├── src/
│   ├── index.js             # Entry point
│   ├── server.js            # HTTP server, JSON-RPC handler, auth
│   ├── engine/
│   │   ├── debate.js        # Core debate engine (phase logic, model calling, transcript)
│   │   ├── sessions.js      # Session CRUD, disk persistence
│   │   ├── executor.js      # Command executor, runbook helpers, security
│   │   ├── styles.js        # System prompts, phase prompts, role assignments
│   │   ├── collapse.js      # Sycophancy detection, collapse warning
│   │   ├── notebook.js      # Runbook loading, summarization, conclusions
│   │   └── cost.js          # Cost tracking per model/session/day
│   ├── providers/
│   │   ├── base.js          # BaseProvider class (refusal detection, truncation)
│   │   ├── deepseek.js      # DeepSeek (OpenAI-compatible SDK)
│   │   ├── gemini.js        # Gemini (Google AI SDK)
│   │   ├── mistral.js       # Mistral (OpenAI-compatible SDK)
│   │   ├── anthropic.js     # Claude (Anthropic SDK)
│   │   ├── qwen.js          # Qwen (Ollama API)
│   │   ├── openai.js        # GPT (OpenAI Responses API)
│   │   └── index.js         # Provider registry, init, health test
│   └── tools/
│       ├── index.js          # Tool registry
│       ├── debate.js         # mcp_debate tool
│       ├── respond.js        # mcp_respond tool
│       ├── quick.js          # mcp_quick tool
│       ├── review.js         # mcp_review tool
│       ├── sessions.js       # mcp_sessions tool
│       └── health.js         # mcp_health tool
├── scripts/
│   ├── sync-runbook.sh       # Sync single runbook to server (for remote setups)
│   └── hook_sync_debat.js    # PostToolUse hook for auto-sync
├── sessions/                 # Session JSON files (auto-created)
├── memory/                   # Conclusions + cost tracking (auto-created)
├── runbooks/                 # Target runbooks (optional)
└── logs/                     # PM2 logs (production)

Known Limitations

| Limitation | Impact | Workaround |
| --- | --- | --- |
| MCP client timeout ~60s | Long model responses may time out on the client side | Response is saved server-side; retry continue |
| Compact mode truncates at 3K chars | Later models see truncated earlier responses | Use history_mode: "full" for important debates |
| Full history 32K cap | Extremely long responses still truncated | By design; prevents context overflow |
| Confidence scores unreliable | Model self-reported confidence has near-zero correlation with accuracy (r=0.024) | Don't use confidence for automated decisions |
| Gemini health check flaky | Returns "Empty response" on ping but works in debates | Known issue, no impact on functionality |

Cost Estimates

Typical cost per full debate (6 models, 1 round, all 4 phases):

| Mode | Approximate Cost |
| --- | --- |
| Compact history | $0.50 - $1.50 |
| Full history | $1.00 - $3.00 |
| Quick opinions | $0.10 - $0.30 |
| Code review | $0.30 - $0.80 |

Cost varies by topic complexity and response length. Anthropic (Claude) is typically the most expensive provider.

License

Private project. Not open-sourced.
