MCP Debat
Orchestrates sequential debates between multiple AI models across four phases (constructive, challenge, closing, synthesis) with host intervention and anti-sycophancy enforcement.
MCP Debat — Multi-Model AI Debate Server
MCP Debat is an MCP (Model Context Protocol) server that orchestrates sequential debates between 6 AI models. Models present approaches, challenge each other, and synthesize a final verdict — all through structured phases with anti-sycophancy enforcement and evidence-based validation.
Built for offensive security workflows but usable for any domain requiring multi-perspective AI analysis.
How It Works
You (Host)             MCP Debat Server                      6 AI Models
│                              │                                  │
├── mcp_debate(topic) ────────>│                                  │
│<── session_id + briefing ────│                                  │
│                              │                                  │
├── mcp_respond(continue) ────>│── call Model 1 ─────────────────>│
│<── Model 1 response ─────────│<── response ─────────────────────│
│                              │                                  │
├── mcp_respond(continue) ────>│── call Model 2 (sees Model 1) ──>│
│<── Model 2 response ─────────│<── response ─────────────────────│
│             ...              │               ...                │
│                              │                                  │
├── mcp_respond(evidence) ────>│── inject into transcript ────────│
│                              │                                  │
├── mcp_respond(synthesize) ──>│── Synthesizer produces verdict ─>│
│<── verdict + conclusions ────│                                  │
│                              │                                  │
├── mcp_respond(close) ───────>│── save conclusions ──────────────│
Each model receives a rebuilt transcript of all prior responses (shared canonical transcript replay), so every model sees what came before it. The host (you) can inject evidence, corrections, or decisions between any model's turn.
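A minimal sketch of the transcript replay idea, assuming the transcript is stored as an array of `{ speaker, phase, text }` entries. The entry shape and the `buildMessages` helper are illustrative assumptions, not the server's actual code:

```javascript
// Hypothetical transcript entry: { speaker: string, phase: string, text: string }
function buildMessages(systemPrompt, transcript, nextSpeaker) {
  // Replay every prior entry in order so the next model sees the full history,
  // including anything the host injected between turns.
  const replay = transcript
    .map((e) => `[${e.phase}] ${e.speaker}: ${e.text}`)
    .join('\n\n');
  return [
    { role: 'system', content: systemPrompt },
    {
      role: 'user',
      content: `${replay}\n\nYou are ${nextSpeaker}. Respond to the debate so far.`,
    },
  ];
}

const transcript = [
  { speaker: 'Model 1', phase: 'constructive', text: 'Approach A.' },
  { speaker: 'Host', phase: 'constructive', text: 'Evidence: HTTP 500.' },
];
const messages = buildMessages('You are a debater.', transcript, 'Model 2');
console.log(messages[1].content);
```

Because the message list is rebuilt from the stored transcript on every call, a host injection only has to be appended once to be visible to every later model.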
Features
- 6 AI Models: DeepSeek, Gemini, Mistral, Claude, Qwen, GPT — configurable and extensible
- 4-Phase Debate: Constructive → Challenge → Closing → Synthesis
- Sequential Model-by-Model: one model per API call, no timeout issues
- Host Intervention: inject info, corrections, evidence, or decisions mid-debate
- Anti-Sycophancy: per-response critique enforcement in Challenge phase — models must provide weakness + steel man + counterargument or get re-prompted
- Evidence Gate: Synthesis claims tagged [VERIFIED] must reference actual evidence or get flagged
- Command Executor: AI models can run read-only commands on the server to verify claims
- Runbook Integration: auto-load target-specific context into debates (for security workflows)
- Multi-User: session ownership, isolation between clients, transfer mechanism
- Session Persistence: all sessions saved to disk, resume anytime
- Cost Tracking: per-model, per-session, and daily cost tracking
- Debate Memory: conclusions from past debates auto-injected into relevant new debates
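The Evidence Gate item above could be approximated as follows. This is a sketch under a loud assumption: that host evidence is numbered `E1`, `E2`, ... and cited inline as `[E1]`. The README does not say how references are written, so both the citation format and the `flagUnbackedClaims` helper are hypothetical:

```javascript
// Assumption: evidence items are numbered E1, E2, ... and cited inline as [E1].
function flagUnbackedClaims(synthesis, evidenceIds) {
  const known = new Set(evidenceIds);
  return synthesis
    .split('\n')
    .filter((line) => line.includes('[VERIFIED]'))
    .filter((line) => {
      const cited = [...line.matchAll(/\[(E\d+)\]/g)].map((m) => m[1]);
      // Flag the claim when it cites nothing, or cites only unknown evidence.
      return !cited.some((id) => known.has(id));
    });
}

const synthesis = [
  '[VERIFIED] The endpoint returns HTTP 500 on malformed input [E1]',
  '[VERIFIED] The target runs Apache',
  'The fix is probably low effort',
].join('\n');
const flagged = flagUnbackedClaims(synthesis, ['E1']);
console.log(flagged);
```

Only the second line is flagged here: it carries the tag but cites no evidence, while the untagged third line is ignored.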
Requirements
- Node.js >= 18.0.0
- API Keys for at least 1 provider (more = better debate quality)
- A server or machine to run on (can be localhost or remote)
Installation
# Clone the repository
git clone <your-repo-url> mcp-debat
cd mcp-debat
# Install dependencies
npm install
# Create environment file
cp .env.example .env
# Edit .env with your API keys (see Configuration below)
# Start the server
node src/index.js
# Or use PM2 for production (recommended)
pm2 start ecosystem.config.cjs
Configuration
Environment Variables (.env)
Create a .env file in the project root:
# Required: at least 1 provider API key
DEEPSEEK_API_KEY=sk-your-deepseek-key
GEMINI_API_KEY=AIza-your-gemini-key
MISTRAL_API_KEY=your-mistral-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
OPENAI_API_KEY=sk-your-openai-key
QWEN_API_KEY=your-qwen-key
# Server
PORT=3900
AUTH_TOKEN=your-random-secret-token
Providers without API keys are skipped at startup — the server works with as few as 1 model.
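As a rough illustration of that startup rule, the sketch below filters a provider map by the `enabled` and `apiKeyEnv` fields used in config.json. The `activeProviders` helper is hypothetical, and the real server may treat keyless local providers differently:

```javascript
// Returns the IDs of providers that are enabled and have their API key set.
function activeProviders(providers, env) {
  return Object.entries(providers)
    .filter(([, p]) => p.enabled !== false && Boolean(env[p.apiKeyEnv]))
    .map(([id]) => id);
}

const providers = {
  deepseek: { enabled: true, apiKeyEnv: 'DEEPSEEK_API_KEY' },
  gemini: { enabled: true, apiKeyEnv: 'GEMINI_API_KEY' },
  mistral: { enabled: false, apiKeyEnv: 'MISTRAL_API_KEY' },
};
const active = activeProviders(providers, {
  DEEPSEEK_API_KEY: 'sk-example',
  MISTRAL_API_KEY: 'example',
});
console.log(active); // gemini has no key and mistral is disabled
```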
Provider Configuration (config.json)
Each provider in config.json has:
{
  "providers": {
    "deepseek": {
      "enabled": true,              // set false to disable
      "name": "DeepSeek V4 Pro",    // display name
      "model": "deepseek-v4-pro",   // model ID sent to API
      "baseURL": "https://api.deepseek.com",
      "apiKeyEnv": "DEEPSEEK_API_KEY",
      "sdk": "openai",              // SDK type: openai | google | anthropic | openai-responses | ollama
      "timeout": 120000,            // per-call timeout in ms
      "maxTokens": 2048,            // max output tokens per response
      "costPer1M": { "input": 0.435, "output": 0.87 }
    }
    // ... more providers
  },
  "debate": {
    "defaultRounds": 3,
    "maxRounds": 5,
    "modelResponseCap": 3000        // compact mode truncation limit (chars)
  },
  "executor": {
    "enabled": true,
    "allowedPaths": ["/your/server/path/runbooks", "/your/server/path/src"]
  }
}
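To show how a `costPer1M` entry might translate into a per-call cost, here is a small sketch. It assumes the rates are USD per one million tokens and that token counts arrive as `inputTokens` / `outputTokens`; those field names are illustrative, not the server's:

```javascript
// Cost of one call, given token counts and a costPer1M entry from config.json.
// Assumption: rates are USD per 1,000,000 tokens.
function callCost(usage, costPer1M) {
  return (
    (usage.inputTokens / 1_000_000) * costPer1M.input +
    (usage.outputTokens / 1_000_000) * costPer1M.output
  );
}

const deepseekRates = { input: 0.435, output: 0.87 };
const cost = callCost({ inputTokens: 10_000, outputTokens: 2_048 }, deepseekRates);
console.log(cost.toFixed(6));
```

Summing this value per model, per session, and per day would give the three views listed under Cost Tracking.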
Adding a New Provider
- Create src/providers/yourprovider.js extending BaseProvider
- Implement the chat(messages, opts) method
- Register it in src/providers/index.js
- Add a config entry in config.json
SDK types already supported: openai (OpenAI-compatible), google (Gemini), anthropic (Claude), openai-responses (GPT Responses API), ollama (local models).
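The steps above might look roughly like this. The `BaseProvider` class here is a stand-in written for the example (the real one in src/providers/base.js is not shown in this README), and the `{ text }` return shape and `maxChars` option are assumptions:

```javascript
// Stand-in for src/providers/base.js; the real BaseProvider may differ.
class BaseProvider {
  constructor(config) {
    this.config = config;
  }
}

// Hypothetical provider that echoes the last message instead of calling an API.
class EchoProvider extends BaseProvider {
  async chat(messages, opts = {}) {
    const last = messages[messages.length - 1];
    const maxChars = opts.maxChars ?? 200;
    return { text: `echo: ${last.content}`.slice(0, maxChars) };
  }
}

const provider = new EchoProvider({ name: 'Echo', model: 'echo-1' });
provider
  .chat([{ role: 'user', content: 'ping' }])
  .then((reply) => console.log(reply.text));
```

A real provider would replace the echo with a call to its SDK and map the SDK's response onto whatever shape the debate engine expects.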
Connecting to Claude Code
Add to your Claude Code MCP config (~/.claude.json or project .mcp.json):
{
  "mcpServers": {
    "mcp-debat": {
      "type": "url",
      "url": "https://your-server.com/mcp-debat",
      "headers": {
        "Authorization": "Bearer your-auth-token"
      }
    }
  }
}
For local development (no auth):
{
  "mcpServers": {
    "mcp-debat": {
      "type": "url",
      "url": "http://localhost:3900"
    }
  }
}
When AUTH_TOKEN is not set, the server binds to 127.0.0.1 only (localhost) for safety. With AUTH_TOKEN set, it binds to 0.0.0.0 (all interfaces).
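The binding rule can be sketched as below. Both helper names are hypothetical, and the bearer check is a simplified guess at how the `Authorization` header from the config above is validated:

```javascript
// Picks the listen address from the environment, per the rule described above.
function resolveBindHost(env) {
  return env.AUTH_TOKEN ? '0.0.0.0' : '127.0.0.1';
}

// Simplified bearer check (assumption: the server compares against AUTH_TOKEN).
function isAuthorized(authorizationHeader, env) {
  if (!env.AUTH_TOKEN) return true; // local mode: no token configured
  return authorizationHeader === `Bearer ${env.AUTH_TOKEN}`;
}

console.log(resolveBindHost({}));                       // localhost only
console.log(resolveBindHost({ AUTH_TOKEN: 'secret' })); // all interfaces
```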
Usage
6 MCP Tools
| Tool | Purpose | Blocking? |
|---|---|---|
| mcp_debate | Start new debate or resume existing session | Per-model |
| mcp_respond | Inject content + control debate flow | Instant or per-model |
| mcp_quick | Quick parallel opinions from all models | ~30-60s |
| mcp_review | Multi-model code review with verdict | Per-model |
| mcp_sessions | List, get, delete, transfer sessions | Instant |
| mcp_health | Server + provider status check | Instant |
Starting a Debate
// Start a new debate
mcp_debate({
  topic: "Is this SQL injection exploitable given prepared statements?",
  style: "exploit-review",   // freeform | redteam | socratic | exploit-review
  context: "Target runs PHP 8.2 + MySQL 8.0, PDO with prepared statements",
  history_mode: "full",      // "compact" (3K chars/response) or "full" (32K)
  rounds: 1                  // number of debate rounds (default: 3, max: 5)
})
// Returns: { session_id, phase: "briefing", briefing: {...} }
Advancing the Debate
Each continue call makes exactly 1 model respond:
// Model 1 speaks
mcp_respond({ session_id: "xxx", action: "continue" })
// Model 2 speaks (sees Model 1's response)
mcp_respond({ session_id: "xxx", action: "continue" })
// ... repeat for all models, then next phase starts
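If you drive the debate from a script rather than by hand, the one-model-per-call pattern becomes a sequential loop. In this sketch `callTool` is a stand-in for however your MCP client invokes a tool, and the fake client's reply shape is invented for the example:

```javascript
// Sends one "continue" per participating model, strictly in sequence.
async function runRound(callTool, sessionId, modelCount) {
  const turns = [];
  for (let i = 0; i < modelCount; i++) {
    // Await each call before the next so every model sees the previous reply.
    turns.push(
      await callTool('mcp_respond', { session_id: sessionId, action: 'continue' })
    );
  }
  return turns;
}

// Fake client for illustration only: returns a canned reply per call.
let callCount = 0;
const fakeCallTool = async (tool, args) => ({
  tool,
  session_id: args.session_id,
  model: `Model ${++callCount}`,
});

runRound(fakeCallTool, 'xxx', 3).then((turns) =>
  console.log(turns.map((t) => t.model))
);
```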
Host Intervention
Inject your own input between any model's turn:
// Add evidence
mcp_respond({
  session_id: "xxx",
  type: "evidence",
  response: "I ran the exploit and got: HTTP 500 Internal Server Error",
  action: "continue"
})

// Correct a model's mistake
mcp_respond({
  session_id: "xxx",
  type: "correct",
  response: "That's not Apache, the target runs Nginx 1.25",
  action: "continue"
})

// Make a decision and skip to synthesis
mcp_respond({
  session_id: "xxx",
  type: "decision",
  response: "Use approach B from DeepSeek's analysis",
  action: "synthesize"
})

// Close the debate
mcp_respond({ session_id: "xxx", action: "close" })
Quick Opinions (No Debate)
// All models answer in parallel — fast cross-check
mcp_quick({
  question: "Is CVE-2024-1234 applicable to nginx 1.25.3?",
  context: "Target has default config with proxy_pass"
})
Code Review
mcp_review({
  code: "function login(user, pass) { ... }",
  focus: ["security", "bypass"],
  language: "javascript"
})
// Returns: per-model findings + synthesized verdict
Debate Phases
Phase 0: BRIEFING
→ Host opens with topic + context
→ Server loads target runbook + past conclusions
Phase 1: CONSTRUCTIVE
→ Each model presents their approach (no critique allowed)
→ Output: approach, reasoning, evidence, confidence
Phase 2: CHALLENGE
→ Models critique each other (reverse order)
→ Steel Man required before any criticism
→ Server enforces: must include weakness + counterargument
→ If missing → auto re-prompt with anti-sycophancy warning
Phase 3: CLOSING
→ Each model: final position + what changed + confidence
Phase 4: SYNTHESIS
→ Rotating synthesizer (different from majority)
→ Evidence gate: [VERIFIED] claims must have references
→ Output: recommendation, dissent, unresolved, action items
→ Host decides: accept → close | reject → retry from Challenge
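The Challenge-phase enforcement could be approximated with a check like the one below. It assumes challenge responses label their sections (for example "Steel man: ..."); the README does not describe the actual detection logic in collapse.js, so the patterns and helper names are hypothetical:

```javascript
// Assumption: challenge responses label their sections, e.g. "Steel man: ...".
const REQUIRED = {
  steel_man: /steel[\s-]?man\s*:/i,
  weakness: /weakness(es)?\s*:/i,
  counterargument: /counter[\s-]?argument\s*:/i,
};

// Lists which required critique parts a response is missing.
function missingCritiqueParts(response) {
  return Object.keys(REQUIRED).filter((part) => !REQUIRED[part].test(response));
}

// Builds the re-prompt text, or null when the response passes.
function rePrompt(missing) {
  if (missing.length === 0) return null;
  return (
    `Your challenge is missing: ${missing.join(', ')}. ` +
    'Do not simply agree. Add the missing parts and resubmit.'
  );
}

const weak = missingCritiqueParts('I agree with everything Model 1 said.');
console.log(rePrompt(weak));
```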
Debate Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| topic | string | required | Question or problem to debate |
| target | string | null | Target name; auto-loads runbook if available |
| style | string | "freeform" | Debate style: freeform, redteam, socratic, exploit-review |
| rounds | number | 3 | Max debate rounds (1-5) |
| history_mode | string | "compact" | compact: 3K chars/response in transcript. full: 32K cap |
| runbook_mode | string | "summary" | summary: 8K briefing. full: entire runbook in briefing |
| require_full_runbook | boolean | false | Auto-read entire runbook into transcript before debate starts |
| executor_mode | string | "safe" | safe: AI can run read-only commands. off: disabled |
| models | array | all 6 | Override which models participate |
| client_id | string | null | Your instance ID for multi-user session ownership |
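For a sense of what `history_mode` does to the transcript, here is a sketch of the truncation step. It assumes "3K" and "32K" mean 3000 and 32000 characters (the sample `modelResponseCap` of 3000 supports the first figure); the truncation marker and helper name are invented for the example:

```javascript
// Character caps per history_mode.
// Assumption: "3K" and "32K" mean 3000 and 32000 characters.
const HISTORY_CAPS = { compact: 3000, full: 32000 };

// Trims one model response before it is replayed to later models.
function capForTranscript(text, historyMode = 'compact') {
  const cap = HISTORY_CAPS[historyMode];
  if (cap === undefined) throw new Error(`unknown history_mode: ${historyMode}`);
  if (text.length <= cap) return text;
  return text.slice(0, cap) + '\n[truncated]';
}

const longResponse = 'a'.repeat(5000);
console.log(capForTranscript(longResponse, 'compact').length);
console.log(capForTranscript(longResponse, 'full').length);
```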
Command Executor
When executor_mode: "safe", AI models can request commands during debate to verify their claims:
Available tools for AI:
- bash: read-only shell commands (cat, grep, find, ls, head, tail, diff, wc, sort, uniq, cut, awk, sed -n, jq, file, stat, md5sum, sha256sum, strings, git log/diff/show/status/blame)
- read_runbook_chunk(offset, limit): read target runbook in batches
- read_runbook_section(section): read specific runbook section
- search_runbook(query): search keywords in target runbook
Security (3-layer):
- Path whitelist: only configured directories (default: runbooks/, src/)
- Command blacklist: blocks rm, dd, kill, sudo, eval, write operations
- Command whitelist: allows cat, grep, find, ls, head, tail, diff, wc, sort, uniq, cut, awk, sed -n, jq, file, stat, md5sum, sha256sum, strings, git (read-only)
Commands are target-locked: AI can only read the runbook belonging to the current session's target.
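A simplified sketch of how the three layers might combine. The lists are copied from this README, but the `checkCommand` helper and its matching rules are assumptions; a real executor would also need to restrict sed to `-n`, limit git to read-only subcommands, and handle relative paths, none of which this sketch does:

```javascript
const path = require('node:path');

// Illustrative lists taken from this README; the real executor may differ.
const ALLOWED = new Set(['cat', 'grep', 'find', 'ls', 'head', 'tail', 'diff', 'wc',
  'sort', 'uniq', 'cut', 'awk', 'sed', 'jq', 'file', 'stat', 'md5sum', 'sha256sum',
  'strings', 'git']);
const BLOCKED = /\b(rm|dd|kill|sudo|eval)\b|>/; // ">" stands in for write operations

// allowedPaths must be absolute, normalized directories.
function checkCommand(command, allowedPaths) {
  // Command blacklist: reject destructive commands and output redirection.
  if (BLOCKED.test(command)) return { ok: false, reason: 'blacklisted' };
  // Command whitelist: the binary itself must be allowed.
  const [bin, ...args] = command.trim().split(/\s+/);
  if (!ALLOWED.has(bin)) return { ok: false, reason: 'not whitelisted' };
  // Path whitelist: every argument containing "/" must resolve inside an allowed dir.
  for (const arg of args.filter((a) => a.includes('/'))) {
    const resolved = path.resolve(arg);
    const inside = allowedPaths.some(
      (dir) => resolved === dir || resolved.startsWith(dir + path.sep)
    );
    if (!inside) return { ok: false, reason: 'path outside whitelist' };
  }
  return { ok: true };
}

const allowedDirs = ['/srv/mcp-debat/runbooks'];
console.log(checkCommand('cat /srv/mcp-debat/runbooks/RUNBOOK_example.com.md', allowedDirs));
console.log(checkCommand('cat /srv/mcp-debat/runbooks/../.env', allowedDirs));
```

Resolving each path before comparing it is what stops `..` segments from escaping an allowed directory.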
Multi-User Support
Sessions have an owner_client field. When client_id is provided:
- List: only shows your sessions
- Get: non-owners see summary only (topic, target, status)
- Respond: blocked for non-owners
- Transfer: owner can transfer session to another client
- Delete: owner-only
// List only my sessions
mcp_sessions({ action: "list", client_id: "my-instance-123" })
// Transfer a session
mcp_sessions({
  action: "transfer",
  session_id: "xxx",
  to_client: "colleague-456",
  client_id: "my-instance-123"
})
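The ownership rules listed above could be enforced with checks along these lines. The session shape and both helpers are hypothetical; only the `owner_client` field name comes from this README:

```javascript
// Hypothetical session shape: { id, topic, target, status, owner_client, transcript }
function viewSession(session, clientId) {
  if (session.owner_client === clientId) return session; // owner sees everything
  const { topic, target, status } = session;
  return { topic, target, status }; // non-owners get the summary only
}

function transferSession(session, clientId, toClient) {
  if (session.owner_client !== clientId) {
    throw new Error('only the owner can transfer a session');
  }
  return { ...session, owner_client: toClient };
}

const session = {
  id: 'xxx',
  topic: 'SQLi exploitability',
  target: 'example.com',
  status: 'active',
  owner_client: 'my-instance-123',
  transcript: ['...'],
};
console.log(viewSession(session, 'colleague-456'));
console.log(transferSession(session, 'my-instance-123', 'colleague-456').owner_client);
```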
Runbook Integration (Optional)
For security workflows, you can sync target-specific runbooks to the server. Place markdown files in the runbooks/ directory:
runbooks/
RUNBOOK_target-name.md
RUNBOOK_example.com.md
When you start a debate with target: "example.com", the server loads RUNBOOK_example.com.md and injects relevant sections into the debate context.
With require_full_runbook: true, the entire runbook is read into the transcript before the debate starts, so all models have complete context.
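The target-to-file mapping follows the RUNBOOK_<target>.md naming shown above. A sketch of that lookup, with a name check added as an assumption (the README does not say how the server validates target names):

```javascript
const path = require('node:path');

// Maps a target name to runbooks/RUNBOOK_<target>.md, rejecting path tricks.
function runbookPath(target, runbookDir = 'runbooks') {
  // Assumption: targets are hostnames or simple names (letters, digits, . _ -).
  if (!/^[A-Za-z0-9._-]+$/.test(target) || target.includes('..')) {
    throw new Error(`invalid target name: ${target}`);
  }
  return path.join(runbookDir, `RUNBOOK_${target}.md`);
}

console.log(runbookPath('example.com'));
```

Rejecting separators and `..` keeps a crafted target name from pointing outside the runbooks/ directory.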
Production Deployment
With PM2
# Create ecosystem.config.cjs
cat > ecosystem.config.cjs << 'EOF'
const { readFileSync } = require('fs');
const { join } = require('path');

// Load KEY=VALUE pairs from .env so PM2 passes them to the server
const envPath = join(__dirname, '.env');
const env = {};
try {
  readFileSync(envPath, 'utf-8').split('\n').forEach(line => {
    if (line.trim().startsWith('#')) return; // skip comment lines
    const [key, ...val] = line.split('=');
    // Re-join on '=' so values that contain '=' stay intact
    if (key && val.length) env[key.trim()] = val.join('=').trim();
  });
} catch (e) {
  // No readable .env file: continue with an empty env object
}

module.exports = {
  apps: [{
    name: 'mcp-debat',
    script: 'src/index.js',
    cwd: __dirname,
    env: { NODE_ENV: 'production', ...env },
    instances: 1,
    autorestart: true,
    max_restarts: 10,
    restart_delay: 5000,
    max_memory_restart: '500M',
    log_date_format: 'YYYY-MM-DD HH:mm:ss',
    error_file: 'logs/error.log',
    out_file: 'logs/out.log',
    merge_logs: true
  }]
};
EOF
# Create directories
mkdir -p logs sessions memory runbooks
# Start
pm2 start ecosystem.config.cjs
pm2 save
With Nginx (HTTPS reverse proxy)
server {
    listen 443 ssl;
    server_name your-server.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location /mcp-debat {
        rewrite ^/mcp-debat(.*) $1 break;
        proxy_pass http://127.0.0.1:3900;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
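Keep proxy_read_timeout at 120s or higher (the example above uses 300s): model responses can take up to 60s, and the per-call provider timeout in the sample config.json is 120000 ms.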
Health Check
curl https://your-server.com/mcp-debat/health
# {"status":"ok","uptime":12345,"version":"1.0.0"}
Or via MCP tool:
mcp_health({ deep: true })
// Returns: provider status, session count, memory usage, daily cost
Project Structure
mcp-debat/
├── config.json # Provider configs, debate settings, executor paths
├── package.json
├── ecosystem.config.cjs # PM2 production config
├── .env # API keys (not in repo)
├── src/
│ ├── index.js # Entry point
│ ├── server.js # HTTP server, JSON-RPC handler, auth
│ ├── engine/
│ │ ├── debate.js # Core debate engine (phase logic, model calling, transcript)
│ │ ├── sessions.js # Session CRUD, disk persistence
│ │ ├── executor.js # Command executor, runbook helpers, security
│ │ ├── styles.js # System prompts, phase prompts, role assignments
│ │ ├── collapse.js # Sycophancy detection, collapse warning
│ │ ├── notebook.js # Runbook loading, summarization, conclusions
│ │ └── cost.js # Cost tracking per model/session/day
│ ├── providers/
│ │ ├── base.js # BaseProvider class (refusal detection, truncation)
│ │ ├── deepseek.js # DeepSeek (OpenAI-compatible SDK)
│ │ ├── gemini.js # Gemini (Google AI SDK)
│ │ ├── mistral.js # Mistral (OpenAI-compatible SDK)
│ │ ├── anthropic.js # Claude (Anthropic SDK)
│ │ ├── qwen.js # Qwen (Ollama API)
│ │ ├── openai.js # GPT (OpenAI Responses API)
│ │ └── index.js # Provider registry, init, health test
│ └── tools/
│ ├── index.js # Tool registry
│ ├── debate.js # mcp_debate tool
│ ├── respond.js # mcp_respond tool
│ ├── quick.js # mcp_quick tool
│ ├── review.js # mcp_review tool
│ ├── sessions.js # mcp_sessions tool
│ └── health.js # mcp_health tool
├── scripts/
│ ├── sync-runbook.sh # Sync single runbook to server (for remote setups)
│ └── hook_sync_debat.js # PostToolUse hook for auto-sync
├── sessions/ # Session JSON files (auto-created)
├── memory/ # Conclusions + cost tracking (auto-created)
├── runbooks/ # Target runbooks (optional)
└── logs/ # PM2 logs (production)
Known Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| MCP client timeout ~60s | Long model responses may timeout on client side | Response is saved server-side — retry continue |
| Compact mode truncates at 3K chars | Later models see truncated earlier responses | Use history_mode: "full" for important debates |
| Full history 32K cap | Extremely long responses still truncated | By design — prevents context overflow |
| Confidence scores unreliable | Model self-reported confidence has near-zero correlation with accuracy (r=0.024) | Don't use confidence for automated decisions |
| Gemini health check flaky | Returns "Empty response" on ping but works in debates | Known issue, no impact on functionality |
Cost Estimates
Typical cost per full debate (6 models, 1 round, all 4 phases):
| Mode | Approximate Cost |
|---|---|
| Compact history | $0.50 - $1.50 |
| Full history | $1.00 - $3.00 |
| Quick opinions | $0.10 - $0.30 |
| Code review | $0.30 - $0.80 |
Cost varies by topic complexity and response length. Anthropic (Claude) is typically the most expensive provider.
License
Private project. Not open-sourced.