GraphRAG MCP Server

Enables graph-powered semantic search over the Grand Débat National dataset, providing fast, transparent answers with provenance tracing back to citizen contributions.

GraphRAG MCP Server: Production-Grade Knowledge Graph Retrieval for LLMs

A remote MCP (Model Context Protocol) server delivering graph-powered semantic search over the Grand Débat National dataset. Query 50 communes with 8,000+ entities through a graph-first architecture that is 29x faster than vector RAG, with built-in provenance that traces every answer back to citizen contributions.

Live Endpoint (No signup required):

https://graphragmcp-production.up.railway.app/mcp

What Makes This Special

This isn't just another RAG system. GraphRAG MCP Server is built on seven constitutional principles that deliver measurable advantages in speed, transparency, and quality.

1. Lightning-Fast Graph Traversal

For users: Queries return in 1-2 seconds rather than 30-60 seconds, fast enough for interactive use and real-time analysis.

How we do it: Pre-computed graph indices loaded at startup enable O(1) neighbor lookups. No per-query graph parsing.

Evidence: 50x performance improvement documented in troubleshooting.md — graph loading time reduced from 25-30 seconds per query to 0.5 seconds. Compared to traditional vector RAG, GraphRAG achieves 29x faster response times (1.3s mean latency vs 45s, measured across 54 queries in experimental evaluation).
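The idea of a pre-computed index can be illustrated with a minimal sketch. This is not the server's GraphIndex implementation; the function names and the sample edges below are hypothetical, and the sketch only assumes that edges are available as (source, target, relationship type) tuples at startup.

```python
from collections import defaultdict

def build_adjacency_index(edges):
    """Build an in-memory adjacency index once, at startup.

    `edges` is an iterable of (source, target, relationship_type) tuples.
    Returns a dict mapping each node to a list of (neighbor, relationship_type).
    """
    index = defaultdict(list)
    for source, target, rel_type in edges:
        # Store both directions so a lookup works from either endpoint.
        index[source].append((target, rel_type))
        index[target].append((source, rel_type))
    return dict(index)

def neighbors(index, node):
    """Average-case O(1) dictionary lookup; no graph parsing at query time."""
    return index.get(node, [])

# Hypothetical sample edges, for illustration only.
edges = [
    ("impots", "pouvoir_achat", "CONCERNE"),
    ("impots", "services_publics", "RELATED_TO"),
]
index = build_adjacency_index(edges)
print(neighbors(index, "impots"))
```

The cost of parsing the graph is paid once when the index is built; each query then pays only for a dictionary lookup.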


2. No Orphan Nodes - Everything Connects

For users: Every piece of information is contextualized through relationships. You get richer context, better answers, no isolated facts.

How we do it: Commune-centric design where every entity tracks its source commune and connections. Graph operations only return entities with relationships — orphan nodes are automatically filtered.

Why it matters: Information without context is just noise. The graph structure ensures that when you ask about taxation concerns, you don't just get a keyword match — you get themes, related concepts, and the citizen contributions that discuss them together.
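Orphan filtering can be sketched in a few lines, assuming an adjacency index shaped as a dict from node name to a list of (neighbor, relationship type) pairs. The function name and data are hypothetical and are not taken from the server code.

```python
def drop_orphans(entities, index):
    """Keep only entities that have at least one relationship in the index."""
    return [entity for entity in entities if index.get(entity)]

# Hypothetical data: "isolated_note" has no edges, so it is filtered out.
index = {
    "impots": [("pouvoir_achat", "CONCERNE")],
    "pouvoir_achat": [("impots", "CONCERNE")],
}
connected = drop_orphans(["impots", "isolated_note", "pouvoir_achat"], index)
print(connected)
```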


3. Complete Transparency - Answer Provenance

For users: See exactly which citizen contributions support each claim. Verify accuracy, build trust, audit responses.

How we do it: Text chunks are first-class graph nodes with bidirectional edges to entities. Every response includes source quotes traceable through the graph: chunk → entity → response.

Evidence: Chunk retrieval optimization reduced file I/O from 500ms+ to <1ms by treating chunks as graph entities with in-memory traversal (troubleshooting.md - "Fast Graph Traversal to Chunks"). After the GraphML source_id attribute discovery, 93.7% of entities now have retrievable source chunks (up from 0.15%) (constitution.md).


4. Universal MCP Compatibility

For users: Works with Claude Desktop, Cline, Dust.tt, any MCP client. Integrate once, use everywhere.

How we do it: Flat parameter signatures (not nested Pydantic models), JSON-RPC 2.0 compliance, Server-Sent Events for streaming. Tested with multiple clients.

Why it matters: The "Pydantic Validation Error" issue documented in troubleshooting.md shows that nested params break Dust.tt compatibility. Flat parameters ensure this server works universally without client-specific workarounds.


5. Performance by Design

For users: Every optimization is documented with before/after metrics. No mystery performance regressions, complete architectural transparency.

Evidence: troubleshooting.md documents 7 major optimization efforts with quantified improvements:

  • Pre-computed graph indices: 50x speedup
  • Dual-strategy retrieval: 16% → 92.7% corpus coverage
  • Fast chunk traversal: 500ms → <1ms
  • LLM cache singleton: 5-20s saved on overlapping queries

6. Empirically Validated Quality

For users: Changes are tested with LLM-as-judge, not gut feelings. Confidence that updates improve quality.

How we do it: OPIK evaluation framework with GPT-4o-mini judge measuring meaning_match, hallucination, answer_relevance, and latency. A/B comparisons control for model, temperature, timeout, and execution order.

Evidence: The experimental-design-rag-comparison.md evaluation revealed the 9-commune limitation bug (meaning_match: 0.037 → 0.60+ after fix). Systematic testing with 100% success rate (54/54 queries) and lower hallucination than vector RAG (0.25 vs 0.54) validates production-readiness.


7. Architecture Through Iteration

For users: System evolved through real-world problem solving, not ivory tower design. Battle-tested architecture.

Example: The GraphML source_id attribute discovery emerged from debugging why 99.85% of chunk retrievals were failing. Investigation revealed chunks weren't connected via HAS_SOURCE edges as expected, but through a semicolon-separated source_id attribute. This architectural insight (documented in troubleshooting.md) fundamentally changed how chunks are accessed, improving coverage from 0.15% to 93.7%.


Quick Start - Get Running in 5 Minutes

Test with curl

# 1. Initialize session
curl -s -i -X POST "https://graphragmcp-production.up.railway.app/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}, "id": 1}'

# Note the mcp-session-id header in response

# 2. List available communes
curl -s -X POST "https://graphragmcp-production.up.railway.app/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: YOUR_SESSION_ID" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "grand_debat_list_communes", "arguments": {}}, "id": 2}'

# 3. Run your first query
curl -s -X POST "https://graphragmcp-production.up.railway.app/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "mcp-session-id: YOUR_SESSION_ID" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "grand_debat_query", "arguments": {"params": {"commune_id": "Rochefort", "query": "Quelles sont les principales préoccupations fiscales?", "mode": "local"}}}, "id": 3}'

Configure in Your MCP Client


Understanding the Dataset

What is the Grand Débat National?

The Grand Débat National (2019) was a French civic consultation initiative where citizens contributed to "Cahiers de Doléances" — notebooks documenting concerns, proposals, and perspectives on public policy. This server indexes citizen contributions from 50 communes in Charente-Maritime, creating a unique civic research tool.

Coverage & Structure

  • Geographic scope: 50 communes in Charente-Maritime département, France
  • Total entities: 8,000+ extracted concepts, themes, policy proposals
  • Structure: Each commune is a separate knowledge graph
  • Entity types (extracted from citizen contributions in French):
    • PROPOSITION (policy proposals/suggestions)
    • THEMATIQUE (thematic categories)
    • SERVICEPUBLIC (public services)
    • DOLEANCE (grievances/complaints)
    • CONCEPT (conceptual entities)
    • OPINION (citizen opinions/viewpoints)
    • ACTEURINSTITUTIONNEL (institutional actors)
    • CITOYEN (citizen references)
    • Plus others: REFORMEDEMOCRATIQUE (democratic reforms), TERRITOIRE (territories), CONSULTATION (consultations), VERBATIM (direct quotes), CLUSTERSEMANTIQUE (semantic clusters), TYPEIMPOT (tax types), REFORMEFISCALE (fiscal reforms), MESUREECOLOGIQUE (ecological measures), etc.
  • Relationship types (extraction system defines 26 semantic types):
    • Primary flow: SOUMET (submits), REPOND_A (responds to), APPARTIENT_A (belongs to), FAIT_PARTIE_DE (part of)
    • Content extraction: EXPRIME (expresses), FORMULE (formulates), FAIT_REMONTER (raises), CONTIENT (contains)
    • Thematic: CONCERNE (concerns), CIBLE (targets), GERE (manages)
    • Consensus: CONTRIBUE_A (contributes to), REVELE (reveals), SINSCRIT_DANS (inscribes in)
    • Programmatically added: HAS_SOURCE (entity→chunk provenance), SOURCED_BY (chunk→entity)
    • Note: GraphML files primarily store RELATED_TO edges; semantic types are defined in extraction prompts

Top Communes by Coverage

| Commune | Entities | Communities | Contributions |
|---|---|---|---|
| Rochefort | 812 | 140 | 102 |
| Marennes_Hiers_Brouage | 659 | 119 | 52 |
| Saint_Xandre | 537 | 78 | 41 |
| Saint_Jean_Dangely | 505 | 0 | 50 |
| Rivedoux_Plage | 387 | 56 | 28 |
| L_Gue_Dallere | 356 | 17 | 21 |
| Surgères | 330 | 54 | 26 |

Use Cases: Civic research, policy analysis, democratic participation studies, thematic analysis of citizen concerns (taxation, public services, environmental issues, democratic participation).


Integration Guides

5.1 Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), ~/.config/claude/claude_desktop_config.json (Linux), or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "grand-debat": {
      "url": "https://graphragmcp-production.up.railway.app/mcp",
      "transport": "streamable-http"
    }
  }
}

Restart Claude Desktop. Verify tools appear in the MCP tools list (hammer icon).


5.2 Cline / VS Code

Add to your MCP settings (.vscode/mcp.json or Cline extension settings):

{
  "grand-debat": {
    "url": "https://graphragmcp-production.up.railway.app/mcp",
    "transport": "streamable-http"
  }
}

Reload VS Code window. Verify tools appear in Cline's tool panel.


5.3 Dust.tt

Dust.tt supports remote MCP servers natively. See Dust Remote MCP Server docs.

Setup steps:

  1. Go to Dust Admin → Developers → MCP Servers
  2. Click Add Remote Server
  3. Enter server URL: https://graphragmcp-production.up.railway.app/mcp
  4. Give it a name (e.g., "Grand Debat GraphRAG")
  5. Click Sync — Dust will discover all 5 tools automatically
  6. Assign the server to your desired Spaces

Using in Dust Agents:

@agent Query the Grand Debat data for Rochefort about fiscal concerns

The agent will automatically initialize a session, call grand_debat_query, and return the GraphRAG-powered response.


5.4 Custom MCP Clients

Requirements: JSON-RPC 2.0 over HTTP, Server-Sent Events (SSE) for streaming responses.

Session Flow:

  1. Initialize: POST /mcp with initialize method → receive mcp-session-id in response headers
  2. Call tools: POST /mcp with tools/call method, include mcp-session-id header
  3. Parse SSE: Responses are event: message with data: {...} containing JSON-RPC result

Example (Python):

import httpx

session = httpx.Client(base_url="https://graphragmcp-production.up.railway.app")

# Initialize
resp = session.post("/mcp", json={
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "custom", "version": "1.0"}},
    "id": 1
}, headers={"Accept": "application/json, text/event-stream"})

session_id = resp.headers["mcp-session-id"]

# Call tool
resp = session.post("/mcp", json={
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "grand_debat_list_communes", "arguments": {}},
    "id": 2
}, headers={"mcp-session-id": session_id, "Accept": "application/json, text/event-stream"})

print(resp.text)  # Parse SSE response
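The last line of the example leaves SSE parsing to the reader. A minimal sketch of that step follows, assuming the body uses the event: message / data: {...} framing described in the session flow. The helper name and the sample body are hypothetical; a server may also answer with plain application/json, in which case resp.json() is enough.

```python
import json

def parse_sse_messages(body: str) -> list:
    """Extract JSON payloads from the `data:` lines of an SSE response body."""
    messages = []
    data_lines = []
    for line in body.splitlines():
        if line.startswith("data:"):
            # Drop the field name and the optional space that follows it.
            data_lines.append(line[len("data:"):].lstrip())
        elif line.strip() == "" and data_lines:
            # A blank line terminates one event.
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:
        # Tolerate a final event that is not followed by a blank line.
        messages.append(json.loads("\n".join(data_lines)))
    return messages

# Hypothetical response body, for illustration only.
sample = 'event: message\ndata: {"jsonrpc": "2.0", "id": 2, "result": {"ok": true}}\n\n'
parsed = parse_sse_messages(sample)
print(parsed[0]["result"])
```

With the client above, this would be called as parse_sse_messages(resp.text).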

MCP Tools Reference

6.1 Which Tool Should I Use?

Want to see what's available?
  ↓
  grand_debat_list_communes

Have a specific question about a commune?
  ↓
  grand_debat_query (mode: "local")

Need a thematic overview?
  ↓
  grand_debat_query (mode: "global")

Looking for specific entities/themes?
  ↓
  grand_debat_search_entities

Want to explore topic clusters?
  ↓
  grand_debat_get_communities

Need to read original citizen texts?
  ↓
  grand_debat_get_contributions

6.2 Tool Details

grand_debat_list_communes

Purpose: Discover all 50 available communes with statistics (entity counts, community counts, contribution counts).

When to use: First step to understand dataset coverage, or to get exact commune IDs for queries.

Parameters: None

Returns: Array of commune objects with name, total_entities, total_communities, total_contributions.

Example:

{
  "name": "grand_debat_list_communes",
  "arguments": {}
}

Response:

{
  "communes": [
    {"name": "Rochefort", "total_entities": 812, "total_communities": 140, "total_contributions": 102},
    {"name": "Marennes_Hiers_Brouage", "total_entities": 659, "total_communities": 119, "total_contributions": 52},
    ...
  ]
}

grand_debat_query

Purpose: Main query tool — answer questions using GraphRAG with local (entity-based) or global (community-based) modes.

When to use: This is your primary tool for answering questions about citizen concerns. Use local mode for targeted fact-finding, global mode for thematic overviews.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| commune_id | string | Yes | Exact commune name (use grand_debat_list_communes to get valid IDs) |
| query | string | Yes | Natural language question (French recommended for this dataset) |
| mode | string | Yes | "local" (entity-based) or "global" (community-based) |

Returns: Structured response with answer (synthesized answer), sources (entity names or community reports), provenance (source chunks with quotes).

Example (Local Mode):

{
  "name": "grand_debat_query",
  "arguments": {
    "params": {
      "commune_id": "Rochefort",
      "query": "Quelles sont les principales préoccupations fiscales des citoyens?",
      "mode": "local"
    }
  }
}

Example (Global Mode):

{
  "name": "grand_debat_query",
  "arguments": {
    "params": {
      "commune_id": "Surgères",
      "query": "Quels sont les grands thèmes abordés par les citoyens?",
      "mode": "global"
    }
  }
}

Tips:

  • Use exact commune IDs from grand_debat_list_communes (e.g., Saint_Jean_Dangely not Saint-Jean-d'Angély)
  • Local mode finds specific entities → traverses graph → retrieves source chunks (~1-2s)
  • Global mode uses AI-generated community summaries → better suited to broad themes (~1-3s)

grand_debat_search_entities

Purpose: Search for entities (themes, concepts, actors) matching a keyword pattern.

When to use: When you need to find specific topics mentioned in the data without asking a full question.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| commune_id | string | Yes | Commune to search within |
| pattern | string | Yes | Keyword or phrase to match (case-insensitive, partial match) |
| limit | integer | No | Max results to return (default: 20) |

Returns: Array of entities with entity_name, entity_type, description.

Example:

{
  "name": "grand_debat_search_entities",
  "arguments": {
    "params": {
      "commune_id": "Marans",
      "pattern": "retraite",
      "limit": 20
    }
  }
}

grand_debat_get_communities

Purpose: Retrieve AI-generated thematic community reports (Louvain algorithm clustering).

When to use: Explore how the GraphRAG system has organized entities into topic clusters.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| commune_id | string | Yes | Commune to retrieve communities from |
| limit | integer | No | Max communities to return (default: 10) |

Returns: Array of community objects with level, title, summary, rank, findings.

Example:

{
  "name": "grand_debat_get_communities",
  "arguments": {
    "params": {
      "commune_id": "Rivedoux_Plage",
      "limit": 10
    }
  }
}

grand_debat_get_contributions

Purpose: Get original citizen contribution texts (source documents).

When to use: Read raw citizen input, verify quotes, understand context beyond extracted entities.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| commune_id | string | Yes | Commune to retrieve contributions from |
| limit | integer | No | Max contributions to return (default: 5) |

Returns: Array of contribution objects with full_doc_id, content, commune, tokens, chunk_order_index.

Example:

{
  "name": "grand_debat_get_contributions",
  "arguments": {
    "params": {
      "commune_id": "Andilly",
      "limit": 5
    }
  }
}

Query Modes Explained

7.1 Local Mode - Entity-Based Retrieval

What it does: Finds specific entities matching your query, traverses the graph to find related entities and relationships, retrieves source chunks via graph edges, synthesizes answer with LLM using context.

Best for:

  • Targeted questions ("What do citizens say about pensions?")
  • Fact-finding ("Are there concerns about fiscal policy?")
  • Specific topics ("Mentions of environmental issues")

How it works: Keyword matching → graph expansion via weighted Dijkstra → chunk retrieval via source_id attribute → LLM synthesis with provenance.

Performance: ~1-2 seconds (graph traversal is <1ms, LLM call is majority of latency).

Example questions:

  • "Quelles sont les préoccupations fiscales à Rochefort?"
  • "Que disent les citoyens sur les retraites?"
  • "Mentions de la transition écologique?"

7.2 Global Mode - Community-Based Analysis

What it does: Selects relevant community reports (AI-generated thematic summaries), combines community summaries as context, synthesizes high-level overview with LLM.

Best for:

  • Broad overviews ("What are the main themes?")
  • Thematic patterns ("Overview of citizen concerns")
  • Multi-topic analysis ("What topics are discussed together?")

How it works: Community selection via keyword matching → report retrieval (pre-generated) → LLM synthesis with thematic context.

Performance: ~1-3 seconds (slightly slower due to larger context from community summaries).
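Community selection by keyword matching can be pictured with the sketch below. It is an illustration rather than the server's selection logic: the scoring rule (count of query keywords found in the title and summary), the minimum keyword length, and the sample reports are all assumptions. Only the title and summary field names come from the grand_debat_get_communities return description.

```python
import re

def select_communities(query, reports, top_k=3):
    """Rank community reports by how many query keywords they contain."""
    # Assumption: words of four letters or more count as keywords.
    keywords = {w for w in re.findall(r"\w+", query.lower()) if len(w) > 3}
    scored = []
    for report in reports:
        text = f"{report['title']} {report['summary']}".lower()
        score = sum(1 for kw in keywords if kw in text)
        if score:
            scored.append((score, report["title"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_k]]

# Hypothetical reports, for illustration only.
reports = [
    {"title": "Fiscalité locale", "summary": "Impôts et thèmes fiscaux"},
    {"title": "Transports", "summary": "Mobilité rurale"},
]
selected = select_communities("Quels sont les grands thèmes fiscaux?", reports)
print(selected)
```

The selected reports would then be concatenated as context for the LLM synthesis step.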

Example questions:

  • "Quels sont les grands thèmes du débat?"
  • "Vue d'ensemble des préoccupations citoyennes?"
  • "Thématiques principales abordées?"

7.3 Choosing the Right Mode

| Question Type | Mode | Reason |
|---|---|---|
| Specific facts | Local | Direct entity retrieval with provenance |
| Thematic overview | Global | Community summaries provide high-level patterns |
| Multi-commune | Local | Cross-graph traversal (set commune_id to null or query all) |
| Exploratory | Global | Higher-level patterns without drilling into specifics |
| Provenance-critical | Local | Full chunk→entity→response tracing |

Performance & Quality

8.1 Benchmarked Performance

Latency: 1.3s mean vs 45s for vector RAG → 29x faster (experimental-design-rag-comparison.md)

Reliability: 100% success rate (54/54 queries successful in evaluation)

Coverage: 92.7% corpus coverage with dual-strategy retrieval (up from 16% with single-strategy)

Provenance: 93.7% of entities have retrievable source chunks (up from 0.15% after GraphML source_id discovery)


8.2 Quality Validation

Framework: OPIK evaluation platform with GPT-4o-mini as LLM judge (temperature=0 for consistency).

Metrics:

  • meaning_match: Semantic equivalence between response and expected answer
  • hallucination: Share of the response not supported by the retrieved context (0 = faithful, 1 = hallucinated; lower is better)
  • answer_relevance: How directly response addresses question
  • usefulness: Practical utility for answering civic questions

Results (from experimental-design-rag-comparison.md):

  • GraphRAG: Lower hallucination rate than vector RAG (0.25 vs 0.54)
  • GraphRAG: Equivalent semantic precision to vector RAG (0.30 LLM precision score)
  • GraphRAG: 35x faster latency while maintaining quality

8.3 Key Optimizations

Brief summary with links to troubleshooting.md:

  1. Pre-computed graph indices: 50x speedup (25-30s → 0.5s per query) by loading all commune graphs at startup into in-memory adjacency lists
  2. Dual-strategy retrieval: 16% → 92.7% coverage by combining community keywords + global entity search for cross-commune queries
  3. Fast chunk traversal: 500ms+ file I/O → <1ms in-memory by treating chunks as graph entities with bidirectional edges
  4. GraphML source_id fix: 0.15% → 93.7% success rate by discovering chunks are connected via source_id attribute (not HAS_SOURCE edges)
  5. LLM cache singleton: saves 5-20s on overlapping queries by preventing redundant cache initialization per request
  6. Weighted graph traversal: Prioritizes semantic relationships (CONCERNE: 1.0) over structural relationships (RELATED_TO: 0.1) using Dijkstra's algorithm
  7. DNS rebinding protection fix: Resolves HTTP 421 errors on Railway/Cloud Run by disabling MCP SDK's DNS rebinding check (proxy handles security)
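Item 5 in the list above relies on creating the LLM cache once per process instead of once per request. One common way to get that behavior in Python is a memoized factory, sketched here. The class and function names are hypothetical stand-ins, not the server's actual code.

```python
from functools import lru_cache

class LLMResponseCache:
    """Hypothetical stand-in for a cache that is expensive to initialize."""

    instances_created = 0

    def __init__(self):
        LLMResponseCache.instances_created += 1
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

@lru_cache(maxsize=1)
def get_llm_cache():
    """Return the single shared cache; the constructor runs only on first call."""
    return LLMResponseCache()

# Two "requests" share one instance, so a response stored by the first
# is visible to the second without re-initialization.
get_llm_cache().set("prompt-hash", "cached answer")
print(get_llm_cache().get("prompt-hash"), LLMResponseCache.instances_created)
```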

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                   MCP Client (Claude, Cline, etc.)          │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ Streamable HTTP / MCP Protocol
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              Grand Debat MCP Server (Railway)               │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ FastMCP + Uvicorn                                    │  │
│  │                                                       │  │
│  │ Tools:                                                │  │
│  │  - grand_debat_list_communes                         │  │
│  │  - grand_debat_query (local/global)                  │  │
│  │  - grand_debat_search_entities                       │  │
│  │  - grand_debat_get_communities                       │  │
│  │  - grand_debat_get_contributions                     │  │
│  │                                                       │  │
│  │ GraphIndex (in-memory adjacency lists)               │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    nano_graphrag Engine                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │   Entities  │  │ Communities │  │   Text Chunks       │ │
│  │   (VDB)     │  │  (Reports)  │  │ (Contributions)     │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Knowledge Graph (GraphML)               │   │
│  │   Entities ──relationships──> Entities               │   │
│  │   Chunks connected via source_id attribute           │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      OpenAI API                             │
│              (GPT-4o-mini for query synthesis)              │
└─────────────────────────────────────────────────────────────┘

Components

  • FastMCP + Uvicorn: MCP protocol server with TransportSecuritySettings for reverse proxy compatibility
  • GraphIndex: Pre-computed graph indices (feature 007) enabling O(1) neighbor lookups, loaded at server startup
  • nano_graphrag: Graph traversal and query engine with dual-strategy retrieval
  • Storage: GraphML graphs (entity relationships), JSON stores (chunks, community reports, entity vectors)
  • LLM: GPT-4o-mini for query synthesis and community report generation

Data Flow

  1. Client initializes MCP session → receives mcp-session-id
  2. Tool call dispatched (e.g., grand_debat_query) with session ID
  3. GraphIndex retrieves entities via O(1) adjacency list lookups
  4. Graph traversal finds related entities/chunks using weighted Dijkstra
  5. Context assembled from chunks (via source_id attribute) + community reports
  6. LLM synthesizes answer with source quotes and provenance chains
  7. Response streamed via Server-Sent Events (SSE)

Key Architectural Decisions

  • Pre-computation: All 50 commune graphs loaded at startup (no lazy loading) to guarantee O(1) lookups
  • Graph-first: Chunks are graph nodes, provenance is graph edges (bidirectional chunk↔entity connections)
  • Weighted traversal: Dijkstra's algorithm with relationship type weights (CONCERNE: 1.0, HAS_SOURCE: 0.9, APPARTIENT_A: 0.3, RELATED_TO: 0.1)
  • Dual-strategy: For cross-commune queries, combine community keywords + global entity search to achieve 92.7% coverage
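The weighted traversal decision can be sketched as follows. The weights are the ones quoted above; everything else is an assumption made for illustration. In particular, Dijkstra's algorithm minimizes cost, so this sketch converts each weight to a cost of 1 / weight, which makes strong relationships cheap to cross. The server may map weights to costs differently.

```python
import heapq

# Weights quoted in this README; higher means a stronger relationship.
RELATION_WEIGHTS = {"CONCERNE": 1.0, "HAS_SOURCE": 0.9, "APPARTIENT_A": 0.3, "RELATED_TO": 0.1}

def expand(index, start, max_cost=5.0):
    """Return {node: cost} for nodes reachable from `start` within `max_cost`.

    Assumption: the traversal cost of an edge is 1 / weight, and unknown
    relationship types get the lowest weight (0.1).
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # Stale heap entry; a cheaper path was already found.
        for neighbor, rel_type in index.get(node, []):
            new_cost = cost + 1.0 / RELATION_WEIGHTS.get(rel_type, 0.1)
            if new_cost <= max_cost and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best

# Hypothetical adjacency index: node -> [(neighbor, relationship_type)].
index = {
    "impots": [("pouvoir_achat", "CONCERNE"), ("mairie", "RELATED_TO")],
    "pouvoir_achat": [("chunk_001", "HAS_SOURCE")],
}
reached = expand(index, "impots")
print(sorted(reached))
```

With a budget of 5.0, the RELATED_TO hop (cost 10) is pruned while the CONCERNE and HAS_SOURCE hops are followed, which is the prioritization the weights are meant to express.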

Data Structure

Each commune folder contains pre-indexed GraphRAG data:

law_data/
├── Rochefort/
│   ├── vdb_entities.json              # Entity vector database
│   ├── kv_store_text_chunks.json      # Original contribution texts
│   ├── kv_store_community_reports.json # AI-generated community summaries
│   ├── kv_store_full_docs.json        # Full documents
│   ├── kv_store_llm_response_cache.json # Cached LLM responses
│   └── graph_chunk_entity_relation.graphml # Knowledge graph
├── Andilly/
│   └── ...
└── ... (50 communes total)
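A quick way to sanity-check a commune folder against this layout is sketched below. The helper name is hypothetical, and the sketch assumes that kv_store_text_chunks.json holds one JSON entry per chunk. It runs against a throwaway directory so it does not need the real dataset.

```python
import json
import tempfile
from pathlib import Path

# File names taken from the layout shown above.
EXPECTED_FILES = [
    "vdb_entities.json",
    "kv_store_text_chunks.json",
    "kv_store_community_reports.json",
    "kv_store_full_docs.json",
    "kv_store_llm_response_cache.json",
    "graph_chunk_entity_relation.graphml",
]

def inspect_commune(data_root, commune_id):
    """Report missing files and the number of stored text chunks."""
    folder = Path(data_root) / commune_id
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    chunk_count = None
    chunks_path = folder / "kv_store_text_chunks.json"
    if chunks_path.is_file():
        with chunks_path.open(encoding="utf-8") as handle:
            # Assumption: one top-level entry per chunk.
            chunk_count = len(json.load(handle))
    return {"missing": missing, "chunk_count": chunk_count}

# Demo with a temporary folder and a made-up chunk record.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "Rochefort"
    demo.mkdir()
    (demo / "kv_store_text_chunks.json").write_text(
        json.dumps({"chunk_001": {"content": "..."}}), encoding="utf-8"
    )
    report = inspect_commune(tmp, "Rochefort")

print(report["chunk_count"], report["missing"])
```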

GraphML Schema

Nodes:

  • entity_name: Entity identifier (unique per commune)
  • entity_type: COMMUNE, CONCEPT, THEME, CITIZEN_CONTRIBUTION, CHUNK
  • description: Natural language description
  • source_id: Critical attribute holding the chunk IDs for the entity, joined by a separator (e.g., "chunk_001<SEP>chunk_002")

Edges:

  • relationship_type or type: CONCERNE, HAS_SOURCE, APPARTIENT_A, RELATED_TO
  • weight: Relationship strength (optional)

Chunks: Connected via source_id attribute (NOT via HAS_SOURCE edges — this was a key discovery documented in troubleshooting.md).
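Reading chunk IDs from the source_id attribute can be sketched with the standard library alone. This is an illustrative parser, not the server's loader. The sample GraphML is made up, and because this README mentions both a semicolon and a <SEP> separator, the sketch accepts either one.

```python
import re
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def chunk_ids_by_entity(graphml_text):
    """Map each node id to the chunk IDs listed in its source_id attribute."""
    root = ET.fromstring(graphml_text)
    # GraphML declares attributes as <key id="d0" for="node" attr.name="source_id"/>.
    key_ids = {
        key.get("id")
        for key in root.findall("g:key", NS)
        if key.get("attr.name") == "source_id" and key.get("for") == "node"
    }
    mapping = {}
    for node in root.iterfind(".//g:node", NS):
        for data in node.findall("g:data", NS):
            if data.get("key") in key_ids and data.text:
                # Assumption: IDs are separated by "<SEP>" or ";".
                parts = re.split(r"<SEP>|;", data.text)
                mapping[node.get("id")] = [p.strip() for p in parts if p.strip()]
    return mapping

# Hypothetical two-node graph; MAIRIE has no source_id and is left out.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="source_id" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="IMPOTS"><data key="d0">chunk_001&lt;SEP&gt;chunk_002</data></node>
    <node id="MAIRIE"/>
  </graph>
</graphml>"""
chunk_map = chunk_ids_by_entity(SAMPLE)
print(chunk_map)
```

Inverting this mapping (chunk ID to entity list) gives the chunk-to-entity direction of the provenance chain.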


Deployment

Environment Variables

| Variable | Description | Required | Default | Example |
|---|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key for LLM calls | Yes | - | sk-... |
| GRAND_DEBAT_DATA_PATH | Path to commune data directory | No | ./law_data | /data/communes |
| PORT | HTTP server port | No | 8080 | 8000 |
| ENABLE_OPIK_LOGGING | Enable evaluation logging | No | true | false |
| OPIK_API_KEY | Opik API key for logging (optional) | No | - | ... |
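For a custom deployment, the variables above could be read and validated along these lines. The Settings class and load_settings function are hypothetical and are not part of server.py; the defaults mirror the table, and the rule that only the literal string "true" enables logging is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    data_path: str
    port: int
    enable_opik_logging: bool
    opik_api_key: Optional[str]

def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only required variable according to the table.
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        data_path=env.get("GRAND_DEBAT_DATA_PATH", "./law_data"),
        port=int(env.get("PORT", "8080")),
        # Assumption: any value other than "true" (case-insensitive) disables logging.
        enable_opik_logging=env.get("ENABLE_OPIK_LOGGING", "true").lower() == "true",
        opik_api_key=env.get("OPIK_API_KEY"),
    )

# Passing a dict keeps the demo independent of the real environment.
settings = load_settings({"OPENAI_API_KEY": "sk-test", "PORT": "8000"})
print(settings.port, settings.data_path, settings.enable_opik_logging)
```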

Deploy to Railway

# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Link to project
railway link

# Set environment variables
railway variables --set "OPENAI_API_KEY=your-key"

# Deploy
railway up

Important: Railway uses a reverse proxy (railway-edge). The server includes TransportSecuritySettings(enable_dns_rebinding_protection=False) to prevent HTTP 421 "Invalid Host header" errors (see troubleshooting.md).


Deploy to Cloud Run

gcloud run deploy grand-debat-mcp \
  --source . \
  --region europe-west1 \
  --allow-unauthenticated \
  --set-env-vars "OPENAI_API_KEY=your-key"

Docker

# Build
docker build -t grand-debat-mcp .

# Run
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="your-key" \
  -v $(pwd)/law_data:/app/law_data \
  grand-debat-mcp

Local Development

Prerequisites: Python 3.11+, OpenAI API key

# Clone repository
git clone https://github.com/ArthurSrz/graphRAGmcp.git
cd graphRAGmcp

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OPENAI_API_KEY="your-api-key"
export GRAND_DEBAT_DATA_PATH="./law_data"

# Run with stdio (for MCP Inspector testing)
python server.py --stdio

# Run as HTTP server
python server.py --port 8000

Test with MCP Inspector:

npx @modelcontextprotocol/inspector python server.py --stdio

Troubleshooting

Common Issues

1. "Invalid Host header" (HTTP 421)

Cause: MCP SDK's DNS rebinding protection rejects requests from reverse proxies (Railway, Cloud Run) where the Host header doesn't match allowed list.

Solution: The server includes TransportSecuritySettings(enable_dns_rebinding_protection=False) — security is handled at the proxy layer. If you're running a custom deployment, ensure this setting is present in server.py.


2. "Field required" Pydantic Validation Errors

Cause: Nested Pydantic models break Dust.tt and other MCP clients that expect flat parameter schemas.

Solution: This server uses flat parameters with Annotated[type, Field(description="...")] for universal client compatibility. If you're modifying tools, avoid nested params.


3. Empty Query Results

Cause: Commune ID mismatch (e.g., using Saint-Jean-d'Angély instead of Saint_Jean_Dangely).

Solution: Always use grand_debat_list_communes to get exact commune IDs. The response includes the correct underscore-formatted names.
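On the client side, a small resolver can map a human-typed name onto one of the exact IDs returned by grand_debat_list_communes. The sketch below is a client-side convenience under stated assumptions, not server behavior: the function names are hypothetical, and matching is done by stripping accents and punctuation, with a fuzzy fallback from difflib.

```python
import difflib
import unicodedata

def _normalize(name):
    """Lowercase, strip accents, and drop everything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.lower() if ch.isalnum())

def resolve_commune_id(user_input, valid_ids):
    """Return the matching exact commune ID, or None if nothing is close enough."""
    by_key = {_normalize(cid): cid for cid in valid_ids}
    key = _normalize(user_input)
    if key in by_key:
        return by_key[key]
    # Fuzzy fallback for small typos; the 0.8 cutoff is an arbitrary choice.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=0.8)
    return by_key[close[0]] if close else None

# IDs as they appear in this README; in practice, fetch them from the server.
valid_ids = ["Rochefort", "Saint_Jean_Dangely", "Surgères"]
resolved = resolve_commune_id("Saint-Jean-d'Angély", valid_ids)
print(resolved)
```

Returning None instead of guessing keeps a mistyped name from silently querying the wrong commune.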


4. Session Errors

Cause: Missing mcp-session-id header in tool calls.

Solution:

  1. Call initialize method first → extract mcp-session-id from response headers
  2. Include mcp-session-id: <your-session-id> header in all subsequent tools/call requests

5. Slow First Query

Cause: First query after server startup warms caches (LLM response cache initialization, entity vector loading).

Solution: Expected behavior — subsequent queries are faster (~1-2s). This is a one-time cost per server restart.


For detailed troubleshooting, see troubleshooting.md which documents all major optimizations, bug fixes, and architectural discoveries.


Example Queries

Discovery

List all communes:

{"name": "grand_debat_list_communes", "arguments": {}}

Search for retirement-related entities:

{
  "name": "grand_debat_search_entities",
  "arguments": {"params": {"commune_id": "Marans", "pattern": "retraite", "limit": 20}}
}

Targeted Research (Local Mode)

Fiscal concerns in Rochefort:

{
  "name": "grand_debat_query",
  "arguments": {"params": {"commune_id": "Rochefort", "query": "Quelles sont les principales préoccupations fiscales des citoyens?", "mode": "local"}}
}

Retirement topics:

{
  "name": "grand_debat_query",
  "arguments": {"params": {"commune_id": "Saint_Xandre", "query": "Que disent les citoyens sur les retraites?", "mode": "local"}}
}

Thematic Analysis (Global Mode)

Overall themes in Surgères:

{
  "name": "grand_debat_query",
  "arguments": {"params": {"commune_id": "Surgères", "query": "Quels sont les grands thèmes abordés par les citoyens?", "mode": "global"}}
}

Community clusters in Rivedoux-Plage:

{
  "name": "grand_debat_get_communities",
  "arguments": {"params": {"commune_id": "Rivedoux_Plage", "limit": 10}}
}

Provenance & Verification

Get original contributions from Andilly:

{
  "name": "grand_debat_get_contributions",
  "arguments": {"params": {"commune_id": "Andilly", "limit": 5}}
}

Query with full provenance tracing (Local mode automatically includes source chunks with quotes):

{
  "name": "grand_debat_query",
  "arguments": {"params": {"commune_id": "Rochefort", "query": "Préoccupations environnementales?", "mode": "local"}}
}

Contributing & Support

Questions & Bug Reports

Performance Feedback

  • Report performance regressions with benchmarks (before/after latency measurements)
  • Performance improvements should include quantified impact (see troubleshooting.md for template)

Documentation Improvements

  • Corrections and clarifications welcome via Pull Requests
  • Focus on user-facing documentation (integration guides, examples, troubleshooting)

Links & Resources


License

MIT
