MongoDB Intelligence MCP Server
An autonomous MCP server that enables LLMs to intelligently query and analyze MongoDB databases by reverse-engineering schemas, proving relationships, and enforcing security safeguards like PII masking and query limits.
<div align="center"> <h1>🧠 MongoDB Intelligence Server (v2.0 Enterprise)</h1> <p><b>An Autonomous, Self-Healing Model Context Protocol (MCP) Server for LLMs</b></p> </div>
📖 The "Blind AI" Problem
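Large language models (and the assistants built on them, such as Claude, GPT-4, and Cursor) are very good at writing SQL, partly because relational databases describe their own structure through the standard INFORMATION_SCHEMA views. The AI can read the table definitions before it writes a single query.
MongoDB is schemaless: documents in the same collection can have different fields and types, and there is no INFORMATION_SCHEMA describing them.
If you point an AI at a raw MongoDB database through the official MongoDB MCP server, the AI is essentially "blind." It has to guess field names, it cannot tell which collections reference each other (there are no declared foreign keys), and it tends to hallucinate broken PyMongo queries that crash your app.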
💡 The Solution
This MCP server inserts a dynamic intelligence layer between your database and the AI. Before the AI asks its first question, the server:
- Reverse-Engineers the Schema: Scans all collections to infer data types, nullability, and enums for every field.
- Proves Foreign Keys: Uses heuristics to find candidate relationships between collections, then confirms them with `$in` queries against the referenced collection.
- Anonymizes Data: Detects and masks PII (Personally Identifiable Information) so your sensitive data never leaves your local machine.
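The two discovery steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the server's actual code. `infer_schema` and `reference_coverage` are hypothetical names, the enum thresholds are arbitrary, only top-level fields are inspected, and `count_parent` is assumed to behave like PyMongo's `Collection.count_documents`.

```python
from collections import Counter, defaultdict
from typing import Any, Callable, Iterable


def infer_schema(docs: list[dict[str, Any]], enum_max: int = 10) -> dict[str, dict[str, Any]]:
    """Infer top-level field types, nullability and enum candidates from sampled docs."""
    type_counts: dict[str, Counter] = defaultdict(Counter)
    string_values: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            type_counts[field][type(value).__name__] += 1
            if isinstance(value, str):
                string_values[field].add(value)

    schema: dict[str, dict[str, Any]] = {}
    for field, counts in type_counts.items():
        non_null = sum(counts.values()) - counts["NoneType"]
        distinct = string_values[field]
        # Enum candidate (assumed heuristic): all non-null values are strings
        # and they repeat across a small set of distinct values.
        is_enum = counts["str"] == non_null and 0 < len(distinct) <= enum_max and len(distinct) < non_null
        schema[field] = {
            "types": sorted(t for t in counts if t != "NoneType"),
            # Nullable if the field is missing from some documents or stored as null.
            "nullable": non_null < len(docs),
            "enum": sorted(distinct) if is_enum else None,
        }
    return schema


def reference_coverage(
    child_values: Iterable[Any],
    count_parent: Callable[[dict[str, Any]], int],
    parent_key: str = "_id",
) -> float:
    """Fraction of distinct child values that exist in the parent, via one $in query."""
    distinct = {v for v in child_values if v is not None}  # values must be hashable
    if not distinct:
        return 0.0
    # Matching on a unique key such as _id keeps matched <= len(distinct).
    matched = count_parent({parent_key: {"$in": list(distinct)}})
    return matched / len(distinct)
```

With PyMongo, `docs` could come from `list(db.orders.aggregate([{"$sample": {"size": 500}}]))` and `count_parent` could be `db.customers.count_documents` (`orders` and `customers` are placeholder collections). A coverage near 1.0 means nearly every sampled reference resolves to an existing parent document; very large value sets would need to be split across several `$in` queries.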
🏗️ Architectural Overview
The server is decoupled into three pluggable layers:
```mermaid
graph TD
    A[AI Agent / LLM] <-->|MCP Protocol via stdio| B[server.py - 14 Exposed Tools]
    subgraph MIS [MongoDB Intelligence Server]
        B <--> C[Intelligence Layer]
        C <--> D[Data Access Layer]
        C <--> E[Presentation Layer]
    end
    D <-->|Safe, Paginated Queries| F[(Raw MongoDB)]
    C --> G[Discovery Engine]
    C --> H[Knowledge Graph]
    C --> I[Security Analyzers]
```
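One way to keep the three layers pluggable is to have the Intelligence Layer depend only on small interfaces, so a PyMongo adapter or an HTML/PDF presenter can be swapped without touching the analysis code. The sketch below assumes that design using `typing.Protocol`; every class and method name here is hypothetical and not taken from the project.

```python
from typing import Any, Protocol


class DataAccess(Protocol):
    """Data Access Layer: the only code that talks to MongoDB (hypothetical interface)."""

    def sample(self, collection: str, size: int) -> list[dict[str, Any]]: ...


class Presenter(Protocol):
    """Presentation Layer: turns findings into text, HTML, PDF... (hypothetical interface)."""

    def render(self, findings: dict[str, Any]) -> str: ...


class IntelligenceLayer:
    """Depends only on the two protocols, so either side can be replaced."""

    def __init__(self, data: DataAccess, presenter: Presenter) -> None:
        self.data = data
        self.presenter = presenter

    def describe(self, collection: str) -> str:
        docs = self.data.sample(collection, 100)
        fields = sorted({field for doc in docs for field in doc})
        return self.presenter.render({"collection": collection, "fields": fields})


class InMemoryData:
    """Test double standing in for a PyMongo-backed adapter."""

    def __init__(self, collections: dict[str, list[dict[str, Any]]]) -> None:
        self.collections = collections

    def sample(self, collection: str, size: int) -> list[dict[str, Any]]:
        return self.collections.get(collection, [])[:size]


class TextPresenter:
    """Minimal presenter that lists a collection's field names."""

    def render(self, findings: dict[str, Any]) -> str:
        return f"{findings['collection']}: {', '.join(findings['fields'])}"
```

Because `Protocol` uses structural typing, `InMemoryData` and `TextPresenter` satisfy the interfaces without inheriting from them, which is what makes the layers easy to swap in tests.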
🛡️ Enterprise Security & Hardening
Unrestricted AI access to a database is dangerous, so the server enforces four safeguards:
- Memory Protection (OOM Safety Ceiling): If an AI hallucinates a `{"$limit": 5000000}` query, the result set can exhaust your RAM. The MCP parses every incoming MQL pipeline; if a limit exceeds 1,000, it hard-caps it at 1,000 and returns a `Limit enforced: custom (max 1000)` flag to the LLM.
- PII Anonymization: The engine scans field names. If it detects `password`, `ssn`, `email`, or `phone`, it replaces the string values with `***MASKED***` before the data reaches the AI's context window.
- Strict Mutation Gating: The server includes write tools (`insert`, `update`, `delete`, `create_index`), but they are disabled by default through the `READ_ONLY=true` flag in `.env`.
- Threaded Concurrency: Reverse-engineering a 5,000-collection ERP database sequentially takes minutes. A `ThreadPoolExecutor` with 10 workers maps architectures of that size in roughly 1.5 seconds.
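The PII Anonymization safeguard can be approximated with a recursive walk over each result document. This is a minimal sketch under assumptions: the function names are hypothetical, field names are matched by case-insensitive substring (which can over-mask, since `classname` contains `ssn`), and anything nested under a sensitive field is masked as well.

```python
from typing import Any

PII_TOKENS = ("password", "ssn", "email", "phone")  # field-name tokens listed in this README
MASK = "***MASKED***"


def is_pii_field(name: str) -> bool:
    """Substring match on the lower-cased field name (a simplifying assumption)."""
    lowered = name.lower()
    return any(token in lowered for token in PII_TOKENS)


def mask_pii(value: Any, sensitive: bool = False) -> Any:
    """Return a masked copy; string values under a PII-looking field become MASK."""
    if isinstance(value, dict):
        # A field is sensitive if its own name or any ancestor's name looks like PII.
        return {key: mask_pii(val, sensitive or is_pii_field(key)) for key, val in value.items()}
    if isinstance(value, list):
        # List items inherit the sensitivity of the field that holds the list.
        return [mask_pii(item, sensitive) for item in value]
    if isinstance(value, str) and sensitive:
        return MASK
    return value  # numbers, booleans, None and non-sensitive strings pass through
```

Only strings are replaced, matching the README's description, so a flag such as `password_set: true` keeps its boolean value. The input document is never modified; a new structure is returned.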
🛠️ The 14-Tool Arsenal
The MCP exposes exactly 14 tools to the LLM, and the provided .cursorrules instructs the AI never to bypass them.
🔍 Core Data Operations
| Tool Name | Description |
|---|---|
| `execute_aggregation_pipeline` | The flagship query engine. Executes complex MQL natively. Automatically capped at 1,000 rows. |
| `execute_find` | Dedicated high-speed query tool supporting projection, sort, and skip. |
| `execute_create_index` | Lets the AI act as a DBA and create compound indexes to fix slow queries. (Blocked unless `READ_ONLY=false`.) |
| `execute_insert`, `update`, `delete` | Document mutation tools. (Blocked unless `READ_ONLY=false`.) |
| `execute_drop_collection`, `database` | High-risk structural deletion tools. (Blocked unless `READ_ONLY=false`.) |
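The 1,000-row cap on `execute_aggregation_pipeline` can be sketched as a pure function that rewrites the pipeline before it is sent to MongoDB. This is an illustration under assumptions, not the server's parser: `cap_pipeline` is a hypothetical name, only top-level stages are inspected (`$facet` and `$lookup` sub-pipelines are not), and `$out`/`$merge` are rejected because they write data and must be the last stage, so a trailing `$limit` could not follow them.

```python
from copy import deepcopy
from typing import Any

MAX_ROWS = 1000  # ceiling described in this README
WRITE_STAGES = {"$out", "$merge"}  # stages that write data and must come last


def cap_pipeline(
    pipeline: list[dict[str, Any]], max_rows: int = MAX_ROWS
) -> tuple[list[dict[str, Any]], bool]:
    """Return a capped copy of the pipeline and whether an oversized $limit was reduced."""
    if any(key in WRITE_STAGES for stage in pipeline for key in stage):
        # Assumption: a read-only query tool refuses write stages outright.
        raise ValueError("$out/$merge stages are not allowed in read-only queries")
    capped = deepcopy(pipeline)  # never mutate the caller's pipeline
    enforced = False
    for stage in capped:  # top-level stages only
        if "$limit" in stage and stage["$limit"] > max_rows:
            stage["$limit"] = max_rows
            enforced = True
    # A trailing $limit also bounds pipelines with no $limit at all, or whose
    # row count grows after an earlier $limit (for example through $unwind).
    if not capped or "$limit" not in capped[-1]:
        capped.append({"$limit": max_rows})
    return capped, enforced
```

The capped pipeline would then go to PyMongo's `collection.aggregate(capped)`, and `enforced` tells the tool when to return the `Limit enforced: custom (max 1000)` flag to the LLM.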
🧠 Autonomous Intelligence
| Tool Name | Description |
|---|---|
| `chat_with_database` | The RAG engine. The AI asks a natural-language question, and the MCP answers it using its cached Knowledge Graph. |
| `explain_collection` | Generates a deep-dive dossier on a single collection's lifecycle, dependencies, and enums. |
| `suggest_test_cases` | Generates strict QA scenarios based on discovered schema anomalies. |
| `full_intelligence_pipeline` | The complete 360-degree audit. Runs all discovery and analysis algorithms simultaneously. |
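Running discovery "simultaneously" across many collections maps naturally onto `concurrent.futures.ThreadPoolExecutor`, the executor the safeguards section mentions. The sketch below assumes a per-collection `profile` callable and isolates failures so one broken collection does not abort the whole audit; `profile_collections` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def profile_collections(
    names: list[str],
    profile: Callable[[str], dict[str, Any]],
    workers: int = 10,
) -> dict[str, dict[str, Any]]:
    """Run one profiling callable per collection on a thread pool, keyed by name."""

    def isolated(name: str) -> dict[str, Any]:
        # Record the error instead of letting pool.map re-raise it.
        try:
            return profile(name)
        except Exception as exc:
            return {"error": repr(exc)}

    # Profiling is dominated by network round trips to MongoDB, so threads overlap
    # the waiting even under the GIL; PyMongo's MongoClient is thread-safe.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip() pairs them correctly.
        return dict(zip(names, pool.map(isolated, names)))
```

With PyMongo, a caller might pass `db.list_collection_names()` as `names` and `lambda name: {"count": db[name].estimated_document_count()}` as `profile`.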
📊 Multi-Modal Output Generators
| Tool Name | Description |
|---|---|
| `generate_dashboard` | Bypasses plain text and generates a premium Vanilla CSS/HTML interactive dashboard. |
| `generate_executive_report` | Compiles database findings into a formatted PDF via ReportLab. |
| `export_demo_package` | Generates a full markdown documentation bundle (architecture, onboarding, DBA recommendations). |
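As a rough idea of what an HTML generator such as `generate_dashboard` involves, this sketch renders query results as an escaped HTML table and writes it to a timestamped file under `outputs/`. It is a hypothetical stand-in: `render_table` and `write_artifact` are invented names, and the real tool's interactive styling is not reproduced.

```python
import html
from datetime import datetime
from pathlib import Path
from typing import Any


def render_table(title: str, rows: list[dict[str, Any]]) -> str:
    """Render rows as a standalone HTML page; every value is HTML-escaped."""
    columns = list(dict.fromkeys(key for row in rows for key in row))  # first-seen order
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return (
        f"<!DOCTYPE html><html><head><meta charset='utf-8'><title>{html.escape(title)}</title>"
        "<style>table{border-collapse:collapse}td,th{border:1px solid #ccc;padding:4px}</style>"
        f"</head><body><h1>{html.escape(title)}</h1><table><tr>{head}</tr>{body}</table></body></html>"
    )


def write_artifact(content: str, out_dir: str = "outputs", stem: str = "dashboard") -> Path:
    """Write content to outputs/<stem>_<timestamp>.html and return the path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{stem}_{datetime.now():%Y%m%d_%H%M%S}.html"
    path.write_text(content, encoding="utf-8")
    return path
```

Escaping matters here because document values come straight from the database; without it, a stored string like `<script>` would execute when the dashboard is opened.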
🚀 Installation & Setup
1. Prerequisites
- Python 3.11+
- A running MongoDB instance (Local or Atlas)
2. Clone & Install
```bash
git clone https://github.com/YOUR_USERNAME/mongodb-intelligence-mcp.git
cd mongodb-intelligence-mcp
pip install -r requirements.txt
```
3. Configuration
Copy the .env.example file to a new .env file:
```bash
cp .env.example .env
```
Open .env and set your variables:
```bash
MONGODB_URI="mongodb://localhost:27017"
# Set to false ONLY if you want the AI to mutate data
READ_ONLY=true
# Use glob patterns to restrict analysis to specific databases (e.g., tenant_*)
DATABASE_FILTER=
```
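The `READ_ONLY` flag can gate the write tools with a small decorator. This is a minimal sketch assuming the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); `is_read_only` and `write_tool` are hypothetical names, and the `execute_insert` body is a placeholder rather than the real tool.

```python
import os
from functools import wraps
from typing import Any, Callable, Mapping


def is_read_only(environ: Mapping[str, str] = os.environ) -> bool:
    """Fail safe: anything other than an explicit false value keeps writes disabled."""
    raw = environ.get("READ_ONLY", "").strip().lower()
    return raw not in {"false", "0", "no", "off"}


def write_tool(func: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Hypothetical decorator that blocks a mutation tool while READ_ONLY is on."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        if is_read_only():  # checked on every call, not cached at import time
            return {"error": f"{func.__name__} is disabled while READ_ONLY is on"}
        return func(*args, **kwargs)

    return wrapper


@write_tool
def execute_insert(collection: str, document: dict[str, Any]) -> dict[str, Any]:
    # Placeholder body; a real tool would call PyMongo's insert_one here.
    return {"inserted": 1, "collection": collection}
```

Treating a missing or misspelled value as read-only means a typo in `.env` leaves the database protected instead of silently enabling writes.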
4. Run the Server
```bash
python server.py
```
Note: The server communicates over stdio by default and waits for an MCP client to connect.
🤖 The .cursorrules (Critical for LLM Usage)
If you are using Cursor, Claude Desktop, or Antigravity, you must use the provided .cursorrules file.
This file acts as a strict "Constitution" for the AI: it instructs the AI never to write slow, dangerous PyMongo scripts and to use the MCP tools instead.
The Auto-Dashboard Rule:
The rules file contains an AUTOMATIC REPORT GENERATION RULE. If you ask the AI:
"Find the top 5 highest-paid employees."
The AI will not just print the JSON. Because of the rules, it will autonomously query the MCP, write a background Python script, and generate a beautiful HTML dashboard without you ever explicitly asking for one.
🏗️ Project Structure
```text
mongodb-intelligence-mcp/
├── server.py           # FastMCP entry point (Registers all 14 tools)
├── .cursorrules        # The AI "Constitution"
├── src/
│   ├── data_access/    # Pluggable Layer (PyMongo adapters)
│   ├── intelligence/   # The Brain (Discovery, Graph Theory, Caching, PII)
│   │   └── analyzers/  # Pluggable Security & Compliance scanners
│   └── presentation/   # HTML/PDF/Markdown generators
└── outputs/            # Timestamped artifacts (gitignored)
```
🤝 Extensibility
Want to add a new security compliance check (e.g., HIPAA scanning)? You don't need to rewrite the server. Drop a new Python module that subclasses BaseAnalyzer into src/intelligence/analyzers/. The MCP will automatically pick it up, run it during discovery, and inject the findings into the LLM context.
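A common way to get this "drop a file in and it is picked up" behaviour is a self-registering base class plus a package scan. The sketch below assumes that approach; it is not the project's actual `BaseAnalyzer`, and the package path, method signature, and HIPAA token list are all hypothetical.

```python
import importlib
import pkgutil
from abc import ABC, abstractmethod
from typing import Any


class BaseAnalyzer(ABC):
    """Hypothetical plugin base; the project's real BaseAnalyzer may differ."""

    registry: list[type["BaseAnalyzer"]] = []

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        BaseAnalyzer.registry.append(cls)  # every subclass registers itself when defined

    @abstractmethod
    def analyze(self, schema: dict[str, dict[str, Any]]) -> list[str]:
        """Return human-readable findings for a {collection: {field: info}} schema."""


def load_analyzer_modules(package: str = "src.intelligence.analyzers") -> None:
    """Import every module in the package (assumed importable) so subclasses register."""
    pkg = importlib.import_module(package)
    for module in pkgutil.iter_modules(pkg.__path__):
        importlib.import_module(f"{package}.{module.name}")


def run_analyzers(schema: dict[str, dict[str, Any]]) -> list[str]:
    """Instantiate every registered analyzer and collect its findings."""
    findings: list[str] = []
    for analyzer_cls in BaseAnalyzer.registry:
        findings.extend(analyzer_cls().analyze(schema))
    return findings


class HipaaAnalyzer(BaseAnalyzer):
    """Example plugin: flags field names that look like protected health information."""

    TOKENS = ("diagnosis", "medical_record", "insurance_id")  # illustrative list only

    def analyze(self, schema: dict[str, dict[str, Any]]) -> list[str]:
        return [
            f"{collection}.{field}: possible PHI"
            for collection, fields in schema.items()
            for field in fields
            if any(token in field.lower() for token in self.TOKENS)
        ]
```

Registration happens as a side effect of importing a module, which is why `load_analyzer_modules` only needs to import each file in the folder; no central list of analyzers has to be edited.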