Super RAG MCP Server
Enables AI agents to perform offline cybersecurity research and penetration testing by querying a local knowledge base of curated security data, with tools for searching, answering, and status checking.
<div align="center">
Super RAG
An offline, agentic AI for penetration testing: your private cybersecurity brain.
Ask pentest questions, run reconnaissance, and exploit-chain over 61,000+ chunks of curated security knowledge, entirely on your own machine. No cloud. No telemetry. No data leaves your box.
</div>
Why Super RAG?
Enterprise security copilots (Microsoft Security Copilot, CrowdStrike Charlotte) live in the cloud, cost a fortune, and are built for defenders. Super RAG flips that:
- Fully offline: runs against a local LLM in LM Studio. Perfect for air-gapped labs, sensitive engagements, and regions with restricted cloud access.
- Built for offensive reasoning: not just "summarize this alert," but "I see Apache 2.4.49, what's my next move?", and it chains recon → vuln-ID → exploitation.
- Grounded in real knowledge: indexes 1,110 hand-curated notes plus HackTricks, PayloadsAllTheThings, the OWASP cheat sheets, and hundreds of CTF write-ups. Every answer is cited back to its source file.
- Plugs into your agents: exposed as an MCP server, so Hermes, OpenClaw, Claude Desktop, or any MCP client gains a cybersec_search tool instantly.
For authorized use only. This is a tool for pentesters, CTF players, and security researchers operating with explicit written permission. See Responsible Use.
Features
| Feature | What it does |
|---|---|
| 13-strategy hybrid retrieval | Dense vectors + full-text BM25, fused with RRF, then multi-hop, corrective, context-aware and re-ranking passes, tuned per pentest phase. |
| Agentic pentest loop | reason → act → observe → reflect, with stuck-loop detection and a hybrid mode (auto-recon, manual approval before exploitation). |
| Evasion awareness | Built-in honeypot, WAF (8 vendors), and firewall detection, so the agent doesn't waste moves on a tarpit. |
| Hallucination guard | A command registry validates every tool invocation against known-good flags before anything runs. |
| Report generation | One command turns findings into a professional pentest report or a HackerOne-style bug-bounty submission, with CVSS and evidence. |
| MCP integration | One shared server, many agents: cybersec_search, cybersec_answer, cybersec_status. |
| Fine-tune ready | Extracts real HTB/VulnHub write-up reasoning into JSONL (never fabricated scenarios). |
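The hallucination guard described in the table can be pictured with a minimal sketch. The registry contents and the `validate_command` name below are assumptions made for illustration; the project's real registry (in config.py and tools/) may be structured differently.

```python
import shlex

# Hypothetical registry: tool name -> flags the agent may use.
# Illustrative only; not the project's actual registry.
ALLOWED_FLAGS = {
    "nmap": {"-sS", "-sV", "-A", "-p", "-Pn"},
    "gobuster": {"-u", "-w", "-t"},
}

def validate_command(command: str) -> tuple[bool, str]:
    """Return (ok, reason) for a shell-style command string."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    tool, args = tokens[0], tokens[1:]
    if tool not in ALLOWED_FLAGS:
        return False, f"unknown tool: {tool}"
    for arg in args:
        # Only tokens that look like flags are checked; values pass through.
        if arg.startswith("-") and arg not in ALLOWED_FLAGS[tool]:
            return False, f"unknown flag for {tool}: {arg}"
    return True, "ok"

print(validate_command("nmap -sS -A 10.10.10.5"))
print(validate_command("nmap --made-up-flag 10.10.10.5"))
```

Rejecting anything outside the registry, rather than trying to repair it, keeps a hallucinated flag from ever reaching a shell.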
Architecture
flowchart LR
subgraph Knowledge["Knowledge (61,552 chunks)"]
V["1,110 curated notes"]
D["HackTricks · PayloadsAllTheThings · OWASP"]
W["CTF / HTB write-ups"]
end
subgraph Engine["Super RAG"]
I["ingest.py<br/>chunk + embed (parallel)"]
Q[("Qdrant<br/>hybrid index")]
R["rag_engine.py<br/>13 RAG strategies"]
end
subgraph Local["LM Studio (localhost:1234)"]
E["nomic-embed-text"]
L["gpt-oss-20b"]
end
A["agent.py<br/>pentest loop"]
M["rag_mcp.py<br/>MCP server :8765"]
V & D & W --> I --> Q
I -.embeddings.-> E
Q --> R --> L
R --> A
R --> M
M --> Hermes & OpenClaw & Claude["Claude Desktop"]
Three tiers, depth over scale: embedded Qdrant (no Docker) for hybrid search → 13 combined RAG strategies for context assembly → a local LLM for generation.
The 13 RAG strategies (combined on every query)
Most projects use #1 and wonder why retrieval is mediocre. Super RAG layers 13, each earning its place in a pentest workflow:
| # | Strategy | What it buys you |
|---|---|---|
| 4 | Hybrid (vector + BM25, RRF-fused) | Semantic recall and exact-string recall for CVE-2024-1086, --no-preauth, SeDebugPrivilege |
| 17 | Multi-Hop | Port 389 → LDAP enum → user list → AS-REP roast → hash → crack, each hop informed by the last |
| 9 | Agentic | The agent decides when and what to retrieve mid-engagement |
| 6 | Memory-Augmented | Remembers what was tried hours ago, so it never re-runs a dead path |
| 3 | Corrective | Detects weak retrieval and re-queries with reformulated terms |
| 8 | Context-Aware | Filters to the target environment (Windows/AD vs web vs cloud) |
| 18 | Reasoning re-rank | Keyword-overlap boost so the most useful chunk floats up, not just the most similar |
| 13 | Adaptive | Broad scope during recon, narrow and precise during exploitation |
| 21 | Hierarchical | General → specific drill-down, mirroring the ATT&CK structure |
| 5 | Speculative | Pre-fetches likely follow-ups in the background to cut latency |
| 11 | Self-RAG | Builds on its own prior answers across a session |
| 24 | Few-Shot | Pulls a real write-up where someone exploited the same service/version |
| 14 | Citation-Aware | Every fact carries its source path, essential for bug-bounty reports |
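The RRF fusion behind the Hybrid strategy can be sketched in a few lines. This is a generic Reciprocal Rank Fusion example, not the code in rag_engine.py; the function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["a.md", "b.md", "c.md"]  # hypothetical vector-search order
bm25 = ["c.md", "a.md", "d.md"]   # hypothetical BM25 order
fused = rrf_fuse([dense, bm25])
print(fused)  # ['a.md', 'c.md', 'b.md', 'd.md']
```

Because RRF uses only ranks, it never has to reconcile cosine similarities with BM25 scores, which live on different scales.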
Model benchmark (16 local models, real pentest scenario)
We benchmarked every model in LM Studio on a 2-turn pentest tool-use scenario (recon → exploit chain), measuring speed, accuracy, and valid tool-loops, and ejecting each model between runs for clean numbers.
| Rank | Model | tok/s | Tool-loops | Verdict |
|---|---|---|---|---|
| 1 | openai/gpt-oss-20b | 165 | 2/2 ✅ | The only model that is fast, accurate, and emits clean tool calls. 12 GB. |
| 2 | qwen3.6-35b reasoning-distilled | 139 | 2/2 ✅ | Correct chains, needs fence-stripping |
| - | gemma-4-26b / glm-4.7-flash | 156 / 125 | 0/2 ❌ | Fast and accurate, but think without emitting usable tool calls |
Key finding: raw "accuracy" is misleading for agents. Several high-scoring models produced empty output because they reasoned internally without ever emitting an actionable tool call. The metric that matters is valid tool-loops, and gpt-oss-20b wins it. Reproduce with python model_benchmark.py.
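To make "valid tool-loops" and "fence-stripping" concrete, here is a minimal sketch of how a harness might decide whether a reply is actionable. The JSON shape with "tool" and "args" keys and all function names are assumptions; model_benchmark.py may score replies differently.

```python
import json
import re

TICKS = "`" * 3  # a Markdown code-fence delimiter
# One fence wrapping the whole reply: opening line (optional language tag),
# body, closing delimiter.
_FENCED = re.compile(
    r"^\s*" + TICKS + r"[\w-]*[ \t]*\n(.*?)\n?[ \t]*" + TICKS + r"\s*$",
    re.DOTALL,
)

def strip_fences(text: str) -> str:
    """Remove one surrounding Markdown code fence, if present."""
    match = _FENCED.match(text)
    return match.group(1) if match else text.strip()

def parse_tool_call(reply: str):
    """Return the tool call as a dict, or None if the reply is not actionable.

    Assumed (hypothetical) format: a JSON object with "tool" and "args" keys.
    """
    try:
        call = json.loads(strip_fences(reply))
    except json.JSONDecodeError:
        return None
    if (isinstance(call, dict) and isinstance(call.get("tool"), str)
            and isinstance(call.get("args"), dict)):
        return call
    return None

def valid_tool_loops(replies) -> int:
    """Count replies that yield a usable tool call."""
    return sum(parse_tool_call(r) is not None for r in replies)
```

Under this scoring, a model that reasons correctly but returns an empty or prose-only reply counts as zero, which is the failure mode the key finding describes.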
Quick start
Prerequisites
- LM Studio on localhost:1234 (local server enabled) with nomic-embed-text-v1.5 (embeddings) + any chat model loaded; bring your own local model
- Python 3.10+ (tested on 3.14.5)
- A GPU is recommended (built on an RTX 5090; the embedder + a 12 GB chat model fit in 24 GB VRAM, but smaller models work too)
Batteries included: 1,110 curated notes ship in ./vault, so it works the moment you clone. setup.py then pulls in the public doc corpora (HackTricks, PayloadsAllTheThings, OWASP, CTF write-ups) for the full ~60k-chunk brain.
# 1. Clone
git clone https://github.com/test-center-ai/super-rag.git
cd super-rag
# 2. Install (no torch, no Docker, no HuggingFace needed)
pip install -r requirements.txt
# 3. Bootstrap: clone the public doc repos + check LM Studio
python setup.py # or: python setup.py --minimal (bundled notes only)
# 4. Build the index (resumable; minutes with parallel embedding)
python main.py ingest
# 5. Ask anything
python main.py query "how do I exploit Apache 2.4.49 path traversal"
python main.py query "AS-REP roasting: what tool and command?"
# 6. Run the agentic pentest loop (hybrid: auto-recon, manual exploit approval)
python main.py pentest 10.10.10.5 --scope 10.10.10.0/24
# 7. Generate a report, or check health
python main.py report
python main.py status
Use your own notes instead? Point it anywhere: SUPERRAG_VAULT=/path/to/your/notes python main.py ingest. Any folder of Markdown works.
Pick a model: set CHAT_MODEL in config.py. Our 16-model benchmark crowned openai/gpt-oss-20b for agentic tool-use; google/gemma-4-12b-qat is a great lighter pick for plain Q&A.
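Step 6 passes a --scope CIDR and runs in hybrid mode (auto-recon, manual exploit approval). A minimal sketch of what such gates could look like follows, using only the standard library. The function names and the phase labels are hypothetical and not taken from agent.py.

```python
import ipaddress

def in_scope(target: str, scope: str) -> bool:
    """True if the target IP falls inside the scope CIDR."""
    try:
        return ipaddress.ip_address(target) in ipaddress.ip_network(scope, strict=False)
    except ValueError:  # malformed IP or CIDR
        return False

def may_run(phase: str, target: str, scope: str, approve=input) -> bool:
    """Hybrid-mode gate: recon runs automatically, exploitation needs a typed 'y'."""
    if not in_scope(target, scope):
        return False
    if phase != "exploit":
        return True
    answer = approve(f"Run exploit step against {target}? [y/N] ")
    return answer.strip().lower() == "y"

print(in_scope("10.10.10.5", "10.10.10.0/24"))  # True
print(in_scope("10.10.11.5", "10.10.10.0/24"))  # False
```

Defaulting to "no" on anything other than an explicit "y" matches the [y/N] prompt convention, and passing `approve` as a parameter keeps the gate testable without a terminal.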
Use it from your AI agents (MCP)
Super RAG runs as one shared MCP server (rag_mcp.py, HTTP @ 127.0.0.1:8765/mcp) so multiple agents can query it concurrently:
python rag_mcp.py # or let Startup\SuperRAG-MCP.cmd auto-start it
Tools exposed: cybersec_search(query, phase) · cybersec_answer(question) · cybersec_status()
Register it in any MCP client:
// Claude Desktop / OpenClaw style
"mcp": { "servers": { "cybersec-rag": {
"url": "http://127.0.0.1:8765/mcp", "transport": "streamable-http"
}}}
# Hermes style (config.yaml)
mcp_servers:
cybersec-rag: { url: http://127.0.0.1:8765/mcp, enabled: true }
See INTEGRATION.md for the full Hermes + OpenClaw walkthrough.
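For readers curious what a client sends over the wire, the sketch below builds the JSON-RPC 2.0 body of an MCP tools/call request for cybersec_search. It only constructs the payload. A real client must first complete the MCP initialize handshake and set the headers the streamable-HTTP transport expects, which an MCP client library normally handles. The "recon" phase value is an assumption; the accepted values are not listed here.

```python
import json

MCP_URL = "http://127.0.0.1:8765/mcp"  # endpoint given above

def tools_call_request(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

body = tools_call_request(
    "cybersec_search",
    # "recon" is an assumed phase value, used for illustration only.
    {"query": "Apache 2.4.49 path traversal", "phase": "recon"},
)
print(body)
```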
It works: real agent output
Scenario: black-box target, nmap reveals Apache httpd 2.4.49.
STEP 1 recon → nmap -sS -A 10.10.10.5 → correct first move (2.9s)
STEP 2 enumerate → gobuster on :80 (Apache live) → methodical (3.3s)
STEP 3 exploit → curl --path-as-is "…/cgi-bin/.%2e/…/bin/bash" -d 'reverse shell'
→ textbook CVE-2021-41773 mod_cgi RCE (4.9s)
sources: htb-cpts/initial-access-exploitation.md, oswe/file-inclusion-upload.md, HackTricks
The model followed correct methodology, grounded each step in the vault, and produced a working exploit chain, fully offline.
Project layout
super-rag/
├── main.py # CLI: ingest · query · pentest · report · extract · status
├── config.py # paths, model IDs, chunking, timeouts, tool registry
├── ingest.py # vault → chunks → parallel embed → Qdrant (~14× faster pipeline)
├── rag_engine.py # the 13 RAG strategies + RRF fusion
├── agent.py # ReAct+Reflect pentest loop, scope + approval gates
├── detector.py # honeypot / WAF / firewall detection
├── memory.py # attack-surface graph, findings, stuck-loop tracking
├── report.py # pentest report + bug-bounty submission generators
├── llm.py # one streaming chat helper (works for every model)
├── rag_mcp.py # MCP server for agent integration
├── model_benchmark.py # the 16-model benchmark harness
├── extract_training.py # real write-ups → fine-tuning JSONL
└── tools/ # registry + nmap/gobuster/ffuf parsers
Tech stack
Python 3.14 · Qdrant (embedded) · LM Studio (OpenAI-compatible local API) · nomic-embed-text-v1.5 · gpt-oss-20b · MCP / FastMCP. Zero cloud dependencies.
Responsible use
- Authorized targets only. Super RAG does not enforce authorization; that is your legal responsibility. Use it on systems you own or have explicit written permission to test (engagements, CTFs, labs).
- Hybrid mode gates exploitation behind a manual [y/N] approval. Don't bypass it.
- Indexed external repos are reference-only and untrusted. Never execute code pulled from them.
- This project is for defensive learning, authorized testing, and CTF/education. Don't be a criminal.
Roadmap
- [ ] Qdrant server mode (Docker) for fully-concurrent multi-agent access
- [ ] GRPO fine-tune of a 7–14B specialist on extracted real write-ups
- [ ] Web UI (the CLI works today)
- [ ] Auto-scoping from engagement rules-of-engagement files
Contributing
Issues and PRs welcome, especially new tool parsers, RAG strategies, and detector signatures. Keep it defensive, keep it cited.
License
MIT, see LICENSE.
<div align="center"> <sub>Built for the offline, the air-gapped, and the curious. Star it if it helped.</sub> </div>