OmniDocs RAG
Enables AI agents to index and search local files, websites, GitHub repos, and packages using hybrid retrieval with reranking, all through IDE chat.
<div align="center"> <h1>🧠 OmniDocs RAG</h1>
<img src="image.jpg" alt="Markdown RAG MCP Server" width="100%"> <br/><br/>
<strong>Universal High-Performance RAG Knowledge Base for AI Agents</strong> <br/> <em>Index local files, websites, GitHub repos, and npm/PyPI packages, then search them with hybrid AI-powered retrieval. All through your IDE chat.</em> <br/><br/>
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License"></a> <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg" alt="Python 3.10+"></a> <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-Server-brightgreen.svg" alt="MCP Server"></a> <a href="https://trychroma.com/"><img src="https://img.shields.io/badge/Vector_DB-Chroma-orange.svg" alt="Chroma DB"></a> <a href="https://huggingface.co/BAAI/bge-m3"><img src="https://img.shields.io/badge/🤗_Model-BGE--M3-yellow.svg" alt="HuggingFace"></a> <br/><br/>
<a href="#-features">Features</a> • <a href="#️-architecture">Architecture</a> • <a href="#-quickstart">Quickstart</a> • <a href="#-mcp-tools">Tools</a> • <a href="#-faq">FAQ</a>
</div>
✨ Features
Search Pipeline
- Hybrid Search: ChromaDB vector + BM25 keyword scoring fused via Reciprocal Rank Fusion (RRF, k=60)
- Cross-Encoder Reranking: `BAAI/bge-reranker-v2-m3` rescores top candidates for surgical precision
- Query Expansion: RU/EN synonym expansion for mixed-language documentation (e.g. "создать" → "create post make new")
- Result Deduplication: removes near-duplicate chunks (>80% similarity threshold)
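As a rough illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion with k=60. The function name `rrf_fuse` and the chunk IDs are hypothetical and not taken from the OmniDocs code. Each document scores the sum of 1 / (k + rank) over every ranking it appears in.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
bm25_hits = ["chunk_b", "chunk_d", "chunk_a"]
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

A chunk that ranks well in both lists (`chunk_b` here) beats a chunk that tops only one. This is why RRF suits combining scores from engines whose raw values are not comparable.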
Universal Source Ingestion
- 40+ file formats: `.md`, `.py`, `.js`, `.ts`, `.json`, `.yaml`, `.html`, `.csv`, and more
- Binary documents: PDF, DOCX, XLSX, PPTX, Jupyter Notebooks (optional packages)
- Websites: full async BFS crawler with boundary control, robots.txt, sitemap.xml
- GitHub repositories: direct API tree walking
- npm / PyPI packages: metadata + README extraction
- ZIP archives: automatic extraction and indexing
- JS-rendered sites: optional Playwright support for Docusaurus, GitBook, VitePress
⚡ Performance
- GPU acceleration: auto-detects CUDA (~11x faster indexing on an RTX 3080)
- Incremental indexing: MD5 file hashing, only re-indexes changed files
- Code-aware chunking: Python files split by class/function via AST, JS/TS via regex
- Heading-aware chunking: Markdown split at `##`/`###` with 2-sentence overlap
- BM25 persistence: survives server restarts via pickle cache
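To make the code-aware chunking idea concrete, here is a minimal sketch of splitting a Python file into one chunk per top-level class or function with the standard `ast` module. The function name `chunk_python_source` and the chunk dictionary layout are assumptions for illustration. The actual chunker may handle nesting, decorators, and module-level code differently.

```python
import ast


def chunk_python_source(source):
    """Split Python source into one chunk per top-level class or function."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text of the node
            # (decorators, if any, are not included in the segment).
            text = ast.get_source_segment(source, node)
            chunks.append({"name": node.name, "kind": type(node).__name__, "text": text})
    return chunks


sample = '''
import os

def load(path):
    return open(path).read()

class Index:
    def add(self, doc):
        pass
'''

chunks = chunk_python_source(sample)
for chunk in chunks:
    print(chunk["kind"], chunk["name"])
```

Splitting on syntax boundaries keeps each function or class intact inside a single chunk, which a fixed-size character window cannot guarantee.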
🛠️ Management
- Multi-collection: separate knowledge bases per project
- Auto-categorization: YAML frontmatter → H1 heading → filename fallback
- File Watcher: auto-reindex on filesystem changes (watchdog, 2s debounce)
- Admin tools: list, remove, delete, reindex, all through chat
- 100% Local & Free: no API keys, no Docker, no monthly fees
Architecture
```mermaid
flowchart TD
    Q(["User Query"])
    Q --> QE
    subgraph EXPAND["Stage 0: Query Expansion"]
        QE["RU/EN Synonyms<br>(up to 3 variations)"]
    end
    QE --> VS
    QE --> BM
    subgraph HYBRID["⚡ Stage 1: Hybrid Retrieval"]
        VS["🧠 Vector Search<br>(ChromaDB + bge-m3)"]
        BM["Keyword Search<br>(BM25 Okapi)"]
    end
    VS --> RRF
    BM --> RRF
    subgraph FUSION["Stage 2: Fusion"]
        RRF["Reciprocal Rank Fusion<br>(k=60)"]
    end
    RRF --> CE
    subgraph RERANK["🎯 Stage 3: Reranking"]
        CE["Cross-Encoder<br>bge-reranker-v2-m3"]
    end
    CE --> DD
    subgraph DEDUP["🧹 Stage 4: Dedup"]
        DD["Remove >80%<br>similar results"]
    end
    DD --> OUT
    subgraph OUTPUT["Stage 5: Result"]
        OUT["Top-N Documents<br>with Breadcrumbs"]
    end
    style Q fill:#6366f1,color:#fff,stroke:#4338ca
    style EXPAND fill:#0f172a,color:#e2e8f0,stroke:#334155
    style HYBRID fill:#0f172a,color:#e2e8f0,stroke:#334155
    style FUSION fill:#0f172a,color:#e2e8f0,stroke:#334155
    style RERANK fill:#0f172a,color:#e2e8f0,stroke:#334155
    style DEDUP fill:#0f172a,color:#e2e8f0,stroke:#334155
    style OUTPUT fill:#0f172a,color:#e2e8f0,stroke:#334155
    style QE fill:#581c87,color:#e9d5ff,stroke:#9333ea
    style VS fill:#1e40af,color:#bfdbfe,stroke:#3b82f6
    style BM fill:#065f46,color:#a7f3d0,stroke:#10b981
    style RRF fill:#7c3aed,color:#ede9fe,stroke:#8b5cf6
    style CE fill:#b45309,color:#fef3c7,stroke:#f59e0b
    style DD fill:#0e7490,color:#cffafe,stroke:#06b6d4
    style OUT fill:#1e3a5f,color:#bae6fd,stroke:#38bdf8
```
How It Works
- Query Expansion: generates up to 3 query variations with RU/EN programming synonyms
- Hybrid Retrieval: searches by semantic meaning (`bge-m3`, 8192 tokens) and exact keywords (BM25) simultaneously
- Reciprocal Rank Fusion: merges the two ranked lists by summing 1 / (k + rank) for each document across both engines (k=60)
- Cross-Encoder Reranking: `bge-reranker-v2-m3` scores each top candidate jointly with the query for a finer relevance estimate
- Deduplication: removes near-identical results (>80% similarity)
- Structured Output: results returned with breadcrumbs (e.g., `README.md > Quickstart > Installation`)
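The deduplication step can be sketched as a greedy filter over the reranked results. The README does not say how similarity is measured, so this example assumes character-level similarity from the standard library's `difflib.SequenceMatcher`. The function name `dedupe` and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def dedupe(results, threshold=0.8):
    """Keep results in rank order, dropping any that is too similar to a kept one."""
    kept = []
    for text in results:
        # A result survives only if it is at most `threshold` similar
        # to every result that was already kept.
        if all(SequenceMatcher(None, text, k).ratio() <= threshold for k in kept):
            kept.append(text)
    return kept


# Hypothetical reranked chunks, best first.
ranked = [
    "Run python install.py to install the server.",
    "Run python install.py to install the server!",
    "Set RAG_DEVICE=cuda to enable GPU acceleration.",
]
unique = dedupe(ranked)
print(unique)
```

Because the list is processed in rank order, the highest-ranked copy of each near-duplicate group is the one that survives.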
Quickstart
1. Prerequisites
- Python 3.10+ (tested on 3.13)
- `git`
2. Install
```bash
git clone https://github.com/ElvinBayramov/OmniDocs-RAG.git
cd OmniDocs-RAG
python install.py
```
The installer downloads AI models (~2.2GB for bge-m3 + bge-reranker), installs dependencies, and auto-configures your IDE.
⚠️ Enable GPU Acceleration (Crucial for Speed): By default, `pip` may install the CPU-only version of PyTorch on Windows. To unlock your NVIDIA GPU, run:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 --upgrade --force-reinstall
```
3. Configure (if manual)
Add to your IDE's MCP config (mcp_config.json):
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "python",
      "args": ["C:\\path\\to\\OmniDocs-RAG\\server.py"],
      "env": {
        "RAG_DOCS_PATH": "C:\\path\\to\\your\\docs",
        "RAG_DEVICE": "cuda"
      }
    }
  }
}
```
4. Use
Just talk to your AI assistant:
> Index my project docs
> Search: how does authentication work?
> Index the FastAPI documentation from https://fastapi.tiangolo.com
> Add the langchain repo: github://langchain-ai/langchain/docs
The AI calls the MCP tools automatically: no UI, no buttons, just chat.
🛠️ MCP Tools
| Tool | Description |
|---|---|
| `index_documents(path, collection)` | Index local files (40+ formats, incremental) |
| `index_url(uri, collection, ...)` | Index websites, GitHub, npm, PyPI, ZIP |
| `search_docs(query, n, category, filename, collection)` | Hybrid search with reranking |
| `rag_status(collection)` | Full system status: models, GPU, BM25, chunks |
| `list_collections()` | List all knowledge base collections |
| `list_indexed_files(collection)` | List files in a collection |
| `remove_source(filename, collection)` | Remove a file from the index |
| `delete_collection(name, confirm)` | Delete an entire collection |
| `reindex_collection(path, collection)` | Force full rebuild |
`index_url()`: Universal Source Ingestion
```python
# Websites (async BFS crawler)
index_url("https://docs.python.org/3/library/asyncio.html")
# GitHub repositories
index_url("github://tiangolo/fastapi/docs")
# npm packages
index_url("npm://axios@1.6")
# PyPI packages
index_url("pypi://fastapi")
# ZIP archives
index_url("file:///path/to/docs.zip")
```
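One way a tool like `index_url()` could route these URIs is by scheme. The sketch below is an assumption about the dispatch logic, not the server's actual code. `classify_source` and the returned dictionary keys are hypothetical names, and scoped npm packages such as `@scope/name` are not handled.

```python
from urllib.parse import urlparse


def classify_source(uri):
    """Map a source URI to a (kind, details) pair based on its scheme."""
    parsed = urlparse(uri)
    scheme = parsed.scheme
    if scheme in ("http", "https"):
        return "website", {"url": uri}
    if scheme == "github":
        # github://owner/repo/optional/sub/path
        repo, _, subpath = parsed.path.lstrip("/").partition("/")
        return "github", {"owner": parsed.netloc, "repo": repo, "path": subpath}
    if scheme in ("npm", "pypi"):
        # npm://name@version or pypi://name (the version part is optional)
        name, _, version = (parsed.netloc + parsed.path).partition("@")
        return scheme, {"name": name, "version": version or None}
    if scheme == "file":
        return "archive", {"path": parsed.path}
    raise ValueError(f"unsupported source URI: {uri}")


for example in (
    "https://docs.python.org/3/library/asyncio.html",
    "github://tiangolo/fastapi/docs",
    "npm://axios@1.6",
    "pypi://fastapi",
    "file:///path/to/docs.zip",
):
    print(classify_source(example))
```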
⚙️ Configuration
All settings via environment variables:
| Variable | Default | Description |
|---|---|---|
| `RAG_DOCS_PATH` | parent directory | Folder to scan for files |
| `RAG_DB_PATH` | `./chroma_db` | ChromaDB storage location |
| `RAG_DEVICE` | `auto` | `cuda` / `cpu` / `auto` |
| `RAG_EMBED_MODEL` | `BAAI/bge-m3` | Embedding model |
| `RAG_RERANK_MODEL` | `BAAI/bge-reranker-v2-m3` | Cross-Encoder model |
| `RAG_WATCH_PATH` | (none) | Directory to watch for auto-reindex |
| `RAG_WATCH_COLLECTION` | `docs_v4` | Collection for file watcher |
| `GITHUB_TOKEN` | (none) | GitHub API token (higher rate limits) |
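For readers scripting around the server, the sketch below shows one way to read these variables with the documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical and are not part of OmniDocs. `RAG_DOCS_PATH` and `GITHUB_TOKEN` are left out because their defaults depend on the installation.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    db_path: str
    device: str
    embed_model: str
    rerank_model: str
    watch_path: Optional[str]
    watch_collection: str


def load_settings(env=os.environ):
    """Read the RAG_* variables from a mapping, falling back to the documented defaults."""
    device = env.get("RAG_DEVICE", "auto").lower()
    if device not in ("auto", "cuda", "cpu"):
        raise ValueError(f"RAG_DEVICE must be auto, cuda, or cpu, got {device!r}")
    return Settings(
        db_path=env.get("RAG_DB_PATH", "./chroma_db"),
        device=device,
        embed_model=env.get("RAG_EMBED_MODEL", "BAAI/bge-m3"),
        rerank_model=env.get("RAG_RERANK_MODEL", "BAAI/bge-reranker-v2-m3"),
        # An empty or missing value means the file watcher stays off.
        watch_path=env.get("RAG_WATCH_PATH") or None,
        watch_collection=env.get("RAG_WATCH_COLLECTION", "docs_v4"),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_settings({"RAG_DEVICE": "cuda"})
print(settings)
```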
Supported Formats
Text (no extra deps):
.md .txt .rst .log .html .htm
Code (wrapped in markdown):
.py .js .ts .jsx .tsx .css .java .go .rs .c .cpp .rb .php .swift .kt .lua .sh
Config:
.json .yaml .yml .toml .xml .csv .ini .cfg
Binary (optional packages):
| Format | Install |
|---|---|
| PDF (.pdf) | `pip install pypdf` |
| Word (.docx) | `pip install python-docx` |
| Excel (.xlsx) | `pip install openpyxl` |
| PowerPoint (.pptx) | `pip install python-pptx` |
| Jupyter (.ipynb) | built-in |
🏷️ Auto-Categorization
Every file gets a category automatically (no manual tagging needed):
| Priority | Source | Example |
|---|---|---|
| 1 | YAML frontmatter `category:` | `category: architecture` → `architecture` |
| 2 | First `# Heading` in the file | `# API Reference` → `api reference` |
| 3 | Filename stem | `system_design.md` → `system design` |
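The priority order in this table can be sketched as a three-step fallback. This is an illustrative approximation that uses regular expressions rather than a YAML parser. The function name `categorize` is hypothetical, and the real implementation may normalize categories differently.

```python
import re
from pathlib import Path


def categorize(filename, text):
    """Return a category from frontmatter, then the first H1, then the filename stem."""
    # 1. YAML frontmatter: a leading block delimited by '---' lines.
    frontmatter = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", text, re.DOTALL)
    if frontmatter:
        category = re.search(r"^category:\s*(.+?)\s*$", frontmatter.group(1), re.MULTILINE)
        if category:
            return category.group(1).strip("\"'").lower()
    # 2. First level-1 Markdown heading ('# Title', but not '## Title').
    heading = re.search(r"^#[ \t]+(.+?)\s*$", text, re.MULTILINE)
    if heading:
        return heading.group(1).lower()
    # 3. Filename stem with underscores and hyphens turned into spaces.
    return re.sub(r"[_\-]+", " ", Path(filename).stem).lower()


print(categorize("notes.md", "---\ncategory: architecture\n---\n# Title\n"))
print(categorize("api.md", "# API Reference\n\nSome text."))
print(categorize("system_design.md", "No headings here."))
```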
FAQ
Q: Does this send my data anywhere?
A: No. 100% local. Models download once from HuggingFace, then everything runs offline. No API keys, no cloud.
Q: Do I need a GPU?
A: No, but it helps. CPU works fine for search (~200ms). GPU (CUDA) accelerates indexing ~11x. Set RAG_DEVICE=cuda.
Q: How do I update the index?
A: The server uses incremental indexing: only changed files are re-indexed. Just call `index_documents()` again, or enable the file watcher with `RAG_WATCH_PATH`.
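As an illustration of how hash-based incremental indexing can work, the sketch below compares each file's MD5 digest against a stored manifest and reports only the files that changed. The JSON manifest format and the function names `file_md5` and `find_changed` are assumptions. The server's own cache layout is not documented here.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path):
    """Hash a file in 64 KiB blocks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def find_changed(root, manifest_path):
    """Return files under root that are new or modified, then update the manifest."""
    manifest_file = Path(manifest_path)
    old = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    new, changed = {}, []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path != manifest_file:
            digest = file_md5(path)
            new[str(path)] = digest
            # A missing entry (new file) or a different digest (edited file) needs re-indexing.
            if old.get(str(path)) != digest:
                changed.append(str(path))
    manifest_file.write_text(json.dumps(new, indent=2))
    return changed
```

Calling `find_changed` twice in a row without touching any file returns an empty list the second time. This is the behavior that makes repeated `index_documents()` calls cheap.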
Q: Why is the first search slow?
A: The Cross-Encoder (~1.1GB) loads lazily on first query. Subsequent searches reuse the loaded model and skip that delay.
Q: Does it support my language?
A: Yes. bge-m3 supports 100+ languages including English, Russian, Chinese, Japanese, Spanish, and more. Cross-language search works too (query in Russian, find results in English).
Q: Can I have separate knowledge bases per project?
A: Yes. Use the collection parameter: index_documents(path, collection="my-project"), then search_docs(query, collection="my-project").
License
Licensed under the Apache License 2.0. See LICENSE for details.