RAG MCP Gateway
A smart proxy that aggregates multiple downstream MCP servers and enables natural language search for tools, allowing clients like Claude Desktop to discover and use tools through semantic queries.
A smart proxy server for the Model Context Protocol (MCP) that aggregates multiple downstream MCP servers and provides Natural Language Search capabilities over their tools.
The gateway acts as a single entry point for an MCP client (like Claude Desktop or an Agent), allowing it to discover and use tools from a wide array of connected servers using semantic queries instead of exact name matching.
Architecture
The system is built on a modular "Gateway" architecture designed for high discoverability and robust connection management.
graph TD
    Client[MCP Client] <-->|Stdio| Gateway[RAG MCP Gateway]
    subgraph "Internal Components"
        Gateway --> ConnectionManager
        Gateway --> Indexer
        Gateway --> Retriever
        subgraph "Indexing Pipeline"
            Indexer --> Discovery[Tool Discovery]
            Discovery --> Enrichment[LLM Enrichment]
            Enrichment --> Embedding[Vector Embedding]
            Embedding --> Orama[(Orama DB)]
            Enrichment --> Gemini[Google Gemini API]
        end
        subgraph "Retrieval Pipeline"
            Retriever --> Search[Parallel Dense/Sparse Search]
            Search --> RRF[RRF Fusion]
            RRF --> Rerank[Cross-Encoder Reranking]
            Rerank --> Model[Transformers.js]
            Search --> Orama
        end
    end
    subgraph "Downstream Servers"
        ConnectionManager <-->|Stdio| ServerA[Local Process]
        ConnectionManager <-->|SSE / HTTP| ServerB[Remote Server]
        ConnectionManager <-->|Docker| ServerC[Containerized Tool]
    end
Key Components
- Connection Manager: Handles persistent connections to multiple downstream MCP servers.
  - Transports: Supports Stdio, SSE, and Streamable-HTTP.
  - Docker Integration: Can manage the lifecycle of Docker-based servers, including automatic container cleanup (stop and rm) before startup to avoid name conflicts.
- Indexer: Synchronizes the local index with downstream servers.
  - Tool Discovery: Polls listTools from all clients.
  - Enrichment: Uses Google Gemini to generate human-readable summaries and potential search questions for tools, which improves search accuracy for sparsely documented tools.
  - Smart Sync: Only re-indexes tools whose name, description, or schema has changed.
- Vector Store (Orama): A high-performance, in-memory JavaScript vector database that persists to JSON. It handles both vector (dense) and full-text (sparse) indexing.
- Retriever: Implements a three-stage search pipeline:
  - Hybrid Search: Executes vector search and keyword search in parallel.
  - RRF Fusion: Combines the two result lists using Reciprocal Rank Fusion to balance semantic and exact matches.
  - Reranking: A second-stage Cross-Encoder (HuggingFace model via Transformers.js) reranks candidates against each tool's technical schema, so the most relevant tool is ranked first.
- LLM Service: Provides the generative bridge for metadata enrichment, so that even minimally documented tools are discoverable via natural language queries.
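The RRF Fusion step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's actual code: the function name reciprocalRankFusion, the constant k = 60 (a commonly used default for RRF), and the tool names are all hypothetical.

```typescript
// Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) over every
// ranking it appears in, with rank starting at 1.
// k = 60 is a common default, not a value confirmed by this project.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fusedList: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => fusedList.push({ id, score }));
  // Highest fused score first.
  return fusedList.sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the dense and sparse retrievers.
const denseRanking = ["weather_get_current", "weather_get_forecast", "fs_read_file"];
const sparseRanking = ["weather_get_current", "geo_lookup", "weather_get_forecast"];
const fused = reciprocalRankFusion([denseRanking, sparseRanking]);
```

A tool that ranks well in both lists (here weather_get_current) rises to the top, while a tool found by only one retriever still stays in the candidate set for the reranker.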
Prerequisites
- Node.js: v18 or higher
- npm: v9 or higher
- Gemini API Key (Optional but Recommended): Used to generate better tool descriptions and search queries. Keys are available from Google AI Studio.
Installation
- Clone the repository:
  git clone <repository-url>
  cd rag-mcp
- Install dependencies:
  npm install
- Build the project:
  npm run build
Configuration
The gateway is configured using a config.json file in the root directory. You can copy the example file to start:
cp config.example.json config.json
config.json Structure
Define your downstream servers in the mcpServers object:
{
  "mcpServers": {
    "weather": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-weather"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./allowed-dir"]
    },
    "remote-server": {
      "transport": "sse",
      "url": "http://localhost:3000/sse"
    }
  }
}
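The shape of this file can be expressed as TypeScript types. The sketch below is inferred only from the example above; the type names and the validateConfig helper are hypothetical, and the gateway's real schema may accept more fields or other transport values.

```typescript
// Hypothetical types inferred from the example config; not the gateway's own definitions.
interface StdioServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>; // per-server env object, mentioned under Security & Best Practices
}

interface RemoteServerConfig {
  transport: string; // "sse" in the example; other accepted values are not documented here
  url: string;
}

type ServerConfig = StdioServerConfig | RemoteServerConfig;

interface GatewayConfig {
  mcpServers: Record<string, ServerConfig>;
}

function isStdioServer(entry: ServerConfig): entry is StdioServerConfig {
  return typeof (entry as StdioServerConfig).command === "string";
}

// Returns human-readable problems; an empty array means every entry
// has either a command (local process) or a url (remote server).
function validateConfig(config: GatewayConfig): string[] {
  const problems: string[] = [];
  for (const id of Object.keys(config.mcpServers)) {
    const entry = config.mcpServers[id];
    if (isStdioServer(entry)) continue;
    if (typeof entry.url !== "string" || entry.url.length === 0) {
      problems.push(`${id}: expected either "command" or "url"`);
    }
  }
  return problems;
}

const exampleConfig: GatewayConfig = {
  mcpServers: {
    weather: { command: "npx", args: ["-y", "@modelcontextprotocol/server-weather"] },
    "remote-server": { transport: "sse", url: "http://localhost:3000/sse" },
  },
};
const configProblems = validateConfig(exampleConfig);
```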
Environment Variables
You can configure the gateway using the following environment variables. These can be set in your OS or passed via the env object in your Claude Desktop configuration.
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Required for Enrichment. API Key for Google Generative AI. | - |
| RAG_MCP_BASE_DIR | Base directory for all relative paths. | process.cwd() |
| RAG_MCP_CONFIG_PATH | Path to the downstream servers config file. | BASE_DIR/config.json |
| RAG_MCP_DB_PATH | Path to the Orama persistence folder. | BASE_DIR/data/orama_db |
| RAG_MCP_LOG_PATH | Path to the debug log file. | BASE_DIR/rag-mcp.log |
| RAG_MCP_LOGGING_ENABLED | Set to true to enable debug logging to the log file. | false |
| RAG_MCP_REBUILD_INDEX | Set to true to force a full re-index on every startup. | false |
| RAG_MCP_SEARCH_THRESHOLD | Minimum relevance score (0.0 to 1.0) for search results. | 0.85 |
| RAG_MCP_EMBEDDING_MODEL | Required if Dense enabled. Transformers.js model for generating vector embeddings. | - |
| RAG_MCP_RERANKING_MODEL | Required if Reranker enabled. Transformers.js model for second-stage reranking. | - |
| RAG_MCP_GENERATIVE_MODEL | Required if LLM enabled. Google Gemini model for tool enrichment. | - |
| RAG_MCP_ENABLE_LLM | Enable LLM enrichment (summaries and questions) during indexing. | false |
| RAG_MCP_ENABLE_DENSE | Enable semantic vector search (Dense retrieval). | true |
| RAG_MCP_ENABLE_SPARSE | Enable full-text keyword search (Sparse retrieval). | true |
| RAG_MCP_ENABLE_RERANKER | Enable the cross-encoder reranking stage. | true |
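To make the defaults in the table concrete, here is a minimal sketch of how such variables could be resolved. It is an illustration, not the gateway's loader: the GatewaySettings type, the loadSettings and parseFlag helpers, and the rule that only the string true (case-insensitive) enables a flag are assumptions. Paths are joined with a plain forward slash to keep the sketch dependency-free.

```typescript
type EnvVars = Record<string, string | undefined>;

interface GatewaySettings {
  baseDir: string;
  configPath: string;
  dbPath: string;
  logPath: string;
  loggingEnabled: boolean;
  rebuildIndex: boolean;
  searchThreshold: number;
  enableLlm: boolean;
  enableDense: boolean;
  enableSparse: boolean;
  enableReranker: boolean;
}

// Assumption: "true" in any letter case enables a flag; anything else disables it.
function parseFlag(value: string | undefined, fallback: boolean): boolean {
  return value === undefined ? fallback : value.trim().toLowerCase() === "true";
}

function loadSettings(env: EnvVars, cwd: string): GatewaySettings {
  const baseDir = env.RAG_MCP_BASE_DIR ?? cwd;
  const searchThreshold = Number(env.RAG_MCP_SEARCH_THRESHOLD ?? "0.85");
  // The negated range check also rejects NaN from non-numeric input.
  if (!(searchThreshold >= 0 && searchThreshold <= 1)) {
    throw new Error("RAG_MCP_SEARCH_THRESHOLD must be between 0.0 and 1.0");
  }
  return {
    baseDir,
    configPath: env.RAG_MCP_CONFIG_PATH ?? `${baseDir}/config.json`,
    dbPath: env.RAG_MCP_DB_PATH ?? `${baseDir}/data/orama_db`,
    logPath: env.RAG_MCP_LOG_PATH ?? `${baseDir}/rag-mcp.log`,
    loggingEnabled: parseFlag(env.RAG_MCP_LOGGING_ENABLED, false),
    rebuildIndex: parseFlag(env.RAG_MCP_REBUILD_INDEX, false),
    searchThreshold,
    enableLlm: parseFlag(env.RAG_MCP_ENABLE_LLM, false),
    enableDense: parseFlag(env.RAG_MCP_ENABLE_DENSE, true),
    enableSparse: parseFlag(env.RAG_MCP_ENABLE_SPARSE, true),
    enableReranker: parseFlag(env.RAG_MCP_ENABLE_RERANKER, true),
  };
}

// Hypothetical values; in a Node process the call would be
// loadSettings(process.env, process.cwd()).
const settings = loadSettings(
  { RAG_MCP_BASE_DIR: "/opt/rag-mcp", RAG_MCP_ENABLE_LLM: "true" },
  "/tmp"
);
```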
Usage
1. Running Locally (Development)
You can run the server directly using ts-node:
# Set your API key first (Windows PowerShell)
$env:GEMINI_API_KEY="your-key-here"
npm run dev
2. Connecting to Claude Desktop
To use this gateway with Claude Desktop, edit your config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Add the gateway to the mcpServers object:
{
  "mcpServers": {
    "rag-gateway": {
      "command": "node",
      "args": ["C:/path/to/rag-mcp/dist/src/server.js"],
      "env": {
        "GEMINI_API_KEY": "your-key-here",
        "RAG_MCP_LOGGING_ENABLED": "true"
      }
    }
  }
}
Note: Always use absolute paths for file arguments when configuring Claude Desktop, and for the command as well if it is not on the PATH that Claude Desktop sees.
How it Works
Once connected, the Gateway exposes two primary tools to the client:
search_tool(query: string, limit?: number)
This is the discovery mechanism. The Agent should call this first when it doesn't know which tool to use.
- Input: query: "I need to check the weather in London", limit: 3
- Process: The gateway embeds this query, searches the vector database, reranks results, and returns at most limit matching tool schemas (limit defaults to 10).
execute_tool(tool_name: string, arguments: object)
This is the execution mechanism.
- Input: tool_name: "weather_get_current", arguments: { city: "London" }
- Process: The gateway looks up which downstream server owns "weather_get_current" and proxies the request to it.
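The lookup-and-proxy behaviour of execute_tool can be sketched with a simple routing table. This is an illustration only: the ToolRouter class, its method names, and the stub handler are hypothetical, and a real gateway would forward the call over the downstream server's MCP connection instead of invoking a local function.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<unknown>;

// Maps each tool name to the downstream server that owns it.
class ToolRouter {
  private readonly routes = new Map<string, { serverId: string; handler: ToolHandler }>();

  register(serverId: string, toolName: string, handler: ToolHandler): void {
    this.routes.set(toolName, { serverId, handler });
  }

  ownerOf(toolName: string): string | undefined {
    const route = this.routes.get(toolName);
    return route ? route.serverId : undefined;
  }

  // Looks up the owning server and forwards the arguments to it.
  async executeTool(toolName: string, args: ToolArgs): Promise<unknown> {
    const route = this.routes.get(toolName);
    if (!route) {
      throw new Error(`No downstream server owns tool "${toolName}"`);
    }
    return route.handler(args);
  }
}

const router = new ToolRouter();
// Stub handler standing in for a real downstream call; it only echoes its input.
router.register("weather", "weather_get_current", async (args) => ({ received: args }));
const pendingResult = router.executeTool("weather_get_current", { city: "London" });
```

Keying the table by tool name assumes names are unique across servers; the weather_ prefix in the example suggests names may be namespaced per server, but that is not confirmed here.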
Testing & Development
This project includes a suite of verification scripts in the tests/ directory to validate different components without needing a full MCP client.
Running Tests
Use ts-node to run specific test scenarios:
- Verify Gateway Logic: Simulates a client connecting to the gateway and running searches.
  npx ts-node tests/verify_gateway.ts
- Verify Index Synchronization: Checks if tools are correctly added, updated, or removed from the vector index when downstream servers change.
  npx ts-node tests/verify_index_sync.ts
- Verify Transports: Tests the Connection Manager's handling of Stdio and SSE connections.
  npx ts-node tests/verify_transports.ts
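The add/update/remove behaviour that the index synchronization script checks can be illustrated with a small diff routine. This is a sketch under assumptions, not the project's indexer: the ToolInfo shape, the fingerprint and diffTools helpers, and the use of JSON.stringify as a fingerprint are hypothetical.

```typescript
interface ToolInfo {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Fingerprint over the three fields that trigger a re-index: name, description, schema.
// JSON.stringify is key-order sensitive, so a real implementation might
// canonicalize the schema or hash it instead.
function fingerprint(tool: ToolInfo): string {
  return JSON.stringify([tool.name, tool.description ?? "", tool.inputSchema]);
}

// Compares stored fingerprints (keyed by tool name) with freshly discovered tools.
function diffTools(indexed: Map<string, string>, discovered: ToolInfo[]) {
  const added: string[] = [];
  const updated: string[] = [];
  const seen = new Set<string>();
  for (const tool of discovered) {
    seen.add(tool.name);
    const previous = indexed.get(tool.name);
    if (previous === undefined) added.push(tool.name);
    else if (previous !== fingerprint(tool)) updated.push(tool.name);
  }
  const removed: string[] = [];
  indexed.forEach((_fp, name) => {
    if (!seen.has(name)) removed.push(name);
  });
  return { added, updated, removed };
}

// Hypothetical index state and discovery result.
const indexedTools = new Map<string, string>([
  ["weather_get_current", fingerprint({ name: "weather_get_current", description: "Current weather", inputSchema: { type: "object" } })],
  ["fs_read_file", fingerprint({ name: "fs_read_file", inputSchema: { type: "object" } })],
]);
const changes = diffTools(indexedTools, [
  { name: "weather_get_current", description: "Current weather by city", inputSchema: { type: "object" } },
  { name: "geo_lookup", inputSchema: { type: "object" } },
]);
```

Because the map is keyed by name, a renamed tool shows up in this sketch as one removal plus one addition.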
Debugging
Since the server communicates over Stdio, standard output (console.log) is reserved for the protocol.
- Logs: Check rag-mcp.log in the project root (requires RAG_MCP_LOGGING_ENABLED=true).
- Errors: Critical errors are also logged to the file.
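Because stdout carries protocol messages, any diagnostic output has to go elsewhere. The sketch below shows one way to structure a logger that never touches stdout; createFileLogger and the LogSink type are hypothetical names, and gating error messages behind the same flag as debug messages is an assumption.

```typescript
// A sink receives one fully formatted log line. In a Node process it could be
// something like (line) => fs.appendFileSync(logPath, line).
type LogSink = (line: string) => void;

function createFileLogger(enabled: boolean, sink: LogSink) {
  const write = (level: string, message: string): void => {
    if (!enabled) return; // mirrors RAG_MCP_LOGGING_ENABLED=false
    sink(`${new Date().toISOString()} [${level}] ${message}\n`);
  };
  return {
    debug: (message: string) => write("DEBUG", message),
    error: (message: string) => write("ERROR", message),
  };
}

// Collect lines in memory here so the example has no file-system dependency.
const capturedLines: string[] = [];
const logger = createFileLogger(true, (line) => capturedLines.push(line));
logger.debug("connected to downstream server");
logger.error("failed to list tools");
```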
Project Structure
rag-mcp/
├── src/
│ ├── server.ts # Gateway Entry Point (Stdio Server)
│ ├── indexer.ts # Tool Discovery & Enrichment Logic
│ ├── retriever.ts # Hybrid Search & Reranking Pipeline
│ ├── connection_manager.ts # Transport Management (Stdio/SSE/Docker)
│ ├── vector_store.ts # Orama DB Wrapper (Dense/Sparse)
│ ├── models.ts # Transformers.js Model Management
│ └── llm.ts # Gemini API Integration
├── data/ # Local Database & Persistence
├── tests/ # Verification Scripts
├── config.json # Downstream Servers Configuration
└── rag-mcp.log # Debug Logs (if enabled)
Security & Best Practices
- API Keys: Avoid hardcoding GEMINI_API_KEY. Use an environment variable or a secure secret manager.
- Environment Forwarding: When using the Stdio transport, the Gateway forwards process.env plus any specific env defined in config.json to the child process. Be mindful of sensitive variables.
- Local Persistence: Orama data is stored as a plain JSON file in the ./data directory. Ensure this directory is protected.
- Network Access: Transformers.js will attempt to download models from HuggingFace on the first run. Ensure your environment allows this or pre-download the models.
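The environment forwarding described above amounts to a merge of the parent environment with per-server overrides. The sketch below illustrates that merge and adds an optional blocklist as one possible hardening step; buildChildEnv and its blocked parameter are hypothetical and are not a documented gateway feature.

```typescript
type ParentEnv = Record<string, string | undefined>;

// Parent environment first, then the server's own env entries override it.
// Keys listed in `blocked` are dropped from the inherited part only.
function buildChildEnv(
  parentEnv: ParentEnv,
  serverEnv: Record<string, string> = {},
  blocked: string[] = []
): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const key of Object.keys(parentEnv)) {
    const value = parentEnv[key];
    if (value !== undefined && blocked.indexOf(key) === -1) {
      childEnv[key] = value;
    }
  }
  for (const key of Object.keys(serverEnv)) {
    childEnv[key] = serverEnv[key];
  }
  return childEnv;
}

// Hypothetical values: the Gemini key is kept away from the downstream server.
const childEnv = buildChildEnv(
  { PATH: "/usr/bin", GEMINI_API_KEY: "secret" },
  { WEATHER_UNITS: "metric" },
  ["GEMINI_API_KEY"]
);
```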
Troubleshooting
"No tools found"
- Verify that downstream servers in config.json are running and accessible.
- Check rag-mcp.log for connection errors (ensure RAG_MCP_LOGGING_ENABLED=true).
- Run the refresh_index() tool to force a rescan.
"Vector search is inaccurate"
- Enable RAG_MCP_ENABLE_LLM=true and provide a GEMINI_API_KEY. Tools with poor descriptions need LLM enrichment to be discoverable via natural language.
- Adjust RAG_MCP_SEARCH_THRESHOLD. A lower value (e.g., 0.7) returns more candidates but may include irrelevant results.
"Docker errors"
- Ensure the Docker daemon is running.
- The Gateway attempts to stop and rm containers with the same serverId on startup to avoid name conflicts. Ensure the system user has permissions to execute these commands.
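The startup cleanup can be pictured as two Docker CLI calls per server. The sketch below assumes the container name equals the serverId; dockerCleanupCommands, cleanupContainer, and the injected CommandRunner are hypothetical. In a Node process the runner would typically wrap child_process.execFile.

```typescript
type CommandRunner = (command: string, args: string[]) => Promise<void>;

// The two CLI invocations, in order: stop the container, then remove it.
function dockerCleanupCommands(serverId: string): string[][] {
  return [
    ["docker", "stop", serverId],
    ["docker", "rm", serverId],
  ];
}

// Runs both commands and collects failures instead of throwing, because a
// missing container (nothing to stop or remove) is expected on a clean start.
async function cleanupContainer(serverId: string, run: CommandRunner): Promise<string[]> {
  const failures: string[] = [];
  for (const [command, ...args] of dockerCleanupCommands(serverId)) {
    try {
      await run(command, args);
    } catch (error) {
      failures.push(`${command} ${args.join(" ")}: ${String(error)}`);
    }
  }
  return failures;
}

const cleanupPlan = dockerCleanupCommands("weather");
```

If either command fails with a permission error rather than a missing container, the failure list is where that would surface.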
"Model download failed"
- If you deploy in an air-gapped environment, you must pre-cache the models in the ~/.cache/huggingface (or equivalent) directory.
License & Credits
Project License
The source code for RAG MCP Gateway is licensed under the ISC License.
Third-Party Licenses & Terms
This project utilizes several high-quality models and libraries that are subject to their own licenses:
- Inference Engine: Transformers.js is licensed under the Apache License 2.0.
- Vector Database: Orama is licensed under the Apache License 2.0.