RAG MCP Gateway

A smart proxy server for the Model Context Protocol (MCP) that aggregates multiple downstream MCP servers and provides Natural Language Search capabilities over their tools.

The gateway acts as a single entry point for an MCP client (like Claude Desktop or an Agent), allowing it to discover and use tools from a wide array of connected servers using semantic queries instead of exact name matching.

Architecture

The system is built on a modular "Gateway" architecture designed for high discoverability and robust connection management.

graph TD
    Client[MCP Client] <-->|Stdio| Gateway[RAG MCP Gateway]
    
    subgraph "Internal Components"
        Gateway --> ConnectionManager
        Gateway --> Indexer
        Gateway --> Retriever
        
        subgraph "Indexing Pipeline"
            Indexer --> Discovery[Tool Discovery]
            Discovery --> Enrichment[LLM Enrichment]
            Enrichment --> Embedding[Vector Embedding]
            Embedding --> Orama[(Orama DB)]
            Enrichment --> Gemini[Google Gemini API]
        end
        
        subgraph "Retrieval Pipeline"
            Retriever --> Search[Parallel Dense/Sparse Search]
            Search --> RRF[RRF Fusion]
            RRF --> Rerank[Cross-Encoder Reranking]
            Rerank --> Model[Transformers.js]
            Search --> Orama
        end
    end
    
    subgraph "Downstream Servers"
        ConnectionManager <-->|Stdio| ServerA[Local Process]
        ConnectionManager <-->|SSE / HTTP| ServerB[Remote Server]
        ConnectionManager <-->|Docker| ServerC[Containerized Tool]
    end

Key Components

  1. Connection Manager: Handles persistent connections to multiple downstream MCP servers.
    • Transports: Supports Stdio, SSE, and Streamable-HTTP.
    • Docker Integration: Can manage lifecycle for Docker-based servers, including automatic container cleanup (stop and rm) before startup to avoid name conflicts.
  2. Indexer: Synchronizes the local index with downstream servers.
    • Tool Discovery: Polls listTools from all clients.
    • Enrichment: Uses Google Gemini to generate human-readable summaries and likely search questions for each tool, which gives the index more text to match against natural language queries.
    • Smart Sync: Only re-indexes tools that have changed their name, description, or schema.
  3. Vector Store (Orama): A high-performance, in-memory JavaScript vector database that persists to JSON. It handles both vector (dense) and full-text (sparse) indexing.
  4. Retriever: Implements a sophisticated search pipeline:
    • Hybrid Search: Simultaneously executes vector search and keyword search.
    • RRF Fusion: Combines results using Reciprocal Rank Fusion to balance semantic and exact matches.
    • Reranking: A second-stage Cross-Encoder (HuggingFace model via Transformers.js) reranks candidates based on the actual technical schema and logic, ensuring the most relevant tool is prioritized.
  5. LLM Service: Provides the generative bridge for metadata enrichment, ensuring that even minimally documented tools are discoverable via natural language queries.
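
The RRF Fusion step of the Retriever can be illustrated with a minimal sketch. It assumes each retriever returns an ordered list of tool names and uses the constant k = 60 that is common in descriptions of Reciprocal Rank Fusion; the function name rrfFuse, the sample tool names, and the value of k are illustrative assumptions, not taken from the gateway's source.

```typescript
// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each item
// it contains, and the contributions are summed per item.
// Assumption: k = 60; the gateway's actual constant is not documented here.
function rrfFuse(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Hypothetical tool names as ranked by the dense and sparse searches.
const dense = ["weather_get_current", "weather_get_forecast", "fs_read_file"];
const sparse = ["weather_get_current", "geo_lookup", "weather_get_forecast"];
const fused = rrfFuse([dense, sparse]);
```

A tool that appears near the top of both lists accumulates two large contributions, so it outranks a tool that only one retriever found. The fused list would then be handed to the reranking stage.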

Prerequisites

  • Node.js: v18 or higher
  • NPM: v9 or higher
  • Gemini API Key (Optional but Recommended): For generating better tool descriptions and search queries. Keys are issued through Google AI Studio.

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd rag-mcp
    
  2. Install dependencies:

    npm install
    
  3. Build the project:

    npm run build
    

Configuration

The gateway is configured using a config.json file in the root directory. You can copy the example file to start:

cp config.example.json config.json

config.json Structure

Define your downstream servers in the mcpServers object:

{
  "mcpServers": {
    "weather": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-weather"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./allowed-dir"]
    },
    "remote-server": {
        "transport": "sse",
        "url": "http://localhost:3000/sse"
    }
  }
}
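
As a rough guide to the two entry shapes in the example above, the following sketch models them as TypeScript types and groups server IDs by how they would be reached. The type names, the isStdioServer guard, and groupByTransport are hypothetical; the gateway's real type definitions are not shown in this README.

```typescript
// Hypothetical types mirroring the config.json example.
interface StdioServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>; // per-server environment overrides
}

interface RemoteServerConfig {
  transport: string; // "sse" in the example
  url: string;
}

type ServerConfig = StdioServerConfig | RemoteServerConfig;

interface GatewayConfig {
  mcpServers: Record<string, ServerConfig>;
}

// Type guard: an entry with a `command` is launched as a local child process.
function isStdioServer(entry: ServerConfig): entry is StdioServerConfig {
  return typeof (entry as StdioServerConfig).command === "string";
}

// Group server IDs by the kind of connection they need.
function groupByTransport(config: GatewayConfig): { stdio: string[]; remote: string[] } {
  const groups = { stdio: [] as string[], remote: [] as string[] };
  for (const id of Object.keys(config.mcpServers)) {
    (isStdioServer(config.mcpServers[id]) ? groups.stdio : groups.remote).push(id);
  }
  return groups;
}

const example: GatewayConfig = {
  mcpServers: {
    weather: { command: "npx", args: ["-y", "@modelcontextprotocol/server-weather"] },
    filesystem: { command: "npx", args: ["-y", "@modelcontextprotocol/server-filesystem", "./allowed-dir"] },
    "remote-server": { transport: "sse", url: "http://localhost:3000/sse" },
  },
};
const groups = groupByTransport(example);
```

Here groups.stdio holds the two npx-launched servers and groups.remote holds the SSE entry.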

Environment Variables

You can configure the gateway using the following environment variables. These can be set in your OS or passed via the env object in your Claude Desktop configuration.

| Variable | Description | Default |
| --- | --- | --- |
| GEMINI_API_KEY | Required for enrichment. API key for Google Generative AI. | - |
| RAG_MCP_BASE_DIR | Base directory for all relative paths. | process.cwd() |
| RAG_MCP_CONFIG_PATH | Path to the downstream servers config file. | BASE_DIR/config.json |
| RAG_MCP_DB_PATH | Path to the Orama persistence folder. | BASE_DIR/data/orama_db |
| RAG_MCP_LOG_PATH | Path to the debug log file. | BASE_DIR/rag-mcp.log |
| RAG_MCP_LOGGING_ENABLED | Set to true to enable debug logging to the log file. | false |
| RAG_MCP_REBUILD_INDEX | Set to true to force a full re-index on every startup. | false |
| RAG_MCP_SEARCH_THRESHOLD | Minimum relevance score (0.0 to 1.0) for search results. | 0.85 |
| RAG_MCP_EMBEDDING_MODEL | Required if dense retrieval is enabled. Transformers.js model for generating vector embeddings. | - |
| RAG_MCP_RERANKING_MODEL | Required if the reranker is enabled. Transformers.js model for second-stage reranking. | - |
| RAG_MCP_GENERATIVE_MODEL | Required if LLM enrichment is enabled. Google Gemini model for tool enrichment. | - |
| RAG_MCP_ENABLE_LLM | Enable LLM enrichment (summaries and questions) during indexing. | false |
| RAG_MCP_ENABLE_DENSE | Enable semantic vector search (dense retrieval). | true |
| RAG_MCP_ENABLE_SPARSE | Enable full-text keyword search (sparse retrieval). | true |
| RAG_MCP_ENABLE_RERANKER | Enable the cross-encoder reranking stage. | true |
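
The defaults and the "required if" rules listed above interact, so a small sketch may help. It resolves a subset of the variables from a plain object and rejects combinations the list marks as incomplete. The function resolveSettings, the GatewaySettings shape, the decision to throw on a missing model, and the placeholder model name are all assumptions made for illustration; the gateway's own startup checks may differ.

```typescript
type Env = Record<string, string | undefined>;

interface GatewaySettings {
  configPath: string;
  dbPath: string;
  searchThreshold: number;
  enableLlm: boolean;
  enableDense: boolean;
  enableSparse: boolean;
  enableReranker: boolean;
}

// Parse a "true"/"false" flag, falling back to the documented default.
function flag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function resolveSettings(env: Env, cwd: string): GatewaySettings {
  const baseDir = env.RAG_MCP_BASE_DIR ?? cwd;
  const settings: GatewaySettings = {
    configPath: env.RAG_MCP_CONFIG_PATH ?? `${baseDir}/config.json`,
    dbPath: env.RAG_MCP_DB_PATH ?? `${baseDir}/data/orama_db`,
    searchThreshold: Number(env.RAG_MCP_SEARCH_THRESHOLD ?? "0.85"),
    enableLlm: flag(env.RAG_MCP_ENABLE_LLM, false),
    enableDense: flag(env.RAG_MCP_ENABLE_DENSE, true),
    enableSparse: flag(env.RAG_MCP_ENABLE_SPARSE, true),
    enableReranker: flag(env.RAG_MCP_ENABLE_RERANKER, true),
  };
  // Assumed validation: fail early when a feature is on but its model is unset.
  if (!(settings.searchThreshold >= 0 && settings.searchThreshold <= 1)) {
    throw new Error("RAG_MCP_SEARCH_THRESHOLD must be between 0.0 and 1.0");
  }
  if (settings.enableDense && !env.RAG_MCP_EMBEDDING_MODEL) {
    throw new Error("RAG_MCP_EMBEDDING_MODEL is required when dense retrieval is enabled");
  }
  if (settings.enableReranker && !env.RAG_MCP_RERANKING_MODEL) {
    throw new Error("RAG_MCP_RERANKING_MODEL is required when the reranker is enabled");
  }
  if (settings.enableLlm && (!env.GEMINI_API_KEY || !env.RAG_MCP_GENERATIVE_MODEL)) {
    throw new Error("GEMINI_API_KEY and RAG_MCP_GENERATIVE_MODEL are required when LLM enrichment is enabled");
  }
  return settings;
}

// Placeholder values; substitute a real Transformers.js model name.
const settings = resolveSettings(
  {
    RAG_MCP_BASE_DIR: "/opt/rag-mcp",
    RAG_MCP_ENABLE_RERANKER: "false",
    RAG_MCP_EMBEDDING_MODEL: "your-embedding-model",
  },
  "/tmp",
);
```

Because dense retrieval and the reranker both default to true, an empty environment fails this check: at minimum an embedding model and a reranking model must be named, or the corresponding stages disabled.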

Usage

1. Running Locally (Development)

You can run the server directly using ts-node:

# Set your API key first (Windows PowerShell)
$env:GEMINI_API_KEY="your-key-here"
npm run dev

2. Connecting to Claude Desktop

To use this gateway with Claude Desktop, edit your config file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Add the gateway to the mcpServers list:

{
  "mcpServers": {
    "rag-gateway": {
      "command": "node",
      "args": ["C:/path/to/rag-mcp/dist/src/server.js"],
      "env": {
        "GEMINI_API_KEY": "your-key-here",
        "RAG_MCP_LOGGING_ENABLED": "true"
      }
    }
  }
}

Note: Always use absolute paths for file arguments when configuring Claude Desktop, and for the command itself unless it resolves on the PATH that Claude Desktop sees.


How it Works

Once connected, the Gateway exposes two primary tools to the client:

search_tool(query: string, limit?: number)

This is the discovery mechanism. The Agent should call this first when it doesn't know which tool to use.

  • Input: query: "I need to check the weather in London", limit: 3
  • Process: The gateway embeds this query, searches the vector database, reranks results, and returns up to limit matching tool schemas (default is 10).
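
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request for the search_tool input shown above; the helper name toolCallRequest and the request id are illustrative, and in practice an MCP client library constructs this message for you.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that fills in the fixed fields.
function toolCallRequest(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Ask the gateway for up to three tools matching a natural language query.
const searchRequest = toolCallRequest(1, "search_tool", {
  query: "I need to check the weather in London",
  limit: 3,
});
const wireFormat = JSON.stringify(searchRequest);
```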

execute_tool(tool_name: string, arguments: object)

This is the execution mechanism.

  • Input: tool_name: "weather_get_current", arguments: { city: "London" }
  • Process: The gateway looks up which downstream server owns "weather_get_current" and proxies the request to it.
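
The ownership lookup described above can be pictured as a map from tool name to server ID. This is a minimal sketch under that assumption; the ToolRegistry class and its methods are hypothetical, and the gateway may track ownership differently (for example, inside the index records).

```typescript
// Hypothetical registry: maps each indexed tool name to the ID of the
// downstream server that exposes it (IDs are the keys of `mcpServers`).
class ToolRegistry {
  private owners = new Map<string, string>();

  // Record every tool reported by one server. A later registration of the
  // same tool name overwrites the earlier owner.
  register(serverId: string, toolNames: string[]): void {
    for (const name of toolNames) this.owners.set(name, serverId);
  }

  // Return the owning server ID, or throw if the tool was never indexed.
  resolve(toolName: string): string {
    const serverId = this.owners.get(toolName);
    if (serverId === undefined) throw new Error(`Unknown tool: ${toolName}`);
    return serverId;
  }
}

const registry = new ToolRegistry();
registry.register("weather", ["weather_get_current"]);
registry.register("filesystem", ["read_file", "write_file"]);

// The gateway would forward the call to this server's connection.
const owner = registry.resolve("weather_get_current");
```

With a flat map like this, two servers exposing the same tool name would collide, so a real implementation needs some naming or prefixing policy.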

Testing & Development

This project includes a suite of verification scripts in the tests/ directory to validate different components without needing a full MCP client.

Running Tests

Use ts-node to run specific test scenarios:

  1. Verify Gateway Logic: Simulates a client connecting to the gateway and running searches.

    npx ts-node tests/verify_gateway.ts
    
  2. Verify Index Synchronization: Checks if tools are correctly added, updated, or removed from the vector index when downstream servers change.

    npx ts-node tests/verify_index_sync.ts
    
  3. Verify Transports: Tests the Connection Manager's handling of Stdio and SSE connections.

    npx ts-node tests/verify_transports.ts
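
For the index synchronization check (item 2 above), the added, updated, and removed cases can be sketched as a comparison of per-tool fingerprints built from name, description, and schema. The names fingerprint, stableStringify, and diffTools, and the sample tools, are hypothetical; the gateway may well use a hash or a different comparison.

```typescript
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Serialize with sorted object keys so key order does not affect the result.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Fingerprint covers exactly the fields whose change should trigger re-indexing.
function fingerprint(tool: ToolDescriptor): string {
  return stableStringify([tool.name, tool.description ?? "", tool.inputSchema]);
}

// Compare stored fingerprints (keyed by tool name) with the current tool list.
function diffTools(indexed: Map<string, string>, current: ToolDescriptor[]) {
  const added: string[] = [];
  const changed: string[] = [];
  const removed: string[] = [];
  const seen = new Set<string>();
  for (const tool of current) {
    seen.add(tool.name);
    const previous = indexed.get(tool.name);
    if (previous === undefined) added.push(tool.name);
    else if (previous !== fingerprint(tool)) changed.push(tool.name);
  }
  indexed.forEach((_, name) => {
    if (!seen.has(name)) removed.push(name);
  });
  return { added, changed, removed };
}

const indexed = new Map<string, string>();
indexed.set("weather_get_current", fingerprint({ name: "weather_get_current", description: "Current weather", inputSchema: { type: "object", required: ["city"] } }));
indexed.set("fs_read_file", fingerprint({ name: "fs_read_file", description: "Read a file", inputSchema: { type: "object" } }));
indexed.set("geo_lookup", fingerprint({ name: "geo_lookup", description: "Geocode a place", inputSchema: { type: "object" } }));

const diff = diffTools(indexed, [
  // Same content with schema keys in a different order: unchanged.
  { name: "weather_get_current", description: "Current weather", inputSchema: { required: ["city"], type: "object" } },
  // Description edited: changed.
  { name: "fs_read_file", description: "Read a UTF-8 file", inputSchema: { type: "object" } },
  // Not indexed yet: added.
  { name: "fs_write_file", description: "Write a file", inputSchema: { type: "object" } },
]);
```

Only the tools in added and changed would need enrichment and embedding again. A renamed tool shows up as one removal plus one addition.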
    

Debugging

Since the server communicates over Stdio, standard output (console.log) is reserved for the protocol.

  • Logs: Check rag-mcp.log in the project root (must enable RAG_MCP_LOGGING_ENABLED=true).
  • Errors: Critical errors are also logged to the file.

Project Structure

rag-mcp/
├── src/
│   ├── server.ts             # Gateway Entry Point (Stdio Server)
│   ├── indexer.ts            # Tool Discovery & Enrichment Logic
│   ├── retriever.ts          # Hybrid Search & Reranking Pipeline
│   ├── connection_manager.ts # Transport Management (Stdio/SSE/Docker)
│   ├── vector_store.ts       # Orama DB Wrapper (Dense/Sparse)
│   ├── models.ts             # Transformers.js Model Management
│   └── llm.ts                # Gemini API Integration
├── data/                     # Local Database & Persistence
├── tests/                    # Verification Scripts
├── config.json               # Downstream Servers Configuration
└── rag-mcp.log               # Debug Logs (if enabled)

Security & Best Practices

  • API Keys: Avoid hardcoding GEMINI_API_KEY. Use an environment variable or a secure secret manager.
  • Environment Forwarding: When using the Stdio transport, the Gateway forwards process.env plus any specific env defined in config.json to the child process. Be mindful of sensitive variables.
  • Local Persistence: Orama data is stored as a plain JSON file in the ./data directory. Ensure this directory is protected.
  • Network Access: Transformers.js will attempt to download models from HuggingFace on the first run. Ensure your environment allows this or pre-download the models.
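
To make the environment forwarding point above concrete: forwarding process.env plus a per-server env amounts to an object merge, which means every secret in the gateway's environment reaches every Stdio child. The sketch below shows that merge and a simple audit of which forwarded names look sensitive. Both functions, the name pattern, and the sample values are illustrative assumptions and are not features of the gateway.

```typescript
type Env = Record<string, string | undefined>;

// The documented behaviour: parent environment plus per-server overrides,
// with the per-server values winning on conflict.
function forwardedEnv(parentEnv: Env, serverEnv: Record<string, string> = {}): Env {
  return { ...parentEnv, ...serverEnv };
}

// Hypothetical audit helper: list forwarded variable names that look like secrets.
function sensitiveKeys(env: Env, pattern: RegExp = /(KEY|TOKEN|SECRET|PASSWORD)/i): string[] {
  return Object.keys(env)
    .filter((name) => env[name] !== undefined && pattern.test(name))
    .sort();
}

// Sample values only.
const parent: Env = { PATH: "/usr/bin", HOME: "/home/me", GEMINI_API_KEY: "abc" };
const child = forwardedEnv(parent, { WEATHER_TOKEN: "xyz" });
const exposed = sensitiveKeys(child);
```

In this example the weather server's child process receives GEMINI_API_KEY even though it has no use for it, which is the situation the note asks you to be mindful of.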

Troubleshooting

"No tools found"

  • Verify that downstream servers in config.json are running and accessible.
  • Check rag-mcp.log for connection errors (ensure RAG_MCP_LOGGING_ENABLED=true).
  • Run the refresh_index() tool to force a rescan of the downstream servers.

"Vector search is inaccurate"

  • Enable RAG_MCP_ENABLE_LLM=true and provide a GEMINI_API_KEY. Tools with poor descriptions need LLM enrichment to be discoverable via natural language.
  • Adjust RAG_MCP_SEARCH_THRESHOLD. A lower value (e.g., 0.7) returns more candidates but may include irrelevant results.

"Docker errors"

  • Ensure the Docker daemon is running.
  • The Gateway attempts to stop and rm containers with the same serverId on startup to avoid name conflicts. Ensure the system user has permissions to execute these commands.

"Model download failed"

  • If you deploy in an air-gapped environment, you must pre-cache the models in the ~/.cache/huggingface (or equivalent) directory.

License & Credits

Project License

The source code for RAG MCP Gateway is licensed under the ISC License.

Third-Party Licenses & Terms

This project utilizes several high-quality models and libraries that are subject to their own licenses:

