MCP Servers

faq-rag

Enables answering natural-language questions from FAQ documents using vector search and LLM generation via an MCP tool.

README

FAQ RAG + MCP Tool

A RAG prototype that answers natural-language questions from FAQ documents using vector search and LLM generation, exposed as an MCP tool.

What We Built

A three-stage RAG pipeline:

Ingest — FAQ markdown files are chunked (~200 chars), embedded with VoyageAI, and stored in MongoDB
Retrieve — User questions are embedded and matched against stored chunks using MongoDB $vectorSearch (cosine similarity)
Generate — Top matching chunks are passed as context to an LLM, which generates a cited answer

The whole thing is wrapped as an MCP tool (ask_faq) so any MCP-compatible client can call it directly.

Architecture

Component	Choice	Why
Embeddings	VoyageAI `voyage-3-lite` (512d)	Purpose-built for retrieval; outperforms OpenAI ada-002 on search benchmarks
Vector Store	MongoDB `$vectorSearch`	Persistent, scalable, production-realistic — vs in-memory numpy which loses data on restart
LLM	OpenAI `gpt-4o-mini`	Cost-efficient, fast, plenty capable for FAQ Q&A
MCP	stdio transport	Standard for local MCP tools

How It Works

Question → VoyageAI embed → MongoDB $vectorSearch → Top-K chunks → OpenAI generate → Cited answer

Chunking: Fixed ~200 character splits. Simple and predictable for a small corpus.
Retrieval: Cosine similarity via MongoDB vector search index (HNSW). Returns top 4 chunks by default.
Generation: System prompt enforces grounded answers — no hallucination, must cite source filenames, infers intent (e.g. "locked out" → password reset).
Lazy client init: API clients connect on first query, not at server startup — so the MCP server registers tools cleanly before any API calls.

How to Run

1. Configure environment

cp .env.example .env

Set VOYAGE_API_KEY, OPENAI_API_KEY, and MONGODB_URI. That’s all that’s required.

2. Ingest the FAQ corpus

uv run ingest.py

3. Test via CLI

uv run rag_core.py

4. Run as MCP tool in Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "faq-rag": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "python", "${workspaceFolder}/mcp_server.py"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

Example Questions

These show the system understands intent, not just keywords:

Question	What it tests
"How do I reset my password?"	Direct keyword match (faq_auth.md)
"I'm locked out of my account"	Semantic inference — no "password" or "reset" in query
"Can I take 3 weeks off in a row?"	Retrieves the 2-week approval rule from PTO policy
"When do my shares kick in?"	Maps "shares" → equity vesting schedule
"I want to use one login for everything"	Maps to SSO without mentioning it
"What do new employees need to know?"	Cross-document retrieval from multiple FAQ files

Deviations from Starter Skeleton

The starter used OpenAI embeddings + in-memory numpy for cosine similarity. We replaced both:

VoyageAI instead of OpenAI embeddings — Voyage models are purpose-built for retrieval and rank higher on search benchmarks (MTEB). Using a separate embedding provider also decouples retrieval quality from the LLM choice.
MongoDB instead of numpy — A real vector database with persistence, indexing (HNSW), and $vectorSearch aggregation. Data survives restarts, and the same approach scales to millions of documents without code changes.
Lazy client initialization — API clients connect on first tool call, not at import. This lets the MCP server start and register tools cleanly.
Kept everything else simple — no LangChain, no caching layers, no retry logic. Clean Python with direct API calls.

Files

ingest.py        # Build the index: read faqs/ → chunk → embed → store in MongoDB → ensure vector index
rag_core.py      # Query path only: embed question → vector search → generate answer (no ingestion)
mcp_server.py    # MCP server (exposes ask_faq, calls rag_core)
faqs/            # FAQ markdown corpus

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured