faq-rag
Enables answering natural-language questions from FAQ documents using vector search and LLM generation via an MCP tool.
README
FAQ RAG + MCP Tool
A RAG prototype that answers natural-language questions from FAQ documents using vector search and LLM generation, exposed as an MCP tool.
What We Built
A three-stage RAG pipeline:
- Ingest — FAQ markdown files are chunked (~200 chars), embedded with VoyageAI, and stored in MongoDB
- Retrieve — User questions are embedded and matched against stored chunks using MongoDB
$vectorSearch(cosine similarity) - Generate — Top matching chunks are passed as context to an LLM, which generates a cited answer
The whole thing is wrapped as an MCP tool (ask_faq) so any MCP-compatible client can call it directly.
Architecture
| Component | Choice | Why |
|---|---|---|
| Embeddings | VoyageAI voyage-3-lite (512d) |
Purpose-built for retrieval; outperforms OpenAI ada-002 on search benchmarks |
| Vector Store | MongoDB $vectorSearch |
Persistent, scalable, production-realistic — vs in-memory numpy which loses data on restart |
| LLM | OpenAI gpt-4o-mini |
Cost-efficient, fast, plenty capable for FAQ Q&A |
| MCP | stdio transport | Standard for local MCP tools |
How It Works
Question → VoyageAI embed → MongoDB $vectorSearch → Top-K chunks → OpenAI generate → Cited answer
- Chunking: Fixed ~200 character splits. Simple and predictable for a small corpus.
- Retrieval: Cosine similarity via MongoDB vector search index (HNSW). Returns top 4 chunks by default.
- Generation: System prompt enforces grounded answers — no hallucination, must cite source filenames, infers intent (e.g. "locked out" → password reset).
- Lazy client init: API clients connect on first query, not at server startup — so the MCP server registers tools cleanly before any API calls.
How to Run
1. Configure environment
cp .env.example .env
Set VOYAGE_API_KEY, OPENAI_API_KEY, and MONGODB_URI. That’s all that’s required.
2. Ingest the FAQ corpus
uv run ingest.py
3. Test via CLI
uv run rag_core.py
4. Run as MCP tool in Cursor
Add to .cursor/mcp.json:
{
"mcpServers": {
"faq-rag": {
"type": "stdio",
"command": "uv",
"args": ["run", "python", "${workspaceFolder}/mcp_server.py"],
"envFile": "${workspaceFolder}/.env"
}
}
}
Example Questions
These show the system understands intent, not just keywords:
| Question | What it tests |
|---|---|
| "How do I reset my password?" | Direct keyword match (faq_auth.md) |
| "I'm locked out of my account" | Semantic inference — no "password" or "reset" in query |
| "Can I take 3 weeks off in a row?" | Retrieves the 2-week approval rule from PTO policy |
| "When do my shares kick in?" | Maps "shares" → equity vesting schedule |
| "I want to use one login for everything" | Maps to SSO without mentioning it |
| "What do new employees need to know?" | Cross-document retrieval from multiple FAQ files |
Deviations from Starter Skeleton
The starter used OpenAI embeddings + in-memory numpy for cosine similarity. We replaced both:
- VoyageAI instead of OpenAI embeddings — Voyage models are purpose-built for retrieval and rank higher on search benchmarks (MTEB). Using a separate embedding provider also decouples retrieval quality from the LLM choice.
- MongoDB instead of numpy — A real vector database with persistence, indexing (HNSW), and
$vectorSearchaggregation. Data survives restarts, and the same approach scales to millions of documents without code changes. - Lazy client initialization — API clients connect on first tool call, not at import. This lets the MCP server start and register tools cleanly.
- Kept everything else simple — no LangChain, no caching layers, no retry logic. Clean Python with direct API calls.
Files
ingest.py # Build the index: read faqs/ → chunk → embed → store in MongoDB → ensure vector index
rag_core.py # Query path only: embed question → vector search → generate answer (no ingestion)
mcp_server.py # MCP server (exposes ask_faq, calls rag_core)
faqs/ # FAQ markdown corpus
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.