findocs-mcp
An eval-first MCP server for semantic search and grounded Q&A over financial documentation, with a CI regression gate that fails on retrieval or faithfulness regressions.
FinDocs MCP
An eval-first, reliability-first MCP server for semantic search and grounded Q&A over a financial-docs corpus: Postgres + pgvector for retrieval, plus a first-class eval-loop that fails CI on regression.
FinDocs MCP gives an AI agent three tools over MCP: search a corpus of broker API documentation (Zerodha Kite Connect + Finvasia Shoonya), ask grounded questions that come back with citations, and ingest new documents. The interesting part isn't the RAG but the evaluation harness: every change is scored on retrieval recall, ranking quality, answer faithfulness, and refusal correctness, and a regression below baseline turns the build red.
This is the "tick-data validation, zero production mis-fires" discipline from quant trading infrastructure, applied to AI tooling: a confident wrong answer is worse than an honest "not found."
Learning the codebase? The source is written as a reverse-learning layer: read it top-down from `src/mcp/server.ts` (where an agent calls in) and follow the `LEARN` comment blocks down through retrieval, embeddings, cosine/pgvector, chunking, the refusal gate, and the eval-loop, all the way to the linear algebra at the bottom. Each concept is taught inline, right where it's implemented.
Architecture

                      ┌────────────────────────────────────────────┐
    MCP client        │             MCP server (stdio)             │
    (Claude Code/     │ search_docs · answer_question · ingest_doc │
     Desktop) ──────▶ │                                            │
                      └──┬───────────────┬────────────────┬────────┘
                         │               │                │
                  ┌──────▼─────┐   ┌─────▼──────┐   ┌─────▼──────┐
                  │  Embedder  │   │ Retrieval  │   │   Ingest   │
                  │  (local    │   │ + QA gate  │   │ chunk→embed│
                  │   MiniLM)  │   │ + citations│   │  →upsert   │
                  └──────┬─────┘   └─────┬──────┘   └─────┬──────┘
                         └───────────────┼────────────────┘
                                  ┌──────▼───────┐
                                  │  Postgres +  │
                                  │   pgvector   │  HNSW cosine
                                  └──────────────┘

    evals/ ──▶ runner ──▶ metrics (recall@k · MRR · faithfulness · refusal)
                             │
                             ▼
                    baseline.json gate ──▶ CI pass/fail

Everything is provider-agnostic behind thin adapters:
| Concern | Default (zero cost, no secrets) | Swap-in |
|---|---|---|
| Embeddings | `@xenova/transformers` MiniLM-L6-v2 (384-dim) | OpenAI / Voyage |
| LLM | deterministic heuristic (extractive + overlap judge) | local Ollama, or Anthropic / OpenAI |
| Store | Postgres + pgvector (HNSW, cosine) | none |
The defaults run with no API keys and no per-call cost, which is exactly what makes the eval gate reproducible in CI.
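As a rough illustration of the thin-adapter idea, the sketch below defines a hypothetical `Embedder` interface and a factory keyed by provider name. The interface shape, the `createEmbedder` name, and the toy hashing embedder are assumptions made for this example, not the project's actual code; the real default wraps MiniLM through `@xenova/transformers`.

```typescript
// Hypothetical adapter shape: callers depend only on this interface.
interface Embedder {
  readonly dim: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy stand-in for a local model (an assumption, not MiniLM): hashes tokens
// into a fixed-size, L2-normalized bag-of-words vector. It is deterministic
// and needs no network access or API key.
class ToyHashEmbedder implements Embedder {
  readonly dim: number;

  constructor(dim: number = 384) {
    this.dim = dim;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vec: number[] = new Array<number>(this.dim).fill(0);
      for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
        let h = 0;
        for (let i = 0; i < token.length; i++) {
          h = (h * 31 + token.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit hash
        }
        const idx = h % this.dim;
        vec[idx] = (vec[idx] ?? 0) + 1;
      }
      const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0)) || 1;
      return vec.map((x) => x / norm);
    });
  }
}

type EmbedderProvider = "local" | "openai" | "voyage";

// Factory: the rest of the code asks for "an Embedder" and never names a vendor.
function createEmbedder(provider: EmbedderProvider): Embedder {
  switch (provider) {
    case "local":
      return new ToyHashEmbedder();
    default:
      throw new Error(`provider "${provider}" is not wired up in this sketch`);
  }
}
```

Because the local path is deterministic, two runs over the same text produce identical vectors, which is the property a reproducible CI gate relies on.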
MCP tools
| Tool | Description |
|---|---|
| `search_docs(query, k?)` | Top-k chunks with cosine similarity scores + source metadata. |
| `answer_question(question)` | Retrieves, applies a confidence gate, then either synthesizes a grounded answer with citations or refuses with "not found" when retrieval confidence is low. |
| `ingest_doc({ url \| text, source?, title? })` | Chunk → embed → upsert. Idempotent on content. |
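To make the `search_docs` contract concrete, here is a minimal brute-force sketch of cosine top-k search over an in-memory list. The `Chunk` and `SearchHit` shapes, the `searchDocs` name, and the default `k = 5` are assumptions for illustration. In the real server the ranking is delegated to pgvector's HNSW index, which is approximate rather than an exact scan like this one.

```typescript
// Hypothetical shapes for this sketch only.
interface Chunk {
  id: string;
  source: string;
  text: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  source: string;
  text: string;
  score: number; // cosine similarity in [-1, 1]
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Exact scan: score every chunk, sort descending, keep the top k.
function searchDocs(queryEmbedding: number[], chunks: Chunk[], k: number = 5): SearchHit[] {
  return chunks
    .map((c) => ({
      id: c.id,
      source: c.source,
      text: c.text,
      score: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Note that pgvector's `<=>` operator returns cosine distance, so a similarity score like the one above corresponds to `1 - distance` on the database side.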
The reliability core: the refusal gate
`answer_question` never synthesizes when retrieval confidence is below the
configured floor. It refuses instead. The eval set includes out-of-corpus
negative cases specifically to prove this behavior holds (see
`src/qa/gate.ts`). With the default thresholds there is a clean
margin between in-corpus questions (top cosine ≥ 0.35) and out-of-corpus questions
(top cosine ≤ 0.31).
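The gate logic described above can be sketched in a few lines. This is not the contents of `src/qa/gate.ts`; the type names, the `confidenceGate` function, and the floor of 0.33 are assumptions. The floor is simply a value picked between the two score bands quoted above, since the actual configured value is not stated here.

```typescript
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // cosine similarity of the chunk to the question
}

type GateDecision =
  | { kind: "answer"; context: ScoredChunk[] }
  | { kind: "refuse"; message: string };

// ASSUMPTION: illustrative floor between the in-corpus band (>= 0.35)
// and the out-of-corpus band (<= 0.31).
const ASSUMED_FLOOR = 0.33;

// Decide before any synthesis happens: if the best retrieved chunk is not
// similar enough to the question, refuse rather than risk an ungrounded answer.
function confidenceGate(hits: ScoredChunk[], floor: number = ASSUMED_FLOOR): GateDecision {
  const top = hits.reduce((best, h) => Math.max(best, h.score), -Infinity);
  if (hits.length === 0 || top < floor) {
    return { kind: "refuse", message: "not found" };
  }
  return { kind: "answer", context: hits };
}
```

The key design point is ordering: the check runs on retrieval scores alone, so a refusal never depends on what a language model might have generated.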
The eval-loop (the centerpiece)
A labeled dataset of ~50 cases (`evals/dataset.jsonl`):
question → expected supporting document(s), including negative/out-of-corpus cases.
Metrics (evals/harness/metrics.ts):
| Metric | Question it answers |
|---|---|
| recall@k | Did the right document make it into the top-k? |
| MRR | How highly was the right document ranked? |
| faithfulness | Is the answer actually supported by the retrieved chunks? (LLM-as-judge; deterministic fallback) |
| refusal accuracy | Does it answer in-corpus questions and refuse out-of-corpus ones? |
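For the two retrieval metrics in the table, a minimal sketch follows. It assumes a hypothetical `EvalCase` shape and uses the hit-rate reading of recall@k that the table describes (did any expected document reach the top-k). The real `evals/harness/metrics.ts` may differ, for example by scoring the fraction of expected documents found.

```typescript
// Hypothetical per-case record: labels plus what retrieval actually returned.
interface EvalCase {
  expectedDocs: string[]; // labeled supporting documents
  rankedDocs: string[];   // retrieved document ids, best first
}

// 1 if any expected document appears in the top k results, else 0.
function recallAtK(c: EvalCase, k: number): number {
  const topK = c.rankedDocs.slice(0, k);
  return c.expectedDocs.some((d) => topK.includes(d)) ? 1 : 0;
}

// 1 / rank of the first expected document, or 0 if none was retrieved.
function reciprocalRank(c: EvalCase): number {
  const idx = c.rankedDocs.findIndex((d) => c.expectedDocs.includes(d));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Averages both metrics over the in-corpus cases; negative cases have no
// expected documents and would be scored by refusal accuracy instead.
function scoreRetrieval(cases: EvalCase[], k: number): { recallAtK: number; mrr: number } {
  return {
    recallAtK: mean(cases.map((c) => recallAtK(c, k))),
    mrr: mean(cases.map(reciprocalRank)),
  };
}
```

As a worked example, three cases whose first correct document lands at rank 1, rank 2, and not at all give an MRR of (1 + 1/2 + 0) / 3 = 0.5.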
Runner: `pnpm eval` prints a scorecard, writes
`evals/results/{timestamp}.json`, and appends a row to `evals/history.ndjson` so
you can track the score-over-time curve.
Regression gate: `pnpm eval:gate` compares the scorecard against
`evals/baseline.json` and exits non-zero if any metric drops
below its baseline threshold by more than a small epsilon. CI runs this on every PR.
Current baseline (calibrated against the real corpus):
recall@5 0.92 · MRR 0.80 · faithfulness 0.80 · refusal accuracy 0.90
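The comparison the gate performs can be sketched as below, using the baseline numbers above. The `Scorecard` shape, the metric key names, the `checkRegression` function, and the epsilon of 0.01 are all assumptions for illustration; the actual layout of `evals/baseline.json` is not shown here.

```typescript
type Scorecard = Record<string, number>;

interface GateResult {
  passed: boolean;
  failures: string[];
}

// A metric fails only if it falls below (baseline - epsilon), so tiny
// run-to-run noise does not turn the build red.
function checkRegression(current: Scorecard, baseline: Scorecard, epsilon: number = 0.01): GateResult {
  const failures: string[] = [];
  for (const [metric, floor] of Object.entries(baseline)) {
    const value = current[metric];
    if (value === undefined) {
      failures.push(`${metric}: missing from scorecard`);
      continue;
    }
    if (value < floor - epsilon) {
      failures.push(`${metric}: ${value.toFixed(3)} is below ${floor.toFixed(3)} - ${epsilon}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

// Baseline values quoted above; the key names are hypothetical.
const baseline: Scorecard = {
  "recall@5": 0.92,
  mrr: 0.8,
  faithfulness: 0.8,
  refusalAccuracy: 0.9,
};

// A CLI wrapper would then map the result to an exit code, for example
// exiting with status 1 when `passed` is false so that CI fails.
```

Treating a missing metric as a failure is a deliberate choice in this sketch: silently skipping it would let a broken scorer pass the gate.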
Offline smoke test: `pnpm calibrate` runs the entire scoring pipeline with the real embedder against an in-memory index (no database required), which is useful for tuning thresholds and sanity-checking retrieval quality locally.
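How the calibration script chooses thresholds is not described here, so the following is only one plausible approach: collect the top cosine score for each in-corpus and out-of-corpus question, measure the gap between the two groups, and suggest the midpoint as the refusal floor. The `calibrateFloor` function and `Margin` shape are hypothetical.

```typescript
interface Margin {
  minInCorpus: number;          // weakest top score among answerable questions
  maxOutOfCorpus: number;       // strongest top score among unanswerable ones
  gap: number;                  // positive means the two groups are separable
  suggestedFloor: number | null; // midpoint of the gap, or null if they overlap
}

function calibrateFloor(inCorpusTop: number[], outOfCorpusTop: number[]): Margin {
  if (inCorpusTop.length === 0 || outOfCorpusTop.length === 0) {
    throw new Error("need at least one score in each group");
  }
  const minInCorpus = Math.min(...inCorpusTop);
  const maxOutOfCorpus = Math.max(...outOfCorpusTop);
  const gap = minInCorpus - maxOutOfCorpus;
  // If the groups overlap, no single floor separates them cleanly.
  const suggestedFloor = gap > 0 ? (minInCorpus + maxOutOfCorpus) / 2 : null;
  return { minInCorpus, maxOutOfCorpus, gap, suggestedFloor };
}
```

With the margins quoted in the refusal-gate section (in-corpus top cosine at or above 0.35, out-of-corpus at or below 0.31), this would report a gap of about 0.04 and suggest a floor near 0.33.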
Quickstart
Prerequisites: Node 20+, pnpm (corepack enable pnpm), and
Docker (for the pgvector container).
    pnpm install
    cp .env.example .env   # defaults match docker-compose
    pnpm db:up             # start Postgres + pgvector (host port 5433)
    pnpm db:wait           # wait until it accepts connections
    pnpm migrate           # apply schema + HNSW index
    pnpm ingest            # chunk → embed → upsert the corpus
    pnpm eval              # print the scorecard
    pnpm eval:gate         # run the regression gate (CI uses this)
    pnpm dev               # run the MCP server over stdio
The first `pnpm ingest` / `pnpm eval` downloads the MiniLM model (~90 MB) and caches it under `.models/`.
Using it from Claude Desktop / Claude Code
Build first (pnpm build), then point your MCP client at dist/mcp/server.js.
For Claude Desktop, add to `claude_desktop_config.json`:

    {
      "mcpServers": {
        "findocs": {
          "command": "node",
          "args": ["/absolute/path/to/findocs-mcp/dist/mcp/server.js"],
          "env": {
            "DATABASE_URL": "postgres://findocs:findocs@localhost:5433/findocs"
          }
        }
      }
    }

For Claude Code, register the server from the repo root:

    claude mcp add findocs \
      --env DATABASE_URL=postgres://findocs:findocs@localhost:5433/findocs \
      -- node ./dist/mcp/server.js

Then ask things like "Search the docs for how GTT OCO orders work" or "How is the Kite Connect access token checksum computed?", and try an out-of-corpus question to watch it refuse.
2-minute demo
Demo recording goes here; replace with an asciinema cast or GIF:

    # record: asciinema rec demo.cast -c "pnpm eval && pnpm dev"

Project layout

    src/
      config.ts            zod-validated env
      db/                  postgres.js client + repo (upsert / vectorSearch / getChunk)
      embeddings/          Embedder interface + local transformers.js impl + factory
      llm/                 LLMProvider {synthesize, judge}: heuristic + ollama
      ingest/              chunk · load · pipeline
      retrieval/search.ts  search_docs core
      qa/                  confidence gate + grounded answer with citations
      mcp/server.ts        MCP stdio server (3 tools, zod schemas)
    evals/
      dataset.jsonl        labeled cases (incl. negatives)
      harness/             metrics · runner · scorecard · gate (first-class module)
      baseline.json        regression thresholds
    corpus/                vendored broker API docs (deterministic eval base)
    db/                    schema.sql · migrate · wait
    scripts/calibrate.ts   offline eval (no DB) for threshold tuning

Notes & scope
- Corpus is a curated, vendored subset of public broker API documentation for demo and reproducibility; it may lag the official docs. Treat it as a fixture, not a source of truth for live trading.
- TypeScript strict throughout (`exactOptionalPropertyTypes`, `noUncheckedIndexedAccess`, …), ESM, no `any` in core paths. Tests in vitest.
- Out of scope for v1: rerankers, hybrid BM25+vector, auth, web UI. The adapters are structured so these slot in without a rewrite.
License
MIT. See LICENSE.
