findocs-mcp

An eval-first MCP server for semantic search and grounded Q&A over financial documentation, with a CI regression gate that fails on retrieval or faithfulness regressions.

FinDocs MCP

An eval-first, reliability-first MCP server for semantic search and grounded Q&A over a financial-docs corpus: Postgres + pgvector for retrieval, and a first-class eval-loop that fails CI on regression.

FinDocs MCP gives an AI agent three tools over MCP: search a corpus of broker API documentation (Zerodha Kite Connect + Finvasia Shoonya), ask grounded questions that come back with citations, and ingest new documents. The interesting part isn't the RAG but the evaluation harness: every change is scored on retrieval recall, ranking quality, answer faithfulness, and refusal correctness, and a regression below baseline turns the build red.

This is the "tick-data validation, zero production mis-fires" discipline from quant trading infrastructure, applied to AI tooling: a confident wrong answer is worse than an honest "not found."

📚 Learning the codebase? The source is written as a reverse-learning layer: read it top-down from src/mcp/server.ts (where an agent calls in) and follow the ▼ LEARN comment blocks down through retrieval, embeddings, cosine/pgvector, chunking, the refusal gate, and the eval-loop, all the way to the linear algebra at the bottom. Each concept is taught inline, right where it's implemented.


Architecture

                 ┌─────────────────────────────────────────────┐
   MCP client    │                 MCP server (stdio)          │
 (Claude Code/   │   search_docs · answer_question · ingest_doc│
  Desktop) ─────▶│                                             │
                 └───────┬───────────────┬───────────────┬─────┘
                         │               │               │
                  ┌──────▼─────┐   ┌─────▼──────┐   ┌─────▼──────┐
                  │  Embedder  │   │ Retrieval  │   │   Ingest   │
                  │ (local     │   │ + QA gate  │   │ chunk→embed│
                  │  MiniLM)   │   │ + citations│   │  →upsert   │
                  └──────┬─────┘   └─────┬──────┘   └─────┬──────┘
                         └───────────────┼───────────────┘
                                  ┌──────▼───────┐
                                  │  Postgres +  │
                                  │   pgvector   │  HNSW cosine
                                  └──────────────┘

   evals/  ──▶  runner ──▶ metrics (recall@k · MRR · faithfulness · refusal)
                                  │
                                  ▼
                          baseline.json gate ──▶ CI pass/fail

Everything is provider-agnostic behind thin adapters:

| Concern | Default (zero cost, no secrets) | Swap-in |
| --- | --- | --- |
| Embeddings | @xenova/transformers MiniLM-L6-v2 (384-dim) | OpenAI / Voyage |
| LLM | deterministic heuristic (extractive + overlap judge) | local Ollama, or Anthropic / OpenAI |
| Store | Postgres + pgvector (HNSW, cosine) | none |

The defaults run with no API keys and no per-call cost, which is exactly what makes the eval gate reproducible in CI.
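As a rough illustration of what a thin adapter could look like, the sketch below defines a hypothetical `Embedder` interface and a `createEmbedder` factory. All names here are assumptions for illustration, not the project's actual API, and `ToyHashEmbedder` is a deterministic stand-in for the local MiniLM model, not a semantic embedder.

```typescript
// Hypothetical adapter contract; the real one lives in src/embeddings/ and may differ.
interface Embedder {
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy stand-in for the local MiniLM embedder. It buckets character codes into
// a fixed-size vector and L2-normalizes it: deterministic, but not semantic.
class ToyHashEmbedder implements Embedder {
  readonly dimensions: number;

  constructor(dimensions: number) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vec = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        const slot = (text.charCodeAt(i) * 31 + i) % this.dimensions;
        vec[slot] = (vec[slot] ?? 0) + 1;
      }
      // Guard against a zero vector (empty input) before normalizing.
      const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0)) || 1;
      return vec.map((x) => x / norm);
    });
  }
}

type EmbedderKind = "local" | "openai" | "voyage";

// Callers depend on the Embedder interface, never on a concrete provider.
function createEmbedder(kind: EmbedderKind): Embedder {
  switch (kind) {
    case "local":
      return new ToyHashEmbedder(384);
    case "openai":
    case "voyage":
      throw new Error(`Embedder "${kind}" is not wired up in this sketch`);
  }
}
```

Because the rest of the pipeline only sees `Embedder`, swapping providers would come down to adding a case to the factory, and the no-key default keeps CI runs deterministic.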


MCP tools

| Tool | Description |
| --- | --- |
| search_docs(query, k?) | Top-k chunks with cosine similarity scores + source metadata. |
| answer_question(question) | Retrieves, applies a confidence gate, synthesizes a grounded answer with citations, or refuses with "not found" when retrieval confidence is low. |
| ingest_doc({ url \| text, source?, title? }) | Chunk → embed → upsert. Idempotent on content. |
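One common way to make ingestion idempotent on content is to derive each chunk's id from a hash of its text, so re-ingesting the same document overwrites rows instead of duplicating them. The sketch below assumes that approach; `chunkText`, `contentId`, `ingest`, the 800-character limit, and the in-memory `Map` standing in for Postgres are all hypothetical, and the embedding step is omitted.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; the real schema is in db/schema.sql.
interface ChunkRow {
  id: string;
  source: string;
  text: string;
}

// Greedy paragraph chunker: packs blank-line-separated paragraphs into chunks
// of at most maxChars (a single oversized paragraph stays whole).
function chunkText(text: string, maxChars = 800): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current.length > 0 && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current.length > 0 ? `${current}\n\n${p}` : p;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Content-derived id: the same text always maps to the same row.
function contentId(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// In-memory stand-in for the Postgres upsert. Returns the chunk count.
function ingest(store: Map<string, ChunkRow>, source: string, text: string): number {
  const chunks = chunkText(text);
  for (const chunk of chunks) {
    const id = contentId(chunk);
    store.set(id, { id, source, text: chunk }); // set() overwrites, so repeats are no-ops
  }
  return chunks.length;
}
```

Calling `ingest` twice with the same text leaves the store size unchanged, which is the property "idempotent on content" describes.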

The reliability core: the refusal gate

answer_question never synthesizes when retrieval confidence is below the configured floor. It refuses instead. The eval set includes out-of-corpus negative cases specifically to prove this behavior holds (see src/qa/gate.ts). With the default thresholds there is a clean margin between in-corpus questions (top cosine ≥ 0.35) and out-of-corpus questions (top cosine ≤ 0.31).
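A minimal sketch of such a gate, assuming the decision depends only on the top cosine score. The names `applyGate` and `CONFIDENCE_FLOOR` and the value 0.33 are hypothetical (chosen to sit inside the 0.31 to 0.35 margin quoted above); the actual logic in src/qa/gate.ts may use different inputs.

```typescript
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // cosine similarity to the query
}

type GateDecision =
  | { kind: "answer"; evidence: ScoredChunk[] }
  | { kind: "refuse"; reason: string };

// Hypothetical floor, placed between the out-of-corpus ceiling (0.31)
// and the in-corpus floor (0.35) reported for the default thresholds.
const CONFIDENCE_FLOOR = 0.33;

function applyGate(hits: ScoredChunk[], floor: number = CONFIDENCE_FLOOR): GateDecision {
  const top = hits.reduce((best, h) => Math.max(best, h.score), -Infinity);
  if (hits.length === 0 || top < floor) {
    // Refuse before any synthesis happens: no evidence, no answer.
    return { kind: "refuse", reason: "not found" };
  }
  // Only chunks at or above the floor are passed on as citable evidence.
  return { kind: "answer", evidence: hits.filter((h) => h.score >= floor) };
}
```

Returning a tagged union rather than a boolean forces the caller to handle the refusal branch explicitly, so a synthesizer cannot be invoked without evidence in hand.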


The eval-loop (the centerpiece)

A labeled dataset of ~50 cases (evals/dataset.jsonl): question → expected supporting document(s), including negative/out-of-corpus cases.

Metrics (evals/harness/metrics.ts):

| Metric | Question it answers |
| --- | --- |
| recall@k | Did the right document make it into the top-k? |
| MRR | How highly was the right document ranked? |
| faithfulness | Is the answer actually supported by the retrieved chunks? (LLM-as-judge; deterministic fallback) |
| refusal accuracy | Does it answer in-corpus questions and refuse out-of-corpus ones? |
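The two retrieval metrics can be sketched as follows. This assumes recall@k is scored per case as a hit when any expected document appears in the top k, and that negative cases (no expected document) are excluded from both metrics; the `RetrievalCase` shape and function names are hypothetical and may not match evals/harness/metrics.ts.

```typescript
interface RetrievalCase {
  expected: string[]; // ids of supporting documents; empty for negative cases
  ranked: string[];   // retrieved document ids, best first
}

// Negative cases have no expected document, so they are skipped here
// (an assumption; refusal accuracy would cover them instead).
function positives(cases: RetrievalCase[]): RetrievalCase[] {
  return cases.filter((c) => c.expected.length > 0);
}

// Fraction of positive cases with at least one expected document in the top k.
function recallAtK(cases: RetrievalCase[], k: number): number {
  const scored = positives(cases);
  if (scored.length === 0) return 0;
  const hits = scored.filter((c) =>
    c.ranked.slice(0, k).some((id) => c.expected.includes(id)),
  ).length;
  return hits / scored.length;
}

// Mean of 1/rank of the first expected document; a miss contributes 0.
function meanReciprocalRank(cases: RetrievalCase[]): number {
  const scored = positives(cases);
  if (scored.length === 0) return 0;
  const total = scored.reduce((sum, c) => {
    const index = c.ranked.findIndex((id) => c.expected.includes(id));
    return sum + (index === -1 ? 0 : 1 / (index + 1));
  }, 0);
  return total / scored.length;
}
```

For three positive cases whose expected document lands at rank 1, at rank 2, and not at all, recall@1 is 1/3, recall@2 is 2/3, and MRR is (1 + 1/2 + 0) / 3 = 0.5.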

Runner: pnpm eval prints a scorecard, writes evals/results/{timestamp}.json, and appends a row to evals/history.ndjson so you can track the score-over-time curve.

Regression gate: pnpm eval:gate compares the scorecard against evals/baseline.json and exits non-zero if any metric drops below threshold (minus a small epsilon). CI runs this on every PR.

Current baseline (calibrated against the real corpus):

recall@5  0.92   ·   MRR  0.80   ·   faithfulness  0.80   ·   refusal accuracy  0.90
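The gate comparison itself can be sketched as a pure function over two scorecards. The threshold values below are copied from the baseline above; the key names, the `findRegressions` helper, and the epsilon of 0.01 are assumptions, since the actual layout of evals/baseline.json is not shown here.

```typescript
type Scorecard = Record<string, number>;

// Values from the baseline listed above; the key names are assumptions.
const baseline: Scorecard = {
  "recall@5": 0.92,
  mrr: 0.8,
  faithfulness: 0.8,
  refusalAccuracy: 0.9,
};

// Returns one message per failing metric; an empty array means the gate passes.
function findRegressions(
  current: Scorecard,
  thresholds: Scorecard,
  epsilon = 0.01, // hypothetical tolerance for run-to-run noise
): string[] {
  const failures: string[] = [];
  for (const [metric, threshold] of Object.entries(thresholds)) {
    const value = current[metric];
    if (value === undefined) {
      // A metric that silently disappears should fail the build too.
      failures.push(`${metric}: missing from scorecard`);
    } else if (value < threshold - epsilon) {
      failures.push(`${metric}: ${value} is below ${threshold} (epsilon ${epsilon})`);
    }
  }
  return failures;
}
```

A command-line wrapper around this could print the failures and set `process.exitCode = 1` when the array is non-empty, which is enough for CI to mark the build red.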

Offline smoke test: pnpm calibrate runs the entire scoring pipeline with the real embedder against an in-memory index (no database required), which is useful for tuning thresholds and sanity-checking retrieval quality locally.


Quickstart

Prerequisites: Node 20+, pnpm (corepack enable pnpm), and Docker (for the pgvector container).

pnpm install
cp .env.example .env          # defaults match docker-compose

pnpm db:up                    # start Postgres + pgvector (host port 5433)
pnpm db:wait                  # wait until it accepts connections
pnpm migrate                  # apply schema + HNSW index
pnpm ingest                   # chunk → embed → upsert the corpus

pnpm eval                     # print the scorecard
pnpm eval:gate                # run the regression gate (CI uses this)

pnpm dev                      # run the MCP server over stdio

The first pnpm ingest / pnpm eval downloads the MiniLM model (~90 MB) and caches it under .models/.


Using it from Claude Desktop / Claude Code

Build first (pnpm build), then point your MCP client at dist/mcp/server.js.

For Claude Desktop, add to claude_desktop_config.json:

{
  "mcpServers": {
    "findocs": {
      "command": "node",
      "args": ["/absolute/path/to/findocs-mcp/dist/mcp/server.js"],
      "env": {
        "DATABASE_URL": "postgres://findocs:findocs@localhost:5433/findocs"
      }
    }
  }
}

For Claude Code, register the server from the repo root:

claude mcp add findocs \
  --env DATABASE_URL=postgres://findocs:findocs@localhost:5433/findocs \
  -- node ./dist/mcp/server.js

Then ask things like "Search the docs for how GTT OCO orders work" or "How is the Kite Connect access token checksum computed?", and try an out-of-corpus question to watch it refuse.


2-minute demo

Demo recording goes here (replace with an asciinema cast or GIF):

# record:
asciinema rec demo.cast -c "pnpm eval && pnpm dev"


Project layout

src/
  config.ts              zod-validated env
  db/                    postgres.js client + repo (upsert / vectorSearch / getChunk)
  embeddings/            Embedder interface + local transformers.js impl + factory
  llm/                   LLMProvider {synthesize, judge}: heuristic + ollama
  ingest/                chunk · load · pipeline
  retrieval/search.ts    search_docs core
  qa/                    confidence gate + grounded answer with citations
  mcp/server.ts          MCP stdio server (3 tools, zod schemas)
evals/
  dataset.jsonl          labeled cases (incl. negatives)
  harness/               metrics · runner · scorecard · gate (first-class module)
  baseline.json          regression thresholds
corpus/                  vendored broker API docs (deterministic eval base)
db/                      schema.sql · migrate · wait
scripts/calibrate.ts     offline eval (no DB) for threshold tuning

Notes & scope

  • Corpus is a curated, vendored subset of public broker API documentation for demo and reproducibility; it may lag the official docs. Treat it as a fixture, not a source of truth for live trading.
  • TypeScript strict throughout (exactOptionalPropertyTypes, noUncheckedIndexedAccess, …), ESM, no any in core paths. Tests in vitest.
  • Out of scope for v1: rerankers, hybrid BM25+vector, auth, web UI. The adapters are structured so these slot in without a rewrite.

License

MIT (see LICENSE).
