LongBook Verifier
An MCP server that evaluates whether retrieval methods and AI outputs are grounded in long narrative manuscripts by retrieving evidence and scoring coverage deterministically, without external model APIs. It provides tools for chunking, indexing, retrieval, and evaluation.
An MCP-enabled evaluation and claim-grounding toolkit for long-document RAG systems.
LongBook Verifier measures whether retrieval methods and AI outputs are actually grounded in long narrative manuscripts. It retrieves evidence from the document and scores coverage deterministically, without external model APIs.
Use it three ways
| | What you get | For |
|---|---|---|
| Local MCP | A local stdio MCP server for Claude Code / Codex: long-document evaluation, retrieval, claim verification, and report tools. Local only (not hosted or remote). See docs/MCP.md | Using the verifier as tools inside your AI client |
| Try BookProof online | The live hosted web product: upload a document + golden questions in the browser, no install. See tts.bedvibe.studio/bookproof/app | Trying it instantly |
| Run locally | Clone and run the FastAPI web app + evaluation engine on your own machine. See docs/LOCAL_RUN.md | Developers / researchers inspecting or running the verifier |
What it does
- Retrieval evaluation across five methods (naive_first_context, naive_last_context, flat_chunk_rag, chapter_summary_chain, hierarchical_book_rag) on book-length documents.
- Claim / answer grounding: scores an AI output (or a set of claims/questions) against the source document using evidence-term coverage, answer-term coverage, and retrieval context precision/recall-like metrics.
- Deterministic local embeddings (hashing_numpy): reproducible, with no downloaded models and no Claude/OpenAI/Gemini calls.
- Three ways to use the same engine: a CLI/eval pipeline, a local FastAPI web app, and a local stdio MCP server for AI coding clients.
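To illustrate why a hashing-based embedding is reproducible without any downloaded model, here is a minimal sketch of feature hashing. It is not the project's hashing_numpy backend: the function names, the 256-dimension default, the tokenizer, and the use of SHA-256 are all assumptions made for illustration, and the sketch uses only the standard library rather than NumPy.

```python
import hashlib
import math
import re


def hash_embed(text, dim=256):
    """Map text to a fixed-size, L2-normalised vector via feature hashing.

    Hypothetical helper: each lowercase token is hashed to a bucket and a
    sign, so the same text always yields the same vector on any machine.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # which dimension
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))
```

Because the vector depends only on the token hashes, two runs over the same chunk produce identical embeddings, which is what makes an index built this way repeatable. The trade-off is that similarity is lexical: texts that share no tokens score near zero regardless of meaning.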
Why it exists
Short-answer correctness and evidence grounding can diverge: a model can give a plausible answer that the document doesn't actually support. LongBook Verifier separates those signals so you can audit whether outputs and retrieval are grounded in long manuscripts, which is useful for manuscript QA and reproducible long-document evaluation.
Live product
A hosted, public version of this evaluation runs as BookProof. Try it online at tts.bedvibe.studio/bookproof/app.
BookProof is an existing, related public product. It is not required to run anything in this repository locally.
Architecture
One evaluation engine, three access surfaces, plus the hosted product:
- Research / evaluation engine (src/): chunking, deterministic index build, retrieval, the five methods, metrics, and claim verification.
- Local FastAPI web app (product_mvp/server_longbook_verifier.py): upload a document + golden questions in the browser and get scored locally.
- Local stdio MCP server (product_mvp/mcp_longbook_server.py): exposes the engine to MCP clients (e.g. Claude Code / Codex) over stdio, locally only.
- BookProof public product/API: a deployed instance offering a rate-limited public demo and a separate token-gated verification API (see BookProof API).
See docs/ARCHITECTURE.md for a diagram.
Research methods
The benchmark reports evidence-term coverage, answer-term coverage, retrieval context recall, and
task-completion behavior separately, because short-answer correctness and evidence grounding can
diverge. Two experiments are documented in paper/:
- Experiment A: a pilot single-book benchmark (~64k words, 40 gold questions, 5 retrieval methods, 5 external consumer AI systems under a free-tier protocol).
- Experiment B: an extended stress test on a 240,767-word corpus (~320,220 tokens, 80 gold questions, 5 retrieval methods).
These are a pilot plus stress-test package, not a universal model ranking or state-of-the-art
claim. Full methods and results are in paper/; the research package is archived at
DOI 10.5281/zenodo.20513116.
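The separation between answer-term coverage and evidence-term coverage can be sketched as a simple lexical ratio. This is an illustrative reading of the metric names, not the formulas used in paper/ or src/: the helper name term_coverage, the tokenizer, and the restriction to single-word terms are assumptions.

```python
import re


def _terms(text):
    """Lowercase alphanumeric tokens found in text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_coverage(expected_terms, text):
    """Fraction of expected single-word terms that appear in text.

    Hypothetical helper; returns 0.0 when no terms are expected.
    """
    wanted = {t.lower() for t in expected_terms}
    if not wanted:
        return 0.0
    return len(wanted & _terms(text)) / len(wanted)


# Hypothetical gold question: the answer mentions the right term, but the
# retrieved context contains only part of the supporting evidence.
answer_cov = term_coverage(["keeper"], "The keeper repaired it.")
evidence_cov = term_coverage(
    ["lighthouse", "storm", "letter"],
    "The storm broke the lighthouse lamp.",
)
```

In this made-up case the answer covers every answer term while the retrieved context covers only two of the three evidence terms, which is the kind of divergence that reporting the two numbers separately is meant to surface.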
Quick start
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
Details: docs/LOCAL_RUN.md.
Run the local web app
python -m uvicorn product_mvp.server_longbook_verifier:app --host 127.0.0.1 --port 8078
Then open http://127.0.0.1:8078/ and upload a document + golden questions.
Use the local MCP server
python product_mvp\mcp_longbook_server.py
The MCP server runs locally over stdio; an MCP client launches it as a subprocess. It also
requires the mcp package: python -m pip install mcp. See docs/MCP.md.
MCP client configuration
Generic mcpServers entry (replace the path with the location of your cloned repo; see
mcp_config.example.json):
{
  "mcpServers": {
    "longbook-proof-local": {
      "command": "python",
      "args": ["EDIT_THIS_PATH/product_mvp/mcp_longbook_server.py"]
    }
  }
}
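If you would rather not edit the path by hand, a small script can fill it in. The sketch below is not part of the repository: build_mcp_config is a hypothetical helper, and it assumes the clone keeps the product_mvp/mcp_longbook_server.py layout shown above.

```python
import json
from pathlib import Path


def build_mcp_config(repo_root):
    """Return an mcpServers entry pointing at a local clone (hypothetical helper)."""
    server = Path(repo_root) / "product_mvp" / "mcp_longbook_server.py"
    return {
        "mcpServers": {
            "longbook-proof-local": {
                "command": "python",
                # Forward slashes avoid backslash escaping in JSON on Windows.
                "args": [server.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    # Run from the root of the cloned repo and paste the output into your
    # MCP client's configuration file.
    print(json.dumps(build_mcp_config(Path.cwd()), indent=2))
```

Where the resulting JSON goes depends on the MCP client, so check that client's own documentation for the configuration file location.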
MCP tools
All tools are read-or-allowlisted, local-only:
| Tool | Description |
|---|---|
| longbook_status | Read-only project summary (allowed roots, scripts, report/run counts, default backend). |
| list_books | List .txt / .md / .docx files under the project book folder (or a sub-path inside the project). |
| list_reports | List report-like files (.md / .txt / .json / .jsonl / .csv). |
| read_report | Read a report-like file with truncation. |
| run_chunking | Chunk a book into a .jsonl (src/chunk_book.py). |
| run_index_build | Build a retrieval index (src/build_index.py, hashing_numpy). |
| run_retrieve | Return ranked chunks from an existing local index (src/retrieve.py). |
| run_eval | Run a retrieval-evaluation method over a book + questions (src/run_eval.py). |
| generate_report_tables | Build summary CSV tables from run folders (src/report_tables.py). |
Repository structure
src/ evaluation engine (chunking, index, retrieval, methods, metrics, claim checks)
product_mvp/ local FastAPI web app + local stdio MCP server + site/ frontend
paper/ research write-ups (methods, results, limitations) + CITATION.cff
docs/ LOCAL_RUN, MCP, ARCHITECTURE
scripts/ Windows helpers (run_web.bat, run_mcp.bat)
Data policy
Copyrighted corpora, source manuscripts, private evaluation data, and user uploads are intentionally excluded from this repository. The tools operate on documents you provide.
Security model
Confirmed in product_mvp/mcp_longbook_server.py: the MCP server runs local stdio only and
calls an allowlisted set of local scripts. It uses no arbitrary shell commands (no
shell=True), enforces strict read/write path checks (reads confined to the project root;
writes confined to outputs/, reports/, and product_mvp/runs/), rejects paths containing
.env / secret / key / token / password, runs child scripts with stdin=DEVNULL, applies a
timeout, and makes no cloud or external model calls. It does not provide shell execution or
remote access.
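The path rules described above can be sketched as follows. This is an illustration of the stated policy, not the code in product_mvp/mcp_longbook_server.py: the function and constant names are hypothetical, and details such as matching blocked fragments only against the path relative to the project root are assumptions.

```python
from pathlib import Path

# Values taken from the policy text above; the names are hypothetical.
BLOCKED_FRAGMENTS = (".env", "secret", "key", "token", "password")
WRITE_DIRS = ("outputs", "reports", "product_mvp/runs")


def _inside(path, root):
    """True if path is root itself or lies underneath it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def check_path(candidate, project_root, for_write=False):
    """Resolve candidate against project_root, or raise PermissionError."""
    root = Path(project_root).resolve()
    resolved = (root / candidate).resolve()  # collapses ".." and symlinks
    if not _inside(resolved, root):
        raise PermissionError("path escapes the project root")
    relative = resolved.relative_to(root).as_posix().lower()
    if any(fragment in relative for fragment in BLOCKED_FRAGMENTS):
        raise PermissionError("path looks like a credentials file")
    if for_write and not any(_inside(resolved, root / d) for d in WRITE_DIRS):
        raise PermissionError("writes are limited to the output folders")
    return resolved
```

Resolving before checking matters here: a request such as ../outside.txt only reveals that it leaves the project once the path has been normalised. Substring matching on fragments like key is deliberately blunt and will also reject harmless names such as monkey.txt, which is the conservative direction for a tool that an AI client drives.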
BookProof API
The hosted BookProof product exposes:
- a public, rate-limited demo endpoint (capped document size, capped questions, one run per IP per day, inputs deleted after processing), and
- a separate token-gated verification API (authenticated via an X-BookProof-Token header) with a machine-readable spec endpoint.
No token is included in this repository.
Limitations
- Evaluation is lexical/retrieval-based and deterministic (hashing_numpy); it is not a semantic-embedding or model-graded benchmark.
- The published results are a pilot + stress test, not a universal ranking or SOTA claim.
- The MCP server expects the local project files and runs entirely on your machine.
License
MIT © 2026 Panos Gkilis. Contact via GitHub Security Advisories for security reports (see SECURITY.md).