LongBook Verifier

An MCP server that evaluates whether retrieval methods and AI outputs are grounded in long narrative manuscripts by retrieving evidence and scoring coverage deterministically, without external model APIs. It provides tools for chunking, indexing, retrieval, and evaluation.

An MCP-enabled evaluation and claim-grounding toolkit for long-document RAG systems.

LongBook Verifier measures whether retrieval methods and AI outputs are actually grounded in long narrative manuscripts. It does this by retrieving evidence from the document and scoring coverage, deterministically and without external model APIs.

Use it three ways

| | What you get | For |
|---|---|---|
| 🔌 Local MCP | A local stdio MCP server for Claude Code / Codex: long-document evaluation, retrieval, claim verification, and report tools. Local only (not hosted or remote). → docs/MCP.md | Using the verifier as tools inside your AI client |
| 🌐 Try BookProof online | The live hosted web product: upload a document + golden questions in the browser, no install. → tts.bedvibe.studio/bookproof/app | Trying it instantly |
| 💻 Run locally | Clone and run the FastAPI web app + evaluation engine on your own machine. → docs/LOCAL_RUN.md | Developers / researchers inspecting or running the verifier |

What it does

  • Retrieval evaluation across five methods (naive_first_context, naive_last_context, flat_chunk_rag, chapter_summary_chain, hierarchical_book_rag) on book-length documents.
  • Claim / answer grounding: scores an AI output (or a set of claims/questions) against the source document using evidence-term coverage, answer-term coverage, and retrieval context precision/recall-like metrics.
  • Deterministic local embeddings (hashing_numpy): reproducible, with no downloaded models and no Claude/OpenAI/Gemini calls.
  • Three ways to use the same engine: a CLI/eval pipeline, a local FastAPI web app, and a local stdio MCP server for AI coding clients.
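
The deterministic-embedding idea above can be sketched with the hashing trick. This is a minimal standard-library illustration, not the project's hashing_numpy backend: the 256-dimension size, the MD5 bucket hash, the tokenizer, and the names hash_embed and rank_chunks are all assumptions made for the example.

```python
import hashlib
import math
import re

DIM = 256  # assumed vector size; the real backend's dimension is not stated


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-size unit vector with the hashing trick.

    A stable hash (MD5 here) picks the bucket for each token, so the same
    text yields the same vector on every run and every machine.
    """
    vec = [0.0] * dim
    for token in tokenize(text):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k chunks by cosine similarity to the query."""
    q = hash_embed(query)
    scored = [(cosine(q, hash_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Because nothing is learned or downloaded, two runs over the same chunks produce identical rankings. The trade-off is that similarity is purely lexical: a query only matches chunks that share its tokens.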

Why it exists

Short-answer correctness and evidence grounding can diverge: a model can give a plausible answer that the document doesn't actually support. LongBook Verifier separates those signals so you can audit whether outputs and retrieval are grounded in long manuscripts, which is useful for manuscript QA and reproducible long-document evaluation.
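
A small sketch may make the divergence concrete. It assumes term coverage means "fraction of expected terms found in the observed text"; the stopword list, the coverage function, and the sample sentences are invented for illustration and are not taken from the project's metrics code or data.

```python
import re

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "by"}


def terms(text: str) -> set[str]:
    """Lowercased words from the text, minus the stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def coverage(expected: str, observed: str) -> float:
    """Fraction of the expected terms that also occur in the observed text."""
    want = terms(expected)
    if not want:
        return 0.0
    return len(want & terms(observed)) / len(want)


# Made-up example: the answer looks right, but the retrieved context
# contains none of the gold evidence.
gold_answer = "Mara hid the letter"
gold_evidence = "Mara slipped the sealed letter beneath the loose floorboard"
model_answer = "Mara hid the letter somewhere in the house"
retrieved_context = "The storm reached the harbor before dawn"

answer_cov = coverage(gold_answer, model_answer)
evidence_cov = coverage(gold_evidence, retrieved_context)

print(f"answer-term coverage:   {answer_cov:.2f}")    # 1.00
print(f"evidence-term coverage: {evidence_cov:.2f}")  # 0.00
```

Here the answer matches every gold answer term while the retrieved context matches none of the gold evidence terms, so a single blended score would hide the fact that the answer is unsupported.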

Live product

A hosted, public version of this evaluation runs as BookProof:

➡️ BookProof: try it online

BookProof is an existing, related public product. It is not required to run anything in this repository locally.

Architecture

One evaluation engine, three access surfaces, plus the hosted product:

  • Research / evaluation engine (src/): chunking, deterministic index build, retrieval, the five methods, metrics, and claim verification.
  • Local FastAPI web app (product_mvp/server_longbook_verifier.py): upload a document + golden questions in the browser and get scored locally.
  • Local stdio MCP server (product_mvp/mcp_longbook_server.py): exposes the engine to MCP clients (e.g. Claude Code / Codex) over stdio, locally only.
  • BookProof public product/API: a deployed instance offering a rate-limited public demo and a separate token-gated verification API (see BookProof API).

See docs/ARCHITECTURE.md for a diagram.
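
The first engine stage, chunking a book into a .jsonl file, might look roughly like the sketch below. The word-window strategy, the default sizes, and the field names chunk_id, start_word, and text are assumptions for illustration; src/chunk_book.py may split and label chunks differently.

```python
import json


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[dict]:
    """Split text into overlapping word windows with sequential chunk ids."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append({
            "chunk_id": len(chunks),
            "start_word": start,
            "text": " ".join(window),
        })
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def to_jsonl(chunks: list[dict]) -> str:
    """Serialize chunks as one JSON object per line."""
    return "\n".join(json.dumps(c, ensure_ascii=False) for c in chunks)
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.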

Research methods

The benchmark reports evidence-term coverage, answer-term coverage, retrieval context recall, and task-completion behavior separately, because short-answer correctness and evidence grounding can diverge. Two experiments are documented in paper/:

  • Experiment A: a pilot single-book benchmark (~64k words, 40 gold questions, 5 retrieval methods, 5 external consumer AI systems under a free-tier protocol).
  • Experiment B: an extended stress test on a 240,767-word corpus (~320,220 tokens, 80 gold questions, 5 retrieval methods).

These are a pilot plus stress-test package, not a universal model ranking or state-of-the-art claim. Full methods and results are in paper/; the research package is archived at DOI 10.5281/zenodo.20513116.
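
As a rough illustration of reporting these signals separately, the sketch below computes context precision and recall over chunk ids and averages each metric on its own. The id-based definition, the function names, and the row keys are assumptions; the exact formulas used in paper/ may differ.

```python
def context_precision_recall(retrieved_ids: list[int], gold_ids: set[int]) -> tuple[float, float]:
    """Precision and recall of retrieved chunk ids against gold evidence ids."""
    retrieved = set(retrieved_ids)
    hits = len(retrieved & gold_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold_ids) if gold_ids else 0.0
    return precision, recall


def summarize(rows: list[dict]) -> dict[str, float]:
    """Average each metric separately instead of folding them into one score."""
    keys = ["evidence_term_coverage", "answer_term_coverage", "context_recall"]
    return {k: sum(r[k] for r in rows) / len(rows) for k in keys}
```

Keeping the averages in separate columns means a method with strong answers but weak evidence retrieval stays visible in the results table.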

Quick start

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

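The activation command above is for Windows; on macOS or Linux, use source .venv/bin/activate instead. Details: docs/LOCAL_RUN.md.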

Run the local web app

python -m uvicorn product_mvp.server_longbook_verifier:app --host 127.0.0.1 --port 8078

Then open http://127.0.0.1:8078/ and upload a document + golden questions.

Use the local MCP server

python product_mvp\mcp_longbook_server.py

The MCP server runs locally over stdio; an MCP client launches it as a subprocess. It also requires the mcp package: python -m pip install mcp. See docs/MCP.md.

MCP client configuration

Generic mcpServers entry (replace the path with the location of your cloned repo; see mcp_config.example.json):

{
  "mcpServers": {
    "longbook-proof-local": {
      "command": "python",
      "args": ["EDIT_THIS_PATH/product_mvp/mcp_longbook_server.py"]
    }
  }
}
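
If you would rather not edit the path by hand, a short script can generate the entry with an absolute path. This is a convenience sketch rather than something shipped in the repository; build_mcp_config is a hypothetical helper name.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(repo_root: str, python_cmd: str = "python") -> dict:
    """Build an mcpServers entry pointing at the server script in repo_root."""
    server = Path(repo_root).resolve() / "product_mvp" / "mcp_longbook_server.py"
    return {
        "mcpServers": {
            "longbook-proof-local": {
                "command": python_cmd,
                # as_posix() yields forward slashes, which avoids backslash
                # escaping in JSON for Windows paths.
                "args": [server.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    # Usage: python make_config.py path/to/cloned/repo
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(json.dumps(build_mcp_config(root), indent=2))
```

If the mcp package is installed in a virtual environment, passing that environment's interpreter path as python_cmd may be more reliable than relying on whichever python is first on PATH.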

MCP tools

All tools are local-only, and each one is either read-only or limited to an allowlisted script:

| Tool | Description |
|---|---|
| longbook_status | Read-only project summary (allowed roots, scripts, report/run counts, default backend). |
| list_books | List .txt / .md / .docx files under the project book folder (or a sub-path inside the project). |
| list_reports | List report-like files (.md / .txt / .json / .jsonl / .csv). |
| read_report | Read a report-like file with truncation. |
| run_chunking | Chunk a book into a .jsonl (src/chunk_book.py). |
| run_index_build | Build a retrieval index (src/build_index.py, hashing_numpy). |
| run_retrieve | Return ranked chunks from an existing local index (src/retrieve.py). |
| run_eval | Run a retrieval-evaluation method over a book + questions (src/run_eval.py). |
| generate_report_tables | Build summary CSV tables from run folders (src/report_tables.py). |

Repository structure

src/             evaluation engine (chunking, index, retrieval, methods, metrics, claim checks)
product_mvp/     local FastAPI web app + local stdio MCP server + site/ frontend
paper/           research write-ups (methods, results, limitations) + CITATION.cff
docs/            LOCAL_RUN, MCP, ARCHITECTURE
scripts/         Windows helpers (run_web.bat, run_mcp.bat)

Data policy

Copyrighted corpora, source manuscripts, private evaluation data, and user uploads are intentionally excluded from this repository. The tools operate on documents you provide.

Security model

Confirmed in product_mvp/mcp_longbook_server.py: the MCP server runs over local stdio only and calls an allowlisted set of local scripts. It:

  • uses no arbitrary shell commands (no shell=True);
  • enforces strict read/write path checks (reads confined to the project root; writes confined to outputs/, reports/, and product_mvp/runs/);
  • rejects paths containing .env / secret / key / token / password;
  • runs child scripts with stdin=DEVNULL and applies a timeout;
  • makes no cloud or external model calls.

It does not provide shell execution or remote access.
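
The checks described above could be implemented along the following lines. This is an independent sketch of the stated rules, not the code in product_mvp/mcp_longbook_server.py: the names check_path and run_script, the 300-second timeout, and the use of the current directory as project root are assumptions.

```python
import subprocess
import sys
from pathlib import Path

PROJECT_ROOT = Path(".").resolve()  # assumed: the cloned repo root
WRITE_ROOTS = [
    PROJECT_ROOT / "outputs",
    PROJECT_ROOT / "reports",
    PROJECT_ROOT / "product_mvp" / "runs",
]
BLOCKED_TERMS = (".env", "secret", "key", "token", "password")
ALLOWED_SCRIPTS = {
    "src/chunk_book.py",
    "src/build_index.py",
    "src/retrieve.py",
    "src/run_eval.py",
    "src/report_tables.py",
}


def _inside(path: Path, root: Path) -> bool:
    """True if path is root itself or located beneath it."""
    return path == root or root in path.parents


def check_path(raw: str, write: bool = False) -> Path:
    """Resolve raw against the project root and enforce the path rules."""
    if any(term in raw.lower() for term in BLOCKED_TERMS):
        raise PermissionError(f"blocked term in path: {raw}")
    path = (PROJECT_ROOT / raw).resolve()  # resolve() collapses ../ segments
    roots = WRITE_ROOTS if write else [PROJECT_ROOT]
    if not any(_inside(path, root) for root in roots):
        raise PermissionError(f"path outside allowed roots: {raw}")
    return path


def run_script(script: str, args: list[str], timeout: int = 300) -> subprocess.CompletedProcess:
    """Run an allowlisted script with list-form argv, no shell, and a timeout."""
    if script not in ALLOWED_SCRIPTS:
        raise PermissionError(f"script not allowlisted: {script}")
    return subprocess.run(
        [sys.executable, str(PROJECT_ROOT / script), *args],
        stdin=subprocess.DEVNULL,  # the child can never block waiting for input
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        cwd=PROJECT_ROOT,
    )
```

Passing argv as a list without shell=True means tool arguments are never interpreted by a shell, so a crafted file name cannot inject extra commands.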

BookProof API

The hosted BookProof product exposes:

  • a public, rate-limited demo endpoint (capped document size, capped questions, one run per IP per day, inputs deleted after processing), and
  • a separate token-gated verification API (authenticated via an X-BookProof-Token header) with a machine-readable spec endpoint.
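
A request to the token-gated API would carry the token in the X-BookProof-Token header. The sketch below only builds the request object and sends nothing: the URL is a placeholder, and the payload shape and the BOOKPROOF_TOKEN environment variable name are hypothetical, so consult the API's spec endpoint for the real contract.

```python
import json
import os
import urllib.request

# Placeholder only: the real endpoint URL is not given in this README.
API_URL = "https://bookproof.example.invalid/verify"


def build_request(payload: dict, token: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the token header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-BookProof-Token": token,
        },
        method="POST",
    )


# Assumed variable name; reading the token from the environment keeps it
# out of source control.
token = os.environ.get("BOOKPROOF_TOKEN", "")

# The payload shape is illustrative only.
request = build_request({"claims": ["Mara hid the letter"]}, token)
```

Sending it would be a call to urllib.request.urlopen(request, timeout=30) once the real URL and payload format are known.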

No token is included in this repository.

Limitations

  • Evaluation is lexical/retrieval-based and deterministic (hashing_numpy); it is not a semantic-embedding or model-graded benchmark.
  • The published results are a pilot + stress test, not a universal ranking or SOTA claim.
  • The MCP server expects the local project files and runs entirely on your machine.

License

MIT © 2026 Panos Gkilis. For security reports, use GitHub Security Advisories (see SECURITY.md).
