Halyard Knowledge Base MCP Server

Lets LLMs search and retrieve chunks from a fictional library's documentation through three tools (kb_search, kb_fetch, kb_sources), so they can run RAG queries via the Model Context Protocol.

agentic-rag-mcp

A weekend build to get the three buzzwords straight in my own head by actually wiring them up: agentic RAG, MCP, and a small multi-agent pipeline, all over the same tiny document set. It's a working sketch, not a framework, and not production code — the point was to see the moving parts and where each idea earns its keep (and where it doesn't).

The knowledge base is five short docs about a fictional library called Halyard (docs/). They're written so a couple of questions genuinely need more than one lookup — which is the whole reason "agentic" retrieval is interesting here.

What runs without an API key, and what needs one

| File | What it is | API key? |
| --- | --- | --- |
| knowledge_base.py | chunk → embed (MiniLM) → FAISS search. The retrieval backend everything else uses. | no |
| mcp_server.py | an MCP server exposing the KB as tools (kb_search, kb_fetch, kb_sources) | no |
| mcp_client_demo.py | launches that server over stdio and calls its tools | no |
| agentic_rag.py | single agent that drives its own retrieval via a search tool | yes (offline: naive baseline) |
| multi_agent.py | retriever → synthesizer → verifier pipeline with a revise loop | yes (offline: stub roles) |
| llm.py | the manual tool-use loop the agents share | |

The LLM pieces use Claude (claude-opus-4-8, adaptive thinking, prompt caching on the system/tools prefix). Without a key, agentic_rag.py falls back to a naive single-shot retrieval baseline and multi_agent.py runs stub roles, so you can still watch the control flow.
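The key check behind that fallback can be very small. A minimal sketch, assuming the switch is driven by the ANTHROPIC_API_KEY environment variable (the one exported in the run instructions); the function name pick_mode and the mode labels are hypothetical, not taken from the repo:

```python
import os


def pick_mode(env=None):
    """Return "llm" when an API key is present, otherwise "offline".

    `env` is injectable so the check can be exercised without touching
    the real process environment (hypothetical helper, not the repo's code).
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "llm" if key else "offline"


if __name__ == "__main__":
    print(f"running in {pick_mode()} mode")
```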

Retrieval (knowledge_base.py)

Nothing clever: fixed-size character chunks with overlap, MiniLM embeddings, a FAISS flat inner-product index. No reranker, no hybrid BM25, no semantic chunking. Those are the obvious quality upgrades, but they'd be beside the point — this repo is about the orchestration on top, so the retriever is intentionally the dumb part.
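To make "the dumb part" concrete, here is a rough stdlib-only sketch of the same shape: overlapping character chunks plus a brute-force inner-product search. The embed function is a toy bag-of-words stand-in for MiniLM, and the loop in search is a stand-in for a FAISS flat index; the names and default sizes are assumptions, not the values used in knowledge_base.py.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character chunks; neighbours share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for MiniLM: an L2-normalised bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def search(query, chunks, k=3):
    """Score every chunk by inner product with the query and keep the top k."""
    q = embed(query)
    scored = []
    for i, chunk in enumerate(chunks):
        vec = embed(chunk)
        score = sum(w * vec.get(word, 0.0) for word, w in q.items())
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, round(score, 3)) for score, i in scored[:k]]
```

Because the vectors here are normalised, the inner product equals cosine similarity. The same holds for a flat inner-product index over normalised embeddings.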

Agentic RAG (agentic_rag.py)

Naive RAG retrieves once and answers. That breaks on questions like "how do I authenticate SqlSource after 3.0, and is the old way still supported?" — the answer is split across the migration guide and the changelog, and a single top-k search usually grabs one and misses the other.

Agentic RAG hands the model a kb_search tool and lets it decide: search, read the results, notice the second half of the question isn't covered, search again, then answer with citations. Same FAISS backend; the model just does multi-hop lookups itself. agentic_rag.py prints each query it chooses so you can see the hops.

The trade-off is real and worth stating: the agent costs several model round-trips and can occasionally wander or over-search. For single-fact questions naive RAG is cheaper and just as good. The agent earns its cost only when one lookup genuinely isn't enough.
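The loop itself is short. Below is a sketch with the model call replaced by a scripted decide function, so it runs offline. Everything in it is illustrative: the names run_agent, toy_search, and scripted_decide are hypothetical, and the two document strings are invented placeholders, not the real contents of docs/.

```python
def run_agent(question, search_tool, decide, max_turns=5):
    """Loop until `decide` answers or the turn cap is hit.

    `decide(question, evidence)` stands in for the model call and returns
    either ("search", query) or ("answer", text).
    """
    evidence, queries = [], []
    for _ in range(max_turns):
        action, payload = decide(question, evidence)
        if action == "answer":
            return payload, queries
        queries.append(payload)               # record the hop so it can be printed
        evidence.extend(search_tool(payload))
    return "stopped: turn cap reached", queries


# Invented placeholder text, only here to give the loop something to find.
DOCS = {
    "migration.md": "placeholder: new token based login",
    "changelog.md": "placeholder: old login deprecated",
}


def toy_search(query):
    """Substring match standing in for the real embedding search."""
    return [(name, text) for name, text in DOCS.items() if query.lower() in text.lower()]


def scripted_decide(question, evidence):
    """Scripted policy: keep searching until both sources have been seen."""
    sources = {name for name, _ in evidence}
    if "migration.md" not in sources:
        return "search", "token"
    if "changelog.md" not in sources:
        return "search", "deprecated"
    return "answer", "answer citing " + ", ".join(sorted(sources))


if __name__ == "__main__":
    answer, hops = run_agent("auth after 3.0?", toy_search, scripted_decide)
    print(hops, "->", answer)
```

The turn cap is what keeps a wandering agent from searching forever; a naive baseline is the same loop with max_turns fixed at one search and one answer.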

MCP (mcp_server.py, mcp_client_demo.py)

The agents above could just import knowledge_base. MCP is about not doing that. Model Context Protocol is a standard way for a server to advertise tools and for any client — Claude Desktop, an IDE, your own agent — to discover and call them without custom glue. Run one retrieval server, point many clients at it.

mcp_server.py is that server (built with the official SDK's FastMCP), serving three tools over stdio. mcp_client_demo.py is the easiest way to see it work: it spawns the server as a subprocess, does the MCP handshake, lists the tools, and calls them — all locally, no key:

python mcp_client_demo.py

To use the same server from Claude Desktop, add it to that app's claude_desktop_config.json:

{
  "mcpServers": {
    "halyard-kb": {
      "command": "python",
      "args": ["C:/Users/you/.../agentic-rag-mcp/mcp_server.py"]
    }
  }
}

Then Claude can call kb_search directly. The point: the retrieval logic lives in one place behind a stable interface, and the consumer doesn't know or care that it's FAISS underneath.
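For a feel of what "stable interface" means on the wire: MCP messages are JSON-RPC 2.0, and over stdio each message is one line of JSON. After the initialize handshake, a client discovers tools with tools/list and invokes one with tools/call. The sketch below only builds those request payloads with the standard library; the SDK does this for you. The argument name "query" is an assumption, since the real parameter names of kb_search are defined in mcp_server.py.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched up


def jsonrpc_request(method, params=None):
    """Build one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools_request():
    """Ask the server which tools it advertises."""
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    """Invoke a tool by name with a dict of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(list_tools_request())
    # "query" is a hypothetical argument name for illustration.
    print(call_tool_request("kb_search", {"query": "SqlSource authentication"}))
```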

Multi-agent (multi_agent.py)

Same task, decomposed into three focused roles, orchestrated in plain Python:

question -> RETRIEVER (has the search tool) -> evidence
         -> SYNTHESIZER (no tools)          -> draft answer
         -> VERIFIER (structured output)    -> supported? / issues
              supported -> return
              not       -> feed issues back to the synthesizer, retry (<= N)

That's two standard patterns — orchestrator/workers and evaluator/optimizer (the verify-and-revise loop). The flow is ordinary code; I'm deliberately not letting agents spawn each other freely. For a task this well-shaped, a fixed workflow is more predictable, cheaper, and far easier to debug than an autonomous swarm, and you can read exactly what happened. Autonomous multi-agent is worth it when the task genuinely can't be scripted up front — this one can, so it's a workflow with LLM steps, and I think that's the honest default for most "agent" problems.

The three roles are just functions, so the LLM version and an offline stub version share the same orchestrate() loop.
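A sketch of what that shared loop can look like, with stub roles plugged in so it runs without a key. This is a guess at the shape, not the repo's actual orchestrate(): the signature, the Verdict type, and the stub behaviour are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    supported: bool
    issues: list = field(default_factory=list)


def orchestrate(question, retrieve, synthesize, verify, max_revisions=2):
    """Fixed workflow: retrieve once, then draft and verify until supported or out of retries."""
    if max_revisions < 0:
        raise ValueError("max_revisions must be >= 0")
    evidence = retrieve(question)
    issues = []
    for attempt in range(1, max_revisions + 2):
        draft = synthesize(question, evidence, issues)
        verdict = verify(draft, evidence)
        if verdict.supported:
            break
        issues = verdict.issues  # feed the verifier's complaints into the next draft
    return draft, verdict.supported, attempt


def stub_retrieve(question):
    return ["[stub] evidence chunk A", "[stub] evidence chunk B"]


def stub_synthesize(question, evidence, issues):
    # The first draft "forgets" chunk B; a revision uses everything once issues arrive.
    used = evidence if issues else evidence[:1]
    return " | ".join(used)


def stub_verify(draft, evidence):
    missing = [chunk for chunk in evidence if chunk not in draft]
    return Verdict(supported=not missing, issues=[f"missing: {m}" for m in missing])


if __name__ == "__main__":
    print(orchestrate("any question", stub_retrieve, stub_synthesize, stub_verify))
```

Swapping the stubs for LLM-backed functions with the same signatures leaves the control flow untouched, which is the point of keeping the roles as plain functions.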

Running it

pip install -r requirements.txt

python knowledge_base.py     # retrieval sanity check
python mcp_client_demo.py    # MCP server + client, end to end, offline

# with a key, the LLM versions:
export ANTHROPIC_API_KEY=...        # PowerShell: $env:ANTHROPIC_API_KEY="..."
python agentic_rag.py
python multi_agent.py

First run downloads the ~90 MB embedding model. Requirements: Python 3.10+ (the mcp SDK does not support older versions), numpy, faiss-cpu, sentence-transformers, mcp, and (for the LLM steps) anthropic + pydantic.

Caveats / what I'd do differently for real work

  • It's five toy docs. Retrieval quality, chunking, and reranking would matter a lot more on a real corpus and are barely exercised here.
  • I never measured anything. To make claims about "agentic beats naive" I'd build a small eval set of multi-hop questions with reference answers and actually score them, including tokens and round-trips, not just eyeball the traces.
  • The verifier is an LLM judging against retrieved text; it can be wrong and tends to be lenient. I'd calibrate it against a few human-checked answers.
  • The agent loop has no budget cap beyond max_turns; a real one wants a token budget and a cost ceiling.
  • In a real deployment the agents would call the KB through the MCP server (the SDK has helpers to bridge MCP tools into the tool loop); here they call the KB directly for simplicity, and MCP is shown as its own slice.
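On the budget point, a guard like the following is roughly what I have in mind. It is a sketch only: the class and its names are hypothetical, and it uses one blended per-token price for simplicity, whereas real pricing distinguishes input, output, and cached tokens.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its token or cost cap."""


class Budget:
    """Track tokens and estimated spend across turns (hypothetical helper)."""

    def __init__(self, max_tokens, max_cost_usd, usd_per_1k_tokens):
        self.max_tokens = max_tokens
        self.max_cost_usd = max_cost_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens  # assumed blended price
        self.tokens = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens, output_tokens):
        """Record one model round-trip and raise if either cap is now exceeded."""
        self.tokens += input_tokens + output_tokens
        self.cost_usd = self.tokens / 1000 * self.usd_per_1k_tokens
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded: {self.tokens} > {self.max_tokens}")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost ceiling exceeded: ${self.cost_usd:.4f}")
```

The agent loop would call charge() after every model response with the usage numbers the API reports, and treat BudgetExceeded the same way it treats hitting max_turns.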

Muhammad Farooqi · https://github.com/mqfarooqi1
