Tabular Document Retriever MCP
This is a Model Context Protocol (MCP) server that transforms tabular data (CSV/Excel) into Markdown key-value pairs, embeds them into a local vector database (ChromaDB), and provides retrieval tools to contextually answer queries.
It uses the `uv` Python package manager, the `mcp` SDK, and FastAPI to optionally expose the server over Server-Sent Events (SSE).
Vibe-Coded
The entire codebase was generated by Antigravity with the help of Gemini 3.1 Pro (High and Fast) as a quick proof of concept.
Features
- Ingestion: Parses `.csv` and `.xlsx` files and upserts Markdown-formatted strings into ChromaDB.
- Retrieval Engine: Uses `sentence-transformers/all-MiniLM-L6-v2` locally for semantic search.
- MCP Server: Provides three tools exposed over an SSE endpoint: `retrieve_single`, `retrieve_batch`, `retrieve_by_query`.
- Dockerization: Quick spin-up of the Database and the MCP Server together without exposing the raw database to the host machine.
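As a rough illustration of the ingestion format, a single CSV row might be rendered as Markdown key-value pairs like this (the function name `row_to_markdown` and the exact bullet format are assumptions, not the project's actual implementation):

```python
import csv
import io

def row_to_markdown(row: dict) -> str:
    """Render one tabular row as Markdown key-value pairs."""
    return "\n".join(f"- **{key}**: {value}" for key, value in row.items())

# Example: one CSV row becomes one embeddable Markdown document.
sample = "name,price\nWidget,9.99\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(row_to_markdown(row))
```

Each such string is what gets embedded and stored in ChromaDB, so column headers stay attached to their values during semantic search.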
Prerequisites
- Docker and Docker Compose (to run the stack)
- uv (for local ingestion and testing)
🚀 Running the Stack
To start the server and the ChromaDB vector database locally:
```sh
docker-compose up -d --build
```
This will launch:
- ChromaDB, reachable internally at `chroma-db:8000`.
- The MCP server, accessible externally at `http://localhost:8000/sse`.
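A hypothetical `docker-compose.yml` matching this topology might look like the following; the image name, build context, and environment variable names are assumptions, while the service names come from this README:

```yaml
services:
  chroma-db:
    image: chromadb/chroma    # not published to the host; internal only
  mcp-app:
    build: .
    environment:
      CHROMA_HOST: chroma-db  # assumed env var for reaching the database
      CHROMA_PORT: "8000"
    ports:
      - "8000:8000"           # exposes the SSE endpoint at localhost:8000/sse
    depends_on:
      - chroma-db
```

Because `chroma-db` declares no `ports` mapping, the database is only reachable on the internal Compose network.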
💾 Ingesting User Data
Before you can search, you need to ingest tabular data into the running ChromaDB instance.
You can use the bundled CLI ingestor directly from your host machine. Make sure to set environment variables appropriately so it can reach your local stack, or run it via Docker Compose.
To run the ingestor against a locally running ChromaDB (or inside the container):
```sh
# First, ensure dependencies are synced
uv sync

# Run the ingestor (assuming there's a file `data/my_table.csv`).
# When targeting the dockerized ChromaDB, temporarily expose port 8000 for
# chroma-db, or simply run ingestion locally with local persistence.
uv run python -m src.ingestor data/my_table.csv
```
Note: Since the docker stack makes ChromaDB private, you can either map a port for chroma-db in docker-compose.yml temporarily, or run a one-off task using docker-compose:
```sh
docker-compose exec mcp-app python src/ingestor.py /path/to/mounted/data.csv
```
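A sketch of what an ingestor like `src/ingestor.py` might do is shown below. The collection name `tabular_data` comes from this README; the function names, ID scheme, and default connection details are assumptions:

```python
import csv
import io

def build_documents(csv_text: str) -> tuple[list[str], list[str]]:
    """Turn CSV text into (ids, Markdown documents) ready for upserting."""
    ids, documents = [], []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        ids.append(f"row-{i}")
        documents.append("\n".join(f"- **{k}**: {v}" for k, v in row.items()))
    return ids, documents

def ingest(csv_text: str, host: str = "localhost", port: int = 8001) -> None:
    """Upsert all rows into the `tabular_data` collection."""
    import chromadb  # deferred: only needed when a server is reachable

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection("tabular_data")
    ids, documents = build_documents(csv_text)
    collection.upsert(ids=ids, documents=documents)
```

ChromaDB computes embeddings server-side on upsert when no embedding vectors are supplied, so the ingestor only needs to ship ids and document strings.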
🛠️ MCP Tools
Once running, any MCP client can connect to http://localhost:8000/sse via Server-Sent Events (SSE).
Available tools:
- `retrieve_single(row)`: Top-K search using a single row's Markdown string.
- `retrieve_batch(rows)`: Batch retrieval over a list of Markdown row strings.
- `retrieve_by_query(query)`: Free-text query passed directly to ChromaDB's search.
💻 Local Testing Example
You can test the running MCP server locally using the official Python SDK. First, ensure the `mcp` package is installed in your environment (`uv pip install mcp` or `uv add mcp`).
Run the example with:
```sh
uv run python tests/test_client.py
```
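A minimal client along these lines could use the official `mcp` SDK's SSE transport; this is a sketch assuming the stack is up at `localhost:8000`, and the example query string is made up:

```python
import asyncio

SERVER_URL = "http://localhost:8000/sse"

async def main() -> None:
    # Imports kept local so the sketch reads standalone without the SDK.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(SERVER_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            result = await session.call_tool(
                "retrieve_by_query", {"query": "monthly revenue by region"}
            )
            print(result.content)

# To run against the live stack: asyncio.run(main())
```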
🗄️ Checking ChromaDB Records
You can dump the ingested records directly from the ChromaDB container when it is exposed on host port 8001. A utility script is provided that connects to the database and retrieves all content from the `tabular_data` collection.
Run the script using:
```sh
uv run python tests/dump_records.py
```
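A dump script like `tests/dump_records.py` could work roughly as sketched here; the collection name and port come from this README, while the function names and output format are assumptions:

```python
def format_records(records: dict) -> str:
    """Pretty-print the dict shape returned by ChromaDB's `collection.get()`."""
    lines = []
    for doc_id, doc in zip(records["ids"], records["documents"]):
        lines.append(f"[{doc_id}]\n{doc}")
    return "\n\n".join(lines)

def dump_all(host: str = "localhost", port: int = 8001) -> None:
    """Fetch and print every record in the `tabular_data` collection."""
    import chromadb  # deferred: only needed when a server is reachable

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_collection("tabular_data")
    print(format_records(collection.get()))
```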
Alternatively, you can query the ChromaDB REST API directly using curl to list the collections and check the status of your data:
```sh
# List all collections
curl http://localhost:8001/api/v1/collections
```