TabulaRAG
Enables AI assistants to query tabular data (CSV/TSV) using natural language with cell-level citations, supporting multi-tenant workspaces, access control, and semantic search.
<h1 align="center"> <img src="frontend/src/images/logo.png" alt="TabulaRAG logo" width="64" height="64" /><br/> TabulaRAG </h1>
<p align="center"> <strong>A fast-ingesting MCP RAG tool for tabular data, backed by cell-level citations.</strong><br/> Upload a CSV or TSV, then query it in natural language. Results include cell-level citations so you can trace exactly where each answer came from. Role-based access lets admin users add, edit, and delete datasets, and invite users to their enterprise or organization via invite codes. </p>
<p align="center"> <img src="https://img.shields.io/badge/python-3.11-blue" alt="Python 3.11" /> <img src="https://img.shields.io/badge/node-20-brightgreen" alt="Node 20" /> <img src="https://img.shields.io/badge/typescript-strict-blue" alt="TypeScript Strict" /> <img src="https://img.shields.io/badge/license-MIT-lightgrey" alt="License" /> </p>
Features
- CSV/TSV ingestion — upload tabular files with automatic header detection, delimiter inference, and column type recognition (dates, money, measurements)
- Semantic search — each row is embedded via sentence-transformers/all-MiniLM-L6-v2 and indexed in Qdrant for natural-language retrieval with cell-level citations
- Structured queries — filter and aggregate data through the API with SQL over PostgreSQL JSON columns
- Multi-tenant workspaces — enterprises with invite codes, roles (owner / admin / querier), and switchable active workspace
- Folders & access control — public, protected, and private folders with user-group-based permissions
- MCP server — exposes Streamable HTTP + SSE endpoints so AI assistants can use your tables as a retrieval tool
- Auth — Google OAuth and email/password with verification codes and password reset (via Brevo SMTP)
- Background indexing — threaded worker pool for non-blocking embedding and Qdrant upserts with progress tracking
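The delimiter inference mentioned in the first feature can be illustrated with Python's standard library. This is a minimal sketch, not TabulaRAG's actual ingestion code: the helper names `infer_delimiter` and `read_rows` are hypothetical, and it assumes comma and tab are the only candidate delimiters.

```python
import csv
import io


def infer_delimiter(sample: str, candidates: str = ",\t") -> str:
    """Guess whether a text sample is comma- or tab-delimited (hypothetical helper)."""
    try:
        # csv.Sniffer inspects the sample and picks among the candidate delimiters.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Fallback: choose the candidate that occurs most often in the first line.
        first_line = sample.splitlines()[0] if sample else ""
        return max(candidates, key=first_line.count)


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse delimited text into dicts keyed by the header row (hypothetical helper)."""
    delimiter = infer_delimiter(text[:4096])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

Calling `read_rows` on the contents of an uploaded file returns one dictionary per data row, which is a convenient shape for later per-row embedding. Column type recognition (dates, money, measurements) is out of scope for this sketch.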
Architecture
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript 5.9, Vite 7, React Router 7 |
| Backend | Python 3.11, FastAPI, SQLAlchemy 2.x, Uvicorn |
| Database | PostgreSQL 16 |
| Vector store | Qdrant v1.13.4 (FastEmbed, 384-dim dense vectors) |
| Auth | JWT, Google OAuth, bcrypt |
| MCP | fastapi-mcp (Streamable HTTP + SSE) |
| Web server | Nginx 1.27 (production frontend) |
| CI | GitHub Actions (pytest, ESLint, Docker builds) |
MCP integration
TabulaRAG exposes endpoints for AI assistant integration:
| Type | URL |
|---|---|
| OpenAPI | http://localhost:8000/openapi.json |
| MCP (Streamable HTTP) | http://localhost:8000/mcp |
Authentication: send a personal MCP token (generated from the app's MCP section) in the Authorization: Bearer <token> header, or use the server API_KEY for automation. Tokens are scoped per user and workspace.
If your MCP client runs outside the browser (e.g. Docker, desktop app), replace localhost with your machine's IP (ipconfig on Windows, ifconfig on Mac/Linux).
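As a rough illustration of the bearer-token scheme described above, the following sketch builds an authenticated request with Python's standard library. The function names and the `TABULARAG_MCP_TOKEN` environment variable are hypothetical, and whether `/openapi.json` itself requires a token is an assumption; the header format is the one stated in this section.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000"  # local default listed in the endpoint table


def build_request(path: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a GET request that carries the token as a bearer credential."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_openapi(token: str, base_url: str = BASE_URL) -> dict:
    """Fetch the OpenAPI document; this needs the backend to be running."""
    request = build_request("/openapi.json", token, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # TABULARAG_MCP_TOKEN is a hypothetical variable name used only in this sketch.
    spec = fetch_openapi(os.environ["TABULARAG_MCP_TOKEN"])
    print(sorted(spec.get("paths", {})))
```

When the client runs in a container or on another machine, pass a `base_url` that uses the host's IP instead of `localhost`.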
Local vs. Deployed
| Feature | Local (Docker Compose) | Deployed |
|---|---|---|
| Database | PostgreSQL in Docker | External PostgreSQL |
| Vector store | Qdrant in Docker | External Qdrant |
Prerequisites (Local)
| Tool | Version | Notes |
|---|---|---|
| Docker | Latest | https://docs.docker.com/get-docker/ |
| Docker Compose | v2+ | Bundled with Docker Desktop |
Docker Compose v2 is required (docker compose, not docker-compose). It ships with Docker Desktop on Mac and Windows. Linux users may need to install it separately.
You do not need to install Python or Node locally — everything runs inside containers.
Quick start
cp .env.example .env # create config (edit values as needed)
./scripts/dev-up.sh # build and start all services
Once running:
| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:8000 |
| PostgreSQL | localhost:5433 |
| Qdrant | http://localhost:6333 |
Health checks:
curl http://localhost:8000/health
curl http://localhost:8000/health/deps
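Because the services can take a moment to come up, a script may want to poll the two health endpoints rather than call them once. The sketch below is one way to do that in Python; the function names are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format of these endpoints is not documented here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable

HEALTH_URLS = (
    "http://localhost:8000/health",
    "http://localhost:8000/health/deps",
)


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for url, or 0 if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    urls: Iterable[str] = HEALTH_URLS,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll until every URL returns 200 (assumed to mean healthy) or attempts run out."""
    urls = list(urls)
    for _ in range(attempts):
        if all(probe(url) == 200 for url in urls):
            return True
        time.sleep(delay)
    return False
```

The `probe` parameter exists so the polling loop can be exercised without a running backend.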
Stop / logs:
./scripts/dev-down.sh
./scripts/dev-logs.sh # all services
./scripts/dev-logs.sh backend # single service
Environment variables
Copy .env.example to .env. The key variables:
| Variable | Required | Description |
|---|---|---|
| POSTGRES_DB | Yes | Database name (default: tabularag) |
| POSTGRES_USER | Yes | Database user (default: tabularag) |
| POSTGRES_PASSWORD | Yes | Database password |
| DATABASE_URL | Yes | PostgreSQL connection string (set in docker-compose) |
| QDRANT_URL | Yes | Qdrant endpoint (set in docker-compose) |
| JWT_SECRET | Yes | Secret for signing JWT tokens |
| PUBLIC_API_BASE_URL | Yes | Backend URL used in email links (default: http://localhost:8000) |
| PUBLIC_UI_BASE_URL | Yes | Frontend URL used in email links (default: http://localhost:5173) |
| API_KEY | No | Optional static key for script/API access |
| GOOGLE_CLIENT_ID | No | Google OAuth client ID |
| GOOGLE_CLIENT_SECRET | No | Google OAuth client secret |
| SMTP_HOST | No | Brevo SMTP relay for verification/reset emails |
| SMTP_PORT | No | SMTP port (default: 587) |
| SMTP_USER | No | SMTP login |
| SMTP_PASSWORD | No | SMTP key |
| SMTP_FROM | No | Sender email address |
SMTP is optional for local development. If SMTP_HOST is empty, verification and reset codes are written to the backend logs instead of being emailed. For any real deployment, configure SMTP so users receive emails.
The remaining variables in .env.example control embedding model tuning, Qdrant HNSW parameters, batch sizes, and indexing concurrency. The defaults work for local development.
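A quick pre-flight check of the environment can catch a missing required variable before the containers start. The following is a minimal sketch and not part of TabulaRAG itself: the function names are hypothetical, the required list is copied from the table above, and the "smtp" / "log" labels simply restate the SMTP_HOST behavior described in this section.

```python
import os
from typing import Mapping

# Variables marked "Yes" in the Required column of the table above.
REQUIRED = (
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "DATABASE_URL",
    "QDRANT_URL",
    "JWT_SECRET",
    "PUBLIC_API_BASE_URL",
    "PUBLIC_UI_BASE_URL",
)


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


def email_delivery_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "smtp" when SMTP_HOST is set, otherwise "log" (codes go to backend logs)."""
    return "smtp" if env.get("SMTP_HOST", "").strip() else "log"


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        raise SystemExit(f"Missing required variables: {', '.join(missing)}")
    print(f"Environment looks complete; email delivery mode: {email_delivery_mode()}")
```

Both functions accept any mapping, so they can be pointed at a parsed `.env` file as easily as at the process environment.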
License