MCP Data Server
Indexes local files (PDF, TXT, CSV, Markdown) with embeddings for semantic search. Provides both CLI and MCP server interfaces so Claude Desktop can search and read your local documents.
# MCP Data Server: Local file search you can call from Claude (or CLI)
MCP Data Server indexes files on your machine (PDF, TXT, CSV, Markdown, etc.) and lets you search them with embeddings. You can use it from:
- a friendly CLI (`ls`, `index`, `search`)
- an MCP server over stdio (so Claude Desktop/Cursor can call your tools)
Works great on Windows 11. Also tested on macOS/Linux (see notes).
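For the curious: "MCP server over stdio" means the client (for example Claude Desktop) starts the server as a subprocess and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin/stdout. As an illustration only, the sketch below builds the kind of `tools/call` request a client would send to invoke `search_chunks_tool`. A real client first completes the `initialize` handshake, and the argument names `query` and `k` are assumptions, not this project's documented schema.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a newline-terminated JSON-RPC 2.0 line.

    The stdio transport delimits messages by newlines, so the JSON itself must
    stay on one line; json.dumps (without indent) guarantees that by escaping
    any newline characters inside string values.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# "query" and "k" are assumed argument names; the server's real input
# schema for each tool is advertised in its `tools/list` response.
line = make_tool_call(1, "search_chunks_tool", {"query": "quarterly revenue", "k": 5})
print(line, end="")
```

In practice Claude Desktop handles this framing for you; the sketch only shows what travels over the pipe.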
## Table of contents
- Features
- Prerequisites
- Quick start (Windows)
- Quick start (macOS/Linux)
- Usage (CLI)
- Use with Claude Desktop (MCP)
- Configuration
- Project structure
- Development (lint, type, test)
- Troubleshooting
- Contributing
- License
## Features
- 🔎 Local search with SentenceTransformers embeddings (cosine similarity)
- ⚡ Optional FAISS index for fast Top-K search
- 🧰 Simple CLI: `ls`, `index`, `search`
- 🔌 MCP server so Claude Desktop can call tools: `list_docs_tool`, `index_docs_tool`, `search_chunks_tool`, `read_doc_tool`
- 🧩 Extensible loaders/chunkers; add new formats easily
- ✅ Batteries-included dev setup: Ruff, Black, MyPy, PyTest, pre-commit
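The first two features boil down to ranking chunks by the cosine similarity between a query embedding q and each chunk embedding c, cos(q, c) = q·c / (‖q‖ ‖c‖). A minimal pure-Python sketch of that Top-K ranking, using toy 3-dimensional vectors in place of real SentenceTransformers output (the function names and chunk IDs are hypothetical, not the project's code):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5):
    """Score every chunk against the query (a linear scan) and keep the best k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real sentence embeddings.
chunks = {
    "notes.md#0": [0.9, 0.1, 0.0],
    "report.pdf#3": [0.1, 0.9, 0.2],
    "data.csv#1": [0.7, 0.3, 0.1],
}
for chunk_id, score in top_k([1.0, 0.0, 0.0], chunks, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

A linear scan like this is fine for small folders. FAISS can typically do the same search faster on large collections: with L2-normalized vectors, an inner-product index ranks results identically to cosine similarity, because the inner product of unit vectors is their cosine. Which FAISS index type the project actually builds is not stated here.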
## Prerequisites
- Python 3.11+ (3.11 recommended)
- Windows 11 (PowerShell); macOS/Linux (bash) work too
- ~3 GB free disk space on first run (model cache)
- (Optional) FAISS: CPU wheels are installed automatically via `faiss-cpu`
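If you want to confirm the Python version and disk-space prerequisites before installing, a small standard-library script can do it. This is a hypothetical helper (`preflight` is not part of the project); the 3 GB threshold mirrors the estimate above and is measured on the drive that holds `path`, which may not be the drive where the model cache ends up.

```python
import shutil
import sys


def preflight(path: str = ".", min_free_gb: float = 3.0) -> dict[str, bool]:
    """Check the Python version and free disk space on the drive holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3  # bytes -> GiB
    return {
        "python_ok": sys.version_info >= (3, 11),
        "disk_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'OK' if ok else 'FAIL'}")
```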
## Quick start (Windows)
Folder in this repo where you put files to index: `./data/`
```powershell
# 1) Clone and enter project
git clone https://github.com/hkonda015/McpServer.git
Set-Location .\McpServer\McpServer
# 2) Create & activate venv (PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
# 3) Install runtime (or dev) dependencies
pip install -r requirements.txt
# or for contributors:
pip install -r requirements-dev.txt
# 4) (Optional) pre-commit hooks
pre-commit install
# 5) Put a few files in .\data\ (txt/pdf/csv/md), then:
python -m mcp_data_server ls
python -m mcp_data_server index
python -m mcp_data_server search "your query" --k 5
```
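Step 5 runs `index` before `search`. Judging by the loaders/chunkers and `search_chunks_tool` named under Features, indexing presumably loads each file, splits its text into chunks, and embeds each chunk, so search ranks chunks rather than whole files. The sketch below shows one common chunking strategy, fixed-size character windows with overlap. It is an illustration, not the project's chunker: the function name and the `size`/`overlap` defaults are assumptions.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    The overlap keeps a sentence that straddles a window boundary
    retrievable from either neighboring chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```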
## Usage (CLI)
The CLI lets you **list files**, **build/rebuild the index**, and **search** your local documents.
> **Prereq:** open a terminal at your repo root and activate the venv:
> Windows (PowerShell):
> ```powershell
> Set-Location .\McpServer
> .\.venv\Scripts\Activate.ps1
> ```
> macOS/Linux (bash):
> ```bash
> cd McpServer
> source .venv/bin/activate
> ```
---
### 1) List files (`ls`)
Lists all **supported documents** under `DATA_DIR` (defaults to `./data`).
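In other words, `ls` is a filtered walk of the data folder. A minimal sketch of that behavior, assuming `DATA_DIR` can be supplied as an environment variable and that the supported extensions are exactly the four formats this README names (the real loaders may accept more; `list_supported` is a hypothetical name):

```python
import os
from pathlib import Path
from typing import List, Optional

# Assumption: mirrors the formats named in this README (PDF, TXT, CSV, Markdown).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".md"}


def list_supported(data_dir: Optional[str] = None) -> List[Path]:
    """Recursively list supported files under data_dir.

    Falls back to the DATA_DIR environment variable (an assumed mechanism),
    then to ./data, matching the documented default.
    """
    root = Path(data_dir or os.environ.get("DATA_DIR", "./data"))
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in list_supported():
        print(path)
```

Sorting keeps the output stable across runs. The command below is the project's actual entry point; the sketch only illustrates what "supported documents under `DATA_DIR`" means.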
```powershell
python -m mcp_data_server ls
```
# Contributing to MCP Data Server
Thanks for your interest in contributing! This document explains how to set up your dev environment, the coding standards we use, how to run tests, and how to submit a good pull request.
---
## Ways to contribute
- **Bug reports**: include steps to reproduce, expected vs actual behavior, OS, Python version, and logs.
- **Feature requests**: explain the use case, not just the solution. Sketch CLI and/or MCP tool UX if relevant.
- **Documentation**: improve READMEs, examples, and comments.
- **Code**: bug fixes, new loaders, chunking strategies, performance improvements, tests.
Beginner-friendly issues will be labeled **good first issue** and **help wanted**.
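If you plan to contribute a new loader (see the **Code** bullet above), it helps to think of loaders as functions keyed by file extension. The sketch below shows one common way to make that extensible, a decorator-based registry. It is a hypothetical design for illustration: `LOADERS`, `register_loader`, and `load_document` are not names taken from this project, so check the actual loader module before copying the pattern.

```python
from pathlib import Path
from typing import Callable

# Hypothetical registry: maps a lowercase file extension to a loader function.
LOADERS: dict[str, Callable[[Path], str]] = {}


def register_loader(*extensions: str):
    """Decorator that registers a loader for one or more file extensions."""
    def decorator(func: Callable[[Path], str]) -> Callable[[Path], str]:
        for ext in extensions:
            LOADERS[ext.lower()] = func
        return func
    return decorator


@register_loader(".txt", ".md")
def load_plain_text(path: Path) -> str:
    """Read a plain-text file, replacing undecodable bytes instead of failing."""
    return path.read_text(encoding="utf-8", errors="replace")


def load_document(path: Path) -> str:
    """Dispatch to the loader registered for the file's extension."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"no loader registered for {path.suffix!r}")
    return loader(path)
```

With this shape, supporting a new format means writing one decorated function; nothing else in the pipeline has to change.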
---
## Development setup
### Prerequisites
- Python **3.11+** (we recommend 3.11)
- Git
- ~3 GB free disk space for model cache on first run
### Clone and create a virtual environment
#### Windows (PowerShell)
```powershell
git clone https://github.com/hkonda015/McpServer.git
Set-Location .\McpServer\McpServer
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements-dev.txt
pre-commit install
```