MCP Data Server


Indexes local files (PDF, TXT, CSV, Markdown) with embeddings for semantic search. Provides both CLI and MCP server interfaces so Claude Desktop can search and read your local documents.


MCP Data Server — Local file search you can call from Claude (or CLI)

MCP Data Server indexes files on your machine (PDF, TXT, CSV, Markdown, etc.) and lets you search them with embeddings. You can use it from:

  • a friendly CLI (ls, index, search)
  • an MCP server over stdio (so Claude Desktop/Cursor can call its tools)

Works great on Windows 11. Also tested on macOS/Linux (see notes).



Features

  • 🔎 Local search with SentenceTransformers embeddings (cosine similarity)
  • ⚡ Optional FAISS index for fast Top-K search
  • 🧰 Simple CLI: ls, index, search
  • 🔌 MCP server so Claude Desktop can call tools: list_docs_tool, index_docs_tool, search_chunks_tool, read_doc_tool
  • 🧩 Extensible loaders/chunkers; add new formats easily
  • ✅ Batteries-included dev setup: Ruff, Black, MyPy, PyTest, pre-commit
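
Semantic search boils down to ranking chunk embeddings by cosine similarity to the query embedding and keeping the Top-K. The stdlib-only sketch below illustrates that ranking with made-up 3-dimensional vectors; in the real server the vectors would come from a SentenceTransformers model, and the names `cosine` and `top_k` are hypothetical, not the project's API. The optional FAISS index would replace this brute-force loop with an index lookup for larger collections.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Brute-force Top-K: score every chunk, keep the k highest scores."""
    scored = ((chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy embeddings (hypothetical); real ones would come from SentenceTransformers.
chunks = {
    "notes.md#0": [0.9, 0.1, 0.0],
    "report.pdf#3": [0.2, 0.8, 0.1],
    "data.csv#1": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

Cosine similarity ignores vector length, so a long chunk and a short chunk about the same topic score similarly.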

Prerequisites

  • Python 3.11+ (3.11 recommended)
  • Windows 11 (PowerShell); macOS/Linux (bash) work too
  • ~3 GB free disk space on first run (model cache)
  • (Optional) FAISS: CPU wheels are installed automatically via faiss-cpu
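
To check the Python version and free-space prerequisites before installing, a small stdlib-only script is enough on all three platforms. This is a convenience sketch, not part of the project: the name `preflight` is hypothetical, and checking the home directory assumes the model cache uses its usual default location there.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 11)
MIN_FREE_GB = 3.0  # approximate first-run model download, from the prerequisites list


def preflight(path: str = str(Path.home())) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems: list[str] = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    # Assumption: model caches usually default to a folder under the home directory.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free at {path}; about {MIN_FREE_GB:.0f} GB is recommended")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("All checks passed" if not issues else "\n".join(issues))
```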

Quick start (Windows)

Folder in this repo where you put files to index: ./data/

```powershell
# 1) Clone and enter project
git clone https://github.com/hkonda015/McpServer.git
Set-Location .\McpServer\McpServer

# 2) Create & activate venv (PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# 3) Install runtime (or dev) dependencies
pip install -r requirements.txt
# or for contributors:
pip install -r requirements-dev.txt

# 4) (Optional) pre-commit hooks
pre-commit install

# 5) Put a few files in .\data\ (txt/pdf/csv/md), then:
python -m mcp_data_server ls
python -m mcp_data_server index
python -m mcp_data_server search "your query" --k 5
```
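
Before embedding, `index` has to split each document into chunks (the features list mentions pluggable chunkers). The project's actual chunking strategy isn't documented here; the sketch below assumes a simple fixed-size character window with overlap, and `chunk_text`, `size`, and `overlap` are hypothetical names and defaults.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))
```

The overlap means a phrase that straddles a chunk boundary still appears intact in at least one chunk when it is shorter than the overlap, at the cost of slightly more embeddings to compute.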



## Usage (CLI)

The CLI lets you **list files**, **build/rebuild the index**, and **search** your local documents.
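
The command surface (`ls`, `index`, `search QUERY --k N`) maps naturally onto subcommands. The sketch below models it with the standard library's `argparse`; it is a hypothetical layout for illustration only, since the project may use a different parser, and the default of 5 for `--k` is an assumption taken from the quick-start example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented commands."""
    parser = argparse.ArgumentParser(prog="python -m mcp_data_server")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("ls", help="list supported files under DATA_DIR")
    sub.add_parser("index", help="build or rebuild the embedding index")
    search = sub.add_parser("search", help="semantic search over indexed chunks")
    search.add_argument("query", help="free-text query; quote it if it contains spaces")
    # Assumed default; the README only shows --k 5 being passed explicitly.
    search.add_argument("--k", type=int, default=5, help="number of results to return")
    return parser


args = build_parser().parse_args(["search", "your query", "--k", "5"])
print(args.command, args.query, args.k)
```

Quoting the query in the shell matters: without quotes, `your query` would arrive as two positional arguments and the parser would reject the second one.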

> **Prereq:** open a terminal at your repo root and activate the venv.
> Windows (PowerShell):
> ```powershell
> Set-Location .\McpServer
> .\.venv\Scripts\Activate.ps1
> ```
> macOS/Linux (bash):
> ```bash
> cd McpServer
> source .venv/bin/activate
> ```
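
If you're unsure whether activation worked, Python can tell you: inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter. A minimal check (the helper name `in_virtualenv` is just for illustration):

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python -m venv`."""
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv(), "| interpreter:", sys.executable)
```

If it prints `False`, run the activation command for your shell again before installing packages.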

---

### 1) List files (`ls`)

Lists all **supported documents** under `DATA_DIR` (defaults to `./data`).
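
Conceptually, `ls` walks `DATA_DIR` and keeps the files whose extension has a registered loader, which is also how an extensible loader design makes "add a new format" a one-line change. The sketch below is hypothetical throughout: the `LOADERS` registry, `list_supported`, reading `DATA_DIR` from an environment variable, and recursive traversal are all assumptions, and PDF/CSV loaders are omitted to keep it stdlib-only.

```python
import os
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical loader registry: extension -> function returning the file's text.
LOADERS: Dict[str, Callable[[Path], str]] = {
    ".txt": lambda p: p.read_text(encoding="utf-8", errors="replace"),
    ".md": lambda p: p.read_text(encoding="utf-8", errors="replace"),
}


def list_supported(data_dir: Optional[str] = None) -> List[Path]:
    """Recursively list files under DATA_DIR (default ./data) that have a registered loader."""
    # Assumption: DATA_DIR may be overridden through an environment variable.
    root = Path(data_dir or os.environ.get("DATA_DIR", "./data"))
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in LOADERS)


for path in list_supported():
    print(path)
```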

```powershell
python -m mcp_data_server ls
```



# Contributing to MCP Data Server

Thanks for your interest in contributing! This document explains how to set up your dev environment, the coding standards we use, how to run tests, and how to submit a good pull request.

---

## Ways to contribute

- **Bug reports**: include steps to reproduce, expected vs actual behavior, OS, Python version, and logs.
- **Feature requests**: explain the use case, not just the solution. Sketch CLI and/or MCP tool UX if relevant.
- **Documentation**: improve READMEs, examples, and comments.
- **Code**: bug fixes, new loaders, chunking strategies, performance improvements, tests.
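
For bug reports, the OS, Python version, and versions of the key dependencies can be collected with the standard library; paste the output into the issue along with your logs. This is an optional helper sketch, not a script shipped with the repo, and `environment_report` is a hypothetical name.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages: tuple = ("sentence-transformers", "faiss-cpu")) -> str:
    """Collect OS, Python, and package versions to paste into a bug report."""
    lines = [
        f"OS: {platform.system()} {platform.release()} ({platform.machine()})",
        f"Python: {platform.python_version()} at {sys.executable}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report())
```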

Good first issues will be labeled **good first issue** and **help wanted**.

---

## Development setup

### Prerequisites
- Python **3.11+** (we recommend 3.11)
- Git
- ~3 GB free disk space for model cache on first run

### Clone and create a virtual environment

#### Windows (PowerShell)
```powershell
git clone https://github.com/hkonda015/McpServer.git
Set-Location .\McpServer\McpServer
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements-dev.txt
pre-commit install
```
