Local File Search MCP
Provides tools for searching and reading local files, supporting metadata filters, PDF full-text keyword search, and sandboxed file access.
README
Local File Search MCP Agent
LangChain CLI agent that combines an in-process Local File Search MCP (FastMCP) with the remote Microsoft Learn MCP. The LLM uses MiniMax via the OpenAI-compatible API.
Features
search_files— metadata filters (name, folder, extension, dates, size)search_pdf_content— PDF full-text keyword search viapypdf- Microsoft Learn MCP at
https://learn.microsoft.com/api/mcp(streamable_http) - SKILL-based routing with JSON-only local results and 2000-char MS answers
- Async REPL CLI
Documentation
| Guide | Description |
|---|---|
| docs/README.md | Documentation index |
| docs/PROJECT_OVERVIEW.md | Architecture and what was built |
| docs/LLM_PROVIDER_GUIDE.md | MiniMax ↔ OpenAI migration |
| docs/DEPLOYMENT.md | Deploy: local, GitHub, Docker, systemd |
| docs/OPERATIONS.md | Operations and CI |
| docs/TROUBLESHOOTING.md | Common issues |
| docs/COMPLIANCE_REPORT.md | Assignment audit |
| docs/INSTRUCTIONS_FOR_ABIN.md | Reviewer guide for install.py |
Setup
One command (recommended):
git clone https://github.com/dchatpar/mcp-file-agent.git
cd mcp-file-agent
chmod +x install.py
./install.py --non-interactive --skip-e2e # no API key; full gate without E2E
# Or interactive: ./install.py
Manual setup:
cd mcp-file-agent
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Set OPENAI_API_KEY in .env (never commit .env)
python scripts/generate_samples.py
MiniMax (OpenAI-compatible)
Configure .env:
OPENAI_API_KEY=<your MiniMax API key>
OPENAI_BASE_URL=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
LangChain uses ChatOpenAI with base_url pointing at MiniMax. MiniMax-only extra_body (thinking disabled) is applied automatically when the base URL contains minimax. OPENAI_API_BASE_URL is accepted as an alias for OPENAI_BASE_URL.
Assignment / OpenAI GPT-5.x: Defaults to MiniMax-M2.7. For OpenAI, copy .env.openai.example to .env or set OPENAI_BASE_URL=https://api.openai.com/v1, your GPT model id, and an OpenAI API key. Full steps: docs/LLM_PROVIDER_GUIDE.md.
Environment
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY |
— | Required for agent E2E (MiniMax key) |
OPENAI_BASE_URL |
https://api.minimax.io/v1 |
MiniMax OpenAI-compatible endpoint |
OPENAI_MODEL |
MiniMax-M2.7 |
Model name on MiniMax |
SEARCH_ROOT |
data/samples/zoology |
Sandboxed search directory |
FILE_SEARCH_ROOT |
(same as SEARCH_ROOT) |
Alias for SEARCH_ROOT |
MICROSOFT_LEARN_MCP_URL |
https://learn.microsoft.com/api/mcp |
Learn MCP endpoint |
MS_ANSWER_MAX_CHARS |
2000 |
Max length for Microsoft Learn answers |
Run CLI
file-search-agent
# or
python -m file_search_agent.main
Sample queries
- Local (JSON only):
What PDF files are available in our system? - Learn (≤2000 chars):
What is Azure Blob Storage? - PDF content search:
Find mentions of migration in the PDFs - Out-of-scope:
What is the capital of France?→ refusal JSON
Test data
The data/samples/zoology/ directory holds 8 non-technical zoology files used for all local-search tests:
| File | Extension | Description |
|---|---|---|
african_elephant_study.pdf |
Elephant population dynamics | |
marine_mammals_report.pdf |
Orca/dolphin hydrophone survey | |
bird_migration_analysis.pdf |
Arctic tern geolocator study | |
amphibian_survey_2023.pdf |
Chytrid fungus impact assessment | |
coral_reef_observations.docx |
.docx | Great Barrier Reef transect notes |
species_count_2024.xls |
.xls | Endangered species population counts |
field_notes_borneo.txt |
.txt | Borneo rainforest expedition diary |
jaguar_photo_rainforest.jpg |
.jpg | Camera-trap image placeholder |
Regenerate with: python scripts/generate_samples.py
Verification
QA matrix
| Check | Command | API key | Expected |
|---|---|---|---|
| Lint | ruff check src tests scripts install.py |
No | All checks passed |
| Unit tests | pytest -v |
No | 40 passed |
| E2E agent | python -u scripts/e2e_verify.py |
Yes | 5/5 PASSED (~1–2 min) |
| Production gate | python -u scripts/production_gate.py |
Yes | All 6 steps PASS (~90s) |
| Sample data | python scripts/generate_samples.py |
No | 8 files in data/samples/zoology/ |
Run lint and unit tests in parallel:
source .venv/bin/activate
pip install -e ".[dev]"
python scripts/generate_samples.py
ruff check src tests scripts & pytest -v & wait
E2E (requires OPENAI_API_KEY in .env):
Takes about 1–2 minutes. Use unbuffered output so progress prints appear immediately ([1/5] … [5/5]):
python -u scripts/e2e_verify.py
Checks:
- PDF files query → local tools, JSON with PDF entries
- List all files → local tools, 8 files total
- Elephant search → local tools, elephant match in JSON
- Azure Blob Storage → Learn MCP only, answer ≤ 2000 chars
- Out-of-scope (capital of France) → assignment error JSON, no tools
Interactive CLI smoke test:
file-search-agent
Assignment compliance
| Requirement | Implementation | Verified by |
|---|---|---|
| Local File Search MCP (in-process) | mcp/local_file_search.py via FastMCP |
test_local_mcp.py, E2E [1–3] |
search_files metadata filters |
name, folder, extension, dates, size | test_search_files_* |
search_pdf_content full-text |
pypdf keyword search | test_search_pdf_content_keyword |
list_all_files |
lists all sandboxed files | test_list_all_files_returns_eight, E2E [2] |
read_pdf_content |
read single PDF by path | test_read_pdf_content_* |
| Microsoft Learn MCP (remote) | streamable_http at learn.microsoft.com |
test_learn_mcp.py, E2E [4] |
| SKILL routing (local JSON / MS prose / out-of-scope) | SKILL.md, routing.py, output_guard.py |
test_agent_routing.py, E2E [5] |
| MiniMax via OpenAI-compatible API | ChatOpenAI + conditional extra_body |
agent_factory.py, test_agent_factory.py, E2E all |
Sandboxed SEARCH_ROOT |
path traversal rejected | test_security.py |
| 8 sample zoology files | data/samples/zoology/ |
generate_samples.py, E2E [2] |
Dependencies
Pinned full environment (after pip install -e ".[dev]"):
pip install -r requirements.txt
pip install -e .
Or install from project metadata only: pip install -e ".[dev]".
GitHub
Published repository: https://github.com/dchatpar/mcp-file-agent
Reviewer abin-aot has been invited as a collaborator. Submission email draft for the AOT assessment: docs/SUBMISSION_EMAIL_TO_ABIN.md.
Project layout
src/file_search_agent/
main.py # Async REPL
config.py # Env config
models.py # Pydantic tool models
agent_factory.py # create_agent + MCP clients
output_guard.py # JSON / truncation guards
mcp/local_file_search.py
data/samples/zoology/ # Non-tech zoology sample files
docs/ # Full deployment and LLM guides
deploy/ # systemd unit example
Dockerfile # Container image
docker-compose.yml
tests/
License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.