Local File Search MCP

Provides tools for searching and reading local files, supporting metadata filters, PDF full-text keyword search, and sandboxed file access.

Local File Search MCP Agent

A LangChain CLI agent that combines an in-process Local File Search MCP server (FastMCP) with the remote Microsoft Learn MCP. The LLM is MiniMax, accessed through its OpenAI-compatible API.

Features

  • search_files — metadata filters (name, folder, extension, dates, size)
  • search_pdf_content — PDF full-text keyword search via pypdf
  • Microsoft Learn MCP at https://learn.microsoft.com/api/mcp (streamable_http)
  • SKILL-based routing: JSON-only results for local queries, Microsoft Learn answers capped at 2000 characters
  • Async REPL CLI
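
As an illustration of the metadata filtering that search_files performs, here is a minimal stdlib-only sketch. It is not the project's implementation: the `FileFilter` class, its field names, and the shape of the returned dictionaries are assumptions made for this example, and date filters are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileFilter:
    """Hypothetical filter set; field names are illustrative only."""

    name_contains: str | None = None
    extension: str | None = None  # e.g. ".pdf"
    min_size: int | None = None   # bytes
    max_size: int | None = None   # bytes


def search_files(root: Path, flt: FileFilter) -> list[dict]:
    """Walk root and return JSON-serializable metadata for matching files."""
    results = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        # Each filter is optional; a file must pass every filter that is set.
        if flt.name_contains and flt.name_contains.lower() not in path.name.lower():
            continue
        if flt.extension and path.suffix.lower() != flt.extension.lower():
            continue
        if flt.min_size is not None and size < flt.min_size:
            continue
        if flt.max_size is not None and size > flt.max_size:
            continue
        results.append(
            {
                "name": path.name,
                "folder": str(path.parent.relative_to(root)),
                "extension": path.suffix.lower(),
                "size": size,
            }
        )
    return results
```

Under these assumptions, `search_files(Path("data/samples/zoology"), FileFilter(extension=".pdf"))` would return one dictionary per PDF in the sample directory.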

Documentation

| Guide | Description |
| --- | --- |
| docs/README.md | Documentation index |
| docs/PROJECT_OVERVIEW.md | Architecture and what was built |
| docs/LLM_PROVIDER_GUIDE.md | MiniMax ↔ OpenAI migration |
| docs/DEPLOYMENT.md | Deploy: local, GitHub, Docker, systemd |
| docs/OPERATIONS.md | Operations and CI |
| docs/TROUBLESHOOTING.md | Common issues |
| docs/COMPLIANCE_REPORT.md | Assignment audit |
| docs/INSTRUCTIONS_FOR_ABIN.md | Reviewer guide for install.py |

Setup

One command (recommended):

git clone https://github.com/dchatpar/mcp-file-agent.git
cd mcp-file-agent
chmod +x install.py
./install.py --non-interactive --skip-e2e   # no API key; full gate without E2E
# Or interactive: ./install.py

Manual setup:

cd mcp-file-agent
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Set OPENAI_API_KEY in .env (never commit .env)
python scripts/generate_samples.py

MiniMax (OpenAI-compatible)

Configure .env:

OPENAI_API_KEY=<your MiniMax API key>
OPENAI_BASE_URL=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7

LangChain uses ChatOpenAI with base_url pointing at MiniMax. MiniMax-only extra_body (thinking disabled) is applied automatically when the base URL contains minimax. OPENAI_API_BASE_URL is accepted as an alias for OPENAI_BASE_URL.
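
The provider selection described above could look roughly like the sketch below. The function name `resolve_llm_settings` and the returned keys are hypothetical, and giving OPENAI_BASE_URL precedence over its alias is an assumption. The actual contents of the MiniMax extra_body are not shown in this README, so the sketch only reports whether it should be applied.

```python
from __future__ import annotations

from typing import Mapping

DEFAULT_BASE_URL = "https://api.minimax.io/v1"
DEFAULT_MODEL = "MiniMax-M2.7"


def resolve_llm_settings(env: Mapping[str, str]) -> dict:
    """Resolve provider settings from an environment-like mapping."""
    # Assumption: the primary variable wins over the alias when both are set.
    base_url = (
        env.get("OPENAI_BASE_URL")
        or env.get("OPENAI_API_BASE_URL")
        or DEFAULT_BASE_URL
    )
    return {
        "base_url": base_url,
        "model": env.get("OPENAI_MODEL") or DEFAULT_MODEL,
        "api_key": env.get("OPENAI_API_KEY", ""),
        # MiniMax-only request options apply only when the URL names MiniMax.
        "use_minimax_extra_body": "minimax" in base_url.lower(),
    }
```

Passing `os.environ` gives the live configuration; passing a plain dict keeps the logic easy to unit test without touching the process environment.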

Assignment / OpenAI GPT-5.x: The agent defaults to MiniMax-M2.7. To use OpenAI instead, either copy .env.openai.example to .env, or set OPENAI_BASE_URL=https://api.openai.com/v1 together with your GPT model ID and an OpenAI API key. Full steps: docs/LLM_PROVIDER_GUIDE.md.

Environment

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | (none) | Required for agent E2E (MiniMax key) |
| OPENAI_BASE_URL | https://api.minimax.io/v1 | MiniMax OpenAI-compatible endpoint |
| OPENAI_MODEL | MiniMax-M2.7 | Model name on MiniMax |
| SEARCH_ROOT | data/samples/zoology | Sandboxed search directory |
| FILE_SEARCH_ROOT | (same as SEARCH_ROOT) | Alias for SEARCH_ROOT |
| MICROSOFT_LEARN_MCP_URL | https://learn.microsoft.com/api/mcp | Learn MCP endpoint |
| MS_ANSWER_MAX_CHARS | 2000 | Max length for Microsoft Learn answers |
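
For the search-related variables, a loader along these lines would apply the documented defaults. This is a sketch, not the contents of config.py: `SearchSettings` and `load_search_settings` are hypothetical names, and preferring SEARCH_ROOT over FILE_SEARCH_ROOT when both are set is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class SearchSettings:
    search_root: Path
    learn_mcp_url: str
    ms_answer_max_chars: int


def load_search_settings(env: Mapping[str, str]) -> SearchSettings:
    """Build settings from an environment-like mapping, applying defaults."""
    # Assumption: SEARCH_ROOT takes precedence over its alias.
    root = env.get("SEARCH_ROOT") or env.get("FILE_SEARCH_ROOT") or "data/samples/zoology"
    max_chars = int(env.get("MS_ANSWER_MAX_CHARS") or 2000)
    if max_chars <= 0:
        raise ValueError("MS_ANSWER_MAX_CHARS must be a positive integer")
    return SearchSettings(
        search_root=Path(root),
        learn_mcp_url=env.get("MICROSOFT_LEARN_MCP_URL")
        or "https://learn.microsoft.com/api/mcp",
        ms_answer_max_chars=max_chars,
    )
```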

Run CLI

file-search-agent
# or
python -m file_search_agent.main

Sample queries

  • Local (JSON only): What PDF files are available in our system?
  • Learn (≤2000 chars): What is Azure Blob Storage?
  • PDF content search: Find mentions of migration in the PDFs
  • Out-of-scope: What is the capital of France? → refusal JSON
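
The output rules behind these sample queries (JSON only for local results, a 2000-character cap for Learn answers, a refusal JSON for out-of-scope questions) can be sketched as small guard functions. These are illustrative only: the function names and the refusal payload shape are assumptions, not the actual contents of output_guard.py.

```python
import json


def truncate_answer(text: str, max_chars: int = 2000) -> str:
    """Cap a prose answer at max_chars (assumed >= 3), marking the cut."""
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3].rstrip() + "..."


def is_structured_json(text: str) -> bool:
    """Return True if text parses as a JSON object or array."""
    try:
        return isinstance(json.loads(text), (dict, list))
    except json.JSONDecodeError:
        return False


def refusal_json(reason: str) -> str:
    # Hypothetical shape; the project's real refusal schema is not shown here.
    return json.dumps({"error": "out_of_scope", "message": reason})
```

A caller would run `is_structured_json` on local-tool answers and `truncate_answer` on Learn answers before printing them.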

Test data

The data/samples/zoology/ directory holds 8 non-technical zoology files used for all local-search tests:

| File | Extension | Description |
| --- | --- | --- |
| african_elephant_study.pdf | .pdf | Elephant population dynamics |
| marine_mammals_report.pdf | .pdf | Orca/dolphin hydrophone survey |
| bird_migration_analysis.pdf | .pdf | Arctic tern geolocator study |
| amphibian_survey_2023.pdf | .pdf | Chytrid fungus impact assessment |
| coral_reef_observations.docx | .docx | Great Barrier Reef transect notes |
| species_count_2024.xls | .xls | Endangered species population counts |
| field_notes_borneo.txt | .txt | Borneo rainforest expedition diary |
| jaguar_photo_rainforest.jpg | .jpg | Camera-trap image placeholder |

Regenerate with: python scripts/generate_samples.py

Verification

QA matrix

| Check | Command | API key | Expected |
| --- | --- | --- | --- |
| Lint | ruff check src tests scripts install.py | No | All checks passed |
| Unit tests | pytest -v | No | 40 passed |
| E2E agent | python -u scripts/e2e_verify.py | Yes | 5/5 PASSED (~1–2 min) |
| Production gate | python -u scripts/production_gate.py | Yes | All 6 steps PASS (~90s) |
| Sample data | python scripts/generate_samples.py | No | 8 files in data/samples/zoology/ |

Run lint and unit tests in parallel:

source .venv/bin/activate
pip install -e ".[dev]"
python scripts/generate_samples.py
ruff check src tests scripts install.py & pytest -v & wait

E2E (requires OPENAI_API_KEY in .env):

This takes about 1–2 minutes. Run Python unbuffered (-u) so the progress markers [1/5] through [5/5] print immediately:

python -u scripts/e2e_verify.py

Checks:

  1. PDF files query → local tools, JSON with PDF entries
  2. List all files → local tools, 8 files total
  3. Elephant search → local tools, elephant match in JSON
  4. Azure Blob Storage → Learn MCP only, answer ≤ 2000 chars
  5. Out-of-scope (capital of France) → assignment error JSON, no tools

Interactive CLI smoke test:

file-search-agent

Assignment compliance

| Requirement | Implementation | Verified by |
| --- | --- | --- |
| Local File Search MCP (in-process) | mcp/local_file_search.py via FastMCP | test_local_mcp.py, E2E [1–3] |
| search_files metadata filters | name, folder, extension, dates, size | test_search_files_* |
| search_pdf_content full-text | pypdf keyword search | test_search_pdf_content_keyword |
| list_all_files | lists all sandboxed files | test_list_all_files_returns_eight, E2E [2] |
| read_pdf_content | read single PDF by path | test_read_pdf_content_* |
| Microsoft Learn MCP (remote) | streamable_http at learn.microsoft.com | test_learn_mcp.py, E2E [4] |
| SKILL routing (local JSON / MS prose / out-of-scope) | SKILL.md, routing.py, output_guard.py | test_agent_routing.py, E2E [5] |
| MiniMax via OpenAI-compatible API | ChatOpenAI + conditional extra_body | agent_factory.py, test_agent_factory.py, E2E all |
| Sandboxed SEARCH_ROOT | path traversal rejected | test_security.py |
| 8 sample zoology files | data/samples/zoology/ | generate_samples.py, E2E [2] |
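
The "path traversal rejected" requirement can be illustrated with a short pathlib check. This is a sketch of one common approach, not necessarily how the project does it; `SandboxError` and `resolve_in_sandbox` are hypothetical names.

```python
from pathlib import Path


class SandboxError(ValueError):
    """Raised when a requested path escapes the sandbox root."""


def resolve_in_sandbox(root: Path, requested: str) -> Path:
    """Resolve requested against root and reject anything outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # requested path replaces root entirely, so it is checked the same way.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise SandboxError(f"path escapes sandbox: {requested!r}")
    return candidate
```

Checking the resolved path, rather than scanning the input string for "..", also covers absolute paths and symlinks that point outside the root.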

Dependencies

Pinned full environment (after pip install -e ".[dev]"):

pip install -r requirements.txt
pip install -e .

Or install from project metadata only: pip install -e ".[dev]".

GitHub

Published repository: https://github.com/dchatpar/mcp-file-agent

Reviewer abin-aot has been invited as a collaborator. Submission email draft for the AOT assessment: docs/SUBMISSION_EMAIL_TO_ABIN.md.

Project layout

src/file_search_agent/
  main.py              # Async REPL
  config.py            # Env config
  models.py            # Pydantic tool models
  agent_factory.py     # create_agent + MCP clients
  output_guard.py      # JSON / truncation guards
  mcp/local_file_search.py
data/samples/zoology/  # Non-tech zoology sample files
docs/                  # Full deployment and LLM guides
deploy/                # systemd unit example
Dockerfile             # Container image
docker-compose.yml
tests/

License

MIT
