MCP File Analyzer

An MCP server that enables the analysis of CSV and Parquet files by providing tools for statistical summaries, data previews, and structure exploration. It allows users to query local datasets and create sample data using natural language.

MCP File Analyzer: Complete Setup & Usage Guide

This guide will walk you through setting up a Model Context Protocol (MCP) server that can analyze CSV and Parquet files, and connecting it to Claude Desktop for natural language data analysis.

🎯 What You'll Build

A powerful data analysis tool that allows Claude to:

  • 📊 Read and analyze CSV/Parquet files
  • 📈 Generate statistical summaries
  • 👀 Show data previews and structure
  • 🔧 Create sample datasets
  • 💬 Answer natural language questions about your data

Table of Contents

  1. What is MCP?
  2. Quick Start
  3. Prerequisites
  4. Project Setup
  5. Claude Desktop Integration
  6. Usage Examples
  7. Testing & Verification
  8. Troubleshooting
  9. Extending the Server
  10. Project Structure

What is MCP?

Model Context Protocol (MCP) is a standardized way to connect AI assistants like Claude to external tools and data sources. It allows you to:

  • Give Claude access to your local files (securely)
  • 🛠️ Create custom tools that Claude can use
  • 🔄 Build reusable AI workflows
  • 🏠 Keep your data secure and local (no API keys needed!)
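
Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages. As a rough illustration (not this project's code), the sketch below builds the kind of tools/call request a client sends to invoke a tool. The helper name make_tool_call is hypothetical; the tool name and argument are the ones used later in this guide.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Example: ask the server to summarize sample.csv
request = make_tool_call(1, "summarize_csv_file", {"filename": "sample.csv"})
print(json.dumps(request, indent=2))
```

In normal use you never write these messages yourself: the MCP SDK and Claude Desktop produce and parse them for you.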

Quick Start

⚡ For the Impatient

# Clone or create project directory
mkdir mcp-file-analyzer && cd mcp-file-analyzer

# Set up virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install "mcp>=1.0.0" "pandas>=2.0.0" "pyarrow>=10.0.0"  # quotes stop the shell treating > as a redirect

# Create and test the server (copy main.py and client.py from this repo)
python main.py  # Start server (Ctrl+C to stop)
python client.py  # Test the connection

# Configure Claude Desktop (see detailed steps below)

Prerequisites

Before you begin, make sure you have:

  • Python 3.10 or higher installed (the mcp package does not support older versions)
  • pip (Python package manager)
  • Claude Desktop installed
  • macOS, Windows, or Linux (Claude Desktop support varies)

Check your Python version:

python3 --version  # Should be 3.10+

Project Setup

Step 1: Create Project and Virtual Environment

# Create project directory
mkdir mcp-file-analyzer
cd mcp-file-analyzer

# Create virtual environment
python3 -m venv .venv

# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

Step 2: Install Dependencies

Create requirements.txt:

# Core dependencies for MCP File Analyzer
mcp>=1.0.0
pandas>=2.0.0
pyarrow>=10.0.0

# HTTP client dependencies (optional)
httpx>=0.27.0

# Development dependencies (optional)
# pytest>=7.0.0
# black>=23.0.0
# flake8>=6.0.0

Install dependencies:

pip install -r requirements.txt

Step 3: Create Project Files

Your project needs these core files:

  1. main.py - The MCP server
  2. client.py - Testing client
  3. requirements.txt - Dependencies
  4. run_mcp_server.sh - Launcher script for Claude Desktop
  5. claude_desktop_config.json - Claude Desktop configuration

Step 4: Create Helper Scripts

Create activate_env.sh for easy environment activation:

#!/bin/bash
echo "🚀 Activating virtual environment..."
source .venv/bin/activate
echo "✅ Virtual environment activated!"
echo "📦 Installed packages:"
pip list --format=columns
echo ""
echo "🎯 Quick start commands:"
echo "  - Run MCP server: python main.py"
echo "  - Run demo client: python client.py"
echo "  - Interactive client: python client.py interactive"

Make it executable:

chmod +x activate_env.sh

Claude Desktop Integration

🎯 Method 1: Direct Integration (Recommended)

Step 1: Create Launcher Script

Create run_mcp_server.sh:

#!/bin/bash
# MCP Server Launcher for Claude Desktop

# Get the directory where this script is located
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

# Change to the script directory
cd "$SCRIPT_DIR"

# Activate the virtual environment
source .venv/bin/activate

# Run the MCP server
python main.py

Make it executable:

chmod +x run_mcp_server.sh

Step 2: Create Claude Desktop Configuration

Create claude_desktop_config.json:

{
  "mcpServers": {
    "file_analyzer": {
      "command": "/ABSOLUTE/PATH/TO/YOUR/PROJECT/run_mcp_server.sh",
      "args": []
    }
  }
}

Important: Replace /ABSOLUTE/PATH/TO/YOUR/PROJECT with your actual project path. Get it with:

pwd  # Copy this output
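
If you would rather not paste the path by hand, the config can be generated instead. A minimal sketch, assuming it is run from the project directory; build_config is a hypothetical helper and not part of the project.

```python
import json
from pathlib import Path


def build_config(project_dir: Path) -> dict:
    """Return a Claude Desktop config pointing at the launcher in project_dir."""
    launcher = project_dir.resolve() / "run_mcp_server.sh"
    return {
        "mcpServers": {
            "file_analyzer": {"command": str(launcher), "args": []}
        }
    }


# Print the config for the current directory
config = build_config(Path.cwd())
print(json.dumps(config, indent=2))
```

Redirect the output into claude_desktop_config.json to save it.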

Step 3: Install Configuration in Claude Desktop

Copy the configuration to Claude Desktop:

macOS:

cp claude_desktop_config.json ~/Library/Application\ Support/Claude/claude_desktop_config.json

Windows:

copy claude_desktop_config.json %APPDATA%\Claude\claude_desktop_config.json

Linux:

cp claude_desktop_config.json ~/.config/claude/claude_desktop_config.json
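
Note that the copy commands above replace whatever file is already at the destination. If you have other MCP servers configured, one alternative is to merge the new entry into the existing file. The sketch below assumes the per-platform paths listed above; default_config_path, merge_server, and install are hypothetical helpers, and other_server is a made-up existing entry.

```python
import json
import os
import platform
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config location used in this guide."""
    system = platform.system()
    if system == "Darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return Path.home() / ".config" / "claude" / "claude_desktop_config.json"


def merge_server(existing: dict, name: str, entry: dict) -> dict:
    """Return a copy of existing with one server added or replaced under mcpServers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def install(config_file: Path, name: str, entry: dict) -> None:
    """Merge the entry into config_file, creating the file if needed."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(merge_server(existing, name, entry), indent=2) + "\n")


# Demonstrate the merge on an in-memory config (nothing is written to disk here)
existing = {"mcpServers": {"other_server": {"command": "other", "args": []}}}
entry = {"command": "/ABSOLUTE/PATH/TO/YOUR/PROJECT/run_mcp_server.sh", "args": []}
merged = merge_server(existing, "file_analyzer", entry)
print(json.dumps(merged, indent=2))
```

To apply it for real, call install(default_config_path(), "file_analyzer", entry) with your actual launcher path.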

Step 4: Restart Claude Desktop

  1. Quit Claude Desktop completely
  2. Relaunch the application
  3. Look for the tool icon (🔨) in the interface

Method 2: HTTP Server (Alternative)

For web-based testing and debugging, you can also run an HTTP version:

# Install additional dependencies
pip install uvicorn fastapi

# Start HTTP server
python http_server.py

# Test with HTTP client
python http_client.py

# Access web interface (macOS; use xdg-open on Linux or start on Windows)
open http://localhost:8000/docs

Usage Examples

🚀 Getting Started with Claude

Once integrated, try these commands in Claude Desktop:

Basic Commands

Check available tools:

What MCP tools do you have available?

List data files:

What data files do I have available?

Analyze a CSV file:

Can you summarize the sample.csv file?

Advanced Analysis

Data exploration:

Show me the first 5 rows of sample.csv and tell me about the data structure

Statistical analysis:

Give me statistical information about sample.csv - what are the data types and any interesting patterns?

Create new data:

Create a new CSV file called "customer_data.csv" with 50 rows of sample customer data

Comprehensive analysis:

List all my data files, pick the most interesting one, and give me a complete analysis including:
- File structure and dimensions
- Data types for each column  
- First few rows as examples
- Statistical summary for numeric columns

📊 Expected Results

Claude should respond with actual data from your files:

  • File summaries: "CSV file 'sample.csv' has 5 rows and 4 columns. Columns: id, name, email, signup_date"
  • Data previews: Formatted tables showing your actual data
  • Statistical analysis: Mean, median, standard deviation for numeric columns
  • Data insights: Observations about patterns in your data

🧪 Sample Data Included

Your MCP server automatically creates sample data:

sample.csv:

id,name,email,signup_date
1,Alice Johnson,alice@example.com,2023-01-15
2,Bob Smith,bob@example.com,2023-02-22
3,Carol Lee,carol@example.com,2023-03-10
4,David Wu,david@example.com,2023-04-18
5,Eva Brown,eva@example.com,2023-05-30
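
To make the expected summary concrete, here is a dependency-free sketch that derives the row and column counts from the sample above. It uses only the standard library and is not the server's implementation (which relies on pandas); summarize_csv_text is a hypothetical name.

```python
import csv
import io

SAMPLE_CSV = """id,name,email,signup_date
1,Alice Johnson,alice@example.com,2023-01-15
2,Bob Smith,bob@example.com,2023-02-22
3,Carol Lee,carol@example.com,2023-03-10
4,David Wu,david@example.com,2023-04-18
5,Eva Brown,eva@example.com,2023-05-30
"""


def summarize_csv_text(name: str, text: str) -> str:
    """Count data rows and columns in CSV text and describe them in one line."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row holds the column names
    row_count = sum(1 for row in reader if row)  # ignore blank lines
    return (
        f"CSV file '{name}' has {row_count} rows and {len(header)} columns. "
        f"Columns: {', '.join(header)}"
    )


print(summarize_csv_text("sample.csv", SAMPLE_CSV))
```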

Testing & Verification

🔧 Test the Server Directly

# Activate environment
source .venv/bin/activate

# Test server and client
python client.py

Expected output:

🚀 Starting MCP File Analyzer Client Demo
==================================================
✅ Connected to MCP server successfully!

🔧 Available tools:
  - list_data_files
  - summarize_csv_file
  - summarize_parquet_file
  - analyze_csv_data
  - create_sample_data

📂 Listing data files:
📄 Result: Available data files: sample.csv, sample.parquet

📊 Summarizing CSV file:
📄 Result: CSV file 'sample.csv' has 5 rows and 4 columns...

🎮 Interactive Mode

python client.py interactive

Try these commands:

  • list_files
  • summarize sample.csv
  • analyze sample.csv head
  • create test_data.csv 10

✅ Verify Claude Integration

In Claude Desktop, you should see:

  1. Tool icon (🔨) in the interface
  2. Available tools when you ask "What MCP tools do you have?"
  3. Successful responses to data analysis questions

Troubleshooting

🐛 Common Issues

1. No Tool Icon in Claude Desktop

Symptoms: Claude Desktop starts but no MCP tools appear

Solutions:

# Check config file location
ls -la ~/Library/Application\ Support/Claude/claude_desktop_config.json

# Verify JSON syntax (prints the parsed file, or an error locating the problem)
python3 -m json.tool ~/Library/Application\ Support/Claude/claude_desktop_config.json

# Test launcher script
./run_mcp_server.sh

# Check permissions
chmod +x run_mcp_server.sh

2. "Server Not Found" Error

Symptoms: Claude shows error about server connection

Solutions:

# Verify absolute path in config
pwd  # Make sure this matches your config

# Test server independently
source .venv/bin/activate
python main.py

# Check virtual environment
which python  # Should show .venv path

3. "Module Not Found" Error

Symptoms: Import errors when starting server

Solutions:

# Reinstall dependencies
source .venv/bin/activate
pip install -r requirements.txt

# Verify installation
pip list | grep mcp
pip list | grep pandas
pip list | grep pyarrow
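
The same check can be run from Python, which also confirms that the interpreter you are using is the one that has the packages installed. A small sketch using only the standard library; missing_packages is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the three core dependencies from requirements.txt
missing = missing_packages(["mcp", "pandas", "pyarrow"])
print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported missing while pip list shows it as installed, the virtual environment is probably not the one that is active.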

4. Tools Appear But Don't Work

Symptoms: Tools listed but return errors

Solutions:

# Check data directory
ls -la data/

# Recreate sample data (warning: this deletes everything in data/, so back up your own files first)
rm -rf data/
python main.py  # Will recreate sample files

# Test with client
python client.py

🔍 Debug Steps

  1. Test each component independently:

    # Test server
    python main.py
    
    # Test client (in another terminal)
    python client.py
    
    # Test launcher
    ./run_mcp_server.sh
    
  2. Check file permissions:

    ls -la *.py *.sh
    chmod +x run_mcp_server.sh
    
  3. Validate configuration:

    # Check JSON syntax
    python -c "import json; print(json.load(open('claude_desktop_config.json')))"
    
  4. Check Claude Desktop logs:

    • Look for MCP error messages in the Claude Desktop logs (on macOS they are written under ~/Library/Logs/Claude/)
    • Check system logs for permission issues
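
The file and configuration checks above can also be bundled into one script. A minimal sketch, assuming the file names used in this guide and the file_analyzer server key from the sample config; check_setup is a hypothetical helper.

```python
import json
import os
from pathlib import Path
from typing import List


def check_setup(project_dir: Path) -> List[str]:
    """Return a list of human-readable problems found in project_dir."""
    problems = []

    for name in ("main.py", "client.py"):
        if not (project_dir / name).is_file():
            problems.append(f"{name} is missing")

    launcher = project_dir / "run_mcp_server.sh"
    if not launcher.is_file():
        problems.append("run_mcp_server.sh is missing")
    elif not os.access(launcher, os.X_OK):
        problems.append("run_mcp_server.sh is not executable (run chmod +x)")

    config = project_dir / "claude_desktop_config.json"
    if not config.is_file():
        problems.append("claude_desktop_config.json is missing")
    else:
        try:
            data = json.loads(config.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"claude_desktop_config.json is not valid JSON: {exc}")
        else:
            servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
            if "file_analyzer" not in servers:
                problems.append("config has no 'file_analyzer' entry under mcpServers")

    return problems


# Report on the current directory
for problem in check_setup(Path.cwd()) or ["No problems found"]:
    print(problem)
```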

Extending the Server

🛠️ Adding New Tools

Create custom tools with the @mcp.tool() decorator:

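import pandas as pd  # reading .xlsx files also requires: pip install openpyxl

@mcp.tool()
def analyze_excel_file(filename: str) -> str:
    """
    Analyze an Excel file and return summary information.
    Args:
        filename: Name of the Excel file (e.g., 'data.xlsx')
    Returns:
        A string describing the file's contents.
    """
    # DATA_DIR is assumed to be the pathlib.Path of the data/ directory set up in main.py
    file_path = DATA_DIR / filename
    if not file_path.exists():
        return f"Error: Excel file '{filename}' does not exist in data directory."

    # Read the first sheet of the workbook
    df = pd.read_excel(file_path)

    return f"Excel file '{filename}' has {len(df)} rows and {len(df.columns)} columns"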

📚 Adding Resources

Provide static information to Claude:

import json

@mcp.resource("data://file-formats")
def get_supported_formats() -> str:
    """List supported file formats."""
    formats = {
        "supported_formats": ["CSV", "Parquet", "Excel", "JSON"],
        "max_file_size": "100MB",
        "encoding": "UTF-8"
    }
    return json.dumps(formats, indent=2)

🔗 Adding Database Support

Connect to databases:

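import sqlite3
from contextlib import closing

import pandas as pd

@mcp.tool()
def query_database(query: str) -> str:
    """
    Execute a read-only SQL query on the local database.
    Args:
        query: SQL query to execute
    Returns:
        Query results as formatted text.
    """
    # Open the database read-only so a generated query cannot modify or delete data;
    # closing() releases the connection even if the query fails
    with closing(sqlite3.connect("file:data/database.db?mode=ro", uri=True)) as conn:
        df = pd.read_sql_query(query, conn)

    return df.to_string()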

Project Structure

Your complete project should look like this:

mcp-file-analyzer/
├── .venv/                          # Virtual environment
├── data/                           # Data files (auto-created)
│   ├── sample.csv                  # Sample CSV data
│   ├── sample.parquet              # Sample Parquet data
│   └── ...                         # Your data files
├── main.py                         # MCP server (stdio)
├── client.py                       # Test client (stdio)
├── http_server.py                  # HTTP MCP server (optional)
├── http_client.py                  # HTTP test client (optional)
├── requirements.txt                # Python dependencies
├── activate_env.sh                 # Environment activation script
├── run_mcp_server.sh               # Claude Desktop launcher
├── claude_desktop_config.json      # Claude Desktop config
├── .gitignore                      # Git ignore file
└── README.md                       # This file

Key Files Explained

  • main.py: MCP server that provides file analysis tools
  • client.py: Test client to verify server functionality
  • run_mcp_server.sh: Launcher script for Claude Desktop integration
  • claude_desktop_config.json: Configuration for Claude Desktop
  • requirements.txt: Python package dependencies
  • data/: Directory containing your data files

Next Steps

🚀 After Setup

  1. Add your own data - Copy CSV/Parquet files to the data/ directory
  2. Experiment with Claude - Try complex data analysis questions
  3. Create custom tools - Build tools specific to your workflow
  4. Explore advanced features - Add database connections, web APIs, etc.

💡 Ideas for Enhancement

  • Excel support - Add tools for .xlsx files
  • Data visualization - Generate charts and graphs
  • Database integration - Connect to SQL databases
  • API connections - Fetch data from web APIs
  • Machine learning - Add prediction and analysis tools
  • File monitoring - Watch directories for new data files


Claude Desktop Integration

To use this MCP server from Claude Desktop (macOS):

  1. Make the launcher script executable:

    chmod +x /Users/gaohan/Downloads/file_analyzer-main/run_mcp_server.sh
    
    
    
    

Natural Language Interaction Testing with Claude Desktop

  1. File listing

    • Prompt:
      “Please list all available data files from the file_analyzer MCP server.”

    • Behavior:
      Claude called the list_data_files tool and returned the same set of files as the Python client:

      client_generated.csv, generated_test.csv, sample.csv, sample.parquet.

  2. CSV summarization

    • Prompt:
      “Summarize the structure of sample.csv (row count, column count, column names, and data types).”

    • Behavior:
      Claude invoked summarize_csv_file with {"filename": "sample.csv"} and replied that the file has 5 rows and 4 columns (id, name, email, signup_date) with dtypes matching the pandas output: id → int64, the others → object.
      This matches exactly what I see when running python client.py.

  3. Data analysis (describe / head / info)

    • Prompts (asked in separate turns):

      “Run a describe analysis on sample.csv.”
      “Show me the first 5 rows of sample.csv.”
      “Give me the pandas info summary for sample.csv.”

    • Behavior:
      Claude mapped these to analyze_csv_data with operation="describe", "head", and "info" respectively.
      The numeric summary for id (count 5, mean 3, std ≈ 1.58, min 1, max 5) and the printed head/info are the same as the outputs from the interactive Python client.

  4. Data creation

    • Prompt:
      “Create a new sample CSV called new_sample.csv with 5 rows of data.”

    • Behavior:
      Claude used create_sample_data with {"filename": "new_sample.csv", "rows": 5} and confirmed that the file was created under the data/ directory. The path and row count match what I see on disk and in the command-line client.

  5. Error handling / edge cases

    • Prompts:

      “Summarize a CSV file named missing.csv.”
      “Analyze sample.csv without specifying the operation.”

    • Behavior:
      For the missing file, Claude surfaced the server’s error message:

      Error: CSV file 'missing.csv' does not exist in data directory.

      For the incomplete analyze command, it reported the usage hint
      (analyze <filename> <operation>) and listed the supported operations (describe, head, info, columns).
      This matches the edge-case behavior tested in test.py.
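
The numeric summary quoted in item 3 can be checked by hand. The sketch below recomputes it with the standard library for the five id values in sample.csv. pandas reports the sample standard deviation (dividing by n - 1), which is also what statistics.stdev computes.

```python
import statistics

ids = [1, 2, 3, 4, 5]  # the id column of sample.csv

summary = {
    "count": len(ids),
    "mean": statistics.mean(ids),
    "std": statistics.stdev(ids),  # sample standard deviation: sqrt(10 / 4)
    "min": min(ids),
    "max": max(ids),
}

for key, value in summary.items():
    print(f"{key:>5}: {value:.2f}")
```

The squared deviations from the mean are 4, 1, 0, 1, and 4, which sum to 10; dividing by n - 1 = 4 gives 2.5, and its square root is about 1.58.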

3. Performance Analysis and Comparison

3.1 Response time: direct client vs. Claude Desktop

For simple operations on the small sample dataset (5 rows, 4 columns), the direct Python client (python client.py / python test.py) returns almost instantly, typically within a fraction of a second, for:

  • list_data_files
  • summarize_csv_file("sample.csv")
  • analyze_csv_data("sample.csv", "head" | "info" | "describe")
  • create_sample_data(..., rows=5)

When the same operations are triggered via Claude Desktop, there is a small but noticeable overhead. Claude has to:

  1. parse the user’s natural-language request,
  2. decide which MCP tool to call and with which arguments,
  3. send the request over the MCP stdio connection, and
  4. render the response back into a chat-style answer.

In practice this adds a few hundred milliseconds to around a second, depending on the complexity of the prompt and Claude’s own model latency. Because all file I/O and pandas work happen locally, the extra delay is dominated by the LLM and message-passing overhead, not by the MCP server itself.
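
To put numbers on the direct-client side of this comparison, a call can be timed with time.perf_counter. A minimal sketch: time_call is a hypothetical helper, and the lambda below is only a stand-in workload, so swap in a real tool invocation when measuring.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; replace with a real tool invocation when measuring.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"median: {elapsed * 1000:.2f} ms")
```

The median is used rather than the mean so that a single slow first run (cold caches, imports) does not skew the figure.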

3.2 Natural language vs. programmatic tool invocation

Programmatic calls (via client.py or the automated tests in test.py) are:

  • Deterministic - arguments are explicit (filename, operation),
  • Strictly validated - the client prints usage errors such as
    Usage: analyze <filename> <operation> when the user forgets an argument,
  • Easy to automate - suitable for CI or regression tests.

Natural-language use through Claude Desktop is more flexible but also a bit less predictable:

  • It is much easier for a non-technical user to type
    “Summarize the structure of sample.csv”
    than to remember the exact CLI syntax.
  • However, ambiguous prompts can lead to small misunderstandings. For example, a request like “summarize file” does not specify the extension, and the server correctly responds with an error asking for “.csv or .parquet”.
  • For well-phrased prompts, Claude reliably maps questions to the right tools (list_data_files, summarize_csv_file, analyze_csv_data, etc.), and the outputs line up with what the Python client shows.

3.3 User experience and practical applications

From a user-experience perspective:

  • The Python client is ideal for developers: it exposes exact tool names, arguments, and raw outputs (JSON or plain tables).
  • The Claude integration is better for “conversational” analysis:
    • A user can ask a high-level question (e.g., “What does this CSV look like? Any obvious patterns?”), and Claude will combine MCP tool results with its own narrative explanation.
    • Multi-step interactions like “list files → pick a file → show head → describe columns” feel natural inside a chat thread.

In a real-world workflow, a typical pattern would be:

  1. Use Claude + MCP for initial exploration and quick questions.
  2. Switch to direct Python or notebooks when building more complex pipelines or when results need to be scripted and version-controlled.

3.4 Limitations and potential improvements

A few limitations of the current system and ideas for improvement:

  • Dataset size: the tools assume relatively small CSV/Parquet files that fit in memory. For larger datasets, the server would need chunked reading, sampling, or out-of-core processing.
  • Limited operations: current tools focus on listing files, summarization, and basic pandas analysis. In the future, it would be useful to add:
    • filtering and simple query tools,
    • group-by/aggregation helpers,
    • basic visualization (e.g., histograms or scatter plots exported as files).
  • Error reporting: error messages are clear but minimal. More structured error codes and suggestions (e.g., “did you mean sample.csv?”) could make the Claude interaction smoother.
  • Claude integration robustness: on some runs, Claude Desktop logs show unrelated CSP/UI warnings. Although the MCP server and Python clients work correctly, the integration could be hardened by:
    • adding more logging around MCP startup and shutdown,
    • providing clearer instructions for regenerating the config and troubleshooting local desktop issues.
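
As an illustration of the error-reporting idea above, a filename suggestion can be produced with the standard library's difflib. This is a sketch of one possible approach, not something the server currently does; suggest_filename and missing_file_error are hypothetical helpers, and the file list is the one reported in the testing section.

```python
import difflib
from typing import List, Optional


def suggest_filename(requested: str, available: List[str]) -> Optional[str]:
    """Return the closest available filename, or None if nothing is similar."""
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=0.6)
    return matches[0] if matches else None


def missing_file_error(requested: str, available: List[str]) -> str:
    """Build the missing-file error, adding a suggestion when one is found."""
    message = f"Error: CSV file '{requested}' does not exist in data directory."
    suggestion = suggest_filename(requested, available)
    if suggestion:
        message += f" Did you mean '{suggestion}'?"
    return message


files = ["client_generated.csv", "generated_test.csv", "sample.csv", "sample.parquet"]
print(missing_file_error("sampel.csv", files))
```

The cutoff of 0.6 is difflib's default similarity threshold; raising it makes suggestions rarer but more reliable.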

🎉 Congratulations!

You now have a fully functional MCP server that can:

✅ Analyze CSV and Parquet files
✅ Respond to natural language queries through Claude
✅ Create and manipulate data files
✅ Provide detailed statistical analysis
✅ Process your files entirely on your own machine (no API keys required!)

Happy data analyzing! 📊🤖
