engineer-your-data

Enables data engineers and BI professionals to perform data pipeline development, quality assurance, visualization, and API integration locally with AI assistance.

Engineer Your Data

A Model Context Protocol (MCP) server designed specifically for data engineers and business intelligence professionals. Transform your data pipelines and BI workflows with AI-assisted data engineering capabilities. The core tools run locally and need no internet connection; only the API integration tools make network calls.

Why Engineer Your Data?

Built from the ground up for data engineering teams and BI analysts who need:

  • Pipeline Development - Build and test ETL/ELT transformations
  • Data Quality Assurance - Profile and validate data sources
  • Business Intelligence - Create analytics models and dashboard visualizations
  • Local Control - Keep sensitive data on-premises with no cloud dependencies

🚀 Quick Start

New to Engineer Your Data? Start with these 5 essential operations:

  1. Check Data Quality: "Generate a data quality report for my sales.csv file"
  2. Find Issues: "Check for null values in the customer_data.csv"
  3. Transform Data: "Filter the orders.csv for rows where status is 'completed'"
  4. Visualize: "Create a bar chart showing sales by region from revenue.csv"
  5. Summarize: "Give me a statistical summary of the dataset"

These cover most day-to-day data engineering tasks. Explore the full capabilities below!
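
As a rough illustration of what the "Summarize" request computes, the sketch below derives basic statistics for one numeric column using only the Python standard library. It is not the server's implementation (the architecture section lists pandas); the `summarize_column` helper and the inline sample data are hypothetical.

```python
import csv
import io
import statistics


def summarize_column(rows, column):
    """Return basic statistics for one numeric column, skipping blank cells."""
    values = [float(r[column]) for r in rows if (r.get(column) or "").strip()]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical stand-in for sales.csv so the sketch runs without any files.
SAMPLE_CSV = """region,amount
north,100
south,250
east,
west,150
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_column(rows, "amount")
print(summary)
```

The blank `amount` for `east` is counted as missing rather than treated as zero, so it does not distort the mean.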

Core Capabilities

🚀 File Operations:

  • read_file - Read data files from local filesystem
  • write_file - Write processed data to files
  • list_files - Browse and discover data files
  • file_info - Get metadata about data files

📊 Data Validation & Quality:

  • validate_schema - Validate data against expected schemas
  • check_nulls - Analyze null values and missing data patterns
  • data_quality_report - Comprehensive data quality assessment
  • detect_duplicates - Find duplicate records with configurable matching

🔄 Data Transformation:

  • filter_data - Filter datasets based on conditions
  • aggregate_data - Group and aggregate data with statistical functions
  • join_data - Join multiple datasets with flexible join types
  • pivot_data - Reshape data from long to wide format
  • clean_data - Clean and standardize data values

📈 Visualization & Analysis:

  • create_chart - Generate bar, pie, line, scatter, histogram, box, and heatmap charts
  • data_summary - Create comprehensive dataset summaries with statistics
  • export_visualization - Export charts and data to JSON, CSV, HTML, Markdown

🌐 API Integration:

  • fetch_api_data - Retrieve data from REST APIs
  • monitor_api - Monitor API endpoints for health and performance
  • batch_api_calls - Execute multiple API calls efficiently
  • api_auth - Manage API authentication

🔧 Utilities:

  • chain_tools - Execute multiple tools in sequence
  • analyze_schema - Analyze and understand data schemas
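
To make the idea behind chain_tools concrete, here is a minimal sketch of running tools in sequence, where each step receives the previous step's output. The step format (`{"tool": ..., "args": ...}`), the `TOOLS` registry, and the simplified tool bodies are all assumptions for illustration; the server's actual input schema may differ.

```python
# Hypothetical stand-ins: the names mirror the tool list above, but the
# bodies are simplified and are not the server's implementations.
def filter_data(rows, column, equals):
    """Keep rows whose column matches the given value."""
    return [r for r in rows if r[column] == equals]


def aggregate_data(rows, group_by, value):
    """Sum one numeric column per group."""
    totals = {}
    for r in rows:
        totals[r[group_by]] = totals.get(r[group_by], 0) + r[value]
    return [{group_by: k, value: v} for k, v in totals.items()]


TOOLS = {"filter_data": filter_data, "aggregate_data": aggregate_data}


def chain_tools(data, steps):
    """Run each step on the output of the previous one."""
    for step in steps:
        tool = TOOLS[step["tool"]]
        data = tool(data, **step.get("args", {}))
    return data


orders = [
    {"region": "US", "status": "completed", "amount": 120},
    {"region": "US", "status": "pending", "amount": 80},
    {"region": "EU", "status": "completed", "amount": 200},
    {"region": "US", "status": "completed", "amount": 30},
]

result = chain_tools(orders, [
    {"tool": "filter_data", "args": {"column": "status", "equals": "completed"}},
    {"tool": "aggregate_data", "args": {"group_by": "region", "value": "amount"}},
])
print(result)
```

Describing the pipeline as data rather than nested function calls keeps it easy to log, replay, or extend with another step.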

Quick Start for Data Teams

Installation

# Option 1: Install from PyPI (recommended)
pip install engineer-your-data

# Option 2: Install from source
git clone https://github.com/eghuzefa/engineer-your-data-mcp.git
cd engineer-your-data-mcp
pip install -e .

Configure for Your Data Environment

For PyPI Installation: Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "engineer-your-data": {
      "command": "python",
      "args": ["-m", "src.server"],
      "env": {
        "WORKSPACE_PATH": "/path/to/your/data/workspace"
      }
    }
  }
}

For Source Installation: Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "engineer-your-data": {
      "command": "python",
      "args": ["/path/to/engineer-your-data-mcp/src/server.py"],
      "env": {
        "WORKSPACE_PATH": "/path/to/your/data/workspace"
      }
    }
  }
}
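
If you would rather script the configuration change than edit the JSON by hand, something along these lines could work. This is a sketch, not a tool shipped with the package: `add_server_entry` and `update_config_file` are hypothetical helpers, the entry mirrors the PyPI example above, and the location of claude_desktop_config.json depends on your platform, so it is passed in by the caller.

```python
import json
from pathlib import Path


def add_server_entry(config, workspace_path):
    """Return a copy of config with the engineer-your-data entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # copy, keep other servers
    servers["engineer-your-data"] = {
        "command": "python",
        "args": ["-m", "src.server"],
        "env": {"WORKSPACE_PATH": workspace_path},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path, workspace_path):
    """Read the config file if it exists, add the entry, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_server_entry(config, workspace_path)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# In-memory demonstration; no file is touched here.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_server_entry(existing, "/path/to/your/data/workspace")
print(json.dumps(merged, indent=2))
```

Merging into the existing `mcpServers` object, instead of overwriting the file, preserves any servers you have already configured.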

Data Engineering Examples

Data Quality Analysis:

"Check the customer data for null values and duplicates"
"Generate a comprehensive data quality report for the sales dataset"
"Validate this CSV file against our customer schema"

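For a sense of what these quality checks involve, the following standard-library sketch counts empty cells, finds repeated keys, and checks column types. The function names echo check_nulls, detect_duplicates, and validate_schema, but the bodies, signatures, and sample data are assumptions for illustration, not the server's code.

```python
import csv
import io
from collections import Counter


def check_nulls(rows, columns):
    """Count empty or missing cells per column."""
    return {c: sum(1 for r in rows if not (r.get(c) or "").strip()) for c in columns}


def detect_duplicates(rows, key_columns):
    """Return key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


def validate_schema(rows, schema):
    """Return a list of problems; an empty list means the rows are valid."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for column, caster in schema.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: missing column {column!r}")
            elif value.strip():  # blank cells are left to check_nulls
                try:
                    caster(value)
                except ValueError:
                    problems.append(
                        f"row {i}: {column}={value!r} is not a valid {caster.__name__}"
                    )
    return problems


# Hypothetical stand-in for customer_data.csv.
SAMPLE_CSV = """customer_id,email,age
1,a@example.com,34
2,,41
1,a@example.com,34
3,c@example.com,unknown
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nulls = check_nulls(rows, ["customer_id", "email", "age"])
duplicates = detect_duplicates(rows, ["customer_id"])
problems = validate_schema(rows, {"customer_id": int, "age": int})
print(nulls, duplicates, problems, sep="\n")
```

Here the schema is just a mapping from column name to a casting function, which is enough to catch values such as `unknown` in a numeric column.
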
Data Transformation:

"Filter the orders data for customers in the US region"
"Aggregate sales data by month and calculate total revenue"
"Join customer data with order data on customer_id"
"Pivot the sales data to show products as columns"

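The four requests above map onto familiar operations: join, filter, group-and-sum, and pivot. The sketch below shows each one on a few in-memory rows using only the standard library. The server itself builds on pandas according to its architecture diagram, so treat this as an explanation of the operations rather than of the implementation; all field names and values are made up.

```python
from collections import defaultdict

# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": "c1", "date": "2024-01-15", "product": "widget", "amount": 100.0},
    {"order_id": 2, "customer_id": "c2", "date": "2024-01-20", "product": "gadget", "amount": 50.0},
    {"order_id": 3, "customer_id": "c1", "date": "2024-02-03", "product": "gadget", "amount": 75.0},
]
customers = [
    {"customer_id": "c1", "region": "US"},
    {"customer_id": "c2", "region": "EU"},
]

# Join: inner join of orders with customers on customer_id.
by_id = {c["customer_id"]: c for c in customers}
joined = [{**o, **by_id[o["customer_id"]]} for o in orders if o["customer_id"] in by_id]

# Filter: keep only orders from customers in the US region.
us_orders = [r for r in joined if r["region"] == "US"]

# Aggregate: total revenue per month, keyed by the YYYY-MM date prefix.
revenue_by_month = defaultdict(float)
for r in joined:
    revenue_by_month[r["date"][:7]] += r["amount"]

# Pivot: months as rows, products as columns, summed amount as the cell value.
pivot = defaultdict(lambda: defaultdict(float))
for r in joined:
    pivot[r["date"][:7]][r["product"]] += r["amount"]

print([r["order_id"] for r in us_orders])
print(dict(revenue_by_month))
print({month: dict(cols) for month, cols in pivot.items()})
```

Because this is an inner join, an order whose customer_id has no match in the customer data is dropped; a left join would keep it with the customer fields missing.
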
Visualization & Reporting:

"Create a bar chart showing revenue by department"
"Generate a summary of the dataset with key statistics"
"Export the sales analysis as an HTML report"

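Exporting a result as Markdown or HTML comes down to rendering rows into a table. A minimal sketch, assuming the data is a list of dictionaries; `to_markdown` and `to_html` are hypothetical helpers, not the export_visualization tool itself.

```python
import html


def to_markdown(rows, columns):
    """Render rows as a Markdown pipe table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(r[c]) for c in columns) + " |" for r in rows]
    return "\n".join([header, divider, *body])


def to_html(rows, columns, title):
    """Render rows as a small HTML report, escaping every cell."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<h1>{html.escape(title)}</h1><table><tr>{head}</tr>{body}</table>"


# Hypothetical result of a "revenue by department" aggregation.
rows = [
    {"department": "R&D", "revenue": 1200},
    {"department": "Sales", "revenue": 3400},
]
markdown = to_markdown(rows, ["department", "revenue"])
report = to_html(rows, ["department", "revenue"], "Sales analysis")
print(markdown)
print(report)
```

Escaping cell values matters for HTML output: a department named `R&D` would otherwise produce invalid markup.
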
API Data Integration:

"Fetch customer data from the CRM API"
"Monitor the data pipeline API for health status"
"Authenticate with the analytics API using OAuth"

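Health monitoring and batched calls can be pictured as timing each request and running several of them concurrently. The sketch below uses only the standard library (the architecture section lists requests, so the real tools likely differ); `fetch_json`, `check_health`, `batch_check`, and the example.com URLs are all hypothetical. The fetch function is passed in as a parameter so the demonstration can run offline with a fake.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_json(url, timeout=10.0):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_health(url, fetch=fetch_json):
    """Return status and latency for one endpoint; never raises."""
    started = time.perf_counter()
    try:
        fetch(url)
        ok, error = True, None
    except Exception as exc:  # network errors, timeouts, invalid JSON
        ok, error = False, str(exc)
    return {"url": url, "ok": ok, "error": error,
            "latency_s": time.perf_counter() - started}


def batch_check(urls, fetch=fetch_json, max_workers=4):
    """Check several endpoints concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: check_health(u, fetch), urls))


def fake_fetch(url):
    """Offline stand-in for a real HTTP call."""
    if "down" in url:
        raise ConnectionError("simulated outage")
    return {"status": "ok"}


results = batch_check(
    ["https://api.example.com/health", "https://api.example.com/down"],
    fetch=fake_fetch,
)
for r in results:
    print(r["url"], "ok" if r["ok"] else f"failed: {r['error']}")
```

Catching the exception inside `check_health` means one failing endpoint does not abort the rest of the batch.
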
Architecture for Data Teams

Claude Desktop → MCP Protocol → Engineer Your Data → Local Python Environment
                                        ↓
                    pandas + numpy + requests + matplotlib
                                        ↓
                         Local Files + APIs + Data Sources

Testing & Quality

  • 161 comprehensive tests with 100% pass rate
  • Async/await support for high-performance operations
  • Error handling with detailed logging and debugging
  • Type safety with proper schema validation

# Run all tests
python -m pytest

# Run with coverage
python -m pytest --cov=src

# Run specific tool tests
python -m pytest tests/tools/test_visualization.py

Available Tools (20 Listed)

File Operations (4 tools)

Tool        Description
read_file   Read and parse data files (CSV, JSON, etc.)
write_file  Write data to files with format options
list_files  Directory browsing and file discovery
file_info   File metadata and basic statistics

Data Validation (4 tools)

Tool                 Description
validate_schema      Schema validation with custom rules
check_nulls          Null value analysis and patterns
data_quality_report  Comprehensive quality assessment
detect_duplicates    Duplicate detection with flexible matching

Data Transformation (5 tools)

Tool            Description
filter_data     Advanced filtering with conditions
aggregate_data  Grouping and statistical aggregation
join_data       Multi-dataset joins (inner, outer, left, right)
pivot_data      Data reshaping and pivoting
clean_data      Data cleaning and standardization

Visualization (3 tools)

Tool                  Description
create_chart          7 chart types with customization
data_summary          Statistical summaries and insights
export_visualization  Multi-format export capabilities

API Integration (4 tools)

Tool             Description
fetch_api_data   REST API data retrieval
monitor_api      API health monitoring
batch_api_calls  Efficient bulk API operations
api_auth         Authentication management

Data Engineering Best Practices

  • Sandboxed Execution - Safe environment for testing transformations
  • Local Data Control - Keep sensitive data on your infrastructure
  • Comprehensive Testing - All tools thoroughly tested and validated
  • Enterprise Security - No external API calls for core functionality
  • Performance Optimized - Async operations and efficient data processing
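
One common way to keep file tools inside a configured workspace is to resolve every requested path against the workspace root and reject anything that escapes it. Whether the server enforces WORKSPACE_PATH this way is an assumption; `resolve_in_workspace` is a hypothetical helper that only sketches the idea.

```python
import os
import tempfile
from pathlib import Path


def resolve_in_workspace(relative_path, workspace=None):
    """Resolve a path under the workspace root or raise PermissionError."""
    root = Path(workspace or os.environ.get("WORKSPACE_PATH", ".")).resolve()
    candidate = (root / relative_path).resolve()
    # Accept the root itself or anything below it; reject everything else.
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{relative_path!r} is outside the workspace")
    return candidate


# Demonstration against a temporary directory standing in for the workspace.
with tempfile.TemporaryDirectory() as tmp:
    inside = resolve_in_workspace("sales.csv", workspace=tmp)
    try:
        resolve_in_workspace("../secrets.txt", workspace=tmp)
        escaped = True
    except PermissionError:
        escaped = False

print(inside.name, "escape allowed" if escaped else "escape blocked")
```

Resolving before comparing is what defeats `..` segments and absolute paths, since both are normalized to their real location first.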

Integration with Your Stack

Works seamlessly alongside:

  • dbt - Use for complex transformation logic development
  • Airflow/Prefect - Incorporate into existing workflow orchestration
  • Jupyter/Notebooks - Prototype and iterate on data transformations
  • BI Tools - Generate data and visualizations for Tableau, Power BI, etc.
  • APIs - Integrate with REST APIs and microservices

Contributing

Data engineers and BI professionals welcome! Please read our contributing guidelines and submit PRs for new data connectors, transformations, or BI features.

MCP Registry

<!-- MCP name format: mcp-name: io.github.eghuzefa/engineer-your-data -->

This server is available in the official Model Context Protocol Registry.

License

MIT License - see LICENSE file for details.
