engineer-your-data
Enables data engineers and BI professionals to perform data pipeline development, quality assurance, visualization, and API integration locally with AI assistance.
README
Engineer Your Data
A Model Context Protocol (MCP) server designed specifically for data engineers and business intelligence professionals. Transform your data pipelines and BI workflows with AI-assisted data engineering capabilities that run locally without internet dependency.
Why Engineer Your Data?
Built from the ground up for data engineering teams and BI analysts who need:
- Pipeline Development - Build and test ETL/ELT transformations
- Data Quality Assurance - Profile and validate data sources
- Business Intelligence - Create analytics models and dashboard visualizations
- Local Control - Keep sensitive data on-premises with no cloud dependencies
š Quick Start
New to Engineer Your Data? Start with these 5 essential operations:
- Check Data Quality:
"Generate a data quality report for my sales.csv file" - Find Issues:
"Check for null values in the customer_data.csv" - Transform Data:
"Filter the orders.csv for rows where status is 'completed'" - Visualize:
"Create a bar chart showing sales by region from revenue.csv" - Summarize:
"Give me a statistical summary of the dataset"
These cover 80% of daily data engineering tasks. Explore the full capabilities below!
Core Capabilities
š File Operations:
read_file- Read data files from local filesystemwrite_file- Write processed data to fileslist_files- Browse and discover data filesfile_info- Get metadata about data files
š Data Validation & Quality:
validate_schema- Validate data against expected schemascheck_nulls- Analyze null values and missing data patternsdata_quality_report- Comprehensive data quality assessmentdetect_duplicates- Find duplicate records with configurable matching
š Data Transformation:
filter_data- Filter datasets based on conditionsaggregate_data- Group and aggregate data with statistical functionsjoin_data- Join multiple datasets with flexible join typespivot_data- Reshape data from long to wide formatclean_data- Clean and standardize data values
š Visualization & Analysis:
create_chart- Generate bar, pie, line, scatter, histogram, box, and heatmap chartsdata_summary- Create comprehensive dataset summaries with statisticsexport_visualization- Export charts and data to JSON, CSV, HTML, Markdown
š API Integration:
fetch_api_data- Retrieve data from REST APIsmonitor_api- Monitor API endpoints for health and performancebatch_api_calls- Execute multiple API calls efficientlyapi_auth- Manage API authentication
š§ Utilities:
chain_tools- Execute multiple tools in sequenceanalyze_schema- Analyze and understand data schemas
Quick Start for Data Teams
Installation
# Option 1: Install from PyPI (recommended)
pip install engineer-your-data
# Option 2: Install from source
git clone https://github.com/eghuzefa/engineer-your-data-mcp.git
cd engineer-your-data-mcp
pip install -e .
Configure for Your Data Environment
For PyPI Installation:
Add to your Claude Desktop configuration (claude_desktop_config.json):
{
"mcpServers": {
"engineer-your-data": {
"command": "python",
"args": ["-m", "src.server"],
"env": {
"WORKSPACE_PATH": "/path/to/your/data/workspace"
}
}
}
}
For Source Installation:
Add to your Claude Desktop configuration (claude_desktop_config.json):
{
"mcpServers": {
"engineer-your-data": {
"command": "python",
"args": ["/path/to/engineer-your-data-mcp/src/server.py"],
"env": {
"WORKSPACE_PATH": "/path/to/your/data/workspace"
}
}
}
}
Data Engineering Examples
Data Quality Analysis:
"Check the customer data for null values and duplicates"
"Generate a comprehensive data quality report for the sales dataset"
"Validate this CSV file against our customer schema"
Data Transformation:
"Filter the orders data for customers in the US region"
"Aggregate sales data by month and calculate total revenue"
"Join customer data with order data on customer_id"
"Pivot the sales data to show products as columns"
Visualization & Reporting:
"Create a bar chart showing revenue by department"
"Generate a summary of the dataset with key statistics"
"Export the sales analysis as an HTML report"
API Data Integration:
"Fetch customer data from the CRM API"
"Monitor the data pipeline API for health status"
"Authenticate with the analytics API using OAuth"
Architecture for Data Teams
Claude Desktop ā MCP Protocol ā Engineer Your Data ā Local Python Environment
ā
pandas + numpy + requests + matplotlib
ā
Local Files + APIs + Data Sources
Testing & Quality
- 161 comprehensive tests with 100% pass rate
- Async/await support for high-performance operations
- Error handling with detailed logging and debugging
- Type safety with proper schema validation
# Run all tests
python -m pytest
# Run with coverage
python -m pytest --cov=src
# Run specific tool tests
python -m pytest tests/tools/test_visualization.py
Available Tools (17 Total)
File Operations (4 tools)
| Tool | Description |
|---|---|
read_file |
Read and parse data files (CSV, JSON, etc.) |
write_file |
Write data to files with format options |
list_files |
Directory browsing and file discovery |
file_info |
File metadata and basic statistics |
Data Validation (4 tools)
| Tool | Description |
|---|---|
validate_schema |
Schema validation with custom rules |
check_nulls |
Null value analysis and patterns |
data_quality_report |
Comprehensive quality assessment |
detect_duplicates |
Duplicate detection with flexible matching |
Data Transformation (5 tools)
| Tool | Description |
|---|---|
filter_data |
Advanced filtering with conditions |
aggregate_data |
Grouping and statistical aggregation |
join_data |
Multi-dataset joins (inner, outer, left, right) |
pivot_data |
Data reshaping and pivoting |
clean_data |
Data cleaning and standardization |
Visualization (3 tools)
| Tool | Description |
|---|---|
create_chart |
7 chart types with customization |
data_summary |
Statistical summaries and insights |
export_visualization |
Multi-format export capabilities |
API Integration (4 tools)
| Tool | Description |
|---|---|
fetch_api_data |
REST API data retrieval |
monitor_api |
API health monitoring |
batch_api_calls |
Efficient bulk API operations |
api_auth |
Authentication management |
Data Engineering Best Practices
- Sandboxed Execution - Safe environment for testing transformations
- Local Data Control - Keep sensitive data on your infrastructure
- Comprehensive Testing - All tools thoroughly tested and validated
- Enterprise Security - No external API calls for core functionality
- Performance Optimized - Async operations and efficient data processing
Integration with Your Stack
Works seamlessly alongside:
- dbt - Use for complex transformation logic development
- Airflow/Prefect - Incorporate into existing workflow orchestration
- Jupyter/Notebooks - Prototype and iterate on data transformations
- BI Tools - Generate data and visualizations for Tableau, Power BI, etc.
- APIs - Integrate with REST APIs and microservices
Contributing
Data engineers and BI professionals welcome! Please read our contributing guidelines and submit PRs for new data connectors, transformations, or BI features.
MCP Registry
<!-- MCP name format: mcp-name: io.github.eghuzefa/engineer-your-data -->
This server is available in the official Model Context Protocol Registry.
License
MIT License - see LICENSE file for details.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.