EMR MCP Server

EMR MCP Server is a Model Context Protocol (MCP) server that provides guidance for Amazon EMR cluster management, configuration recommendations, and monitoring. It runs on the EMR master node and reports on cluster performance, cost optimization, and configuration tuning in real time.

🚀 Features

🏗️ Cluster Management

Real-time cluster information with detailed instance group analysis
Multi-cluster support with filtering and search capabilities
Cost analysis and estimation with breakdown by instance types
Instance type recommendations based on workload patterns
Auto-scaling policy suggestions for optimal resource utilization

📊 Resource Monitoring

YARN ResourceManager integration for application monitoring
HDFS NameNode monitoring for storage health and utilization
Real-time resource utilization across all cluster nodes
Application performance analysis with bottleneck identification
Historical trend analysis for capacity planning

🧠 Analytics & Optimization

Spark History Server integration for detailed job analysis
Configuration recommendations based on workload patterns
Performance diagnostics with actionable insights
Cost optimization suggestions including spot instance usage
Workload-specific tuning for batch, streaming, and ML workloads

🔒 Security & Authentication

Multiple authentication methods: API keys, JWT tokens, IAM roles
Role-based access control with granular permissions
Secure communication with HTTPS and certificate validation
Request rate limiting to prevent abuse
Audit logging for compliance and monitoring
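
As an illustration of the API key method listed above, here is a minimal sketch of a constant-time key check. It is not the server's actual implementation: the function name `is_valid_api_key` and the in-memory key list are assumptions made for this example, and only the default key value comes from the configuration shown in this README.

```python
import hmac

# Hypothetical in-memory key store for this sketch; the README keeps
# the real keys under auth.api_keys in config/server_config.yaml.
VALID_API_KEYS = ["emr-mcp-default-key"]


def is_valid_api_key(presented: str, valid_keys=VALID_API_KEYS) -> bool:
    """Return True if the presented key matches any configured key."""
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # compare_digest avoids leaking, through timing, where two keys differ.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched


print(is_valid_api_key("emr-mcp-default-key"))
print(is_valid_api_key("not-a-real-key"))
```

The loop deliberately checks every key instead of returning on the first match, so the running time does not depend on which key matched.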

📋 Quick Start

Prerequisites

EMR cluster running version 6.0+
Python 3.8+
Access to YARN ResourceManager (port 8088)
Access to Spark History Server (port 18080)
Access to HDFS NameNode (port 9870)

Installation

# Clone the repository
git clone https://github.com/your-org/emr-mcp-server.git
cd emr-mcp-server

# Install dependencies
pip install -r requirements.txt

# Configure the server
cp config/server_config.yaml.example config/server_config.yaml
# Edit the configuration file with your EMR cluster details

Configuration

Edit config/server_config.yaml:

server:
  host: "0.0.0.0"
  port: 3000
  debug: false
  workers: 4

emr:
  region: "us-east-1"
  cluster_id: "j-XXXXXXXXX"  # Optional: specific cluster ID
  
yarn:
  resource_manager_url: "http://localhost:8088"
  timeout: 30
  
spark:
  history_server_url: "http://localhost:18080"
  timeout: 30
  
hdfs:
  namenode_url: "http://localhost:9870"
  timeout: 30

auth:
  method: "api_key"  # Options: api_key, jwt, iam
  api_keys:
    - "emr-mcp-default-key"
  jwt_secret: "your-jwt-secret"
  
logging:
  level: "INFO"
  format: "console"  # Options: console, json
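
Before starting the server it can help to sanity-check the edited file. The sketch below validates a configuration that has already been parsed into a dictionary (for example with a YAML library); the function `validate_config` and its rules are assumptions for illustration, based only on the options named in the comments above, and are not part of the project.

```python
# Option sets taken from the comments in the sample configuration.
ALLOWED_AUTH_METHODS = {"api_key", "jwt", "iam"}
ALLOWED_LOG_FORMATS = {"console", "json"}


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []

    port = config.get("server", {}).get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"server.port must be an integer in 1-65535, got {port!r}")

    auth = config.get("auth", {})
    method = auth.get("method")
    if method not in ALLOWED_AUTH_METHODS:
        problems.append(f"auth.method must be one of {sorted(ALLOWED_AUTH_METHODS)}")
    elif method == "api_key" and not auth.get("api_keys"):
        problems.append("auth.api_keys needs at least one key when method is api_key")
    elif method == "jwt" and not auth.get("jwt_secret"):
        problems.append("auth.jwt_secret is required when method is jwt")

    log_format = config.get("logging", {}).get("format")
    if log_format not in ALLOWED_LOG_FORMATS:
        problems.append(f"logging.format must be one of {sorted(ALLOWED_LOG_FORMATS)}")

    return problems


sample = {
    "server": {"host": "0.0.0.0", "port": 3000},
    "auth": {"method": "api_key", "api_keys": ["emr-mcp-default-key"]},
    "logging": {"level": "INFO", "format": "console"},
}
print(validate_config(sample))
```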

Running the Server

# Start the server directly
python -m src.server

# Or use the startup script
./scripts/start_server.sh

# Check server status
curl http://localhost:3000/health

🛠️ MCP Tools

Cluster Management Tools

`get_cluster_info`

Retrieve comprehensive EMR cluster information including configuration, instance groups, and cost analysis.

{
  "name": "get_cluster_info",
  "arguments": {
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}

`list_clusters`

List all EMR clusters with optional state filtering.

{
  "name": "list_clusters",
  "arguments": {
    "states": ["RUNNING", "WAITING"]  // Optional
  }
}
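
For context, state filtering of this kind maps naturally onto the EMR `ListClusters` API. The sketch below shows one way it could be done with a boto3 EMR client passed in by the caller; whether the server does it this way is an assumption, and `list_cluster_summaries` is a hypothetical helper name.

```python
def list_cluster_summaries(emr_client, states=None):
    """Collect id, name, and state for every cluster matching the given states.

    emr_client is expected to behave like boto3.client("emr", region_name=...).
    """
    kwargs = {"ClusterStates": list(states)} if states else {}
    summaries = []
    # The paginator follows the Marker token across result pages.
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(**kwargs):
        for cluster in page.get("Clusters", []):
            summaries.append(
                {
                    "id": cluster["Id"],
                    "name": cluster["Name"],
                    "state": cluster["Status"]["State"],
                }
            )
    return summaries
```

With boto3 installed and credentials configured, a call might look like `list_cluster_summaries(boto3.client("emr", region_name="us-east-1"), ["RUNNING", "WAITING"])`.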

`estimate_cost`

Calculate current and projected costs with detailed breakdown.

{
  "name": "estimate_cost",
  "arguments": {
    "runtime_hours": 48.0,  // Optional
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}
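
The arithmetic behind a projection like this is simple: instance count times hourly price times runtime, summed per instance type. The sketch below illustrates that calculation only; the `InstanceGroup` class is hypothetical, and the hourly prices are placeholders chosen for the example, not real AWS prices or values returned by the tool.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    instance_type: str
    count: int
    hourly_price_usd: float  # placeholder price per instance-hour


def estimate_cost(groups, runtime_hours):
    """Return the total cost and a per-instance-type breakdown in USD."""
    breakdown = {}
    for group in groups:
        cost = group.count * group.hourly_price_usd * runtime_hours
        breakdown[group.instance_type] = breakdown.get(group.instance_type, 0.0) + cost
    return {"total_usd": sum(breakdown.values()), "by_instance_type": breakdown}


groups = [
    InstanceGroup("m5.xlarge", 1, 0.20),   # placeholder prices throughout
    InstanceGroup("m5.2xlarge", 3, 0.40),
    InstanceGroup("m5.large", 2, 0.05),
]
estimate = estimate_cost(groups, runtime_hours=48.0)
print(round(estimate["total_usd"], 2))
```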

`suggest_instance_types`

Get AI-powered instance type recommendations based on workload characteristics.

{
  "name": "suggest_instance_types",
  "arguments": {
    "workload_type": "memory_intensive",  // Options: general, compute_intensive, memory_intensive, storage_intensive
    "data_size_gb": 1000,  // Optional
    "concurrent_jobs": 10  // Optional
  }
}
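
The `//` annotations in these request examples are documentation only; strict JSON parsers reject comments, so they must be left out of a real request. A small builder, sketched here under the assumption that optional arguments are simply omitted when unset (`build_suggest_instance_types_call` is a hypothetical helper), produces a valid payload and checks `workload_type` against the listed options:

```python
import json

# Options copied from the annotation in the example above.
WORKLOAD_TYPES = {"general", "compute_intensive", "memory_intensive", "storage_intensive"}


def build_suggest_instance_types_call(workload_type, data_size_gb=None, concurrent_jobs=None):
    """Build a comment-free tool call payload, dropping unset optional arguments."""
    if workload_type not in WORKLOAD_TYPES:
        raise ValueError(f"workload_type must be one of {sorted(WORKLOAD_TYPES)}")
    arguments = {"workload_type": workload_type}
    if data_size_gb is not None:
        arguments["data_size_gb"] = data_size_gb
    if concurrent_jobs is not None:
        arguments["concurrent_jobs"] = concurrent_jobs
    return {"name": "suggest_instance_types", "arguments": arguments}


payload = build_suggest_instance_types_call("memory_intensive", data_size_gb=1000)
print(json.dumps(payload, indent=2))
```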

Monitoring Tools

`monitor_resources`

Get real-time resource utilization across YARN, HDFS, and cluster nodes.

{
  "name": "monitor_resources",
  "arguments": {}
}

`analyze_yarn_applications`

Analyze YARN applications with performance metrics and resource usage.

{
  "name": "analyze_yarn_applications",
  "arguments": {
    "states": ["RUNNING", "FINISHED"],  // Optional
    "application_types": ["SPARK"],  // Optional
    "limit": 50  // Optional, default: 50
  }
}
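
These arguments resemble the query parameters of the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps` (`states`, `applicationTypes`, `limit`). Assuming the tool forwards them more or less directly, which this README does not confirm, the request URL could be assembled as in this sketch; `build_yarn_apps_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_yarn_apps_url(resource_manager_url, states=None, application_types=None, limit=50):
    """Build a ResourceManager applications query URL from the tool arguments."""
    params = {}
    if states:
        params["states"] = ",".join(states)
    if application_types:
        params["applicationTypes"] = ",".join(application_types)
    params["limit"] = limit
    # safe="," keeps the comma-separated lists readable instead of %2C.
    query = urlencode(params, safe=",")
    return f"{resource_manager_url.rstrip('/')}/ws/v1/cluster/apps?{query}"


url = build_yarn_apps_url(
    "http://localhost:8088",
    states=["RUNNING", "FINISHED"],
    application_types=["SPARK"],
)
print(url)
```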

`diagnose_performance`

Identify performance bottlenecks and get optimization recommendations.

{
  "name": "diagnose_performance",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Optional
    "time_range_hours": 24  // Optional, default: 24
  }
}
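
One way to read `time_range_hours` is as a lookback window over application start times. YARN reports `startedTime` in milliseconds since the epoch, so a sketch of that filtering, under the assumption that the tool works on YARN application records (the helper names are hypothetical), might look like this:

```python
from datetime import datetime, timedelta, timezone


def window_start_ms(time_range_hours=24, now=None):
    """Return the start of the lookback window as epoch milliseconds."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=time_range_hours)
    return int(start.timestamp() * 1000)


def apps_in_window(apps, time_range_hours=24, now=None):
    """Keep only applications whose startedTime falls inside the window."""
    cutoff = window_start_ms(time_range_hours, now)
    return [app for app in apps if app.get("startedTime", 0) >= cutoff]


fixed_now = datetime(2024, 1, 2, tzinfo=timezone.utc)
apps = [
    {"id": "application_1234567890_0001", "startedTime": window_start_ms(1, fixed_now)},
    {"id": "application_1234567890_0002", "startedTime": window_start_ms(48, fixed_now)},
]
print([app["id"] for app in apps_in_window(apps, 24, fixed_now)])
```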

Analytics Tools

`get_spark_logs`

Fetch and analyze Spark application logs for debugging and optimization.

{
  "name": "get_spark_logs",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Required
    "executor_id": "1"  // Optional
  }
}

`recommend_configuration`

Get workload-specific configuration recommendations for Spark and YARN.

{
  "name": "recommend_configuration",
  "arguments": {
    "workload_type": "batch",  // Options: batch, streaming, ml, interactive
    "app_id": "application_1234567890_0001"  // Optional
  }
}

🚀 Deployment Options

1. EMR Bootstrap Script (Recommended)

Deploy automatically when creating an EMR cluster:

# Upload bootstrap script to S3
aws s3 cp scripts/bootstrap-emr-mcp.sh s3://your-bucket/

# Create EMR cluster with MCP server
# --use-default-roles assumes EMR_DefaultRole and EMR_EC2_DefaultRole already exist
# (create them once with: aws emr create-default-roles)
aws emr create-cluster \
  --name "EMR-MCP-Cluster" \
  --release-label emr-6.4.0 \
  --use-default-roles \
  --applications Name=Spark Name=Hadoop Name=Hive Name=Zeppelin \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 \
    InstanceGroupType=CORE,InstanceType=m5.2xlarge,InstanceCount=3 \
    InstanceGroupType=TASK,InstanceType=m5.large,InstanceCount=2,BidPrice=0.05 \
  --bootstrap-actions Path=s3://your-bucket/bootstrap-emr-mcp.sh \
  --ec2-attributes KeyName=your-key-pair \
  --log-uri s3://your-bucket/emr-logs/

2. Docker Deployment

# Build the image
docker build -t emr-mcp-server .

# Run with docker-compose
docker-compose up -d

# Check logs
docker-compose logs -f emr-mcp-server

3. Systemd Service

# Copy service file
sudo cp scripts/emr-mcp-server.service /etc/systemd/system/

# Enable and start
sudo systemctl enable emr-mcp-server
sudo systemctl start emr-mcp-server
sudo systemctl status emr-mcp-server

💻 Usage Examples

Python Client

import asyncio
from examples.client_example import EMRMCPClient

async def main():
    async with EMRMCPClient("http://localhost:3000", "emr-mcp-default-key") as client:
        # Get cluster information
        cluster_info = await client.call_tool("get_cluster_info")
        print("Cluster Info:", cluster_info["content"][0]["text"])
        
        # Monitor resources
        resources = await client.call_tool("monitor_resources")
        print("Resources:", resources["content"][0]["text"])
        
        # Get configuration recommendations
        config_rec = await client.call_tool("recommend_configuration", {
            "workload_type": "batch"
        })
        print("Config Recommendations:", config_rec["content"][0]["text"])

asyncio.run(main())

cURL Examples

# Health check
curl http://localhost:3000/health

# List available tools
curl -X GET http://localhost:3000/tools \
  -H "X-API-Key: emr-mcp-default-key"

# Get cluster information
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "get_cluster_info",
    "arguments": {}
  }'

# Monitor resources
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "monitor_resources",
    "arguments": {}
  }'
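
The same `POST /tools/call` request can be made from Python with only the standard library. This is a minimal sketch mirroring the cURL calls above, not the bundled `EMRMCPClient`; the helper names `build_tool_request` and `first_text` are assumptions, and the response shape is taken from the Python client example (`content[0].text`).

```python
import json
import urllib.request


def build_tool_request(base_url, api_key, name, arguments=None):
    """Build (but do not send) a POST /tools/call request for the given tool."""
    body = json.dumps({"name": name, "arguments": arguments or {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/tools/call",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def first_text(response: dict) -> str:
    """Extract the first text block from a tool response."""
    return response["content"][0]["text"]


request = build_tool_request("http://localhost:3000", "emr-mcp-default-key", "monitor_resources")
# To send it against a running server:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(first_text(json.load(resp)))
```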

🧪 Development

Running Tests

# Install development dependencies
pip install -r requirements.txt

# Run all tests
pytest

# Run specific test file
pytest tests/test_cluster.py -v

# Run with coverage
pytest --cov=src tests/ --cov-report=html

# Run demo with mock data
python demo.py

# Test server creation
python test_server.py

Code Quality

# Format code
black src/ tests/ examples/

# Sort imports
isort src/ tests/ examples/

# Type checking
mypy src/

# Linting
flake8 src/ tests/ examples/

🏗️ Architecture

emr-mcp-server/
├── src/
│   ├── server.py              # Main MCP server implementation
│   ├── tools/                 # MCP tool implementations
│   │   ├── cluster.py         # Cluster management tools
│   │   ├── monitoring.py      # Resource monitoring tools
│   │   └── analytics.py       # Analytics and optimization tools
│   ├── connectors/            # Service connectors
│   │   ├── emr.py            # EMR API connector
│   │   ├── yarn.py           # YARN ResourceManager connector
│   │   ├── spark.py          # Spark History Server connector
│   │   └── hdfs.py           # HDFS NameNode connector
│   └── utils/                 # Utilities
│       ├── config.py         # Configuration management
│       └── auth.py           # Authentication utilities
├── config/
│   └── server_config.yaml    # Server configuration
├── tests/                     # Comprehensive test suite
├── examples/                  # Usage examples
├── scripts/                   # Deployment scripts
├── Dockerfile                 # Docker configuration
├── docker-compose.yml        # Docker Compose setup
├── demo.py                    # Demo with mock data
└── test_server.py            # Server creation test

📊 Key Features Demonstrated

✅ Completed Implementation

🏗️ Complete Project Structure
- Organized codebase with clear separation of concerns
- Proper Python package structure with imports
- Configuration management with YAML and environment variables
🔧 MCP Server Implementation
- Full MCP protocol compliance with tool registration
- Async/await architecture for high performance
- Structured logging with configurable formats
- Graceful shutdown with proper cleanup
🔌 Service Connectors
- EMR API integration for cluster management
- YARN ResourceManager connector for application monitoring
- Spark History Server connector for job analysis
- HDFS NameNode connector for storage monitoring
- Connection pooling and retry logic
🛠️ MCP Tools
- Cluster Management: get_cluster_info, list_clusters, estimate_cost, suggest_instance_types
- Monitoring: monitor_resources, analyze_yarn_applications, diagnose_performance
- Analytics: get_spark_logs, recommend_configuration
- All tools return structured markdown with actionable insights
🔒 Security & Authentication
- Multi-method authentication (API keys, JWT, IAM roles)
- Input validation and sanitization
- Secure configuration management
🚀 Deployment Ready
- Docker containerization with multi-stage builds
- EMR bootstrap script for automatic deployment
- Systemd service configuration
- Docker Compose for development
🧪 Testing & Quality
- Comprehensive test suite with mocking
- Demo script with realistic mock data
- Code quality tools (black, isort, mypy, flake8)
- Type hints throughout codebase
📚 Documentation & Examples
- Detailed README with usage examples
- Python client example with async patterns
- cURL examples for API testing
- Configuration examples and deployment guides

🎯 Demo Results

Running demo.py against mock data prints:

🎯 EMR MCP Server Demo
================================================================================
🚀 EMR Cluster Management Demo
📋 Getting Cluster Information...
💰 Cost Estimation...
🖥️  Instance Type Suggestions...

📊 Resource Monitoring Demo
📈 Resource Monitoring...
🔍 YARN Applications Analysis...

🧠 Analytics & Configuration Demo
⚙️  Configuration Recommendations for Batch Workload...
🤖 Configuration Recommendations for ML Workload...

✅ Demo completed successfully!

🔧 Production Ready Features

Error Handling: Comprehensive error handling with meaningful messages
Logging: Structured logging with multiple output formats
Configuration: Environment-based configuration with validation
Monitoring: Health checks and metrics endpoints
Security: Authentication, authorization, and input validation
Performance: Async operations, connection pooling, caching
Deployment: Multiple deployment options with automation

🤝 Contributing

We welcome contributions! Please follow our development workflow:

1. Fork the repository
2. Create a feature branch
3. Make your changes with tests
4. Run the test suite and quality checks
5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

AWS EMR Team for the excellent big data platform
MCP Community for the protocol specification
Apache Spark and Hadoop communities

Made with ❤️ for the EMR community

Ready for production deployment on EMR clusters!
