emr-mcp-server

Provides intelligent guidance for EMR cluster management, configuration recommendations, and monitoring capabilities


EMR MCP Server

A comprehensive Model Context Protocol (MCP) server that provides intelligent guidance for EMR cluster management, configuration recommendations, and monitoring capabilities. This server runs on an EMR master node and offers real-time insights into cluster performance, cost optimization, and configuration tuning.

🚀 Features

🏗️ Cluster Management

  • Real-time cluster information with detailed instance group analysis
  • Multi-cluster support with filtering and search capabilities
  • Cost analysis and estimation with breakdown by instance types
  • Instance type recommendations based on workload patterns
  • Auto-scaling policy suggestions for optimal resource utilization

📊 Resource Monitoring

  • YARN ResourceManager integration for application monitoring
  • HDFS NameNode monitoring for storage health and utilization
  • Real-time resource utilization across all cluster nodes
  • Application performance analysis with bottleneck identification
  • Historical trend analysis for capacity planning

🧠 Analytics & Optimization

  • Spark History Server integration for detailed job analysis
  • Configuration recommendations based on workload patterns
  • Performance diagnostics with actionable insights
  • Cost optimization suggestions including spot instance usage
  • Workload-specific tuning for batch, streaming, and ML workloads

🔒 Security & Authentication

  • Multiple authentication methods: API keys, JWT tokens, IAM roles
  • Role-based access control with granular permissions
  • Secure communication with HTTPS and certificate validation
  • Request rate limiting to prevent abuse
  • Audit logging for compliance and monitoring
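Two of the bullets above, API-key checking and request rate limiting, can be illustrated with the standard library alone. This is a minimal sketch, assuming the key arrives in the X-API-Key header (as in the cURL examples) and is compared against auth.api_keys from the config; is_valid_api_key and TokenBucket are hypothetical names, and the token-bucket algorithm is one common choice for rate limiting, not necessarily the one this server uses.

```python
import hmac
import time
from typing import List, Optional


def is_valid_api_key(presented: Optional[str], configured_keys: List[str]) -> bool:
    """Check an X-API-Key header value against the configured keys."""
    if not presented:
        return False
    # compare_digest does not stop at the first differing byte, and we check
    # every configured key instead of returning on the first match.
    results = [hmac.compare_digest(presented.encode(), key.encode()) for key in configured_keys]
    return any(results)


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if one is available; `now` is injectable for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a server you would typically keep one bucket per API key or client address and reject the request (for example with HTTP 429) when allow() returns False.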

📋 Quick Start

Prerequisites

  • EMR cluster running version 6.0+
  • Python 3.8+
  • Access to YARN ResourceManager (port 8088)
  • Access to Spark History Server (port 18080)
  • Access to HDFS NameNode (port 9870)

Installation

# Clone the repository
git clone https://github.com/your-org/emr-mcp-server.git
cd emr-mcp-server

# Install dependencies
pip install -r requirements.txt

# Configure the server
cp config/server_config.yaml.example config/server_config.yaml
# Edit the configuration file with your EMR cluster details

Configuration

Edit config/server_config.yaml:

server:
  host: "0.0.0.0"
  port: 3000
  debug: false
  workers: 4

emr:
  region: "us-east-1"
  cluster_id: "j-XXXXXXXXX"  # Optional: specific cluster ID
  
yarn:
  resource_manager_url: "http://localhost:8088"
  timeout: 30
  
spark:
  history_server_url: "http://localhost:18080"
  timeout: 30
  
hdfs:
  namenode_url: "http://localhost:9870"
  timeout: 30

auth:
  method: "api_key"  # Options: api_key, jwt, iam
  api_keys:
    - "emr-mcp-default-key"
  jwt_secret: "your-jwt-secret"
  
logging:
  level: "INFO"
  format: "console"  # Options: console, json

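The file is plain YAML, so a loader can read it with PyYAML's yaml.safe_load and then sanity-check the result before connecting to anything. The sketch below covers only the checking step, on an already-loaded dict, so it needs nothing beyond the standard library. The rules are assumptions drawn from the options listed in the comments above, and validate_config is a hypothetical name, not the server's own validation.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

AUTH_METHODS = {"api_key", "jwt", "iam"}
LOG_FORMATS = {"console", "json"}
# (section, key) pairs that must hold an http(s) URL.
URL_FIELDS = [("yarn", "resource_manager_url"),
              ("spark", "history_server_url"),
              ("hdfs", "namenode_url")]


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    errors: List[str] = []

    port = (cfg.get("server") or {}).get("port")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        errors.append(f"server.port must be an integer in 1-65535, got {port!r}")

    for section, key in URL_FIELDS:
        url = (cfg.get(section) or {}).get(key) or ""
        if urlparse(str(url)).scheme not in ("http", "https"):
            errors.append(f"{section}.{key} must be an http(s) URL, got {url!r}")

    auth = cfg.get("auth") or {}
    method = auth.get("method")
    if method not in AUTH_METHODS:
        errors.append(f"auth.method must be one of {sorted(AUTH_METHODS)}, got {method!r}")
    elif method == "api_key" and not auth.get("api_keys"):
        errors.append("auth.api_keys needs at least one key when auth.method is api_key")
    elif method == "jwt" and not auth.get("jwt_secret"):
        errors.append("auth.jwt_secret is required when auth.method is jwt")

    fmt = (cfg.get("logging") or {}).get("format", "console")
    if fmt not in LOG_FORMATS:
        errors.append(f"logging.format must be one of {sorted(LOG_FORMATS)}, got {fmt!r}")
    return errors
```

A loader would call this right after yaml.safe_load and refuse to start if the returned list is non-empty.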
Running the Server

# Start the server directly
python -m src.server

# Or use the startup script
./scripts/start_server.sh

# Check server status
curl http://localhost:3000/health

🛠️ MCP Tools

Cluster Management Tools

get_cluster_info

Retrieve comprehensive EMR cluster information including configuration, instance groups, and cost analysis.

{
  "name": "get_cluster_info",
  "arguments": {
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}

list_clusters

List all EMR clusters with optional state filtering.

{
  "name": "list_clusters",
  "arguments": {
    "states": ["RUNNING", "WAITING"]  // Optional
  }
}

estimate_cost

Calculate current and projected costs with a detailed breakdown by instance type.

{
  "name": "estimate_cost",
  "arguments": {
    "runtime_hours": 48.0,  // Optional
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}
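EMR bills each instance at its EC2 price plus a per-instance EMR charge, so a simple on-demand projection is count × (EC2 price + EMR price) × hours, summed over the instance groups. The sketch below does only that arithmetic. Prices are passed in by the caller (nothing is hard-coded, since rates vary by region and over time), and it ignores Spot discounts, EBS volumes and data transfer; whether estimate_cost computes its figures this way is an assumption.

```python
from typing import Dict, List, Tuple


def estimate_cost(instance_groups: List[Tuple[str, int]],
                  hourly_prices: Dict[str, Tuple[float, float]],
                  runtime_hours: float) -> Dict[str, float]:
    """Per-instance-type and total cost (USD) for `runtime_hours` of runtime.

    instance_groups: (instance_type, instance_count) pairs.
    hourly_prices:   instance_type -> (EC2 on-demand price, EMR price) per hour.
    """
    breakdown: Dict[str, float] = {}
    for instance_type, count in instance_groups:
        ec2_price, emr_price = hourly_prices[instance_type]
        cost = count * (ec2_price + emr_price) * runtime_hours
        # Groups that share an instance type are summed together.
        breakdown[instance_type] = breakdown.get(instance_type, 0.0) + cost
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For example, with made-up prices of (0.20, 0.05) for m5.xlarge and (0.40, 0.10) for m5.2xlarge, one master plus three core nodes over 10 hours gives 2.5 + 15.0 = 17.5.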

suggest_instance_types

Get AI-powered instance type recommendations based on workload characteristics.

{
  "name": "suggest_instance_types",
  "arguments": {
    "workload_type": "memory_intensive",  // Options: general, compute_intensive, memory_intensive, storage_intensive
    "data_size_gb": 1000,  // Optional
    "concurrent_jobs": 10  // Optional
  }
}
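A recommendation like this can be reduced to two choices: an instance family from the workload type, and a size from the data volume and concurrency. In the sketch below the families follow AWS's published categories (m5 general purpose, c5 compute optimized, r5 memory optimized, i3 storage optimized with local NVMe SSD), but the size thresholds and the function name are invented for illustration and are not the tool's actual logic.

```python
from typing import Dict

# Illustrative family per workload type.
FAMILY_BY_WORKLOAD: Dict[str, str] = {
    "general": "m5",            # general purpose
    "compute_intensive": "c5",  # compute optimized
    "memory_intensive": "r5",   # memory optimized
    "storage_intensive": "i3",  # storage optimized, local NVMe SSD
}


def suggest_instance_type(workload_type: str, data_size_gb: float = 0.0,
                          concurrent_jobs: int = 1) -> str:
    """Pick a family from the workload type and a size from hypothetical thresholds."""
    try:
        family = FAMILY_BY_WORKLOAD[workload_type]
    except KeyError:
        raise ValueError(f"unknown workload_type: {workload_type!r}") from None
    # Hypothetical thresholds; tune them against real job metrics.
    if data_size_gb > 5000 or concurrent_jobs > 20:
        size = "4xlarge"
    elif data_size_gb > 500 or concurrent_jobs > 5:
        size = "2xlarge"
    else:
        size = "xlarge"
    return f"{family}.{size}"
```

With the arguments from the example request (memory_intensive, 1000 GB, 10 jobs) this sketch returns r5.2xlarge.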

Monitoring Tools

monitor_resources

Get real-time resource utilization across YARN, HDFS, and cluster nodes.

{
  "name": "monitor_resources",
  "arguments": {}
}
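For the YARN part of such a report, the ResourceManager REST endpoint /ws/v1/cluster/metrics (for example http://localhost:8088/ws/v1/cluster/metrics) returns a clusterMetrics object with fields such as allocatedMB, totalMB, allocatedVirtualCores, totalVirtualCores, appsRunning, appsPending, unhealthyNodes and lostNodes. The sketch below turns an already-fetched payload into percentages; whether monitor_resources uses this endpoint, and the summary keys chosen here, are assumptions.

```python
from typing import Any, Dict


def summarize_yarn_metrics(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a ResourceManager /ws/v1/cluster/metrics response."""
    m = payload["clusterMetrics"]

    def pct(used: float, total: float) -> float:
        # An empty or still-starting cluster can report zero capacity.
        return round(100.0 * used / total, 1) if total else 0.0

    return {
        "memory_used_pct": pct(m["allocatedMB"], m["totalMB"]),
        "vcores_used_pct": pct(m["allocatedVirtualCores"], m["totalVirtualCores"]),
        "apps_running": m["appsRunning"],
        "apps_pending": m["appsPending"],
        "problem_nodes": m["unhealthyNodes"] + m["lostNodes"],
    }
```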

analyze_yarn_applications

Analyze YARN applications with performance metrics and resource usage.

{
  "name": "analyze_yarn_applications",
  "arguments": {
    "states": ["RUNNING", "FINISHED"],  // Optional
    "application_types": ["SPARK"],  // Optional
    "limit": 50  // Optional, default: 50
  }
}
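The tool arguments map naturally onto the ResourceManager's /ws/v1/cluster/apps endpoint, which accepts states and applicationTypes as comma-separated lists plus a limit. This sketch only builds the query URL; that analyze_yarn_applications queries YARN this way is an assumption, and yarn_apps_url is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Sequence
from urllib.parse import urlencode


def yarn_apps_url(base_url: str,
                  states: Optional[Sequence[str]] = None,
                  application_types: Optional[Sequence[str]] = None,
                  limit: int = 50) -> str:
    """Build a ResourceManager /ws/v1/cluster/apps query from the tool arguments."""
    params: Dict[str, Any] = {"limit": limit}
    if states:
        # YARN takes a comma-separated list of state names; normalize to upper case.
        params["states"] = ",".join(s.upper() for s in states)
    if application_types:
        params["applicationTypes"] = ",".join(application_types)
    # urlencode percent-encodes the commas as %2C, which the server decodes.
    return f"{base_url.rstrip('/')}/ws/v1/cluster/apps?{urlencode(params)}"
```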

diagnose_performance

Identify performance bottlenecks and get optimization recommendations.

{
  "name": "diagnose_performance",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Optional
    "time_range_hours": 24  // Optional, default: 24
  }
}
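One common bottleneck a diagnosis can look for is data skew: a stage in which a few tasks run far longer than the rest. The Spark History Server REST API exposes per-task metrics for each stage, and given those durations a simple max-to-median test flags suspicious stages. The threshold, the minimum task count and the function name below are hypothetical choices for illustration, not the tool's documented logic.

```python
import statistics
from typing import Dict, List, Sequence


def find_skewed_stages(stage_task_durations: Dict[int, Sequence[float]],
                       ratio_threshold: float = 3.0,
                       min_tasks: int = 4) -> List[int]:
    """Flag stages whose slowest task takes `ratio_threshold` times the median task."""
    skewed: List[int] = []
    for stage_id, durations in sorted(stage_task_durations.items()):
        if len(durations) < min_tasks:
            continue  # too few tasks for the median to mean much
        median = statistics.median(durations)
        if median > 0 and max(durations) / median >= ratio_threshold:
            skewed.append(stage_id)
    return skewed
```

Stages flagged this way are candidates for repartitioning, salting hot keys, or enabling Adaptive Query Execution's skew-join handling.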

Analytics Tools

get_spark_logs

Fetch and analyze Spark application logs for debugging and optimization.

{
  "name": "get_spark_logs",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Required
    "executor_id": "1"  // Optional
  }
}
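The Spark History Server REST API lists executor summaries at /api/v1/applications/<app-id>/allexecutors (in YARN cluster mode the app-id segment also carries the attempt ID), and each summary has an executorLogs map of log URLs. This sketch validates the tool's IDs before they are placed in any URL, in line with the input-validation point under Security, and then picks one executor's links from an already-fetched response. The function names are hypothetical and fetching is left out.

```python
import re
from typing import Any, Dict, List, Optional

APP_ID_RE = re.compile(r"application_\d+_\d+")
EXECUTOR_ID_RE = re.compile(r"driver|\d+")


def validate_log_request(app_id: str, executor_id: Optional[str] = None) -> None:
    """Reject malformed IDs before they are interpolated into a URL or path."""
    if not APP_ID_RE.fullmatch(app_id):
        raise ValueError(f"not a YARN application id: {app_id!r}")
    if executor_id is not None and not EXECUTOR_ID_RE.fullmatch(executor_id):
        raise ValueError(f"not a Spark executor id: {executor_id!r}")


def executor_log_links(executors: List[Dict[str, Any]], executor_id: str) -> Dict[str, str]:
    """Return one executor's log URLs (e.g. stdout/stderr) from an allexecutors response."""
    for summary in executors:
        if summary.get("id") == executor_id:
            return dict(summary.get("executorLogs", {}))
    raise KeyError(f"executor {executor_id!r} not found")
```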

recommend_configuration

Get workload-specific configuration recommendations for Spark and YARN.

{
  "name": "recommend_configuration",
  "arguments": {
    "workload_type": "batch",  // Options: batch, streaming, ml, interactive
    "app_id": "application_1234567890_0001"  // Optional
  }
}
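For batch workloads, a widely cited rule of thumb for executor sizing is: about 5 cores per executor, one core per node left for the OS and Hadoop daemons, YARN memory split evenly among the executors on a node, room left for Spark's memory overhead (by default the larger of 384 MiB and 10% of spark.executor.memory), and one executor slot left for the driver or ApplicationMaster. The sketch below applies that heuristic; it is an illustration, not necessarily what recommend_configuration does. Its inputs should be YARN's allocatable capacity per node (yarn.nodemanager.resource.memory-mb and cpu-vcores), which on EMR is lower than the instance's physical memory.

```python
from typing import Dict


def size_executors(yarn_vcores_per_node: int, yarn_memory_mb_per_node: int,
                   worker_nodes: int, cores_per_executor: int = 5,
                   reserved_cores_per_node: int = 1) -> Dict[str, str]:
    """Rule-of-thumb Spark executor sizing for batch jobs (illustrative only)."""
    usable_cores = max(1, yarn_vcores_per_node - reserved_cores_per_node)
    executor_cores = min(cores_per_executor, usable_cores)
    executors_per_node = usable_cores // executor_cores
    container_mb = yarn_memory_mb_per_node // executors_per_node
    # Spark asks YARN for heap + overhead, and the overhead defaults to
    # max(384 MiB, 10% of the heap); choose the largest heap that still fits.
    heap_mb = min(int(container_mb / 1.1), container_mb - 384)
    overhead_mb = max(384, int(heap_mb * 0.10))
    # Leave one slot for the driver / ApplicationMaster.
    instances = max(1, executors_per_node * worker_nodes - 1)
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
        "spark.executor.instances": str(instances),
    }
```

For three hypothetical worker nodes with 16 vcores and 57344 MB of YARN memory each, this yields 8 executors of 5 cores with 17376m heap plus 1737m overhead.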

🚀 Deployment Options

1. EMR Bootstrap Script (Recommended)

Deploy automatically when creating an EMR cluster:

# Upload bootstrap script to S3
aws s3 cp scripts/bootstrap-emr-mcp.sh s3://your-bucket/

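# Create EMR cluster with MCP server
# (--use-default-roles expects the default EMR roles; create them once with: aws emr create-default-roles)
aws emr create-cluster \
  --name "EMR-MCP-Cluster" \
  --release-label emr-6.4.0 \
  --applications Name=Spark Name=Hadoop Name=Hive Name=Zeppelin \
  --use-default-roles \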
  --instance-groups \
    InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 \
    InstanceGroupType=CORE,InstanceType=m5.2xlarge,InstanceCount=3 \
    InstanceGroupType=TASK,InstanceType=m5.large,InstanceCount=2,BidPrice=0.05 \
  --bootstrap-actions Path=s3://your-bucket/bootstrap-emr-mcp.sh \
  --ec2-attributes KeyName=your-key-pair \
  --log-uri s3://your-bucket/emr-logs/
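Bootstrap actions run on every node in the cluster, while this server is meant to run on the master node. A widely used pattern is to check the isMaster field in /mnt/var/lib/info/instance.json, which EMR writes on each node. The sketch below shows that check in Python; the path is an assumption about your EMR release, and whether bootstrap-emr-mcp.sh performs this check is not stated here.

```python
import json
from pathlib import Path

# Instance metadata file EMR writes on each node (assumed location).
INSTANCE_INFO = Path("/mnt/var/lib/info/instance.json")


def is_master_node(info_path: Path = INSTANCE_INFO) -> bool:
    """Return True when EMR's instance metadata marks this node as the master."""
    try:
        info = json.loads(info_path.read_text())
    except (OSError, ValueError):
        return False  # not on EMR, or the file is missing or unreadable
    return isinstance(info, dict) and bool(info.get("isMaster", False))
```

A bootstrap step can then install and start the server only when is_master_node() is true and exit cleanly on core and task nodes.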

2. Docker Deployment

# Build the image
docker build -t emr-mcp-server .

# Run with docker-compose
docker-compose up -d

# Check logs
docker-compose logs -f emr-mcp-server

3. Systemd Service

# Copy service file and reload unit definitions
sudo cp scripts/emr-mcp-server.service /etc/systemd/system/
sudo systemctl daemon-reload

# Enable and start
sudo systemctl enable emr-mcp-server
sudo systemctl start emr-mcp-server
sudo systemctl status emr-mcp-server

💻 Usage Examples

Python Client

import asyncio
from examples.client_example import EMRMCPClient

async def main():
    async with EMRMCPClient("http://localhost:3000", "emr-mcp-default-key") as client:
        # Get cluster information
        cluster_info = await client.call_tool("get_cluster_info")
        print("Cluster Info:", cluster_info["content"][0]["text"])
        
        # Monitor resources
        resources = await client.call_tool("monitor_resources")
        print("Resources:", resources["content"][0]["text"])
        
        # Get configuration recommendations
        config_rec = await client.call_tool("recommend_configuration", {
            "workload_type": "batch"
        })
        print("Config Recommendations:", config_rec["content"][0]["text"])

asyncio.run(main())

cURL Examples

# Health check
curl http://localhost:3000/health

# List available tools
curl -X GET http://localhost:3000/tools \
  -H "X-API-Key: emr-mcp-default-key"

# Get cluster information
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "get_cluster_info",
    "arguments": {}
  }'

# Monitor resources
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "monitor_resources",
    "arguments": {}
  }'
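The same POST /tools/call request can be made without third-party packages. This sketch uses only urllib; the endpoint path, the X-API-Key header and the {"name", "arguments"} body are copied from the cURL examples above, while build_tool_call and call_tool are hypothetical names, and call_tool assumes the server replies with JSON.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_tool_call(base_url: str, api_key: str, name: str,
                    arguments: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build the POST /tools/call request shown in the cURL examples."""
    body = json.dumps({"name": name, "arguments": arguments or {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tools/call",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def call_tool(base_url: str, api_key: str, name: str,
              arguments: Optional[Dict[str, Any]] = None,
              timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON reply (requires a running server)."""
    request = build_tool_call(base_url, api_key, name, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_tool("http://localhost:3000", "emr-mcp-default-key", "monitor_resources") mirrors the last cURL command.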

🧪 Development

Running Tests

# Install development dependencies
pip install -r requirements.txt

# Run all tests
pytest

# Run specific test file
pytest tests/test_cluster.py -v

# Run with coverage
pytest --cov=src tests/ --cov-report=html

# Run demo with mock data
python demo.py

# Test server creation
python test_server.py

Code Quality

# Format code
black src/ tests/ examples/

# Sort imports
isort src/ tests/ examples/

# Type checking
mypy src/

# Linting
flake8 src/ tests/ examples/

🏗️ Architecture

emr-mcp-server/
├── src/
│   ├── server.py              # Main MCP server implementation
│   ├── tools/                 # MCP tool implementations
│   │   ├── cluster.py         # Cluster management tools
│   │   ├── monitoring.py      # Resource monitoring tools
│   │   └── analytics.py       # Analytics and optimization tools
│   ├── connectors/            # Service connectors
│   │   ├── emr.py             # EMR API connector
│   │   ├── yarn.py            # YARN ResourceManager connector
│   │   ├── spark.py           # Spark History Server connector
│   │   └── hdfs.py            # HDFS NameNode connector
│   └── utils/                 # Utilities
│       ├── config.py          # Configuration management
│       └── auth.py            # Authentication utilities
├── config/
│   └── server_config.yaml     # Server configuration
├── tests/                     # Comprehensive test suite
├── examples/                  # Usage examples
├── scripts/                   # Deployment scripts
├── Dockerfile                 # Docker configuration
├── docker-compose.yml         # Docker Compose setup
├── demo.py                    # Demo with mock data
└── test_server.py             # Server creation test
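The layout keeps tool logic (src/tools/) separate from service access (src/connectors/). One way server.py could tie tool names to handlers is a small registry plus a dispatcher, sketched below under that assumption; the tool decorator, TOOLS dict and dispatch function are hypothetical. The returned shape, a content list of text items with an optional isError flag, follows the MCP tool-result format, which is also what the Python client example reads via result["content"][0]["text"].

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Awaitable[str]]
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register an async handler under an MCP tool name."""
    def decorator(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("monitor_resources")
async def monitor_resources(arguments: Dict[str, Any]) -> str:
    # A real handler would query the YARN/HDFS connectors; this stub returns fixed markdown.
    return "## Resource Utilization\n(no live data in this sketch)"


async def dispatch(request: Dict[str, Any]) -> Dict[str, Any]:
    """Route a {"name", "arguments"} request to its handler and wrap the text result."""
    name = request.get("name", "")
    handler = TOOLS.get(name)
    if handler is None:
        return {"isError": True,
                "content": [{"type": "text", "text": f"Unknown tool: {name!r}"}]}
    text = await handler(request.get("arguments") or {})
    return {"content": [{"type": "text", "text": text}]}
```

Adding a tool then only requires decorating a new async function in the relevant src/tools/ module.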

📊 Key Features Demonstrated

✅ Completed Implementation

  1. 🏗️ Complete Project Structure

    • Organized codebase with clear separation of concerns
    • Proper Python package structure with imports
    • Configuration management with YAML and environment variables
  2. 🔧 MCP Server Implementation

    • Full MCP protocol compliance with tool registration
    • Async/await architecture for high performance
    • Structured logging with configurable formats
    • Graceful shutdown with proper cleanup
  3. 🔌 Service Connectors

    • EMR API integration for cluster management
    • YARN ResourceManager connector for application monitoring
    • Spark History Server connector for job analysis
    • HDFS NameNode connector for storage monitoring
    • Connection pooling and retry logic
  4. 🛠️ MCP Tools

    • Cluster Management: get_cluster_info, list_clusters, estimate_cost, suggest_instance_types
    • Monitoring: monitor_resources, analyze_yarn_applications, diagnose_performance
    • Analytics: get_spark_logs, recommend_configuration
    • All tools return structured markdown with actionable insights
  5. 🔒 Security & Authentication

    • Multi-method authentication (API keys, JWT, IAM roles)
    • Input validation and sanitization
    • Secure configuration management
  6. 🚀 Deployment Ready

    • Docker containerization with multi-stage builds
    • EMR bootstrap script for automatic deployment
    • Systemd service configuration
    • Docker Compose for development
  7. 🧪 Testing & Quality

    • Comprehensive test suite with mocking
    • Demo script with realistic mock data
    • Code quality tools (black, isort, mypy, flake8)
    • Type hints throughout codebase
  8. 📚 Documentation & Examples

    • Detailed README with usage examples
    • Python client example with async patterns
    • cURL examples for API testing
    • Configuration examples and deployment guides
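The "Connection pooling and retry logic" item under Service Connectors can be illustrated for its retry half with a small async helper: retry only errors that look transient, back off exponentially, and add random jitter so many clients do not retry in lockstep. This is a sketch under those assumptions; with_retries and the set of transient errors are hypothetical and would need to match the HTTP client the connectors actually use.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Errors treated as transient; adjust to the HTTP client actually in use.
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (ConnectionError, asyncio.TimeoutError)


async def with_retries(operation: Callable[[], Awaitable[T]], attempts: int = 3,
                       base_delay: float = 0.5,
                       retry_on: Tuple[Type[BaseException], ...] = TRANSIENT_ERRORS) -> T:
    """Await `operation()`, retrying transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # "Full jitter": sleep a random time up to base_delay * 2**attempt.
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The helper takes a zero-argument callable rather than a coroutine object because a coroutine can only be awaited once; each retry needs a fresh one.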

🎯 Demo Results

The demo successfully shows:

🎯 EMR MCP Server Demo
================================================================================
🚀 EMR Cluster Management Demo
📋 Getting Cluster Information...
💰 Cost Estimation...
🖥️  Instance Type Suggestions...

📊 Resource Monitoring Demo
📈 Resource Monitoring...
🔍 YARN Applications Analysis...

🧠 Analytics & Configuration Demo
⚙️  Configuration Recommendations for Batch Workload...
🤖 Configuration Recommendations for ML Workload...

✅ Demo completed successfully!

🔧 Production Ready Features

  • Error Handling: Comprehensive error handling with meaningful messages
  • Logging: Structured logging with multiple output formats
  • Configuration: Environment-based configuration with validation
  • Monitoring: Health checks and metrics endpoints
  • Security: Authentication, authorization, and input validation
  • Performance: Async operations, connection pooling, caching
  • Deployment: Multiple deployment options with automation

🤝 Contributing

We welcome contributions! Please see our development workflow:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the test suite and quality checks
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • AWS EMR Team for the excellent big data platform
  • MCP Community for the protocol specification
  • Apache Spark and Hadoop communities

Made with ❤️ for the EMR community

Ready for production deployment on EMR clusters!
