Spark EventLog MCP Server

Enables comprehensive analysis of Apache Spark event logs from S3, HTTP, or local sources, providing performance metrics, resource monitoring, shuffle analysis, and automated optimization recommendations with interactive HTML reports.

Chinese Version | English

A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.

Features

  • 🌐 FastMCP & FastAPI Integration: MCP protocol support and analysis report APIs powered by FastAPI & FastMCP
  • 📊 Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis
  • 📈 Visual Reports: Auto-generated interactive HTML reports with direct browser access
  • ☁️ Multiple Data Sources: Support for S3, HTTP URLs, and local files
  • 💡 Intelligent Optimization: Automated optimization recommendations based on analysis results

Quick Start

MCP Client Integration

uvx Mode (Recommended - Direct from GitHub)

{
  "mcpServers": {
    "spark-eventlog": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yhyyz/spark-eventlog-mcp",
        "spark-eventlog-mcp"
      ],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

stdio Mode (Local Development)

{
  "mcpServers": {
    "spark-eventlog": {
      "command": "uv",
      "args": ["run", "python", "/path/to/spark-eventlog-mcp/start.py"],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

HTTP Mode

1. Start HTTP Server:

export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799

uv run python start.py

2. Configure Remote MCP:

{
  "mcpServers": {
    "spark-eventlog": {
      "url": "http://localhost:7799/mcp",
      "type": "http"
    }
  }
}

3. Access Services:

  • API Documentation: http://localhost:7799/docs
  • Health Check: http://localhost:7799/health
  • Reports List: http://localhost:7799/api/reports
  • MCP Endpoint: http://localhost:7799/mcp

Analysis Examples

  • emr-serverless-small-job
  • emr-eks-big-job
  • emr-eks-big-job-sub-01
  • emr-eks-big-job-sub-02

Project Structure

spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│   ├── server.py              # FastAPI + MCP integrated server
│   ├── core/
│   │   └── mature_data_loader.py    # Data loader (S3/URL/Local)
│   ├── tools/
│   │   ├── mature_analyzer.py       # Event log analyzer
│   │   └── mature_report_generator.py  # HTML report generator
│   ├── models/
│   │   ├── schemas.py        # Pydantic data models
│   │   └── mature_models.py  # Analysis result models
│   └── utils/
│       └── helpers.py         # Utility functions and logging config
├── report_data/               # Generated reports storage
├── start.py                   # Launch script
├── README.md                 # This file (English)
└── README_zh.md              # Chinese version

MCP Tools

Tool Name                      Description
parse_eventlog                 Parse event logs (S3/URL/Local)
analyze_performance            Execute performance analysis
generate_report                Generate visual reports
get_optimization_suggestions   Get optimization recommendations
get_analysis_status            Query current analysis status
clear_session                  Clear session cache
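
As a rough illustration of how a client might address these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` requests that the MCP protocol uses. The tool names come from the list above; the argument names (`source_type`, `path`), the empty argument sets, and the parse, analyze, report ordering are assumptions, not confirmed signatures. A real MCP client also performs the `initialize` handshake first, which is omitted here.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical workflow: tool names are from the README, arguments are assumed.
WORKFLOW: list[tuple[str, dict[str, Any]]] = [
    ("parse_eventlog", {"source_type": "s3", "path": "s3://bucket-name/path/to/eventlogs/"}),
    ("analyze_performance", {}),
    ("generate_report", {}),
]


def build_workflow() -> list[dict[str, Any]]:
    """Number the requests sequentially so responses can be matched by id."""
    return [build_tool_call(name, args, i) for i, (name, args) in enumerate(WORKFLOW, start=1)]


if __name__ == "__main__":
    for request in build_workflow():
        print(json.dumps(request))
```

In practice an MCP client library would send these messages for you; the sketch only shows what travels over the wire.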

RESTful API Endpoints

Basic Endpoints

  • GET / - Service information
  • GET /health - Health check
  • GET /docs - API documentation (Swagger UI)

Report Management

  • GET /api/reports - List all reports
  • GET /api/reports/{filename} - View HTML report
  • GET /reports/{filename} - Direct access to report files
  • DELETE /api/reports/{filename} - Delete report

MCP Tool Calls

  • POST /mcp - MCP protocol endpoint
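
The report endpoints above can be exercised from Python with the standard library alone. This is a minimal sketch, assuming the server runs at `http://localhost:7799` as in the HTTP Mode example; the helper names are hypothetical and the response format is not specified here, so the body is returned as raw text.

```python
import urllib.request
from urllib.parse import quote

# Assumed base URL, taken from the HTTP Mode example.
BASE_URL = "http://localhost:7799"


def list_reports_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Request object for GET /api/reports."""
    return urllib.request.Request(f"{base_url}/api/reports", method="GET")


def report_url(filename: str, base_url: str = BASE_URL) -> str:
    """URL for a single report; the filename is percent-encoded."""
    return f"{base_url}/api/reports/{quote(filename, safe='')}"


def delete_report_request(filename: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Request object for DELETE /api/reports/{filename}."""
    return urllib.request.Request(report_url(filename, base_url), method="DELETE")


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(list_reports_request())` should return the report listing.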

Configuration

Environment Variables

# Server Configuration
MCP_TRANSPORT=streamable-http  # stdio or streamable-http
MCP_HOST=0.0.0.0           # HTTP mode listen address
MCP_PORT=7799              # HTTP mode port
LOG_LEVEL=INFO             # Log level

# AWS S3 Configuration (Optional)
# Not needed if AWS CLI is configured or running on EC2 with appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1

# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300

# Default Data Source
DEFAULT_SOURCE_TYPE=s3  # s3, url, or local
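
For illustration, a configuration loader for these variables could look like the sketch below. It is not the server's actual loading code: the `ServerConfig` class and `load_config` function are hypothetical, and the fallback defaults are assumptions that reuse the sample values shown above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

VALID_SOURCE_TYPES = {"s3", "url", "local"}


@dataclass(frozen=True)
class ServerConfig:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int
    default_source_type: str


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read settings from an environment mapping, falling back to assumed defaults."""
    source_type = env.get("DEFAULT_SOURCE_TYPE", "s3")
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"DEFAULT_SOURCE_TYPE must be one of {sorted(VALID_SOURCE_TYPES)}")
    return ServerConfig(
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Treat common truthy spellings as enabled; anything else disables the cache.
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in {"1", "true", "yes"},
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        default_source_type=source_type,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to try without touching the real environment.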

Log Format

Logs contain detailed debugging information:

2025-12-05 10:30:45 - INFO     - [server.py:243:generate_report] - spark-eventlog-mcp - Generating html report

Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
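
A layout like this can be reproduced with Python's standard `logging` module. The format string below is a reconstruction from the sample line, not necessarily the one used in `helpers.py`; in particular the 8-character level padding is inferred from the spacing after `INFO`.

```python
import logging

# Reconstructed from the sample log line; the level field is padded to 8 characters.
LOG_FORMAT = (
    "%(asctime)s - %(levelname)-8s - "
    "[%(filename)s:%(lineno)d:%(funcName)s] - %(name)s - %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_formatter() -> logging.Formatter:
    """Formatter that mimics the documented log layout."""
    return logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT)


def format_example() -> str:
    """Format a hand-built record that mirrors the sample line."""
    record = logging.LogRecord(
        name="spark-eventlog-mcp",
        level=logging.INFO,
        pathname="/app/server.py",  # hypothetical path; only the basename is printed
        lineno=243,
        msg="Generating html report",
        args=(),
        exc_info=None,
        func="generate_report",
    )
    return make_formatter().format(record)


if __name__ == "__main__":
    print(format_example())
```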

Data Source Support

S3

{
    "source_type": "s3",
    "path": "s3://bucket-name/path/to/eventlogs/"
}

HTTP URL

{
    "source_type": "url",
    "path": "https://example.com/eventlog.zip"
}

Local File

{
    "source_type": "local",
    "path": "/path/to/local/eventlog.zip"
}
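
The three payload shapes above differ only in `source_type`, which can usually be derived from the path. The helper below is a hypothetical convenience, not part of the server: it picks the type from the URL scheme and treats anything without an `s3`, `http`, or `https` scheme as a local path.

```python
from urllib.parse import urlparse


def build_source(path: str) -> dict[str, str]:
    """Return a data source payload, inferring source_type from the path's scheme."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        source_type = "s3"
    elif scheme in ("http", "https"):
        source_type = "url"
    else:
        # No recognized scheme: assume a path on the local filesystem.
        source_type = "local"
    return {"source_type": source_type, "path": path}


if __name__ == "__main__":
    for example in (
        "s3://bucket-name/path/to/eventlogs/",
        "https://example.com/eventlog.zip",
        "/path/to/local/eventlog.zip",
    ):
        print(build_source(example))
```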

Report Features

Generated HTML reports include:

  • 📊 Application Overview (task counts, success rate, duration)
  • 💻 Executor Resource Usage Distribution
  • 🔄 Shuffle Performance Analysis
  • ⚖️ Data Skew Detection
  • 💡 Intelligent Optimization Recommendations
  • 📈 Interactive Visualizations

Troubleshooting

Port Already in Use

# Change port
MCP_PORT=9090 uv run python start.py

Missing Dependencies

# Reinstall dependencies
uv pip install -e .

AWS Credentials Issues

# Check AWS configuration
aws configure list

# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

Debug Logging

# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py

Tech Stack

  • FastMCP 2.0: MCP protocol support
  • FastAPI: RESTful API framework
  • Pydantic: Data validation and serialization
  • Plotly: Interactive charts
  • boto3: AWS S3 integration
  • aiofiles: Async file operations

Development

# Clone repository
git clone <repository-url>
cd spark-eventlog-mcp

# Install development dependencies
uv pip install -e .

# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py

# MCP Inspector - HTTP mode (start the server, then run the Inspector from a second terminal)
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799/mcp --transport http --method tools/list

Support

  • Documentation: Check /docs API documentation
  • Issues: Submit GitHub Issues
  • Reference: FastMCP Documentation
