Spark EventLog MCP Server
Enables comprehensive analysis of Apache Spark event logs from S3, HTTP, or local sources, providing performance metrics, resource monitoring, shuffle analysis, and automated optimization recommendations with interactive HTML reports.
A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.
Features
- FastMCP & FastAPI Integration: MCP protocol support and analysis report APIs powered by FastAPI & FastMCP
- Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis
- Visual Reports: Auto-generated interactive HTML reports with direct browser access
- Multiple Data Sources: Support for S3, HTTP URLs, and local files
- Intelligent Optimization: Automated optimization recommendations based on analysis results
Quick Start
MCP Client Integration
uvx Mode (Recommended - Direct from GitHub)
{
"mcpServers": {
"spark-eventlog": {
"type": "stdio",
"command": "uvx",
"args": [
"--from",
"git+https://github.com/yhyyz/spark-eventlog-mcp",
"spark-eventlog-mcp"
],
"env": {
"MCP_TRANSPORT": "stdio"
}
}
}
}
stdio Mode (Local Development)
{
"mcpServers": {
"spark-eventlog": {
"command": "uv",
"args": ["run", "python", "/path/to/spark-eventlog-mcp/start.py"],
"env": {
"MCP_TRANSPORT": "stdio"
}
}
}
}
HTTP Mode
1. Start HTTP Server:
export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799
uv run python start.py
2. Configure Remote MCP:
{
"mcpServers": {
"spark-eventlog": {
"url": "http://localhost:7799/mcp",
"type": "http"
}
}
}
3. Access Services:
- API Documentation: http://localhost:7799/docs
- Health Check: http://localhost:7799/health
- Reports List: http://localhost:7799/api/reports
- MCP Endpoint: http://localhost:7799/mcp
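A quick way to confirm that the HTTP server is up is to poll the health endpoint. The sketch below uses only the Python standard library. The helper names `service_urls` and `is_healthy` are hypothetical, and the only assumption made about the server is that `/health` answers with a 2xx status while it is running; the response body is not inspected because its shape is not documented here.

```python
import urllib.error
import urllib.request


def service_urls(host: str = "localhost", port: int = 7799) -> dict:
    """Build the service URLs listed above for a given host and port."""
    base = f"http://{host}:{port}"
    return {
        "docs": f"{base}/docs",
        "health": f"{base}/health",
        "reports": f"{base}/api/reports",
        "mcp": f"{base}/mcp",
    }


def is_healthy(host: str = "localhost", port: int = 7799, timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with a 2xx status (assumption)."""
    url = service_urls(host, port)["health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP status.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```

Run it after `uv run python start.py`; if you changed `MCP_HOST` or `MCP_PORT`, pass the same values to `is_healthy`.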
Project Structure
spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│   ├── server.py                      # FastAPI + MCP integrated server
│   ├── core/
│   │   └── mature_data_loader.py      # Data loader (S3/URL/Local)
│   ├── tools/
│   │   ├── mature_analyzer.py         # Event log analyzer
│   │   └── mature_report_generator.py # HTML report generator
│   ├── models/
│   │   ├── schemas.py                 # Pydantic data models
│   │   └── mature_models.py           # Analysis result models
│   └── utils/
│       └── helpers.py                 # Utility functions and logging config
├── report_data/                       # Generated reports storage
├── start.py                           # Launch script
├── README.md                          # This file (English)
└── README_zh.md                       # Chinese version
MCP Tools
| Tool Name | Description |
|---|---|
| `parse_eventlog` | Parse event logs (S3/URL/Local) |
| `analyze_performance` | Execute performance analysis |
| `generate_report` | Generate visual reports |
| `get_optimization_suggestions` | Get optimization recommendations |
| `get_analysis_status` | Query current analysis status |
| `clear_session` | Clear session cache |
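On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `parse_eventlog`. The argument names `source_type` and `path` are borrowed from the Data Source Support section and are an assumption about the tool's input schema; the authoritative schema is whatever the server returns from `tools/list`. The helper `tool_call_request` is hypothetical.

```python
import json


def tool_call_request(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build the body of an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed arguments: the real input schema may differ, check `tools/list`.
request = tool_call_request(
    "parse_eventlog",
    {"source_type": "s3", "path": "s3://bucket-name/path/to/eventlogs/"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library takes care of the initialization handshake and session handling; this only illustrates the shape of the message a client sends.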
RESTful API Endpoints
Basic Endpoints
- `GET /` - Service information
- `GET /health` - Health check
- `GET /docs` - API documentation (Swagger UI)
Report Management
- `GET /api/reports` - List all reports
- `GET /api/reports/{filename}` - View HTML report
- `GET /reports/{filename}` - Direct access to report files
- `DELETE /api/reports/{filename}` - Delete report
MCP Tool Calls
- `POST /mcp` - MCP protocol endpoint
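The report management endpoints can be driven from any HTTP client. As a minimal sketch, the helpers below prepare the list and delete requests with the Python standard library. The function names, the base URL (taken from the HTTP mode defaults), and the example file name are assumptions; the response format is not documented here, so `send` returns the raw body text.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7799"  # assumes the HTTP mode defaults


def list_reports_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare GET /api/reports."""
    return urllib.request.Request(f"{base_url}/api/reports", method="GET")


def delete_report_request(filename: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare DELETE /api/reports/{filename}; the file name is URL-quoted."""
    quoted = urllib.parse.quote(filename, safe="")
    return urllib.request.Request(f"{base_url}/api/reports/{quoted}", method="DELETE")


def send(request: urllib.request.Request, timeout: float = 5.0) -> str:
    """Send a prepared request to a running server and return the body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical file name, shown only to illustrate the request that would be sent.
req = delete_report_request("example_report.html")
print(req.get_method(), req.full_url)
```

Call `send(list_reports_request())` once the server is running to see which report files exist before deleting any of them.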
Configuration
Environment Variables
# Server Configuration
MCP_TRANSPORT=streamable-http # stdio or streamable-http
MCP_HOST=0.0.0.0 # HTTP mode listen address
MCP_PORT=7799 # HTTP mode port
LOG_LEVEL=INFO # Log level
# AWS S3 Configuration (Optional)
# Not needed if AWS CLI is configured or running on EC2 with appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1
# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300
# Default Data Source
DEFAULT_SOURCE_TYPE=s3 # s3, url, or local
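To illustrate how these variables fit together, the sketch below reads them into a typed settings object. It is not the server's own loading code: the `ServerSettings` class, the `load_settings` function, and the fallback values used when a variable is unset are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int
    default_source_type: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the variables listed above; fallback values are illustrative only."""
    source_type = env.get("DEFAULT_SOURCE_TYPE", "s3")
    if source_type not in {"s3", "url", "local"}:
        raise ValueError(f"unsupported DEFAULT_SOURCE_TYPE: {source_type!r}")
    return ServerSettings(
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Treat "1", "true", and "yes" (any case) as enabled.
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in {"1", "true", "yes"},
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        default_source_type=source_type,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.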
Log Format
Logs contain detailed debugging information:
2025-12-05 10:30:45 - INFO - [server.py:243:generate_report] - spark-eventlog-mcp - Generating html report
Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
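For reference, a line in this layout can be produced with the standard `logging` module. The format string below is reconstructed from the sample line, so treat it as an approximation of what the server configures rather than a copy of its code; the `generate_report` function here is only a stand-in that emits one message.

```python
import logging

# Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
LOG_FORMAT = (
    "%(asctime)s - %(levelname)s - "
    "[%(filename)s:%(lineno)d:%(funcName)s] - %(name)s - %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)

handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("spark-eventlog-mcp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def generate_report() -> None:
    """Stand-in function so that funcName in the output is meaningful."""
    logger.info("Generating html report")


generate_report()
```

The file name and line number in the output reflect wherever this snippet is saved and run, not `server.py:243`.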
Data Source Support
S3
{
"source_type": "s3",
"path": "s3://bucket-name/path/to/eventlogs/"
}
HTTP URL
{
"source_type": "url",
"path": "https://example.com/eventlog.zip"
}
Local File
{
"source_type": "local",
"path": "/path/to/local/eventlog.zip"
}
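Before handing a source descriptor to the server, it can help to check that `source_type` and `path` agree with each other. The validator below is a hypothetical client-side sketch; the rules it enforces (an `s3://` prefix for S3, `http` or `https` for URLs) are inferred from the three examples above, not taken from the server's code.

```python
from urllib.parse import urlparse

VALID_SOURCE_TYPES = {"s3", "url", "local"}


def validate_source(source: dict) -> dict:
    """Raise ValueError if the descriptor is inconsistent; otherwise return it."""
    source_type = source.get("source_type")
    path = source.get("path", "")
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"source_type must be one of {sorted(VALID_SOURCE_TYPES)}")
    if not path:
        raise ValueError("path must not be empty")
    scheme = urlparse(path).scheme
    if source_type == "s3" and scheme != "s3":
        raise ValueError("s3 sources need an s3:// path")
    if source_type == "url" and scheme not in {"http", "https"}:
        raise ValueError("url sources need an http:// or https:// path")
    if source_type == "local" and scheme in {"s3", "http", "https"}:
        raise ValueError("local sources need a filesystem path")
    return source


examples = [
    {"source_type": "s3", "path": "s3://bucket-name/path/to/eventlogs/"},
    {"source_type": "url", "path": "https://example.com/eventlog.zip"},
    {"source_type": "local", "path": "/path/to/local/eventlog.zip"},
]
for example in examples:
    validate_source(example)
print("all example descriptors are consistent")
```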
Report Features
Generated HTML reports include:
- Application Overview (task counts, success rate, duration)
- Executor Resource Usage Distribution
- Shuffle Performance Analysis
- Data Skew Detection
- Intelligent Optimization Recommendations
- Interactive Visualizations
Troubleshooting
Port Already in Use
# Change port
MCP_PORT=9090 uv run python start.py
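If it is unclear whether the default port is actually taken, a small probe can check before the server starts. This is a minimal sketch using the standard `socket` module; `port_in_use` and `first_free_port` are hypothetical helpers, and the probe only detects listeners reachable on the given host (`127.0.0.1` by default).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 7799, attempts: int = 20) -> int:
    """Return the first port at or after `start` with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print(f"MCP_PORT={first_free_port()}")
```

The printed value can be exported as `MCP_PORT` before launching the server.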
Missing Dependencies
# Reinstall dependencies
uv pip install -e .
AWS Credentials Issues
# Check AWS configuration
aws configure list
# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
Debug Logging
# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py
Tech Stack
- FastMCP 2.0: MCP protocol support
- FastAPI: RESTful API framework
- Pydantic: Data validation and serialization
- Plotly: Interactive charts
- boto3: AWS S3 integration
- aiofiles: Async file operations
Development
# Clone repository
git clone <repository-url>
cd spark-eventlog-mcp
# Install development dependencies
uv pip install -e .
# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py
# MCP Inspector - HTTP mode
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799/mcp --transport http --method tools/list
Support
- Documentation: Check the `/docs` API documentation
- Issues: Submit GitHub Issues
- Reference: FastMCP Documentation