CloudWatch Log Analyst MCP
Enables LLMs to autonomously query AWS CloudWatch Logs and perform structured root-cause analysis via natural language prompts, using MCP tools for log group listing and Insights queries.
CloudWatch Log Analyst — Agentic LLM + MCP + AWS
An end-to-end agentic system where an LLM autonomously authenticates with AWS, queries CloudWatch Logs, and produces structured root-cause analysis — all triggered by a single natural language prompt in Cursor.
What this project demonstrates
- MCP (Model Context Protocol) — building a custom tool server that exposes AWS APIs to an LLM
- Agentic tool use — the LLM decides which tools to call, writes its own CloudWatch Insights queries, and reasons over real log data without human guidance
- AWS IAM + boto3 — least-privilege IAM setup, programmatic authentication, and CloudWatch Logs Insights queries via the AWS SDK
- Practical MLOps intuition — log analysis and error diagnosis are core MLE responsibilities; this automates the investigative loop
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Developer machine │
│ │
│ Cursor IDE ──── tool calls ────► MCP Server (Python) │
│ │ │ │
│ Claude LLM ◄─── log results ──── boto3 / AWS SDK │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────▼───────────────┐
│ AWS │
│ │
│ IAM user │
│ CloudWatch Logs │
│ Lambda (log generator) │
└──────────────────────────────┘
Flow: You type a prompt in Cursor → Claude sees the available MCP tools → it calls list_log_groups to orient itself → constructs and calls query_logs with a CloudWatch Insights query it writes itself → your MCP server authenticates with AWS and fetches real log data → Claude reads the results and returns a structured diagnosis.
Demo
Prompt
Check my CloudWatch logs for the last 5 hours. List all the log groups
you can see, then query the Lambda log group for any errors and tell
me what went wrong and why.
Claude's response (condensed)
Log groups found:
- /aws/lambda/mcp-log-generator
Errors in the last 5 hours — two patterns:
1. NullPointerException (field=customer_email)
Scenarios: fetch_inventory, processing_order, user_login
Cause: customer_email is null or missing in some user records.
Fix: Validate at the boundary; use null-safe access on required fields;
backfill missing emails upstream.
2. TimeoutException (latency_ms=5032, threshold_ms=5000)
Scenarios: payment_gateway (6×), processing_order (1×), fetch_inventory (1×)
Cause: downstream dependency consistently 32ms over the 5s cap.
Fix: Tune client timeouts above realistic p99 latency; add circuit breaker;
investigate gateway cold starts and DB contention during traffic spikes.
Error breakdown
| Error type | Count | Share |
|---|---|---|
| TimeoutException | 8 | 72.7% |
| NullPointerException | 3 | 27.3% |
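The shares follow directly from the counts: 8 and 3 out of 11 errors. A minimal sketch of that arithmetic, assuming the error types have already been pulled out of the query results into a plain list. The `error_shares` helper and the `error_types` input are illustrative and not part of the repository.

```python
from collections import Counter

def error_shares(error_types):
    """Return {error_type: (count, share_percent)}, share rounded to one decimal."""
    counts = Counter(error_types)
    total = sum(counts.values())
    return {
        name: (n, round(100 * n / total, 1))
        for name, n in counts.most_common()
    }

# Illustrative input rebuilt from the counts reported in the demo above.
error_types = ["TimeoutException"] * 8 + ["NullPointerException"] * 3
for name, (n, share) in error_shares(error_types).items():
    print(f"{name}: {n} ({share}%)")
```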
User impact
Claude identified 6 distinct affected user_ids with timestamps, extracted directly from raw CloudWatch log events.
Setup
Prerequisites
- AWS account (free tier is sufficient)
- Python 3.10+
- Cursor IDE
1. Clone the repo
git clone https://github.com/eugeneoh04/cloudwatch-mcp.git
cd cloudwatch-mcp
2. Install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
3. AWS — create an IAM user
In the AWS console, create a user with programmatic access and attach this inline policy (least-privilege, read-only):
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:FilterLogEvents",
"logs:StartQuery",
"logs:GetQueryResults",
"logs:GetLogEvents"
],
"Resource": "*"
}]
}
Save the generated AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
4. AWS — deploy the Lambda log generator
- In the AWS console, create a Lambda function (Python 3.12)
- Paste the contents of lambda_function.py into the inline editor
- Click Deploy, then click Test 15–20 times to populate CloudWatch with logs
5. Configure environment
cp .env.example .env
Fill in your credentials in .env:
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=...
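Before starting the server it can help to confirm that all three variables are visible to the process. The sketch below only inspects the environment. It assumes the values from .env have already been loaded into it (how the project loads the file is not shown here), and `missing_aws_vars` is a hypothetical helper, not something the repository provides.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_aws_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All three AWS variables are set.")
```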
6. Test your AWS connection
python test_connection.py
# Expected output: /aws/lambda/mcp-log-generator
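A connectivity check of this kind typically amounts to listing log groups with the credentials in scope. The sketch below shows one way to do that with boto3's `describe_log_groups` paginator. It is not the contents of test_connection.py, and `list_log_group_names` is a hypothetical name. The client is passed in so the function can also be exercised with a stub.

```python
def list_log_group_names(logs_client):
    """Collect every log group name visible to the given CloudWatch Logs client."""
    names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        names.extend(group["logGroupName"] for group in page.get("logGroups", []))
    return names

# Usage with real credentials (requires boto3 and the variables from step 5):
#   import boto3
#   for name in list_log_group_names(boto3.client("logs")):
#       print(name)
```

Using the paginator rather than a single `describe_log_groups` call avoids silently truncating the list in accounts with many log groups.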
7. Wire into Cursor
Create ~/.cursor/mcp.json:
{
"mcpServers": {
"cloudwatch": {
"command": "/absolute/path/to/venv/bin/python",
"args": ["/absolute/path/to/cloudwatch_mcp_server.py"],
"env": {
"AWS_ACCESS_KEY_ID": "AKIA...",
"AWS_SECRET_ACCESS_KEY": "...",
"AWS_DEFAULT_REGION": "..."
}
}
}
}
Use absolute paths: Cursor does not expand ~.
Open Cursor → Settings → MCP. A green dot next to cloudwatch means the server is connected.
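Because the paths must be absolute, one option is to generate the config instead of typing it by hand. The following is a sketch under assumptions: `build_mcp_config` is a hypothetical helper, and the `~/cloudwatch-mcp` location stands in for wherever the repo was actually cloned. It expands `~` and makes the paths absolute without resolving symlinks, since resolving `venv/bin/python` to the underlying interpreter would bypass the virtual environment.

```python
import json
import os

def build_mcp_config(python_path, server_path, env):
    """Return a Cursor MCP config dict with ~ expanded and paths made absolute."""
    def absolute(path):
        # abspath normalizes the path but does not follow symlinks.
        return os.path.abspath(os.path.expanduser(path))
    return {
        "mcpServers": {
            "cloudwatch": {
                "command": absolute(python_path),
                "args": [absolute(server_path)],
                "env": dict(env),
            }
        }
    }

config = build_mcp_config(
    "~/cloudwatch-mcp/venv/bin/python",           # assumed clone location
    "~/cloudwatch-mcp/cloudwatch_mcp_server.py",  # assumed clone location
    {"AWS_ACCESS_KEY_ID": "AKIA...", "AWS_SECRET_ACCESS_KEY": "...", "AWS_DEFAULT_REGION": "..."},
)
print(json.dumps(config, indent=2))
```

The printed JSON can then be saved as ~/.cursor/mcp.json after filling in the real credential values.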
Tools exposed via MCP
| Tool | Description | Arguments |
|---|---|---|
| list_log_groups | Lists all CloudWatch log groups in the account | none |
| query_logs | Runs a CloudWatch Logs Insights query | log_group (required), query (required), hours_back (optional, default 1) |
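Logs Insights queries are asynchronous: `start_query` returns a query ID, and `get_query_results` is polled until the query leaves the `Scheduled` or `Running` state. The sketch below shows how a tool like query_logs could wrap that loop. It is an assumption about the general shape, not the server's actual code. `run_insights_query`, `poll_seconds`, `max_polls`, and `now` are hypothetical names, while the argument names `log_group`, `query`, and `hours_back` mirror the table above.

```python
import time

PENDING_STATUSES = {"Scheduled", "Running"}

def run_insights_query(logs_client, log_group, query, hours_back=1,
                       poll_seconds=1.0, max_polls=60, now=None):
    """Start a Logs Insights query and poll until it is no longer pending."""
    end_time = int(now if now is not None else time.time())
    start_time = end_time - int(hours_back * 3600)
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,  # epoch seconds
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] not in PENDING_STATUSES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"query {query_id} still pending after {max_polls} polls")

    if response["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} pairs; flatten to dicts.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]

# Usage with real credentials (requires boto3):
#   import boto3
#   rows = run_insights_query(boto3.client("logs"), "/aws/lambda/mcp-log-generator",
#                             "fields @timestamp, @message | limit 20", hours_back=5)
```

Returning flat dictionaries instead of the raw field/value pairs keeps the tool output compact for the LLM to read.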
Example prompts
What MCP tools do you have available?
Check my CloudWatch logs for the last 2 hours. List all log groups,
then query the Lambda log group for errors and diagnose each one.
Group the errors by type, show how frequently each one occurs,
and suggest a fix for each.
Find all log entries where the payment_gateway scenario failed.
What user_ids were affected and when?
What percentage of invocations succeeded vs failed in the last hour?
Is there any pattern to when errors occur?
Project structure
cloudwatch-mcp/
├── cloudwatch_mcp_server.py # MCP server — exposes CloudWatch tools to the LLM
├── lambda_function.py # Lambda function that generates structured logs
├── test_connection.py # Quick IAM + boto3 connectivity check
├── requirements.txt
├── .env.example # Credentials template
└── .gitignore
Key design decisions
Why MCP over a direct API call? MCP gives the LLM the ability to decide when and how to query. It writes the CloudWatch Insights query itself based on your natural language prompt. A direct API call is static; MCP is agentic.
Why least-privilege IAM? The MCP server only needs read access to logs. This mirrors production best practices — no write permissions, no admin access.
Why CloudWatch Logs Insights over FilterLogEvents?
Insights supports SQL-like aggregations (stats count() by reason) that let the LLM produce quantitative breakdowns and trend analysis, not just raw log dumps.
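To make the aggregation point concrete, here is a sketch of the kind of query string the LLM might write for a breakdown by error type. The `count_by_query` helper is hypothetical, and it assumes the Lambda emits structured logs in which `reason` is a discovered field, as the `stats count() by reason` example suggests. The pattern is inserted into the regex as-is, so it should not contain `/` or unescaped regex metacharacters.

```python
def count_by_query(field, pattern=None, limit=20):
    """Build a Logs Insights query that counts events grouped by `field`."""
    parts = []
    if pattern:
        parts.append(f"filter @message like /{pattern}/")
    parts.append(f"stats count() as occurrences by {field}")
    parts.append("sort occurrences desc")
    parts.append(f"limit {limit}")
    return " | ".join(parts)

print(count_by_query("reason", pattern="Exception"))
```

A FilterLogEvents call can match the same pattern, but it returns the raw events, so the counting and ranking would have to happen client-side.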
Technologies
Python · AWS Lambda · AWS CloudWatch Logs · AWS IAM · boto3 · MCP (Model Context Protocol) · Claude · Cursor