AWS Deep Learning Containers MCP Server
Provides tools for discovering, building, deploying, and troubleshooting AWS Deep Learning Containers (DLC) images. Supports multiple frameworks and instance recommendations.
A Model Context Protocol (MCP) server for AWS Deep Learning Containers (DLC) that provides tools for discovering, building, deploying, and troubleshooting DLC images.
Features
- Dynamic DLC Image Discovery: Fetches the latest image list from the AWS DLC GitHub repository, so the catalog stays current
- Image Building: Create custom Dockerfiles and build images based on DLC base images
- Multi-Platform Deployment: Deploy to SageMaker, EC2, ECS, and EKS
- Instance Recommendations: Get GPU instance recommendations based on model size and budget
- Upgrade Support: Analyze upgrade paths and generate migration Dockerfiles
- Troubleshooting: Diagnose common DLC issues with actionable solutions
- Best Practices: Security, cost optimization, and deployment guidance
- No AWS Credentials Required: Discovery tools work without AWS credentials
Quick Start
Option 1: Run with uv (Recommended)
# Clone the repo
git clone https://github.com/aws-samples/sample-dlc-mcp-server.git
cd sample-dlc-mcp-server
# Run directly with uv
uv run dlc-mcp-server
Option 2: Run with Docker
# Build the image
docker build -t dlc-mcp-server .
# Run the container
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  dlc-mcp-server
Option 3: Install locally
pip install -e .
dlc-mcp-server
MCP Client Configuration
For Amazon Q CLI
Add to ~/.aws/amazonq/mcp.json:
{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}
For Kiro
Add to .kiro/settings/mcp.json:
{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}
Using Docker
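{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "~/.aws:/root/.aws:ro", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}
MCP clients usually start the command directly rather than through a shell, so ~ may not be expanded. If the volume mount fails, replace ~/.aws with the absolute path to your .aws directory.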
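The uv-based entries above share one shape, so adding the server to an existing client configuration can be scripted. The following is a minimal sketch, not part of this project: the helper names `build_server_entry`, `merge_server`, and `update_config_file` are hypothetical, and the sketch assumes the target file is plain JSON with a top-level `mcpServers` object, as in the examples above.

```python
import json
from pathlib import Path


def build_server_entry(repo_dir: str, timeout_ms: int = 120000) -> dict:
    """Return the uv-based server entry used in the client configurations."""
    return {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "dlc-mcp-server"],
        "timeout": timeout_ms,
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry stored under mcpServers[name].

    Other servers already present in the configuration are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Merge the entry into the JSON file at path, creating it if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    updated = merge_server(existing, name, entry)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

For example, `update_config_file(Path.home() / ".aws" / "amazonq" / "mcp.json", "dlc-mcp-server", build_server_entry("/path/to/sample-dlc-mcp-server"))` would target the Amazon Q CLI file named above. Merging into a copy rather than overwriting the file keeps any other servers you have already configured.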
Available Tools
DLC Discovery
| Tool | Description |
|---|---|
| search_dlc_images | Search DLC images by framework, version, accelerator, platform |
| get_dlc_recommendation | Get image recommendations based on model type and size |
| list_dlc_frameworks | List all available frameworks with versions |
| get_llm_serving_options | Compare vLLM, SGLang, DJL, NeuronX options |
| compare_dlc_images | Side-by-side image comparison |
| refresh_dlc_catalog | Force refresh image catalog from GitHub |
Image Building
| Tool | Description |
|---|---|
| create_custom_dockerfile | Generate Dockerfile with custom packages |
| build_custom_dlc_image | Build and optionally push to ECR |
Deployment
| Tool | Description |
|---|---|
| deploy_to_sagemaker | Deploy to SageMaker endpoint |
| deploy_to_ec2 | Launch EC2 instance with DLC |
| deploy_to_ecs | Deploy to ECS cluster |
| deploy_to_eks | Deploy to EKS cluster |
| get_sagemaker_endpoint_status | Check endpoint status |
Instance Advisor
| Tool | Description |
|---|---|
| get_instance_recommendation | GPU instance recommendations by model size |
| list_gpu_instances | List available GPU instances with pricing |
| estimate_training_cost | Estimate training job costs |
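To make the idea of sizing by model size concrete, here is a rough sketch of the arithmetic involved. It is an illustration only and not the logic these tools use: the `smallest_fit` helper, the short candidate list, and the 1.2 overhead factor (headroom for activations and cache on top of the weights) are all assumptions. The GPU counts and per-GPU memory figures are the published specifications for those instance types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuInstance:
    name: str
    gpus: int
    memory_per_gpu_gb: int

    @property
    def total_memory_gb(self) -> int:
        return self.gpus * self.memory_per_gpu_gb


# A small illustrative subset of GPU instance types, not a full catalog.
CANDIDATES = [
    GpuInstance("g5.xlarge", gpus=1, memory_per_gpu_gb=24),
    GpuInstance("g6e.xlarge", gpus=1, memory_per_gpu_gb=48),
    GpuInstance("g5.12xlarge", gpus=4, memory_per_gpu_gb=24),
    GpuInstance("g5.48xlarge", gpus=8, memory_per_gpu_gb=24),
    GpuInstance("p4d.24xlarge", gpus=8, memory_per_gpu_gb=40),
]


def smallest_fit(model_size_gb: float, overhead: float = 1.2) -> Optional[GpuInstance]:
    """Return the candidate with the least total GPU memory that still fits.

    The overhead factor is an assumed safety margin, not a measured value.
    Returns None when no candidate has enough memory.
    """
    required_gb = model_size_gb * overhead
    fits = [inst for inst in CANDIDATES if inst.total_memory_gb >= required_gb]
    return min(fits, key=lambda inst: inst.total_memory_gb, default=None)


choice = smallest_fit(35)
print(choice.name if choice else "no candidate fits")
```

Least total memory is not the same as lowest cost, and spreading a model across several GPUs requires a serving stack that supports it, so treat the result as a starting point for the questions you put to the advisor tools.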
Troubleshooting
| Tool | Description |
|---|---|
| analyze_dlc_error | Analyze error logs with root cause analysis |
| diagnose_common_issues | Diagnose common DLC problems |
| get_framework_compatibility_info | Check framework version compatibility |
Best Practices
| Tool | Description |
|---|---|
| get_security_best_practices | Security guidelines |
| get_cost_optimization_tips | Cost reduction strategies |
| get_deployment_best_practices | Platform-specific guidance |
| get_framework_specific_best_practices | Framework optimization tips |
Supported Frameworks
| Framework | Latest Version | Use Cases |
|---|---|---|
| PyTorch | 2.9.0 | Training, Inference |
| TensorFlow | 2.19.0 | Training, Inference |
| vLLM | 0.15.1 | LLM Inference |
| SGLang | 0.5.8 | LLM Inference |
| HuggingFace PyTorch | 2.6.0 | NLP Training/Inference |
| AutoGluon | 1.5.0 | AutoML |
| DJL | 0.36.0 | Large Model Inference |
| PyTorch NeuronX | 2.9.0 | Trainium/Inferentia |
Example Usage
Find vLLM images
Search for vLLM images for SageMaker inference
Deploy LLM to SageMaker
Deploy Qwen2.5-32B using vLLM on SageMaker with the right instance type
Get instance recommendations
What instance should I use for a 35GB model?
Troubleshoot errors
Help me fix this CUDA out of memory error: [paste error]
Configuration
Environment variables:
| Variable | Description | Default |
|---|---|---|
| ALLOW_WRITE | Enable build and deploy operations | false |
| ALLOW_SENSITIVE_DATA | Enable access to detailed logs | false |
| FASTMCP_LOG_LEVEL | Logging level | ERROR |
| FASTMCP_LOG_FILE | Log file path | None |
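One way to set these variables is through the client configuration rather than the shell. The sketch below adds an `env` block to a server entry. It assumes your MCP client accepts an `env` object on each server entry (many do, but check your client's documentation) and that the string "true" is what enables a flag. The `with_env` helper is hypothetical.

```python
import json

# Variable names taken from the configuration table above.
KNOWN_VARIABLES = {
    "ALLOW_WRITE",
    "ALLOW_SENSITIVE_DATA",
    "FASTMCP_LOG_LEVEL",
    "FASTMCP_LOG_FILE",
}


def with_env(entry: dict, overrides: dict) -> dict:
    """Return a copy of a server entry with environment overrides added.

    Raises ValueError for names that are not in the table, which catches
    typos such as ALLOW_WRTIE before they are silently ignored.
    """
    unknown = set(overrides) - KNOWN_VARIABLES
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    updated = dict(entry)
    updated["env"] = {**entry.get("env", {}), **overrides}
    return updated


entry = {
    "command": "uv",
    "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
    "timeout": 120000,
}

# Assumed values: "true" to enable build/deploy tools, INFO for more logging.
print(json.dumps(with_env(entry, {"ALLOW_WRITE": "true", "FASTMCP_LOG_LEVEL": "INFO"}), indent=2))
```

Because ALLOW_WRITE defaults to false, build and deploy operations stay disabled unless you opt in like this.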
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
python -m pytest tests/ -v
# Run linting
ruff check .
See DEVELOPMENT.md for more details.
License
This library is licensed under the MIT-0 License.