AWS Deep Learning Containers MCP Server

Provides tools for discovering, building, deploying, and troubleshooting AWS Deep Learning Containers (DLC) images, with support for multiple frameworks and GPU instance recommendations.

A Model Context Protocol (MCP) server for AWS Deep Learning Containers (DLC) that provides tools for discovering, building, deploying, and troubleshooting DLC images.

Features

  • Dynamic DLC Image Discovery: Fetches the latest image catalog from the AWS DLC GitHub repository, so discovery results track new image releases
  • Image Building: Create custom Dockerfiles and build images based on DLC base images
  • Multi-Platform Deployment: Deploy to SageMaker, EC2, ECS, and EKS
  • Instance Recommendations: Get GPU instance recommendations based on model size and budget
  • Upgrade Support: Analyze upgrade paths and generate migration Dockerfiles
  • Troubleshooting: Diagnose common DLC issues with actionable solutions
  • Best Practices: Security, cost optimization, and deployment guidance
  • No AWS Credentials Required: Discovery tools work without AWS credentials

Quick Start

Option 1: Run with uv (Recommended)

# Clone the repo
git clone https://github.com/aws-samples/sample-dlc-mcp-server.git
cd sample-dlc-mcp-server

# Run directly with uv
uv run dlc-mcp-server

Option 2: Run with Docker

# Build the image
docker build -t dlc-mcp-server .

# Run the container
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  dlc-mcp-server

Option 3: Install locally (from the repository root)

pip install -e .
dlc-mcp-server

MCP Client Configuration

For Amazon Q CLI

Add to ~/.aws/amazonq/mcp.json:

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}

For Kiro

Add to .kiro/settings/mcp.json:

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "uv",
      "args": ["--directory", "/path/to/sample-dlc-mcp-server", "run", "dlc-mcp-server"],
      "timeout": 120000
    }
  }
}

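Using Docker

Build the image first (see Option 2). Docker does not expand ~ in -v paths, and MCP clients pass these arguments without a shell, so replace /home/USER with the absolute path to your home directory:

{
  "mcpServers": {
    "dlc-mcp-server": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "/home/USER/.aws:/root/.aws:ro", "dlc-mcp-server"],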
      "timeout": 120000
    }
  }
}
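All three client configurations above have the same structure. Each one is an mcpServers map whose dlc-mcp-server entry names a command, its args, and a timeout. If you prefer to generate the file instead of editing it by hand, a minimal sketch follows. The helper names (build_mcp_config, merge_configs, update_config_file) are hypothetical and not part of the server. Because Docker does not expand ~, the sketch resolves ~/.aws with Path.expanduser(). It also merges into an existing mcp.json rather than overwriting other servers.

```python
import json
from pathlib import Path
from typing import Optional

SERVER_NAME = "dlc-mcp-server"


def build_mcp_config(repo_dir: Optional[str] = None, use_docker: bool = False,
                     timeout_ms: int = 120000) -> dict:
    """Return an {"mcpServers": {...}} dict shaped like the examples above (hypothetical helper)."""
    if use_docker:
        # Docker does not expand "~" in -v paths, so resolve it to an absolute path here.
        aws_dir = Path("~/.aws").expanduser()
        entry = {
            "command": "docker",
            "args": ["run", "-i", "--rm", "-v", f"{aws_dir}:/root/.aws:ro", SERVER_NAME],
        }
    else:
        if repo_dir is None:
            raise ValueError("repo_dir is required for the uv launcher")
        entry = {
            "command": "uv",
            # uv --directory points at the cloned sample-dlc-mcp-server repository.
            "args": ["--directory", str(Path(repo_dir).expanduser().resolve()), "run", SERVER_NAME],
        }
    entry["timeout"] = timeout_ms
    return {"mcpServers": {SERVER_NAME: entry}}


def merge_configs(existing: dict, new: dict) -> dict:
    """Add or replace servers from `new` without dropping other servers or keys in `existing`."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(new.get("mcpServers", {}))
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: str, new: dict) -> dict:
    """Read the client's mcp.json (if present), merge in `new`, and write it back."""
    target = Path(path).expanduser()
    existing = json.loads(target.read_text()) if target.exists() else {}
    merged = merge_configs(existing, new)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(merged, indent=2) + "\n")
    return merged


if __name__ == "__main__":
    # Print the uv-based entry; nothing is written to disk here.
    print(json.dumps(build_mcp_config(repo_dir="/path/to/sample-dlc-mcp-server"), indent=2))
```

To register the server with Amazon Q CLI, you could call update_config_file("~/.aws/amazonq/mcp.json", build_mcp_config(repo_dir=...)). For Kiro, pass .kiro/settings/mcp.json instead.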

Available Tools

DLC Discovery

| Tool | Description |
| --- | --- |
| search_dlc_images | Search DLC images by framework, version, accelerator, platform |
| get_dlc_recommendation | Get image recommendations based on model type and size |
| list_dlc_frameworks | List all available frameworks with versions |
| get_llm_serving_options | Compare vLLM, SGLang, DJL, NeuronX options |
| compare_dlc_images | Side-by-side image comparison |
| refresh_dlc_catalog | Force refresh image catalog from GitHub |
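In principle, search_dlc_images filters a catalog of image records by the fields listed above. The sketch below shows that filtering over an in-memory list. The DlcImage record, its field names, and the sample entries (including the placeholder URIs) are hypothetical. They do not reflect the server's internal data model or real image tags.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DlcImage:
    # Hypothetical record shape; the real catalog fields may differ.
    framework: str    # e.g. "pytorch", "vllm"
    version: str      # dotted numeric version, e.g. "2.9.0"
    accelerator: str  # e.g. "gpu", "cpu", "neuronx"
    platform: str     # e.g. "sagemaker", "ec2"
    uri: str


def version_key(version: str) -> Tuple[int, ...]:
    """Map "2.9.0" to (2, 9, 0) so versions compare numerically ("2.10" sorts after "2.9")."""
    return tuple(int(part) for part in version.split("."))


def version_matches(version: str, wanted: str) -> bool:
    """True if `wanted` ("2.9" or "2.9.0") is a component-wise prefix of `version`."""
    want = version_key(wanted)
    return version_key(version)[: len(want)] == want


def search_images(images: Iterable[DlcImage], framework: Optional[str] = None,
                  version: Optional[str] = None, accelerator: Optional[str] = None,
                  platform: Optional[str] = None) -> List[DlcImage]:
    """Filter by any combination of fields (None means "any"); newest version first."""
    def keep(img: DlcImage) -> bool:
        return ((framework is None or img.framework == framework.lower())
                and (version is None or version_matches(img.version, version))
                and (accelerator is None or img.accelerator == accelerator.lower())
                and (platform is None or img.platform == platform.lower()))

    return sorted((img for img in images if keep(img)),
                  key=lambda img: version_key(img.version), reverse=True)


# Hypothetical catalog entries; URIs are placeholders, not real image tags.
CATALOG = [
    DlcImage("pytorch", "2.9.0", "gpu", "sagemaker", "example/pytorch:2.9.0-gpu-sagemaker"),
    DlcImage("pytorch", "2.8.0", "gpu", "sagemaker", "example/pytorch:2.8.0-gpu-sagemaker"),
    DlcImage("pytorch", "2.9.0", "cpu", "ec2", "example/pytorch:2.9.0-cpu-ec2"),
    DlcImage("vllm", "0.15.1", "gpu", "sagemaker", "example/vllm:0.15.1-gpu-sagemaker"),
]
```

Versions are compared component by component rather than as string prefixes. As a result, a search for "2.1" does not accidentally match 2.10.0.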

Image Building

| Tool | Description |
| --- | --- |
| create_custom_dockerfile | Generate Dockerfile with custom packages |
| build_custom_dlc_image | Build and optionally push to ECR |
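create_custom_dockerfile adds extra packages on top of a DLC base image. A hand-written equivalent might look like the sketch below. The function name render_dockerfile, the single-RUN-per-package-manager layout, and the pip --no-cache-dir flag are assumptions for illustration, not the tool's actual output. Package specs are shell-quoted because /bin/sh would otherwise read the ">" in a spec such as peft>=0.12 as an output redirect.

```python
import shlex
from typing import Iterable


def render_dockerfile(base_image: str,
                      pip_packages: Iterable[str] = (),
                      apt_packages: Iterable[str] = ()) -> str:
    """Return Dockerfile text that layers extra packages on a DLC base image (hypothetical helper)."""
    if not base_image or any(ch.isspace() for ch in base_image):
        raise ValueError("base_image must be a single image URI")
    pip_packages, apt_packages = list(pip_packages), list(apt_packages)
    lines = [f"FROM {base_image}"]
    if apt_packages:
        # One RUN layer; clearing the apt lists keeps the image smaller.
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(shlex.quote(p) for p in apt_packages)
            + " && rm -rf /var/lib/apt/lists/*"
        )
    if pip_packages:
        # Quote each spec so ">=" is not parsed by /bin/sh as a redirect.
        lines.append("RUN pip install --no-cache-dir "
                     + " ".join(shlex.quote(p) for p in pip_packages))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # REPLACE_WITH_DLC_IMAGE_URI is a placeholder for an image found via search_dlc_images.
    print(render_dockerfile("REPLACE_WITH_DLC_IMAGE_URI",
                            pip_packages=["peft>=0.12", "datasets"],
                            apt_packages=["git"]))
```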

Deployment

| Tool | Description |
| --- | --- |
| deploy_to_sagemaker | Deploy to SageMaker endpoint |
| deploy_to_ec2 | Launch EC2 instance with DLC |
| deploy_to_ecs | Deploy to ECS cluster |
| deploy_to_eks | Deploy to EKS cluster |
| get_sagemaker_endpoint_status | Check endpoint status |
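Deploying requires AWS credentials and ALLOW_WRITE enabled. For reference, a SageMaker real-time endpoint is created with three API calls: CreateModel, CreateEndpointConfig, and CreateEndpoint. The sketch below builds the request parameters for the boto3 SageMaker client so you can review them before anything is created. It then issues the calls and waits with the endpoint_in_service waiter. The helper names, the "-config" naming scheme, the AllTraffic variant name, and the 600-second startup timeout are assumptions; deploy_to_sagemaker's own requests may differ.

```python
from typing import Dict, Optional


def sagemaker_requests(name: str, image_uri: str, role_arn: str, instance_type: str,
                       environment: Optional[Dict[str, str]] = None,
                       instance_count: int = 1,
                       startup_timeout_s: int = 600) -> Dict[str, dict]:
    """Build kwargs for create_model, create_endpoint_config, create_endpoint (hypothetical helper)."""
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        # Container environment variables are image-specific; none are assumed here.
        "PrimaryContainer": {"Image": image_uri, "Environment": dict(environment or {})},
    }
    endpoint_config = {
        "EndpointConfigName": f"{name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,  # SageMaker types carry an "ml." prefix
            "InitialInstanceCount": instance_count,
            # Large LLM containers can take several minutes to load weights and pass health checks.
            "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout_s,
        }],
    }
    endpoint = {"EndpointName": name,
                "EndpointConfigName": endpoint_config["EndpointConfigName"]}
    return {"create_model": model,
            "create_endpoint_config": endpoint_config,
            "create_endpoint": endpoint}


def deploy(requests: Dict[str, dict], sagemaker_client) -> None:
    """Issue the three calls in order; pass boto3.client("sagemaker") as sagemaker_client."""
    for method in ("create_model", "create_endpoint_config", "create_endpoint"):
        getattr(sagemaker_client, method)(**requests[method])
    # Blocks until the endpoint is InService; the waiter raises if it ends up Failed.
    sagemaker_client.get_waiter("endpoint_in_service").wait(
        EndpointName=requests["create_endpoint"]["EndpointName"])
```

For example, reqs = sagemaker_requests("my-llm", image_uri, role_arn, "ml.g5.12xlarge") gives you a plain dict to inspect. Call deploy(reqs, boto3.client("sagemaker")) only once it looks right.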

Instance Advisor

| Tool | Description |
| --- | --- |
| get_instance_recommendation | GPU instance recommendations by model size |
| list_gpu_instances | List available GPU instances with pricing |
| estimate_training_cost | Estimate training job costs |
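The arithmetic behind sizing and cost estimates is simple enough to write down directly. The sketch below assumes that serving memory is about 1.2× the model's weight size. This is a common rule of thumb that leaves headroom for activations and KV cache; the tool's actual heuristic is not documented here. The sketch picks the instance with the smallest total GPU memory that fits. Cost is modeled as hourly price × hours × instance count, with the price supplied by you rather than hard-coded. The GPU counts and memory totals are the published figures for these EC2 instance types. The candidate list itself is an illustrative subset.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GpuInstance:
    name: str
    gpus: int
    total_gpu_mem_gb: int


# Illustrative subset; total memory = GPU count x memory per GPU.
INSTANCES = (
    GpuInstance("g5.xlarge", 1, 24),      # 1x NVIDIA A10G
    GpuInstance("g6e.xlarge", 1, 48),     # 1x NVIDIA L40S
    GpuInstance("g5.12xlarge", 4, 96),    # 4x NVIDIA A10G
    GpuInstance("g6e.12xlarge", 4, 192),  # 4x NVIDIA L40S
    GpuInstance("p4d.24xlarge", 8, 320),  # 8x NVIDIA A100 40 GB
    GpuInstance("p5.48xlarge", 8, 640),   # 8x NVIDIA H100 80 GB
)


def required_memory_gb(model_size_gb: float, overhead: float = 1.2) -> float:
    """Weights plus headroom; overhead=1.2 is an assumed rule of thumb."""
    return model_size_gb * overhead


def recommend_instance(model_size_gb: float,
                       candidates: Sequence[GpuInstance] = INSTANCES,
                       overhead: float = 1.2) -> Optional[GpuInstance]:
    """Smallest candidate (by total GPU memory) that fits the model, or None if none fits."""
    need = required_memory_gb(model_size_gb, overhead)
    fitting = [inst for inst in candidates if inst.total_gpu_mem_gb >= need]
    return min(fitting, key=lambda inst: inst.total_gpu_mem_gb, default=None)


def estimate_cost(price_per_hour: float, hours: float, instance_count: int = 1) -> float:
    """Job cost = hourly price x hours x instances; the price comes from the caller."""
    if price_per_hour < 0 or hours < 0 or instance_count < 1:
        raise ValueError("price and hours must be non-negative and instance_count >= 1")
    return price_per_hour * hours * instance_count
```

For the 35GB model in the example prompts, this heuristic needs about 42 GB and selects g6e.xlarge (48 GB). Multi-GPU picks such as g5.12xlarge also require the serving framework to shard the model across GPUs (tensor parallelism), and SageMaker instance types use an ml. prefix.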

Troubleshooting

| Tool | Description |
| --- | --- |
| analyze_dlc_error | Analyze error logs with root cause analysis |
| diagnose_common_issues | Diagnose common DLC problems |
| get_framework_compatibility_info | Check framework version compatibility |
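Root-cause analysis of a log usually starts by recognizing known error signatures. The sketch below is a minimal pattern-matching version of that idea. The rules are a small hand-picked sample, not analyze_dlc_error's actual rule set. Each rule keys on a well-known message fragment, such as PyTorch's "CUDA out of memory" or Docker's "no basic auth credentials" when it is not logged in to ECR.

```python
import re
from typing import List, Tuple

# Illustrative rules only: (regex over the log, likely cause, suggested action).
RULES = [
    (re.compile(r"CUDA out of memory", re.IGNORECASE),
     "GPU memory exhausted",
     "Reduce batch size or sequence length, or move to an instance with more GPU memory."),
    (re.compile(r"CUDA driver version is insufficient", re.IGNORECASE),
     "Host NVIDIA driver is older than the container's CUDA runtime",
     "Use a host AMI with a newer driver or an image built for an older CUDA version."),
    (re.compile(r"no basic auth credentials", re.IGNORECASE),
     "Docker is not logged in to the ECR registry",
     "Pipe aws ecr get-login-password into docker login before pulling the image."),
    (re.compile(r"exec format error", re.IGNORECASE),
     "CPU architecture mismatch between image and host",
     "Pull the image variant built for the host architecture (x86_64 vs arm64)."),
    (re.compile(r"NCCL (error|WARN)", re.IGNORECASE),
     "Collective communication failure in distributed training",
     "Check that all nodes run the same image and that security groups allow node-to-node traffic."),
]


def analyze_log(log_text: str) -> List[Tuple[str, str]]:
    """Return (likely cause, suggested action) for every matching rule, in rule order."""
    return [(cause, action) for pattern, cause, action in RULES if pattern.search(log_text)]


if __name__ == "__main__":
    for cause, action in analyze_log("RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB"):
        print(f"{cause}: {action}")
```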

Best Practices

| Tool | Description |
| --- | --- |
| get_security_best_practices | Security guidelines |
| get_cost_optimization_tips | Cost reduction strategies |
| get_deployment_best_practices | Platform-specific guidance |
| get_framework_specific_best_practices | Framework optimization tips |

Supported Frameworks

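| Framework | Latest Version (at time of writing) | Use Cases |
| --- | --- | --- |
| PyTorch | 2.9.0 | Training, Inference |
| TensorFlow | 2.19.0 | Training, Inference |
| vLLM | 0.15.1 | LLM Inference |
| SGLang | 0.5.8 | LLM Inference |
| HuggingFace PyTorch | 2.6.0 | NLP Training/Inference |
| AutoGluon | 1.5.0 | AutoML |
| DJL | 0.36.0 | Large Model Inference |
| PyTorch NeuronX | 2.9.0 | Trainium/Inferentia |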
PyTorch NeuronX 2.9.0 Trainium/Inferentia

Example Usage

Find vLLM images

Search for vLLM images for SageMaker inference

Deploy LLM to SageMaker

Deploy Qwen2.5-32B using vLLM on SageMaker with the right instance type

Get instance recommendations

What instance should I use for a 35GB model?

Troubleshoot errors

Help me fix this CUDA out of memory error: [paste error]

Configuration

Environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| ALLOW_WRITE | Enable build/deploy operations | false |
| ALLOW_SENSITIVE_DATA | Enable access to detailed logs | false |
| FASTMCP_LOG_LEVEL | Logging level | ERROR |
| FASTMCP_LOG_FILE | Log file path | None |
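Build/deploy operations and detailed log access stay off unless you explicitly enable them. The sketch below reads these variables with the defaults listed above. Treating "true", "1", and "yes" (case-insensitive) as enabled is an assumption, so check the server source for the exact values it accepts. The ServerSettings and load_settings names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes"}  # assumed accepted spellings


def _flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable; unset falls back to the documented default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class ServerSettings:
    allow_write: bool
    allow_sensitive_data: bool
    log_level: str
    log_file: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read settings from `env` (defaults to os.environ) using the table's defaults."""
    env = os.environ if env is None else env
    return ServerSettings(
        allow_write=_flag(env, "ALLOW_WRITE"),
        allow_sensitive_data=_flag(env, "ALLOW_SENSITIVE_DATA"),
        log_level=env.get("FASTMCP_LOG_LEVEL", "ERROR").upper(),
        log_file=env.get("FASTMCP_LOG_FILE") or None,  # empty string treated as unset
    )


def require_write(settings: ServerSettings, operation: str) -> None:
    """Guard a build/deploy operation behind ALLOW_WRITE (hypothetical helper)."""
    if not settings.allow_write:
        raise PermissionError(f"{operation} requires ALLOW_WRITE=true")
```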

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Run linting
ruff check .

See DEVELOPMENT.md for more details.

License

This library is licensed under the MIT-0 License.
