Cortex

MCP server that lets AI agents monitor and tune neural network training in real time, with anomaly detection and guarded interventions.

Instead of staring at loss curves and manually adjusting hyperparameters, connect an AI agent to your training loop. It watches metrics, detects anomalies, and makes guarded interventions, with safety rails that keep it from doing more harm than good.

Architecture

┌──────────────────┐     ┌──────────────────┐     ┌────────────────┐
│  Training Loop   │────▶│    Telemetry     │◀────│   AI Agent     │
│                  │     │  (tracker.log)   │     │  (via MCP)     │
│  PyTorch / MLX   │◀────│                  │────▶│                │
│  JAX / any       │     │  ┌────────────┐  │     │  diagnose()    │
│                  │     │  │ Detectors  │  │     │  adjust_param()│
│  tracker.poll()  │     │  │ Guardrails │  │     │  rollback()    │
│  get_override()  │     │  │ Log        │  │     │  save/restore  │
│                  │     │  └────────────┘  │     │                │
└──────────────────┘     └──────────────────┘     └────────────────┘

Three layers:

  1. Telemetry — metrics, config, checkpoints, history
  2. Detectors — rule-based anomaly detection producing typed findings (entropy collapse, loss divergence, gradient spikes, eval plateau, loss imbalance)
  3. Guardrails — safety constraints on interventions (max % change, cooldowns, rate limits, checkpoint-before-action requirement, full intervention log)
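The diagram shows a `tracker.poll()` / `get_override()` flow between the agent and the training loop. The sketch below makes that handshake concrete for a single process. It assumes the agent side pushes commands onto a thread-safe queue and the training loop drains that queue once per step. `MiniTracker`, `push_command`, and the command names are hypothetical. This is not how Cortex itself is implemented, because the README does not describe its transport.

```python
import queue
from typing import Any, Callable, Dict, List, Optional


class MiniTracker:
    """Hypothetical stand-in for the tracker's command/override handshake."""

    def __init__(self) -> None:
        self._commands: queue.Queue = queue.Queue()  # agent -> training loop
        self._overrides: Dict[str, Any] = {}
        self._on_checkpoint: Optional[Callable[[str], Any]] = None
        self._on_rollback: Optional[Callable[[str], Any]] = None
        self.history: List[Dict[str, float]] = []

    # Decorators that register user callbacks, in the Quick Start style.
    def on_checkpoint(self, fn: Callable[[str], Any]) -> Callable[[str], Any]:
        self._on_checkpoint = fn
        return fn

    def on_rollback(self, fn: Callable[[str], Any]) -> Callable[[str], Any]:
        self._on_rollback = fn
        return fn

    def log(self, **values: float) -> None:
        self.history.append(values)

    # Agent side: enqueue a command. queue.Queue is thread-safe.
    def push_command(self, name: str, arg: Any = None) -> None:
        self._commands.put((name, arg))

    # Training side: run every pending command without blocking the step.
    def poll(self) -> None:
        while True:
            try:
                name, arg = self._commands.get_nowait()
            except queue.Empty:
                return
            if name == "checkpoint" and self._on_checkpoint is not None:
                self._on_checkpoint(arg)
            elif name == "rollback" and self._on_rollback is not None:
                self._on_rollback(arg)
            elif name == "set_param":
                key, value = arg
                self._overrides[key] = value

    def get_override(self, key: str) -> Any:
        return self._overrides.get(key)  # None until the agent sets it


tracker = MiniTracker()
saved: List[str] = []


@tracker.on_checkpoint
def save(tag: str) -> None:
    saved.append(tag)  # a real callback would write model state to disk


tracker.push_command("checkpoint", "step_100")
tracker.push_command("set_param", ("lr", 1.5e-4))
tracker.poll()
print(saved, tracker.get_override("lr"))  # ['step_100'] 0.00015
```

Queuing commands and applying them only inside `poll()` means checkpoints, rollbacks, and parameter changes all happen between steps on the training thread. They never fire from the agent's thread mid-step.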

Quick Start

In your training code

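import os

import torch
from cortex import tracker

# model, optimizer and train_step() come from your own training code.
os.makedirs("checkpoints", exist_ok=True)
tracker.config(total_steps=100000, lr=3e-4, batch_size=64)

@tracker.on_checkpoint
def save(tag):
    torch.save(model.state_dict(), f"checkpoints/{tag}.pt")

@tracker.on_rollback
def rollback(tag):
    model.load_state_dict(torch.load(f"checkpoints/{tag}.pt"))
    return True

for step in range(100000):
    # Your train_step() should return the loss plus anything else you want
    # the detectors to see, e.g. policy entropy and gradient norm.
    loss, ent, gnorm = train_step()
    tracker.log(step=step, loss=loss, entropy=ent, grad_norm=gnorm)

    # Process agent commands (checkpoint, rollback, pause/resume)
    tracker.poll()

    # Pick up live param adjustments
    new_lr = tracker.get_override("lr")
    if new_lr is not None:
        # PyTorch optimizers keep the learning rate in each param group;
        # assigning optimizer.lr would have no effect on training.
        for group in optimizer.param_groups:
            group["lr"] = new_lr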
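The loop logs `ent` and `gnorm` without showing how they are computed. The framework-agnostic sketch below shows the arithmetic. It assumes `ent` is the mean policy entropy over a batch of action distributions. It also assumes `gnorm` is the global L2 norm of all gradients treated as one vector, which is the value PyTorch's `torch.nn.utils.clip_grad_norm_` returns. The helper names are hypothetical. In practice you would compute both on tensors in your framework.

```python
import math
from typing import Sequence


def policy_entropy(batch_probs: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) over a batch of action distributions."""
    total = 0.0
    for probs in batch_probs:
        # Skip zero-probability actions: p * log(p) -> 0 as p -> 0.
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(batch_probs)


def global_grad_norm(grads: Sequence[Sequence[float]]) -> float:
    """L2 norm of every gradient entry, treated as one flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


# A uniform 4-way policy (entropy ln 4) and a deterministic one (entropy 0).
ent = policy_entropy([[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]])
gnorm = global_grad_norm([[3.0, 0.0], [4.0]])
print(round(ent, 4), gnorm)  # 0.6931 5.0
```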

Connect an agent

{
  "mcpServers": {
    "cortex": {
      "command": "cortex"
    }
  }
}

MCP Tools

Observe

Tool                  Description
get_status            Step, progress %, ETA, steps/sec, phase
get_metrics           Latest values of all tracked metrics
get_metric_history    Time series for trend analysis
get_config            Hyperparameter configuration
list_metrics          All metric names being tracked
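`get_status` reports progress, ETA, and throughput. The sketch below shows one way to derive those numbers. It assumes the server keeps (step, wall-clock time) pairs and knows `total_steps` from `tracker.config`. `compute_status`, the `Status` fields, and the 50-sample window are hypothetical, and Cortex's actual output may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Status:
    step: int
    progress_pct: float
    steps_per_sec: Optional[float]
    eta_sec: Optional[float]


def compute_status(samples: Sequence[Tuple[int, float]], total_steps: int,
                   window: int = 50) -> Status:
    """samples: (step, unix_time) pairs in logging order."""
    step, _ = samples[-1]
    progress = 100.0 * step / total_steps
    # Estimate throughput over a recent window so slow warm-up steps
    # don't skew the rate for the rest of the run.
    recent = samples[-window:]
    if len(recent) < 2 or recent[-1][1] <= recent[0][1]:
        return Status(step, progress, None, None)
    (s0, t0), (s1, t1) = recent[0], recent[-1]
    rate = (s1 - s0) / (t1 - t0)
    eta = (total_steps - step) / rate if rate > 0 else None
    return Status(step, progress, rate, eta)


status = compute_status([(0, 0.0), (100, 10.0), (200, 20.0)], total_steps=100_000)
print(status)
```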

Detect

Tool                  Description
diagnose              Run all anomaly detectors, get structured findings with severity and recommendations
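`diagnose` returns structured findings with severity and recommendations. The sketch below shows what a typed finding and a diagnose pass could look like. It assumes each detector is a function from metric history to an optional finding. `Finding`, `Severity`, `run_detectors`, and the two toy detectors are hypothetical names, not Cortex's schema.

```python
import math
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List, Optional

History = Dict[str, List[float]]  # metric name -> values in step order


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Finding:
    detector: str
    severity: Severity
    explanation: str
    values: Dict[str, float] = field(default_factory=dict)
    recommendation: str = ""


Detector = Callable[[History], Optional[Finding]]


def run_detectors(history: History, detectors: List[Detector]) -> List[Finding]:
    """Run every detector; return findings with the most severe first."""
    findings: List[Finding] = []
    for detect in detectors:
        finding = detect(history)
        if finding is not None:
            findings.append(finding)
    findings.sort(key=lambda f: f.severity, reverse=True)
    return findings


# Two toy detectors; the 0.05 entropy threshold is purely illustrative.
def nonfinite_loss(history: History) -> Optional[Finding]:
    loss = history.get("loss", [])
    if loss and not math.isfinite(loss[-1]):
        return Finding("loss_divergence", Severity.CRITICAL,
                       "Latest loss is NaN or inf.", {"loss": loss[-1]},
                       "Roll back to the most recent checkpoint.")
    return None


def low_entropy(history: History) -> Optional[Finding]:
    ent = history.get("entropy", [])
    if ent and ent[-1] < 0.05:
        return Finding("entropy_collapse", Severity.WARNING,
                       "Policy entropy is close to zero.", {"entropy": ent[-1]},
                       "Consider increasing exploration, e.g. an entropy bonus.")
    return None


findings = run_detectors({"loss": [0.9, float("nan")], "entropy": [0.5, 0.01]},
                         [low_entropy, nonfinite_loss])
print([(f.detector, f.severity.name) for f in findings])
# [('loss_divergence', 'CRITICAL'), ('entropy_collapse', 'WARNING')]
```

Sorting by severity lets an agent handle the most dangerous finding first (here, the NaN loss) before it looks at slower-moving problems.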

Intervene

Tool                  Description
adjust_param          Change a hyperparameter (guarded: max % change, cooldown, requires checkpoint)
save_checkpoint       Save model state for later rollback
rollback              Restore model to a checkpoint
pause_training        Pause the loop to analyze and decide
resume_training       Continue after pause
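The README doesn't say how `pause_training` actually blocks the loop. One plausible mechanism, assumed here rather than taken from Cortex, is a `threading.Event`. The training thread waits on the event during its per-step poll, while the agent side clears it to pause and sets it to resume. `PauseGate` and its methods are hypothetical.

```python
import threading
import time
from typing import Optional


class PauseGate:
    """Hypothetical pause/resume switch shared by agent and training threads."""

    def __init__(self) -> None:
        self._running = threading.Event()
        self._running.set()  # start unpaused

    def pause(self) -> None:   # what a pause_training handler might call
        self._running.clear()

    def resume(self) -> None:  # what a resume_training handler might call
        self._running.set()

    @property
    def paused(self) -> bool:
        return not self._running.is_set()

    def wait_if_paused(self, timeout: Optional[float] = None) -> bool:
        """Block while paused; return True once running (False on timeout)."""
        return self._running.wait(timeout)


gate = PauseGate()
gate.pause()
# Simulate the agent resuming training from another thread after 0.1 s.
threading.Timer(0.1, gate.resume).start()
start = time.monotonic()
gate.wait_if_paused()  # the training thread blocks here, e.g. inside poll()
print(f"resumed after {time.monotonic() - start:.2f}s")
```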

Review

Tool                  Description
get_intervention_log  Full history of every change with before/after metrics
get_guardrail_status  Rate limits, cooldowns, intervention count
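`get_intervention_log` records each change with before/after metrics. The sketch below shows what one log entry might hold and how the before/after comparison could be summarized. The `Intervention` record, its fields, and the `metric_deltas` helper are hypothetical, not Cortex's log format.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intervention:
    param: str
    old_value: float
    new_value: float
    reason: str
    checkpoint_tag: str  # checkpoint saved before the change, for rollback
    step: int
    timestamp: float = field(default_factory=time.time)
    metrics_before: Dict[str, float] = field(default_factory=dict)
    metrics_after: Optional[Dict[str, float]] = None  # filled in later

    def metric_deltas(self) -> Dict[str, float]:
        """after - before, for every metric present in both snapshots."""
        if self.metrics_after is None:
            return {}
        return {k: self.metrics_after[k] - v
                for k, v in self.metrics_before.items()
                if k in self.metrics_after}


entry = Intervention("lr", 3e-4, 1.5e-4, "grad_norm spiking", "step_4000", 4000,
                     metrics_before={"loss": 2.0, "grad_norm": 8.0})
entry.metrics_after = {"loss": 1.5, "grad_norm": 2.0}
print(entry.metric_deltas())  # {'loss': -0.5, 'grad_norm': -6.0}
```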

Detectors

Built-in anomaly detectors that produce structured findings:

Detector          What it catches
entropy_collapse  Policy committing too early (entropy dropping fast while not improving)
loss_divergence   NaN/inf loss, or loss increasing rapidly
loss_imbalance    Value loss drowning out policy loss signal
grad_instability  Gradient norm spikes that precede divergence
eval_plateau      Eval score stopped improving
learning_stall    Loss stopped decreasing

Each finding includes severity, explanation, the specific metric values, and a recommended action.
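The sketch below illustrates rule-based detection over metric history with two of the detectors above, `loss_divergence` and `grad_instability`. The window sizes and thresholds are assumptions chosen for the example, not Cortex's defaults: a 20% rise over a 20-step window, and a spike above 5x the median of the previous 50 values. Findings are plain dicts so the snippet stands alone.

```python
import math
import statistics
from typing import Dict, List, Optional

Finding = Dict[str, object]


def loss_divergence(loss: List[float], window: int = 20,
                    max_rise: float = 0.2) -> Optional[Finding]:
    """Flag NaN/inf loss, or a loss that rose by more than max_rise over the window."""
    if not loss:
        return None
    latest = loss[-1]
    if not math.isfinite(latest):
        return {"severity": "critical", "explanation": "loss is NaN or inf",
                "values": {"loss": latest}, "recommendation": "roll back"}
    recent = loss[-window:]
    start = recent[0]
    # Relative rise only makes sense for a full window and a positive baseline.
    if len(recent) == window and start > 0 and (latest - start) / start > max_rise:
        return {"severity": "high",
                "explanation": f"loss rose {100 * (latest - start) / start:.0f}% in {window} steps",
                "values": {"start": start, "latest": latest},
                "recommendation": "lower the learning rate"}
    return None


def grad_instability(grad_norm: List[float], window: int = 50,
                     spike_factor: float = 5.0) -> Optional[Finding]:
    """Flag a gradient norm far above the median of the preceding window."""
    if len(grad_norm) <= window:
        return None
    baseline = statistics.median(grad_norm[-window - 1:-1])  # excludes latest
    latest = grad_norm[-1]
    if baseline > 0 and latest > spike_factor * baseline:
        return {"severity": "medium",
                "explanation": f"grad norm {latest:.2f} is {latest / baseline:.1f}x the recent median",
                "values": {"median": baseline, "latest": latest},
                "recommendation": "clip gradients or lower the learning rate"}
    return None


print(loss_divergence([1.0] * 19 + [1.5])["explanation"])  # loss rose 50% in 20 steps
print(grad_instability([1.0] * 50 + [9.0])["explanation"])
```

Comparing against a median rather than a mean keeps a single earlier spike from inflating the baseline and masking the next one.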

Guardrails

Every intervention is validated before execution:

  • Max % change: Can't change a param by more than 50% at once (configurable)
  • Cooldown: 30s minimum between adjustments to the same param
  • Rate limit: Max 20 interventions per hour
  • Checkpoint required: Must save a checkpoint before making any adjustment
  • Full logging: Every intervention recorded with before/after metrics and reason
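The sketch below shows how the checks listed above could be enforced before an adjustment runs. The defaults mirror the numbers in this README (50%, 30 s, 20 per hour). `Guardrails`, `check`, `apply`, and the injectable clock are hypothetical names for illustration, not Cortex's API. The sketch also makes three assumptions:

  • The rate limit is a sliding one-hour window.
  • "Checkpoint required" means a checkpoint saved since the previous adjustment.
  • Rejected attempts are logged too.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class Guardrails:
    """Hypothetical validator for parameter adjustments."""

    def __init__(self, max_change: float = 0.5, cooldown_s: float = 30.0,
                 max_per_hour: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_change = max_change
        self.cooldown_s = cooldown_s
        self.max_per_hour = max_per_hour
        self.clock = clock
        self.last_change: Dict[str, float] = {}  # param -> time of last change
        self.recent: Deque[float] = deque()      # times of accepted changes
        self.checkpoint_since_last_change = False
        self.log: List[dict] = []

    def record_checkpoint(self) -> None:
        self.checkpoint_since_last_change = True

    def check(self, param: str, old: float, new: float) -> Optional[str]:
        """Return a rejection reason, or None if the change is allowed."""
        now = self.clock()
        while self.recent and now - self.recent[0] >= 3600:
            self.recent.popleft()  # forget changes older than one hour
        # Relative change is undefined when old == 0, so skip that check then.
        if old != 0 and abs(new - old) / abs(old) > self.max_change:
            return f"change of {abs(new - old) / abs(old):.0%} exceeds {self.max_change:.0%}"
        if param in self.last_change and now - self.last_change[param] < self.cooldown_s:
            return f"{param} is in cooldown"
        if len(self.recent) >= self.max_per_hour:
            return "hourly intervention limit reached"
        if not self.checkpoint_since_last_change:
            return "save a checkpoint first"
        return None

    def apply(self, param: str, old: float, new: float, reason: str) -> bool:
        rejection = self.check(param, old, new)
        self.log.append({"param": param, "old": old, "new": new, "reason": reason,
                         "accepted": rejection is None, "rejection": rejection})
        if rejection is not None:
            return False
        now = self.clock()
        self.last_change[param] = now
        self.recent.append(now)
        self.checkpoint_since_last_change = False  # next change needs a new one
        return True


fake_time = [0.0]
rails = Guardrails(clock=lambda: fake_time[0])
print(rails.apply("lr", 1e-3, 8e-4, "loss plateau"))  # False: no checkpoint yet
rails.record_checkpoint()
print(rails.apply("lr", 1e-3, 8e-4, "loss plateau"))  # True
rails.record_checkpoint()
fake_time[0] = 10.0
print(rails.check("lr", 8e-4, 7e-4))                  # lr is in cooldown
```

Injecting the clock keeps the cooldown and hourly-limit logic testable without real waiting.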

Install

pip install cortex-mcp

Or from source:

pip install -e .
