rlm-mcp-server

Provides recursive language model capabilities to AI assistants, enabling efficient exploration of large contexts through iterative Python code execution.

RLM MCP Server

"The difference between the Enterprise's computer and a Culture Mind is that the Mind doesn't try to hold everything in immediate consciousness—it knows how to efficiently explore vast data stores."

A containerized MCP (Model Context Protocol) server that provides Recursive Language Model capabilities to any AI assistant.

What is RLM?

Based on the paper "Recursive Language Models" from MIT CSAIL, RLM treats large contexts as external environments that can be explored programmatically rather than stuffed into a context window.

Instead of cramming a 500-page document into context, RLM teaches the model to write Python code that explores the document—searching, parsing, counting, extracting—building up understanding iteratively until it can answer your question.

Quick Start

With Docker Compose (recommended)

# Enter the repository directory (after cloning it)
cd rlm-mcp-server

# With OpenAI
OPENAI_API_KEY=sk-xxx docker compose up

# With local llama.cpp server (running on port 8080)
RLM_API_BASE=http://host.docker.internal:8080/v1 docker compose up

# With Ollama
RLM_API_BASE=http://host.docker.internal:11434/v1 RLM_MODEL=llama3.2 docker compose up

With Docker directly

# Build
docker build -t rlm-mcp-server .

# Run with OpenAI
docker run -e OPENAI_API_KEY=sk-xxx -p 8765:8765 rlm-mcp-server

# Run with local LLM
docker run \
  -e RLM_API_BASE=http://host.docker.internal:8080/v1 \
  --add-host host.docker.internal:host-gateway \
  -p 8765:8765 \
  rlm-mcp-server

Integration with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "rlm": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--network", "host",
        "-e", "OPENAI_API_KEY",
        "rlm-mcp-server",
        "python", "src/server.py", "--transport", "stdio"
      ],
      "env": {
        "OPENAI_API_KEY": "your-key-here"
      }
    }
  }
}

See examples/ for more configuration options including local LLM setups.
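
If claude_desktop_config.json already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch: the helper name add_rlm_server is hypothetical, and the location of the config file differs by operating system, so the caller passes it in.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
RLM_ENTRY = {
    "command": "docker",
    "args": [
        "run", "--rm", "-i",
        "--network", "host",
        "-e", "OPENAI_API_KEY",
        "rlm-mcp-server",
        "python", "src/server.py", "--transport", "stdio",
    ],
    "env": {"OPENAI_API_KEY": "your-key-here"},
}


def add_rlm_server(config_path: Path, entry: dict = RLM_ENTRY) -> dict:
    """Merge the "rlm" entry into a config file, keeping any other servers."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["rlm"] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Replace the placeholder key before writing the file, and restart Claude Desktop so it re-reads the configuration.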

Available Tools

Session Management

| Tool | Description |
| --- | --- |
| rlm_load_context | Load text content into a session |
| rlm_load_file | Load a file into a session |
| rlm_list_sessions | List active sessions |
| rlm_close_session | Close a session to free memory |
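
The server's actual storage code is not shown in this README. As a rough illustration of the lifecycle these four tools imply, a session can be thought of as an ID mapped to loaded text. The class and method names below are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical in-memory store mapping a session ID to loaded text."""

    _sessions: dict = field(default_factory=dict)

    def load_context(self, text: str) -> str:
        # A random hex ID stands in for whatever ID scheme the server uses.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = text
        return session_id

    def list_sessions(self) -> dict:
        # Report sizes only, so listing never copies the contexts themselves.
        return {sid: len(text) for sid, text in self._sessions.items()}

    def close_session(self, session_id: str) -> bool:
        # Dropping the reference lets Python reclaim the memory.
        return self._sessions.pop(session_id, None) is not None
```

Closing sessions matters because each one holds the full text in memory until it is released.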

Querying

| Tool | Description |
| --- | --- |
| rlm_query | Ask a question about loaded context (iterative exploration) |
| rlm_quick_query | One-shot: load and query in one call |
| rlm_execute_code | Execute Python directly against context (power user) |
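
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message using only the standard library. The argument names ("context", "question") are assumptions rather than the documented schema; a client can discover the real input schema through the protocol's `tools/list` method.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs only need to be unique per connection


def tool_call(name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Argument names here are assumptions, not the server's documented schema.
print(tool_call("rlm_quick_query", {
    "context": "long document text goes here",
    "question": "Which sections mention quarterly revenue?",
}))
```

In practice an MCP client library or Claude Desktop sends this message for you; the sketch only shows what travels over the wire.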

Configuration

| Tool | Description |
| --- | --- |
| rlm_config | View current server configuration |

Usage Examples

Load and query a large document

Human: Load this 200-page PDF transcript and find all mentions of "quarterly revenue"

Claude: I'll use RLM to explore this large document.

[Uses rlm_load_context to load the document]
[Uses rlm_query with question "Find all mentions of quarterly revenue with surrounding context"]

Based on exploring the document, I found 47 mentions of quarterly revenue...
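
For a question like this one, the exploration code the model writes might resemble the sketch below. It is illustrative only: the helper name find_mentions and the window size are made up, and the small CONTEXT string stands in for the loaded document.

```python
import re


def find_mentions(context: str, phrase: str, window: int = 80) -> list[str]:
    """Return each case-insensitive match of `phrase` with `window` characters on both sides."""
    snippets = []
    for match in re.finditer(re.escape(phrase), context, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(context), match.end() + window)
        # Flatten newlines so each snippet prints on one line.
        snippets.append(context[start:end].replace("\n", " "))
    return snippets


# Stand-in for the CONTEXT variable that the REPL exposes.
CONTEXT = "Q1 recap. Quarterly revenue rose.\nLater, quarterly revenue fell."
for snippet in find_mentions(CONTEXT, "quarterly revenue", window=10):
    print(snippet)
```

Because only the printed snippets go back to the model, the full document never has to fit in its context window.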

Analyze a codebase

Human: Here's our entire codebase (500 files). What authentication methods are used?

Claude: I'll load this into RLM and explore it programmatically.

[Uses rlm_load_context with the codebase]
[Uses rlm_query to explore authentication patterns]

After exploring the codebase, I found three authentication methods:
1. JWT tokens in /api/auth/...
2. OAuth2 in /integrations/...
3. API keys in /external/...

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| RLM_MODEL | gpt-4o-mini | Primary model for RLM |
| RLM_SUB_MODEL | Same as RLM_MODEL | Model for iterations (can be cheaper) |
| RLM_MAX_ITERATIONS | 15 | Max exploration iterations |
| RLM_API_BASE | OpenAI | API endpoint (for local models) |
| RLM_API_KEY / OPENAI_API_KEY | - | API key |
| RLM_SUB_API_BASE | Same as RLM_API_BASE | Separate endpoint for sub-model |
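
The fallback rules in the table can be read as the sketch below. It is not the server's own configuration code: the class and function names are hypothetical, and the precedence of RLM_API_KEY over OPENAI_API_KEY is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RLMSettings:
    model: str
    sub_model: str
    max_iterations: int
    api_base: Optional[str]  # None means the default OpenAI endpoint
    api_key: Optional[str]
    sub_api_base: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> RLMSettings:
    """Resolve settings using the defaults listed in the table above."""
    model = env.get("RLM_MODEL", "gpt-4o-mini")
    api_base = env.get("RLM_API_BASE")
    return RLMSettings(
        model=model,
        sub_model=env.get("RLM_SUB_MODEL", model),
        max_iterations=int(env.get("RLM_MAX_ITERATIONS", "15")),
        api_base=api_base,
        # Which key variable wins when both are set is an assumption.
        api_key=env.get("RLM_API_KEY") or env.get("OPENAI_API_KEY"),
        sub_api_base=env.get("RLM_SUB_API_BASE", api_base),
    )
```

With no variables set, both models resolve to gpt-4o-mini and both endpoints to the OpenAI default; setting only RLM_SUB_MODEL and RLM_SUB_API_BASE redirects the iteration model while leaving the primary one untouched.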

Cost Optimization

You can use a cheaper/local model for the iterative exploration while using a more capable model for initialization:

RLM_MODEL=gpt-4o \
RLM_SUB_MODEL=gpt-4o-mini \
docker compose up

Or use a local model for iterations entirely:

RLM_MODEL=gpt-4o \
RLM_SUB_MODEL=local-model \
RLM_SUB_API_BASE=http://host.docker.internal:8080/v1 \
docker compose up

How It Works

  1. Load Context: Your massive document/codebase is stored in a session
  2. Question: You ask a question about the content
  3. Exploration: The LLM writes Python code to explore the context
  4. Iteration: Code executes, LLM sees results, writes more code
  5. Answer: When confident, LLM provides final answer
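
The five steps above can be sketched as a short loop. This is a simplified illustration, not the server's engine: ask_llm is a hypothetical callable standing in for the model, the "FINAL:" prefix is a made-up convention for signalling the answer, and exec runs without any sandboxing.

```python
import contextlib
import io
from typing import Callable


def run_rlm(question: str, context: str,
            ask_llm: Callable[[str], str], max_iterations: int = 15) -> str:
    """Loop: the model writes code, sees its output, and eventually answers."""
    namespace = {"CONTEXT": context}  # persists across iterations
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = ask_llm(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(reply, namespace)  # unsandboxed; illustration only
            result = buffer.getvalue()
        except Exception as exc:  # feed errors back so the model can recover
            result = f"Error: {exc!r}"
        transcript += f"\nCode:\n{reply}\nOutput:\n{result}\n"
    return "No answer within the iteration limit."


def fake_llm(transcript: str) -> str:
    """Scripted stand-in for a model: one exploration step, then an answer."""
    if "Output:" not in transcript:
        return "print(CONTEXT.count('error'))"
    return "FINAL: " + transcript.rsplit("Output:\n", 1)[1].strip()


print(run_rlm("How many errors?", "error a error b", fake_llm))  # prints 2
```

The iteration cap plays the role of RLM_MAX_ITERATIONS: it bounds cost when the model never reaches a confident answer.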

The REPL environment has access to:

  • CONTEXT - the full loaded text
  • re - regex module
  • json - JSON module
  • Counter, defaultdict - from collections
  • Standard Python builtins
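
One way to picture that environment is as a dictionary of names handed to Python's exec. The sketch below is an assumption about the general shape, not the server's implementation; make_namespace and execute are hypothetical names.

```python
import contextlib
import io
import json
import re
from collections import Counter, defaultdict


def make_namespace(context: str) -> dict:
    """Names made visible to model-written code, following the list above."""
    return {
        "CONTEXT": context,
        "re": re,
        "json": json,
        "Counter": Counter,
        "defaultdict": defaultdict,
    }  # exec() adds __builtins__ automatically when the key is absent


def execute(code: str, namespace: dict) -> str:
    """Run code in the namespace and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


ns = make_namespace("alpha beta alpha gamma alpha")
print(execute("print(Counter(re.findall(r'\\w+', CONTEXT)).most_common(1))", ns), end="")
# prints [('alpha', 3)]
```

Reusing the same namespace between calls lets variables defined in one step stay available in the next, which is what makes step-by-step exploration possible.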

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Claude Desktop                        │
│                         or                               │
│                    Any MCP Client                        │
└────────────────────────┬────────────────────────────────┘
                         │ MCP Protocol
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   RLM MCP Server                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   Sessions   │  │   RLM Core   │  │  REPL Env    │  │
│  │   Storage    │  │   Engine     │  │  (Python)    │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└────────────────────────┬────────────────────────────────┘
                         │ OpenAI-compatible API
                         ▼
┌─────────────────────────────────────────────────────────┐
│           LLM Backend (OpenAI / Local / Ollama)          │
└─────────────────────────────────────────────────────────┘

Development

# Install dependencies
pip install -r requirements.txt

# Run locally (stdio mode for testing)
cd src && python server.py --transport stdio

# Run tests
pytest tests/

Transferring to Offline Lab (Mojoverse)

  1. Build the image on Cybertron (see Quick Start), then export it as a compressed archive:

    docker save rlm-mcp-server:latest | gzip > rlm-mcp-server.tar.gz
    
  2. Transfer to Mojoverse via your usual method

  3. Load on Mojoverse:

    gunzip -c rlm-mcp-server.tar.gz | docker load
    
  4. Run with local L4-powered llama.cpp:

    RLM_API_BASE=http://localhost:8080/v1 docker compose up
    

License

MIT
