rlm-mcp-server
Provides recursive language model capabilities to AI assistants, enabling efficient exploration of large contexts through iterative Python code execution.
RLM MCP Server
"The difference between the Enterprise's computer and a Culture Mind is that the Mind doesn't try to hold everything in immediate consciousness—it knows how to efficiently explore vast data stores."
A containerized MCP (Model Context Protocol) server that provides Recursive Language Model capabilities to any AI assistant.
What is RLM?
Based on the paper "Recursive Language Models" from MIT CSAIL, RLM treats large contexts as external environments that can be explored programmatically rather than stuffed into a context window.
Instead of cramming a 500-page document into context, RLM teaches the model to write Python code that explores the document—searching, parsing, counting, extracting—building up understanding iteratively until it can answer your question.
Quick Start
With Docker Compose (recommended)
# Clone and enter directory
cd rlm-mcp-server
# With OpenAI
OPENAI_API_KEY=sk-xxx docker compose up
# With local llama.cpp server (running on port 8080)
RLM_API_BASE=http://host.docker.internal:8080/v1 docker compose up
# With Ollama
RLM_API_BASE=http://host.docker.internal:11434/v1 RLM_MODEL=llama3.2 docker compose up
With Docker directly
# Build
docker build -t rlm-mcp-server .
# Run with OpenAI
docker run -e OPENAI_API_KEY=sk-xxx -p 8765:8765 rlm-mcp-server
# Run with local LLM
docker run \
-e RLM_API_BASE=http://host.docker.internal:8080/v1 \
--add-host host.docker.internal:host-gateway \
-p 8765:8765 \
rlm-mcp-server
Integration with Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"rlm": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--network", "host",
"-e", "OPENAI_API_KEY",
"rlm-mcp-server",
"python", "src/server.py", "--transport", "stdio"
],
"env": {
"OPENAI_API_KEY": "your-key-here"
}
}
}
}
See examples/ for more configuration options including local LLM setups.
Available Tools
Session Management
| Tool | Description |
|---|---|
| rlm_load_context | Load text content into a session |
| rlm_load_file | Load a file into a session |
| rlm_list_sessions | List active sessions |
| rlm_close_session | Close a session to free memory |
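To make the session lifecycle concrete, here is a minimal sketch of an in-memory store that mirrors the load, list, and close tools. The class `SessionStore` and its method names are hypothetical and are not taken from the server's source; the real implementation may track more metadata or persist sessions differently.

```python
import uuid


class SessionStore:
    """Hypothetical in-memory session store; illustrative only."""

    def __init__(self):
        self._sessions = {}

    def load_context(self, text):
        """Store text under a fresh session id and return that id."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = text
        return session_id

    def list_sessions(self):
        """Map each active session id to the size of its context in characters."""
        return {sid: len(text) for sid, text in self._sessions.items()}

    def close_session(self, session_id):
        """Drop a session; return True if it existed."""
        return self._sessions.pop(session_id, None) is not None


store = SessionStore()
sid = store.load_context("a very large document...")
print(store.list_sessions())
print(store.close_session(sid))
```

Closing a session removes the only reference to its text, which is what lets Python reclaim the memory.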
Querying
| Tool | Description |
|---|---|
| rlm_query | Ask a question about loaded context (iterative exploration) |
| rlm_quick_query | One-shot: load and query in one call |
| rlm_execute_code | Execute Python directly against context (power user) |
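For clients that speak MCP directly, a tool invocation is a JSON-RPC 2.0 request using the `tools/call` method. The sketch below builds such a request for `rlm_query`. The argument names `session_id` and `question` are assumptions made for illustration; the authoritative names are whatever schema the server advertises through `tools/list`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are hypothetical; check the tool schema the server advertises.
request = build_tool_call(
    1,
    "rlm_query",
    {
        "session_id": "example-session",
        "question": "Which authentication methods are used?",
    },
)
print(json.dumps(request, indent=2))
```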
Configuration
| Tool | Description |
|---|---|
| rlm_config | View current server configuration |
Usage Examples
Load and query a large document
Human: Load this 200-page PDF transcript and find all mentions of "quarterly revenue"
Claude: I'll use RLM to explore this large document.
[Uses rlm_load_context to load the document]
[Uses rlm_query with question "Find all mentions of quarterly revenue with surrounding context"]
Based on exploring the document, I found 47 mentions of quarterly revenue...
Analyze a codebase
Human: Here's our entire codebase (500 files). What authentication methods are used?
Claude: I'll load this into RLM and explore it programmatically.
[Uses rlm_load_context with the codebase]
[Uses rlm_query to explore authentication patterns]
After exploring the codebase, I found three authentication methods:
1. JWT tokens in /api/auth/...
2. OAuth2 in /integrations/...
3. API keys in /external/...
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RLM_MODEL | gpt-4o-mini | Primary model for RLM |
| RLM_SUB_MODEL | Same as RLM_MODEL | Model for iterations (can be cheaper) |
| RLM_MAX_ITERATIONS | 15 | Max exploration iterations |
| RLM_API_BASE | OpenAI | API endpoint (for local models) |
| RLM_API_KEY / OPENAI_API_KEY | - | API key |
| RLM_SUB_API_BASE | Same as RLM_API_BASE | Separate endpoint for sub-model |
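The fallback rules in the table ("Same as RLM_MODEL", "Same as RLM_API_BASE") can be sketched as follows. This is an illustration of the documented defaults, not the server's actual loader: the function name `load_config`, the dictionary keys, and the precedence of `RLM_API_KEY` over `OPENAI_API_KEY` are all assumptions.

```python
import os


def load_config(env=None):
    """Resolve settings from environment variables using the documented defaults."""
    env = os.environ if env is None else env
    model = env.get("RLM_MODEL", "gpt-4o-mini")
    api_base = env.get("RLM_API_BASE")  # None here stands for the OpenAI default
    return {
        "model": model,
        "sub_model": env.get("RLM_SUB_MODEL", model),
        "max_iterations": int(env.get("RLM_MAX_ITERATIONS", "15")),
        "api_base": api_base,
        "sub_api_base": env.get("RLM_SUB_API_BASE", api_base),
        # Precedence between the two key variables is an assumption.
        "api_key": env.get("RLM_API_KEY") or env.get("OPENAI_API_KEY"),
    }


config = load_config(
    {
        "RLM_MODEL": "gpt-4o",
        "RLM_SUB_API_BASE": "http://host.docker.internal:8080/v1",
    }
)
print(config["sub_model"], config["sub_api_base"])
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching your shell.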
Cost Optimization
You can use a cheaper or local model for the iterative exploration while keeping a more capable model for initialization:
RLM_MODEL=gpt-4o \
RLM_SUB_MODEL=gpt-4o-mini \
docker compose up
Or use a local model for iterations entirely:
RLM_MODEL=gpt-4o \
RLM_SUB_MODEL=local-model \
RLM_SUB_API_BASE=http://host.docker.internal:8080/v1 \
docker compose up
How It Works
- Load Context: Your massive document/codebase is stored in a session
- Question: You ask a question about the content
- Exploration: The LLM writes Python code to explore the context
- Iteration: Code executes, LLM sees results, writes more code
- Answer: When confident, LLM provides final answer
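The steps above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the server's implementation: the function names `explore` and `run_code`, the `FINAL:` stop marker, and the transcript format are all hypothetical, and a scripted `fake_model` stands in for a real LLM call so the example runs offline.

```python
import contextlib
import io


def run_code(code, env):
    """Execute model-written code and capture what it prints."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)
    except Exception as exc:  # report errors back to the model instead of crashing
        buf.write(f"{type(exc).__name__}: {exc}")
    return buf.getvalue().strip()


def explore(question, context, ask_model, max_iterations=15):
    """Alternate between model turns and code execution until an answer appears."""
    env = {"CONTEXT": context}
    transcript = [f"Question: {question}"]
    for _ in range(max_iterations):
        reply = ask_model(transcript)
        if reply.startswith("FINAL:"):  # hypothetical stop marker
            return reply[len("FINAL:"):].strip()
        transcript.append(f"Code: {reply}")
        transcript.append(f"Output: {run_code(reply, env)}")
    return None  # iteration budget exhausted


# Scripted stand-in for a real LLM call, so the sketch runs offline.
def fake_model(transcript):
    if len(transcript) == 1:
        return "print(CONTEXT.count('error'))"
    return f"FINAL: the word appears {transcript[-1].split(': ')[1]} times"


answer = explore("How often does 'error' appear?", "error ok error", fake_model)
print(answer)
```

Note that a bare `exec` offers no isolation; running model-written code safely depends on an outer boundary such as the container the server ships in.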
The REPL environment has access to:
- CONTEXT - the full loaded text
- re - regex module
- json - JSON module
- Counter, defaultdict - from collections
- Standard Python builtins
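As a rough illustration of the kind of code the model might write in this environment, the sketch below finds every mention of a phrase and reports where the mentions cluster. `CONTEXT` is assigned a small sample string here so the snippet runs on its own; inside the REPL it would already hold the loaded text. The helper name `find_mentions` is hypothetical.

```python
import re
from collections import Counter

# Stand-in for the loaded document; in the REPL this name is provided for you.
CONTEXT = (
    "Q1 report: quarterly revenue rose 4%.\n"
    "Staffing was flat.\n"
    "Q2 report: Quarterly revenue fell 1%, but quarterly revenue guidance held.\n"
)


def find_mentions(text, phrase, window=20):
    """Return (offset, snippet) pairs for each case-insensitive match."""
    hits = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        hits.append((m.start(), text[start:end].replace("\n", " ")))
    return hits


mentions = find_mentions(CONTEXT, "quarterly revenue")

# Count matches per zero-based line index to see where they cluster.
per_line = Counter(CONTEXT.count("\n", 0, offset) for offset, _ in mentions)

print(len(mentions))
print(dict(per_line))
```

Printing compact summaries like these, rather than the raw text, is what keeps each iteration's output small enough to feed back to the model.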
Architecture
┌─────────────────────────────────────────────────────────┐
│ Claude Desktop │
│ or │
│ Any MCP Client │
└────────────────────────┬────────────────────────────────┘
│ MCP Protocol
▼
┌─────────────────────────────────────────────────────────┐
│ RLM MCP Server │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Sessions │ │ RLM Core │ │ REPL Env │ │
│ │ Storage │ │ Engine │ │ (Python) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────────────────────┬────────────────────────────────┘
│ OpenAI-compatible API
▼
┌─────────────────────────────────────────────────────────┐
│ LLM Backend (OpenAI / Local / Ollama) │
└─────────────────────────────────────────────────────────┘
Development
# Install dependencies
pip install -r requirements.txt
# Run locally (stdio mode for testing)
cd src && python server.py --transport stdio
# Run tests
pytest tests/
Transferring to Offline Lab (Mojoverse)
- Build the image on Cybertron:
docker save rlm-mcp-server:latest | gzip > rlm-mcp-server.tar.gz
- Transfer to Mojoverse via your usual method
- Load on Mojoverse:
gunzip -c rlm-mcp-server.tar.gz | docker load
- Run with local L4-powered llama.cpp:
RLM_API_BASE=http://localhost:8080/v1 docker compose up
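Because the archive moves between machines by hand, it may be worth comparing a checksum on both sides before loading it. The sketch below is an optional addition, not part of the original workflow; `sha256_of` is a hypothetical helper, and the demo hashes a throwaway file so it runs anywhere.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large image archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; point sha256_of at rlm-mcp-server.tar.gz in practice.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive bytes")
checksum = sha256_of(tmp.name)
os.remove(tmp.name)
print(checksum)
```

Run it on each machine and compare the printed digests; a mismatch means the transfer corrupted the archive.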
License
MIT