scholar-memory
Enables scientific literature research through multi-agent search, analysis, and semantic memory, exposing 9 MCP tools for querying, storing, and retrieving research findings.
ScholarAgent
A multi-agent scientific literature research system that discovers papers on arXiv, Semantic Scholar, and GitHub, analyzes them through a 5-agent pipeline, and persists the findings in a semantic memory layer exposed to your coding agent as an MCP server.
The 30-second version
Ask your editor "what's the state of the art on RLHF reward models?" and ScholarAgent will:
- Search arXiv + Semantic Scholar, follow citation graphs, pull code examples from GitHub.
- Have specialist agents (Reader, Critic, Analyst, Synthesizer) extract claims, score methodology, find themes and contradictions, then write a Markdown literature review.
- Index every finding with embeddings in a local SQLite database at ~/.scholaragent/memory.db.
- Expose 9 MCP tools so the next time you ask anything semantically adjacent, memory_lookup returns it instantly, with no re-research needed.
A research run takes a natural-language query and returns in ~5 seconds at quick depth, or in up to ~5 minutes at deep depth, which produces the cited review.
Architecture at a glance
Your query
│
▼
Dispatcher (orchestrator, writes Python that calls sub-agents)
│
├─► Scout search arXiv + Semantic Scholar, follow citations
├─► Reader extract key_claims, methodology, results, limitations
├─► Critic score rigor + relevance, flag bias, rate reliability
├─► Analyst find themes (3+ papers), contradictions, gaps
└─► Synthesizer write structured Markdown review with citations
│
▼
ContextStream (structured pipeline state + per-agent conversation traces)
│
▼
MemoryStore (SQLite + embeddings, source_types: paper | code | docs | synthesized_report)
│
▼
MCP Server (9 tools over stdio, registered in Claude Code, Cursor, Windsurf, VS Code)
Each agent runs an RLM loop: the LLM generates Python code in a sandboxed REPL, the code executes, the output feeds back to the LLM, repeat until FINAL() terminates.
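The loop described above can be sketched in a few lines. This is a minimal illustration, not ScholarAgent's actual implementation: `run_rlm_loop`, `scripted_llm`, and the `Final` exception are hypothetical names, the "LLM" is a scripted stand-in, and plain `exec` is used where the real system has a sandboxed REPL.

```python
import contextlib
import io


class Final(Exception):
    """Raised by FINAL() to stop the loop and carry the answer out."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def FINAL(value):
    raise Final(value)


def run_rlm_loop(generate_code, task, max_iterations=15):
    """Generate code, execute it, feed the output back, repeat until FINAL()."""
    namespace = {"FINAL": FINAL}
    observation = task
    for _ in range(max_iterations):
        code = generate_code(observation)       # stand-in for the LLM call
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)            # a real REPL would sandbox this
        except Final as done:
            return done.value
        except Exception as exc:                 # errors become observations too
            buffer.write(f"{type(exc).__name__}: {exc}")
        observation = buffer.getvalue()
    raise RuntimeError("no FINAL() within the iteration budget")


def scripted_llm(observation):
    """Hypothetical two-step 'model': compute first, then finish."""
    if "total=" not in observation:
        return "total = sum(range(5))\nprint(f'total={total}')"
    return "FINAL(total)"


answer = run_rlm_loop(scripted_llm, "add the integers 0 through 4")
print(answer)
```

Variables persist in the namespace between iterations, which is what lets the second step refer to `total` computed in the first.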
Multi-LLM routing. Scout uses a cheap fast model (gpt-4o-mini by default). Reader/Critic/Analyst/Synthesizer use a strong model (claude-sonnet-4-6 by default). Supported backends: OpenAI, Anthropic, and LM Studio (any OpenAI-compatible local server — zero API cost).
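As a rough sketch of the routing rule above, assuming a simple lookup table keyed by agent name: the backend and model defaults come from this README, while `ModelConfig`, `ROUTES`, and `model_for` are hypothetical names and not the project's router API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    backend: str
    model_name: str


# Defaults quoted from the README; the structure around them is hypothetical.
CHEAP = ModelConfig("openai", "gpt-4o-mini")
STRONG = ModelConfig("anthropic", "claude-sonnet-4-6")

ROUTES = {
    "scout": CHEAP,          # search is high-volume, so it gets the cheap model
    "reader": STRONG,
    "critic": STRONG,
    "analyst": STRONG,
    "synthesizer": STRONG,
}


def model_for(agent_name: str) -> ModelConfig:
    """Return the model configuration an agent should use."""
    try:
        return ROUTES[agent_name.lower()]
    except KeyError:
        raise ValueError(f"no model route for agent {agent_name!r}") from None


print(model_for("Scout"))
```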
The 9 MCP tools
| Tool | Purpose | Latency |
|---|---|---|
| memory_lookup | Semantic search: compact summaries with IDs | ~100ms |
| memory_get | Fetch full content for one entry by ID | instant |
| memory_research | Run new research (quick/normal/deep × implementation/theory/comparison) | 5s–5min |
| memory_store | Manually save a finding, snippet, or insight | instant |
| memory_forget | Delete by ID or semantic similarity | instant |
| memory_status | DB stats + token usage + cost totals | instant |
| memory_model_config | Show active LLM backend configuration | instant |
| memory_stream_list | List recent ContextStream runs | instant |
| memory_stream_get | Full pipeline state + traces for one run (understand why, not just what) | instant |
Depth levels:
- quick: Source search + dedup + index. No agents. ~5–10s.
- normal: Scout + Reader + Critic per paper (up to 3 parallel workers). ~30–60s.
- deep: Full Dispatcher orchestrating all 5 agents to produce a synthesized report. ~2–5min.
Focus modes (passed to every agent's system prompt):
- implementation: emphasize code, APIs, how-to.
- theory: emphasize concepts, algorithms, math.
- comparison: emphasize alternatives, benchmarks, trade-offs.
Fallback is visible. Every memory_research response includes both requested_depth and actual_depth. If deep fell back to normal because the dispatcher timed out, you'll see fallback_reason in the response — no silent downgrades.
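A caller can check for a downgrade along these lines. The three key names are the ones listed above; `describe_depth` and the sample response are hypothetical and only illustrate the comparison.

```python
def describe_depth(response: dict) -> str:
    """Summarize whether a research run was downgraded, and why."""
    requested = response["requested_depth"]
    actual = response["actual_depth"]
    if requested == actual:
        return f"ran at {actual} depth as requested"
    reason = response.get("fallback_reason") or "no reason reported"
    return f"requested {requested} but ran {actual}: {reason}"


# Hypothetical response; only the three depth-related keys come from the README.
sample = {
    "requested_depth": "deep",
    "actual_depth": "normal",
    "fallback_reason": "dispatcher timed out",
}
message = describe_depth(sample)
print(message)
```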
Installation
Prerequisites
- Python 3.12+
- One of:
  - OPENAI_API_KEY (for embeddings, required by default) plus ANTHROPIC_API_KEY (for the strong model in normal/deep research), or
  - A local LM Studio server (everything free, no API keys needed, see below)
- Optional: GITHUB_TOKEN for GitHub code search (otherwise you'll get 401s, which are non-fatal).
Install from PyPI (coming soon — currently install from source)
pip install scholaragent
scholaragent-install
The installer auto-detects:
- Claude Code: ~/.claude/settings.json
- Cursor: ~/.cursor/mcp.json
- Windsurf: ~/.windsurf/mcp.json
- VS Code: ~/.vscode/mcp.json
- LM Studio: ~/.lmstudio/mcp.json (same JSON shape as above). Also detects a running LM Studio at http://localhost:1234 and suggests --backend lmstudio.
- Codex CLI: ~/.codex/config.toml (TOML; upserts [mcp_servers.scholar-memory] without touching other sections).
- Docker Desktop MCP Toolkit: if ~/.docker/mcp/ exists, the installer prints the exact docker mcp server add … command to run (Docker manages its MCP registry via CLI, not a static file).
Each JSON-based target gets an mcpServers.scholar-memory entry in its JSON config, while Codex CLI gets a [mcp_servers.scholar-memory] table in ~/.codex/config.toml; both point at the scholaragent-server binary.
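For the JSON-based targets, the upsert might look roughly like the sketch below. The `mcpServers.scholar-memory` key and the `scholaragent-server` binary name come from the text; the `command` field inside the entry and the helper name `upsert_mcp_entry` are assumptions, and the real installer may write additional fields.

```python
import json
import tempfile
from pathlib import Path


def upsert_mcp_entry(config_path, name="scholar-memory", command="scholaragent-server"):
    """Add or replace one server entry while leaving every other key untouched."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command}        # assumed entry shape
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already registers another server.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "mcp.json"
    config_path.write_text(json.dumps({"mcpServers": {"other-server": {"command": "other"}}}))
    updated = upsert_mcp_entry(config_path)

print(sorted(updated["mcpServers"]))
```

Running the helper twice leaves the file unchanged after the first call, which is the property an upsert needs.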
Install from source (bash installer)
git clone https://github.com/byBasiliosP/RLM-Agent.git
cd RLM-Agent
# cloud backend — export your keys first
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..." # required for normal/deep depth
export GITHUB_TOKEN="ghp_..." # optional
./install.sh
# or, fully local via LM Studio — no keys required
./install.sh --backend lmstudio
install.sh will:
- Check Python ≥ 3.12.
- Create ./.venv/ and pip install -e . into it.
- Verify the scholaragent-server entry point exists.
- Choose a backend:
  - --backend lmstudio skips the cloud-key check entirely.
  - If cloud keys are missing and a local LM Studio is running on localhost:1234, the installer offers to switch ([Y/n] prompt). Pass --yes to auto-accept.
- Delegate registration to scholaragent-install, which upserts the MCP entry in every detected target (Claude Code / Cursor / Windsurf / VS Code / LM Studio / Codex CLI) and prints the exact docker mcp server add … line if Docker Desktop MCP Toolkit is present.
Install from source (manual)
git clone https://github.com/byBasiliosP/RLM-Agent.git
cd RLM-Agent
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
scholaragent-install
Install for LM Studio (no API keys)
scholaragent-install --backend lmstudio \
--strong-model qwen3-30b-a3b \
--cheap-model llama-3.2-3b-instruct
Then start LM Studio at http://localhost:1234/v1 with both models loaded. To also run embeddings locally, set these in your shell profile:
export SCHOLAR_EMBEDDING_BACKEND=lmstudio
export SCHOLAR_EMBEDDING_MODEL=text-embedding-nomic-embed-text-v1.5
Uninstall
scholaragent-install --uninstall
# or
./install.sh --uninstall
Removes the scholar-memory entry from every agent config it finds. The ~/.scholaragent/memory.db stays put — delete it manually if you want a clean slate.
Restart your coding agent
Every coding agent caches MCP configuration at startup. After install, fully restart (not just reload window) so the 9 new tools appear.
Configuration
All config is environment variables — no config files.
| Variable | Default | Purpose |
|---|---|---|
| OPENAI_API_KEY | (none) | Embeddings + OpenAI backend |
| ANTHROPIC_API_KEY | (none) | Anthropic backend (default strong model) |
| GITHUB_TOKEN | (none) | GitHub code search (higher rate limit; 401 without it) |
| SCHOLAR_STRONG_BACKEND | anthropic | Backend for analysis agents |
| SCHOLAR_STRONG_MODEL | claude-sonnet-4-6 | Strong model name |
| SCHOLAR_CHEAP_BACKEND | openai | Backend for Scout |
| SCHOLAR_CHEAP_MODEL | gpt-4o-mini | Cheap model name |
| SCHOLAR_LMSTUDIO_URL | http://localhost:1234/v1 | LM Studio endpoint (when backend=lmstudio) |
| SCHOLAR_EMBEDDING_BACKEND | openai | openai or lmstudio |
| SCHOLAR_EMBEDDING_MODEL | text-embedding-3-small | Embedding model |
| SCHOLAR_MEMORY_DIR | ~/.scholaragent | Data directory |
| SCHOLAR_MEMORY_DB | ~/.scholaragent/memory.db | SQLite DB path |
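Resolving this kind of environment-only configuration can be sketched as below. The variable names and defaults are copied from the table; `load_settings` is a hypothetical helper and not part of the package, and the `~` expansion is an assumption about how the paths are handled.

```python
import os
from pathlib import Path

# Defaults copied from the table above (API keys have no default).
DEFAULTS = {
    "SCHOLAR_STRONG_BACKEND": "anthropic",
    "SCHOLAR_STRONG_MODEL": "claude-sonnet-4-6",
    "SCHOLAR_CHEAP_BACKEND": "openai",
    "SCHOLAR_CHEAP_MODEL": "gpt-4o-mini",
    "SCHOLAR_LMSTUDIO_URL": "http://localhost:1234/v1",
    "SCHOLAR_EMBEDDING_BACKEND": "openai",
    "SCHOLAR_EMBEDDING_MODEL": "text-embedding-3-small",
    "SCHOLAR_MEMORY_DIR": "~/.scholaragent",
    "SCHOLAR_MEMORY_DB": "~/.scholaragent/memory.db",
}
PATH_KEYS = ("SCHOLAR_MEMORY_DIR", "SCHOLAR_MEMORY_DB")


def load_settings(env=None):
    """Overlay environment values on the defaults and expand '~' in paths."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    for key in PATH_KEYS:
        settings[key] = str(Path(settings[key]).expanduser())
    return settings


# Example: switch the strong model to a local LM Studio backend.
local = load_settings({"SCHOLAR_STRONG_BACKEND": "lmstudio", "SCHOLAR_STRONG_MODEL": "qwen3-30b-a3b"})
print(local["SCHOLAR_STRONG_BACKEND"], local["SCHOLAR_CHEAP_MODEL"])
```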
Python API
Skip MCP and use the library directly:
from scholaragent import ScholarAgent
agent = ScholarAgent(
    strong_model={"backend": "anthropic", "model_name": "claude-sonnet-4-6"},
    cheap_model={"backend": "openai", "model_name": "gpt-4o-mini"},
    max_iterations=15,
    verbose=True,
)
result = agent.research("State of the art on RLHF reward models")
print(result.result) # Markdown literature review
For a lower-level entrypoint that matches what the MCP server does internally:
from scholaragent.runtime import RuntimeContainer
from pathlib import Path
container = RuntimeContainer(
    data_dir=Path.home() / ".scholaragent",
    db_path=str(Path.home() / ".scholaragent/memory.db"),
    model_config={
        "strong": {"backend": "anthropic", "model_name": "claude-sonnet-4-6"},
        "cheap": {"backend": "openai", "model_name": "gpt-4o-mini"},
    },
)
pipeline = container.get_pipeline()
container.ensure_pipeline_agents()
result = pipeline.run("RLHF reward models", depth="deep", focus="implementation")
print(result["actual_depth"], result["entries_added"])
container.close()
Things you should know
Honest notes about warts, limits, and design choices.
Costs
- Quick depth is cheap: just embedding costs. A single research run embeds maybe 15 entries, which is a fraction of a cent on text-embedding-3-small.
- Normal depth runs Reader + Critic per paper, up to 3 in parallel. Expect ~$0.05–0.30 per run on default models.
- Deep depth runs the full pipeline with iteration loops. Expect ~$0.50–3.00 per run on default models.
- Set an API spending limit on your OpenAI/Anthropic accounts. The dispatcher has token budgets but they're per-run, not per-day.
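To sanity-check the quick-depth estimate, a back-of-the-envelope calculation helps. Only the entry count comes from the text; the tokens-per-entry figure and the per-million-token price are assumptions for illustration, so substitute your provider's current price before relying on the result.

```python
# All three inputs are illustrative; only the entry count comes from the README.
ENTRIES_PER_RUN = 15            # "maybe 15 entries" per quick run
TOKENS_PER_ENTRY = 500          # assumed average size of an indexed entry
USD_PER_MILLION_TOKENS = 0.02   # assumed embedding price; check your provider


def embedding_cost_usd(entries, tokens_per_entry, usd_per_million):
    """Cost of embedding a batch of entries at a flat per-token price."""
    return entries * tokens_per_entry * usd_per_million / 1_000_000


cost = embedding_cost_usd(ENTRIES_PER_RUN, TOKENS_PER_ENTRY, USD_PER_MILLION_TOKENS)
print(f"${cost:.5f} per quick run")
```

Under these assumptions a quick run stays well below one cent, consistent with the estimate above.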
The Linter agent exists but isn't wired up
scholaragent/agents/linter.py implements LinterAgent (static analysis over a code path), and there are tests for it, but it is not registered in the default AgentRegistry in either scholaragent/__init__.py or scholaragent/runtime.py. Calling call_agent("linter", ...) from the Dispatcher will raise because the registry has no entry for it. If you want linter integration, register it yourself or open a PR.
Source adapter caveats
- GitHub code search is hardcoded to language="python" in the default SourceCollector. You can change it via SourceCollector(default_code_language="rust") or pass code_language= to collect(), but the MCP tool path uses the default.
- Docs search (sources/docs.py::search_docs) currently constructs a single URL: https://docs.python.org/3/search.html?q={query}. It is a stub. fetch_docs(url) against arbitrary URLs works fine; the search part doesn't.
- Semantic Scholar rate-limits aggressively (429). Errors are caught and returned in the errors list; results from other sources still come through.
- arXiv redirects HTTP→HTTPS. The httpx client follows the redirect transparently (httpx only does this when follow_redirects=True is set), which adds a small latency on the first call.
Memory store scalability
- Vector search is cosine similarity computed in Python against all stored embeddings. Fine for thousands of entries. If you index hundreds of thousands, consider swapping MemoryStore.search for a real vector index (FAISS, sqlite-vss, etc.); the EmbeddingBackend is an ABC to make this easier.
- The SQLite file uses WAL mode. Multiple readers are fine; heavy concurrent writes will serialize through a single lock.
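The brute-force search described in the first bullet amounts to scoring every stored vector against the query. The sketch below uses pure Python to stay dependency-free; it is not the project's MemoryStore.search, and the function names and toy two-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, entries, top_k=3):
    """Score every stored embedding and return the best (score, id) pairs."""
    scored = [(cosine(query_vec, vec), entry_id) for entry_id, vec in entries.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
entries = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = search([1.0, 0.1], entries, top_k=2)
print([entry_id for _, entry_id in hits])
```

Each query costs one pass over every stored vector, which is why the approach holds up for thousands of entries and not for hundreds of thousands.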
Dedup behavior
- memory_research caches results for 7 days by default. Calling the same query twice inside that window returns {"status": "cached"} instead of re-running. Pass force=True to override.
- Within a run, papers are deduped by arxiv_id first, then normalized title. S2 entries win over arXiv entries on collision because they carry citation counts.
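The within-run dedup rule can be sketched as follows, under stated assumptions: papers are plain dicts with `title`, `source` ("arxiv" or "s2"), and an optional `arxiv_id`, and title normalization means lowercasing and collapsing punctuation. The real normalization and record types may differ, and the IDs below are made up.

```python
import re


def normalize_title(title):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(papers):
    """Match on arxiv_id first, then on normalized title; S2 wins collisions."""
    unique, by_id, by_title = [], {}, {}
    for paper in papers:
        arxiv_id = paper.get("arxiv_id")
        title_key = normalize_title(paper["title"])
        pos = by_id.get(arxiv_id) if arxiv_id else None
        if pos is None:
            pos = by_title.get(title_key)
        if pos is None:                      # first time we see this paper
            pos = len(unique)
            unique.append(paper)
        elif paper["source"] == "s2" and unique[pos]["source"] != "s2":
            unique[pos] = paper              # S2 record carries citation counts
        if arxiv_id:
            by_id[arxiv_id] = pos
        by_title[title_key] = pos
    return unique


papers = [  # illustrative records with made-up IDs
    {"title": "Reward Models: A Survey", "source": "arxiv", "arxiv_id": "0000.00001"},
    {"title": "reward models - a survey", "source": "s2", "citations": 12},
    {"title": "Another Paper", "source": "arxiv", "arxiv_id": "0000.00002"},
    {"title": "Another Paper (v2)", "source": "arxiv", "arxiv_id": "0000.00002"},
]
unique = dedupe(papers)
print([(p["title"], p["source"]) for p in unique])
```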
Research pipeline fallback
- Deep falls back to normal if the Dispatcher errors or returns no result.
- Normal falls back to quick if Scout fails.
- Quick is the floor — it never falls back.
- The returned dict carries requested_depth, actual_depth, and fallback_reason so the caller sees exactly what happened. The legacy depth key aliases actual_depth for backwards compatibility.
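The fallback chain above might be implemented along these lines. The depth names and result keys are from the list; `run_with_fallback` and the stub runners are hypothetical, and the real pipeline decides what counts as a failure in more detail than "raised or returned nothing".

```python
FALLBACK = {"deep": "normal", "normal": "quick"}   # quick is the floor


def run_with_fallback(runners, query, depth):
    """Try the requested depth, stepping down one level after each failure."""
    requested, reasons = depth, []
    while True:
        try:
            payload = runners[depth](query)
            if payload is None:
                raise RuntimeError("no result returned")
        except Exception as exc:
            if depth not in FALLBACK:
                raise                       # quick never falls back
            reasons.append(f"{depth} failed: {exc}")
            depth = FALLBACK[depth]
            continue
        return {
            **payload,
            "requested_depth": requested,
            "actual_depth": depth,
            "depth": depth,                 # legacy alias of actual_depth
            "fallback_reason": "; ".join(reasons) or None,
        }


def failing_deep(query):
    raise TimeoutError("dispatcher timed out")


# Stub runners standing in for the real depth implementations.
runners = {
    "deep": failing_deep,
    "normal": lambda query: {"entries_added": 3},
    "quick": lambda query: {"entries_added": 1},
}
result = run_with_fallback(runners, "RLHF reward models", "deep")
print(result["actual_depth"], result["fallback_reason"])
```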
MCP transport
- scholaragent-server speaks MCP over stdio: it's spawned by the coding agent as a subprocess. There is no HTTP endpoint.
- The 9 tools are defined in scholaragent/_manifest.py::MCP_TOOLS (single source of truth). A drift-guard test parses the @mcp.tool() decorators and fails the build if the manifest diverges.
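A drift guard of this kind can be built on the standard `ast` module. The sketch below is one way to do it, not the project's actual test: `decorated_tool_names`, the inline server source, and the two-item manifest are stand-ins for the real files.

```python
import ast


def decorated_tool_names(source):
    """Return names of functions decorated with @mcp.tool or @mcp.tool()."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for deco in node.decorator_list:
            target = deco.func if isinstance(deco, ast.Call) else deco
            if (
                isinstance(target, ast.Attribute)
                and target.attr == "tool"
                and isinstance(target.value, ast.Name)
                and target.value.id == "mcp"
            ):
                names.append(node.name)
    return names


# Stand-in for the server module's source; it is parsed, never executed.
SERVER_SOURCE = '''
@mcp.tool()
def memory_lookup(query): ...

@mcp.tool()
def memory_get(entry_id): ...

def _helper(): ...
'''

MANIFEST = ("memory_lookup", "memory_get")   # stand-in for MCP_TOOLS

found = decorated_tool_names(SERVER_SOURCE)
assert sorted(found) == sorted(MANIFEST), "manifest and server have drifted"
print(found)
```

Parsing the source instead of importing the module keeps the check free of side effects such as starting the server.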
Testing
- 609 tests across 36 files.
- pytest-asyncio is installed (transitive dep) but disabled via pyproject.toml: addopts = "-p no:asyncio". There are no async tests, and the plugin deadlocks with threading.Lock-based lazy init.
- Run: python -m pytest tests/ -v
- tests/test_installer.py assumes a repo-level .venv/ and fails in worktree checkouts.
- tests/test_web_tools_live.py hits real search engines and is skipped by default.
Security
- API keys are read from the shell environment at runtime. They are never written to any config file the installer touches.
- The REPL has restricted builtins (no input, eval, exec, compile). RESERVED_NAMES (FINAL, FINAL_VAR, call_agent, llm_query, etc.) are restored after every code execution so the LLM's generated code can't corrupt the scaffold.
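The two REPL safeguards can be illustrated with a small sketch. It assumes a plain `exec`-based executor; `execute`, `SAFE_BUILTINS`, and the stub scaffold are hypothetical, and filtering builtins like this is a guard rail, not a complete security boundary.

```python
import builtins

BLOCKED = {"input", "eval", "exec", "compile"}
SAFE_BUILTINS = {k: v for k, v in vars(builtins).items() if k not in BLOCKED}


def execute(code, namespace, reserved):
    """Run generated code with filtered builtins, then restore scaffold names."""
    namespace["__builtins__"] = SAFE_BUILTINS
    try:
        exec(code, namespace)
    finally:
        namespace.update(reserved)      # undo any overwrite of reserved names


# Stub scaffold; the real RESERVED_NAMES hold the actual helper functions.
reserved = {"FINAL": lambda value: value, "llm_query": lambda prompt: "stub"}
namespace = dict(reserved)

# Generated code that clobbers a reserved name is repaired afterwards.
execute("FINAL = None\nx = len('abc')", namespace, reserved)
print(namespace["FINAL"] is reserved["FINAL"], namespace["x"])

# A blocked builtin is simply not defined inside the REPL.
try:
    execute("eval('1 + 1')", namespace, reserved)
    blocked = False
except NameError:
    blocked = True
print(blocked)
```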
Project structure
scholaragent/
├── __init__.py # Public API: ScholarAgent class
├── _manifest.py # MCP_TOOLS tuple — single source of truth
├── mcp_server.py # FastMCP server (9 tools, stdio)
├── runtime.py # RuntimeContainer — owns store/pipeline/agent-infra lifecycle
├── installer.py # CLI: scholaragent-install (reads _manifest.MCP_TOOLS)
├── core/ # SpecialistAgent ABC, Dispatcher, registry, LMHandler, ContextStream
├── agents/ # scout, reader, critic, analyst, synthesizer, linter (not registered)
├── clients/ # OpenAI, Anthropic, LM Studio, router, token counter, rate limiter
├── environments/ # Sandboxed LocalREPL
├── memory/
│ ├── store.py # SQLite + cosine search
│ ├── types.py # MemoryEntry, ResearchLogEntry, VALID_SOURCE_TYPES
│ ├── embeddings.py # EmbeddingBackend ABC + OpenAIEmbeddings + LRU cache
│ ├── research.py # ResearchPipeline — depth orchestration
│ ├── source_collector.py # SourceCollector — raw retrieval (arXiv, S2, GitHub, docs)
│ ├── indexer.py # ResultIndexer — MemoryEntry construction + store writes
│ └── research_result.py # ResearchResult dataclass (requested_depth, actual_depth)
├── tools/ # arxiv, semantic_scholar, web, pdf_extractor, quality
├── sources/ # github code search, html doc fetcher
└── utils/ # parsing, prompts, retry, budget, cost, llm cache
tests/ # 609 tests
install.sh # Bash installer (wraps scholaragent-install)
Development
pip install -e ".[dev]"
python -m pytest tests/ -v # full suite
python -m pytest tests/test_research_pipeline.py -v # one file
python -m pytest tests/ --ignore=tests/test_installer.py \
--ignore=tests/test_web_tools_live.py # skip env-specific
Citations & Acknowledgements
Research Foundations
ScholarAgent is built on ideas from the following academic work:
- Recursive Language Models (RLM): Zhang, A. L., Kraska, T., & Khattab, O. (2026). Recursive Language Models. arXiv:2512.24601. https://arxiv.org/abs/2512.24601. The core REPL-driven orchestration paradigm. Every agent loop in ScholarAgent follows the RLM pattern of code generation → execution → observation → iteration.
- ReAct: Yao, S., Zhao, J., Yu, D., et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629. https://arxiv.org/abs/2210.03629. The thought–action–observation loop that underpins agent reasoning.
- Multi-Agent Systems: Guo, T., et al. (2024). Large Language Model based Multi-Agents: A Survey of Progress and Challenges. IJCAI 2024. arXiv:2402.01680. https://arxiv.org/abs/2402.01680; Li, S., et al. (2024). arXiv:2412.17481. https://arxiv.org/abs/2412.17481. Surveys informing the five-agent specialist architecture.
Frameworks & Protocols
- RLM — REPL-driven LM orchestration patterns (Zhang et al., MIT CSAIL)
- MCP — Model Context Protocol for agent interoperability (Anthropic, 2024)
- FastMCP — Python MCP server framework by Jeremiah Lowin
Data Sources
- arXiv API — Open access to scientific papers (operated by Cornell University)
- Semantic Scholar Academic Graph API — Paper metadata and citation graphs (Allen Institute for AI)
- GitHub REST API — Code search (GitHub / Microsoft)
Key Libraries
- OpenAI Python SDK — LLM and embedding API client (OpenAI)
- Anthropic Python SDK — LLM API client (Anthropic)
- httpx — Async HTTP client (Encode)
- Rich — Terminal formatting (Will McGugan / Textualize)
- NumPy — Numerical operations (NumPy community)
- pypdf — PDF text extraction (pypdf community)
- Trafilatura — Web text extraction (Adrien Barbaresi)
License
MIT