StatTools
MCP server that lets AI agents discover and call R and Python statistical functions without writing code.
What It Does
- Search ~48k functions on a fresh clone after `build-index`, and ~336k after the full Phase 7 + 7b tarball waves: "mixed effects model" finds `lme4::lmer`
- Validate before executing: `stat_resolve` checks safety, generates parameter schema
- Execute with structured JSON input/output: no R syntax, no script files, no console parsing
- Track session state: data handles, model handles, resolved functions
- Call methods on Python objects: `model.fit(X, y)`, `model.predict(X_test)`, `scaler.transform(X)`
- Auto-index after install: `stat_install` makes new packages immediately searchable
Architecture
Agent (Claude Code / Cursor / custom)
| MCP protocol (stdio)
v
TypeScript MCP Server
|-- SQLite FTS5 search index (~48k fresh-clone baseline, ~336k after the Phase 7 + 7b tarball waves)
|-- R Worker Pool (persistent subprocess, hot-standby, recycle/crash recovery)
|-- Python Worker (persistent subprocess, sklearn/statsmodels/scipy/pandas)
+-- Session state (handles, resolved functions, install jobs)
Quick Start
Prerequisites
- Node.js 22.x (enforced; see `.nvmrc`)
- R >= 4.1 with the `jsonlite` package installed
- Python 3 with sklearn/statsmodels/scipy/pandas (optional; for Python workflows)
Install & Build
cd stattools
nvm use # Use pinned Node 22.x
npm install
npm run build
Build the Search Index
npm run build-index
Indexes all installed R packages + CRAN metadata (~2 minutes).
Connect to Claude Code
Add to ~/.claude/settings.json. Use the full path to your Node 22 binary — better-sqlite3 will crash under a different Node version:
{
  "mcpServers": {
    "stattools": {
      "command": "/path/to/.nvm/versions/node/v22.x.x/bin/node",
      "args": ["/absolute/path/to/stattools/dist/index.js"],
      "env": {
        "STATTOOLS_DATA_ROOTS": "/Users/me/data:/tmp",
        "R_PATH": "/path/to/Rscript",
        "PATH": "/path/to/R/bin:/path/to/node/bin:/usr/bin:/bin"
      }
    }
  }
}
Find your Node 22 path with nvm which 22. R_PATH must point to the Rscript binary, and PATH must include the directory that contains it, for the R worker pool to function.
Tools
| Tool | Purpose |
|---|---|
| `stat_search` | Search functions by natural language. Returns ranked results with safety class. |
| `stat_resolve` | Validate a function + get full parameter schema. Required before stat_call. |
| `stat_call` | Execute a resolved function with JSON arguments. Returns structured results. |
| `stat_method` | Call a method on a Python session object (fit, predict, transform, score). |
| `stat_load_data` | Load CSV/TSV/RDS into session. Supports runtime="python" for pandas. |
| `stat_session` | View session state: handles, resolved functions, worker status, install jobs. |
| `stat_describe` | Inspect a handle: schema, head, dimensions, summary, str. |
| `stat_install` | Install a CRAN package (async). Auto-indexes on completion. |
Example: R Workflow
stat_search({ query: "linear regression" })
-> stats::lm (safe), MASS::lm.ridge (safe), ...
stat_resolve({ package: "stats", function: "lm" })
-> { resolved: true, safety_class: "safe", schema: { formula, data, ... } }
stat_load_data({ file_path: "/tmp/sales.csv" })
-> { object_id: "sales", dimensions: { rows: 1000, cols: 8 }, ... }
stat_call({ package: "stats", function: "lm", args: { formula: "revenue ~ ad_spend", data: "sales" } })
-> { r_squared: 0.73, coefficients: { ad_spend: { estimate: 2.3, p_value: 0.001 }, ... } }
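The workflow above resolves a function before calling it, and the Tools table marks stat_resolve as required before stat_call. The following Python sketch illustrates that gating idea only. The `Session` class, its methods, and `NotResolvedError` are hypothetical names for illustration and are not taken from the server's TypeScript code.

```python
class NotResolvedError(Exception):
    """Raised when a call targets a function that was never resolved."""


class Session:
    """Hypothetical session that remembers which functions were resolved."""

    def __init__(self):
        self.resolved = set()

    def resolve(self, package, function):
        # A real server would also check the safety class and build a schema here.
        self.resolved.add((package, function))
        return {"resolved": True, "package": package, "function": function}

    def call(self, package, function, args):
        # Refuse to execute anything that has not passed through resolve().
        if (package, function) not in self.resolved:
            raise NotResolvedError(f"{package}::{function} must be resolved first")
        return {"called": f"{package}::{function}", "args": args}


session = Session()
session.resolve("stats", "lm")
result = session.call("stats", "lm", {"formula": "revenue ~ ad_spend", "data": "sales"})
```

Keeping the resolved set on the session means a call can be rejected cheaply, before any worker process is involved.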
Example: Python Workflow
stat_load_data({ file_path: "/tmp/data.csv", runtime: "python", name: "df" })
-> { object_id: "df", class: "DataFrame", dimensions: { rows: 500, cols: 10 } }
stat_resolve({ package: "sklearn.linear_model", function: "LinearRegression" })
-> { resolved: true, runtime: "python", schema: { ... } }
stat_call({ package: "sklearn.linear_model", function: "LinearRegression", args: {}, assign_to: "model" })
-> { objects_created: [{ id: "model", type: "model" }] }
stat_method({ object: "model", method: "fit", positional_args: ["X_train", "y_train"] })
-> { coefficients: [2.3, -0.5], intercept: 1.2 }
stat_method({ object: "model", method: "predict", positional_args: ["X_test"], assign_to: "preds" })
-> { class: "ndarray", shape: [100], ... }
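In the Python workflow above, positional_args carries strings such as "X_train" that name session handles rather than raw data. A minimal sketch of how a dispatcher could turn those names into objects and invoke the method is shown here. `ToyModel`, `call_method`, and the `handles` dictionary are assumptions made for illustration, and the toy model stands in for a real estimator so the example has no third-party dependencies.

```python
class ToyModel:
    """Stand-in for an estimator; fits the mean of y and predicts it."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def call_method(handles, obj, method, positional_args, assign_to=None):
    """Look up handle IDs, invoke the method, and optionally store the result."""
    target = handles[obj]
    # Strings that name a handle are replaced by the stored object; others pass through.
    args = [handles.get(a, a) if isinstance(a, str) else a for a in positional_args]
    result = getattr(target, method)(*args)
    if assign_to is not None:
        handles[assign_to] = result
    return result


handles = {
    "model": ToyModel(),
    "X_train": [[1.0], [2.0], [3.0]],
    "y_train": [2.0, 4.0, 6.0],
    "X_test": [[4.0], [5.0]],
}
call_method(handles, "model", "fit", ["X_train", "y_train"])
preds = call_method(handles, "model", "predict", ["X_test"], assign_to="preds")
```

Storing the prediction under `assign_to` mirrors how the workflow registers "preds" as a new handle for later calls.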
Safety Model
Functions are classified into tiers:
| Class | Behavior |
|---|---|
| `safe` | Fully callable. Pure computation. |
| `callable_with_caveats` | Callable with warnings (e.g., NSE, graphics, RNG). |
| `unsafe` | Blocked. File writes, network, system modification. |
| `unclassified` | Blocked by default. Discoverable but not callable. |
2,024 safety overrides in CSV (~2,048 classified in the built DB including Python defaults). Unclassified functions are blocked — extend coverage by adding entries to data/safety_overrides.csv.
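The fail-closed rule can be sketched in a few lines of Python: a function is callable only when it has an explicit callable class, and anything missing from the overrides is treated as unclassified. The `package::function` key format, the `is_callable` helper, and the `base::system` entry are assumptions for illustration; the actual layout of data/safety_overrides.csv is not shown in this README.

```python
CALLABLE_CLASSES = {"safe", "callable_with_caveats"}


def is_callable(function_id, overrides):
    """Return True only if the function has an explicit callable classification."""
    # Missing entries default to "unclassified", which is blocked (fail-closed).
    safety_class = overrides.get(function_id, "unclassified")
    return safety_class in CALLABLE_CLASSES


# Illustrative entries only; the key format is hypothetical.
overrides = {
    "stats::lm": "safe",
    "rstanarm::stan_glm": "callable_with_caveats",
    "base::system": "unsafe",
}

print(is_callable("stats::lm", overrides))
print(is_callable("somepkg::unknown_fn", overrides))
```

Defaulting to the blocked class means a newly indexed function stays uncallable until someone reviews it.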
Search Quality
Benchmark: 111 queries across 12 categories.
Fresh clone (after build-index only): ~48k functions, ~570 classified. Benchmark pass rate depends on which packages are installed locally and whether tarball extraction has been run. Expect ~90% on a standard R installation.
Expanded index (after the full Phase 7 + 7b tarball waves + ranking/callability updates): ~336k functions, ~2.0k classified. 100% top-3 and 93% top-1 on 97/97 installable queries (MRR: 0.962) — tested on a machine with a rich local R library including the easystats suite. ML, IO, visualization, mixed-models, wrangling, and diagnostics categories are at 100% top-1; weaker categories (testing, bayesian) sit at 83%.
The headline 100% number requires both a rich local R library and tarball extraction. Your mileage will vary based on which packages are installed.
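For readers unfamiliar with the metrics, top-k is the fraction of queries whose expected function appears within the first k results, and MRR is the mean of 1/rank over all queries. The sketch below assumes the benchmark uses these standard definitions; the `search_metrics` helper and the sample ranks are made up and are not benchmark data.

```python
def search_metrics(ranks):
    """Compute top-1 rate, top-3 rate, and MRR from 1-based ranks.

    A rank of None means the expected function was not returned at all.
    """
    n = len(ranks)
    top1 = sum(1 for r in ranks if r == 1) / n
    top3 = sum(1 for r in ranks if r is not None and r <= 3) / n
    mrr = sum(1.0 / r for r in ranks if r is not None) / n
    return {"top1": top1, "top3": top3, "mrr": mrr}


# Made-up ranks for four queries, not benchmark data.
metrics = search_metrics([1, 1, 2, 3])
print(metrics)
```

With these four ranks, half the queries hit at rank 1 and all hit within the top 3, which is the same shape as a result where top-3 is 100% while top-1 is lower.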
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `STATTOOLS_DATA_ROOTS` | Current directory | Colon-separated list of allowed data directories |
| `R_PATH` | `Rscript` | Path to Rscript binary |
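To make the colon-separated format concrete, here is a small Python sketch of how an allow-list like STATTOOLS_DATA_ROOTS could be parsed and checked against a requested file path. The helper names are hypothetical and the server's real check is not documented here, so treat this as one plausible approach rather than its actual behavior.

```python
import os


def parse_roots(value):
    """Split a colon-separated STATTOOLS_DATA_ROOTS value into absolute paths."""
    return [os.path.abspath(p) for p in value.split(":") if p]


def is_allowed(file_path, roots):
    """Return True if file_path resolves to a location inside one of the roots."""
    real = os.path.realpath(file_path)
    for root in roots:
        real_root = os.path.realpath(root)
        # commonpath avoids the prefix bug where /tmpfoo would match /tmp.
        if os.path.commonpath([real, real_root]) == real_root:
            return True
    return False


roots = parse_roots("/Users/me/data:/tmp")
print(is_allowed("/tmp/sales.csv", roots))
print(is_allowed("/etc/passwd", roots))
```

Resolving both sides with `realpath` before comparing means a symlink inside an allowed directory cannot be used to reach a file outside it.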
Setup Validation
After build + index, verify everything works:
npm run validate # Checks Node, R, build, index, server, and runs a real workflow
This runs 14 checks including safety-override integrity, starting the MCP server, inspecting Python runtime health, and executing a complete search → resolve → load → call → session workflow.
For real external-client validation through Claude Code CLI, including exact prompts for OLS, mixed-effects, reshape, ggplot2, and glmnet, see AGENT_WORKFLOW_RUNBOOK.md.
Development
nvm use # Enforce Node 22.x
npm test # Run the hermetic default test suite
npm run test:tarball-live # Optional live CRAN tarball smoke test
npm run test:benchmark # Run the heavy 111-query benchmark separately
npm run test:watch # Watch mode
npm run build # Compile TypeScript
npm run build-index # Rebuild search index
npm run apply-safety-overrides # Sync safety_overrides.csv into the current DB
npm run check-safety-overrides # Fail if safety_overrides.csv has orphan or duplicate IDs
npm run validate # Full setup validation
Status: Beta for Tier A workflows (v0.2.0)
Phase 6 closed with a four-round agent eval going from 80% → 84% → 92% → 98% weighted pass rate on a 25-task representative workflow set. The single remaining non-pass is an upstream R-package bug. See phase6-retrospective.md for the full story.
What works reliably:
- Search: ~90% top-3 on a fresh clone. On the fully expanded Phase 7 + 7b index, the benchmark is 100% top-3 and 93% top-1 on 99 installable queries (MRR 0.963).
- Core R workflows: OLS, logistic, t-test, ANOVA, correlation, random forest, PCA, k-means, mixed effects (lme4 random intercept/slope/GLMM), survival (Kaplan-Meier, Cox PH, Weibull), robust SE, broom tidy, VIF, stepwise selection, time series (auto.arima, STL, forecast), Bayesian regression (rstanarm), polynomial regression with model comparison, fixest panel regression — all validated end-to-end through agent evals.
- Data loading: CSV/TSV/RDS via `file_path`, built-in R datasets via `dataset` (mtcars, iris, sleepstudy, lung, cbpp, Grunfeld, AirPassengers, ...), pandas DataFrame via `runtime="python"`. Handles register identically.
- NSE-heavy verbs (dplyr, tidyr, ggplot2::aes): `stat_call`'s `expressions` and `dot_expressions` fields take R expression strings, parsed via `rlang::parse_expr` and forwarded as quosures. dplyr data-mask pronouns like `n()` and tidyselect helpers like `everything()` / `-Species` resolve correctly. `stat_resolve` returns an `nse_hint` field for ~15 known NSE functions with worked examples.
- Multi-object dispatch (`anova(m1, m2)`, `AIC(m1, m2)`): `stat_call`'s `dot_args` field resolves session handle IDs as positional `...` args.
- Class coercion (factor/ts/matrix): `stat_call`'s `coerce` field accepts whitelisted specs (`factor`, `ts(frequency=N)`, etc.) and applies them before the call. `stat_resolve`'s `class_hint` field tells you when to use it.
- Python workflows: structured errors with `python_state` (`spawn_failed` / `modules_missing` / `crashed` / `healthy`), `python_path`, `missing_modules`, `recent_stderr`, and `hint`; no separate `stat_session` round trip required.
- Verbose R functions: console output is captured/suppressed so it does not pollute the NDJSON channel.
- Handle system: models and data persist in session across calls.
- Install + auto-reindex: `stat_install` installs and makes packages immediately searchable.
- Worker stability: hot-standby pool, crash recovery, handle persistence across recycles.
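The point about verbose R functions is easier to see with a small model of an NDJSON channel, where each line must be exactly one JSON document. The Python sketch below is an illustration of the framing only; the message fields and helper names are invented, and the sample console line is made up rather than captured from a real worker.

```python
import json


def encode_message(message):
    """Serialize one message as a single line of JSON (no embedded newlines)."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_stream(text):
    """Parse NDJSON text, separating valid messages from stray lines."""
    messages, stray = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            # Anything a function prints to the console lands here.
            stray.append(line)
    return messages, stray


stream = encode_message({"id": 1, "ok": True}) + "Loading required package: Matrix\n"
messages, stray = decode_stream(stream)
print(messages, stray)
```

A reader can only filter stray lines that fail to parse; a printed value such as `42` is valid JSON and would be mistaken for a message, which is why suppressing console output at the source is the safer design.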
What works with caveats:
- Python install path: the server uses whatever `python3`/`PYTHON_PATH` resolves to at startup. If you `pip install` into a different interpreter, the server won't see the modules. Install into the binary `stat_session` reports under `python.path`, or set `PYTHON_PATH` explicitly.
- Bayesian: rstanarm/brms are slow (MCMC compilation) and classified as `callable_with_caveats`. `bayestestR::hdi(stanreg_model)` currently throws a names-length error on rstanarm fits (upstream bug); use `bayestestR::describe_posterior(model, ci_method="HDI")` instead.
- `lm(weights = ...)`: the `weights` arg is captured via `model.frame`, not the rlang/dplyr NSE machinery. `expressions={"weights": "1/hp"}` is rejected. Workaround: extract the column with `stat_extract` and pass the resulting numeric vector handle.
- S3 dispatch on first positional arg (`randomForest`, `survival::Surv`, etc.): when both `formula` and `x` are passed, R silently falls through to `.default` (matrix mode). Workaround: use matrix form (`x=`, `y=`) with `coerce={y:"factor"}` for classification, or pass the formula as the first positional arg.
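For the Python install path caveat, a quick way to confirm which interpreter you are installing into is to run a short check with that exact binary. The script below is a standalone sketch and not part of StatTools; the `interpreter_report` helper is hypothetical, and its output keys borrow the `python_path` and `missing_modules` names from the structured errors only for readability.

```python
import importlib.util
import sys

# Import names; note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "scipy", "statsmodels", "pandas"]


def interpreter_report(modules=REQUIRED_MODULES):
    """Report which interpreter is running and which modules it cannot find."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    return {"python_path": sys.executable, "missing_modules": missing}


report = interpreter_report()
print(report)
```

Run it with the same binary the server uses. If the printed path differs from the one you ran `pip install` against, the modules will show up as missing.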
What doesn't work yet:
- Only ~2.0k of ~336k functions are classified as callable. The rest are discoverable but blocked by the fail-closed safety model. Extend coverage by adding entries to `data/safety_overrides.csv`.
- ~14.9k packages are still stubs (no function-level metadata). `data/tarball_targets_phase7.txt` covers 8,500 priority packages.
- Tarball expansion is network-bound and incremental. `npm test` is hermetic; `npm run test:tarball-live` requires live CRAN access.
- Top-1 search accuracy is 93%; weakest in `testing` and `bayesian` categories at 83%. Top-3 remains 100%.
- No multi-tenant support: single-user local server only.
Known environment requirements:
- Node 22.x (enforced; better-sqlite3 will crash on other versions)
- R >= 4.1 with jsonlite
- macOS or Linux (not tested on Windows)
- For Python workflows: python3 with sklearn, scipy, statsmodels, pandas
Tier A Packages
Deeply classified packages with safety overrides, curated aliases, and workflow tests:
- Core Stats: stats, base, utils, MASS, boot, cluster
- Tidyverse: dplyr, tidyr, ggplot2, readr, purrr, stringr, forcats, tibble, scales
- Modeling: lme4, nlme, mgcv, glmnet, survival, sandwich, car, lmtest, forecast
- ML: caret, randomForest, rpart, nnet, e1071
- Model Output: broom, emmeans, marginaleffects, performance, parameters, effectsize
- Bayesian: rstanarm, brms, bayestestR
- Specialized: psych, lavaan, vegan, datawizard, insight, haven, data.table, fixest
- Python: sklearn (linear_model, ensemble, tree, svm, neighbors, cluster, decomposition, preprocessing, metrics, model_selection), statsmodels, scipy.stats, pandas