StatTools

MCP server that lets AI agents discover and call R and Python statistical functions without writing code.

What It Does

Search ~48k functions on a fresh clone (after build-index), or ~336k after the full Phase 7 + 7b tarball waves. For example, "mixed effects model" finds lme4::lmer
Validate before executing: stat_resolve checks safety and generates a parameter schema
Execute with structured JSON input/output: no R syntax, no script files, no console parsing
Track session state: data handles, model handles, resolved functions
Call methods on Python objects: model.fit(X, y), model.predict(X_test), scaler.transform(X)
Auto-index after install: stat_install makes new packages immediately searchable

Architecture

Agent (Claude Code / Cursor / custom)
  | MCP protocol (stdio)
  v
TypeScript MCP Server
  |-- SQLite FTS5 search index (~48k fresh-clone baseline, ~336k after the Phase 7 + 7b tarball waves)
  |-- R Worker Pool (persistent subprocess, hot-standby, recycle/crash recovery)
  |-- Python Worker (persistent subprocess, sklearn/statsmodels/scipy/pandas)
  +-- Session state (handles, resolved functions, install jobs)

Quick Start

Prerequisites

Node.js 22.x (enforced — see .nvmrc)
R >= 4.1 with jsonlite package installed
Python 3 with sklearn/statsmodels/scipy/pandas (optional — for Python workflows)

Install & Build

cd stattools
nvm use          # Use pinned Node 22.x
npm install
npm run build

Build the Search Index

npm run build-index

This indexes all installed R packages plus CRAN metadata and takes about 2 minutes.

Connect to Claude Code

Add to ~/.claude/settings.json. Use the full path to your Node 22 binary — better-sqlite3 will crash under a different Node version:

{
  "mcpServers": {
    "stattools": {
      "command": "/path/to/.nvm/versions/node/v22.x.x/bin/node",
      "args": ["/absolute/path/to/stattools/dist/index.js"],
      "env": {
        "STATTOOLS_DATA_ROOTS": "/Users/me/data:/tmp",
        "R_PATH": "/path/to/Rscript",
        "PATH": "/path/to/R/bin:/path/to/node/bin:/usr/bin:/bin"
      }
    }
  }
}

Find your Node 22 path with nvm which 22. R_PATH must point to the Rscript binary, and PATH must include the directory that contains it; otherwise the R worker pool will not function.
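The entry above is easy to get wrong by hand (a relative path, or a PATH that omits the Rscript directory), so it can help to generate it. The following is a minimal sketch, not part of StatTools: `buildStattoolsEntry`, `StattoolsPaths`, and the example paths are hypothetical, and only POSIX-style paths are handled.

```typescript
// Hypothetical helper (not shipped with StatTools): builds the "stattools"
// entry of the mcpServers block from absolute paths.
interface StattoolsPaths {
  nodeBin: string;     // full path to the Node 22 binary (see `nvm which 22`)
  serverEntry: string; // absolute path to stattools/dist/index.js
  rscript: string;     // full path to the Rscript binary
  dataRoots: string[]; // directories the server may load data from
}

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Directory part of a POSIX path: "/opt/R/bin/Rscript" -> "/opt/R/bin".
function dirnameOf(p: string): string {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

function buildStattoolsEntry(paths: StattoolsPaths): McpServerEntry {
  const all = [paths.nodeBin, paths.serverEntry, paths.rscript, ...paths.dataRoots];
  for (const p of all) {
    if (!p.startsWith("/")) throw new Error(`expected an absolute path, got: ${p}`);
  }
  return {
    command: paths.nodeBin,
    args: [paths.serverEntry],
    env: {
      STATTOOLS_DATA_ROOTS: paths.dataRoots.join(":"),
      R_PATH: paths.rscript,
      // PATH lists the directories that hold Rscript and node first.
      PATH: [dirnameOf(paths.rscript), dirnameOf(paths.nodeBin), "/usr/bin", "/bin"].join(":"),
    },
  };
}

// Example paths are placeholders; substitute your own.
const entry = buildStattoolsEntry({
  nodeBin: "/opt/node22/bin/node",
  serverEntry: "/srv/stattools/dist/index.js",
  rscript: "/opt/R/bin/Rscript",
  dataRoots: ["/Users/me/data", "/tmp"],
});
console.log(JSON.stringify({ mcpServers: { stattools: entry } }, null, 2));
```

The printed JSON can be merged into ~/.claude/settings.json. Rejecting relative paths up front is a design choice of this sketch, not a documented server requirement beyond the absolute paths shown in the example config.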

Tools

Tool	Purpose
`stat_search`	Search functions by natural language. Returns ranked results with safety class.
`stat_resolve`	Validate a function + get full parameter schema. Required before `stat_call`.
`stat_call`	Execute a resolved function with JSON arguments. Returns structured results.
`stat_method`	Call a method on a Python session object (fit, predict, transform, score).
`stat_load_data`	Load CSV/TSV/RDS into session. Supports `runtime="python"` for pandas.
`stat_session`	View session state: handles, resolved functions, worker status, install jobs.
`stat_describe`	Inspect a handle: schema, head, dimensions, summary, str.
`stat_install`	Install a CRAN package (async). Auto-indexes on completion.

Example: R Workflow

stat_search({ query: "linear regression" })
  -> stats::lm (safe), MASS::lm.ridge (safe), ...

stat_resolve({ package: "stats", function: "lm" })
  -> { resolved: true, safety_class: "safe", schema: { formula, data, ... } }

stat_load_data({ file_path: "/tmp/sales.csv" })
  -> { object_id: "sales", dimensions: { rows: 1000, cols: 8 }, ... }

stat_call({ package: "stats", function: "lm", args: { formula: "revenue ~ ad_spend", data: "sales" } })
  -> { r_squared: 0.73, coefficients: { ad_spend: { estimate: 2.3, p_value: 0.001 }, ... } }
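Because stat_resolve is required before stat_call, a client may want to check a planned sequence before sending it. The sketch below treats the workflow above as plain data and flags any stat_call whose target was not resolved earlier in the plan. `ToolCall`, `olsPlan`, and `unresolvedCalls` are hypothetical names for illustration; the server enforces the rule itself, and in a real session the data handle comes from the stat_load_data response.

```typescript
// A tool invocation as plain data: the tool name plus its JSON arguments.
interface ToolCall {
  tool: "stat_search" | "stat_resolve" | "stat_load_data" | "stat_call";
  args: Record<string, unknown>;
}

// Hypothetical helper: lays out the OLS workflow as an ordered plan.
// dataHandle stands in for the object_id that stat_load_data would return.
function olsPlan(filePath: string, dataHandle: string, formula: string): ToolCall[] {
  const target = { package: "stats", function: "lm" };
  return [
    { tool: "stat_search", args: { query: "linear regression" } },
    { tool: "stat_resolve", args: target },
    { tool: "stat_load_data", args: { file_path: filePath } },
    { tool: "stat_call", args: { ...target, args: { formula, data: dataHandle } } },
  ];
}

// Returns the "package::function" targets of stat_call steps that have no
// earlier stat_resolve step for the same target.
function unresolvedCalls(plan: ToolCall[]): string[] {
  const resolved = new Set<string>();
  const missing: string[] = [];
  for (const step of plan) {
    const key = `${String(step.args["package"])}::${String(step.args["function"])}`;
    if (step.tool === "stat_resolve") resolved.add(key);
    if (step.tool === "stat_call" && !resolved.has(key)) missing.push(key);
  }
  return missing;
}

const plan = olsPlan("/tmp/sales.csv", "sales", "revenue ~ ad_spend");
console.log(unresolvedCalls(plan));          // empty: stats::lm is resolved first
console.log(unresolvedCalls(plan.slice(2))); // reports stats::lm: resolve step dropped
```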

Example: Python Workflow

stat_load_data({ file_path: "/tmp/data.csv", runtime: "python", name: "df" })
  -> { object_id: "df", class: "DataFrame", dimensions: { rows: 500, cols: 10 } }

stat_resolve({ package: "sklearn.linear_model", function: "LinearRegression" })
  -> { resolved: true, runtime: "python", schema: { ... } }

stat_call({ package: "sklearn.linear_model", function: "LinearRegression", args: {}, assign_to: "model" })
  -> { objects_created: [{ id: "model", type: "model" }] }

stat_method({ object: "model", method: "fit", positional_args: ["X_train", "y_train"] })
  -> { coefficients: [2.3, -0.5], intercept: 1.2 }

stat_method({ object: "model", method: "predict", positional_args: ["X_test"], assign_to: "preds" })
  -> { class: "ndarray", shape: [100], ... }

Safety Model

Functions are classified into tiers:

Class	Behavior
`safe`	Fully callable. Pure computation.
`callable_with_caveats`	Callable with warnings (e.g., NSE, graphics, RNG).
`unsafe`	Blocked. File writes, network, system modification.
`unclassified`	Blocked by default. Discoverable but not callable.

data/safety_overrides.csv holds 2,024 overrides; the built DB has ~2,048 classified functions once Python defaults are included. Unclassified functions are blocked. To extend coverage, add entries to data/safety_overrides.csv.
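The tier table amounts to a fail-closed gate: a function is callable only if its class is explicitly one of the two callable tiers. A minimal sketch of that decision follows; `gate` and `GateDecision` are hypothetical names, and the server's actual implementation may differ.

```typescript
type SafetyClass = "safe" | "callable_with_caveats" | "unsafe" | "unclassified";

interface GateDecision {
  callable: boolean;
  warn: boolean; // true when the call is allowed but should carry a warning
}

// Fail-closed: only the two explicitly callable tiers pass. Anything else,
// including a class string this code has never seen, is blocked.
function gate(safetyClass: string): GateDecision {
  switch (safetyClass as SafetyClass) {
    case "safe":
      return { callable: true, warn: false };
    case "callable_with_caveats":
      return { callable: true, warn: true };
    default:
      return { callable: false, warn: false };
  }
}

for (const cls of ["safe", "callable_with_caveats", "unsafe", "unclassified"]) {
  console.log(cls, gate(cls));
}
```

Putting the block in the default branch, rather than listing blocked classes, means a newly introduced or misspelled class cannot become callable by accident.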

Search Quality

Benchmark: 111 queries across 12 categories.

Fresh clone (after build-index only): ~48k functions, ~570 classified. Benchmark pass rate depends on which packages are installed locally and whether tarball extraction has been run. Expect ~90% on a standard R installation.

Expanded index (after the full Phase 7 + 7b tarball waves + ranking/callability updates): ~336k functions, ~2.0k classified. 100% top-3 and 93% top-1 on 97/97 installable queries (MRR: 0.962) — tested on a machine with a rich local R library including the easystats suite. ML, IO, visualization, mixed-models, wrangling, and diagnostics categories are at 100% top-1; weaker categories (testing, bayesian) sit at 83%.

The headline 100% number requires both a rich local R library and tarball extraction. Your mileage will vary based on which packages are installed.
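For readers comparing their own runs against these figures, the sketch below computes top-k accuracy and MRR from per-query ranks using the standard definitions. It assumes the benchmark follows those definitions, which the text above does not spell out. The function names and the sample ranks are illustrative, not StatTools code or benchmark data.

```typescript
// Rank = 1-based position of the expected function in the search results,
// or null when the expected function was not returned at all.
type Rank = number | null;

// Fraction of queries whose expected function appears within the first k results.
function topK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Mean reciprocal rank: average of 1/rank, counting a miss as 0.
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  const sum = ranks.reduce<number>((acc, r) => acc + (r === null ? 0 : 1 / r), 0);
  return sum / ranks.length;
}

// Illustrative ranks only: two top-1 hits, one rank-2 hit, one miss.
const sampleRanks: Rank[] = [1, 2, null, 1];
console.log(topK(sampleRanks, 1));             // 0.5
console.log(topK(sampleRanks, 3));             // 0.75
console.log(meanReciprocalRank(sampleRanks));  // 0.625
```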

Environment Variables

Variable	Default	Description
`STATTOOLS_DATA_ROOTS`	Current directory	Colon-separated list of allowed data directories
`R_PATH`	`Rscript`	Path to Rscript binary
`PYTHON_PATH`	`python3`	Path to the Python interpreter used by the Python worker
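To illustrate what a colon-separated allow-list implies, the sketch below parses a STATTOOLS_DATA_ROOTS value and tests whether a file path falls under one of the roots. How the server actually enforces the restriction is not described here, so this is an assumption-laden illustration: `parseRoots`, `normalizePath`, and `isUnderRoots` are hypothetical, only absolute POSIX paths are handled, and symlinks are ignored (real code would use Node's path.resolve and fs.realpath).

```typescript
// Split a colon-separated STATTOOLS_DATA_ROOTS value into root paths.
function parseRoots(value: string): string[] {
  return value
    .split(":")
    .map((r) => r.trim())
    .filter((r) => r.length > 0);
}

// Collapse ".", "..", and repeated slashes in an absolute POSIX path
// without touching the filesystem.
function normalizePath(absPath: string): string {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True when filePath equals one of the roots or lies underneath one.
function isUnderRoots(filePath: string, roots: string[]): boolean {
  const p = normalizePath(filePath);
  return roots.some((root) => {
    const r = normalizePath(root);
    // Compare against "root/" so that "/tmpfoo" does not match root "/tmp".
    return p === r || p.startsWith(r === "/" ? "/" : r + "/");
  });
}

const roots = parseRoots("/Users/me/data:/tmp");
console.log(isUnderRoots("/tmp/sales.csv", roots));      // true
console.log(isUnderRoots("/tmp/../etc/passwd", roots));  // false
```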

Setup Validation

After build + index, verify everything works:

npm run validate     # Checks Node, R, build, index, server, and runs a real workflow

This runs 14 checks including safety-override integrity, starting the MCP server, inspecting Python runtime health, and executing a complete search → resolve → load → call → session workflow.

For real external-client validation through Claude Code CLI, including exact prompts for OLS, mixed-effects, reshape, ggplot2, and glmnet, see AGENT_WORKFLOW_RUNBOOK.md.

Development

nvm use              # Enforce Node 22.x
npm test             # Run the hermetic default test suite
npm run test:tarball-live  # Optional live CRAN tarball smoke test
npm run test:benchmark     # Run the heavy 111-query benchmark separately
npm run test:watch   # Watch mode
npm run build        # Compile TypeScript
npm run build-index  # Rebuild search index
npm run apply-safety-overrides  # Sync safety_overrides.csv into the current DB
npm run check-safety-overrides  # Fail if safety_overrides.csv has orphan or duplicate IDs
npm run validate     # Full setup validation

Status: Beta for Tier A workflows (v0.2.0)

Phase 6 closed with a four-round agent eval going from 80% → 84% → 92% → 98% weighted pass rate on a 25-task representative workflow set. The single remaining non-pass is an upstream R-package bug. See phase6-retrospective.md for the full story.

What works reliably:

Search: ~90% top-3 on a fresh clone. On the fully expanded Phase 7 + 7b index, the benchmark is 100% top-3 and 93% top-1 on 99 installable queries (MRR 0.963).
Core R workflows: OLS, logistic, t-test, ANOVA, correlation, random forest, PCA, k-means, mixed effects (lme4 random intercept/slope/GLMM), survival (Kaplan-Meier, Cox PH, Weibull), robust SE, broom tidy, VIF, stepwise selection, time series (auto.arima, STL, forecast), Bayesian regression (rstanarm), polynomial regression with model comparison, fixest panel regression — all validated end-to-end through agent evals.
Data loading: CSV/TSV/RDS via file_path, built-in R datasets via dataset (mtcars, iris, sleepstudy, lung, cbpp, Grunfeld, AirPassengers, ...), pandas DataFrame via runtime="python". Handles register identically.
NSE-heavy verbs (dplyr, tidyr, ggplot2::aes): stat_call's expressions and dot_expressions fields take R expression strings, parsed via rlang::parse_expr and forwarded as quosures. dplyr data-mask pronouns like n() and tidyselect helpers like everything() / -Species resolve correctly. stat_resolve returns an nse_hint field for ~15 known NSE functions with worked examples.
Multi-object dispatch (anova(m1, m2), AIC(m1, m2)): stat_call's dot_args field resolves session handle IDs as positional ... args.
Class coercion (factor/ts/matrix): stat_call's coerce field accepts whitelisted specs (factor, ts(frequency=N), etc.) and applies them before the call. stat_resolve's class_hint field tells you when to use it.
Python workflows: structured errors with python_state (spawn_failed / modules_missing / crashed / healthy), python_path, missing_modules, recent_stderr, and hint — no separate stat_session round trip required.
Verbose R functions: console output is captured/suppressed so it does not pollute the NDJSON channel.
Handle system: models and data persist in session across calls.
Install + auto-reindex: stat_install installs and makes packages immediately searchable.
Worker stability: hot-standby pool, crash recovery, handle persistence across recycles.
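The extra stat_call fields mentioned in the list above (expressions, dot_expressions, dot_args, coerce) can be summarized as a request type. This is a sketch under assumptions: the field names come from the text, but the exact shapes are inferred from the inline examples (expressions={"weights": "1/hp"}, coerce={y:"factor"}) rather than from a published schema, and `conflictingNames` is a hypothetical client-side check.

```typescript
// Assumed request shape for stat_call; not an official schema.
interface StatCallRequest {
  package: string;
  function: string;
  args?: Record<string, unknown>;       // plain JSON arguments and handle names
  expressions?: Record<string, string>; // named args given as R expression strings
  dot_expressions?: string[];           // unnamed `...` args as R expression strings
  dot_args?: string[];                  // session handle IDs passed positionally as `...`
  coerce?: Record<string, string>;      // arg name -> coercion spec such as "factor"
  assign_to?: string;                   // handle name for the result
}

// anova(m1, m2): both models are session handles, so they go in dot_args.
const compareModels: StatCallRequest = {
  package: "stats",
  function: "anova",
  dot_args: ["m1", "m2"],
};

// A dplyr verb with a data-mask expression: n() is passed as a string.
const countRows: StatCallRequest = {
  package: "dplyr",
  function: "summarise",
  args: { ".data": "iris" },
  expressions: { n: "n()" },
};

// Hypothetical lint: argument names supplied through both args and expressions.
function conflictingNames(req: StatCallRequest): string[] {
  const plain = Object.keys(req.args ?? {});
  const exprs = new Set(Object.keys(req.expressions ?? {}));
  return plain.filter((name) => exprs.has(name));
}

console.log(conflictingNames(compareModels)); // no conflicts
console.log(conflictingNames(countRows));     // no conflicts
```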

What works with caveats:

Python install path: the server uses whatever python3 / PYTHON_PATH resolves to at startup. If you pip install into a different interpreter, the server won't see the modules. Install into the binary stat_session reports under python.path, or set PYTHON_PATH explicitly.
Bayesian: rstanarm/brms are slow (MCMC compilation) and classified as callable_with_caveats. bayestestR::hdi(stanreg_model) currently throws a names-length error on rstanarm fits (upstream bug) — use bayestestR::describe_posterior(model, ci_method="HDI") instead.
lm(weights = ...): the weights arg is captured via model.frame, not the rlang/dplyr NSE machinery. expressions={"weights": "1/hp"} is rejected. Workaround: extract the column with stat_extract and pass the resulting numeric vector handle.
S3 dispatch on first positional arg (randomForest, survival::Surv, etc.): when both formula and x are passed, R silently falls through to .default (matrix mode). Workaround: use matrix form (x=, y=) with coerce={y:"factor"} for classification, or pass the formula as the first positional arg.
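For the Python install-path caveat, the structured error fields listed earlier (python_state, python_path, missing_modules, recent_stderr) are enough for a client to suggest a next step. The sketch below assumes a payload shape based on those field names; the types, `PIP_NAMES`, and `remediation` are hypothetical, and the suggested messages are this sketch's wording, not server output.

```typescript
// Assumed shape of the structured Python error; the field names come from
// the status notes, the exact types are a guess.
interface PythonErrorInfo {
  python_state: "spawn_failed" | "modules_missing" | "crashed" | "healthy";
  python_path?: string;
  missing_modules?: string[];
  recent_stderr?: string;
  hint?: string;
}

// Import names that differ from their PyPI distribution names.
const PIP_NAMES = new Map<string, string>([["sklearn", "scikit-learn"]]);

// Hypothetical helper: suggests a next step for each python_state.
function remediation(err: PythonErrorInfo): string {
  switch (err.python_state) {
    case "modules_missing": {
      // Install into the interpreter the server actually uses.
      const py = err.python_path ?? "python3";
      const pkgs = (err.missing_modules ?? []).map((m) => PIP_NAMES.get(m) ?? m);
      return `${py} -m pip install ${pkgs.join(" ")}`;
    }
    case "spawn_failed":
      return "Set PYTHON_PATH to a working interpreter and restart the server.";
    case "crashed":
      return `Inspect recent_stderr: ${err.recent_stderr ?? "(empty)"}`;
    case "healthy":
      return "No action needed.";
  }
}

console.log(
  remediation({
    python_state: "modules_missing",
    python_path: "/usr/bin/python3",
    missing_modules: ["sklearn", "scipy"],
  }),
);
```

Building the pip command from python_path, rather than calling a bare pip, targets the interpreter the server reported, which is the mismatch the caveat describes.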

What doesn't work yet:

Only ~2.0k of ~336k functions are classified as callable. The rest are discoverable but blocked by the fail-closed safety model. Extend coverage by adding entries to data/safety_overrides.csv.
~14.9k packages are still stubs (no function-level metadata). data/tarball_targets_phase7.txt covers 8,500 priority packages.
Tarball expansion is network-bound and incremental. npm test is hermetic; npm run test:tarball-live requires live CRAN access.
Top-1 search accuracy is 93%; weakest in testing and bayesian categories at 83%. Top-3 remains 100%.
No multi-tenant support — single-user local server only.

Known environment requirements:

Node 22.x (enforced; better-sqlite3 will crash on other versions)
R >= 4.1 with jsonlite
macOS or Linux (not tested on Windows)
For Python workflows: python3 with sklearn, scipy, statsmodels, pandas

Tier A Packages

Deeply classified packages with safety overrides, curated aliases, and workflow tests:

Core Stats: stats, base, utils, MASS, boot, cluster
Tidyverse: dplyr, tidyr, ggplot2, readr, purrr, stringr, forcats, tibble, scales
Modeling: lme4, nlme, mgcv, glmnet, survival, sandwich, car, lmtest, forecast
ML: caret, randomForest, rpart, nnet, e1071
Model Output: broom, emmeans, marginaleffects, performance, parameters, effectsize
Bayesian: rstanarm, brms, bayestestR
Specialized: psych, lavaan, vegan, datawizard, insight, haven, data.table, fixest
Python: sklearn (linear_model, ensemble, tree, svm, neighbors, cluster, decomposition, preprocessing, metrics, model_selection), statsmodels, scipy.stats, pandas
