rules-as-code-mcp
Deterministic SNAP eligibility logic, exposed as auditable Model Context Protocol tools. An AI agent can read, reason, and orchestrate, but the eligibility determination itself is made by versioned, tested, cited code that a caseworker, an auditor, or a court can inspect.
README
rules-as-code-mcp
Deterministic SNAP eligibility logic, exposed as auditable Model Context Protocol tools. An AI agent can read, reason, and orchestrate, but the eligibility determination itself is made by versioned, tested, cited code that a caseworker, an auditor, or a court can inspect.
This is the keystone of a four-project portfolio on AI in the public benefits safety net. It is the layer the other three projects hand the actual legal decision to.
In government, "the model said so" is not a basis for denying someone food or medical coverage. The boundary between what a model decides and what code decides is the whole reason agentic AI is deployable in the public sector. This project puts that boundary on the table and exposes it over MCP.
What it does
A household's facts go in. A determination comes out, and it is never a bare yes/no. Every determination carries:
- the decision (eligible / ineligible),
- the rule trace: each rule that fired, what it saw, and what it concluded,
- a policy citation behind every rule (7 CFR, USDA FNS, or the Michigan Bridges Eligibility Manual),
- the ruleset version and the policy effective dates it was computed under,
- a reproducible determination id (a hash of the inputs, not a random id, so re-running the same case reproduces the same id).
It implements the federal SNAP financial eligibility test for the 48 contiguous states and DC (FY2026 figures), with Michigan's broad-based categorical eligibility applied as a state option.
Quickstart
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # 83 unit tests over the rules core and server
python eval/run_eval.py # determination correctness (writes eval/report.md)
python eval/run_server_eval.py # server robustness (writes eval/server_report.md)
python clients/demo_client.py # drive the server over stdio (writes a transcript)
To wire the server into Claude Desktop or Claude Code, see
clients/demo.md. A captured end-to-end transcript is in
clients/demo_transcript.md.
The tools
Few and sharp. Three require the caseworker scope; two are anonymous-safe.
| Tool | Scope | Returns |
|---|---|---|
screen_programs(household) |
screening | Coarse "likely eligible" signal across SNAP and a simplified Medicaid income screen. Not a determination; stores nothing. |
check_program_eligibility(program, household) |
caseworker | A full SNAP determination: decision + rule trace + citations + version. |
list_required_verifications(program, household) |
caseworker | The documents a caseworker must verify, derived from the facts the household presents. |
explain_determination(determination_id) |
caseworker | A plain, step-by-step trace of a prior determination by id. |
lookup_policy(question) |
screening | Cited policy answer. Currently an honest stub with a seam to the policy-manual-rag index. |
Architecture
rules-as-code-mcp/
├── rules/ # the deterministic core (no MCP, no network)
│ ├── constants.py # FY2026 figures, derived from the poverty guidelines
│ ├── citations.py # rule_id -> policy citation (enforced by tests)
│ ├── version.py # ruleset version + effective dates
│ ├── models.py # pydantic domain models (also the validation layer)
│ ├── snap.py # the determination engine + net-income calculation
│ ├── programs.py # cross-program screening + verification requirements
│ └── tests/ # 63 unit tests over the logic
├── server/ # the MCP boundary
│ ├── main.py # FastMCP tool registration + error translation
│ ├── tools.py # pure, scope-aware tool logic
│ ├── auth.py # screening vs caseworker scope resolution
│ ├── errors.py # structured, recoverable tool errors
│ ├── store.py # determination cache for explain_determination
│ └── tests/ # 20 unit tests over scope + errors
├── clients/ # real MCP client demo + captured transcript
└── eval/ # labeled cases + the two eval harnesses
The dependency arrow points one way: server/ imports rules/, never the
reverse. The core has no idea it is being served over MCP, which is what keeps it
testable in isolation and reusable by the other portfolio projects (the
benefits-intake-agent imports this core directly or calls it over MCP).
How the rule logic works
For a household with no elderly or disabled member, eligibility is two income tests with deductions in between:
- Asset test. Waived under Michigan's broad-based categorical eligibility.
- Gross income test. Gross monthly income at or below 130% of the federal poverty guideline (200% under BBCE). Households with an elderly (60+) or disabled member are exempt from this test.
- Net income test. After the statutory deductions (20% of earned income, the standard deduction, dependent care, child support paid, medical expenses over $35 for elderly/disabled members, and the excess shelter deduction), net income must be at or below 100% of poverty.
The excess shelter deduction is capped at $744/month, except for households with
an elderly or disabled member, where it is uncapped. All FY2026 dollar figures
live in one file, rules/constants.py, each tied to a
citation. The income limits are the published USDA standards, independently
reproduced in the tests by deriving them from the 2025 HHS poverty guidelines.
When the FY2027 COLA is published, constants.py and version.py change. The
logic in snap.py does not. That separation is the point of rules-as-code.
The scope boundary
Two scopes, resolved from the request context and never from a model-supplied argument, so a model cannot escalate its own privileges:
- screening (anonymous): coarse signals, no PII collected or stored.
- caseworker (authenticated): full determinations, verifications, and stored explanations.
Over stdio the role comes from the RULES_MCP_ROLE environment variable; over
streamable-http it comes from the OAuth bearer token. The domain models carry no
names, SSNs, or addresses (only ages and flags), which is what lets anonymous
screening exist at all.
Evaluation
Evaluation is the deliverable, not the demo. Two harnesses, both runnable as CI
gates with --check.
Determination correctness (eval/run_eval.py, 18 hand-derived
labeled cases spanning eligible / ineligible / near-threshold / elderly /
deduction-edge / asset-waived):
| Metric | Result |
|---|---|
| Decision accuracy | 100% (18/18) |
| Rule-trace correctness (right rules fired) | 100% (18/18) |
| Citation correctness (right citation present) | 100% (18/18) |
| Net-income spot checks | 100% |
| Input-robustness (malformed input rejected) | 100% (7/7) |
Server robustness (eval/run_server_eval.py): 9/9
failure cases (privilege escalation, unsupported program, unknown id, malformed
and out-of-range input) return a clean, correctly-typed, recoverable structured
error. None crash; none leak an unauthorized result. The full failure-case table
with each structured response is in eval/server_report.md.
The labeled cases were derived by hand from 7 CFR 273.9 and the FY2026 standards, independently of the implementation, so the eval is a real check rather than a restatement of the code.
Synthetic data only, never real PII
Every household this project touches is synthetic and illustrative. Handling applicant data correctly is a hard requirement (you cannot touch real benefits data) and a core public-sector competency, so the design keeps PII out of the domain model entirely and gates all case detail behind the caseworker scope.
How it composes with the rest of the portfolio
benefits-intake-agent(the agent) callscheck_program_eligibilityhere rather than reasoning about eligibility itself. The model extracts messy facts; this code makes the call.policy-manual-rag(the RAG index) is whatlookup_policywill delegate to, so one server offers both deterministic determination and cited policy lookup.
Each repo also runs standalone.
What I would do differently at production scale
- Real rule sourcing. The FY2026 figures are hand-entered from USDA and CBPP and verified against the poverty guidelines. At scale these would be ingested from the authoritative releases with a review step, and older rulesets would be retained so any past determination stays reproducible.
- More than the financial test. SNAP has work requirements, categorical and immigration rules, and state variation this does not model. The architecture (one program, fully cited, versioned) is built to extend to those rather than to hide them.
- Production auth. The scope model is real, but the HTTP token table is a demonstration stand-in for an identity provider, and over stdio the role is an environment variable. A deployment would issue and validate scoped tokens through the agency's IdP.
- Persistence and audit. The determination store is process-local. Production needs durable storage with retention and an audit log of who asked for what.
Sources
- MCP: specification (2025-11-25),
Python SDK (
mcp1.27). - SNAP: 7 CFR 273.9, USDA FNS FY2026 COLA, CBPP SNAP eligibility guide, Michigan BEM 213 (FAP categorical eligibility).
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.