fi-lookup-mcp
Resolves messy financial institution records against canonical regulatory identifiers from FDIC, NCUA, and FFIEC public datasets.
README
fi-lookup-mcp
A personal portfolio project demonstrating a tool-use, reconciliation, and lineage-tracing pattern over public regulatory data, implemented as a local MCP (Model Context Protocol) server. It speaks stdio, so it works with any MCP host — Claude Code (CLI) and Claude Desktop are both supported.
Built by Nelson Anievas, with development assisted by Claude Code. Public data only — no proprietary or employer systems involved.
What It Does
This server exposes 11 tools that allow an AI agent to resolve, enrich, and track the history of US financial institution records using canonical regulatory identifiers from FDIC, NCUA, and FFIEC public datasets.
The server handles three distinct patterns:
- Reconciliation: given a dirty external record (e.g.
"Mtn America FCU, Sandy UT"), return ranked candidate matches with confidence scores and match reasons - Lineage tracing: given an RSSD ID, return the full merger, acquisition, rebrand, and consolidation history — predecessors, successors, parent company, and subsidiaries — with real names resolved across 223,750 active and historical institutions
- Change feed: return all transformation events (mergers, failures, rebrands, splits) within a configurable lookback window, filterable by institution type, event type, and state — for dataset maintenance and regulatory monitoring
Tools
search_institutions
Free-text name search across all FDIC banks and NCUA credit unions. Supports filtering by institution type and state. Returns ranked candidates with fuzzy match scores.
get_institution_profile
Full regulatory profile lookup by any identifier — FDIC cert, NCUA charter number, or RSSD ID. Returns all available metadata including regulator, charter type, ABA routing number, deposit account count, and web address.
reconcile_institution
The centerpiece reconciliation tool. Takes a messy external record (name, optional city/state/identifier) and returns ranked candidate matches, each with a confidence score (0–1) and human-readable match reasons.
Scoring blends:
- Name similarity (0.6 weight): token-set ratio + Jaro-Winkler, with abbreviation expansion (FCU → federal credit union, Mtn → mountain, N.A. → national association)
- Geographic agreement (0.4 weight): state match (0.6) + city match (0.4)
- Exact identifier override: if a cert, charter, or RSSD is provided and matches, confidence is set to 1.0
crosswalk_identifiers
Translates between FDIC cert, NCUA charter number, and RSSD ID. Explains regulatory boundaries (e.g. why a credit union has no FDIC cert).
get_institution_history
Returns the full merger, acquisition, and rebrand lineage for any institution by RSSD ID. Resolves real names for both active and defunct predecessor/successor institutions using a 223,750-record historical name lookup built from FFIEC NIC active and closed attributes files. Includes parent company and subsidiary relationships.
Example output for JPMorgan Chase (RSSD 852218): 52 predecessors including Washington Mutual (FDIC-assisted, 2008), Bank One (merger, 2004), and Bear Stearns entities — all with resolved names and dates.
get_recent_changes
A configurable regulatory change feed built from FFIEC NIC Transformations data. Returns mergers, failures, rebrands, splits, and other structural events within a lookback window. Useful for identifying institutions that have changed status and may need dataset updates.
Each event carries the full metadata of both the predecessor and successor (name, type, regulator, city/state, FDIC cert / NCUA charter, ABA routing, deposit accounts, web address). For every predecessor with a portal on record, the tool also fetches its home/login URL and classifies whether it is still operating independently or has been consumed by the acquirer:
independent_portal_live— still served on its own domainconsumed_by_acquirer— redirects to the acquirer's domainredirects_elsewhere— redirects to a third domain (rebrand/division site)unreachable— portal did not respond (likely retired)
Portal checks run concurrently and are reported in a portal_summary tally. Lookups use a one-time RSSD index (O(1)), so the data-only path is near-instant; portal checks are the only network cost and can be tuned or disabled.
Parameters:
days: lookback window (default 365, max 3650)institution_type:"bank","cu", or"all"event_type:"merger","failure","split","rebrand", or"all"state: optional 2-letter state filtercheck_portals: fetch and classify predecessor portals (defaulttrue; setfalsefor an instant data-only feed)max_portal_checks: cap on portals fetched, most-recent first (default 50)
get_top_institutions
Returns the top N institutions ranked by deposit account count, with individual and cumulative market share percentages. Supports filtering by institution type.
export_institutions
Exports the full institution dataset to a CSV file with configurable filters, sorting, and market share calculations.
list_institutions
General-purpose browse/query tool over the complete FDIC + NCUA dataset, exposing all 21 metadata fields per institution (every other tool returns a trimmed projection). One tool that is searchable, filterable, sortable, and exportable:
- Search: case-insensitive substring across any subset of fields (
search_fields, or"all") - Filter: institution type; state (accepts
UTorUtah); min/max deposit accounts;has_routing,has_rssd,has_history - Sort: any field, ascending or descending (numeric fields sort numerically)
- Page:
limit/offsetwithhas_more/next_offsetfor inline browsing;fieldsprojects a subset - Export: set
export_pathto write all matched rows (not just the page) tocsvorjson; bare filenames default under~/Desktop, written atomically
Fields: name, type, source, regulator, city, state, fdic_cert, ncua_charter, rssdid, aba_routing, deposit_accounts, total_assets, web_address, charter_type, charter_type_desc, inst_category, parent_rssd, predecessor_count, successor_count, subsidiary_count.
refresh_cache
Rebuilds the local data snapshot from scratch — re-fetches FDIC data from the BankFind API (latest quarter auto-discovered), auto-downloads the newest NCUA quarterly ZIP, and re-reads the local FFIEC ZIPs. Runs the full NIC enrichment pipeline. Reports the data_as_of date for each source.
refresh_if_changed
Cost-effective conditional refresh: fingerprints all sources (FFIEC ZIP content hashes + latest FDIC/NCUA quarter) and rebuilds only when something actually changed, otherwise skips the expensive reprocessing and returns changed: false. This is the tool the monthly scheduler runs — see Scheduled updates.
Data Sources
All data is public regulatory data. No licensed or proprietary sources.
| Source | Data | Refresh |
|---|---|---|
| FDIC BankFind API | ~4,274 active banks: name, location, cert, RSSD, web address | API call |
| FDIC Financials API | Deposit account counts from most recent quarter | API call |
| NCUA Quarterly ZIP | ~4,336 active credit unions; deposit counts from FS220A; web addresses from FS220D | Manual download |
| FFIEC NIC Active Attributes | ABA primary routing numbers; joined via RSSD/cert/charter | Manual download |
| FFIEC NIC Closed Attributes | Historical institution names for 161,950 defunct entities | Manual download |
| FFIEC NIC Transformations | 59,071 merger/acquisition/rebrand/failure events | Manual download |
| FFIEC NIC Relationships | Parent/subsidiary/branch ownership structure | Manual download |
Total universe: 8,610 active institutions + 223,750 name-resolved historical records
Architecture
Claude Code / Claude Desktop (any MCP host)
|
| MCP stdio transport
v
server.py (FastMCP 3.4.2)
|
+-- search_institutions
+-- get_institution_profile
+-- reconcile_institution --> reconciler.py
+-- crosswalk_identifiers
+-- get_institution_history --> nic_names lookup (223,750 records)
+-- get_recent_changes --> CSV_TRANSFORMATIONS.zip
+-- get_top_institutions
+-- export_institutions
+-- list_institutions --> full dataset: search / filter / sort / export
+-- refresh_cache --> full rebuild (FDIC live + NCUA auto-download + FFIEC)
+-- refresh_if_changed --> conditional rebuild (monthly launchd job)
|
v
data_loader.py nic_loader.py
| |
+-- cache/fdic_institutions.json (NIC-enriched)
+-- cache/ncua_institutions.json (NIC-enriched)
+-- cache/call-report-data-*.zip
+-- cache/CSV_ATTRIBUTES_ACTIVE.zip
+-- cache/CSV_ATTRIBUTES_CLOSED.zip
+-- cache/CSV_TRANSFORMATIONS.zip
+-- cache/CSV_RELATIONSHIPS.zip
Key design decisions:
- Local cache first: runs fully offline after initial build; warm start skips live API calls
- NIC enrichment at save time: predecessor/successor/parent/subsidiary fields are written into the JSON cache so subsequent warm starts load enriched data instantly
- Atomic cache writes:
.tmprename pattern prevents corruption on interrupted writes - Stderr-only logging: never pollutes the MCP stdio JSON channel
- Abbreviation-aware normalization: improves recall on dirty external records
Local Data Setup
The cache/ directory is not committed to Git — populate it manually before first run.
Required downloads
| File | Source |
|---|---|
cache/CSV_ATTRIBUTES_ACTIVE.zip |
FFIEC NIC Data Download — Active Attributes |
cache/CSV_ATTRIBUTES_CLOSED.zip |
FFIEC NIC Data Download — Closed Attributes |
cache/CSV_TRANSFORMATIONS.zip |
FFIEC NIC Data Download — Transformations |
cache/CSV_RELATIONSHIPS.zip |
FFIEC NIC Data Download — Relationships |
FDIC is fetched live from the FDIC BankFind API (latest quarter auto-discovered) and NCUA quarterly ZIPs are now auto-downloaded — neither needs a manual download. Only the four FFIEC NIC ZIPs above must be placed in cache/ by hand, because FFIEC's bulk download is gated against scripted requests.
Scheduled updates
Each record carries a data_as_of date, and the snapshot keeps itself current with a cost-aware refresh strategy:
- FDIC / NCUA — auto-fetch the newest published quarter on every refresh.
- FFIEC — refreshed by dropping new ZIPs into
cache/(the bulk download is 403-gated to scripts, so it can't be auto-pulled). A content hash detects the change. refresh_if_changedrebuilds only when a source actually changed; a no-op run does cheap fingerprint checks (~0.3s CPU) and skips the expensive NIC reprocessing.
A monthly launchd job runs scheduled_refresh.py (which calls refresh_if_changed) at 03:00 on the 1st, logging to cache/refresh.log:
# Install / reload the monthly agent
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.fi-lookup.monthly-refresh.plist
# Run it once on demand
launchctl kickstart -k gui/$(id -u)/com.fi-lookup.monthly-refresh
# Remove it
launchctl bootout gui/$(id -u)/com.fi-lookup.monthly-refresh
Recommended cadence: monthly (bump to weekly only if you depend on the merger change-feed being current within days). The guard makes extra runs nearly free, so erring toward more frequent checks costs little.
Setup
Prerequisites
- Python 3.11+
- An MCP host — Claude Code (CLI) or Claude Desktop
Install
git clone https://github.com/nlsnnvas/fi-lookup-mcp.git
cd fi-lookup-mcp
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Download manual data files
Download the five ZIPs listed in the table above and place them in cache/. FFIEC files are available at ffiec.gov/npw/FinancialReport/DataDownload.
Build the data snapshot
python -c "import asyncio; from data_loader import build_snapshot; asyncio.run(build_snapshot())"
This fetches FDIC data live, reads all local ZIPs, runs NIC enrichment, and writes the JSON cache. Expect 2–3 minutes on first run.
Connect to an MCP host
Claude Code (CLI) — register the server with the venv interpreter:
claude mcp add fi-lookup -- "$(pwd)/.venv/bin/python" "$(pwd)/server.py"
Verify it loaded with claude mcp list, then the tools are available in any claude session in that scope.
Claude Desktop:
fastmcp install claude-desktop server.py --name "fi-lookup"
Then restart Claude Desktop.
Web dashboard (FI Explorer)
A local web UI over the same dataset and tools — no MCP client required. Built with Starlette + uvicorn (both ship with FastMCP, so no extra dependencies).
python web_app.py # serves http://127.0.0.1:8765
python web_app.py --port 9000 # custom port
Four tabs:
- Browse — searchable / filterable / sortable table over all institutions with every metadata field, plus CSV/JSON export (wraps
list_institutions) - Profile & Lineage — enter an RSSD ID for merger/acquisition lineage: predecessors, successors, parent, subsidiaries (wraps
get_institution_history) - Recent Changes — merger/failure/rebrand/split feed with optional portal verification, independent-vs-consumed (wraps
get_recent_changes) - Reconcile — paste a messy record for ranked candidate matches with confidence scores (wraps
reconcile_institution)
It is read-only and bound to 127.0.0.1 (localhost only) by default. It has no authentication, so do not expose it to a network or the internet as-is — see the note below. To reach it from another machine on a trusted LAN for a quick demo, run python web_app.py --host 0.0.0.0 and connect to http://<this-machine-ip>:8765; for anything beyond that, add authentication and serve it behind a proxy/tunnel.
Example Interactions
Reconciliation:
"I have a vendor row that says 'Mtn America FCU, Sandy UT' — what is it?"
reconcile_institution scores ~8,610 institutions and returns Mountain America Credit Union (NCUA #24692) at 0.984 confidence, with ABA routing, deposit account count, and charter type.
Lineage tracing:
"What is the full acquisition history of Bank of America?"
get_institution_history returns 117 predecessor institutions going back to 1960, including the 1998 BankAmerica merger, the 2008 Countrywide acquisition, and the 2009 Merrill Lynch absorption — all with resolved names and dates.
Change feed:
"What bank failures and mergers happened in the last 90 days?"
get_recent_changes returns 108 events grouped by type: 1 FDIC-assisted failure, 107 mergers — including Meadows Bank absorbed by AMERICA FIRST Credit Union and two bank-to-credit-union conversions.
Why This Pattern Matters
Financial institution data is notoriously messy. The patterns here are directly applicable to:
- Matching vendor/counterparty records to a canonical institution master
- Tracing merger lineage for compliance, KYC, or data governance
- Building regulatory change feeds for dataset maintenance automation
- Enriching internal datasets with public regulatory metadata
- Onboarding automation that maps free-text institution names to stable IDs
This project re-expresses reconciliation and lineage patterns from production AI agent work, using only public data.
Stack
- Python 3.11
- FastMCP 3.4.2
- rapidfuzz (fuzzy string matching)
- httpx (async HTTP)
- Claude Code / Claude Desktop (MCP host)
Framing Note
This is a tool-use, reconciliation, and lineage-tracing pattern — not RAG. The model calls structured tools that execute deterministic scoring and lookup logic against a pre-built regulatory snapshot and return ranked, explainable results. The NIC enrichment pipeline runs at startup and writes enriched data to the JSON cache, so subsequent tool calls are fast in-memory lookups.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.