redflag-mcp

An MCP server that provides queryable access to Anti-Money Laundering (AML) red flag knowledge from regulatory documents. It enables compliance officers to ask natural-language questions and receive relevant, sourced red flags from a local vector database.

MCP server exposing AML red flag knowledge as queryable tools. Compliance officers ask natural-language questions; the server returns relevant, sourced red flags from a local vector database.

Overview

Three distinct workflows:

  1. Extraction — pull AML red flags out of PDFs or web pages using an LLM and save them as YAML
  2. Ingestion — embed the YAML files and load them into the local vector database
  3. Query — MCP server answers semantic search queries against that database

Extraction Pipeline

scripts/extract.py takes a regulatory document (PDF file or URL), sends its text to an OpenAI model, and writes a structured YAML file into data/source/. Each extracted entry includes a source_url linking back to the original document.

Prerequisites

uv sync
export OPENAI_API_KEY=sk-...

Adding PDFs in bulk (recommended workflow)

Step-by-step:

  1. Add the source URL to red_flag_sources/pdflinks.txt: one URL per line, in serial order. The first URL gets key 001, the second 002, and so on (blank lines and # comment lines are not counted).
  2. Download the PDF and save it to red_flag_sources/pdf/ named NNN_short_descriptive_name.pdf, where NNN matches the position of its URL in pdflinks.txt.
  3. Regenerate the registry: uv run python scripts/build_sources_registry.py
  4. Run extraction: uv run python scripts/extract.py --parallel

Key constraint: the NNN_ prefix in the filename must match the position of the URL in pdflinks.txt, counting only URL lines (blank lines and # comments are ignored). First URL = 001_*.pdf, second URL = 002_*.pdf, etc. This is how the extractor links each PDF to its public source URL.


PDFs are stored in red_flag_sources/pdf/ and must be named with a zero-padded serial prefix:

red_flag_sources/pdf/
  001_fincen_alert_russian_sanctions_evasion.pdf
  002_ffiec_bsa_aml_examination_manual.pdf
  003_fatf_guidance_virtual_assets.pdf

Each serial number maps to a public URL for the source document. Maintain this mapping in red_flag_sources/pdflinks.txt — one URL per line, in serial order:

# FinCEN Russian Sanctions Evasion Alert
https://fincen.gov/sites/default/files/2022-06/Alert%20FIN-2022-Alert001_508C.pdf

# FFIEC BSA/AML Examination Manual
https://bsaaml.ffiec.gov/manual

# FATF Guidance on Virtual Assets
https://www.fatf-gafi.org/...

Blank lines and lines starting with # are ignored. After editing pdflinks.txt, regenerate sources.yaml:

uv run python scripts/build_sources_registry.py

Then run batch extraction:

uv run python scripts/extract.py --parallel

Only new (unprocessed) PDFs are extracted — previously processed sources are skipped automatically.
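The pdflinks.txt convention described above (one URL per line, blank lines and # comments ignored, serial keys assigned in order) can be sketched in a few lines of Python. This is a minimal illustration, not the code in scripts/build_sources_registry.py; the function names parse_pdflinks and url_for_pdf are hypothetical.

```python
from typing import Optional


def parse_pdflinks(text: str) -> dict[str, str]:
    """Map zero-padded serial keys ("001", "002", ...) to source URLs."""
    urls = [
        line.strip()
        for line in text.splitlines()
        # Blank lines and comment lines do not consume a serial number.
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return {f"{i:03d}": url for i, url in enumerate(urls, start=1)}


def url_for_pdf(pdf_name: str, registry: dict[str, str]) -> Optional[str]:
    """Look up the source URL for a file named NNN_description.pdf."""
    prefix = pdf_name.split("_", 1)[0]
    return registry.get(prefix)


sample = """# First document
https://example.com/a.pdf

# Second document
https://example.com/b
"""
registry = parse_pdflinks(sample)
print(registry)  # {'001': 'https://example.com/a.pdf', '002': 'https://example.com/b'}
print(url_for_pdf("002_second_document.pdf", registry))  # https://example.com/b
```

Because keys come from URL order rather than raw line numbers, adding a comment above an entry does not shift any serial, but inserting a URL in the middle of the file would renumber everything after it.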

Batch extraction commands

# Sequential batch
uv run python scripts/extract.py

# Parallel batch (4 workers by default)
uv run python scripts/extract.py --parallel

# Parallel batch with custom worker count
uv run python scripts/extract.py --parallel 8

# Force re-extract everything
uv run python scripts/extract.py --force --parallel

# Process only PDFs in a serial range (e.g. 001 through 005)
uv run python scripts/extract.py --range 001-005

# Range + parallel
uv run python scripts/extract.py --range 001-005 --parallel

# Force re-extract a range
uv run python scripts/extract.py --force --range 001-005 --parallel

Note: --range applies only to numbered PDFs. Web URLs in Weblinks.md are excluded when a range is active.
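As a rough sketch of how a serial range such as 001-005 could be applied to the numbered PDFs, the snippet below parses the range and keeps only filenames whose NNN_ prefix falls inside it. The helper names parse_range and select_in_range are hypothetical, and the real extractor may validate input differently.

```python
import re

# Three-digit serial prefix followed by an underscore, e.g. "001_".
_PREFIX = re.compile(r"^(\d{3})_")


def parse_range(spec: str) -> tuple[int, int]:
    """Parse "001-005" into the inclusive pair (1, 5)."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"range start {start} exceeds end {end}")
    return start, end


def select_in_range(filenames: list[str], spec: str) -> list[str]:
    """Return the filenames whose serial prefix lies within the range."""
    start, end = parse_range(spec)
    selected = []
    for name in filenames:
        match = _PREFIX.match(name)
        # Names without an NNN_ prefix are never selected by a range.
        if match and start <= int(match.group(1)) <= end:
            selected.append(name)
    return selected


files = ["001_a.pdf", "005_b.pdf", "006_c.pdf", "notes.pdf"]
print(select_in_range(files, "001-005"))  # ['001_a.pdf', '005_b.pdf']
```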

Single source (ad hoc)

# Extract from a local PDF
uv run python scripts/extract.py red_flag_sources/pdf/001_fincen_alert.pdf

# Extract from a URL
uv run python scripts/extract.py https://example.com/regulatory-guidance

# Re-extract a source that was already processed
uv run python scripts/extract.py --force red_flag_sources/pdf/001_fincen_alert.pdf

For single-source PDFs, add the URL to pdflinks.txt and run build_sources_registry.py first so the extractor can populate source_url in the output.
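The command forms shown above (an optional source argument, --parallel with an optional worker count, --force, and --range) map naturally onto argparse. The following is an illustrative sketch only; the actual argument handling in scripts/extract.py is not shown in this README, and the dest name serial_range is an assumption.

```python
import argparse

DEFAULT_WORKERS = 4  # the documented default for --parallel


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="extract.py")
    # Omitting the source selects batch mode.
    parser.add_argument("source", nargs="?", default=None,
                        help="PDF path or URL; omit for batch mode")
    # nargs="?" lets the flag appear bare (uses const) or with a count.
    parser.add_argument("--parallel", nargs="?", type=int,
                        const=DEFAULT_WORKERS, default=None, metavar="WORKERS")
    parser.add_argument("--force", action="store_true",
                        help="re-extract sources already in the manifest")
    parser.add_argument("--range", dest="serial_range", default=None,
                        metavar="START-END")
    return parser


args = build_parser().parse_args(["--force", "--range", "001-005", "--parallel"])
print(args.force, args.serial_range, args.parallel)  # True 001-005 4
```

With this layout, parallel is None for a sequential run, 4 for a bare --parallel, and the given integer for --parallel 8.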

What it does

  1. Fetches the document — downloads the web page (strips nav/footer/scripts) or reads text from the PDF via pdfplumber
  2. Sends to OpenAI — prompts gpt-4o-mini (override with OPENAI_EXTRACTION_MODEL) to extract every distinct AML red flag indicator as structured JSON
  3. Validates — each returned flag is checked against the RedFlagSource schema; invalid entries are skipped with a warning
  4. Writes YAML — saves to data/source/<slug>.yaml, one entry per red flag
  5. Updates the manifest — records the source in data/source/.extracted_sources.yaml to prevent re-processing

Output schema

Each entry in the YAML file has the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier, e.g. 001-fincen-alert-01 |
| description | string | yes | Standalone description of the red flag indicator |
| source_url | string | no | Public URL of the source document |
| product_types | list[string] | no | Financial products this applies to (e.g. depository, crypto, msb) |
| regulatory_source | string | no | Source document name or authority (e.g. FinCEN Alert FIN-2022-Alert001) |
| risk_level | string | no | high, medium, or low |
| category | string | no | AML typology (e.g. structuring, sanctions_evasion, shell_company) |
| simulation_type | string | no | Optional simulation complexity code (e.g. 1A, 2B) |
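To make the schema and the "skip invalid entries with a warning" behaviour concrete, here is a minimal sketch using a standard-library dataclass. RedFlagEntry is a hypothetical stand-in for the project's RedFlagSource schema, which may be defined with a different library and stricter rules; only the field names, required flags, and risk levels come from the table above.

```python
from dataclasses import dataclass, field
from typing import Optional

RISK_LEVELS = {"high", "medium", "low"}


@dataclass
class RedFlagEntry:
    # Required fields
    id: str
    description: str
    # Optional fields
    source_url: Optional[str] = None
    product_types: list[str] = field(default_factory=list)
    regulatory_source: Optional[str] = None
    risk_level: Optional[str] = None
    category: Optional[str] = None
    simulation_type: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.id or not self.description:
            raise ValueError("id and description are required")
        if self.risk_level is not None and self.risk_level not in RISK_LEVELS:
            raise ValueError(f"invalid risk_level: {self.risk_level!r}")


def validate_entries(raw_entries: list[dict]) -> list[RedFlagEntry]:
    """Keep entries that validate; skip the rest with a warning."""
    valid = []
    for raw in raw_entries:
        try:
            # Missing required keys or unknown keys raise TypeError.
            valid.append(RedFlagEntry(**raw))
        except (TypeError, ValueError) as exc:
            print(f"warning: skipping invalid entry: {exc}")
    return valid


entries = [
    {"id": "001-fincen-alert-01", "description": "Example flag", "risk_level": "high"},
    {"description": "missing id"},
    {"id": "x", "description": "bad level", "risk_level": "severe"},
]
print(len(validate_entries(entries)))  # 1
```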

Deduplication

data/source/.extracted_sources.yaml tracks every processed source by its canonical path or URL. Sources already in the manifest are skipped in both batch and single-source mode. Use --force to re-extract a source regardless.
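The skip-unless-forced rule can be illustrated with a small sketch. It models the manifest as an in-memory set of canonical keys; how the real extractor canonicalizes paths and stores the manifest on disk is not specified here, so canonical_key and select_sources are hypothetical helpers.

```python
from pathlib import Path


def canonical_key(source: str) -> str:
    """Assumed canonical form: URLs unchanged, local paths made absolute."""
    if source.startswith(("http://", "https://")):
        return source
    return str(Path(source).resolve())


def select_sources(candidates: list[str], manifest: set[str],
                   force: bool = False) -> list[str]:
    """Return the sources to process, honouring the manifest unless forced."""
    if force:
        return list(candidates)
    return [s for s in candidates if canonical_key(s) not in manifest]


manifest = {canonical_key("https://example.com/guidance")}
todo = select_sources(
    ["https://example.com/guidance", "https://example.com/new"], manifest
)
print(todo)  # ['https://example.com/new']

# After a successful extraction the source would be recorded:
manifest.add(canonical_key("https://example.com/new"))
```

Keying on a canonical form means the same PDF referenced as ./file.pdf and file.pdf is treated as one source.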


Ingestion

After extraction, embed the YAML files and load them into the vector database:

uv run python scripts/ingest.py

This reads all YAML files in data/source/, generates embeddings with nomic-embed-text-v1.5, and upserts records into LanceDB at data/vectors/. Run this before connecting the MCP server to Claude Desktop: the ~275 MB embedding model downloads on first use, and if that download happens at server startup instead, the client connection is likely to time out.


MCP Server

# Start server (stdio mode, for Claude Desktop / Claude Code)
uv run python -m redflag_mcp

# Start in MCP inspector
uv run mcp dev src/redflag_mcp/server.py

# Start as HTTP server (for OpenAI agents or other HTTP clients)
MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run python -m redflag_mcp

The server exposes three tools: search_red_flags, get_red_flag, and list_filters. It is fully offline after ingestion — no API keys required at query time.
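For readers wondering how the MCP_TRANSPORT, MCP_HOST, and MCP_PORT variables above might be interpreted, the sketch below resolves them into a small config object. It is an assumption-laden illustration: the stdio default follows from the first command, but the fallback host and port values, the accepted transport names, and the load_transport_config helper are hypothetical rather than taken from the server's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransportConfig:
    transport: str
    host: str
    port: int


def load_transport_config(env: Mapping[str, str] = os.environ) -> TransportConfig:
    """Resolve transport settings from environment variables."""
    transport = env.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in {"stdio", "http"}:
        raise ValueError(f"unsupported MCP_TRANSPORT: {transport!r}")
    # Fallback host and port are assumptions, not documented defaults.
    host = env.get("MCP_HOST", "127.0.0.1")
    port = int(env.get("MCP_PORT", "8000"))
    return TransportConfig(transport, host, port)


cfg = load_transport_config(
    {"MCP_TRANSPORT": "http", "MCP_HOST": "0.0.0.0", "MCP_PORT": "8000"}
)
print(cfg.transport, cfg.host, cfg.port)  # http 0.0.0.0 8000
```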


Development

uv sync                          # Install dependencies
uv run pytest tests/             # Run tests
uv run ruff check src/           # Lint
uv run mypy src/                 # Type check
