SEC-MCP
MCP server for analyzing SEC filings (10-K, 10-Q, 8-K) with industry-aware financial extraction and BERT-based NLP.
Features
- Company Search — Look up companies by ticker or name via SEC EDGAR
- Standardized Financials — Industry-aware XBRL extraction with ~250 concept mappings across 5 industry classes (standard, bank, insurance, REIT, utility)
- Validation — Automatic sanity checks (revenue ≥ net income, accounting equation, segment vs total detection)
- Filing Access — Fetch filing text and specific sections (Risk Factors, MD&A, etc.)
- Sentiment Analysis — FinBERT financial sentiment (positive/negative/neutral)
- Summarization — BART-based hierarchical summarization for long filing sections
- Entity Extraction — NER for companies, people, locations + regex for monetary values, dates, percentages
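As a rough illustration of the regex half of entity extraction, the sketch below pulls monetary values, percentages, and long-form dates out of free text. The patterns and the `extract_patterns` helper are assumptions made for this example; the project's actual expressions may differ.

```python
import re

# Hypothetical patterns for illustration only; not taken from the project source.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = {
    # "$3.2 billion", "$1,250,000", "$45"
    "MONEY": re.compile(
        r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?"
        r"(?:\s?(?:thousand|million|billion|trillion))?",
        re.IGNORECASE,
    ),
    # "12.5%", "4 %"
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    # "December 31, 2023"
    "DATE": re.compile(rf"(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}"),
}


def extract_patterns(text: str) -> dict[str, list[str]]:
    """Return every match for each label, in document order."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


sample = (
    "For the year ended December 31, 2023, revenue rose 12.5% "
    "to $3.2 billion, while costs fell 4% to $1,250,000."
)
entities = extract_patterns(sample)
```

All groups are non-capturing, so `findall` returns whole matches rather than fragments.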
Setup
```bash
# Clone
git clone https://github.com/YOUR_USERNAME/SEC-MCP.git
cd SEC-MCP

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e ".[dev]"

# Configure EDGAR identity (required by SEC)
cp .env.example .env
# Edit .env and set EDGAR_IDENTITY="Your Name your@email.com"
```
Available Tools
Base / Discovery
| Tool | Description |
|---|---|
| `search_company` | Search by ticker/name → CIK, ticker, SIC code, industry |
| `get_filing_list` | List filings, filter by form type (10-K, 10-Q, 8-K) |
Financials (standardized, industry-aware, validated)
| Tool | Description |
|---|---|
| `get_financials` | Full standardized extraction: metrics, ratios, validation, and optionally statements |
| `get_financials_batch` | Same as `get_financials`, for N tickers in parallel |
| `get_income_statement` | Just the income statement rows |
| `get_balance_sheet` | Just the balance sheet rows |
| `get_cash_flow` | Just the cash flow rows |
| `get_financial_ratios` | Just computed ratios (margins, ROA, ROE, leverage, etc.) |
| `compare_companies` | Side-by-side metrics + ratios for multiple tickers |
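To make the ratio output more concrete, here is a minimal sketch using common textbook definitions of net margin, ROA, ROE, and leverage. The metric key names and the exact formulas are assumptions; the server may use different definitions (for example, averaged balances).

```python
from typing import Optional


def safe_div(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Divide, returning None when an input is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def compute_ratios(metrics: dict) -> dict:
    """Textbook ratios from standardized metrics (key names are hypothetical)."""
    net_income = metrics.get("net_income")
    return {
        "net_margin": safe_div(net_income, metrics.get("revenue")),
        "roa": safe_div(net_income, metrics.get("total_assets")),
        "roe": safe_div(net_income, metrics.get("total_equity")),
        "debt_to_equity": safe_div(
            metrics.get("total_liabilities"), metrics.get("total_equity")
        ),
    }


ratios = compute_ratios(
    {
        "revenue": 200.0,
        "net_income": 50.0,
        "total_assets": 1000.0,
        "total_liabilities": 600.0,
        "total_equity": 400.0,
    }
)
```

Returning `None` instead of raising keeps one missing concept from invalidating the rest of the ratio set.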
Filing Text
| Tool | Description |
|---|---|
| `get_filing_text` | Full filing or specific section text (supports aliases like 'risk factors') |
NLP Analysis
| Tool | Description |
|---|---|
| `analyze_sentiment` | FinBERT sentiment on text or filing section |
| `summarize_filing` | Hierarchical BART summarization |
| `extract_entities` | NER (ORG, PER, LOC, MONEY, DATE, PERCENT) |
| `analyze_filing` | Combined sentiment + summary + entities in one call |
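Hierarchical summarization generally means summarizing chunks, joining the partial summaries, and repeating until the text fits in a single pass. The sketch below shows that loop with a pluggable `summarize` callable and word-based chunking standing in for token-based chunking. Both function names and the word-count limit are assumptions, not the project's implementation.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def hierarchical_summarize(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 400,
    max_rounds: int = 5,
) -> str:
    """Summarize chunk by chunk, then summarize the joined summaries.

    max_rounds bounds the loop in case the summarizer does not shrink its input.
    """
    for _ in range(max_rounds):
        chunks = chunk_words(text, max_words)
        if len(chunks) <= 1:
            break  # text already fits in one pass
        text = " ".join(summarize(chunk) for chunk in chunks)
    return summarize(text)


def first_ten_words(text: str) -> str:
    """Stand-in summarizer for demonstration; a real one would call a model."""
    return " ".join(text.split()[:10])


long_text = " ".join(f"w{i}" for i in range(1000))
summary = hierarchical_summarize(long_text, first_ten_words, max_words=100)
```

In practice the callable would wrap the summarization model, and chunk sizes would be measured in model tokens rather than words.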
How financials extraction works
Industry detection
The SIC code is used to classify a company into one of 5 industry classes:
| Class | SIC Range | Revenue Strategy |
|---|---|---|
| standard | Everything else | First match: Revenues, RevenueFromContractWithCustomer, SalesRevenueNet, … |
| bank | 6020–6299 | Try total (Revenues, NetRevenues), then aggregate NII + non-interest + trading + fees |
| insurance | 6310–6411 | Try total, then aggregate premiums + investment income + fees |
| reit | 6500–6553 | Lease revenue + other income |
| utility | 4900–4991 | Electric + gas utility revenue |
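The table above maps naturally onto a range lookup. The following is a sketch only: the `classify_industry` name, the inclusive bounds, and the fallback for a missing SIC code are assumptions.

```python
from typing import Optional, Union

# (class name, first SIC code, last SIC code), taken from the table above.
# Treating both bounds as inclusive is an assumption.
SIC_RANGES = [
    ("bank", 6020, 6299),
    ("insurance", 6310, 6411),
    ("reit", 6500, 6553),
    ("utility", 4900, 4991),
]


def classify_industry(sic: Optional[Union[int, str]]) -> str:
    """Return the industry class for a SIC code, defaulting to 'standard'."""
    if sic is None:
        return "standard"
    code = int(sic)  # EDGAR often reports SIC codes as strings
    for name, low, high in SIC_RANGES:
        if low <= code <= high:
            return name
    return "standard"
```

For example, `classify_industry("6021")` yields `"bank"`, while a technology company with SIC 3571 falls through to `"standard"`.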
XBRL concept dictionary
xbrl_mappings.py maps ~250 XBRL concepts to 20+ standardized metrics. Each metric has an ordered list of concepts to try — earlier entries are preferred. Some entries are marked aggregate=True (sum all matching, used for multi-component revenue like banks).
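One way to picture the ordered lookup with aggregate entries is shown below. The `ConceptRule` structure and `resolve_metric` function are hypothetical, and the concept names in the bank example are illustrative; the real layout of `xbrl_mappings.py` may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ConceptRule:
    """One entry in a metric's ordered list (hypothetical structure)."""
    concepts: Tuple[str, ...]
    aggregate: bool = False  # True: sum every reported concept in this entry


def resolve_metric(
    rules: Sequence[ConceptRule], facts: Dict[str, Optional[float]]
) -> Optional[float]:
    """Walk the rules in order and return the first one that yields a value."""
    for rule in rules:
        values = [facts[c] for c in rule.concepts if facts.get(c) is not None]
        if not values:
            continue  # nothing reported for this entry, try the next one
        return sum(values) if rule.aggregate else values[0]
    return None


# Illustrative bank revenue rules: prefer a reported total, else sum components.
BANK_REVENUE = [
    ConceptRule(("Revenues",)),
    ConceptRule(("InterestAndDividendIncomeOperating", "NoninterestIncome"), aggregate=True),
]

components_only = {"InterestAndDividendIncomeOperating": 70.0, "NoninterestIncome": 30.0}
with_total = {"Revenues": 120.0, **components_only}
```

Here `resolve_metric(BANK_REVENUE, with_total)` returns the reported total, and with `components_only` it falls back to the summed components.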
Validation rules
Every extraction runs these checks:
- revenue ≥ net income (when both positive) — catches segment-only revenue
- Assets = Liabilities + Equity (within 5%) — catches mismatched concepts
- Revenue not null — warns if no concept matched
- Bank segment check — flags if bank revenue < 80% of net income
- Gross margin 0–100% — for standard companies
Warnings are returned in the validation array so the AI can explain or retry.
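The checks listed above could be expressed roughly as follows. This is a sketch under assumptions: the metric keys, the warning strings, measuring the 5% tolerance relative to total assets, and expressing gross margin as a fraction are all choices made for this example.

```python
from typing import Dict, List, Optional


def validate(metrics: Dict[str, Optional[float]], industry: str = "standard") -> List[str]:
    """Return warning codes for the sanity checks listed above."""
    warnings: List[str] = []
    revenue = metrics.get("revenue")
    net_income = metrics.get("net_income")
    assets = metrics.get("total_assets")
    liabilities = metrics.get("total_liabilities")
    equity = metrics.get("total_equity")
    gross_margin = metrics.get("gross_margin")  # fraction, e.g. 0.4 for 40%

    if revenue is None:
        warnings.append("revenue_missing")
    elif net_income is not None and revenue > 0 and net_income > 0:
        if revenue < net_income:
            warnings.append("revenue_below_net_income")
        if industry == "bank" and revenue < 0.8 * net_income:
            warnings.append("bank_revenue_segment_suspected")

    # Accounting equation: Assets = Liabilities + Equity, within 5% of assets.
    if None not in (assets, liabilities, equity) and assets != 0:
        if abs(assets - (liabilities + equity)) / abs(assets) > 0.05:
            warnings.append("balance_sheet_mismatch")

    if industry == "standard" and gross_margin is not None:
        if not 0.0 <= gross_margin <= 1.0:
            warnings.append("gross_margin_out_of_range")

    return warnings
```

An empty list means every check passed. A caller can surface any returned codes to the model so it can explain the discrepancy or retry.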
Usage
Run as MCP server (STDIO)
```bash
python -m sec_mcp.server
```
Using with your app (Cursor, Claude Desktop, etc.)
- Configure MCP so your app starts the SEC-MCP server (see below).
- Set `EDGAR_IDENTITY` in `.env` or in the MCP server env.
- The AI chooses the right tool per request:
  - "Apple's financials" → `get_financials("AAPL")`
  - "Compare AAPL vs MSFT vs GOOGL" → `compare_companies(["AAPL","MSFT","GOOGL"])`
  - "Morgan Stanley income statement" → `get_income_statement("MS")`
  - "What are Apple's risk factors?" → `get_filing_text` with section='risk factors'
Cursor / Claude Desktop configuration
```json
{
  "mcpServers": {
    "sec-mcp": {
      "command": "python",
      "args": ["-m", "sec_mcp.server"],
      "cwd": "/path/to/SEC-MCP",
      "env": {
        "EDGAR_IDENTITY": "Your Name your@email.com"
      }
    }
  }
}
```
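Once a client has started the server, each tool invocation travels over STDIO as a JSON-RPC 2.0 `tools/call` request, as defined by the Model Context Protocol. The sketch below builds such a message by hand to show its shape. The argument names (`ticker`, `tickers`) are assumptions, since the tool signatures are not spelled out here; in practice the MCP client constructs these messages for you.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` request envelope (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names below are hypothetical.
request = tool_call_request(1, "compare_companies", {"tickers": ["AAPL", "MSFT", "GOOGL"]})
wire_message = json.dumps(request)
```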
Configuration
| Variable | Default | Description |
|---|---|---|
| `EDGAR_IDENTITY` | `SEC-MCP sec-mcp@example.com` | Your identity for SEC EDGAR API |
| `SENTIMENT_MODEL` | `ProsusAI/finbert` | Sentiment analysis model |
| `SUMMARIZATION_MODEL` | `facebook/bart-large-cnn` | Summarization model |
| `NER_MODEL` | `dslim/bert-base-NER` | NER model |
| `MAX_CHUNK_TOKENS` | `512` | Max tokens per chunk |
| `CHUNK_OVERLAP_TOKENS` | `128` | Overlap between chunks |
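To show how the two chunking settings interact, here is a sketch of a sliding window that reads `MAX_CHUNK_TOKENS` and `CHUNK_OVERLAP_TOKENS` with the defaults above. The function names and the exact windowing behavior are assumptions; the project's chunker may handle boundaries differently.

```python
import os
from typing import List, Mapping, Optional, Sequence, Tuple


def load_chunk_config(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Read chunk settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    max_tokens = int(env.get("MAX_CHUNK_TOKENS", "512"))
    overlap = int(env.get("CHUNK_OVERLAP_TOKENS", "128"))
    if not 0 <= overlap < max_tokens:
        raise ValueError("CHUNK_OVERLAP_TOKENS must be >= 0 and < MAX_CHUNK_TOKENS")
    return max_tokens, overlap


def chunk_tokens(tokens: Sequence[int], max_tokens: int, overlap: int) -> List[Sequence[int]]:
    """Split tokens into windows of max_tokens, each sharing `overlap` tokens with the previous one."""
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end
    return chunks


max_tokens, overlap = load_chunk_config({})  # empty mapping -> defaults
chunks = chunk_tokens(list(range(1000)), max_tokens, overlap)
```

With the defaults, each window advances by 384 tokens, so 1,000 tokens produce three chunks, the last one shorter than the rest.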
Architecture
```
src/sec_mcp/
├── server.py          # MCP tool definitions (14 tools)
├── edgar_client.py    # EDGAR API wrapper (company search, filings, text)
├── financials.py      # Standardized extraction engine + validation
├── xbrl_mappings.py   # XBRL concept → metric dictionary (5 industry classes)
├── models.py          # Pydantic models (StandardizedFinancials, ratios, etc.)
├── config.py          # Environment config
└── nlp/
    ├── sentiment.py   # FinBERT
    ├── summarizer.py  # BART
    └── ner.py         # NER
```
NLP Models
Models are lazy-loaded (downloaded on first use, ~2.5GB total):
- ProsusAI/finbert — Financial sentiment; fine-tuned on the Financial PhraseBank dataset of financial news sentences rather than on SEC filings
- facebook/bart-large-cnn — Abstractive summarization
- dslim/bert-base-NER — Named entity recognition
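Lazy loading can be as simple as deferring the expensive constructor until the first request and caching the result. The `LazyModel` wrapper below is a hypothetical sketch of that pattern, not the project's code; a fake loader stands in for the real model download so the example runs without any ML dependencies.

```python
from typing import Any, Callable, List, Optional


class LazyModel:
    """Defer loading a model until it is first needed, then reuse it."""

    def __init__(self, name: str, loader: Callable[[str], Any]) -> None:
        self.name = name
        self._loader = loader
        self._model: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        if self._model is None:
            # First use: this is where a real loader would download weights.
            self._model = self._loader(self.name)
        return self._model


load_calls: List[str] = []


def fake_loader(name: str) -> str:
    """Stand-in for a real loader such as a Hugging Face pipeline constructor."""
    load_calls.append(name)
    return f"model:{name}"


sentiment = LazyModel("ProsusAI/finbert", fake_loader)
was_loaded_before_use = sentiment.loaded
first = sentiment.get()
second = sentiment.get()
```

With the `transformers` library installed, the loader could instead be something like `lambda name: pipeline("sentiment-analysis", model=name)`, so importing the server stays fast and each model is fetched only when its tool is first called.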
Development
```bash
# Run tests
pytest

# Run tests (skip slow model tests)
pytest -m "not slow"

# Lint
ruff check src/ tests/
```
License
MIT