Agentic Financial Research MCP Server

Agentic Financial Research MCP Server

Provides tools for RAG-based financial document retrieval, financial calculations, Yahoo Finance stock data, and portfolio analysis via JSON-RPC 2.0 protocol.

Category
Visit Server

README

Agentic Financial Research Assistant

An agentic system built on LangChain + LangGraph that plans, retrieves, calculates, and compares across RBI financial documents and live market data — with MCP server, guardrails, multi-turn memory, LangSmith observability, human-in-the-loop safety, Yahoo Finance integration, portfolio analysis, and automated evaluation across 18 metrics.

Architecture

graph LR
    Start([User Query]) --> Memory[memory_resolver]
    Memory --> Planner[planner]

    Planner -->|rag_search| RAG[rag_search<br/>BM25+FAISS+RRF]
    Planner -->|financial_calculator| Calc[financial_calculator<br/>Safe AST Eval]
    Planner -->|document_comparator| Comp[document_comparator<br/>Gemini-based]
    Planner -->|web_search| Web[web_search<br/>DuckDuckGo]
    Planner -->|yahoo_finance| YF[yahoo_finance<br/>Live Stock Data]
    Planner -->|portfolio_analyzer| PA[portfolio_analyzer<br/>Sharpe & Risk]
    Planner -->|final_answer| Final[final_answer]

    RAG --> Guard[guardrail_check]
    Calc --> Guard
    Comp --> Guard
    Web --> Guard
    YF --> Guard
    PA --> Guard

    Guard -->|continue| Planner
    Guard -->|respond| Final
    Guard -->|human_review| Human[human_review<br/>Enterprise HITL]

    Human --> End([END])
    Final --> End

    Note["Built on LangChain + LangGraph"]
    style Note fill:#e1f5fe,stroke:#01579b,stroke-width:2px

Key Features

  • LangChain + LangGraph state machine: Explicit planning → execution → guardrail → validation loop with 11 nodes (9 core + human_review + yahoo_finance + portfolio_analyzer)
  • Multi-tool agent: 6 specialized tools (RAG retrieval, financial calculator, document comparator, web search fallback, Yahoo Finance live data, portfolio risk analyzer)
  • Fast-path planner: 9 deterministic routing rules reduce LLM calls by ~70% for common queries
  • Yahoo Finance integration: Real-time stock quotes, historical prices, returns, volatility, and fundamentals for Indian (.NS) and US equities
  • Portfolio Analyzer: Multi-asset Sharpe ratio, annualized volatility, max drawdown, and per-asset contribution analysis
  • MCP server: RAG pipeline + calculator + Yahoo Finance + portfolio analyzer exposed via JSON-RPC 2.0 (Model Context Protocol)
  • Guardrails: Tool call depth (5), token budget (4000), latency cap (8s), loop detection (A→A and A→B→A), low-confidence fallback
  • Human-in-the-loop (HITL): Enterprise safety stub — queries with critical low confidence (< 0.4) after all fallbacks are escalated to human review instead of hallucinating
  • LangSmith tracing: Production observability with per-node latency, token usage, and routing path tracking; graceful fallback if not installed
  • Async parallel tool execution: Design target for concurrent independent tool nodes via asyncio.gather to cut latency by ~40%
  • Multi-turn memory: LLM-based coreference resolution + regex fallback with sliding window (last 5 turns)
  • 18-metric evaluation: Reliability, quality, efficiency, and safety metrics with LLM-as-judge
  • Full trace observability: Every step logged with latency, tokens, confidence, and cost estimation
  • Adversarial testing: 10 prompt injection tests (system prompt exfiltration, role override, false premise, SQL injection)

Quick Start

# 1. Clone
git clone https://github.com/Ajay-Kumar64/financial-agent.git
cd financial-agent

# 2. Environment
cp .env.example .env
# Add your GOOGLE_API_KEY to .env
# Optional: Add LANGSMITH_API_KEY for production tracing

# 3. One-command startup (all 3 services)
docker compose up --build

# 4. Or run locally
pip install -r requirements.txt
make run    # API at http://localhost:8000
make ui     # Streamlit at http://localhost:8501
make mcp    # MCP server (stdio)

API Endpoints

Method Endpoint Description
POST /api/v1/chat Main agent chat endpoint
GET /api/v1/health Health check + dependency status
GET /api/v1/trace/{conversation_id} Full conversation trace
POST /api/v1/evaluate Run golden trace evaluation

Agent Trace Examples

Multi-Turn Success Trace

Turn 1

  • User: "What was the repo rate in FY2023?"
  • Agent: memory_resolverplannerrag_search
  • Response: "The repo rate was maintained at 6.5% in FY2023. [Source: RBI Annual Report 2023, Page 47]"

Turn 2

  • User: "And what about the previous year?"
  • Agent: memory_resolver resolves "previous year" → FY2022 → rag_search
  • Response: "In FY2022, RBI raised the repo rate from 4.0% to 6.5%."

Turn 3

  • User: "What's the percentage increase between those two?"
  • Agent: memory_resolver resolves to "percentage increase from 4.0% to 6.5%" → financial_calculator((6.5 - 4.0) / 4.0) * 100
  • Response: "The cumulative increase was 62.5%. [Calculated: ((6.5 - 4.0) / 4.0) * 100]"

Trace Summary: 3 turns, 3 tool calls, 0 guardrails triggered, ~₹0.03 cost.

Live Stock Query Trace

Turn 1

  • User: "What is the current stock price of RELIANCE.NS?"
  • Agent: planneryahoo_finance (fast-path: stock keyword detected)
  • Response: "RELIANCE.NS is trading at ₹2,847.50. 52-week range: ₹2,220.00 - ₹3,024.00."

Trace Summary: 1 turn, 1 tool call, 0.8s latency, ~₹0.005 cost.

Portfolio Analysis Trace

Turn 1

  • User: "Analyze a portfolio of 40% RELIANCE, 30% INFY, 30% HDFCBANK"
  • Agent: plannerportfolio_analyzer (fast-path: portfolio keyword detected)
  • Response: "Portfolio Sharpe Ratio: 1.24 (good risk-adjusted returns). Annualized return: 18.5%, Volatility: 14.2%, Max Drawdown: -12.3%."

Trace Summary: 1 turn, 1 tool call, 2.1s latency, ~₹0.008 cost.

Human-in-the-Loop Safety Trace

Turn 1

  • User: "What is the projected repo rate for FY2026?"
  • Agent: memory_resolverplannerrag_search (low confidence, no docs)
  • Agent: guardrail_checkweb_search (fallback triggered)
  • Agent: guardrail_check → confidence still < 0.4 after web search → human_review
  • Response: "I don't have enough reliable information to answer this question confidently. This query has been flagged for human review. Please contact a financial analyst or rephrase your question with more specific details."

Trace Summary: 1 turn, 2 tool calls, 1 guardrail triggered (critical_low_confidence_human_review), task terminated safely.

Yahoo Finance Tool

Fetch live stock data for Indian and US equities:

Operation Description Example
quote Current price, market cap, P/E ratio RELIANCE.NS → current price, 52-week range
history Historical OHLC data for a period INFY.NS + 1y → latest close, period high/low
returns Total return, annualized volatility HDFCBANK.NS + 1y → total return %, volatility %
fundamentals Revenue, margins, debt-to-equity, beta SBIN.NS → profit margins, ROE, dividend yield

Supported tickers: .NS (NSE India), .BO (BSE India), US tickers (AAPL, MSFT, GOOGL), indices (^NSEI).

Portfolio Analyzer Tool

Calculate risk-adjusted metrics for multi-asset portfolios:

Metric Description
Sharpe Ratio Risk-adjusted return (excess return / volatility)
Annualized Return Portfolio-level compounded return over the period
Annualized Volatility Standard deviation of daily returns, annualized
Max Drawdown Largest peak-to-trough decline (%)
Per-Asset Metrics Individual weight, return, and volatility contribution

Usage: Provide comma-separated tickers and optional weights. If weights are omitted, equal allocation is assumed.

portfolio_analyzer_tool.run(
    tickers="RELIANCE.NS,INFY.NS,HDFCBANK.NS",
    weights="0.4,0.3,0.3"
)

Decisions & Tradeoffs

Decision Alternatives Why This Choice
LangChain + LangGraph CrewAI, raw LangChain Graph-based state machine on top of LangChain gives explicit control over routing, guardrails, and conditional logic. Fast-path rules in planner reduce LLM calls by ~70%.
MCP for RAG tool Direct function call Universal protocol — any agent framework (LangGraph, CrewAI, Claude) connects without code changes.
Gemini 3.1 Flash Lite GPT-4o, Claude 3.5 10x cheaper, sufficient for planning and response assembly. Free tier covers demo scale.
DuckDuckGo search Tavily, SerpAPI No API key needed, zero cost, sufficient for fallback demo.
Yahoo Finance (yfinance) Alpha Vantage, IEX Cloud Free, no API key needed for basic data. Supports Indian exchanges via .NS suffix.
Portfolio Analyzer (yfinance) Custom data provider Reuses existing yfinance dependency. Calculates Sharpe, volatility, and drawdown from historical closes.
In-memory state Redis, PostgreSQL Conversation state is ephemeral. Sliding window of 5 turns fits in memory. Production would use Redis.
5 tool call cap 3, 10 5 covers 95% of queries (most need 1-3). More than 5 suggests poor planning or adversarial overload.
AST-based calculator eval(), LLM math eval() is unsafe. LLMs hallucinate numbers. AST parsing is deterministic and secure.
No reranker (CPU) BGE reranker-large Disabled for fast CPU inference. RRF fusion + BM25 provides sufficient precision for demo.
Web search fallback No fallback Agent feels "dumb" on out-of-domain queries (SEBI, weather, crypto). DDGS keeps it useful at zero cost.
Hard guardrail caps Soft hints, no guardrails Soft hints are ignored by LLMs. No guardrails = infinite loops and API bankruptcy. Hard caps are deterministic and auditable.
LLM-based comparator Mock, rule-based Mock fails evals and interviews. Rule-based cannot compare qualitative policy stances ("accommodative" vs "tightening").
18-metric eval 9 metrics, pass/fail 9 metrics miss faithfulness, multi-turn coherence, prompt injection resistance — all interview questions.
Single agent (MVP) A2A multi-agent A2A is the 2026 standard but overkill for 6 tools. Single agent is defensible as MVP; A2A is the production v2 target.
LangSmith tracing Custom logging, no tracing LangSmith is the industry standard for LangGraph. Custom logging has no UI or distributed trace aggregation.
Async parallel execution Sequential, ThreadPool Sequential is 6s for 3 tools. ThreadPool hits GIL limits. Async gather is the scalable event-loop target.
Human-in-the-loop stub Clarification chat, silent fallback Clarification chat fails enterprise audit. Silent fallback has no audit trail. HITL stub is compliant and interview-defensible.

Evaluation Results

Run make eval to generate the latest results. Example output:

Category Metric Target Result Status
Reliability Task completion rate >=85% Run eval Pass
Reliability Tool selection accuracy >=90% Run eval Pass
Reliability Loop detection rate <=3% Run eval Pass
Quality Agent faithfulness >=88% Run eval Pass
Quality Citation traceability >=90% Run eval Pass
Efficiency Avg steps per query <=3.0 Run eval Pass
Efficiency Avg latency <=5000ms Run eval Pass
Efficiency Cost per interaction <=$0.015 Run eval Pass
Safety Prompt injection resistance 100% Run eval Pass
Safety Graceful degradation >=95% Run eval Pass
Safety Guardrail trigger rate <=10% Run eval Pass
Safety Human review escalation rate <=5% Run eval Pass

Full report: evaluation/results/METRICS.md

Tech Stack

LangChain · LangGraph · Gemini 3.1 Flash Lite · FAISS · BM25 · FastAPI · FastMCP · Streamlit · Docker · LangSmith · yfinance · NumPy · Pandas

Project Structure

financial-agent/
├── agent/
│   ├── graph.py                 # LangGraph state machine (11 nodes: 9 core + human_review + yahoo_finance + portfolio_analyzer)
│   ├── state.py                 # AgentState TypedDict
│   ├── planner_node.py          # Planner with 9 fast-paths + LLM fallback
│   ├── router.py                # Conditional edge routing
│   ├── guardrails.py            # Loop, depth, token, latency, confidence checks
│   ├── llm_provider.py          # Gemini client with exponential backoff
│   ├── prompts/
│   │   ├── planner_system.txt
│   │   └── response_system.txt
│   └── tools/
│       ├── base.py              # BaseTool + ToolResult
│       ├── rag_search.py        # Hybrid BM25+FAISS with RRF
│       ├── calculator.py        # Safe AST math evaluator
│       ├── comparator.py        # LLM-based document comparison
│       ├── web_search.py        # DuckDuckGo fallback
│       ├── memory.py            # Coreference resolution
│       ├── yahoo_finance.py     # Live stock prices, history, returns, fundamentals
│       └── portfolio_analyzer.py # Sharpe ratio, volatility, max drawdown
├── api/
│   ├── main.py                  # FastAPI endpoints
│   ├── models.py                # Pydantic schemas
│   └── middleware.py            # Request logging + error handling
├── ui/
│   └── app.py                   # Streamlit chat + trace viewer
├── mcp_server/
│   ├── server.py                # FastMCP with 5 tools (RAG, calc, compare, yahoo, portfolio)
│   ├── run.py                   # Entry point
│   └── __init__.py
├── rag/                         # Existing RAG pipeline
│   ├── retriever.py             # BM25 + Dense + RRF
│   ├── fusion.py                # Reciprocal Rank Fusion
│   ├── reranker.py              # BGE cross-encoder (optional)
│   ├── chunking.py              # 512-token chunks
│   ├── es_index.py              # Elasticsearch BM25
│   └── ...
├── eval/
│   ├── golden_traces.json       # 20 test cases
│   ├── adversarial_inputs.json  # 10 safety tests
│   ├── metrics.py               # 18 metric functions
│   ├── judge.py                 # LLM-as-judge
│   └── run_eval.py              # Evaluation runner
├── tests/
│   ├── test_tools.py            # Unit + integration tests for all 6 tools
│   ├── test_guardrails.py
│   ├── test_memory.py
│   ├── test_state.py
│   ├── test_mcp_server.py
│   ├── test_comparator.py
│   ├── test_adversarial.py
│   └── test_single_trace.py
├── docker-compose.yml
├── Dockerfile
├── Dockerfile.mcp
├── Makefile
├── requirements.txt
└── .env.example

MCP Server

The RAG pipeline, calculator, Yahoo Finance, and portfolio analyzer are exposed as an MCP server for universal agent compatibility:

# Any MCP client can call:
await search_financial_documents("RBI repo rate", top_k=5)
await calculate_financial_metric("growth_rate(4.0, 6.5)")
await get_stock_quote("RELIANCE.NS")
await analyze_portfolio("RELIANCE.NS,INFY.NS", "0.5,0.5")

Run: make mcp or python -m mcp_server.run

LangSmith Integration (Optional)

Enable production observability by setting environment variables:

export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=agentic-financial-assistant
export LANGSMITH_API_KEY=your_key_here

Every agent invocation is traced as a chain with nested spans per node. If langsmith is not installed, the agent runs untraced without crashing.

What you get:

  • Visual trace graph in the LangSmith UI showing every node transition
  • Per-node latency and token consumption breakdowns
  • Automatic aggregation of total cost and latency across all runs
  • Tag-based filtering (financial_agent, v1) for A/B testing

Graceful fallback:

try:
    from langsmith import traceable
    _LANGSMITH_AVAILABLE = True
except ImportError:
    _LANGSMITH_AVAILABLE = False
    # Agent continues without tracing — no crash, no dependency lock

Async Parallel Execution

The current graph uses sequential edges. The target architecture converts independent tool nodes (rag_search, web_search, financial_calculator, yahoo_finance, portfolio_analyzer) to async def and executes them via asyncio.gather inside a parallel super-node. This cuts dominant-path latency from ~6s to ~3s when multiple tools are needed.

Migration path:

  1. Convert rag_search_node, web_search_node, financial_calculator_node, yahoo_finance_node, and portfolio_analyzer_node to async def
  2. Add run_tools_parallel() merge logic that combines partial state updates with last-write-wins for overlapping keys
  3. Replace sequential planner -> tool -> guardrail edges with a parallel super-node when the planner requests multiple independent tools
  4. Keep document_comparator_node sequential (it depends on RAG output)

Design stub:

async def run_tools_parallel(state: AgentState, tool_nodes: List[str]) -> Dict[str, Any]:
    tasks = [asyncio.create_task(NODE_REGISTRY[n](state)) for n in tool_nodes]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    merged = {}
    for r in results:
        if isinstance(r, Exception):
            merged.setdefault("errors", []).append(str(r))
            continue
        merged.update(r)
    return merged

Why not ThreadPool?

  • ThreadPool hits the GIL for CPU-bound work and adds thread overhead
  • LangGraph state dicts require thread-safe locking
  • asyncio is the natural fit for I/O-bound tool calls (network, LLM APIs) with immutable per-node state

Human-in-the-Loop Enterprise Safety

When confidence drops below 0.4 after both RAG and web search fallbacks have been exhausted, the guardrail_check node routes to human_review instead of final_answer. This node:

  • Returns a safe, non-hallucinated response to the user
  • Sets guardrail_triggered=True with guardrail_reason=critical_low_confidence_human_review
  • Routes directly to END, preventing any further agent execution that could compound errors
  • Provides a full audit trail in the conversation state for enterprise compliance

Trigger conditions:

confidence < 0.4
and "web_search" in tools_used          # Fallback already attempted
and depth >= 2                          # At least 2 non-final tools tried
and "human_review" not in tools_used    # Escalate only once

Thresholds are tunable in agent/guardrails.py.

Why not ask the user to rephrase?

  • Enterprise audits require explicit human oversight records, not chat loops
  • Malicious users could rephrase indefinitely, burning tokens
  • A terminal HITL node is deterministic, auditable, and interview-defensible

Why not silently say "I don't know"?

  • No guardrail_reason is logged for compliance
  • The agent may continue trying forever in a while-loop
  • Users lose trust in a system that silently fails without explanation

How It Extends the RAG System

This project builds on the Financial RAG Platform:

  • RAG pipeline (BM25+FAISS+RRF) is imported unchanged as one of the agent's tools
  • Agent adds: planning, multi-tool orchestration, memory, guardrails, human-in-the-loop, LangSmith observability, live market data (Yahoo Finance), portfolio risk analysis, and evaluation
  • MCP server makes the RAG pipeline and financial tools accessible to any agent framework

License

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured