GemmaJnana
Multi-domain MCP server enabling AI agents to plan and execute vacation travel and birthday party logistics via local tool pipelines.

A production-pattern reference implementation demonstrating how to build multi-domain AI agent orchestration using Google's Gemma 2 (2B) model, Ollama, the Model Context Protocol (MCP), and an asynchronous FastAPI gateway. The package includes a networked MCP server architecture with dynamic skill discovery, a ReAct reasoning loop, and a fully animated chat playground UI.
Note: This project demonstrates correct architectural patterns (MCP resources, ReAct loops, SSE streaming, multi-domain routing) with mock tool handlers. It is designed as an educational reference and portfolio showcase, not a production-deployed service. See Enterprise Assessment for what a full production deployment would require.
Table of Contents
- Project Architecture Flow
- Dynamic Routing Design Patterns
- Domain Architecture Overview
- Agent Capabilities & Limitations
- File Structure
- Prerequisites
- How to Run
- Running Unit Tests
- API Reference
- Enterprise Assessment
- Step-by-Step Code Execution Trace
Project Architecture Flow
Below is the visual flow of the GemmaJnana local multi-domain architecture.

Data Flow Diagram
graph TD
Client[Playground UI] -->|1. POST Request| Backend[FastAPI Gateway]
Backend -->|2. Run Agent| Agent[Agent Dispatcher]
Agent -->|3a. Load Skills| Skills[(JSON Skills)]
Agent -->|3b. Connect via SSE| MCPServer[MCP SSE Server <br> port 8001 / 8002]
Agent -->|4. list_tools| MCPServer
Agent -->|5. Query LLM| Ollama[Ollama Server]
Ollama -->|6. Tool Calls| Agent
Agent -->|7a. call_tool| MCPServer
MCPServer -->|7b. Run Handler| Tools[Tools Submodule]
Tools -->|7c. JSON Result| MCPServer
MCPServer -->|7d. Tool Output| Agent
Agent -->|8a. Yield Trace| Backend
Backend -->|8b. SSE Cards| Client
Agent -->|9. Final Results| Backend
Backend -->|10. Final Query| Ollama
Ollama -->|11. Stream| Backend
Backend -->|12. SSE Chunks| Client
Ollama -.->|Load Model| Model[(Gemma 2)]
Dynamic Agentic Pipeline Flow
sequenceDiagram
autonumber
participant Client as "Browser Client"
participant App as "app.py"
participant Agent as "agent.py"
participant Ollama as "Ollama Service"
participant MCPServer as "MCP SSE Server (port 8001/8002)"
Note over App,MCPServer: [Step 0] Setup: MCP servers start and write endpoints (ports 8001/8002) to .env
Client->>App: POST /chat/stream with history & domain
Note over App: Reads target MCP URL from .env<br/>(e.g., TRAVEL_MCP_URL)
App->>MCPServer: Connects via SSE & reads resource "skills://list"
MCPServer-->>App: Returns predefined pipelines & sequences
Note over App: Compiles system prompt with skills context
App->>Agent: check_and_run_tools(messages, model, domain)
Note over Agent: Reads target MCP URL from .env<br/>(e.g., TRAVEL_MCP_URL)
Note over Agent: Connects to MCP SSE Server via SSE transport
Agent->>MCPServer: list_tools()
MCPServer-->>Agent: Returns active domain tools
loop ReAct Multi-Step Loop (up to 8 turns)
Agent->>Ollama: [LLM Call] ollama.chat with history + tool context + skills context
Ollama-->>Agent: Returns tool_calls (or empty if finished)
break If no more tool calls
Note over Agent: Exit Loop
end
Agent-->>App: Yield status event (Connecting/Reasoning/Thought)
App-->>Client: Stream SSE trace update (UI Pulsing/Thought card)
Agent->>MCPServer: call_tool(name, arguments)
MCPServer-->>Agent: Returns tool results (JSON string)
Agent-->>App: Yield status event (Tool execution result)
App-->>Client: Stream SSE trace update (UI tool result card)
Note over Agent: Append tool results to message context
end
Agent-->>App: Returns aggregated tool messages, results, and LLM call counts
Note over Agent: Closes connection to MCP SSE Server
App->>Ollama: [LLM Call] Stream final chat response (with complete tool context)
Ollama-->>App: Stream response chunks
App-->>Client: Stream SSE chat chunks
Dynamic Skills & Tools Discovery Mechanics
To establish complete decoupling between the orchestration agent and the domain-specific services, the system relies on dynamic runtime discovery rather than static imports or local directory scanning:
1. Dynamic Skills Discovery (Compile-Time Prompt Construction)
- Goal: Inject domain-specific workflow guidelines (e.g. the flight booking pipeline must follow `search_flights` ➔ `book_flight`) into the system prompt.
- Mechanism: When a client request lands, `app.py` triggers `get_system_prompt(domain)`. This method connects to the matching server SSE URL configured in the environment, performs the MCP protocol handshake, and reads the custom resource URI: `res = await session.read_resource("skills://list")`
- Payload: The MCP server returns a JSON-serialized list of pipelines (each with a name, strict execution sequence, and logic rules).
- Injection: The agent deserializes this list, formats it as Markdown, and appends it to the system prompt. The SSE connection is then closed.
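The injection step can be sketched as a pure function. This is a minimal illustration, not the project's actual code: the function name `format_skills_markdown` and the field names `name`, `sequence`, and `rules` are assumptions based on the payload description above.

```python
import json


def format_skills_markdown(payload: str) -> str:
    """Render a JSON list of skill pipelines as a Markdown section."""
    skills = json.loads(payload)
    lines = ["## Available Skills"]
    for skill in skills:
        lines.append(f"### {skill['name']}")
        lines.append("Sequence: " + " -> ".join(skill["sequence"]))
        for rule in skill.get("rules", []):
            lines.append(f"- {rule}")
    return "\n".join(lines)


# Hypothetical payload shaped like the skills://list response described above.
example_payload = json.dumps([
    {
        "name": "Quick Flight Booking Pipeline",
        "sequence": ["search_flights", "book_flight", "generate_travel_itinerary"],
        "rules": ["Always search before booking."],
    }
])

system_prompt = "You are a planning assistant.\n\n" + format_skills_markdown(example_payload)
print(system_prompt)
```

Keeping the formatting separate from the network read makes it testable without a running MCP server.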
2. Dynamic Tools Discovery (ReAct Runtime Registration)
- Goal: Expose active server capabilities (schemas and definitions) directly to the LLM.
- Mechanism: At the start of `check_and_run_tools()`, the agent connects to the resolved target server(s). It performs the standard handshake and requests the list of tools via JSON-RPC: `tools_res = await session.list_tools()`
- Binding: The agent registers the returned schemas in a local `tool_to_session` mapping. This router matches the requested function name to the corresponding active SSE server session when the LLM makes a tool call.
- Awareness: The tool schemas are converted into Ollama's tool configuration format and passed to the model on each reasoning turn (`client.chat(tools=...)`).
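The binding and conversion steps might look roughly like the sketch below. It uses plain dictionaries in place of live MCP sessions and tool objects so that it runs standalone; the helper names `to_ollama_tool` and `build_registry` and the sample schemas are assumptions, not taken from `agent.py`.

```python
from typing import Any


def to_ollama_tool(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Wrap an MCP tool description in the function-tool shape Ollama expects."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def build_registry(tools_by_session: dict[str, list[dict[str, Any]]]):
    """Return (tool name -> session key, list of Ollama tool definitions)."""
    tool_to_session: dict[str, str] = {}
    ollama_tools: list[dict[str, Any]] = []
    for session_key, tools in tools_by_session.items():
        for tool in tools:
            # A later server silently wins if two servers expose the same name.
            tool_to_session[tool["name"]] = session_key
            ollama_tools.append(to_ollama_tool(tool))
    return tool_to_session, ollama_tools


# Hypothetical discovery results; real schemas come from list_tools().
discovered = {
    "travel": [{
        "name": "search_flights",
        "description": "Search flights",
        "inputSchema": {"type": "object", "properties": {"origin": {"type": "string"}}},
    }],
    "party": [{
        "name": "invite_guests",
        "description": "Invite guests",
        "inputSchema": {"type": "object", "properties": {"guest_names": {"type": "array"}}},
    }],
}

tool_to_session, ollama_tools = build_registry(discovered)
print(tool_to_session)
```

In the real agent the mapping values would be the active session objects rather than string keys.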
Key Terminology & Concepts
To help understand the architecture of this project, here is a glossary of the key technologies and design patterns used:
- Model Context Protocol (MCP): An open-standard specification created by Anthropic. It standardizes how client agents (like the Gateway) communicate with external data sources, prompts, and tool servers over a structured protocol.
- Server-Sent Events (SSE): A lightweight, unidirectional web technology standard (part of HTML5) that allows a server to push real-time events asynchronously to a client over a persistent HTTP connection. In this project, SSE is used for:
  - MCP SSE Transport: Connecting the client orchestrator to the MCP servers on ports `8001` and `8002` (where the client posts JSON-RPC payloads, and the server pushes event states).
  - Gateway SSE Streaming: Streaming the dynamic agent execution trace cards (Connecting ➔ Discovered ➔ Thought ➔ Calling Tool ➔ Result) directly to the browser playground in real time.
- JSON-RPC: A stateless, lightweight Remote Procedure Call (RPC) protocol encoded in JSON. It serves as the standard message format for the MCP specification (e.g. `tools/list`, `tools/call`, `resources/read`).
- ReAct (Reasoning & Acting): An agentic loop pattern that interleaves reasoning thoughts ("what should I do next?") with active executions ("call tool X") in sequential steps to resolve user prompts.
- FastMCP: A high-level Python SDK that sits on top of the base `mcp` library, making it extremely easy to declare tools (`@mcp.tool()`) and resources (`@mcp.resource()`) with automatic schema extraction from Python type hints and docstrings.
- MCP Resources: A standard MCP capability enabling servers to expose static or dynamic read-only data (using URI schemes like `skills://list`) to client agents.
Dynamic Routing Design Patterns (Automated Domain Switching)
To automate domain and skill selection rather than forcing the user to manually select the domain via the UI dropdown, two primary design patterns can be used:
Pattern 1: Intent Classification (Two-Pass Routing)
A lightweight classification pass is performed on the backend before the agent starts:
- Classifier Turn: The backend intercepts the user's prompt and queries the LLM with a brief instruction:
  "Classify the user's query into one of these categories: 'travel' or 'party'. Respond only with the category name."
- Targeted Connection: Once the category is determined, the agent connects only to that specific MCP server (port `8001` or `8002`), loads that domain's instructions, and executes.
- Pros: Keeps LLM context small and focused; minimizes token usage and potential model drift.
- Cons: Requires a preliminary LLM reasoning pass, introducing a small latency before tool execution begins.
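A classifier pass of this kind could be sketched as follows. Since this project implements Pattern 2 instead, everything here is illustrative: `classify_domain`, the injected `ask_llm` callable, and the fallback to `travel` are assumptions, and `fake_llm` stands in for a real model call.

```python
from typing import Callable

DOMAINS = ("travel", "party")
CLASSIFIER_PROMPT = (
    "Classify the user's query into one of these categories: 'travel' or 'party'. "
    "Respond only with the category name."
)


def classify_domain(
    user_prompt: str,
    ask_llm: Callable[[str, str], str],
    default: str = "travel",
) -> str:
    """Ask the LLM for a category and normalize its answer to a known domain."""
    raw = ask_llm(CLASSIFIER_PROMPT, user_prompt)
    answer = raw.strip().strip("'\".").lower()
    # Small models often add stray text, so fall back when the answer is unusable.
    return answer if answer in DOMAINS else default


def fake_llm(system: str, user: str) -> str:
    """Stand-in for a real model call; keyword-based for demonstration only."""
    return " Party.\n" if "cake" in user.lower() else "travel"


print(classify_domain("Order a chocolate cake for Saturday", fake_llm))
print(classify_domain("Book a flight to Paris", fake_llm))
```

Normalizing and validating the reply matters because a 2B model may not reliably answer with the bare category name.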
Pattern 2: Unified Multi-MCP Router (Single-Pass Agent) - [Implemented in this project]
Exposes all tools and skills from both domains to the LLM simultaneously:
- Unified Client Session: The client (`agent.py`) opens dual SSE connections to both the Travel and Party MCP servers at startup.
- Tool Aggregation: The complete set of 14 tools is discovered, combined, and registered. When the LLM issues a tool call, the agent automatically routes it to the corresponding server session.
- Preloaded Skills: Predefined JSON skills from both domains are merged into the system prompt, allowing the LLM to select and follow the correct pipeline dynamically based on the input text.
- Pros: Single-pass execution (no classifier latency); allows the LLM to mix-and-match tools from both domains in the same session.
- Cons: Increases the token size of the system prompt and tool pool.
Domain Architecture Overview
GemmaJnana is structured as a multi-domain agentic framework supporting two fully featured planning assistant types. You can toggle between domains dynamically via the UI settings bar:
1. Vacation Travel Planner (mcp_servers/travel/)
Equips the agent with 7 tools to search flight/hotel inventory, make bookings, rent vehicles, schedule tourist attractions, and compile final formatted itinerary documents:
- `search_flights(origin, destination, date)`
- `book_flight(flight_id)`
- `search_hotels(city, budget)`
- `book_hotel(hotel_name, nights)`
- `rent_car(city, car_type)`
- `book_attraction(city, activity)`
- `generate_travel_itinerary(bookings)`
Predefined Travel Skills (JSON pipelines under skills/):
- Full Vacation Planner Pipeline: `search_flights` ➔ `book_flight` ➔ `search_hotels` ➔ `book_hotel` ➔ `generate_travel_itinerary`
- Quick Flight Booking Pipeline: `search_flights` ➔ `book_flight` ➔ `generate_travel_itinerary`
- Accommodation & Ground Services Pipeline: `search_hotels` ➔ `book_hotel` ➔ `rent_car` ➔ `book_attraction`
2. Birthday Party Planner (mcp_servers/party/)
Equips the AI assistant to manage invitations, budget estimations, venue scheduling, cake ordering, entertainment hiring, theme decorations, and reminders:
- `invite_guests(guest_names)`
- `budget_expenses(rsvp_count)`
- `book_venue(venue_name, guest_count)`
- `order_cake(flavor, size, inscription)`
- `hire_entertainment(type)`
- `buy_decorations(theme)`
- `send_reminders(guest_emails, location)`
Predefined Party Skills (JSON pipelines under skills/):
- Core Event Planning Sequence: `invite_guests` ➔ `budget_expenses` ➔ `book_venue` ➔ `order_cake` ➔ `send_reminders`
- Invitation & Budget Setup: `invite_guests` ➔ `budget_expenses` ➔ `send_reminders`
- Logistics & Theme Purchasing: `book_venue` ➔ `order_cake` ➔ `hire_entertainment` ➔ `buy_decorations`
Agent Capabilities & Limitations
Before running the application, it is important to understand what this local agentic stack is capable of and its current boundaries:
| What the Agent CAN Do | What the Agent CANNOT Do |
|---|---|
| Enforced Skill Sequences: Strictly follows sequence pipelines defined in JSON (e.g. searching, booking, and itinerary generation in order). | Real-world Bookings: All bookings are mocked locally for testing safety. It does not spend real money or call live airline/hotel APIs. |
| Dynamic Data Passing: Feeds output states from previous steps into subsequent calls (e.g., passing RSVP counts to budget calculation). | Real-Time Inventory Access: The toolset operates on simulated local catalogs; it does not query actual live commercial availability. |
| Interactive Domain Switching: Automatically updates LLM instruction templates, loaded tools, and active skills when the domain is toggled. | Dynamic Mid-Sequence Input: Once a sequence starts, the agent executes it autonomously; it cannot pause to prompt you for decisions mid-flow. |
| Comprehensive Logging: Formats execution tracer cards in the UI and records detailed step-by-step telemetry in local logs. | Extremely Complex Logic: Running a 2B parameter model locally is excellent for sequence orchestration, but it may occasionally deviate on highly complex reasoning. |
File Structure
ollama-gemma-agents-mcp-skills/
├── mcp_servers/ <-- Multi-domain MCP Servers
│ ├── travel/ <-- Vacation Travel Planner Domain
│ │ ├── skills/ (JSON Skill pipelines)
│ │ ├── tools/ (Tool implementations & schemas)
│ │ └── mcp_server_travel.py (FastMCP Server process)
│ └── party/ <-- Birthday Party Planner Domain
│ ├── skills/ (JSON Skill pipelines)
│ ├── tools/ (Tool implementations & schemas)
│ └── mcp_server_party.py (FastMCP Server process)
├── tests/ <-- Complete Unit Test Suite
│ ├── test_handlers.py (Validates all 14 tool handlers)
│ ├── test_mcp.py (Verifies tool registrations)
│ └── test_skills.py (Validates JSON sequence formats)
├── agent.py (Asynchronous agent orchestrator & ReAct runner)
├── app.py (FastAPI Gateway & SSE event streams)
├── index.html (Beautiful dark-mode chat playground client)
├── logger.py (Global session logger)
├── start.sh (Automation installer & launcher)
└── stop.sh (Automation teardown script)
Prerequisites
To run this application, make sure you have:
- macOS (the automated installer `start.sh` assumes a Mac context).
- Python 3.x with the dependencies listed in `requirements.txt`: `pip install fastapi uvicorn ollama mcp fastmcp python-dotenv`
How to Run
- Start the entire service stack: `./start.sh`
  This script automatically updates Ollama, pulls the `gemma2:2b` model, installs required dependencies, runs the backend API Gateway (port `8435`), and spins up a local web server (port `8080`).
- Open the Web Playground: Navigate to `http://localhost:8080` in your browser. Toggle between Vacation Travel Planner and Birthday Party Planner domains dynamically using the selector in the upper right.
- Shut down servers gracefully: Press `Ctrl+C` in your terminal, or to guarantee all background processes (FastAPI, Ollama) exit cleanly, run: `./stop.sh`
Running Unit Tests
We maintain a rigorous test suite validating the registry, schemas, and execution responses of all tool handlers:
python3 -m unittest discover -s tests
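As an illustration of the kind of check a file like `tests/test_skills.py` might perform, the sketch below validates that a skill names itself and only references known tools. The validator `validate_skill`, the field names, and the test class are hypothetical and not copied from the repository.

```python
import unittest

# The seven travel tools listed in the Domain Architecture Overview.
KNOWN_TRAVEL_TOOLS = {
    "search_flights", "book_flight", "search_hotels", "book_hotel",
    "rent_car", "book_attraction", "generate_travel_itinerary",
}


def validate_skill(skill: dict, known_tools: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    name = skill.get("name")
    if not isinstance(name, str) or not name:
        errors.append("missing name")
    sequence = skill.get("sequence")
    if not isinstance(sequence, list) or not sequence:
        errors.append("sequence must be a non-empty list")
    else:
        errors.extend(f"unknown tool: {step}" for step in sequence if step not in known_tools)
    return errors


class SkillFormatTest(unittest.TestCase):
    def test_valid_pipeline(self):
        skill = {"name": "Quick Flight Booking Pipeline",
                 "sequence": ["search_flights", "book_flight", "generate_travel_itinerary"]}
        self.assertEqual(validate_skill(skill, KNOWN_TRAVEL_TOOLS), [])

    def test_unknown_tool_is_reported(self):
        skill = {"name": "Broken", "sequence": ["search_flights", "teleport"]}
        self.assertIn("unknown tool: teleport", validate_skill(skill, KNOWN_TRAVEL_TOOLS))
```

Saved under `tests/` with a `test_` filename prefix, a module like this would be picked up by the discovery command above.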
API Reference
The backend FastAPI gateway runs at http://127.0.0.1:8435:
- `GET /health`: Diagnoses model presence and connection status.
- `GET /tools?domain=...`: Lists active tools dynamically registered by FastMCP for the specified domain.
- `GET /skills?domain=...`: Retrieves predefined JSON skill pipelines for the specified domain dynamically from the MCP server resource.
- `POST /chat`: Simple non-streaming message response endpoint.
- `POST /chat/stream`: Initiates an SSE text stream event channel, sending live tracer cards for active tools.
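A client consuming `POST /chat/stream` has to split the response into SSE events. The sketch below shows one way to do that with the standard library only. The event payload fields (`type`, `message`, `content`) are assumptions for illustration; the README does not document the gateway's actual event schema.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event in a line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # One optional space after the colon is part of the framing.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line terminates the event.
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ':') and other fields are ignored here.


# Hypothetical stream resembling tracer cards followed by a chat chunk.
sample_stream = [
    'data: {"type": "status", "message": "Calling search_flights..."}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "chunk", "content": "Your flight"}\n',
    "\n",
]

events = [json.loads(payload) for payload in iter_sse_data(sample_stream)]
for event in events:
    print(event)
```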
Enterprise Assessment
This project implements correct architectural patterns for agentic AI orchestration. The following assessment maps each dimension of the current implementation against what a production-deployed enterprise service would require.
Summary
| Area | Current State | Production Requirement | Gap Severity |
|---|---|---|---|
| Connection Lifecycle | Fresh SSE/TCP per request | Connection pooling with persistent sessions | 🔴 Critical |
| Resilience | No timeouts, retries, or circuit breakers | Timeouts on all external calls, retry with backoff | 🔴 Critical |
| Security | Open CORS, no auth, no rate limiting | JWT/OAuth2, RBAC, rate limiting, TLS, secrets vault | 🔴 Critical |
| Scalability | Single-process Uvicorn, http.server for UI | Multi-worker Gunicorn, Nginx, Redis for shared state | 🔴 Critical |
| Observability | Custom text logger to local files | Structured JSON logs, OpenTelemetry, Prometheus | 🟡 High |
| Deployment | Bash scripts (start.sh / stop.sh) | Docker Compose / Kubernetes, CI/CD, IaC | 🟡 High |
| LLM Safety | No input/output guardrails | Prompt injection protection, output filtering | 🟡 High |
| API Maturity | No versioning, no pagination | /v1/ versioning, pagination, correlation IDs | 🟡 Medium |
1. Connection Lifecycle (🔴 Critical)
Every chat request opens fresh TCP + SSE + MCP connections, performs work, then tears them down:
    # agent.py: runs on EVERY request
    async with AsyncExitStack() as stack:
        for name, url in urls:
            read, write = await stack.enter_async_context(sse_client(url))
            session = await stack.enter_async_context(ClientSession(read, write))
            await session.initialize()  # MCP handshake per request
Impact: 100 concurrent users = 100+ simultaneous TCP connections opening/closing. Each pays latency for TCP handshake + SSE setup + MCP initialize(). Under load, this exhausts file descriptors and causes cascading failures.
Production pattern: Connection pool with health-checked persistent sessions, or a sidecar that maintains long-lived MCP connections.
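One possible shape for such a pool is sketched below. It caches one session per URL and creates it lazily under a lock, so concurrent requests share a single handshake. The class `SessionPool` and the injected `connect` coroutine are assumptions; a real version would wrap `sse_client` and `ClientSession`, add health checks, and close sessions on shutdown.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SessionPool:
    """Cache one long-lived session per MCP server URL."""

    def __init__(self, connect: Callable[[str], Awaitable[Any]]):
        self._connect = connect
        self._sessions: dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get(self, url: str) -> Any:
        async with self._lock:
            if url not in self._sessions:
                # Only the first caller pays the connection and handshake cost.
                self._sessions[url] = await self._connect(url)
            return self._sessions[url]

    async def discard(self, url: str) -> None:
        """Drop a session, e.g. after a failed health check."""
        async with self._lock:
            self._sessions.pop(url, None)


async def demo() -> tuple[int, list[Any]]:
    connects: list[str] = []

    async def fake_connect(url: str) -> dict:
        connects.append(url)
        await asyncio.sleep(0)  # simulate network latency
        return {"url": url}

    pool = SessionPool(fake_connect)
    url = "http://127.0.0.1:8001/sse"
    sessions = await asyncio.gather(*(pool.get(url) for _ in range(100)))
    return len(connects), sessions


connect_count, sessions = asyncio.run(demo())
print(connect_count)
```

Holding a single lock during connection setup keeps the sketch short but serializes connects across servers; a per-URL lock would avoid that.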
2. Resilience (🔴 Critical)
No defensive measures exist for external service failures:
# No timeout — a hung MCP server blocks the worker forever
async with sse_client(mcp_url) as (read, write): ...
# No timeout — slow model inference locks the async worker
response = await client.chat(model=model_name, messages=..., tools=...)
| Missing | Impact |
|---|---|
| Request timeouts | Hung server blocks worker indefinitely |
| Circuit breaker | Down server gets hammered with connection attempts |
| Retry with backoff | Transient errors cause immediate failure |
| Graceful degradation | One failed MCP server can block all requests |
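The first and third rows of the table can be addressed with a small wrapper around any awaitable call. The sketch below is one way to combine a per-attempt timeout with exponential backoff; `call_with_retry` and its default values are illustrative assumptions, and the set of retried exception types would need tuning for the real MCP and Ollama clients.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable


async def call_with_retry(
    make_call: Callable[[], Awaitable[Any]],
    *,
    timeout: float = 5.0,
    attempts: int = 3,
    base_delay: float = 0.1,
) -> Any:
    """Run make_call() with a timeout, retrying transient failures with backoff."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                delay = base_delay * (2 ** attempt)
                # Jitter spreads out retries from many concurrent clients.
                await asyncio.sleep(delay + random.uniform(0, delay / 2))
    assert last_error is not None
    raise last_error


async def demo() -> tuple[str, int]:
    state = {"calls": 0}

    async def flaky() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    result = await call_with_retry(flaky, timeout=1.0, attempts=3, base_delay=0.01)
    return result, state["calls"]


retry_result, retry_calls = asyncio.run(demo())
print(retry_result, retry_calls)
```

A circuit breaker would sit one level above this wrapper and stop issuing calls altogether once a server has failed repeatedly.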
3. Security (🔴 Critical)
| Vulnerability | Current Code | Risk |
|---|---|---|
| Open CORS | allow_origins=["*"] in app.py | Any website can call the API |
| No authentication | No middleware | Anyone on network can invoke tools |
| No rate limiting | None | Single client can exhaust Ollama |
| Plaintext transport | HTTP only | Traffic visible on network |
| Secrets in .env | Plain text file | Exposed if host compromised |
4. Scalability (🔴 Critical)
Frontend — The UI is served by Python's built-in development server, which is single-threaded and blocking:
# start.sh — not suitable for production
python3 -m http.server 8080
Backend — Single Uvicorn process with no worker pool:
# app.py — single worker, no crash recovery
uvicorn.run(app, host=host, port=port)
State — Session tracking is process-local and lost on restart:
# logger.py — in-memory, not shareable across workers
session_files = {}
5. Observability (🟡 High)
| What Exists | What Enterprise Needs |
|---|---|
| Text logger to local files | Structured JSON logs → ELK / Datadog |
| No request correlation IDs | Distributed tracing (OpenTelemetry) |
| No metrics | Prometheus: latency, error rates, queue depth |
| Basic /health check | Liveness + readiness probes for all services |
6. Deployment (🟡 High)
| Current | Enterprise |
|---|---|
| bash start.sh | Docker Compose / Kubernetes manifests |
| pkill -f to stop processes | Container orchestration with restart policies |
| No environment separation | dev / staging / prod configurations |
| No CI/CD | Automated test → build → deploy pipeline |
7. LLM Safety (🟡 High)
| Gap | Risk |
|---|---|
| No prompt injection protection | Users can manipulate system behavior |
| No output guardrails | Model may return harmful content |
| No token budget management | Large histories can exceed context window |
| Hardcoded 8-iteration limit | Not configurable, no warning at limit |
8. Production Architecture Target
To deploy this as an enterprise service, the architecture would evolve to:
graph TD
LB["Load Balancer / CDN"] --> Nginx["Nginx (Static + Reverse Proxy)"]
Nginx --> Auth["Auth Middleware (JWT/OAuth2)"]
Auth --> Gateway["FastAPI Gateway (Multi-Worker)"]
Gateway --> RateLimit["Rate Limiter (Redis)"]
Gateway --> Pool["MCP Connection Pool"]
Pool --> MCP1["MCP Server 1 (Container)"]
Pool --> MCP2["MCP Server 2 (Container)"]
Gateway --> Ollama["Ollama (GPU Node)"]
Gateway --> Redis[("Redis (Session + Cache)")]
Gateway --> Logs["Structured Logging"]
Logs --> ELK["ELK / Datadog"]
Gateway --> Metrics["Prometheus Metrics"]
Metrics --> Grafana["Grafana Dashboards"]
What This Project Is
- ✅ A clean, well-tested reference architecture demonstrating correct MCP + agentic AI patterns
- ✅ An educational implementation of ReAct reasoning, SSE streaming, and multi-domain routing
- ✅ A portfolio showcase with production-grade patterns and mock tool handlers
- ✅ An excellent starting point for building a production-deployed service
The architectural patterns are sound — they need the operational infrastructure layer (connection pooling, auth, observability, containerization) to carry enterprise production load.
Step-by-Step Code Execution Trace (Debugging Walkthrough)
To make it easy to follow the flow of control, here is a step-by-step trace showing exactly how inputs and outputs travel through the codebase for different scenarios.
Scenario A: Vacation Flight Booking (Travel Domain)
User Prompt: "I want to book a flight from New York to Paris on 2026-08-10 and generate my itinerary."
sequenceDiagram
autonumber
participant UI as "Playground UI"
participant App as "app.py"
participant Agent as "agent.py"
participant MCPServer as "Travel MCP SSE Server (port 8001)"
participant Ollama as "Ollama Service"
Note over App,MCPServer: [Step 0] Setup: Travel MCP server starts on port 8001 and registers URL in .env
UI->>App: POST /chat/stream {"messages": [{"role": "user", "content": "I want to book a flight from New York to Paris on 2026-08-10 and generate my itinerary."}], "domain": "travel"}
Note over App: Reads TRAVEL_MCP_URL from .env
App->>MCPServer: Connects & reads resource "skills://list"
MCPServer-->>App: Returns travel JSON skills
Note over App: Compiles system prompt with skills context
App->>Agent: check_and_run_tools(...)
Note over Agent: Reads TRAVEL_MCP_URL from .env
Note over Agent: Connects to Travel MCP SSE Server (port 8001)
Agent->>MCPServer: list_tools()
MCPServer-->>Agent: Returns travel tools [search_flights, book_flight, search_hotels, book_hotel, rent_car, book_attraction, generate_travel_itinerary]
Note over Agent,Ollama: Iteration 1
Agent->>Ollama: [LLM Call] check tool calls (iteration 1)
Ollama-->>Agent: returns tool_calls: [search_flights(origin='New York', destination='Paris', date='2026-08-10')]
Agent-->>App: Yield status event (Calling search_flights...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='search_flights', arguments={'origin': 'New York', 'destination': 'Paris', 'date': '2026-08-10'})
MCPServer-->>Agent: returns {"flights": [{"flight_id": "FL-101", ...}], "content": "..."}
Agent-->>App: Yield status event (search_flights finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 2
Agent->>Ollama: [LLM Call] check tool calls (iteration 2)
Ollama-->>Agent: returns tool_calls: [book_flight(flight_id='FL-101')]
Agent-->>App: Yield status event (Calling book_flight...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='book_flight', arguments={'flight_id': 'FL-101'})
MCPServer-->>Agent: returns {"booking": {"flight_id": "FL-101", "booking_code": "NVX21M", ...}}
Agent-->>App: Yield status event (book_flight finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 3
Agent->>Ollama: [LLM Call] check tool calls (iteration 3)
Ollama-->>Agent: returns tool_calls: [generate_travel_itinerary(bookings=[{"flight_id": "FL-101", ...}])]
Agent-->>App: Yield status event (Calling generate_travel_itinerary...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='generate_travel_itinerary', arguments={'bookings': [...]})
MCPServer-->>Agent: returns {"itinerary_text": "...", "content": "..."}
Agent-->>App: Yield status event (generate_travel_itinerary finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 4
Agent->>Ollama: [LLM Call] check tool calls (iteration 4)
Ollama-->>Agent: returns no more tool_calls
Agent-->>App: returns tool_messages, tool_results, llm_calls (4)
Note over Agent: Closes SSE connection
App->>Ollama: [LLM Call] Stream final chat response (with tool results)
Ollama-->>App: Stream response chunks
App-->>UI: Stream final SSE chat chunks
Step 0: Server Setup & Configuration (Pre-request Setup)
Before any user request is made, the MCP servers must expose themselves to the gateway:
- Service Execution: The Vacation Travel Planner and Birthday Party Planner MCP servers are launched as separate, standalone processes (listening on ports `8001` and `8002` respectively).
- Server Exposure: The servers register their respective SSE connection endpoints within the shared `.env` file:
  - `TRAVEL_MCP_URL=http://127.0.0.1:8001/sse`
  - `PARTY_MCP_URL=http://127.0.0.1:8002/sse`

This acts as the local service registry that the agent reads at runtime to locate available servers.
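Resolving a domain to its server URL from this registry can be sketched as a simple lookup. The function `resolve_mcp_url` and the `ENV_KEYS` table are hypothetical names; the sketch also assumes the `.env` values have already been loaded into the process environment (for example with `python-dotenv`, which is listed in the requirements).

```python
import os
from typing import Mapping, Optional

# Maps the "domain" field of the request payload to its environment variable.
ENV_KEYS = {"travel": "TRAVEL_MCP_URL", "party": "PARTY_MCP_URL"}


def resolve_mcp_url(domain: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Look up the SSE endpoint registered for a domain."""
    env = os.environ if env is None else env
    key = ENV_KEYS.get(domain)
    if key is None:
        raise ValueError(f"unknown domain: {domain!r}")
    url = env.get(key)
    if not url:
        raise RuntimeError(f"{key} is not set; is the {domain} MCP server running?")
    return url


# Example environment mirroring the .env entries above.
example_env = {
    "TRAVEL_MCP_URL": "http://127.0.0.1:8001/sse",
    "PARTY_MCP_URL": "http://127.0.0.1:8002/sse",
}
print(resolve_mcp_url("travel", example_env))
```

Passing the environment in as a parameter keeps the lookup testable without touching the real process environment.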
Step 1: Frontend Request
The user selects the Vacation Travel Planner domain and enters the prompt. The browser client (index.html) issues an HTTP POST request to /chat/stream with the conversation history and active domain:
- Payload:
{ "messages": [ {"role": "user", "content": "I want to book a flight from New York to Paris on 2026-08-10 and generate my itinerary."} ], "domain": "travel", "session_name": "default" }
Step 2: Gateway Entry & Agent Call (app.py & agent.py)
- Gateway Call & Skills Discovery: The gateway endpoint `chat_stream` in `app.py` receives the payload. Before calling the agent orchestrator, it fetches the dynamic system prompt by calling `get_system_prompt(domain)`. This method connects to the Travel MCP server via SSE, queries the custom resource URI `skills://list`, parses the returned JSON containing the travel skills (e.g. Flight Booking, Full Itinerary pipelines), injects them into the system prompt, and closes the session.
- Dynamic Server Discovery: The gateway then calls `check_and_run_tools()`. The agent reads the system environment variables to resolve `TRAVEL_MCP_URL` (`http://127.0.0.1:8001/sse`).
- Transport Handshake: The agent opens an SSE connection using `sse_client(mcp_url)`, initializes a `ClientSession`, and performs the protocol handshake.
- Dynamic Tool Registration: The agent calls `list_tools()` over JSON-RPC. The Travel MCP server returns the schemas for the 7 travel tools. The agent registers them dynamically in a local `tool_to_session` registry to route future tool calls to the correct session, and translates them to Ollama's tool-calling definitions.
Step 3: ReAct Reasoning Loop & Execution Trace (agent.py)
- Turn 1 (Search): The agent queries Ollama. Ollama identifies that the user wants to book a flight but needs flight options first, returning a request to call `search_flights(origin='New York', destination='Paris', date='2026-08-10')`. The agent executes the tool over HTTP/SSE, which returns a list of flight choices (including `FL-101`).
- Turn 2 (Booking): The agent queries Ollama with the flight options. Ollama selects flight `FL-101` and requests `book_flight(flight_id='FL-101')`. The agent executes this tool and receives a confirmation code.
- Turn 3 (Itinerary): The agent queries Ollama with the booking confirmation. Ollama realizes all steps for the requested pipeline are complete and requests `generate_travel_itinerary(bookings=[...])`. The agent runs the compiler to produce a structured document.
- Turn 4 (Finished): The agent queries Ollama one final time, which returns no further tool calls.
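The four turns above follow a bounded loop that can be sketched independently of Ollama and MCP. The version below is synchronous and uses injected callables, whereas the real `check_and_run_tools()` is asynchronous and yields status events; `run_react_loop`, the message shapes, and the scripted replies are assumptions made for illustration.

```python
import json
from typing import Any, Callable


def run_react_loop(
    messages: list[dict[str, Any]],
    ask_llm: Callable[[list[dict[str, Any]]], dict[str, Any]],
    call_tool: Callable[[str, dict[str, Any]], str],
    max_turns: int = 8,
):
    """Alternate LLM calls and tool executions until no tool calls remain."""
    messages = list(messages)
    tool_results: list[str] = []
    llm_calls = 0
    for _ in range(max_turns):
        reply = ask_llm(messages)
        llm_calls += 1
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            break  # the model has nothing further to execute
        messages.append({"role": "assistant", "content": reply.get("content", ""),
                         "tool_calls": tool_calls})
        for call in tool_calls:
            result = call_tool(call["name"], call["arguments"])
            tool_results.append(result)
            # Feed the result back so the next turn can build on it.
            messages.append({"role": "tool", "content": result})
    return messages, tool_results, llm_calls


# Scripted replies reproducing the Scenario A trace.
script = iter([
    {"tool_calls": [{"name": "search_flights", "arguments": {
        "origin": "New York", "destination": "Paris", "date": "2026-08-10"}}]},
    {"tool_calls": [{"name": "book_flight", "arguments": {"flight_id": "FL-101"}}]},
    {"tool_calls": [{"name": "generate_travel_itinerary", "arguments": {
        "bookings": [{"flight_id": "FL-101"}]}}]},
    {"tool_calls": []},
])
executed: list[str] = []


def fake_tool(name: str, arguments: dict[str, Any]) -> str:
    executed.append(name)
    return json.dumps({"tool": name, "ok": True})


history, results, llm_calls = run_react_loop(
    [{"role": "user", "content": "Book a flight and generate my itinerary."}],
    lambda msgs: next(script),
    fake_tool,
)
print(executed, llm_calls)
```

Exposing `max_turns` as a parameter is also one way to address the hardcoded iteration limit noted in the LLM Safety table.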
Step 4: Cleanup & Final Inference (app.py)
- The agent closes the active network connection to the travel MCP SSE server.
- The gateway appends the tool messages history to the conversation list and requests the final streaming inference from Ollama.
- Ollama generates a friendly conversational summary containing the travel itinerary details, which streams directly to the frontend.
Scenario B: Party Setup & Venue Booking (Party Domain)
User Prompt: "Invite Bob and Alice, compute the budget, and book venue Cozy Club for 15 guests."
sequenceDiagram
autonumber
participant UI as "Playground UI"
participant App as "app.py"
participant Agent as "agent.py"
participant MCPServer as "Party MCP SSE Server (port 8002)"
participant Ollama as "Ollama Service"
Note over App,MCPServer: [Step 0] Setup: Party MCP server starts on port 8002 and registers URL in .env
UI->>App: POST /chat/stream {"messages": [{"role": "user", "content": "Invite Bob and Alice, compute the budget, and book venue Cozy Club for 15 guests."}], "domain": "party"}
Note over App: Reads PARTY_MCP_URL from .env
App->>MCPServer: Connects & reads resource "skills://list"
MCPServer-->>App: Returns party JSON skills
Note over App: Compiles system prompt with skills context
App->>Agent: check_and_run_tools(...)
Note over Agent: Reads PARTY_MCP_URL from .env
Note over Agent: Connects to Party MCP SSE Server (port 8002)
Agent->>MCPServer: list_tools()
MCPServer-->>Agent: Returns party tools [invite_guests, budget_expenses, book_venue, order_cake, hire_entertainment, buy_decorations, send_reminders]
Note over Agent,Ollama: Iteration 1
Agent->>Ollama: [LLM Call] check tool calls (iteration 1)
Ollama-->>Agent: returns tool_calls: [invite_guests(guest_names=['Bob', 'Alice'])]
Agent-->>App: Yield status event (Calling invite_guests...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='invite_guests', arguments={'guest_names': ['Bob', 'Alice']})
MCPServer-->>Agent: returns {"rsvp_count": 2, "content": "..."}
Agent-->>App: Yield status event (invite_guests finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 2
Agent->>Ollama: [LLM Call] check tool calls (iteration 2)
Ollama-->>Agent: returns tool_calls: [budget_expenses(rsvp_count=2)]
Agent-->>App: Yield status event (Calling budget_expenses...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='budget_expenses', arguments={'rsvp_count': 2})
MCPServer-->>Agent: returns {"budget": {"rsvp_count": 2, "total_estimated_cost": 80, ...}}
Agent-->>App: Yield status event (budget_expenses finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 3
Agent->>Ollama: [LLM Call] check tool calls (iteration 3)
Ollama-->>Agent: returns tool_calls: [book_venue(venue_name='Cozy Club', guest_count=15)]
Agent-->>App: Yield status event (Calling book_venue...)
App-->>UI: Stream SSE tracer update
Agent->>MCPServer: call_tool(name='book_venue', arguments={'venue_name': 'Cozy Club', 'guest_count': 15})
MCPServer-->>Agent: returns {"booking": {"venue_name": "Cozy Club", "guest_count": 15, "confirmation": "VNU-J4YJRH", ...}}
Agent-->>App: Yield status event (book_venue finished)
App-->>UI: Stream SSE tracer update
Note over Agent,Ollama: Iteration 4
Agent->>Ollama: [LLM Call] check tool calls (iteration 4)
Ollama-->>Agent: returns no more tool_calls
Agent-->>App: returns tool_messages, tool_results, llm_calls (4)
Note over Agent: Closes SSE connection
App->>Ollama: [LLM Call] Stream final chat response
Ollama-->>App: Stream response chunks
App-->>UI: Stream final SSE chat chunks
Step 0: Server Setup & Configuration (Pre-request Setup)
Before any user request is made, the MCP servers must expose themselves to the gateway:
- Service Execution: The Vacation Travel Planner and Birthday Party Planner MCP servers are launched as separate, standalone processes (listening on ports `8001` and `8002` respectively).
- Server Exposure: The servers register their respective SSE connection endpoints within the shared `.env` file:
  - `TRAVEL_MCP_URL=http://127.0.0.1:8001/sse`
  - `PARTY_MCP_URL=http://127.0.0.1:8002/sse`

This acts as the local service registry that the agent reads at runtime to locate available servers.
Step 1: Frontend Request
The user selects the Birthday Party Planner domain and enters the prompt. The browser client (index.html) issues an HTTP POST request to /chat/stream with the conversation history and active domain:
- Payload:
{ "messages": [ {"role": "user", "content": "Invite Bob and Alice, compute the budget, and book venue Cozy Club for 15 guests."} ], "domain": "party", "session_name": "default" }
Step 2: Gateway Entry & Agent Call (app.py & agent.py)
- Gateway Call & Skills Discovery: The gateway endpoint chat_stream in app.py receives the payload. Before calling the agent orchestrator, it fetches the dynamic system prompt by calling get_system_prompt(domain). This method connects to the Party MCP server via SSE, queries the custom resource URI skills://list, parses the returned JSON containing the party skills (e.g. Core Event Planning, Invitation & Budget sequences), injects them into the system prompt, and closes the session.
- Dynamic Server Discovery: The gateway then calls check_and_run_tools(). The agent reads the system environment variables to resolve PARTY_MCP_URL (http://127.0.0.1:8002/sse).
- Transport Handshake: The agent opens an SSE connection using sse_client(mcp_url), initializes a ClientSession, and performs the protocol handshake.
- Dynamic Tool Registration: The agent calls list_tools() over JSON-RPC. The Party MCP server returns the schemas for the 7 party tools (like invite_guests, budget_expenses, book_venue). The agent registers them dynamically in a local tool_to_session registry to route future tool calls to the correct session, and translates them to Ollama's tool-calling definitions.
Step 3: ReAct Reasoning Loop & Execution Trace (agent.py)
- Turn 1 (Invite Guests): The agent queries Ollama. Ollama identifies that the user wants to invite Bob and Alice, requesting a call to invite_guests(guest_names=['Bob', 'Alice']). The agent executes the tool over HTTP/SSE, which returns an RSVP count of 2.
- Turn 2 (Compute Budget): The agent queries Ollama with the RSVP update. Ollama selects the budget_expenses(rsvp_count=2) tool. The agent executes this tool and receives a budget cost calculation of $80.
- Turn 3 (Book Venue): The agent queries Ollama with the budget details. Ollama notices that the user also requested booking the venue "Cozy Club" for 15 guests, and invokes book_venue(venue_name='Cozy Club', guest_count=15). The agent runs this tool and receives the reservation confirmation code (VNU-J4YJRH).
- Turn 4 (Finished): The agent queries Ollama one final time, which returns no further tool calls.
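The four turns above follow a simple loop: ask the model, run any requested tools, feed the results back, and stop when no tool calls come back. The sketch below reproduces that control flow with a scripted model and an echoing tool so it runs without Ollama or an MCP server. `run_tool_loop`, `make_scripted_model`, `fake_call_tool`, and the iteration cap of 8 are assumptions, not the project's actual implementation.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


def run_tool_loop(messages: List[Message],
                  ask_model: Callable[[List[Message]], Message],
                  call_tool: Callable[[str, Dict[str, Any]], Any],
                  max_iterations: int = 8) -> Tuple[List[Message], List[Dict[str, Any]], int]:
    """Query the model until it stops requesting tools or the cap is hit."""
    tool_messages: List[Message] = []
    tool_results: List[Dict[str, Any]] = []
    llm_calls = 0
    for _ in range(max_iterations):
        llm_calls += 1
        reply = ask_model(messages + tool_messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            break  # the model has nothing more to execute
        # Keep the assistant's request and each tool result in the history.
        tool_messages.append({"role": "assistant", "content": "", "tool_calls": calls})
        for call in calls:
            name = call["function"]["name"]
            arguments = call["function"]["arguments"]
            result = call_tool(name, arguments)
            tool_results.append({"tool": name, "result": result})
            tool_messages.append({"role": "tool", "content": json.dumps(result)})
    return tool_messages, tool_results, llm_calls


def make_scripted_model(script: List[Message]) -> Callable[[List[Message]], Message]:
    """Return a fake model that replays canned replies in order."""
    replies = iter(script)
    return lambda _messages: next(replies, {"tool_calls": []})


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Placeholder for an MCP call_tool round trip; it only echoes its input."""
    return {"tool": name, "echo": arguments}


# Tool names and arguments are taken from the turns described above.
SCRIPT = [
    {"tool_calls": [{"function": {"name": "invite_guests",
                                  "arguments": {"guest_names": ["Bob", "Alice"]}}}]},
    {"tool_calls": [{"function": {"name": "budget_expenses",
                                  "arguments": {"rsvp_count": 2}}}]},
    {"tool_calls": [{"function": {"name": "book_venue",
                                  "arguments": {"venue_name": "Cozy Club", "guest_count": 15}}}]},
    {"tool_calls": []},
]

if __name__ == "__main__":
    user = [{"role": "user", "content": "Invite Bob and Alice, compute the budget, and book venue Cozy Club for 15 guests."}]
    tool_messages, tool_results, llm_calls = run_tool_loop(
        user, make_scripted_model(SCRIPT), fake_call_tool)
    print(llm_calls, len(tool_messages), [r["tool"] for r in tool_results])
```

With this script the loop makes four model calls and accumulates six history messages (three assistant requests plus three tool results), which lines up with the counts reported in the sequence diagram and the logger trace.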
Step 4: Cleanup & Final Inference (app.py)
- The agent closes the active network connection to the party MCP SSE server.
- The gateway appends the tool messages history to the conversation list and requests the final streaming inference from Ollama.
- Ollama generates a friendly conversational summary containing the invitation stats, budget estimate, and venue booking details, which streams directly to the frontend.
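As a rough illustration of the final step, the generator below extends the history with the tool messages, pulls chunks from a streaming callable, and wraps each one in a Server-Sent Events frame. The event payload shape (`type`, `content`, `chunks`) and the `stream_final_answer` and `sse_frame` helpers are assumptions; the gateway's real event format is not shown in this section.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Message = Dict[str, Any]


def sse_frame(payload: Dict[str, Any]) -> str:
    """Serialize one event as an SSE data frame (blank line terminates it)."""
    return f"data: {json.dumps(payload)}\n\n"


def stream_final_answer(history: List[Message],
                        tool_messages: List[Message],
                        stream_chat: Callable[[List[Message]], Iterable[str]]) -> Iterator[str]:
    """Append tool messages to the history and stream the model's summary."""
    full_history = history + tool_messages
    sent = 0
    for chunk in stream_chat(full_history):
        sent += 1
        yield sse_frame({"type": "chunk", "content": chunk})
    # Hypothetical closing event so the client knows the stream is complete.
    yield sse_frame({"type": "done", "chunks": sent})


def fake_stream_chat(messages: List[Message]) -> Iterable[str]:
    """Stand-in for a streaming chat call; yields two fixed chunks."""
    return iter(["Your party ", "is booked."])


if __name__ == "__main__":
    history = [{"role": "user", "content": "Plan the party."}]
    tools = [{"role": "tool", "content": "{}"}]
    for frame in stream_final_answer(history, tools, fake_stream_chat):
        print(frame, end="")
```

In a FastAPI gateway a generator like this would typically be handed to a streaming response with the `text/event-stream` media type.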
Step-by-Step Execution Log Trace (As-Is from Logger)
[app.py:event_generator:147] Received chat stream request for domain 'party'. Temperature=0.3
[app.py:event_generator:157] Message history loaded with system prompt for domain 'party'. Total messages: 2
+ [agent.py:check_and_run_tools:158] Connecting to 1 active MCP server(s)...
+ [agent.py:check_and_run_tools:175] Initializing dynamic MCP tool discovery...
[agent.py:check_and_run_tools:213] Discovered 7 tool(s) from MCP server: ['invite_guests', 'budget_expenses', 'book_venue', 'order_cake', 'hire_entertainment', 'buy_decorations', 'send_reminders']
+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 1)...
[agent.py:check_and_run_tools:277] Model requested 1 tool call(s) at iteration 1: ['invite_guests']
[agent.py:check_and_run_tools:300] Executing tool 'invite_guests' via MCP with args: {'guest_names': ['Bob', 'Alice']}
[invite_guests.py:handler:25] Sending party invitations to 2 guests...
[invite_guests.py:handler:32] RSVP count received: 2/2
[agent.py:check_and_run_tools:374] Tool 'invite_guests' execution completed successfully.
+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 2)...
[agent.py:check_and_run_tools:277] Model requested 1 tool call(s) at iteration 2: ['budget_expenses']
[agent.py:check_and_run_tools:300] Executing tool 'budget_expenses' via MCP with args: {'rsvp_count': 2}
[budget_expenses.py:handler:23] Estimating costs for 2 guests...
[budget_expenses.py:handler:42] Total estimated budget: $80
[agent.py:check_and_run_tools:374] Tool 'budget_expenses' execution completed successfully.
+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 3)...
[agent.py:check_and_run_tools:277] Model requested 1 tool call(s) at iteration 3: ['book_venue']
[agent.py:check_and_run_tools:300] Executing tool 'book_venue' via MCP with args: {'venue_name': 'Cozy Club', 'guest_count': 15}
[book_venue.py:handler:28] Reserving venue 'Cozy Club' for 15 guests...
[book_venue.py:handler:41] Venue booked successfully. Confirmation: VNU-J4YJRH
[agent.py:check_and_run_tools:374] Tool 'book_venue' execution completed successfully.
+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 4)...
[agent.py:check_and_run_tools:267] No more tool calls requested by the model at iteration 4.
[app.py:event_generator:174] Extending history with 6 tool message(s).
+ [app.py:event_generator:177] [LLM Call] Calling Ollama chat stream...
[app.py:event_generator:193] Stream completed successfully. Sent 288 chunk(s).
[app.py:event_generator:196] Session Summary: Total LLM Calls: 5 | Executed Tool Calls: ['invite_guests', 'budget_expenses', 'book_venue']
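The trace can be cross-checked against its own session summary with a few lines of parsing. The sketch below assumes the `[file:function:line] message` layout seen above, optionally preceded by a `+` marker; the `summarize` helper and the regular expressions are illustrative and not part of the project.

```python
import re

# Matches "[file:function:line] message", with an optional leading "+ " marker.
LOG_LINE = re.compile(
    r"^\+?\s*\[(?P<file>[^:\]]+):(?P<func>[^:\]]+):(?P<line>\d+)\]\s*(?P<msg>.*)$"
)
TOOL_EXEC = re.compile(r"Executing tool '(?P<name>[^']+)'")


def summarize(lines):
    """Count [LLM Call] entries and collect executed tool names in order."""
    llm_calls = 0
    tools = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a logger line
        message = match.group("msg")
        if message.startswith("[LLM Call]"):
            llm_calls += 1
        executed = TOOL_EXEC.search(message)
        if executed:
            tools.append(executed.group("name"))
    return llm_calls, tools


# A subset of the trace above, copied verbatim.
SAMPLE_LOG = [
    "+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 1)...",
    "[agent.py:check_and_run_tools:300] Executing tool 'invite_guests' via MCP with args: {'guest_names': ['Bob', 'Alice']}",
    "+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 2)...",
    "[agent.py:check_and_run_tools:300] Executing tool 'budget_expenses' via MCP with args: {'rsvp_count': 2}",
    "+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 3)...",
    "[agent.py:check_and_run_tools:300] Executing tool 'book_venue' via MCP with args: {'venue_name': 'Cozy Club', 'guest_count': 15}",
    "+ [agent.py:check_and_run_tools:217] [LLM Call] Checking if the model requests any tool calls (iteration 4)...",
    "+ [app.py:event_generator:177] [LLM Call] Calling Ollama chat stream...",
]

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

The five `[LLM Call]` entries are the four tool-check iterations plus the final streaming call, which is why the session summary reports one more LLM call than the agent loop itself.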