agentic-mcp-server
Enables a fully local multi-agent AI system with 45 specialized agents and 50 predefined workflows via MCP, powered by Ollama and requiring no internet.
Multi-Agent AI Application (100% Offline)
A fully local, open-source agentic AI system with 45 specialized agents, 50 predefined workflows, and an MCP server — powered by Ollama. No API keys, no cloud services, no internet required after setup.
Table of Contents
- Complete Installation Guide
- Local LLM Setup (Ollama)
- Local MCP Server Setup
- External Local MCP Servers
- Quick Start
- CLI Commands Reference
- Agents (45)
- Workflows (50)
- Use Cases with Examples
- MCP Client Integration
- Advanced Features
- Configuration Reference
- Project Structure
- Extending the System
- Troubleshooting
Complete Installation Guide
Prerequisites
- Python 3.12+
- 8GB+ RAM (16GB recommended for 13B models)
- ~5GB disk space (for model + dependencies)
- No GPU required (but speeds up inference)
Step 1: Install Ollama (Local LLM Runtime)
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# macOS
brew install ollama
# Windows — download installer from https://ollama.com/download
# Verify installation
ollama --version
Step 2: Start Ollama Service
# Start the Ollama daemon
ollama serve
# It runs on http://localhost:11434 by default
# Keep this terminal open, or run as a system service:
# sudo systemctl enable ollama && sudo systemctl start ollama (Linux)
Step 3: Pull a Local LLM Model
# Recommended: good balance of speed and quality
ollama pull llama3.1:8b
# Alternatives:
ollama pull mistral # Fast, 7B params
ollama pull codellama:13b # Best for code tasks
ollama pull qwen2.5:7b # Good multilingual
ollama pull llama3.1:70b # Best quality (needs 40GB+ RAM)
ollama pull deepseek-coder:6.7b # Specialized for code
ollama pull phi3:mini # Smallest, fastest
# Verify model is available
ollama list
Step 4: Set Up the Application
cd multi-agent-app
# Create Python virtual environment
python3 -m venv .venv
# Activate it
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows PowerShell
# .venv\Scripts\activate.bat # Windows CMD
# Install all dependencies
pip install -r requirements.txt
Step 5: Configure (Optional)
Edit config.py if you changed defaults:
OLLAMA_BASE_URL = "http://localhost:11434" # Ollama address
MODEL_NAME = "llama3.1:8b" # Model you pulled
MAX_TOKENS = 2048 # Max response length
Step 6: Run
# Interactive CLI mode
python main.py
# Or as MCP server
python main.py --mcp-server
Local LLM Setup (Ollama)
Managing Models
# List installed models
ollama list
# Pull a new model
ollama pull <model-name>
# Remove a model
ollama rm <model-name>
# Show model details
ollama show llama3.1:8b
# Test a model directly
ollama run llama3.1:8b "Hello, how are you?"
Recommended Models by Use Case
| Use Case | Model | RAM Needed | Speed |
|---|---|---|---|
| General (default) | llama3.1:8b | 8GB | Fast |
| Code-heavy work | codellama:13b | 16GB | Medium |
| Fast responses | mistral or phi3:mini | 4-8GB | Very fast |
| Complex reasoning | llama3.1:70b | 40GB+ | Slow |
| Multilingual | qwen2.5:7b | 8GB | Fast |
| Code + explanation | deepseek-coder:6.7b | 8GB | Fast |
Switch Model at Runtime
No restart needed — switch in the CLI:
/model codellama:13b
/model mistral
Ollama Configuration
# Change Ollama host/port (if needed)
export OLLAMA_HOST=0.0.0.0:11434
# Set GPU layers (for partial GPU offload)
export OLLAMA_NUM_GPU=999
# Set number of threads
export OLLAMA_NUM_THREAD=8
Local MCP Server Setup
This App as an MCP Server
Your multi-agent system IS an MCP server. Start it:
# stdio transport (for Claude Desktop, Cursor, etc.)
python main.py --mcp-server
# SSE/HTTP transport (for web clients or remote access on LAN)
python main.py --mcp-server --transport sse --host 0.0.0.0 --port 8080
What Gets Exposed via MCP
| Type | Count | Description |
|---|---|---|
| Tools | 47 | run_multi_agent + 45 individual agent tools + list_agents |
| Resources | 2 | agents://list, config://system |
MCP Server CLI Arguments
python main.py --mcp-server [OPTIONS]
Options:
--transport {stdio,sse} Transport protocol (default: stdio)
--host HOST Bind address for SSE (default: 0.0.0.0)
--port PORT Port for SSE (default: 8080)
External Local MCP Servers
Your agents can consume tools from OTHER local MCP servers running on your machine.
Install Local MCP Servers
# Install Node.js MCP servers (one-time, cached locally)
npx -y @modelcontextprotocol/server-filesystem /tmp
npx -y @modelcontextprotocol/server-sqlite mydb.sqlite
npx -y @modelcontextprotocol/server-memory
# Or install Python-based MCP servers
pip install mcp-server-fetch
pip install mcp-server-git
Configure External Servers
Edit config.py:
EXTERNAL_MCP_SERVERS = [
# Filesystem access — agents can read/write local files
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/jaspal/projects"]
},
# SQLite — agents can query local databases
{
"name": "sqlite",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-sqlite", "/home/jaspal/data/app.db"]
},
# Memory/Knowledge base — persistent agent memory
{
"name": "memory",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-memory"]
},
# Git — agents can interact with local repos
{
"name": "git",
"transport": "stdio",
"command": "python",
"args": ["-m", "mcp_server_git", "--repo", "/home/jaspal/projects/myapp"]
},
# Custom local MCP server (running on localhost)
{
"name": "custom-tools",
"transport": "sse",
"url": "http://localhost:9090/sse"
},
]
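The entries above follow two shapes: stdio servers carry a command and args, while SSE servers carry a url. As a minimal sketch, a check like the following could catch malformed entries before the app tries to connect. The `REQUIRED_KEYS` table and the `validate_server` helper are hypothetical names, inferred only from the example entries shown here; they are not part of the application.

```python
from typing import Any

# Required keys per transport, inferred from the example entries (an assumption,
# not the application's actual schema).
REQUIRED_KEYS = {
    "stdio": {"name", "transport", "command", "args"},
    "sse": {"name", "transport", "url"},
}


def validate_server(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one EXTERNAL_MCP_SERVERS entry."""
    transport = entry.get("transport")
    if transport not in REQUIRED_KEYS:
        return [f"unknown transport: {transport!r}"]
    missing = sorted(REQUIRED_KEYS[transport] - set(entry))
    return [f"missing key: {key}" for key in missing]


if __name__ == "__main__":
    example = {"name": "custom-tools", "transport": "sse"}
    for problem in validate_server(example):
        print(f"{example['name']}: {problem}")
```

An empty list means the entry has every key its transport needs; it does not verify that the command or URL actually works.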
How It All Connects (100% Local)
┌─────────────────────────────────────────────────────────┐
│                      YOUR MACHINE                       │
│                                                         │
│  ┌─────────────┐     ┌──────────────────────────────┐   │
│  │   Ollama    │     │  Multi-Agent App             │   │
│  │ (Local LLM) │◄───►│  45 agents + supervisor      │   │
│  │   :11434    │     │  MCP server (stdio/SSE)      │   │
│  └─────────────┘     └──────────┬───────────────────┘   │
│                                 │                       │
│                   ┌─────────────┼─────────────┐         │
│                   ▼             ▼             ▼         │
│       ┌──────────────┐ ┌───────────┐ ┌───────────────┐  │
│       │ MCP Server:  │ │MCP Server:│ │ MCP Server:   │  │
│       │ filesystem   │ │ sqlite    │ │ memory        │  │
│       │ (local files)│ │ (local db)│ │ (local store) │  │
│       └──────────────┘ └───────────┘ └───────────────┘  │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  MCP Clients: Claude Desktop / Cursor / VS Code  │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
│  Network: ZERO external traffic                         │
└─────────────────────────────────────────────────────────┘
Quick Start
# Terminal 1: Start Ollama
ollama serve
# Terminal 2: Run the app
cd multi-agent-app
source .venv/bin/activate
python main.py
Then type:
🧑 You: write a Python REST API with Flask for a todo app
The supervisor automatically routes to the best agent(s) and returns the result.
CLI Commands Reference
Agent Execution
| Command | Description | Example |
|---|---|---|
| (just type) | Auto-route via supervisor | write a REST API for users |
| /ask <agent> msg | Run specific agent | /ask coder write binary search |
| /chain <a\|b\|c> msg | Chain agents sequentially | /chain coder\|reviewer\|tester build calculator |
| /parallel <a,b> msg | Run concurrently | /parallel coder,security write login |
| /compare <a,b> msg | Compare outputs | /compare coder,refactorer implement sort |
| /workflow <name> msg | Predefined pipeline | /workflow full_dev build todo app |
| /auto msg | Auto-select workflow | /auto fix the login bug |
| /feedback <a> <r> msg | Iterative refinement | /feedback coder reviewer write parser |
| /batch <agent> t1;;t2 | Batch process | /batch coder sort;;search;;hash |
| /stream msg | Stream response | /stream explain monads |
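The command syntax in the table uses `|` to separate chained agents and `;;` to separate batch tasks. The following is a rough sketch of how such lines could be split into their parts; `parse_chain` and `parse_batch` are illustrative names, and the real dispatcher in main.py may parse its input differently.

```python
def parse_chain(line: str) -> tuple[list[str], str]:
    """Split '/chain a|b|c message' into (['a', 'b', 'c'], 'message')."""
    _command, agents, message = line.split(maxsplit=2)
    return agents.split("|"), message


def parse_batch(line: str) -> tuple[str, list[str]]:
    """Split '/batch agent t1;;t2' into ('agent', ['t1', 't2'])."""
    _command, agent, tasks = line.split(maxsplit=2)
    # Drop empty fragments so a trailing ';;' does not create a blank task.
    return agent, [task.strip() for task in tasks.split(";;") if task.strip()]


if __name__ == "__main__":
    print(parse_chain("/chain coder|reviewer|tester build calculator"))
    print(parse_batch("/batch coder sort;;search;;hash"))
```

Both helpers raise `ValueError` when the agent list or the message is missing, which a caller could turn into a usage hint.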
File Context
| Command | Description | Example |
|---|---|---|
| /file <path> msg | Single file context | /file src/app.py review this |
| /files <p1,p2> msg | Multiple files | /files api.py,db.py find bugs |
Session Management
| Command | Description |
|---|---|
| /save [name] | Save session |
| /load <name> | Load session |
| /export | Export as markdown |
| /sessions | List sessions |
| /history | Show history |
| /clear | Clear history |
Memory
| Command | Description |
|---|---|
| /remember <key> note | Store persistent note |
| /recall [key] | Recall notes |
| /forget [key] | Clear memory |
System
| Command | Description |
|---|---|
| /agents | List all 45 agents |
| /workflows | List all 50 workflows |
| /tokens | Token usage stats |
| /model <name> | Switch model |
| /health | Check Ollama |
| /retry | Re-run last request |
| /help | Show help |
| quit / exit | Exit |
Agents (45)
| Agent | Description | Temp |
|---|---|---|
| researcher | Gathers information and provides summaries | 0.7 |
| coder | Writes clean, production-quality code | 0.3 |
| reviewer | Reviews code/content for quality and correctness | 0.4 |
| planner | Breaks down complex tasks into actionable steps | 0.5 |
| debugger | Diagnoses errors and suggests targeted fixes | 0.2 |
| writer | Writes documentation, emails, and reports | 0.6 |
| tester | Writes test cases and testing strategies | 0.3 |
| optimizer | Performance optimization and bottleneck analysis | 0.3 |
| security | Security analysis and vulnerability detection | 0.2 |
| data_analyst | Data analysis, SQL queries, and data modeling | 0.4 |
| devops | CI/CD, Docker, Kubernetes, and infrastructure | 0.3 |
| translator | Translation and localization between languages | 0.5 |
| architect | System architecture design and trade-off analysis | 0.4 |
| mentor | Explains concepts and guides learning | 0.6 |
| summarizer | Condenses content into key points and summaries | 0.3 |
| api_designer | API design, OpenAPI specs, and contracts | 0.3 |
| database | Schema design, SQL optimization, and DB architecture | 0.3 |
| ux_designer | UI/UX design, wireframes, and accessibility | 0.5 |
| refactorer | Code restructuring and maintainability improvements | 0.2 |
| explainer | Code walkthroughs and detailed explanations | 0.5 |
| validator | Verifies implementations match requirements | 0.2 |
| automator | Automation scripts, CLI tools, and workflows | 0.3 |
| migrator | Code/database/infrastructure migrations | 0.3 |
| prompt_engineer | Crafts and optimizes LLM prompts | 0.4 |
| diagrammer | Creates Mermaid/PlantUML system diagrams | 0.3 |
| estimator | Effort and time estimation for tasks | 0.4 |
| compliance | Regulatory compliance and standards audits | 0.2 |
| product_manager | Requirements, user stories, and prioritization | 0.5 |
| interviewer | Interview questions and answer evaluation | 0.5 |
| git_expert | Git workflows, branching, and conflict resolution | 0.3 |
| accessibility | WCAG compliance and inclusive design | 0.3 |
| performance_tester | Load testing and scalability analysis | 0.3 |
| error_handler | Error handling patterns and resilience | 0.3 |
| documentation | API docs, changelogs, and guides | 0.5 |
| regex_expert | Crafts and explains regular expressions | 0.2 |
| shell_expert | Shell scripting and Unix tools | 0.3 |
| ml_engineer | ML pipelines, training, and evaluation | 0.4 |
| concurrency | Async, threading, and parallel processing | 0.3 |
| config_manager | Configuration, env vars, and feature flags | 0.3 |
| code_generator | Boilerplate, scaffolding, and templates | 0.3 |
| tech_lead | Technical decisions and team guidance | 0.4 |
| seo_expert | SEO optimization and web performance | 0.4 |
| monitoring | Observability, alerting, and SRE practices | 0.3 |
| networking | DNS, load balancing, and network architecture | 0.3 |
| contract_tester | API contract testing and compatibility | 0.2 |
Workflows (50)
Development
| Workflow | Pipeline |
|---|---|
| full_dev | planner → coder → reviewer → tester |
| code_review | coder → reviewer → tester |
| bug_fix | debugger → coder → tester |
| refactor | explainer → refactorer → reviewer → tester |
| scaffold | planner → code_generator → coder → tester |
| optimize | coder → optimizer → reviewer |
| error_resilience | error_handler → coder → tester → reviewer |
| concurrent_system | architect → concurrency → coder → tester → reviewer |
API & Backend
| Workflow | Pipeline |
|---|---|
| api_build | api_designer → coder → tester → writer |
| api_full | api_designer → code_generator → coder → tester → documentation → security |
| api_contract | api_designer → contract_tester → tester → documentation |
| db_design | planner → database → reviewer |
| microservice | architect → api_designer → coder → contract_tester → devops |
| full_stack | planner → architect → api_designer → database → coder → tester |
| data_pipeline | data_analyst → coder → tester → devops |
DevOps & Infrastructure
| Workflow | Pipeline |
|---|---|
| deploy | devops → security → validator |
| production_ready | coder → error_handler → security → performance_tester → devops |
| release | tester → security → compliance → documentation → devops |
| observability | monitoring → devops → shell_expert |
| config_setup | config_manager → devops → validator |
| network_setup | networking → security → devops → validator |
| git_workflow | git_expert → devops → automator |
| shell_automation | shell_expert → automator → tester |
Security & Quality
| Workflow | Pipeline |
|---|---|
| security_audit | coder → security → compliance |
| compliance_check | security → compliance → validator |
| full_review | explainer → reviewer → security → optimizer → accessibility |
| perf_audit | performance_tester → optimizer → reviewer |
Frontend & UX
| Workflow | Pipeline |
|---|---|
| frontend | ux_designer → coder → accessibility → reviewer |
| ux_audit | ux_designer → accessibility → reviewer |
| seo_optimize | seo_expert → coder → performance_tester |
Documentation & Learning
| Workflow | Pipeline |
|---|---|
| docs | researcher → writer → reviewer |
| learn | researcher → mentor → summarizer |
| code_explain | explainer → diagrammer → summarizer |
| team_onboard | documentation → diagrammer → mentor → explainer |
| translate | translator → reviewer → writer |
Planning & Management
| Workflow | Pipeline |
|---|---|
| design | planner → architect → diagrammer |
| estimate | planner → estimator → reviewer |
| tech_spec | product_manager → architect → api_designer → estimator |
| tech_decision | researcher → tech_lead → architect → estimator |
| mvp | product_manager → planner → coder → tester |
| startup_mvp | product_manager → planner → architect → code_generator → coder → tester → devops |
Migration & Modernization
| Workflow | Pipeline |
|---|---|
| migrate | planner → migrator → tester → reviewer |
| legacy_modernize | explainer → architect → migrator → coder → tester |
Incident & Operations
| Workflow | Pipeline |
|---|---|
| incident | debugger → devops → summarizer |
| incident_response | debugger → monitoring → devops → summarizer → documentation |
Specialized
| Workflow | Pipeline |
|---|---|
| ml_project | researcher → ml_engineer → coder → tester → documentation |
| regex_build | regex_expert → tester → explainer |
| prompt_craft | prompt_engineer → tester → optimizer |
| interview_prep | researcher → interviewer → mentor |
| onboarding | explainer → mentor → diagrammer |
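Each workflow above is an ordered list of agent names. A sequential pipeline of this kind can be sketched as a loop in which every agent receives the previous agent's output. This is only an illustration: `run_workflow` and the stub agents are hypothetical, and the real runner in runners.py may also pass the original task or the conversation history along.

```python
from typing import Callable

Agent = Callable[[str], str]  # stand-in for a call to a real LLM-backed agent


def run_workflow(pipeline: list[str], agents: dict[str, Agent], task: str) -> str:
    """Run agents in order, feeding each one the previous agent's output."""
    unknown = [name for name in pipeline if name not in agents]
    if unknown:
        raise KeyError(f"unknown agents in pipeline: {unknown}")
    result = task
    for name in pipeline:
        result = agents[name](result)
    return result


if __name__ == "__main__":
    # Stub agents that just tag the text, so the data flow is visible.
    stubs = {
        "debugger": lambda text: f"{text} -> diagnosed",
        "coder": lambda text: f"{text} -> patched",
        "tester": lambda text: f"{text} -> tested",
    }
    print(run_workflow(["debugger", "coder", "tester"], stubs, "TypeError in UserList"))
```

Validating the agent names up front means a typo in a pipeline fails before any model call is made.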
Use Cases with Examples
🚀 Build a New Feature
/workflow full_dev implement user authentication with JWT and refresh tokens
🐛 Fix a Bug
/workflow bug_fix TypeError: Cannot read property 'map' of undefined in UserList.tsx
🏗️ Design a System
/workflow design design a real-time notification system for 100k users
📝 Write Documentation
/workflow docs document the payment processing module with API reference
🔒 Security Review
/files src/auth.py,src/middleware.py security audit these files
⚡ Optimize Performance
/workflow perf_audit our API response time is 2s, analyze and optimize
🎯 Direct Agent Call
/ask coder write a Python decorator for caching with TTL
/ask database design a schema for multi-tenant SaaS
/ask devops write a GitHub Actions CI/CD pipeline for a Node.js app
/ask shell_expert write a bash script to backup PostgreSQL daily
🔄 Iterative Refinement
/feedback coder reviewer write a thread-safe LRU cache in Python
(Coder writes → reviewer critiques → coder improves → until approved)
📊 Compare Approaches
/compare architect,optimizer design a caching strategy for product catalog
⚡ Parallel Execution
/parallel security,optimizer,accessibility audit this React component
📋 Batch Processing
/batch coder implement stack;;implement queue;;implement linked list;;implement BST
🤖 Auto-Routing
/auto our login endpoint is returning 500 errors in production
(Automatically selects bug_fix workflow)
📁 Multi-File Analysis
/files src/api.py,src/models.py,src/tests.py review for consistency issues
🎓 Learning
/workflow learn explain event-driven architecture with examples
/ask mentor explain the CAP theorem like I'm a junior developer
🚢 Production Release
/workflow release prepare v2.0 release for the payment service
🏢 Full Startup MVP
/workflow startup_mvp build a SaaS invoicing app with Stripe integration
MCP Client Integration
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (Linux) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"multi-agent": {
"command": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/.venv/bin/python",
"args": ["/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/main.py", "--mcp-server"]
}
}
}
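Because the config needs absolute paths that differ per machine, it can be easier to generate the entry than to type it. Here is a small sketch that builds the same structure from the application directory; `claude_desktop_config` is a hypothetical helper, and the default interpreter is whichever Python runs the script, so run it from the activated virtual environment.

```python
import json
import sys
from pathlib import Path, PurePath


def claude_desktop_config(app_dir: PurePath, python: str = sys.executable) -> dict:
    """Build the mcpServers entry shown above for a given install location."""
    return {
        "mcpServers": {
            "multi-agent": {
                "command": python,
                "args": [str(app_dir / "main.py"), "--mcp-server"],
            }
        }
    }


if __name__ == "__main__":
    # Prints JSON to paste into claude_desktop_config.json.
    print(json.dumps(claude_desktop_config(Path.cwd()), indent=2))
```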
Cursor
Add to Cursor MCP settings:
{
"multi-agent": {
"command": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/.venv/bin/python",
"args": ["main.py", "--mcp-server"],
"cwd": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app"
}
}
VS Code (Copilot MCP)
{
"mcp": {
"servers": {
"multi-agent": {
"command": "python",
"args": ["main.py", "--mcp-server"],
"cwd": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app"
}
}
}
}
Any MCP Client (SSE/HTTP)
# Start HTTP server
python main.py --mcp-server --transport sse --port 8080
# Connect from any MCP client to: http://localhost:8080/sse
Advanced Features
🔄 Feedback Loop
Iteratively refines output until approved:
/feedback coder reviewer write a production-ready connection pool
Agent writes → reviewer evaluates → agent improves → repeat (max 3 rounds).
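The write, review, improve cycle with a round limit can be sketched as below. The `write` and `review` callables are stand-ins for the two agents, and the `(approved, critique)` return shape is an assumption made for this example, not the application's actual interface.

```python
from typing import Callable, Optional

Writer = Callable[[str, Optional[str]], str]   # (task, critique) -> draft
Reviewer = Callable[[str], tuple[bool, str]]   # draft -> (approved, critique)


def feedback_loop(write: Writer, review: Reviewer, task: str,
                  max_rounds: int = 3) -> tuple[str, int]:
    """Alternate writer and reviewer until approval or the round limit."""
    critique: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = write(task, critique)        # first round has no critique yet
        approved, critique = review(draft)
        if approved:
            return draft, round_no
    return draft, max_rounds                 # limit reached without approval


if __name__ == "__main__":
    drafts = iter(["draft 1", "draft 2 with error handling"])
    result = feedback_loop(
        write=lambda task, critique: next(drafts),
        review=lambda draft: ("error handling" in draft, "add error handling"),
        task="write a connection pool",
    )
    print(result)
```

Returning the round count alongside the draft lets the caller report whether the output was approved early or simply hit the limit.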
🧠 Persistent Memory
Notes that survive across sessions:
/remember project Using PostgreSQL 15 with pgvector extension
/remember style snake_case, type hints, 4-space indent
/recall project
/forget project
⚡ Parallel Execution
Multiple agents simultaneously:
/parallel security,optimizer,reviewer analyze this module
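Running several agents on the same input at once is a fan-out over independent calls. One possible shape, using a thread pool from the standard library, is sketched below; `run_parallel` and the stub agents are hypothetical, and the application may well use asyncio or another mechanism instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]  # stand-in for a call to a real LLM-backed agent


def run_parallel(agents: dict[str, Agent], names: list[str], task: str) -> dict[str, str]:
    """Send the same task to every named agent concurrently."""
    # max(1, ...) keeps the pool valid even if the name list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(agents[name], task) for name in names}
        # result() blocks until that agent finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    stubs = {
        "security": lambda text: f"security findings for: {text}",
        "optimizer": lambda text: f"optimizations for: {text}",
    }
    for name, output in run_parallel(stubs, ["security", "optimizer"], "login module").items():
        print(f"[{name}] {output}")
```

Threads suit this case because each agent call mostly waits on the local model server rather than using the CPU in Python.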
📦 Batch Processing
Same agent, multiple tasks:
/batch tester write tests for login;;signup;;logout;;password-reset
🎯 Auto-Routing
Keyword-based workflow selection:
/auto deploy our app to kubernetes with monitoring
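Keyword-based selection can be as simple as scanning the request for trigger words and returning the first workflow that matches. The sketch below illustrates the idea only: the keyword table, the `auto_select` name, and the `full_dev` fallback are all assumptions, and the application's real mapping is not shown in this README.

```python
import re

# Hypothetical keyword table, checked in order; the first match wins.
KEYWORD_WORKFLOWS: list[tuple[tuple[str, ...], str]] = [
    (("bug", "error", "fix", "crash"), "bug_fix"),
    (("deploy", "kubernetes", "docker"), "deploy"),
    (("document", "docs"), "docs"),
]


def auto_select(message: str, default: str = "full_dev") -> str:
    """Pick a workflow name from keywords found in the request."""
    words = re.findall(r"[a-z0-9_]+", message.lower())
    for keywords, workflow in KEYWORD_WORKFLOWS:
        # Prefix match so that 'errors' still triggers the 'error' keyword.
        if any(word.startswith(keywords) for word in words):
            return workflow
    return default


if __name__ == "__main__":
    print(auto_select("our login endpoint is returning 500 errors in production"))
    print(auto_select("deploy our app to kubernetes with monitoring"))
```

Ordering the table matters: a request that mentions both a bug and a deployment resolves to whichever rule appears first.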
📁 Multi-File Context
Cross-file analysis:
/files src/api.py,src/models.py,tests/test_api.py find inconsistencies
🔀 Custom Chains
Build pipelines on the fly:
/chain planner|architect|coder|tester|documentation build a rate limiter
💾 Session Persistence
/save my-project
/load my-project
/export
📊 Token Tracking
/tokens
# Output: Tokens: ~12,450 total (4,200 in / 8,250 out) | Requests: 7
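The leading `~` in the output suggests the counts are estimates. One way such a tracker could work is sketched here, using the common rule of thumb of roughly four characters per token; that heuristic, the `TokenTracker` class, and `estimate_tokens` are assumptions for illustration and may not match how session.py counts tokens.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (a heuristic)."""
    return max(1, len(text) // 4) if text else 0


@dataclass
class TokenTracker:
    tokens_in: int = 0
    tokens_out: int = 0
    requests: int = 0

    def record(self, prompt: str, response: str) -> None:
        """Add one request/response pair to the running totals."""
        self.tokens_in += estimate_tokens(prompt)
        self.tokens_out += estimate_tokens(response)
        self.requests += 1

    def summary(self) -> str:
        total = self.tokens_in + self.tokens_out
        return (f"Tokens: ~{total:,} total ({self.tokens_in:,} in / "
                f"{self.tokens_out:,} out) | Requests: {self.requests}")


if __name__ == "__main__":
    tracker = TokenTracker()
    tracker.record("explain monads", "A monad is a design pattern that ...")
    print(tracker.summary())
```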
Configuration Reference
config.py
| Setting | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API endpoint |
| MODEL_NAME | llama3.1:8b | Default model |
| TEMPERATURE | 0.7 | Default temperature |
| MAX_TOKENS | 2048 | Max response tokens |
| MCP_SERVER_NAME | MultiAgentSystem | MCP server name |
| MCP_SERVER_TRANSPORT | stdio | Default transport |
| MCP_SSE_HOST | 0.0.0.0 | SSE bind address |
| MCP_SSE_PORT | 8080 | SSE port |
| MCP_REQUEST_TIMEOUT | 300 | Request timeout (seconds) |
| EXTERNAL_MCP_SERVERS | [] | External local MCP servers |
Environment Variables (Ollama)
export OLLAMA_HOST=0.0.0.0:11434 # Bind address
export OLLAMA_NUM_GPU=999 # GPU layers
export OLLAMA_NUM_THREAD=8 # CPU threads
export OLLAMA_KEEP_ALIVE=5m # Model keep-alive time
Project Structure
multi-agent-app/
├── agent_registry.py # 45 agent definitions (single source of truth)
├── config.py # All settings + supervisor prompt
├── graph.py # LangGraph supervisor orchestration
├── runners.py # Execution modes + 50 workflows + advanced features
├── session.py # History, tokens, save/load/export
├── main.py # CLI dispatcher + entry point
├── mcp_server.py # FastMCP server (dynamic tool registration)
├── tool_registry.py # External MCP server consumption
├── requirements.txt # Python dependencies
└── sessions/ # Saved sessions + agent memory
├── *.json # Session files
├── *.md # Exported conversations
└── memory.json # Persistent agent memory
Extending the System
Add a New Agent
Add one entry to agent_registry.py:
"my_agent": {
"description": "What it does",
"temperature": 0.3,
"prompt": "You are a ... agent. Your job is to: 1) ... 2) ... 3) ...",
},
Automatically available in: supervisor routing, /ask, /chain, MCP tools.
Add a New Workflow
Add to WORKFLOWS in runners.py:
"my_workflow": ["planner", "my_agent", "reviewer", "tester"],
Add External MCP Server
Add to EXTERNAL_MCP_SERVERS in config.py:
{"name": "my-server", "transport": "stdio", "command": "python", "args": ["my_server.py"]}
Troubleshooting
Ollama not reachable
# Check if running
curl http://localhost:11434/api/tags
# Start it
ollama serve
Model not found
# List available models
ollama list
# Pull the model
ollama pull llama3.1:8b
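The two checks above (is Ollama reachable, is the model installed) can be combined in one script that queries the same /api/tags endpoint the curl command uses. This is a sketch under the assumption that the response carries a `models` list whose items have a `name` field; `check_ollama` and `installed_models` are illustrative helper names, not functions from the application.

```python
import json
from urllib.request import urlopen


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def check_ollama(base_url: str = "http://localhost:11434",
                 model: str = "llama3.1:8b", timeout: float = 5.0) -> str:
    """Return 'ok' or a short description of what is wrong."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            payload = json.load(response)
    except OSError as exc:  # URLError and connection errors subclass OSError
        return f"Ollama not reachable: {exc}"
    if model not in installed_models(payload):
        return f"model {model!r} not found; run: ollama pull {model}"
    return "ok"


if __name__ == "__main__":
    print(check_ollama())
```

Names are compared exactly, so a model pulled without a tag (for example `ollama pull mistral`) is listed as `mistral:latest` and should be checked under that name.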
Slow responses
- Use a smaller model: /model mistral or /model phi3:mini
- Reduce MAX_TOKENS in config.py
- Use GPU: install CUDA/ROCm drivers
Out of memory
- Use a smaller model: llama3.1:8b instead of 13b/70b
- Close other applications
- Set OLLAMA_NUM_GPU=0 to use CPU only (slower, but no GPU memory is used)
Import errors
# Make sure venv is activated
source .venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
Requirements
- Python 3.12+
- Ollama (any version)
- 8GB+ RAM (16GB recommended)
- No GPU required
- No internet after initial setup
Python Dependencies
langchain>=0.3.0
langchain-ollama>=0.2.0
langgraph>=0.2.0
pydantic>=2.0.0
mcp>=1.0.0
requests>=2.28.0
License
MIT