agentic-mcp-server

Enables a fully local multi-agent AI system with 45 specialized agents and 50 predefined workflows via MCP, powered by Ollama and requiring no internet.

Multi-Agent AI Application (100% Offline)

A fully local, open-source agentic AI system with 45 specialized agents, 50 predefined workflows, and an MCP server — powered by Ollama. No API keys, no cloud services, no internet required after setup.


Complete Installation Guide

Prerequisites

  • Python 3.12+
  • 8GB+ RAM (16GB recommended for 13B models)
  • ~5GB disk space (for model + dependencies)
  • No GPU required (but speeds up inference)

Step 1: Install Ollama (Local LLM Runtime)

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows — download installer from https://ollama.com/download

# Verify installation
ollama --version

Step 2: Start Ollama Service

# Start the Ollama daemon
ollama serve

# It runs on http://localhost:11434 by default
# Keep this terminal open, or run as a system service:
# sudo systemctl enable ollama && sudo systemctl start ollama  (Linux)

Step 3: Pull a Local LLM Model

# Recommended: good balance of speed and quality
ollama pull llama3.1:8b

# Alternatives:
ollama pull mistral              # Fast, 7B params
ollama pull codellama:13b        # Best for code tasks
ollama pull qwen2.5:7b           # Good multilingual
ollama pull llama3.1:70b         # Best quality (needs 40GB+ RAM)
ollama pull deepseek-coder:6.7b  # Specialized for code
ollama pull phi3:mini             # Smallest, fastest

# Verify model is available
ollama list

Step 4: Set Up the Application

cd multi-agent-app

# Create Python virtual environment
python3 -m venv .venv

# Activate it
source .venv/bin/activate        # Linux/macOS
# .venv\Scripts\Activate.ps1     # Windows PowerShell
# .venv\Scripts\activate.bat     # Windows CMD

# Install all dependencies
pip install -r requirements.txt

Step 5: Configure (Optional)

Edit config.py if you changed defaults:

OLLAMA_BASE_URL = "http://localhost:11434"  # Ollama address
MODEL_NAME = "llama3.1:8b"                  # Model you pulled
MAX_TOKENS = 2048                           # Max response length

Step 6: Run

# Interactive CLI mode
python main.py

# Or as MCP server
python main.py --mcp-server

Local LLM Setup (Ollama)

Managing Models

# List installed models
ollama list

# Pull a new model
ollama pull <model-name>

# Remove a model
ollama rm <model-name>

# Show model details
ollama show llama3.1:8b

# Test a model directly
ollama run llama3.1:8b "Hello, how are you?"

Recommended Models by Use Case

| Use Case | Model | RAM Needed | Speed |
| --- | --- | --- | --- |
| General (default) | llama3.1:8b | 8GB | Fast |
| Code-heavy work | codellama:13b | 16GB | Medium |
| Fast responses | mistral or phi3:mini | 4-8GB | Very fast |
| Complex reasoning | llama3.1:70b | 40GB+ | Slow |
| Multilingual | qwen2.5:7b | 8GB | Fast |
| Code + explanation | deepseek-coder:6.7b | 8GB | Fast |

Switch Model at Runtime

No restart needed — switch in the CLI:

/model codellama:13b
/model mistral

Ollama Configuration

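# Change Ollama host/port (if needed). Ollama listens on 127.0.0.1:11434 by default;
# binding to 0.0.0.0 makes the API reachable from other machines on your network.
export OLLAMA_HOST=0.0.0.0:11434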

# Set GPU layers (for partial GPU offload)
export OLLAMA_NUM_GPU=999

# Set number of threads
export OLLAMA_NUM_THREAD=8

Local MCP Server Setup

This App as an MCP Server

Your multi-agent system IS an MCP server. Start it:

# stdio transport (for Claude Desktop, Cursor, etc.)
python main.py --mcp-server

# SSE/HTTP transport (for web clients or remote access on LAN)
python main.py --mcp-server --transport sse --host 0.0.0.0 --port 8080

What Gets Exposed via MCP

| Type | Count | Description |
| --- | --- | --- |
| Tools | 47 | run_multi_agent + 45 individual agent tools + list_agents |
| Resources | 2 | agents://list, config://system |
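
The tool count above is simple arithmetic: two fixed tools plus one per agent. A minimal sketch of that bookkeeping, assuming each agent tool is named after its agent (the actual naming in mcp_server.py is not shown here, so treat `exposed_tool_names` as hypothetical):

```python
# Hypothetical stand-in for the keys of agent_registry.py (only three shown).
AGENT_NAMES = ["researcher", "coder", "reviewer"]


def exposed_tool_names(agent_names: list[str]) -> list[str]:
    """Return the orchestrator tool, one tool per agent, and the listing tool."""
    return ["run_multi_agent", *agent_names, "list_agents"]


# With the full registry of 45 agents this yields 1 + 45 + 1 = 47 tools.
print(len(exposed_tool_names(AGENT_NAMES)))
```

Adding an agent to the registry would grow the exposed tool list by one under this scheme.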

MCP Server CLI Arguments

python main.py --mcp-server [OPTIONS]

Options:
  --transport {stdio,sse}   Transport protocol (default: stdio)
  --host HOST               Bind address for SSE (default: 0.0.0.0)
  --port PORT               Port for SSE (default: 8080)

External Local MCP Servers

Your agents can consume tools from OTHER local MCP servers running on your machine.

Install Local MCP Servers

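# Node.js MCP servers: npx -y downloads the package on first use (cached afterwards)
# and starts the server in the foreground. The first run needs internet access.
npx -y @modelcontextprotocol/server-filesystem /tmp
npx -y @modelcontextprotocol/server-sqlite mydb.sqlite
npx -y @modelcontextprotocol/server-memory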

# Or install Python-based MCP servers
pip install mcp-server-fetch
pip install mcp-server-git

Configure External Servers

Edit config.py:

EXTERNAL_MCP_SERVERS = [
    # Filesystem access — agents can read/write local files
    {
        "name": "filesystem",
        "transport": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/jaspal/projects"]
    },

    # SQLite — agents can query local databases
    {
        "name": "sqlite",
        "transport": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-sqlite", "/home/jaspal/data/app.db"]
    },

    # Memory/Knowledge base — persistent agent memory
    {
        "name": "memory",
        "transport": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-memory"]
    },

    # Git: agents can interact with local repos
    {
        "name": "git",
        "transport": "stdio",
        "command": "python",
        "args": ["-m", "mcp_server_git", "--repository", "/home/jaspal/projects/myapp"]
    },

    # Custom local MCP server (running on localhost)
    {
        "name": "custom-tools",
        "transport": "sse",
        "url": "http://localhost:9090/sse"
    },
]
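
A typo in one of these entries tends to surface only when the app tries to launch the server. The sketch below checks the shape of each entry up front. It assumes only the keys used in the example above (`name`, `transport`, `command`, `args`, `url`); `validate_server` is a hypothetical helper, not part of the app.

```python
def validate_server(entry: dict) -> list[str]:
    """Return a list of problems found in one EXTERNAL_MCP_SERVERS entry."""
    problems = []
    name = entry.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing 'name'")

    transport = entry.get("transport")
    if transport == "stdio":
        # stdio servers are launched as subprocesses, so a command is required.
        if not isinstance(entry.get("command"), str):
            problems.append("stdio entry needs a 'command' string")
        if not isinstance(entry.get("args", []), list):
            problems.append("'args' must be a list")
    elif transport == "sse":
        # SSE servers are already running, so only their URL is needed.
        url = entry.get("url", "")
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            problems.append("sse entry needs an http(s) 'url'")
    else:
        problems.append(f"unknown transport: {transport!r}")
    return problems
```

Calling `validate_server(entry)` for each item in `EXTERNAL_MCP_SERVERS` at startup and printing any non-empty result would catch most configuration mistakes early.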

How It All Connects (100% Local)

┌─────────────────────────────────────────────────────────┐
│                    YOUR MACHINE                          │
│                                                         │
│  ┌─────────────┐     ┌──────────────────────────────┐  │
│  │   Ollama    │     │   Multi-Agent App            │  │
│  │ (Local LLM) │◄───►│   45 agents + supervisor     │  │
│  │ :11434      │     │   MCP server (stdio/SSE)     │  │
│  └─────────────┘     └──────────┬───────────────────┘  │
│                                  │                      │
│                    ┌─────────────┼─────────────┐        │
│                    ▼             ▼             ▼        │
│  ┌──────────────┐ ┌───────────┐ ┌───────────────┐     │
│  │ MCP Server:  │ │MCP Server:│ │ MCP Server:   │     │
│  │ filesystem   │ │ sqlite    │ │ memory        │     │
│  │ (local files)│ │ (local db)│ │ (local store) │     │
│  └──────────────┘ └───────────┘ └───────────────┘     │
│                                                         │
│  ┌──────────────────────────────────────────────────┐  │
│  │ MCP Clients: Claude Desktop / Cursor / VS Code   │  │
│  └──────────────────────────────────────────────────┘  │
│                                                         │
│  Network: ZERO external traffic                         │
└─────────────────────────────────────────────────────────┘

Quick Start

# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Run the app
cd multi-agent-app
source .venv/bin/activate
python main.py

Then type:

🧑 You: write a Python REST API with Flask for a todo app

The supervisor automatically routes to the best agent(s) and returns the result.


CLI Commands Reference

Agent Execution

| Command | Description | Example |
| --- | --- | --- |
| (just type) | Auto-route via supervisor | write a REST API for users |
| /ask <agent> msg | Run specific agent | /ask coder write binary search |
| /chain <a\|b\|c> msg | Chain agents sequentially | /chain coder\|reviewer\|tester build calculator |
| /parallel <a,b> msg | Run concurrently | /parallel coder,security write login |
| /compare <a,b> msg | Compare outputs | /compare coder,refactorer implement sort |
| /workflow <name> msg | Predefined pipeline | /workflow full_dev build todo app |
| /auto msg | Auto-select workflow | /auto fix the login bug |
| /feedback <a> <r> msg | Iterative refinement | /feedback coder reviewer write parser |
| /batch <agent> t1;;t2 | Batch process | /batch coder sort;;search;;hash |
| /stream msg | Stream response | /stream explain monads |
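
The `/chain` and `/batch` commands pack several values into one argument, using `|` between agents and `;;` between tasks. A minimal sketch of how such lines could be split, assuming whitespace separates the command, the agent argument, and the message (`parse_chain` and `parse_batch` are hypothetical helpers, not the app's own dispatcher):

```python
def parse_chain(line: str) -> tuple[list[str], str]:
    """Split '/chain a|b|c message' into (['a', 'b', 'c'], 'message')."""
    # maxsplit=2 keeps the message intact even though it contains spaces.
    _, agents, message = line.split(maxsplit=2)
    return agents.split("|"), message


def parse_batch(line: str) -> tuple[str, list[str]]:
    """Split '/batch agent t1;;t2' into ('agent', ['t1', 't2'])."""
    _, agent, tasks = line.split(maxsplit=2)
    # Drop empty fragments so a trailing ';;' does not create a blank task.
    return agent, [t.strip() for t in tasks.split(";;") if t.strip()]
```

Both functions raise `ValueError` when the line has fewer than three parts, which a CLI could turn into a usage hint.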

File Context

| Command | Description | Example |
| --- | --- | --- |
| /file <path> msg | Single file context | /file src/app.py review this |
| /files <p1,p2> msg | Multiple files | /files api.py,db.py find bugs |

Session Management

| Command | Description |
| --- | --- |
| /save [name] | Save session |
| /load <name> | Load session |
| /export | Export as markdown |
| /sessions | List sessions |
| /history | Show history |
| /clear | Clear history |

Memory

| Command | Description |
| --- | --- |
| /remember <key> note | Store persistent note |
| /recall [key] | Recall notes |
| /forget [key] | Clear memory |

System

| Command | Description |
| --- | --- |
| /agents | List all 45 agents |
| /workflows | List all 50 workflows |
| /tokens | Token usage stats |
| /model <name> | Switch model |
| /health | Check Ollama |
| /retry | Re-run last request |
| /help | Show help |
| quit / exit | Exit |

Agents (45)

| Agent | Description | Temp |
| --- | --- | --- |
| researcher | Gathers information and provides summaries | 0.7 |
| coder | Writes clean, production-quality code | 0.3 |
| reviewer | Reviews code/content for quality and correctness | 0.4 |
| planner | Breaks down complex tasks into actionable steps | 0.5 |
| debugger | Diagnoses errors and suggests targeted fixes | 0.2 |
| writer | Writes documentation, emails, and reports | 0.6 |
| tester | Writes test cases and testing strategies | 0.3 |
| optimizer | Performance optimization and bottleneck analysis | 0.3 |
| security | Security analysis and vulnerability detection | 0.2 |
| data_analyst | Data analysis, SQL queries, and data modeling | 0.4 |
| devops | CI/CD, Docker, Kubernetes, and infrastructure | 0.3 |
| translator | Translation and localization between languages | 0.5 |
| architect | System architecture design and trade-off analysis | 0.4 |
| mentor | Explains concepts and guides learning | 0.6 |
| summarizer | Condenses content into key points and summaries | 0.3 |
| api_designer | API design, OpenAPI specs, and contracts | 0.3 |
| database | Schema design, SQL optimization, and DB architecture | 0.3 |
| ux_designer | UI/UX design, wireframes, and accessibility | 0.5 |
| refactorer | Code restructuring and maintainability improvements | 0.2 |
| explainer | Code walkthroughs and detailed explanations | 0.5 |
| validator | Verifies implementations match requirements | 0.2 |
| automator | Automation scripts, CLI tools, and workflows | 0.3 |
| migrator | Code/database/infrastructure migrations | 0.3 |
| prompt_engineer | Crafts and optimizes LLM prompts | 0.4 |
| diagrammer | Creates Mermaid/PlantUML system diagrams | 0.3 |
| estimator | Effort and time estimation for tasks | 0.4 |
| compliance | Regulatory compliance and standards audits | 0.2 |
| product_manager | Requirements, user stories, and prioritization | 0.5 |
| interviewer | Interview questions and answer evaluation | 0.5 |
| git_expert | Git workflows, branching, and conflict resolution | 0.3 |
| accessibility | WCAG compliance and inclusive design | 0.3 |
| performance_tester | Load testing and scalability analysis | 0.3 |
| error_handler | Error handling patterns and resilience | 0.3 |
| documentation | API docs, changelogs, and guides | 0.5 |
| regex_expert | Crafts and explains regular expressions | 0.2 |
| shell_expert | Shell scripting and Unix tools | 0.3 |
| ml_engineer | ML pipelines, training, and evaluation | 0.4 |
| concurrency | Async, threading, and parallel processing | 0.3 |
| config_manager | Configuration, env vars, and feature flags | 0.3 |
| code_generator | Boilerplate, scaffolding, and templates | 0.3 |
| tech_lead | Technical decisions and team guidance | 0.4 |
| seo_expert | SEO optimization and web performance | 0.4 |
| monitoring | Observability, alerting, and SRE practices | 0.3 |
| networking | DNS, load balancing, and network architecture | 0.3 |
| contract_tester | API contract testing and compatibility | 0.2 |

Workflows (50)

Development

| Workflow | Pipeline |
| --- | --- |
| full_dev | planner → coder → reviewer → tester |
| code_review | coder → reviewer → tester |
| bug_fix | debugger → coder → tester |
| refactor | explainer → refactorer → reviewer → tester |
| scaffold | planner → code_generator → coder → tester |
| optimize | coder → optimizer → reviewer |
| error_resilience | error_handler → coder → tester → reviewer |
| concurrent_system | architect → concurrency → coder → tester → reviewer |

API & Backend

| Workflow | Pipeline |
| --- | --- |
| api_build | api_designer → coder → tester → writer |
| api_full | api_designer → code_generator → coder → tester → documentation → security |
| api_contract | api_designer → contract_tester → tester → documentation |
| db_design | planner → database → reviewer |
| microservice | architect → api_designer → coder → contract_tester → devops |
| full_stack | planner → architect → api_designer → database → coder → tester |
| data_pipeline | data_analyst → coder → tester → devops |

DevOps & Infrastructure

| Workflow | Pipeline |
| --- | --- |
| deploy | devops → security → validator |
| production_ready | coder → error_handler → security → performance_tester → devops |
| release | tester → security → compliance → documentation → devops |
| observability | monitoring → devops → shell_expert |
| config_setup | config_manager → devops → validator |
| network_setup | networking → security → devops → validator |
| git_workflow | git_expert → devops → automator |
| shell_automation | shell_expert → automator → tester |

Security & Quality

| Workflow | Pipeline |
| --- | --- |
| security_audit | coder → security → compliance |
| compliance_check | security → compliance → validator |
| full_review | explainer → reviewer → security → optimizer → accessibility |
| perf_audit | performance_tester → optimizer → reviewer |

Frontend & UX

| Workflow | Pipeline |
| --- | --- |
| frontend | ux_designer → coder → accessibility → reviewer |
| ux_audit | ux_designer → accessibility → reviewer |
| seo_optimize | seo_expert → coder → performance_tester |

Documentation & Learning

| Workflow | Pipeline |
| --- | --- |
| docs | researcher → writer → reviewer |
| learn | researcher → mentor → summarizer |
| code_explain | explainer → diagrammer → summarizer |
| team_onboard | documentation → diagrammer → mentor → explainer |
| translate | translator → reviewer → writer |

Planning & Management

| Workflow | Pipeline |
| --- | --- |
| design | planner → architect → diagrammer |
| estimate | planner → estimator → reviewer |
| tech_spec | product_manager → architect → api_designer → estimator |
| tech_decision | researcher → tech_lead → architect → estimator |
| mvp | product_manager → planner → coder → tester |
| startup_mvp | product_manager → planner → architect → code_generator → coder → tester → devops |

Migration & Modernization

| Workflow | Pipeline |
| --- | --- |
| migrate | planner → migrator → tester → reviewer |
| legacy_modernize | explainer → architect → migrator → coder → tester |

Incident & Operations

| Workflow | Pipeline |
| --- | --- |
| incident | debugger → devops → summarizer |
| incident_response | debugger → monitoring → devops → summarizer → documentation |

Specialized

| Workflow | Pipeline |
| --- | --- |
| ml_project | researcher → ml_engineer → coder → tester → documentation |
| regex_build | regex_expert → tester → explainer |
| prompt_craft | prompt_engineer → tester → optimizer |
| interview_prep | researcher → interviewer → mentor |
| onboarding | explainer → mentor → diagrammer |
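
Every workflow above is an ordered list of agent names. The sketch below shows one way such a pipeline could be executed, assuming each agent receives the previous agent's output as its input. That hand-off rule is an assumption; the real logic lives in runners.py and may pass more context. The agents here are plain callables standing in for LLM-backed agents.

```python
from typing import Callable

Agent = Callable[[str], str]


def run_workflow(pipeline: list[str], agents: dict[str, Agent], task: str) -> str:
    """Run agents in order, feeding each one the previous agent's output."""
    # Fail early if the pipeline names an agent that is not registered.
    missing = [name for name in pipeline if name not in agents]
    if missing:
        raise KeyError(f"unknown agents in pipeline: {missing}")

    output = task
    for name in pipeline:
        output = agents[name](output)
    return output
```

The up-front check is also useful when defining a new workflow, since a misspelled agent name is reported before any model call is made.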

Use Cases with Examples

🚀 Build a New Feature

/workflow full_dev implement user authentication with JWT and refresh tokens

🐛 Fix a Bug

/workflow bug_fix TypeError: Cannot read property 'map' of undefined in UserList.tsx

🏗️ Design a System

/workflow design design a real-time notification system for 100k users

📝 Write Documentation

/workflow docs document the payment processing module with API reference

🔒 Security Review

/files src/auth.py,src/middleware.py security audit these files

⚡ Optimize Performance

/workflow perf_audit our API response time is 2s, analyze and optimize

🎯 Direct Agent Call

/ask coder write a Python decorator for caching with TTL
/ask database design a schema for multi-tenant SaaS
/ask devops write a GitHub Actions CI/CD pipeline for a Node.js app
/ask shell_expert write a bash script to backup PostgreSQL daily

🔄 Iterative Refinement

/feedback coder reviewer write a thread-safe LRU cache in Python

(Coder writes → reviewer critiques → coder improves, repeating until approved or the 3-round limit is reached)

📊 Compare Approaches

/compare architect,optimizer design a caching strategy for product catalog

⚡ Parallel Execution

/parallel security,optimizer,accessibility audit this React component

📋 Batch Processing

/batch coder implement stack;;implement queue;;implement linked list;;implement BST

🤖 Auto-Routing

/auto our login endpoint is returning 500 errors in production

(Automatically selects bug_fix workflow)

📁 Multi-File Analysis

/files src/api.py,src/models.py,src/tests.py review for consistency issues

🎓 Learning

/workflow learn explain event-driven architecture with examples
/ask mentor explain the CAP theorem like I'm a junior developer

🚢 Production Release

/workflow release prepare v2.0 release for the payment service

🏢 Full Startup MVP

/workflow startup_mvp build a SaaS invoicing app with Stripe integration

MCP Client Integration

Claude Desktop

Add to ~/.config/claude/claude_desktop_config.json (Linux) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "multi-agent": {
      "command": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/.venv/bin/python",
      "args": ["/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/main.py", "--mcp-server"]
    }
  }
}
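
The paths in this config must be absolute and must match your own checkout. A small sketch that builds the same structure from a project directory, assuming a POSIX layout with the virtual environment at `.venv` (on Windows the interpreter lives under `.venv\Scripts` instead). The helper names are hypothetical.

```python
import json
from pathlib import PurePosixPath


def claude_desktop_config(project_dir: str, server_name: str = "multi-agent") -> dict:
    """Build the mcpServers entry using absolute paths under project_dir."""
    root = PurePosixPath(project_dir)
    return {
        "mcpServers": {
            server_name: {
                # Use the venv interpreter so the app's dependencies resolve.
                "command": str(root / ".venv" / "bin" / "python"),
                "args": [str(root / "main.py"), "--mcp-server"],
            }
        }
    }


def render(config: dict) -> str:
    """Serialize the config as indented JSON, ready to paste into the file."""
    return json.dumps(config, indent=2)
```

For example, `print(render(claude_desktop_config("/path/to/multi-agent-app")))` prints a block you can merge into `claude_desktop_config.json`.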

Cursor

Add to Cursor MCP settings:

{
  "multi-agent": {
    "command": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app/.venv/bin/python",
    "args": ["main.py", "--mcp-server"],
    "cwd": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app"
  }
}

VS Code (Copilot MCP)

{
  "mcp": {
    "servers": {
      "multi-agent": {
        "command": "python",
        "args": ["main.py", "--mcp-server"],
        "cwd": "/home/jaspal/jscode/js-ai-apps-api/multi-agent-app"
      }
    }
  }
}

Any MCP Client (SSE/HTTP)

# Start HTTP server
python main.py --mcp-server --transport sse --port 8080

# Connect from any MCP client to: http://localhost:8080/sse

Advanced Features

🔄 Feedback Loop

Iteratively refines output until approved:

/feedback coder reviewer write a production-ready connection pool

Agent writes → reviewer evaluates → agent improves → repeat (max 3 rounds).
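
The loop can be sketched as follows. This is an illustration under stated assumptions, not the code in runners.py: agents are modeled as plain callables, and approval is detected by a verdict that starts with "APPROVED", which is a made-up convention.

```python
from typing import Callable

Agent = Callable[[str], str]


def feedback_loop(task: str, writer: Agent, reviewer: Agent,
                  max_rounds: int = 3) -> tuple[str, int]:
    """Refine a draft until the reviewer approves or max_rounds is reached.

    Returns the final draft and the number of review rounds performed.
    """
    draft = writer(task)
    for round_no in range(1, max_rounds + 1):
        verdict = reviewer(draft)
        if verdict.strip().upper().startswith("APPROVED"):
            return draft, round_no
        # Give the writer its previous draft plus the critique to act on.
        draft = writer(
            f"{task}\n\nPrevious draft:\n{draft}\n\nReviewer feedback:\n{verdict}"
        )
    return draft, max_rounds
```

Capping the rounds keeps a stubborn reviewer from looping forever, at the cost of sometimes returning a draft that was never approved.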

🧠 Persistent Memory

Notes that survive across sessions:

/remember project Using PostgreSQL 15 with pgvector extension
/remember style snake_case, type hints, 4-space indent
/recall project
/forget project

⚡ Parallel Execution

Multiple agents simultaneously:

/parallel security,optimizer,reviewer analyze this module
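
One way to fan a task out to several agents is a thread pool, since each agent call mostly waits on the LLM. This is a sketch with stand-in callables. Whether the app itself uses threads or asyncio is not stated here, so `run_parallel` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]


def run_parallel(agents: dict[str, Agent], names: list[str], task: str) -> dict[str, str]:
    """Send the same task to every named agent and collect results by name."""
    # max_workers must be at least 1, even when no agents are requested.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(agents[name], task) for name in names}
        # result() blocks until that agent finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Keep in mind that a single local Ollama instance may still process requests one at a time, so the speedup depends on how the server is configured.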

📦 Batch Processing

Same agent, multiple tasks:

/batch tester write tests for login;;signup;;logout;;password-reset

🎯 Auto-Routing

Keyword-based workflow selection:

/auto deploy our app to kubernetes with monitoring
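
Keyword-based selection can be as simple as scanning the message for trigger words. The keyword table below is invented for illustration; the real mapping in runners.py is not shown in this README. Only the workflow names come from the list above.

```python
# Hypothetical keyword table: first matching row wins.
KEYWORD_WORKFLOWS: list[tuple[tuple[str, ...], str]] = [
    (("error", "bug", "exception", "crash"), "bug_fix"),
    (("deploy", "kubernetes", "docker"), "deploy"),
    (("document", "docs", "readme"), "docs"),
]
DEFAULT_WORKFLOW = "full_dev"  # assumed fallback when nothing matches


def auto_select(message: str) -> str:
    """Pick a workflow by looking for known keywords in the message."""
    text = message.lower()
    for keywords, workflow in KEYWORD_WORKFLOWS:
        # Substring match so that "errors" still triggers on "error".
        if any(keyword in text for keyword in keywords):
            return workflow
    return DEFAULT_WORKFLOW
```

Substring matching is forgiving but blunt: "bug" also matches inside "debug". Row order settles ties when a message hits several rows.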

📁 Multi-File Context

Cross-file analysis:

/files src/api.py,src/models.py,tests/test_api.py find inconsistencies

🔀 Custom Chains

Build pipelines on the fly:

/chain planner|architect|coder|tester|documentation build a rate limiter

💾 Session Persistence

/save my-project
/load my-project
/export

📊 Token Tracking

/tokens
# Output: Tokens: ~12,450 total (4,200 in / 8,250 out) | Requests: 7
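
A summary line in that format needs only three running counters. The sketch below is a hypothetical tracker, not the one in session.py. How the app counts or estimates tokens is not described here, so the counts are passed in by the caller.

```python
class TokenTracker:
    """Accumulates input/output token counts across requests."""

    def __init__(self) -> None:
        self.tokens_in = 0
        self.tokens_out = 0
        self.requests = 0

    def record(self, tokens_in: int, tokens_out: int) -> None:
        """Add one request's token counts to the running totals."""
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out
        self.requests += 1

    def summary(self) -> str:
        total = self.tokens_in + self.tokens_out
        # The ',' format spec inserts thousands separators.
        return (
            f"Tokens: ~{total:,} total "
            f"({self.tokens_in:,} in / {self.tokens_out:,} out) "
            f"| Requests: {self.requests}"
        )
```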

Configuration Reference

config.py

| Setting | Default | Description |
| --- | --- | --- |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API endpoint |
| MODEL_NAME | llama3.1:8b | Default model |
| TEMPERATURE | 0.7 | Default temperature |
| MAX_TOKENS | 2048 | Max response tokens |
| MCP_SERVER_NAME | MultiAgentSystem | MCP server name |
| MCP_SERVER_TRANSPORT | stdio | Default transport |
| MCP_SSE_HOST | 0.0.0.0 | SSE bind address |
| MCP_SSE_PORT | 8080 | SSE port |
| MCP_REQUEST_TIMEOUT | 300 | Request timeout (seconds) |
| EXTERNAL_MCP_SERVERS | [] | External local MCP servers |

Environment Variables (Ollama)

export OLLAMA_HOST=0.0.0.0:11434   # Bind address
export OLLAMA_NUM_GPU=999           # GPU layers
export OLLAMA_NUM_THREAD=8          # CPU threads
export OLLAMA_KEEP_ALIVE=5m         # Model keep-alive time

Project Structure

multi-agent-app/
├── agent_registry.py   # 45 agent definitions (single source of truth)
├── config.py           # All settings + supervisor prompt
├── graph.py            # LangGraph supervisor orchestration
├── runners.py          # Execution modes + 50 workflows + advanced features
├── session.py          # History, tokens, save/load/export
├── main.py             # CLI dispatcher + entry point
├── mcp_server.py       # FastMCP server (dynamic tool registration)
├── tool_registry.py    # External MCP server consumption
├── requirements.txt    # Python dependencies
└── sessions/           # Saved sessions + agent memory
    ├── *.json          # Session files
    ├── *.md            # Exported conversations
    └── memory.json     # Persistent agent memory

Extending the System

Add a New Agent

Add one entry to agent_registry.py:

"my_agent": {
    "description": "What it does",
    "temperature": 0.3,
    "prompt": "You are a ... agent. Your job is to: 1) ... 2) ... 3) ...",
},

Automatically available in: supervisor routing, /ask, /chain, MCP tools.

Add a New Workflow

Add to WORKFLOWS in runners.py:

"my_workflow": ["planner", "my_agent", "reviewer", "tester"],

Add External MCP Server

Add to EXTERNAL_MCP_SERVERS in config.py:

{"name": "my-server", "transport": "stdio", "command": "python", "args": ["my_server.py"]}

Troubleshooting

Ollama not reachable

# Check if running
curl http://localhost:11434/api/tags

# Start it
ollama serve

Model not found

# List available models
ollama list

# Pull the model
ollama pull llama3.1:8b

Slow responses

  • Use a smaller model: /model mistral or /model phi3:mini
  • Reduce MAX_TOKENS in config.py
  • Use GPU: install CUDA/ROCm drivers

Out of memory

  • Use smaller model: llama3.1:8b instead of 13b/70b
  • Close other applications
  • Set OLLAMA_NUM_GPU=0 to use CPU only (slower, but avoids running out of GPU memory)

Import errors

# Make sure venv is activated
source .venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

Requirements

  • Python 3.12+
  • Ollama (any version)
  • 8GB+ RAM (16GB recommended)
  • No GPU required
  • No internet after initial setup

Python Dependencies

langchain>=0.3.0
langchain-ollama>=0.2.0
langgraph>=0.2.0
pydantic>=2.0.0
mcp>=1.0.0
requests>=2.28.0

License

MIT
