Skill Context Manager

Provides context-aware skill selection for AI agents, reducing token usage by 85-98% and improving accuracy through semantic retrieval, session memory, and feedback learning.

Skill Context Manager (SCM)

Context-aware skill selection for AI agents — Solves the "too many skills" problem.

Reduces skill context tokens by 85-98%, improves skill selection accuracy, and learns from feedback.


The Problem

| Problem | Impact | Root Cause |
| --- | --- | --- |
| Users install 50-100+ skills | 30K-60K tokens consumed pre-conversation | Static tool injection doesn't scale |
| Agent loses direction | Accuracy drops from 95% → <50% (Anthropic eval, >50 tools) | "Lost in the Middle" + metadata overload |
| Forgets skills after 20-30 messages | Re-reads everything every turn | No session memory |
| Similar skills indistinguishable | Can't decide which to pick | Keyword search is insufficient |
| Wrong skill selected | Wasted tokens + cost from retries | No feedback loop |

Research References

  • SkillRouter (CVPR 2026): 91.7% of cross-encoder attention goes to skill body, only 1.0% to description — metadata alone is insufficient.
  • Anthropic Tool Search: BM25-based deferred loading, 85% token reduction.
  • Anthropic internal eval: Opus 4 accuracy from 79.5% → 49% with >50 tools.

Solution

SCM is a proxy layer between the agent and the skill directory. Instead of loading all skills into context, SCM performs:

  1. Two-Stage Retrieval (SkillRouter architecture) — Retrieve → Rerank
  2. Session Memory — Remembers which skills were used, boosts them when relevant
  3. Feedback Loop — Bayesian weight updates from success/failure data
  4. Single Shared Database — Eliminates cross-DB bugs
  5. Graceful Degradation — Works at every dependency level

Token Savings

| Scenario | Before | After | Savings |
| --- | --- | --- | --- |
| 100 skills metadata loaded | ~30K tokens | ~300 tokens | 99% |
| 50 MCP tools loaded | ~72K tokens | ~8.7K tokens | 88% |
| Session tracking (50 messages) | Skills forgotten | 100% recall | N/A |
| Query latency (77 skills) | | ~7ms (BM25) | Instant |

Installation

Requirements

  • Python 3.11+
  • uv (Astral) — auto-installed if missing
  • git

One-Click Install (recommended)

# Basic install (18 seconds)
curl -fsSL https://raw.githubusercontent.com/Mavis2103/skill-context-manager/main/scripts/install.sh | bash

# With MCP auto-setup (configures Hermes Agent + OpenCode)
curl -fsSL https://raw.githubusercontent.com/Mavis2103/skill-context-manager/main/scripts/install.sh | bash -s -- --with-mcp

# Custom directory
curl -fsSL ... | bash -s -- --scm-dir ~/custom/path

The installer will:

| Step | What happens |
| --- | --- |
| ✅ Pre-flight | Check Python 3.11+, install uv if needed |
| ✅ Clone | git clone --depth 1 to ~/Workspaces/skill-context-manager |
| ✅ Venv | uv venv + uv pip install -e . (zero-dependency core) |
| ✅ Symlink | ~/.local/bin/scm, added to PATH via profile.d + shell rc |
| ✅ Index | Auto-index ~/.hermes/skills/, ~/.claude/skills/, ~/.cursor/skills/ |
| ✅ Sanity | Smoke test + version check |

Manual Install

# Requirements: Python 3.11+, uv, git
git clone https://github.com/Mavis2103/skill-context-manager.git
cd skill-context-manager
uv venv
source .venv/bin/activate
uv pip install -e .

# Optional: AI models for embedding search and reranking
uv pip install -e ".[full]"

# Index common skill directories
scm index --dir ~/.hermes/skills/

# Add to PATH
echo "export PATH=\"\$PATH:$(pwd)/.venv/bin\"" >> ~/.bashrc
source ~/.bashrc

Uninstall

curl -fsSL https://raw.githubusercontent.com/Mavis2103/skill-context-manager/main/scripts/install.sh | bash -s -- --uninstall

Removes: source, venv, database, symlink, PATH config, MCP config.

Features

1. Semantic Skill Retrieval

Find skills with hybrid BM25 + embedding search. When no embedding model is installed, SCM falls back to the zero-dependency BM25 path.

# BM25 (FTS5) — stdlib only, fast, precise for keywords
scm query "kubernetes deploy helm" --method bm25

# Embedding — semantic search (requires sentence-transformers)
scm query "orchestrate container cluster management" --method embedding

# Hybrid (default) — best of both worlds
scm query "deploy app to production" --method hybrid

2. Session Tracking

Remembers which skills were used in a session — no more forgetting:

scm session start --id "chat-abc-123"
scm session use --skill k8s-deploy --query "deploy"
scm session use --skill docker-build --query "build image"

# Export context for the agent — only ~30 tokens
scm session context --id "chat-abc-123"

# Output:
# {
#   "active_skills": ["k8s-deploy", "docker-build"],
#   "context_size_tokens": 42,
#   "matching_skills": [...]
# }
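
To make the shape of that exported context concrete, here is a small in-memory sketch. The SessionTracker class, its method names, and the four-characters-per-token estimate are all assumptions for illustration; SCM itself persists sessions to SQLite and may count tokens differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SessionTracker:
    """In-memory stand-in for a session record (hypothetical, not SCM's class)."""

    session_id: str
    active_skills: list[str] = field(default_factory=list)

    def use(self, skill: str) -> None:
        # Record each skill once, preserving first-use order.
        if skill not in self.active_skills:
            self.active_skills.append(skill)

    def context(self) -> dict:
        payload: dict = {"active_skills": list(self.active_skills)}
        # Rough heuristic of ~4 characters per token; an assumption, not a tokenizer.
        payload["context_size_tokens"] = max(1, len(json.dumps(payload)) // 4)
        return payload
```

The point of the design is that the agent receives a short list of skill names rather than the full metadata of every installed skill.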

3. Feedback Loop — Self-Learning

SCM improves over time:

# Record feedback
scm feedback record --query "deploy app" --skill k8s-deploy --success true --rating 5
scm feedback record --query "deploy app" --skill helm --success false

# View statistics
scm feedback stats
# 📊 Feedback Statistics
#    Total feedback:     47
#    Success rate:       87%
#    Query patterns:     12
#    Skills with data:   8
#    Top skills by success rate:
#      • k8s-deploy: 15/16 (94%)
#      • docker-build: 8/10 (80%)
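
One common way to turn success/failure counts like these into a weight is the posterior mean of a Beta-Bernoulli model. The README only says the weights are "Bayesian", so treat the formula, the uniform prior, and the posterior_success_rate name below as assumptions rather than SCM's actual update rule.

```python
def posterior_success_rate(
    successes: int,
    total: int,
    prior_alpha: float = 1.0,
    prior_beta: float = 1.0,
) -> float:
    """Posterior mean of a Beta(prior_alpha, prior_beta) prior after `total` trials."""
    if total < 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total")
    # Beta posterior: alpha grows with successes, beta grows with failures.
    return (successes + prior_alpha) / (total + prior_alpha + prior_beta)
```

With a uniform prior, 15/16 successes gives about 0.89 and 8/10 gives 0.75, while a skill with no feedback sits at 0.5. The prior keeps a skill with one lucky success from outranking a skill with a long, mostly successful history.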

4. Metadata Optimization

Compress descriptions to save tokens:

# Preview
scm optimize --dir ~/.hermes/skills/ --dry-run
# 📊 Potential savings:
#    Before: 1,847 meta tokens
#    After:  1,240 meta tokens
#    Saved:  607 tokens per load (33%)

# Apply
scm optimize --dir ~/.hermes/skills/ --no-dry-run

5. Usage Analytics

scm insights
# 📈 Usage Insights (last 30 days)
#    Total queries:     142
#    Tokens saved:      ~28,400
#    Retrieval methods: bm25: 89, hybrid: 42, embedding: 11
#    Top skills used:
#      • k8s-deploy: 23 times
#      • pytest-run: 18 times
#      • docker-build: 15 times

scm stats
# 📊 Skill Index Statistics
#    Total skills:     24
#    Categories:       5
#    Metadata tokens:  1,847
#    Body tokens:      12,430

Architecture

User Request
    │
    ▼
┌─────────────────────────────────────────────────┐
│ 1. Query Analysis                               │
│    - Extract key terms                          │
│    - Embed query (if embedding enabled)         │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│ 2. Stage 1: Retrieval (top 20)                   │
│    ┌──────────┐   ┌──────────┐   ┌──────────┐   │
│    │  BM25    │ + │Embedding │ = │  Hybrid  │   │
│    │ (FTS5)   │   │ (cosine) │   │ (0.3+0.7)│   │
│    └──────────┘   └──────────┘   └──────────┘   │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│ 3. Context Injection                             │
│    + Session boost (recently used +0.5)          │
│    + Feedback weights (Bayesian prior)           │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│ 4. Stage 2: Rerank (top 5)                       │
│    Cross-encoder: query × skill body             │
│    "cross-encoder/ms-marco-MiniLM-L6-v2"         │
│    ~50ms on CPU for 20 candidates                │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│ 5. Output                                        │
│    - Top 5 skill names + descriptions (~300 t)   │
│    - Session context (~30 tokens)                │
│    - Agent loads only the 1 skill body it needs  │
└─────────────────────────────────────────────────┘
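
Stages 2 and 3 of the diagram can be sketched as a score fusion followed by a session boost. The 0.3/0.7 weights and the +0.5 boost are taken from the diagram; which weight belongs to BM25 versus the embedding score, the min-max normalization, and the function names are assumptions, so this is an illustration rather than SCM's retriever.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 1.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    cosine: dict[str, float],
    active_skills: frozenset[str] = frozenset(),
    w_bm25: float = 0.3,   # assumed to be the BM25 weight
    w_emb: float = 0.7,    # assumed to be the embedding weight
    session_boost: float = 0.5,
    top_k: int = 20,
) -> list[tuple[str, float]]:
    """Fuse both score sets, boost skills already used this session, keep top_k."""
    b, c = min_max(bm25), min_max(cosine)
    fused: dict[str, float] = {}
    for name in b.keys() | c.keys():
        score = w_bm25 * b.get(name, 0.0) + w_emb * c.get(name, 0.0)
        if name in active_skills:
            score += session_boost
        fused[name] = score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

The candidates returned here would then go to the cross-encoder, which rescores each query and skill body pair and keeps the top 5.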

Token Flow

Without SCM:
  Session start: load 50 skills metadata = 50 × 60 tokens = 3,000 tokens
  Agent picks 1, but ALL 50 stay in context
  Session grows → agent forgets → re-load all: +3,000 tokens
  Total waste: ~6,000+ tokens per session

With SCM:
  Session start: active skills only = 3 × 15 tokens = 45 tokens
  Query → top 5 metadata = 5 × 40 tokens = 200 tokens
  Session tracker: ~30 tokens
  Total: ~275 tokens per query
  Savings: 85-98%
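
The figures above can be checked with a few lines of arithmetic. The numbers come straight from the token flow; only the savings_percent helper name is introduced here.

```python
def savings_percent(before: int, after: int) -> float:
    """Percentage of tokens saved when going from `before` to `after`."""
    return 100.0 * (1 - after / before)


# With SCM: active skills + top-5 metadata + session tracker.
with_scm = 3 * 15 + 5 * 40 + 30

# Without SCM: one metadata load, or two if the agent re-loads mid-session.
single_load = 50 * 60
with_reload = 2 * single_load

print(with_scm)                                        # 275
print(round(savings_percent(single_load, with_scm), 1))  # 90.8
print(round(savings_percent(with_reload, with_scm), 1))  # 95.4
```

Both results fall inside the quoted 85-98% range; the exact figure depends on how many times the full metadata would otherwise be re-loaded.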

MCP Server

SCM runs as an MCP server with 11 tools, compatible with any MCP-compatible agent.

Multi-Agent Setup Registry

SCM v0.3.0 ships with a single-command setup for 13 agent platforms. Instead of manually configuring each agent's MCP settings, run:

# Configure for ALL supported agents at once
scm mcp setup --all

# Or pick specific agents
scm mcp setup --claude-code --cursor --windsurf --hermes

# Remove SCM config from all agents
scm mcp setup --all --uninstall

# List all supported platforms with their config paths
scm mcp setup --list

Supported platforms:

| Agent | Flag | Config Path |
| --- | --- | --- |
| Claude Code | --claude-code | ~/.claude.json |
| Claude Desktop | --claude-desktop | ~/.config/Claude/claude_desktop_config.json |
| Cursor | --cursor | ~/.cursor/mcp.json |
| Windsurf | --windsurf | ~/.codeium/windsurf/mcp_config.json |
| Cline | --cline | VS Code globalStorage/cline_mcp_settings.json |
| Gemini CLI | --gemini | ~/.gemini/settings.json |
| VS Code (Copilot) | --vscode | VS Code User/mcp.json |
| Zed | --zed | ~/.config/zed/settings.json |
| Codex CLI | --codex | ~/.codex/config.toml |
| Goose | --goose | ~/.config/goose/config.yaml |
| Continue.dev | --continue | ~/.continue/config.yaml |
| OpenCode | --opencode | ~/.config/opencode/opencode.json |
| Hermes Agent | --hermes | ~/.hermes/config.yaml |

Each platform gets the correct config format automatically:

  • JSON mcpServers — Claude Code, Desktop, Cursor, Windsurf, Cline, Gemini
  • JSON servers (type: stdio) — VS Code
  • JSON context_servers — Zed
  • JSON mcp (type: local) — OpenCode
  • YAML mcp_servers — Hermes
  • YAML extensions — Goose
  • YAML mcpServers (list) — Continue.dev
  • TOML [mcp_servers.scm] — Codex CLI
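
For the JSON mcpServers family, an idempotent setup step might look like the sketch below. The entry contents match the Claude Code example in this README; the add_scm_server function and its merge behavior are assumptions for illustration, not the code behind scm mcp setup.

```python
import json
from pathlib import Path

# Same entry as the Claude Code reference config in this README.
SCM_ENTRY = {"command": "python3", "args": ["-m", "scm.mcp_server"]}


def add_scm_server(config_path: Path) -> bool:
    """Merge the SCM entry into a JSON config; return True if the file changed."""
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}
    # Preserve any servers the user already configured.
    servers = config.setdefault("mcpServers", {})
    if servers.get("scm") == SCM_ENTRY:
        return False  # already configured, nothing to write
    servers["scm"] = dict(SCM_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return True
```

Running it twice against the same file writes once and then reports no change, which is the property that makes a setup command safe to re-run.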

Verify Configuration

# Check which agents have SCM configured
scm mcp status

# Output (example):
# SCM MCP Status
#   ✅ Claude Code: Configured
#   ✅ Cursor: Configured
#   ○ Windsurf: Config exists, not configured
#   · Zed: Config not found

Quick Start

# Auto-configure for all agents (idempotent)
scm mcp setup --all

# Check configuration status
scm mcp status

# Start server in stdio mode (default)
python3 -m scm.mcp_server

# Start server in HTTP/SSE mode
python3 -m scm.mcp_server --http --port 8321

Available Tools

| Tool | Layer | Description |
| --- | --- | --- |
| skill_query | Retrieve | Find the most relevant skills for a task |
| skill_index | Index | Index skills from a directory |
| skill_stats | Index | Get database statistics |
| skill_session_start | Session | Start a tracking session |
| skill_session_use | Session | Record skill usage |
| skill_session_context | Session | Export session context (~30 tokens) |
| skill_session_end | Session | End a session |
| skill_optimize | Optimize | Compress metadata to save tokens |
| skill_feedback | Feedback | Record usage feedback |
| skill_feedback_stats | Feedback | View feedback statistics |
| skill_insights | Analytics | Usage analytics dashboard |

Per-Agent Config Formats (Reference)

Agents use different config formats, and some share one. The scm mcp setup command handles all of them automatically; the examples below are for reference only:

Hermes Agent (~/.hermes/config.yaml):

mcp_servers:
  scm:
    command: python3
    args: ["-m", "scm.mcp_server"]
    allowed_tools:
      - skill_query
      - skill_session_start
      - skill_session_use
      - skill_session_context
      - skill_session_end
      - skill_feedback
      - skill_feedback_stats
      - skill_stats
      - skill_insights

Test connection:

hermes mcp test scm
# ✓ Connected (738ms)
# ✓ Tools discovered: 11

After that, Hermes Agent automatically discovers and can call the MCP tools.

OpenCode (~/.config/opencode/opencode.json):

{
  "mcp": {
    "scm": {
      "type": "local",
      "command": ["python3", "-m", "scm.mcp_server"],
      "enabled": true
    }
  }
}

Claude Code (~/.claude.json):

{
  "mcpServers": {
    "scm": {
      "command": "python3",
      "args": ["-m", "scm.mcp_server"]
    }
  }
}

VS Code (Copilot) (~/.config/Code/User/mcp.json):

{
  "servers": {
    "scm": {
      "type": "stdio",
      "command": "python3",
      "args": ["-m", "scm.mcp_server"]
    }
  }
}

Codex CLI (~/.codex/config.toml):

[mcp_servers.scm]
command = "python3"
args = ["-m", "scm.mcp_server"]

Remote Mode (HTTP/SSE)

# Start server
python3 -m scm.mcp_server --http --port 8321

# Client config
{
  "mcpServers": {
    "scm": {
      "url": "http://localhost:8321/sse"
    }
  }
}

Agent Skill Template (for Hermes Agent skills)

Create a skill-router/SKILL.md:

---
name: skill-router
description: Select and load the most relevant agent skills using semantic search
---

When a skill needs to be selected for a task, use:
  scm query "<user_task>" --top 3 --format json
Then load the SKILL.md body of the top-matching skill.

Graceful Degradation

| Dependencies | Features Available |
| --- | --- |
| Python stdlib only | BM25 (FTS5) + Session tracking + Feedback |
| + sentence-transformers | Semantic embedding search |
| + transformers + torch | Cross-encoder reranking |
| + feedback data | Self-improving Bayesian weights |

The zero-dependency core works immediately without installing anything extra. AI models are optional.
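
A degradation ladder like this is usually implemented by probing for optional packages at startup. The sketch below shows one way to do that with importlib; the function names and the returned dictionary are hypothetical, and SCM's own detection logic may differ.

```python
from importlib.util import find_spec
from typing import Callable


def module_available(name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pick_retrieval_method(
    is_available: Callable[[str], bool] = module_available,
) -> dict[str, object]:
    """Choose the richest pipeline the installed packages allow."""
    has_embeddings = is_available("sentence_transformers")
    has_reranker = is_available("transformers") and is_available("torch")
    return {
        # Fall back to stdlib-only BM25 when no embedding model is installed.
        "method": "hybrid" if has_embeddings else "bm25",
        "rerank": has_reranker,
    }
```

Passing the probe in as a parameter keeps the decision logic testable without installing or uninstalling anything.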

Comparison with Alternatives

| Solution | Progressive Discovery | Semantic Search | Session Memory | Feedback Loop | Token Cost | Zero-Dep |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Code Skills | ✅ Load on-demand | ❌ Keyword | ❌ No | ❌ No | ~500 tokens | |
| MCP Tool Search | ✅ Deferred load | ✅ BM25 | ❌ No | ❌ No | ~500 tokens | |
| SkillRouter (CVPR) | ❌ All at once | ✅ Cross-encoder | ❌ No | ✅ Yes | Training needed | ❌ GPU |
| Hermes Skills | ✅ Metadata only | ❌ Keyword | ❌ No | ❌ No | ~3K tokens | |
| Lunar MCPX | ✅ Tool groups | ✅ Custom | ❌ No | ❌ No | ~8.7K tokens | |
| ✨ SCM (This) | ✅ Metadata only | ✅ BM25 + Embedding + Cross-encoder | ✅ Full session tracking | ✅ Bayesian | ~275 tokens | |

Project Structure

skill-context-manager/
├── src/scm/
│   ├── __init__.py          # Version + schema init
│   ├── cli.py               # CLI interface (argparse, 9 subcommands)
│   ├── db.py                # Shared database connection (single DB, WAL)
│   ├── indexer.py           # Skill indexing engine (FTS5)
│   ├── retriever.py         # BM25 + embedding retrieval
│   ├── reranker.py          # Cross-encoder reranking
│   ├── session.py           # Session state tracker
│   ├── optimizer.py         # Skill metadata optimizer
│   ├── feedback.py          # Feedback collection + Bayesian learning
│   ├── tracker.py           # Usage analytics
│   ├── models.py            # Data models (Skill, QueryResult, SessionState, FeedbackRecord)
│   ├── mcp_server.py        # MCP server (11 tools)
│   └── mcp_setup.py         # Multi-agent MCP setup registry (13 platforms)
├── tests/
│   ├── test_models.py       # 13 tests — data models + YAML parsing
│   ├── test_indexer.py      # 11 tests — index/reindex/empty/WAL
│   ├── test_retriever.py    # 9 tests — BM25/hybrid/session boost/empty
│   ├── test_session_feedback.py  # 21 tests — session lifecycle + feedback
│   ├── test_optimizer.py    # 9 tests — compression/expansion/info-leak
│   ├── test_tracker.py      # 8 tests — recording/insights/daily-trend
│   ├── test_reranker.py     # 6 tests — fallback/empty/top-k/custom model
│   ├── test_mcp_setup.py    # 26 tests — multi-agent registry (13 platforms)
│   └── test_regression.py   # 24 tests — bug regression coverage
├── scripts/
│   ├── install.sh           # One-click install
│   ├── benchmark.sh         # Performance benchmark
│   └── demo.sh              # Interactive demo
├── configs/
│   └── default.yaml         # Default configuration
├── docs/
│   ├── ARCHITECTURE.md      # Detailed architecture docs
│   └── MCP-INTEGRATION.md   # MCP integration guide
├── pyproject.toml
├── LICENSE
└── README.md

Storage

Single SQLite database (~/.scm/scm.db) with WAL mode:

| Table | Purpose |
| --- | --- |
| skills + skills_fts (FTS5) | Skill index + full-text search |
| sessions + session_skills | Session tracking |
| feedback + skill_weights + query_patterns | Feedback & learning |
| usage_events + daily_stats | Usage analytics |
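
Opening a single shared database in WAL mode with idempotent schema creation takes only a few lines of stdlib code. The table names below appear in the list above, but the columns and the open_db helper are assumptions made for illustration; the real schema lives in the project source.

```python
import sqlite3
from pathlib import Path

# Column definitions are hypothetical; only the table names come from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skills (
    name TEXT PRIMARY KEY,
    description TEXT NOT NULL DEFAULT ''
);
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: Path) -> sqlite3.Connection:
    """Open (or create) the shared database with WAL journaling enabled."""
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active; the setting persists in the file.
    conn.execute("PRAGMA journal_mode=WAL")
    # IF NOT EXISTS makes this safe to run on every startup.
    conn.executescript(SCHEMA)
    return conn
```

Because every statement uses CREATE TABLE IF NOT EXISTS, calling open_db on an existing database is a no-op for the schema. Note that this pattern adds new tables but does not alter existing ones.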

Workflow Example

1. Agent receives a new task

User: "Deploy app to production"

Agent internally calls:
  → skill_query(query="deploy app to production", top_k=3)
  → Returns: [kubernetes-deploy (0.92), docker-build (0.78), monitoring (0.45)]
  → Agent loads kubernetes-deploy SKILL.md, executes deploy steps
  → skill_session_use(session_id="...", skill_name="kubernetes-deploy", success=true)

2. Agent needs context injection

Agent generates system prompt block:
  "Session active skills: [kubernetes-deploy]
   Related skills: [docker-build, helm-chart]
   Estimated context: 15 tokens"

3. Agent encounters a similar task later

# This query doesn't need to scan all skills again.
# Session tracker knows kubernetes-deploy was used and boosts it.
# Saves 50-200 tokens per query.
scm session context --id "..." --query "scale deployment"

Development

Run Tests

# All 127 tests (77 original + 24 regression + 26 MCP setup)
uv run pytest -v

# Specific module
uv run pytest tests/test_indexer.py -v

# Just regression tests
uv run pytest tests/test_regression.py -v

# Coverage (optional)
uv run pytest --cov=src/scm/ tests/

Supported Skill Formats

  • SKILL.md with YAML frontmatter (Hermes Agent, Claude Code)
  • Plain text files (directory name = skill name)
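
For the first format, a stdlib-only reader for flat frontmatter could look like the sketch below. It handles only simple key: value lines between two --- markers, which is enough for the skill-router template shown earlier; nested YAML needs a real YAML parser, and parse_skill_md is a hypothetical name rather than SCM's API.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter dict, body). Flat key: value pairs only."""
    lines = text.splitlines()
    # No opening marker: treat the whole file as a plain-text skill body.
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated frontmatter, fall back to plain text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

A file without frontmatter comes back with an empty dictionary, so the caller can fall back to the directory name as the skill name.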

Database Migration

# SCM auto-migrates schema on version changes (CREATE TABLE IF NOT EXISTS)
# No manual migration needed

Roadmap

  • [x] Research & Architecture (SkillRouter, Anthropic, MCP scalability)
  • [x] Core indexing engine (FTS5 + BM25)
  • [x] Semantic retrieval (embedding + hybrid)
  • [x] Session tracker with persistence
  • [x] Metadata optimizer (compress + expand)
  • [x] Cross-encoder reranker (MiniLM)
  • [x] Feedback loop with Bayesian weights
  • [x] Usage analytics and insights
  • [x] MCP Server (11 tools)
  • [x] Hermes Agent integration
  • [x] OpenCode integration
  • [x] Single shared DB (eliminates cross-DB bugs)
  • [x] 77 tests across all modules
  • [x] 101 tests + 16 bug fixes (v0.2.1)
  • [ ] GUI dashboard
  • [ ] Multi-agent session sharing

References

  1. SkillRouter: Retrieval-Augmented Skill Selection for LLM Agents at Scale. Zheng et al., CVPR 2026. arXiv:2603.22455
  2. Advanced Tool Use & Tool Search. Anthropic Engineering Blog.
  3. MCP Tool Scalability Problem. Jenova AI.
  4. Skills Over MCPs: Context-Efficient Agent Capabilities. Agentic Engineer.
  5. Beyond the Prompt: Agent Skills as Dynamic Context Management. Dev.to.

License

MIT — Copyright (c) 2026 Mavis2103

Changelog

See CHANGELOG.md for version history. Current: v0.3.0.
