SecureLLM MCP Server
Enables AI assistants to interact with NixOS development tools, manage builds, and optimize workflows through natural language.
Enterprise-Grade Model Context Protocol Server for Intelligent Development Workflows
Overview
SecureLLM MCP is a production-ready Model Context Protocol (MCP) server that gives AI assistants direct access to development tooling. It combines semantic caching, reasoning systems, and rate-limited, observable tool execution, with a focus on NixOS and systems programming workflows.
Key Capabilities
- Semantic Intelligence: 50-70% cost reduction through embedding-based query caching
- Hybrid Reasoning: Context inference, multi-step planning, and causal impact analysis
- Production-Ready: Circuit breakers, retry logic, structured logging, and Prometheus metrics
- NixOS First-Class: Deep integration with Nix ecosystem - package debugging, flake management, build optimization
- Emergency Framework: Laptop thermal protection during intensive builds
- Knowledge Management: Persistent learning with SQLite + FTS5 full-text search
- Security-Focused: SOPS secrets management, OAuth integration, sandboxed execution
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                     MCP CLIENT (Claude, Cline)                      │
└────────────────────────────┬────────────────────────────────────────┘
                             │ stdio/HTTP
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      SecureLLM MCP Server Core                      │
│   ┌────────────────┐    ┌────────────────┐    ┌────────────────┐    │
│   │ Semantic       │    │ Smart Rate     │    │ Knowledge      │    │
│   │ Cache          │    │ Limiter        │    │ Database       │    │
│   │ (Embeddings)   │    │ (Circuit       │    │ (SQLite +      │    │
│   │                │    │  Breaker)      │    │  FTS5)         │    │
│   └────────────────┘    └────────────────┘    └────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
                                │
           ┌────────────────────┼──────────────────────┐
           ▼                    ▼                      ▼
   ┌──────────────┐   ┌──────────────────┐   ┌──────────────────┐
   │  Reasoning   │   │   Development    │   │  Infrastructure  │
   │  Systems     │   │   Tools          │   │  Management      │
   │              │   │                  │   │                  │
   │ • Context    │   │ • Nix Package    │   │ • SSH Remote     │
   │   Inference  │   │   Debugger       │   │   Execution      │
   │ • Multi-Step │   │ • Build Analyzer │   │ • System Health  │
   │   Planner    │   │ • Flake Ops      │   │   Monitoring     │
   │ • Causal     │   │ • Web Search     │   │ • Emergency      │
   │   Analysis   │   │ • Browser Auto   │   │   Framework      │
   │ • Adaptive   │   │ • Research Agent │   │ • Backup Manager │
   │   Learning   │   │ • Code Analysis  │   │ • Log Analysis   │
   └──────────────┘   └──────────────────┘   └──────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Observability & Security                       │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐  │
│  │ Prometheus │   │ Structured │   │ OAuth/     │   │ Sandboxed  │  │
│  │ Metrics    │   │ Logging    │   │ GitHub     │   │ Execution  │  │
│  └────────────┘   └────────────┘   └────────────┘   └────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
Features
🧠 Intelligent Caching Layer
Semantic Cache - Embedding-based caching of tool results for MCP servers:
- Semantic Similarity Detection: Understands that "check system temperature" and "verify thermal status" are equivalent queries
- Cost Optimization: 50-70% reduction in tool execution costs
- Automatic Expiration: TTL-based cache invalidation with periodic cleanup
- Performance Metrics: Real-time hit/miss rates, token savings, similarity scores
// Queries like these hit the same cache:
"What's the current CPU temperature?"
"Check thermal status of the system"
"Show me processor heat levels"
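To make the lookup concrete, here is a minimal sketch of a similarity-thresholded cache with TTL expiry. It is not the project's implementation: `SemanticCache`, `CacheEntry`, and `cosineSimilarity` are hypothetical names, cosine similarity is an assumed metric, and the embeddings are passed in directly (in the real server they would presumably come from the llama.cpp embedding server).

```typescript
// Hypothetical sketch of an embedding-keyed cache; names and defaults are
// illustrative assumptions, not the SecureLLM MCP API.
interface CacheEntry {
  embedding: number[];
  response: string;
  expiresAt: number; // epoch milliseconds
}

// Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly threshold: number;
  private readonly ttlSeconds: number;

  constructor(threshold = 0.85, ttlSeconds = 3600) {
    this.threshold = threshold;
    this.ttlSeconds = ttlSeconds;
  }

  set(embedding: number[], response: string, now: number = Date.now()): void {
    this.entries.push({ embedding, response, expiresAt: now + this.ttlSeconds * 1000 });
  }

  // Returns the best cached response at or above the threshold, if any.
  get(embedding: number[], now: number = Date.now()): string | undefined {
    // Drop expired entries before searching (TTL-based invalidation).
    this.entries = this.entries.filter((entry) => entry.expiresAt > now);
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }
}
```

A linear scan is adequate for a small in-memory cache; a larger one would likely need an approximate nearest-neighbor index.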
🎯 Smart Rate Limiting
Production-grade request management with circuit breaker pattern:
- Per-Provider Queuing: FIFO request queues with configurable limits
- Circuit Breaker: Automatic failure detection and recovery
- Exponential Backoff: Intelligent retry with jitter
- Metrics Collection: Request latency percentiles (p50, p95, p99), error categorization, queue depths
- Prometheus Export: HTTP metrics endpoint for observability
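The circuit breaker and backoff behavior listed above can be sketched as follows. This is an illustration under assumptions rather than the server's code: the class name, the failure threshold of 5, the 30 second reset timeout, and the "full jitter" strategy are all hypothetical choices.

```typescript
// Hypothetical sketch; thresholds and names are assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold = 5, resetTimeoutMs = 30_000) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // While open, requests are rejected until the reset timeout has elapsed;
  // after that the breaker moves to half-open and lets trial requests through.
  canRequest(now: number = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failure while half-open, or too many consecutive failures, opens the breaker.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Passing `now` and `random` as parameters keeps both pieces deterministic under test; production callers would rely on the defaults.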
🗄️ Knowledge Management System
Persistent learning infrastructure with advanced search:
- SQLite + FTS5: Full-text search with Porter stemming and Unicode support
- Session Management: Contextual conversation tracking across interactions
- Structured Storage: Typed entries (insights, decisions, code, references)
- Priority System: High/medium/low classification for relevance ranking
- Project Watcher: Automatic file system monitoring and knowledge extraction
🔧 NixOS Development Tools
Comprehensive tooling for NixOS ecosystem:
- Package Debugger: Diagnose and fix Nix package build failures
- Flake Operations: Build, update, and manage Nix flakes
- Build Analyzer: Performance profiling and optimization recommendations
- Hash Calculator: Automatic SHA256 calculation for fetchurl/fetchFromGitHub
- Configuration Generator: Smart Nix expression generation
🛡️ Emergency Framework
Laptop protection during intensive operations:
- Thermal Monitoring: Real-time CPU/GPU temperature tracking
- Rebuild Safety Checks: Pre-build thermal validation
- Automatic Throttling: Force cooldown when temperature exceeds thresholds
- Forensic Analysis: Post-build thermal profiling with detailed reports
- War Room Mode: Live monitoring during critical operations
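As a rough illustration of a pre-build thermal check, the sketch below classifies sensor readings against two thresholds. The 75 °C and 90 °C limits, the type names, and the function names are assumptions for illustration; the only external fact relied on is that Linux `thermal_zone` sysfs files report temperatures in millidegrees Celsius.

```typescript
// Hypothetical sketch; thresholds and names are assumptions, not project defaults.
interface ThermalReading {
  sensor: string;
  celsius: number;
}

type Verdict = "safe" | "caution" | "abort";

// Linux thermal_zone sysfs files report millidegrees Celsius, e.g. "45000\n".
function parseSysfsMillidegrees(raw: string): number {
  const trimmed = raw.trim();
  const value = Number(trimmed);
  if (trimmed === "" || !isFinite(value)) {
    throw new Error(`not a temperature: ${JSON.stringify(raw)}`);
  }
  return value / 1000;
}

// Classify by the hottest sensor. With no readings the check refuses to
// report "safe", since the thermal state is unknown.
function rebuildSafetyVerdict(
  readings: ThermalReading[],
  cautionAt = 75,
  abortAt = 90,
): Verdict {
  if (readings.length === 0) return "caution";
  const hottest = Math.max(...readings.map((reading) => reading.celsius));
  if (hottest >= abortAt) return "abort";
  if (hottest >= cautionAt) return "caution";
  return "safe";
}
```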
🔍 Hybrid Reasoning (Beta)
Next-generation AI capabilities currently in development:
- Context Inference Engine: Automatic entity extraction from user input and project state
- Proactive Action Engine: Execute preparatory checks before asking questions
- Multi-Step Planner: Decompose complex tasks into dependency-ordered steps
- Causal Reasoning: Predict change impacts through dependency graph analysis
- Adaptive Learning: Continuous improvement from interaction feedback
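Dependency-ordered planning and impact prediction are both graph problems. The sketch below shows one common approach, assuming a hypothetical `PlanStep` shape: Kahn's algorithm for ordering, and a transitive walk over dependents for impact. The project's planner may work differently.

```typescript
// Hypothetical sketch; PlanStep and both function names are illustrative.
interface PlanStep {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit steps whose dependencies are all done.
function orderSteps(steps: PlanStep[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const step of steps) {
    indegree.set(step.id, 0);
    dependents.set(step.id, []);
  }
  for (const step of steps) {
    for (const dep of step.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`unknown dependency: ${dep}`);
      list.push(step.id);
      indegree.set(step.id, (indegree.get(step.id) ?? 0) + 1);
    }
  }
  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any step never emitted is part of (or blocked by) a cycle.
  if (order.length !== steps.length) throw new Error("dependency cycle detected");
  return order;
}

// Impact analysis: every step that transitively depends on the changed one.
function impactedBy(steps: PlanStep[], changedId: string): string[] {
  const seen = new Set<string>();
  const stack = [changedId];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const step of steps) {
      if (step.dependsOn.indexOf(current) !== -1 && !seen.has(step.id)) {
        seen.add(step.id);
        stack.push(step.id);
      }
    }
  }
  return Array.from(seen).sort();
}
```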
Installation
Prerequisites
- Node.js: 22.0+ (native ESM support)
- NixOS: Recommended for full feature set
- SQLite: 3.35+ (for FTS5 support)
- Optional: llama.cpp server for semantic caching embeddings
Quick Start
# Clone repository
git clone https://github.com/marcosfpina/securellm-mcp.git
cd securellm-mcp
# Install dependencies
npm install
# Build
npm run build
# Run server
node build/src/index.js
Environment Configuration
Create a .env file:
# Core Configuration
PROJECT_ROOT=/path/to/your/project
ENABLE_KNOWLEDGE=true
KNOWLEDGE_DB_PATH=~/.local/share/securellm/knowledge.db
# Semantic Cache (Optional)
ENABLE_SEMANTIC_CACHE=true
SEMANTIC_CACHE_THRESHOLD=0.85
SEMANTIC_CACHE_TTL=3600
LLAMA_CPP_URL=http://localhost:8080
# API Keys (loaded via SOPS in production)
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
DEEPSEEK_API_KEY=your_key_here
# Observability
METRICS_PORT=9090
LOG_LEVEL=info
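The semantic cache variables above could be read and validated along these lines. This is a sketch under assumptions: `loadCacheConfig` is a hypothetical helper, the defaults simply mirror the sample values, the threshold is assumed to be a similarity score in [0, 1], and the TTL is assumed to be in seconds. The project itself mentions Zod for validation; plain checks are used here to keep the example dependency-free.

```typescript
// Hypothetical sketch; loadCacheConfig is not part of the project's API.
interface CacheConfig {
  enabled: boolean;
  threshold: number; // assumed: similarity score in [0, 1]
  ttlSeconds: number; // assumed: seconds
  embeddingUrl: string;
}

function loadCacheConfig(env: Record<string, string | undefined>): CacheConfig {
  const threshold = Number(env["SEMANTIC_CACHE_THRESHOLD"] ?? "0.85");
  const ttlSeconds = Number(env["SEMANTIC_CACHE_TTL"] ?? "3600");
  if (!isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error("SEMANTIC_CACHE_THRESHOLD must be a number between 0 and 1");
  }
  if (!isFinite(ttlSeconds) || ttlSeconds <= 0) {
    throw new Error("SEMANTIC_CACHE_TTL must be a positive number");
  }
  return {
    enabled: env["ENABLE_SEMANTIC_CACHE"] === "true",
    threshold,
    ttlSeconds,
    embeddingUrl: env["LLAMA_CPP_URL"] ?? "http://localhost:8080",
  };
}
```

In a Node.js process the function would typically be called as `loadCacheConfig(process.env)`; failing at startup on a malformed value is usually easier to debug than a cache that silently never hits.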
MCP Client Integration
Claude Desktop
// ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "securellm": {
      "command": "node",
      "args": ["/path/to/securellm-mcp/build/src/index.js"],
      "env": {
        "PROJECT_ROOT": "/your/project/path"
      }
    }
  }
}
Cline (VSCodium/VSCode)
// ~/.config/VSCodium/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
{
  "mcpServers": {
    "securellm": {
      "command": "node",
      "args": ["/path/to/securellm-mcp/build/src/index.js"],
      "env": {
        "PROJECT_ROOT": "${workspaceFolder}"
      }
    }
  }
}
Usage Examples
Package Debugging
// Diagnose why a Nix package won't build
await mcp.call("package_diagnose", {
  package_path: "./pkgs/custom-app/default.nix",
  package_type: "js",
  build_test: true
});
// Download package from GitHub with automatic hash calculation
await mcp.call("package_download", {
  package_name: "awesome-tool",
  package_type: "tar",
  source: {
    type: "github_release",
    github: {
      repo: "owner/awesome-tool",
      tag: "v1.2.3",
      asset_pattern: "*.tar.gz"
    }
  }
});
Emergency Framework
// Check if it's safe to rebuild
await mcp.call("rebuild_safety_check");
// Monitor thermals during build
await mcp.call("thermal_warroom", {
  duration: 120 // Monitor for 2 minutes
});
// Get forensic analysis after thermal event
await mcp.call("thermal_forensics", {
  duration: 180,
  skip_rebuild: false
});
Knowledge Management
// Create development session
const session = await mcp.call("create_session", {
  summary: "Implementing new authentication module"
});
// Save insights during development
await mcp.call("save_knowledge", {
  session_id: session.id,
  entry_type: "decision",
  content: "Using JWT tokens instead of sessions for API auth",
  tags: ["auth", "api", "jwt"],
  priority: "high"
});
// Search past decisions
const results = await mcp.call("search_knowledge", {
  query: "authentication jwt",
  entry_type: "decision",
  limit: 5
});
System Health Monitoring
// Comprehensive health check
await mcp.call("system_health_check", {
  detailed: true
});
// Analyze system logs
await mcp.call("system_log_analyzer", {
  service: "sshd",
  since: "1 hour ago",
  level: "error"
});
// Service management
await mcp.call("system_service_manager", {
  action: "restart",
  service: "nginx"
});
Research & Analysis
// Deep research on technical topics
await mcp.call("research_agent", {
  topic: "Rust async runtime comparison",
  depth: "comprehensive",
  sources: ["github", "reddit", "documentation"]
});
// Analyze codebase complexity
await mcp.call("analyze_complexity", {
  directory: "./src",
  include_patterns: ["**/*.ts"],
  metrics: ["cyclomatic", "cognitive", "maintainability"]
});
// Find potentially dead code
await mcp.call("find_dead_code", {
  directory: "./src",
  extensions: [".ts", ".js"]
});
Resources
The server exposes several MCP resources for querying system state:
- config://current - Current SecureLLM configuration
- logs://audit - Recent audit log entries
- metrics://usage - Provider usage statistics
- metrics://prometheus - Prometheus-format metrics
- metrics://semantic-cache - Cache performance stats
- docs://api - API documentation
// Query cache performance
const stats = await mcp.read("metrics://semantic-cache");
console.log(`Hit rate: ${stats.hitRate}%`);
console.log(`Tokens saved: ${stats.tokensSaved}`);
Performance
Benchmarks
- Semantic Cache Lookup: < 10ms (in-memory embedding comparison)
- Knowledge DB Search: < 50ms (FTS5 indexed queries)
- Rate Limiter Overhead: < 5ms per request
- Circuit Breaker Decision: < 1ms
Scalability
- Memory Footprint: ~512MB base + 256MB per active reasoning session
- Database Size: ~100MB per 10,000 knowledge entries
- Concurrent Requests: 100+ simultaneous tool calls (per-provider queuing)
- Cache Storage: ~1KB per cached response
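The scalability figures above translate into a simple capacity estimate. The sketch below only restates that arithmetic; `estimateFootprint` and its field names are hypothetical, and the results are as approximate as the figures themselves.

```typescript
// Hypothetical helper that applies the approximate figures listed above.
interface Workload {
  reasoningSessions: number;
  knowledgeEntries: number;
  cachedResponses: number;
}

interface Footprint {
  memoryMb: number;
  databaseMb: number;
  cacheKb: number;
}

function estimateFootprint(workload: Workload): Footprint {
  return {
    // ~512MB base + 256MB per active reasoning session
    memoryMb: 512 + 256 * workload.reasoningSessions,
    // ~100MB per 10,000 knowledge entries
    databaseMb: (workload.knowledgeEntries / 10000) * 100,
    // ~1KB per cached response
    cacheKb: workload.cachedResponses * 1,
  };
}
```

For example, 3 reasoning sessions, 25,000 knowledge entries, and 2,000 cached responses come to roughly 1280MB of memory, 250MB of database, and 2000KB of cache.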
Security
Secrets Management
- SOPS Integration: Encrypted secrets stored in secrets.yaml
- Environment Variables: Runtime API key injection
- No Hardcoded Credentials: All sensitive data externalized
Sandboxed Execution
- Tool Whitelisting: Configurable allowed commands
- Path Restrictions: Sandboxed file system access
- Network Isolation: Optional network policy enforcement
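A minimal sketch of what command allowlisting and path restriction can look like, assuming POSIX absolute paths. The function names are hypothetical and this is not the project's sandbox. The path check is purely lexical, so it does not resolve symlinks; a real sandbox would also need to resolve those (for example with `fs.realpath`) before comparing.

```typescript
// Hypothetical sketch; lexical checks only, symlinks are NOT resolved.

// Collapse "." and ".." segments of an absolute POSIX path.
function normalizePosixPath(path: string): string {
  if (path.charAt(0) !== "/") throw new Error("absolute path required");
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      parts.pop(); // no-op at the filesystem root
    } else {
      parts.push(segment);
    }
  }
  return "/" + parts.join("/");
}

// True when candidate is the sandbox root itself or lies beneath it.
function isPathAllowed(root: string, candidate: string): boolean {
  const normalizedRoot = normalizePosixPath(root);
  const normalized = normalizePosixPath(candidate);
  if (normalized === normalizedRoot) return true;
  const prefix = normalizedRoot === "/" ? "/" : normalizedRoot + "/";
  return normalized.indexOf(prefix) === 0;
}

// Only the executable name (argv[0]) is checked against the allowlist.
function isCommandAllowed(allowlist: string[], argv: string[]): boolean {
  return argv.length > 0 && allowlist.indexOf(argv[0]) !== -1;
}
```

Comparing against `root + "/"` rather than `root` alone is what keeps a sibling such as `/home/user/project-evil` from passing as a child of `/home/user/project`.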
Audit Trail
- Structured Logging: All actions logged with context
- Knowledge DB Audit: Complete interaction history
- Metrics Retention: 30-day historical performance data
Development
Project Structure
securellm-mcp/
├── src/
│   ├── index.ts # MCP server entry point
│   ├── knowledge/
│   │   └── database.ts # SQLite + FTS5 implementation
│   ├── middleware/
│   │   ├── semantic-cache.ts # Embedding-based caching
│   │   ├── rate-limiter.ts # Smart rate limiting
│   │   ├── circuit-breaker.ts # Failure detection
│   │   ├── retry-strategy.ts # Exponential backoff
│   │   └── metrics-collector.ts # Performance tracking
│   ├── reasoning/
│   │   ├── context-manager.ts # Context inference
│   │   ├── multi-step-planner.ts # Task decomposition
│   │   └── proactive-executor.ts # Pre-action execution
│   ├── tools/
│   │   ├── package-diagnose.ts # Nix package debugging
│   │   ├── emergency/ # Thermal protection
│   │   ├── laptop-defense/ # System safety
│   │   ├── system/ # Health monitoring
│   │   ├── ssh/ # Remote execution
│   │   ├── browser/ # Web automation
│   │   └── nix/ # Nix ecosystem tools
│   ├── types/
│   │   ├── knowledge.ts # Knowledge DB schemas
│   │   ├── semantic-cache.ts # Cache type definitions
│   │   └── middleware/ # Middleware types
│   └── utils/
│       ├── logger.ts # Pino structured logging
│       ├── project-detection.ts # Auto project root detection
│       └── host-detection.ts # NixOS hostname resolution
├── docs/ # Architecture documentation
├── tests/ # Integration tests
└── build/ # Compiled output
Building from Source
# Development mode with watch
npm run watch
# Production build
npm run build
# Run tests
npm test
# Type checking
npx tsc --noEmit
Contributing
- Architecture Changes: Review docs/HYBRID-REASONING-ARCHITECTURE.md
- Code Style: Follow existing TypeScript patterns, use Zod for validation
- Testing: Add integration tests for new tools
- Documentation: Update README and inline JSDoc comments
Roadmap
Phase 1: Core Infrastructure ✅
- [x] MCP server implementation
- [x] Knowledge database (SQLite + FTS5)
- [x] Smart rate limiter with circuit breaker
- [x] Semantic cache with embeddings
- [x] Nix package debugging tools
- [x] Emergency framework
- [x] Prometheus metrics
Phase 2: Reasoning Systems 🚧
- [x] Context inference engine
- [x] Proactive action executor
- [x] Multi-step task planner
- [ ] Causal dependency analyzer
- [ ] Adaptive learning system
Phase 3: Advanced Tools 🚧
- [x] SSH remote execution suite
- [ ] Browser automation tools
- [ ] Sensitive data handling
- [ ] File organization system
- [ ] Advanced code analysis
Phase 4: Enterprise Features
- [ ] Multi-user support
- [ ] Role-based access control
- [ ] Distributed caching
- [ ] Horizontal scaling
- [ ] SaaS deployment
Monitoring & Observability
Prometheus Metrics
The server exposes metrics on an HTTP endpoint:
# Start metrics server
export METRICS_PORT=9090
node build/src/index.js
# Query metrics
curl http://localhost:9090/metrics
Available metrics:
- mcp_rate_limiter_requests_total{provider="deepseek"}
- mcp_rate_limiter_request_duration_seconds{provider="openai"}
- mcp_circuit_breaker_state{provider="anthropic"}
- mcp_semantic_cache_hits_total
- mcp_semantic_cache_tokens_saved_total
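For reference, counters like these are served in the Prometheus text exposition format. The sketch below renders one counter family by hand to show the shape of that output; `renderCounter` and `Sample` are hypothetical names, the sample values are made up, and a real server would more likely use a metrics client library.

```typescript
// Hypothetical sketch of the Prometheus text exposition format for a counter.
interface Sample {
  labels: Record<string, string>;
  value: number;
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderCounter(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const labels = Object.keys(sample.labels)
      .sort()
      .map((key) => `${key}="${escapeLabelValue(sample.labels[key])}"`)
      .join(",");
    lines.push(labels ? `${name}{${labels}} ${sample.value}` : `${name} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}

// Illustrative values only.
const exampleOutput = renderCounter("mcp_rate_limiter_requests_total", "Requests per provider", [
  { labels: { provider: "deepseek" }, value: 12 },
  { labels: { provider: "openai" }, value: 7 },
]);
```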
Structured Logging
Pino-based JSON logging:
{
  "level": "info",
  "time": 1704196800000,
  "msg": "Semantic cache hit",
  "similarity": 0.92,
  "toolName": "thermal_check",
  "tokensSaved": 150
}
Troubleshooting
Common Issues
1. Semantic cache not working
# Verify llama.cpp server is running
curl http://localhost:8080/health
# Check cache database exists
ls -lh ~/.local/share/securellm/semantic_cache.db
# Enable debug logging
export LOG_LEVEL=debug
2. Rate limiter throttling requests
# Check current queue status
# (use rate_limiter_status tool via MCP)
# Adjust rate limits in config
# See src/config/rate-limits.ts
3. Knowledge DB corruption
# Backup and rebuild
cp ~/.local/share/securellm/knowledge.db{,.backup}
rm ~/.local/share/securellm/knowledge.db
# Restart server (will recreate schema)
License
MIT License - See LICENSE file
Acknowledgments
Built with:
- Model Context Protocol SDK - MCP protocol implementation
- better-sqlite3 - High-performance SQLite bindings
- Pino - Fast structured logging
- Zod - TypeScript schema validation
Inspired by:
- NixOS community's declarative infrastructure philosophy
- The MCP ecosystem's vision for AI-native tooling
- Production systems engineering best practices
Contact
- Author: marcosfpina
- Project: github.com/marcosfpina/securellm-mcp
- Issues: GitHub Issues
Built for developers who demand production-grade tooling.