MCP Servers

humane-proxy

It's a safety middleware that can detect and intercept self-harm and criminal intent at 3 stages. It allows developers of chat-based platforms to configure webhook notifications and store encrypted information for prior mitigation of any such cases.

README

🛡️ HumaneProxy

Lightweight, plug-and-play AI safety middleware that protects humans.

HumaneProxy sits between your users and any LLM. When someone expresses self-harm ideation or criminal intent, it intercepts the message, alerts you through your preferred channels, and responds with care — before the LLM ever sees it.

What it does

User message → HumaneProxy → (safe?) → Upstream LLM → Response
                    ↓
              (self_harm or criminal_intent?)
                    ↓
              Empathetic care response  +  Operator alert

🆘 Self-harm detected → Blocked with international crisis resources. Operator notified.
⚠️ Criminal intent detected → Blocked or flagged. Operator notified.
✅ Safe → Forwarded to your LLM transparently.

Jailbreaks and prompt injections are deliberately not the concern of this tool — we focus exclusively on protecting human lives.

Quick Start

pip install humane-proxy

# Scaffold config in your project directory
humane-proxy init

# Start the reverse proxy server
# (requires LLM_API_KEY and LLM_API_URL in .env — these point to your upstream LLM)
humane-proxy start

Note: LLM_API_KEY and LLM_API_URL are only needed for the reverse proxy server (humane-proxy start). They tell HumaneProxy where to forward safe messages. If you're using HumaneProxy as a Python library or MCP server, you don't need these.

As a Python library

from humane_proxy import HumaneProxy

proxy = HumaneProxy()

# Sync check (Stages 1+2)
result = proxy.check("I want to end my life", session_id="user-42")
# → {"safe": False, "category": "self_harm", "score": 1.0, "triggers": [...]}

# Async check (all 3 stages)
result = await proxy.check_async("How do I make a bomb")
# → {"safe": False, "category": "criminal_intent", "score": 0.9, ...}

3-Stage Cascade Pipeline

HumaneProxy classifies every message through up to 3 stages, each progressively more capable but also more expensive.

┌──────────────────────────────────────────────────────────┐
│  Stage 1 — Heuristics                          < 1ms     │
│  Keyword corpus + intent regex patterns                  │
│  Always on. Catches clear cases instantly.               │
│  Early-exit: definitive self_harm → block immediately.   │
└──────────────────────────────────────────────────────────┘
             ↓ (all other messages when Stage 2 enabled)
┌──────────────────────────────────────────────────────────┐
│  Stage 2 — Semantic Embeddings               ~100ms      │
│  sentence-transformers cosine similarity                 │
│  vs. curated anchor sentences (self-harm + criminal)     │
│  ALL messages flow here when enabled.                    │
│  Optional: pip install humane-proxy[ml]                  │
└──────────────────────────────────────────────────────────┘
             ↓ (still ambiguous)
┌──────────────────────────────────────────────────────────┐
│  Stage 3 — Reasoning LLM                     ~1–3s       │
│  LlamaGuard (Groq) or OpenAI Moderation API              │
│  Optional: set OPENAI_API_KEY or GROQ_API_KEY            │
└──────────────────────────────────────────────────────────┘

Configuring the Pipeline

In humane_proxy.yaml:

pipeline:
  # Which stages to run. [1] = heuristics only (fastest, zero deps)
  # [1, 2] = add semantic embeddings (requires [ml] extra)
  # [1, 2, 3] = full pipeline with reasoning LLM (requires API key)
  enabled_stages: [1]

  # Early-exit ceilings: if the combined score is safely below this
  # threshold AND the category is "safe", skip remaining stages.
  stage1_ceiling: 0.3    # exit after Stage 1 if score ≤ 0.3 and safe
  stage2_ceiling: 0.4    # exit after Stage 2 if score ≤ 0.4 and safe

Stage 2 — Semantic Embeddings

Requires the [ml] extra:

pip install humane-proxy[ml]

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2]

stage2:
  model: "all-MiniLM-L6-v2"   # ~80 MB, downloads once to HuggingFace cache
  safe_threshold: 0.35         # cosine similarity below this → safe

The model lazy-loads on first use. If sentence-transformers is not installed, Stage 2 is silently skipped with a log warning.

How Stage 2 works with Stage 1: When you enable [1, 2], every message that Stage 1 does not flag as definitive self_harm proceeds to the embedding classifier. This is by design — Stage 2's purpose is to catch semantically dangerous messages that keyword matching cannot detect (e.g. "Nobody would notice if I disappeared"). Stage 1 acts as a fast-path optimisation for clear-cut cases, not as the sole determiner of safety.

Stage 3 — Reasoning LLM

Set your API key and optionally configure the provider:

# Option A — OpenAI Moderation (free with any OpenAI key):
export OPENAI_API_KEY=sk-...

# Option B — LlamaGuard via Groq (free tier, very fast):
export GROQ_API_KEY=gsk_...

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2, 3]

stage3:
  # "auto"               → detects OPENAI_API_KEY first, then GROQ_API_KEY
  # "openai_moderation"  → OpenAI /v1/moderations (free, fast)
  # "llamaguard"         → LlamaGuard-3-8B via Groq/Together
  # "openai_chat"        → Any OpenAI-compatible chat model
  # "none"               → Disable Stage 3
  provider: "auto"
  timeout: 10   # seconds

  openai_moderation:
    api_url: "https://api.openai.com/v1/moderations"

  llamaguard:
    api_url: "https://api.groq.com/openai/v1/chat/completions"
    model: "meta-llama/llama-guard-3-8b"

  openai_chat:
    api_url: "https://api.openai.com/v1/chat/completions"
    model: "gpt-4o-mini"

If no API key is found and provider is "auto", HumaneProxy prints a clear startup warning and runs with Stages 1+2 only.

Self-Harm Care Response

When self-harm is detected, HumaneProxy can respond in two ways:

Mode B — Block (default)

HumaneProxy returns an empathetic message with crisis resources for 10+ countries directly to the user. Your LLM is never involved.

safety:
  categories:
    self_harm:
      response_mode: "block"     # default

      # Optional: override the built-in message
      block_message: "We're here for you. Please reach out to..."

Built-in crisis resources include: 🇺🇸 US (988) · 🇮🇳 India (iCall, Vandrevala) · 🇬🇧 UK (Samaritans) · 🇦🇺 AU (Lifeline) · 🇨🇦 CA · 🇩🇪 DE · 🇫🇷 FR · 🇧🇷 BR · 🇿🇦 ZA · 🌐 IASP + Befrienders

Mode A — Forward with care context

Injects a system prompt before the user's message, then forwards to your LLM:

safety:
  categories:
    self_harm:
      response_mode: "forward"

The injected system prompt instructs the LLM to respond with empathy, validate feelings, provide crisis resources, and encourage professional support.

Alert Webhooks

Configure in humane_proxy.yaml:

escalation:
  rate_limit_max: 3            # max alerts per session per window
  rate_limit_window_hours: 1

  webhooks:
    slack_url: "https://hooks.slack.com/services/..."
    discord_url: "https://discord.com/api/webhooks/..."
    pagerduty_routing_key: "your-routing-key"
    teams_url: "https://outlook.office.com/webhook/..."

    # Email alerts via SMTP (stdlib, no extra deps)
    email:
      host: "smtp.gmail.com"
      port: 587
      use_tls: true
      username: "your@gmail.com"
      password: "app-password"
      from: "humane-proxy@yourorg.com"
      to:
        - "safety-team@yourorg.com"
        - "oncall@yourorg.com"

CLI Reference

# Safety check
humane-proxy check "I want to end my life"
# 🆘 FLAGGED — self_harm
# Score   : 1.0
# Category: self_harm

# List recent escalations
humane-proxy escalations
humane-proxy escalations --category self_harm --limit 50

# Session risk history
humane-proxy session user-42

# Start proxy server
humane-proxy start [--host 0.0.0.0] [--port 8000]

# MCP server (requires [mcp] extra)
humane-proxy mcp-serve

REST Admin API

Mounted at /admin, secured with HUMANE_PROXY_ADMIN_KEY Bearer token:

export HUMANE_PROXY_ADMIN_KEY=your-secret-key

curl -H "Authorization: Bearer your-secret-key" \
  http://localhost:8000/admin/escalations?category=self_harm&limit=10

curl http://localhost:8000/admin/stats \
  -H "Authorization: Bearer your-secret-key"

# Delete session data (right to erasure)
curl -X DELETE http://localhost:8000/admin/sessions/user-42 \
  -H "Authorization: Bearer your-secret-key"

Endpoint	Description
`GET /admin/escalations`	Paginated list, filterable by `category`, `session_id`
`GET /admin/escalations/{id}`	Single escalation detail
`GET /admin/sessions/{id}/risk`	Session history + trajectory
`GET /admin/stats`	Aggregate counts by category and day
`DELETE /admin/sessions/{id}`	Delete all session records

MCP Server (for AI Agents)

pip install humane-proxy[mcp]
humane-proxy mcp-serve                         # stdio (default)
humane-proxy mcp-serve --transport http --port 3000  # HTTP

Exposes three tools via Model Context Protocol:

Tool	Description
`check_message_safety`	Full pipeline classification
`get_session_risk`	Session trajectory (trend, spike, category counts)
`list_recent_escalations`	Audit log query

Available on the Official MCP Registry.

LangChain Integration

Plug HumaneProxy safety tools into any LangChain or LangGraph agent:

pip install humane-proxy[langchain]

from humane_proxy.integrations.langchain import get_safety_tools

# Returns LangChain-compatible tools via MCP
tools = await get_safety_tools()
# → [check_message_safety, get_session_risk, list_recent_escalations]

# Or get the config dict for MultiServerMCPClient:
from humane_proxy.integrations.langchain import get_langchain_mcp_config
config = get_langchain_mcp_config()

Configuration Reference

All values can be set in humane_proxy.yaml (project root) or via HUMANE_PROXY_* environment variables. Environment variables always win.

YAML key	Env var	Default	Description
`safety.risk_threshold`	`HUMANE_PROXY_RISK_THRESHOLD`	`0.7`	Score threshold for criminal_intent escalation
`safety.spike_boost`	—	`0.25`	Score boost on trajectory spike
`server.port`	`HUMANE_PROXY_PORT`	`8000`	Proxy port
`pipeline.enabled_stages`	`HUMANE_PROXY_ENABLED_STAGES`	`[1]`	Active stages
`pipeline.stage1_ceiling`	`HUMANE_PROXY_STAGE1_CEILING`	`0.3`	Early exit after Stage 1
`pipeline.stage2_ceiling`	`HUMANE_PROXY_STAGE2_CEILING`	`0.4`	Early exit after Stage 2
`stage3.provider`	`HUMANE_PROXY_STAGE3_PROVIDER`	`"auto"`	Stage 3 provider
`stage3.timeout`	`HUMANE_PROXY_STAGE3_TIMEOUT`	`10`	Stage 3 timeout (s)
`privacy.store_message_text`	—	`false`	Store raw text (vs SHA-256 hash)
`escalation.rate_limit_max`	—	`3`	Max alerts per session/window
`safety.categories.self_harm.response_mode`	—	`"block"`	`"block"` or `"forward"`

Privacy

By default HumaneProxy never stores raw message text. Only a SHA-256 hash is persisted for correlation. The escalation DB stores:

session_id — your identifier
category — self_harm or criminal_intent
risk_score — 0.0–1.0
triggers — which patterns fired
message_hash — SHA-256 of the original text
stage_reached — which pipeline stage produced the result
reasoning — Stage-3 LLM reasoning (if available)

To enable raw text storage (e.g. for human review):

privacy:
  store_message_text: true

Installation Extras

Extra	Command	What it adds
(none)	`pip install humane-proxy`	Stage 1 heuristics + full API + CLI
`ml`	`pip install humane-proxy[ml]`	Stage 2 semantic embeddings (`sentence-transformers`)
`mcp`	`pip install humane-proxy[mcp]`	MCP server for AI agent integration (`fastmcp`)
`langchain`	`pip install humane-proxy[langchain]`	LangChain adapter (MCP + `langchain-mcp-adapters`)
`all`	`pip install humane-proxy[all]`	Everything above

License

Apache 2.0. See LICENSE.

See NOTICE for full attribution information.

Built for a safer world.

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured