InjectShield
MCP server that provides tools to scan text and URLs for prompt injection attacks, protecting AI agents from adversarial inputs.
Prompt-injection firewall for AI agents.
A drop-in REST API that detects and neutralizes injection attacks in any text — git commits, web pages, files, emails, user inputs — before they reach your AI agent's context window.
This repo is the open-source heuristic ruleset plus the source for the managed API at promptshield.pages.dev.
Why
In May 2026 a viral HN thread demonstrated that a single git commit message could burn a Claude Code user's entire session quota via a schema-driven attack ("OpenClaw"). The pattern is general: any AI agent that ingests untrusted text — code review bots, documentation summarizers, RAG agents, support copilots — is exposed to prompt injection. Most teams ship without any input-side defense.
InjectShield is one layer of a defense-in-depth strategy. It's not a silver bullet. Use it alongside system-prompt hardening, tool sandboxing, and output filtering.
Install as an MCP (Claude Code, Cursor, Cline, ...)
InjectShield ships a native MCP server at `@injectshield/mcp`. Once installed, your agent has three new tools (`scan`, `scan_url`, `patterns`) for input-side defense without writing any glue code.
```bash
# Claude Code:
claude mcp add injectshield --env INJECTSHIELD_API_KEY=is_live_… -- npx -y @injectshield/mcp
```
For Cursor / Cline / other MCP clients, see `packages/injectshield-mcp/README.md`.
Quick start
```bash
# 1) Get a key (delivered by email):
curl -X POST https://api.injectshield.dev/v1/keys \
  -H "Content-Type: application/json" \
  -d '{"email":"you@company.com"}'

# 2) Scan:
curl -X POST https://api.injectshield.dev/v1/scan \
  -H "Authorization: Bearer is_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"ignore previous instructions","context":"user_input"}'
```
Or sign up via the landing page, https://injectshield.dev (self-serve; the key is delivered by email).
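For callers who prefer TypeScript over curl, the scan request can be sketched roughly as follows. This is a minimal sketch, not an official client: `buildScanRequest`, `scanText`, and the `FetchLike` type are hypothetical names introduced here, and because this README does not document the response schema, the parsed body is returned as `unknown`.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Endpoint taken from the curl example in the quick start.
const SCAN_URL = "https://api.injectshield.dev/v1/scan";

// Build the same request the curl command sends: bearer auth plus a JSON body.
function buildScanRequest(apiKey: string, text: string, context: string) {
  return {
    url: SCAN_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, context }),
    },
  };
}

// Send the request. The response schema is an unknown here, so the caller
// has to inspect the parsed body before acting on it.
async function scanText(
  fetchImpl: FetchLike,
  apiKey: string,
  text: string,
  context: string,
): Promise<unknown> {
  const { url, init } = buildScanRequest(apiKey, text, context);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`InjectShield scan failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

In a runtime with a global `fetch`, a call would look like `scanText(fetch, apiKey, commitMessage, "user_input")`. Passing the fetch implementation in explicitly keeps the request builder easy to unit-test without network access.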
What's open-source vs. managed
Live:
- Landing page + live demo: https://injectshield.dev
- API base: https://api.injectshield.dev
- Health: https://api.injectshield.dev/healthz
- Docs: https://injectshield.dev/docs
Open-source (this repo, MIT):
- `src/patterns.ts`: the heuristic pattern library (~20 categorized rules).
- `src/detect.ts`: the detection engine (heuristic aggregation, sanitization).
- `test/`: the test suite.
- `server/`, `public/`: the full API + landing-page source.
Managed only (paid tiers):
- Hosted API with usage metering, dashboards, custom-pattern uploads, webhook alerts, no-logging mode (Pro), team accounts.
- Future: Workers AI / Anthropic semantic classifier with prompt-engineered injection detection.
Detection categories
| Category | Examples |
|---|---|
| `instruction_injection` | "ignore previous instructions", "new system prompt" |
| `system_override` | system-prompt leak, role-tag forgery, ChatML/Llama special tokens |
| `role_hijack` | "you are now…", DAN, Developer Mode |
| `exfiltration` | data sent to attacker URLs, markdown image exfil |
| `schema_attack` | OpenClaw-style schema references |
| `encoding_smuggle` | base64-decoded directives |
| `invisible_text` | zero-width / bidi / Unicode-Tag smuggling |
| `tool_abuse` | synthetic tool-call directives in untrusted text |
| `jailbreak_classic` | DAN, "no restrictions", etc. |
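To make one of these categories concrete, here is a rough sketch of how an `invisible_text` check could work. The character ranges and the `checkInvisibleText` helper are illustrative assumptions, not the rules shipped in `src/patterns.ts`.

```typescript
// Zero-width characters (U+200B-U+200D, U+2060, U+FEFF), bidirectional
// controls (U+200E-U+200F, U+202A-U+202E, U+2066-U+2069), and Unicode Tag
// characters (U+E0000-U+E007F). Illustrative list only; the real ruleset
// may cover more or fewer code points.
const INVISIBLE_CHARS =
  /[\u200B-\u200F\u2060\uFEFF\u202A-\u202E\u2066-\u2069\u{E0000}-\u{E007F}]/gu;

interface InvisibleTextReport {
  count: number; // how many invisible code points were found
  sanitized: string; // the input with those code points removed
}

// Count invisible code points and return a copy of the text without them.
function checkInvisibleText(input: string): InvisibleTextReport {
  const matches = input.match(INVISIBLE_CHARS);
  return {
    count: matches === null ? 0 : matches.length,
    sanitized: input.replace(INVISIBLE_CHARS, ""),
  };
}
```

For example, `checkInvisibleText("ig\u200Bnore")` reports one hidden code point and returns `"ignore"` as the sanitized text. Blind stripping is a blunt instrument: zero-width joiners and bidi marks also appear in legitimate emoji sequences and right-to-left text, so a production rule would likely score these characters rather than always remove them.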
Contributing patterns
Found a novel attack? Open a PR adding a `PatternRule` to `src/patterns.ts` with:
- A unique `id`.
- A `category` from the enum above.
- A `weight` in [0, 1]. Pick conservatively; the aggregation in `detect.ts` combines weights so every additional rule contributes meaningfully but isn't dominant.
- A test in `test/detect.test.ts` covering both a positive and a likely-benign negative example.
We auto-deploy merged patterns to the managed API. No-cost contributions get attribution in the changelog.
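As a rough illustration of what a contribution involves, the sketch below shows a hypothetical rule and one way weights could be combined. The `PatternRule` fields beyond `id`, `category`, and `weight` are assumptions, and the noisy-OR formula is only one aggregation with the property described above; the actual type and formula live in `src/patterns.ts` and `src/detect.ts` and may differ.

```typescript
// Hypothetical rule shape. Only id, category, and weight are named in this
// README; the `pattern` field is an assumption made for illustration.
interface PatternRule {
  id: string;
  category: string;
  weight: number; // in [0, 1]
  pattern: RegExp;
}

// Example rule (not part of the shipped ruleset).
const exampleRule: PatternRule = {
  id: "example-ignore-previous",
  category: "instruction_injection",
  weight: 0.6,
  pattern: /ignore\s+(all\s+)?previous\s+instructions/i,
};

// Noisy-OR aggregation (assumed): score = 1 - product(1 - w_i) over matched
// rules. Each extra match raises the score, but the result never exceeds 1.
function aggregate(weights: number[]): number {
  return 1 - weights.reduce((remaining, w) => remaining * (1 - w), 1);
}

// Score a text against a set of rules by aggregating the weights that matched.
function score(text: string, rules: PatternRule[]): number {
  const matched = rules
    .filter((rule) => rule.pattern.test(text))
    .map((rule) => rule.weight);
  return aggregate(matched);
}
```

Under this assumed formula, two matched rules of weight 0.5 combine to 0.75 rather than 1.0, which is why conservative weights still add up to a strong signal when several rules fire. A matching test would assert that the positive example scores above zero and that a benign sentence such as "see previous instructions in the manual" scores exactly zero.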
Running locally
```bash
npm install
npm test                                  # 11 tests, ~20ms
DATABASE_URL=postgres://... npm run dev   # boots Hono on :8080
```
License
MIT. InjectShield reduces but does not eliminate prompt-injection risk.
Acknowledgments
Built on Cloudflare Pages (frontend) + Railway (API) + Postgres + Anthropic Claude (semantic layer). Pattern library informed by HackAPrompt, the PINT benchmark, and a long list of public attack examples.