AgentShield

Full-stack security for AI agents: static analysis plus MCP runtime interception. 31 rules detect prompt injection, data exfiltration, backdoors, tool poisoning, and cross-file attack chains. Includes an MCP proxy for real-time blocking, Python AST taint tracking, multi-language injection detection (8 languages), and AI-powered deep analysis. Free, offline, zero-config.


🛡️ Agent Shield

Full-stack security for AI Agents: Static Analysis + Runtime Interception


Catch data exfiltration, backdoors, prompt injection, tool poisoning, and supply chain attacks before they reach your AI agents, and intercept them at runtime.

Offline-first. AST-powered. Open source. Your data never leaves your machine.

npx @elliotllliu/agent-shield scan ./my-skill/

🏆 Three Things No Other Tool Does

1. 🔒 Runtime MCP Interception (Only Agent Shield)

Other tools only scan source code before install. Agent Shield also sits between your MCP client and server, intercepting every JSON-RPC message in real-time:

# Insert Agent Shield between client and server
agent-shield proxy node my-mcp-server.js

# Enforce mode: automatically block high-risk tool calls
agent-shield proxy --enforce python mcp_server.py

# Rate-limit + log all alerts
agent-shield proxy --rate-limit 30 --log alerts.jsonl node server.js

What it catches at runtime:

  • 🎭 Tool description injection: hidden instructions in tool descriptions
  • 💉 Result injection: malicious content in tool return values
  • 🔑 Credential leakage: sensitive data in tool call parameters
  • 📡 Beacon behavior: abnormal periodic callbacks (C2 pattern)
  • 🪤 Rug-pull attacks: tools changing behavior after initial trust

Snyk doesn't have this. AgentSeal doesn't have this. This is the only open-source tool with static + runtime protection.
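
To make the interception idea concrete, here is a minimal sketch of inspecting one JSON-RPC message as it passes through a proxy. It is not Agent Shield's implementation: the function name `inspect_message`, the pattern lists, and the alert strings are assumptions chosen for illustration. It relies only on the MCP message shapes for `tools/list` results and `tools/call` requests.

```python
import json
import re

# Hypothetical patterns for illustration; a real rule set would be far larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |the )?(previous|above) instructions", re.IGNORECASE),
    re.compile(r"do not (tell|inform) the user", re.IGNORECASE),
]
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def inspect_message(raw: str) -> list[str]:
    """Return alert strings for one JSON-RPC message passing through a proxy."""
    msg = json.loads(raw)
    alerts = []

    # Server -> client: a tools/list result carries tool descriptions.
    result = msg.get("result")
    tools = result.get("tools", []) if isinstance(result, dict) else []
    for tool in tools:
        description = tool.get("description", "")
        if any(p.search(description) for p in INJECTION_PATTERNS):
            alerts.append(f"tool-description-injection: {tool.get('name')}")

    # Client -> server: a tools/call request carries the call arguments.
    if msg.get("method") == "tools/call":
        params = msg.get("params")
        arguments = params.get("arguments", {}) if isinstance(params, dict) else {}
        if any(p.search(json.dumps(arguments)) for p in CREDENTIAL_PATTERNS):
            alerts.append("credential-leak: tools/call arguments")

    return alerts
```

In an enforce-style mode, a proxy built this way could drop or rewrite any message for which the returned list is non-empty; in a monitor-only mode it would just log the alerts.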

2. ⛓️ Cross-File Attack Chain Detection (Only Agent Shield)

Most scanners check one file at a time. Agent Shield traces data flow across your entire codebase to detect multi-file attack patterns:

🔴 Cross-file data flow:
   config_reader.py reads ~/.ssh/id_rsa → exfiltrator.py POSTs to external server
   (connected via imports)
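
The cross-file idea can be sketched with Python's `ast` module: summarize each file, then link a file that sends data to a file it imports that touches a sensitive path. This is a simplified illustration, not the tool's actual algorithm; `SEND_CALLS`, `summarize`, and `cross_file_flows` are hypothetical names, and matching on string literals and call names is a deliberately coarse stand-in for real data-flow tracking.

```python
import ast

SENSITIVE_MARKERS = (".ssh", ".aws", ".kube")   # sensitive paths named in this README
SEND_CALLS = {"post", "put", "urlopen"}          # hypothetical "network send" call names


def summarize(source: str) -> dict:
    """Collect imports, sensitive string literals, and network-send calls in one file."""
    info = {"imports": set(), "reads_secret": False, "sends": False}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            info["imports"].update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            info["imports"].add(node.module.split(".")[0])
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(marker in node.value for marker in SENSITIVE_MARKERS):
                info["reads_secret"] = True
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name in SEND_CALLS:
                info["sends"] = True
    return info


def cross_file_flows(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (reader, sender) pairs where a sending file imports a secret-reading file."""
    summaries = {name: summarize(src) for name, src in files.items()}
    flows = []
    for sender, s in summaries.items():
        if not s["sends"]:
            continue
        for reader, r in summaries.items():
            module = reader.removesuffix(".py")
            if reader != sender and r["reads_secret"] and module in s["imports"]:
                flows.append((reader, sender))
    return flows
```

Neither file would look alarming on its own under this sketch; the finding only appears once the import edge connects the two.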

5-stage kill chain model detects complete attack sequences:

🔴 Kill Chain detected:
   apt.py:4  → system info collection    [Reconnaissance]
   reader.py:8  → reads ~/.ssh/id_rsa    [Collection]
   sender.py:12 → POST to external server [Exfiltration]

   Reconnaissance → Access → Collection → Exfiltration → Persistence

Not just individual alerts, but complete attack narratives.
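
One way to picture the kill chain model is as a grouping step over findings that have already been labeled with a stage. The sketch below orders findings by the five stages named above and reports a chain only when enough distinct stages are covered. The `min_stages=3` threshold and the `kill_chain` function are assumptions for illustration; the README does not say how many stages the tool requires.

```python
STAGES = ["Reconnaissance", "Access", "Collection", "Exfiltration", "Persistence"]


def kill_chain(findings: list[tuple[str, str]], min_stages: int = 3) -> list[tuple[str, str]]:
    """Return one (location, stage) per distinct stage, in kill-chain order.

    findings is a list of (location, stage) pairs. An empty list is returned
    when fewer than min_stages distinct stages are present.
    """
    first_per_stage = {}
    for location, stage in findings:
        # Keep the first location reported for each stage.
        first_per_stage.setdefault(stage, location)
    chain = [(first_per_stage[s], s) for s in STAGES if s in first_per_stage]
    return chain if len(chain) >= min_stages else []
```

Fed the three findings from the example report in any order, this returns them sorted as Reconnaissance, Collection, Exfiltration.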

3. 🧠 AST Taint Tracking (Not Regex)

Uses Python's ast module for precise analysis, which dramatically reduces false positives:

user = input("cmd: ")
eval(user)          # → 🔴 HIGH: tainted input flows to eval
eval("{'a': 1}")    # → ✅ NOT flagged (safe string literal)
exec(config_var)    # → 🟡 MEDIUM: dynamic, not proven tainted

| Pattern | Regex-based | AST-based (Agent Shield) |
|---|---|---|
| eval("safe string") | ❌ False positive | ✅ Not flagged |
| # eval(x) in comment | ❌ False positive | ✅ Not flagged |
| eval(user_input) tainted | ⚠️ Can't distinguish | ✅ HIGH (tainted) |
| f-string SQL injection | ⚠️ Coarse | ✅ Precise |
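
A stripped-down version of this kind of taint tracking fits in a few dozen lines. The sketch below is an illustration under simplifying assumptions, not the tool's analyzer: it only follows top-level assignments, treats `input()` as the sole taint source, and uses the hypothetical names `SOURCES`, `SINKS`, `call_name`, and `scan`.

```python
import ast

SOURCES = {"input"}        # calls whose return value is treated as tainted
SINKS = {"eval", "exec"}   # calls that should not receive tainted data


def call_name(node: ast.AST) -> str:
    """Return the bare function name for calls like input(...), else ''."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return ""


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line, severity) for each risky eval/exec call, in statement order."""
    tainted: set[str] = set()
    findings = []
    for stmt in ast.parse(source).body:   # top-level statements only
        # 1. Check sink calls using the taint state from earlier statements.
        for node in ast.walk(stmt):
            if call_name(node) not in SINKS or not node.args:
                continue
            arg = node.args[0]
            if isinstance(arg, ast.Constant):
                continue                  # literal argument: not flagged
            if call_name(arg) in SOURCES or (isinstance(arg, ast.Name) and arg.id in tainted):
                findings.append((node.lineno, "HIGH"))
            else:
                findings.append((node.lineno, "MEDIUM"))
        # 2. Propagate taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            value = stmt.value
            is_tainted = call_name(value) in SOURCES or (
                isinstance(value, ast.Name) and value.id in tainted
            )
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if is_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings
```

Because the parser never sees comments and represents string literals as `ast.Constant` nodes, the two regex false positives in the table simply do not arise here.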

4. 🧠 Context-Aware Scoring (New)

Traditional scanners flag every fetch() call as suspicious. Agent Shield understands context:

  • SDK Awareness: Auto-detects 25+ SDKs (AWS, Feishu, Stripe, OpenAI...), so network calls via known SDKs get lower risk scores
  • Auth Flow Recognition: Identifies OAuth2, JWT, and session management patterns, so token refresh isn't treated as data exfiltration
  • Data Flow Tracking: Traces variables from source (env read, file read) to sink (HTTP, exec) and only flags actual exfiltration paths
  • Confidence Scoring: Each finding has high/medium/low confidence, so single regex matches don't tank your score

📋 Score Breakdown:
  Base: 100
  env-leak (low, conf: low → ×0.3)     ← SDK detected, penalty reduced 70%
  obfuscation (medium, conf: medium → ×0.6)
  ⚠ Cap applied: medium findings present → max 85
  Final: 85/100
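
The breakdown above can be reproduced with a small amount of arithmetic. The sketch below combines the per-severity penalties from the Scoring section (High 25, Medium 8, Low 2) with the confidence multipliers shown in the breakdown. The ×1.0 weight for high confidence, the rounding, and the function name `score` are assumptions; the README only shows the ×0.3 and ×0.6 multipliers and the cap of 85.

```python
SEVERITY_PENALTY = {"high": 25, "medium": 8, "low": 2}        # from the Scoring section
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}  # high = 1.0 is an assumption
MEDIUM_CAP = 85  # cap shown in the breakdown when medium findings are present


def score(findings: list[tuple[str, str]]) -> int:
    """Compute a 0-100 score from (severity, confidence) pairs."""
    total = 100.0
    for severity, confidence in findings:
        total -= SEVERITY_PENALTY[severity] * CONFIDENCE_WEIGHT[confidence]
    if any(severity == "medium" for severity, _ in findings):
        total = min(total, MEDIUM_CAP)
    return max(0, round(total))
```

For the two findings in the breakdown, the weighted penalties are 2 × 0.3 = 0.6 and 8 × 0.6 = 4.8, giving 94.6 before the cap and 85 after it, which matches the reported final score.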

⚡ Quick Start

# Scan a skill / MCP server / plugin (31 rules, offline, <1s)
npx @elliotllliu/agent-shield scan ./my-skill/

# Scan Dify plugins (.difypkg auto-extraction)
npx @elliotllliu/agent-shield scan ./plugin.difypkg

# Runtime interception (MCP proxy)
npx @elliotllliu/agent-shield proxy node my-mcp-server.js

# AI-powered deep analysis (uses YOUR API key)
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider openai --model gpt-4o
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider ollama --model llama3

# Discover installed agents on your machine
npx @elliotllliu/agent-shield discover

# Check if installed agents are safe
npx @elliotllliu/agent-shield install-check

# SARIF output for GitHub Code Scanning
npx @elliotllliu/agent-shield scan ./skill/ --sarif -o results.sarif

# HTML report
npx @elliotllliu/agent-shield scan ./skill/ --html

# CI/CD gate
npx @elliotllliu/agent-shield scan ./skill/ --fail-under 70

📊 Agent Shield vs Competitors

| Feature | Agent Shield | Snyk Agent Scan | Tencent AI-Infra-Guard |
|---|---|---|---|
| Runtime MCP Interception | ✅ MCP Proxy | ❌ | ❌ |
| Cross-file Attack Chain | ✅ | ❌ | Partial |
| AST Taint Tracking | ✅ Python | ❌ | Unknown |
| Static Rules | 31 | 6 | Many (incl. infra) |
| Multi-language Injection | ✅ 8 languages | ❌ English only | Unknown |
| Description-Code Integrity | ✅ | ❌ | Unknown |
| Python Security | ✅ 35 patterns + AST | ❌ | ✅ |
| Prompt Injection | ✅ 55+ patterns + AI | ✅ LLM (cloud) | Unknown |
| 100% Offline | ✅ | ❌ cloud required | ✅ |
| Zero Install (npx) | ✅ | ❌ Python + uv | ❌ Docker |
| Choose Your Own LLM | ✅ OpenAI/Anthropic/Ollama | ❌ | ❌ |
| VS Code Extension | ✅ | ❌ | ❌ |
| GitHub App + Action | ✅ | ❌ | ❌ |
| Open Source | ✅ MIT | ❌ | ✅ |

🔍 31 Security Rules

🔴 High Risk

| Rule | Detects |
|---|---|
| data-exfil | Reads sensitive data + sends HTTP requests (exfiltration pattern) |
| backdoor | eval(), exec(), new Function(), child_process.exec() with dynamic input |
| reverse-shell | Outbound socket connections piped to shell |
| crypto-mining | Mining pool connections, xmrig, coinhive |
| credential-hardcode | Hardcoded AWS keys (AKIA...), GitHub PATs, Stripe/Slack tokens |
| obfuscation | eval(atob(...)), hex chains, String.fromCharCode obfuscation |
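
As an illustration of what a rule like credential-hardcode might look for, the sketch below matches a few well-known token prefixes line by line. The patterns are approximations of publicly documented token shapes, not the tool's actual rule definitions, and `SECRET_PATTERNS` and `find_hardcoded_secrets` are hypothetical names.

```python
import re

# Approximate token shapes based on widely known prefixes; lengths are assumptions.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "slack-token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "stripe-live-key": re.compile(r"\bsk_live_[A-Za-z0-9]{16,}"),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line matching a known token shape."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A value read from the environment at runtime does not match any of these shapes, so only literals committed to source are reported.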

🟡 Medium Risk

| Rule | Detects |
|---|---|
| prompt-injection | 55+ patterns: instruction override, identity manipulation, TPA, encoding evasion |
| tool-shadowing | Cross-server tool name conflicts, tool override attacks |
| env-leak | Environment variables + outbound HTTP (credential theft) |
| network-ssrf | User-controlled URLs, AWS metadata endpoint access |
| phone-home | Periodic timer + HTTP request (beacon/C2 pattern) |
| toxic-flow | Cross-tool data leak and destructive flows |
| skill-risks | Financial ops, untrusted content, external dependencies |
| python-security | 35 patterns: eval, pickle, subprocess, SQL injection, SSTI, path traversal |
| go-rust-security | 22 patterns: command injection, unsafe blocks, raw SQL |

🟢 Low Risk

| Rule | Detects |
|---|---|
| privilege | SKILL.md declared permissions vs actual code behavior mismatch |
| supply-chain | Known CVEs in npm dependencies |
| sensitive-read | Access to ~/.ssh, ~/.aws, ~/.kube |
| excessive-perms | Too many or dangerous permissions in SKILL.md |
| mcp-manifest | MCP server: wildcard perms, undeclared capabilities |
| typosquatting | Suspicious npm names: 1odash → lodash |
| hidden-files | .env files with secrets committed to repo |
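
The typosquatting example (1odash imitating lodash) suggests two simple checks: normalize look-alike characters, and measure edit distance against popular names. The sketch below shows both under stated assumptions: the `POPULAR_PACKAGES` list, the `LOOKALIKES` mapping, and the distance threshold of 1 are all made up for illustration.

```python
from typing import Optional

# Hypothetical allowlist; a real check would use a much larger list of popular packages.
POPULAR_PACKAGES = {"lodash", "express", "react", "axios"}

# Characters commonly swapped for look-alikes in package names.
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "5": "s", "3": "e"})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, start=1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (ca != cb))
    return row[-1]


def typosquat_target(name: str) -> Optional[str]:
    """Return the popular package that `name` appears to imitate, or None."""
    if name in POPULAR_PACKAGES:
        return None
    normalized = name.translate(LOOKALIKES)
    for popular in sorted(POPULAR_PACKAGES):
        if normalized == popular or edit_distance(name, popular) == 1:
            return popular
    return None
```

A distance threshold of 1 is intentionally strict; short package names would otherwise collide with many legitimate neighbors.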

🆕 Advanced Detection (Unique to Agent Shield)

| Rule | Detects | Snyk? |
|---|---|---|
| cross-file | Cross-file data flow: File A reads secrets → File B sends HTTP | ❌ |
| attack-chain | Kill chain detection: Recon → Access → Collection → Exfil → Persistence | ❌ |
| multilang-injection | 8-language injection: Chinese/Japanese/Korean/Russian/Arabic/Spanish/French/German prompt injection | ❌ |
| python-ast | AST taint tracking: follows data from input() → eval() | ❌ |
| description-integrity | Description vs code: "read-only" tool that writes files | ❌ |
| mcp-runtime | MCP runtime: debug inspector, non-HTTPS, tool count explosion | ❌ |
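
The description-integrity rule compares what a tool claims with what its code does. A minimal sketch of the "read-only tool that writes files" case, assuming Python source and only literal `open()` modes, might look like the following; `writes_files` and `check_description` are hypothetical helpers, not part of Agent Shield's API.

```python
import ast

WRITE_MODES = set("wax+")   # open() mode characters that allow writing


def writes_files(source: str) -> list[int]:
    """Return sorted line numbers of open(...) calls whose literal mode allows writing."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if isinstance(mode, ast.Constant) and isinstance(mode.value, str):
            if WRITE_MODES & set(mode.value):
                lines.append(node.lineno)
    return sorted(lines)


def check_description(description: str, source: str) -> list[str]:
    """Flag a tool that claims to be read-only but opens files for writing."""
    if "read-only" not in description.lower():
        return []
    return [f"line {n}: description says read-only but code writes a file"
            for n in writes_files(source)]
```

The same comparison pattern extends to other claims, such as "local only" versus code that makes network requests.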

🔬 Multi-Language Prompt Injection

8 languages are covered; most scanners only check English:

忽略上面的指令          → 🔴 Chinese injection ("ignore the instructions above")
前の指示を無視          → 🔴 Japanese injection ("ignore the previous instructions")
이전의 지시를 무시       → 🔴 Korean injection ("ignore the previous instructions")
Игнорируй инструкции   → 🔴 Russian injection ("ignore the instructions")
تجاهل التعليمات        → 🔴 Arabic injection ("ignore the instructions")
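
A multi-language check can be as simple as one pattern table keyed by language. The sketch below uses one illustrative pattern for each of the five languages shown above; the exact regular expressions and the `detect_injection` name are assumptions, and a real rule set would need many more phrasings plus the remaining languages.

```python
import re

# One illustrative pattern per language, modeled on the example phrases above.
INJECTION_PATTERNS = {
    "Chinese": re.compile(r"忽略.{0,6}指令"),
    "Japanese": re.compile(r"指示を無視"),
    "Korean": re.compile(r"지시를\s*무시"),
    "Russian": re.compile(r"игнорируй.{0,20}инструкции", re.IGNORECASE),
    "Arabic": re.compile(r"تجاهل.{0,20}التعليمات"),
}


def detect_injection(text: str) -> list[str]:
    """Return the languages whose injection pattern matches the text."""
    return [lang for lang, pattern in INJECTION_PATTERNS.items() if pattern.search(text)]
```

Python's `re` module matches Unicode text by default, so no special flags are needed beyond case-insensitivity for scripts that have letter case.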

📋 Real-World Validation: 493 Dify Plugins

We scanned the entire langgenius/dify-plugins repository:

| Metric | Value |
|---|---|
| Plugins scanned | 493 |
| Files analyzed | 9,862 |
| Lines of code | 939,367 |
| Scan time | ~120s |
| Average score | 93/100 |

| Risk Level | Count | % |
|---|---|---|
| 🔴 High risk (real issues) | 6 | 1.2% |
| 🟡 Medium risk | 73 | 14.8% |
| 🟢 Clean | 414 | 84.0% |

6 plugins were confirmed high-risk: each uses eval()/exec() to execute dynamic code.

Full report →


💡 Example Output

🛡️  Agent Shield Scan Report
📁 Scanned: ./deceptive-tool (3 files, 25 lines)

Score: 0/100 (Critical Risk)

🔴 High Risk: 4 findings
🟡 Medium Risk: 6 findings
🟢 Low Risk: 1 finding

🔴 High Risk (4)
  ├─ calculator.py:7 - [backdoor] eval() with dynamic input
  │  result = eval(expr)
  ├─ manifest.yaml - [description-integrity] Scope creep: "calculator"
  │  tool sends emails - undisclosed and suspicious capability
  ├─ tools/calc.yaml - [description-integrity] Description claims
  │  "local only" but code makes network requests in: tools/calc.py
  └─ exfiltrator.py - [cross-file] Cross-file data flow:
     config_reader.py reads secrets → exfiltrator.py sends HTTP

⏱  136ms

🔌 Integrate Agent Shield Into Your Platform

Running a skill marketplace, MCP directory, or plugin registry? This section is for you.

Your platform lists hundreds of skills, MCP servers, and plugins. Users install them into AI agents with access to files, credentials, and shell commands. But:

  • ❌ Nobody verifies what gets listed. A skill with eval(atob(...)) looks the same as a clean one.
  • ❌ Users can't tell safe from dangerous. There's no security signal anywhere.
  • ❌ One bad skill = total compromise. Credential theft, data exfiltration, reverse shells.

What You Get

| | Without Agent Shield | With Agent Shield |
|---|---|---|
| User trust | "Is this safe?" (no idea) | 🟢🟡🟠🔴 Security score on every listing |
| Platform reputation | Same as every directory | "The only marketplace that verifies security" |
| Bad actors | Malicious skills sit undetected | Auto-flagged before users see them |

How to Integrate (5 minutes)

npx @elliotllliu/agent-shield scan ./skill --format json
{
  "score": 92,
  "totalFindings": 1,
  "summary": { "high": 0, "medium": 0, "low": 1 },
  "findings": [
    {
      "severity": "low",
      "rule": "env-leak",
      "file": "src/config.ts",
      "line": 8,
      "message": "Environment variable access without validation"
    }
  ]
}

Store the JSON, render the badge. That's it.
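
On the platform side, the stored JSON only needs a few lines of glue to become a badge. The sketch below reads the report fields shown above (`score`, `summary`, `findings`) and maps the score onto the bands from the Scoring section. The `badge_for` function and the shape of its return value are assumptions for illustration, not part of Agent Shield.

```python
import json

# Score bands from the Scoring section: (minimum score, label).
BANDS = [(90, "Low Risk"), (70, "Moderate"), (40, "High Risk"), (0, "Critical")]


def badge_for(report_json: str) -> dict:
    """Turn a scan report (JSON text) into fields a listing page might display."""
    report = json.loads(report_json)
    score = report["score"]
    # Pick the first band whose minimum the score meets.
    label = next(name for minimum, name in BANDS if score >= minimum)
    return {
        "score": score,
        "label": label,
        "high_findings": report["summary"]["high"],
        "rules_hit": sorted({f["rule"] for f in report["findings"]}),
    }
```

For the sample report above this yields a score of 92 with the label "Low Risk" and a single rule hit, `env-leak`.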

📖 Full Integration Guide →

Who Should Integrate

| Platform Type | Examples | Value |
|---|---|---|
| Skill directories | ClawHub, skills.sh | Security badges on every skill |
| MCP registries | mcp.so, Smithery, Glama | Scan servers before listing |
| Plugin marketplaces | Dify store, GPT store | Gate submissions by security score |
| Agent platforms | OpenClaw, Cline, Cursor | Warn users before install |

📦 Ecosystem

🤖 GitHub App

Auto-scan every PR for security issues. Learn more →

💻 VS Code Extension

Real-time security diagnostics in your editor. Learn more →

🔒 Runtime MCP Proxy

Monitor MCP server behavior in real-time. Detect injection, exfiltration, and rug-pull attacks.

agent-shield proxy --enforce node my-mcp-server.js

⚙️ CI Integration

GitHub Action

name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'

GitHub Action with SARIF Upload

name: Security Scan (SARIF)
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'
          sarif: 'true'
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: agent-shield-results.sarif

npx one-liner

- name: Security scan
  run: npx -y @elliotllliu/agent-shield scan . --fail-under 70

⚙️ Configuration

Create .agent-shield.yml (or run agent-shield init):

rules:
  disable:
    - supply-chain
    - phone-home
failUnder: 70
ignore:
  - "tests/**"
  - "*.test.ts"
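
To show how these three settings could interact, here is a sketch that applies the same configuration to a list of findings. It is only an approximation of the behavior: the glob semantics of `ignore` are assumed to resemble Python's `fnmatch`, `failUnder` is assumed to mean "fail when the score is below this value", and `apply_config` and `passes_gate` are hypothetical names.

```python
from fnmatch import fnmatch

# The configuration above, expressed as a Python dict.
CONFIG = {
    "rules": {"disable": ["supply-chain", "phone-home"]},
    "failUnder": 70,
    "ignore": ["tests/**", "*.test.ts"],
}


def apply_config(findings: list[dict], config: dict) -> list[dict]:
    """Drop findings from disabled rules and from ignored file paths."""
    disabled = set(config["rules"]["disable"])
    return [
        f for f in findings
        if f["rule"] not in disabled
        and not any(fnmatch(f["file"], pattern) for pattern in config["ignore"])
    ]


def passes_gate(score: int, config: dict) -> bool:
    """Assumed gate semantics: pass when the score is at least failUnder."""
    return score >= config["failUnder"]
```

Filtering before scoring means a disabled rule or an ignored test fixture cannot lower the score that the gate checks.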

Scoring

| Severity | Points |
|---|---|
| 🔴 High | -25 |
| 🟡 Medium | -8 |
| 🟢 Low | -2 |

| Score | Risk Level |
|---|---|
| 90-100 | ✅ Low Risk: safe to install |
| 70-89 | 🟡 Moderate: review warnings |
| 40-69 | 🟠 High Risk: investigate before using |
| 0-39 | 🔴 Critical: do not install |

🗂️ Supported Platforms

| Platform | Support |
|---|---|
| AI Agent Skills | OpenClaw, Codex, Claude Code |
| MCP Servers | Model Context Protocol tool servers |
| Dify Plugins | .difypkg archive extraction + scan |
| npm Packages | Any package with executable code |
| Python Projects | AST analysis + 35 security patterns |
| General | Any directory with JS/TS/Python/Go/Rust/Shell code |

File Types

| Language | Extensions |
|---|---|
| JavaScript/TypeScript | .js, .ts, .mjs, .cjs, .tsx, .jsx |
| Python | .py (regex + AST analysis) |
| Go | .go |
| Rust | .rs |
| Shell | .sh, .bash, .zsh |
| Config | .json, .yaml, .yml, .toml |
| Docs | SKILL.md, manifest.yaml |

🤝 Contributing

We especially welcome:

  • New detection rules
  • False positive / false negative reports
  • Third-party benchmark test results

See CONTRIBUTING.md

🌐 Community & Partners

| Partner | Contribution |
|---|---|
| Agent Skills Hub | Real-world testing across skill registries, security insights, and feature feedback |

Links

📦 npm · 📖 Rule Docs · 🤖 GitHub App · 💻 VS Code · 🔌 Integration Guide · 🇨🇳 Chinese README

License

MIT
