veil-mcp
An agent-native shell as an MCP server, designed for LLM agents (like Claude Code) to execute commands with structured output, lazy detail retrieval, and effect tracking, minimizing token usage.
README
veil-mcp
An agent-native shell, as an MCP server. The first shell layer designed assuming the operator is an LLM agent (Claude Code), not a human at a terminal.
A terminal dumps everything into the scrollback and the agent swallows it all — regex-ing fragile text, round-tripping for state, with no undo. veil turns that into structured data: effects come back typed, detail is addressable and pulled only when needed, and risky commands get a real safety net.
quiet-by-default · addressable · lazy-detail · structured-replaces-text
- Effects are data. Files changed via git porcelain (or a syscall trace), not a
git statusround-trip. - Verify in one call.
expect: { exit: 0, file_exists: "dist/index.js" }— no follow-upls/grep. - Safety net.
sh_checkpoint/sh_restore(CoW snapshot) and an opt-in kernelsandboxfor risky commands. - Detail is stored, not re-emitted. Pull
sh_detail id=cmd3 selector=stdoutonly if needed — and it survives a server restart (disk-backed). - Token economy is the consequence, not the bet. Quiet output is a side effect of returning structure; the structure and safety stay valuable even as context windows grow. Condensing hides bulk while surfacing failure signal — a best-effort heuristic, measured at 100% recall on a labeled corpus of buried failures (see Output honesty and Metrics).
Why veil
Much of what an agent gets from veil — quieter output, fewer manual checks — you can
approximate with Bash + truncation + careful prompting. The reason to adopt is the
three things a shell genuinely cannot do, each with a number you can reproduce
(npm run metrics):
- Verify in one call, not three.
expectfolds arun → check → greploop into a single structured call (exit,stdout_matches,file_exists,changed, …), and effects come back typed — so "what changed?" needs nogit statusround-trip. → 55% fewer MCP round-trips across five common tasks (11 → 5). - A real safety net — checkpoint & roll back.
sh_checkpoint/sh_restorewrap a risky refactor in an undo. On a same-volume APFS tree it's a copy-on-write clone: ~1.5× faster than an rsync mirror and near space-free (~0 MB vs 60 MB on a 60 MB tree), cheap enough to make a checkpoint-before-every-risky-step a habit. - Kernel-enforced confinement, not a prompt.
sandbox: trueconfines writes to cwd + temp (and optionally denies network) via the OS sandbox — and refuses to run rather than execute unconfined where unavailable. An adversarial corpus of escape attempts is blocked 5/5 while a legitimate in-cwd write still lands.
Everything else — quiet output, addressable detail, retry, classification — is genuine convenience layered on top, not the moat. Honest scope: the sandbox is solid on macOS and experimental on Linux (see Security); the numbers above are from Metrics, reproducible locally with no account.
Why an MCP server, not a forked shell
Most of the value — token-aware output, addressable detail, effect diff, asserts,
retry, dry-run — is a presentation/orchestration layer. An MCP server delivers
it natively to how the LLM already consumes tools, in weeks not months, with no
200k-line C fork to maintain. The things a shell fork also can't do alone (atomic
rollback, a real capability sandbox, syscall trace) live in the FS + kernel;
veil drives those layers (APFS clonefile, sandbox-exec/bubblewrap, strace)
rather than reimplementing a shell.
Install
Fastest path — no clone, no build (runs via npx):
claude mcp add veil -- npx -y veil-mcp
npx -y veil-mcp init # drop the agent nudge into this project's CLAUDE.md
For other MCP-speaking agents (Cursor, Windsurf, Zed, …), add to the MCP server config:
{ "mcpServers": { "veil": { "command": "npx", "args": ["-y", "veil-mcp"] } } }
Prefer to run straight from source, without npm? npx -y github:vkmtx/veil-mcp works too.
veil init is the zero-friction setup step: it writes (idempotently) the block that
tells the agent to prefer sh_run. See Adoption for why that step exists.
From source
git clone https://github.com/vkmtx/veil-mcp && cd veil-mcp
npm install # builds dist/ via the prepare script
claude mcp add veil -- node "$(pwd)/dist/index.js"
# dev, no build step: npm run dev (tsx src/index.ts)
Add the nudge in this repo's CLAUDE.md (or your global
~/.claude/CLAUDE.md) so the agent prefers sh_run for effect-bearing / verbose
commands. To enforce it, see the guard hook.
Tools
| Tool | Purpose |
|---|---|
sh_run |
Execute a command → quiet structured result (exit, duration, files changed, token-aware stdout/stderr). Options below. |
sh_detail |
Pull full stored stdout/stderr/meta/trace for a previous run by id — no re-run. Records are disk-backed, so this works even after the server restarts. match=<regex> greps the stored stream (matching lines + numbers) to find a value condensing hid, without dumping it all. |
sh_plan |
Static safety pre-check (not an execution dry-run): predicts a command's blast-radius category (read-only / mutating / destructive / network / complex / unknown), reversibility, and file effects without executing. A top-level pipeline/list (a && b, c | d) is decomposed and classified per-segment, worst case wins; substitution/redirect/glob are undecidable and stay complex (though a destructive verb behind one is still surfaced). Errors bias toward over-flagging for the patterns it recognizes; genuinely-undecidable constructs stay complex/unknown — an honest limit, not a universal guarantee. |
sh_checkpoint |
Snapshot a directory under a label (rollback point). APFS copy-on-write clone when possible, else rsync mirror. |
sh_restore |
Restore a directory from a checkpoint (undo); refuses a target dir different from where the checkpoint was taken. |
sh_checkpoints |
List checkpoint labels. |
sh_run options
| Option | Effect |
|---|---|
command |
The shell command (required). |
cwd |
Working directory (defaults to the server's cwd). |
full |
Return uncondensed stdout/stderr inline (escape hatch from condensing). |
timeout_ms |
Per-command timeout (default 120s). On expiry the whole process group is killed (SIGTERM→SIGKILL), so a compound command's grandchildren (sleep 5; …) are reaped too. |
expect |
Post-conditions verified in the same call: exit, stdout_contains, stdout_matches, stderr_empty, file_exists, file_absent, changed, max_ms. Failures surface in assert_ok + assertions_failed — no second ls/grep/git status needed. |
retries / retry_on_exit / backoff_ms |
Declarative retry; attempts is reported when > 1. |
sandbox |
Real OS sandbox (feature K). true confines file writes to cwd + temp; { network: false } also denies network; { writable: [...] } adds roots. Refuses to run if unavailable — never executes unconfined. Sets sandboxed: true. |
trace |
Structured FS/syscall trace (feature A, Linux strace). Surfaces trace_summary (paths read/written + syscall count); full trace via sh_detail selector=trace. Best-effort: no tracer → command still runs, trace_unavailable: true. |
Result fields (emitted only when relevant — the quiet contract)
id, exit, ok, ms; then attempts, stdout_lines/stderr_lines (TRUE
emitted counts), files_changed, timed_out, stdout_truncated/stderr_truncated,
stdout_binary/stderr_binary, sandboxed, trace_summary/trace_unavailable,
assert_ok/assertions_failed, advice, hint, and the condensed stdout/stderr.
Output honesty
Condensing saves tokens but must not hide signal. Therefore:
- A mid-stream failure between head and tail is surfaced inline (content-aware
signals.ts) — the lexicon covers crash idioms with no error/fail keyword too (Segmentation fault,SIGSEGV,CONFLICT,! [rejected],timed out, …). If more distinct signals exist than fit inline, the marker reports the true total with a+N moreoverflow note (never a silent cap). This is a best-effort heuristic, not a proof — but it is measured: 100% recall on a labeled corpus (Metrics). - A byte-capped stream is labeled (the kept tail is marked; the torn first fragment dropped) and never presents its tail as the head.
stdout_lines/stderr_linesare the true emitted count, not just retained bytes.- Binary output (NUL bytes) is stored base64 and flagged, not mangled to mojibake.
- The effect-diff is skipped for statically read-only commands (and via
VEIL_EFFECTS=0); achangedassertion still forces it. advice(never blocks) nudges on the highest-signal issue: a sandbox denial → how to widen it; an unconfined destructive command →sandbox:true/sh_checkpoint; an interactive/TTY command → use raw Bash.
Example
// build AND verify the artifact exists, in one call — no follow-up ls
sh_run { "command": "npm run build", "expect": { "exit": 0, "file_exists": "dist/index.js" } }
// predict blast radius before running (git is classified per-subcommand)
sh_plan { "command": "git push --force" } // → { category: "destructive", warning: "…" }
// confine a risky command to the cwd, deny network
sh_run { "command": "./untrusted.sh", "sandbox": { "network": false } }
// safety net around a refactor
sh_checkpoint { "label": "pre-refactor" }
sh_restore { "label": "pre-refactor" } // undo everything
// find a value a condensed 50k-line log hid — no re-run, no full dump
sh_detail { "id": "cmd9", "selector": "stdout", "match": "ERROR|version=" }
Adoption
veil is opt-in and complementary to Bash, not a drop-in replacement. Its value
only lands when the agent actually reaches for sh_run on effect-bearing or verbose
commands — and an agent left to its own judgment will often default to raw Bash. Two
levers close that gap, in increasing strength:
- The nudge — a short block in the project's
CLAUDE.mdtelling the agent to prefersh_run.veil initwrites it for you (idempotent). Soft, zero-friction. - The guard hook —
hooks/veil-guard.shhard-blocks verbose/dangerous Bash and steers it tosh_run, while explicitly letting interactive/long-running/background commands through. Stronger, opt-in per machine.
Honest limit: there is no native integration, so adoption depends on one of the above
being configured. veil init reduces that to a single command; it does not remove the
step. If you only want the tools available without changing agent behavior, skip both —
sh_run is still callable directly.
Configuration
All tunables are env-overridable (no rebuild):
| Env var | Default | Meaning |
|---|---|---|
VEIL_INLINE_MAX_LINES |
45 | stdout shorter than this (lines) is returned whole |
VEIL_HEAD_LINES |
20 | lines kept from the top when condensing |
VEIL_TAIL_LINES |
20 | lines kept from the bottom when condensing |
VEIL_MAX_LINE_CHARS |
1000 | max chars of any single inline line (longer → capped with a pointer) |
VEIL_STDERR_INLINE_ON_FAIL |
60 | on failure, show up to this many stderr lines inline |
VEIL_TIMEOUT_MS |
120000 | default per-command timeout (0 = none) |
VEIL_MAX_STREAM_BYTES |
5000000 | max bytes stored per stream (older dropped) |
VEIL_MAX_RECORDS |
500 | max addressable run records (oldest evicted) |
VEIL_STATE_DIR |
auto | on-disk record store base (auto: $XDG_STATE_HOME/veil → ~/.local/state/veil → $TMPDIR/veil). none/off = memory-only |
VEIL_RECORD_TTL_MS |
86400000 | persisted records older than this are pruned on boot (0 = keep) |
VEIL_EFFECTS |
true | compute the git effect-diff (set 0 to skip in huge repos) |
Optional: enforce with a hook
The nudge is a soft preference. hooks/veil-guard.sh
is a PreToolUse guard that hard-blocks only verbose (installs / builds / test
runners) or dangerous (rm -rf, dd, mkfs, raw-device writes) Bash commands,
steering them to sh_run. It is a routing guard, not a security boundary: it
changes which tool the agent reaches for, never whether a command may run (it is
fail-open and VEIL_BYPASS-able, and sh_run will execute the same command). Real
containment is the kernel sandbox, not this hook. Commands sh_run can't help with are explicitly
allowed through to raw Bash: long-running / dev / watch / start servers,
backgrounded jobs (trailing &), process management (kill/pkill/pgrep), and
interactive/TTY tools (vim/less/top/tail -f) — blocking those would only
break the flow, since sh_run blocks until exit and has no TTY/background. It is
fail-open (any parse error → allow, so a bug can never block all Bash) with an
escape hatch: prefix a command with VEIL_BYPASS=1 to force raw Bash. Enable globally in ~/.claude/settings.json:
{ "hooks": { "PreToolUse": [
{ "matcher": "Bash",
"hooks": [{ "type": "command",
"command": "/bin/sh '/ABSOLUTE/PATH/veil-mcp/hooks/veil-guard.sh'" }] }
] } }
Takes effect on the next Claude Code restart. Remove the entry to disable.
Security
sh_run executes arbitrary shell commands with the user's privileges and
exposes the server's full environment (including secrets in env vars) to them. This
is by design — it is a shell. Run it only in trusted contexts.
For a single risky command, opt into real kernel confinement with sandbox
(macOS sandbox-exec; Linux bubblewrap, experimental): writes confined to cwd +
temp, network optionally denied. Off by default; when requested where unavailable
the call refuses rather than running unconfined.
Portability is honest, not universal. The sandbox is an opt-in best-effort layer,
not a headline guarantee: macOS is solid; Linux bubblewrap is experimental and needs
unprivileged user namespaces; containers (Docker, Codespaces, many CI runners) and
Ubuntu 24.04+ often restrict those, so sandbox is probed lazily on the first
sandboxed call and reports unavailable there — by design, so it refuses rather than
pretends. The default,
non-sandboxed path works everywhere. (A namespace-free Linux backend via Landlock is
on the roadmap to cover the container case.)
Roadmap
| Feature | Status | |
|---|---|---|
| I | token-aware output | ✅ done |
| J | addressable output (sh_detail, match) |
✅ done |
| H | effect diff (git porcelain / trace-derived) | ✅ done |
| G | inline assertions (expect) |
✅ done |
| M | declarative retry/timeout | ✅ done |
| B / K-lite | static safety pre-check + classification (sh_plan) — segment-aware, not an execution dry-run |
✅ done |
| C | checkpoint / rollback | ✅ done |
| K | real sandbox (macOS sandbox-exec) |
✅ done |
| C+ | atomic CoW checkpoints (APFS clonefile) |
✅ done — same-volume APFS; cross-volume / non-APFS falls back to the rsync mirror and reports method: rsync (no false "clone") |
| J+ | disk-backed record store (sh_detail survives restart, TTL-pruned) |
✅ done (v0.4) |
| — | veil init zero-friction project setup |
✅ done (v0.4) |
| K+ | Linux sandbox (bubblewrap) | 🧪 experimental — write-confine validated on Linux CI (needs unprivileged user namespaces; Ubuntu 24.04+ / containers restrict them, so sandbox reports unavailable there unless relaxed) |
| A | structured trace (Linux strace) |
🧪 experimental — capture validated on Linux CI |
| K++ | namespace-free Linux sandbox (Landlock) — covers containers/Codespaces where bwrap can't | 🔭 planned |
| — | streaming / PTY + background jobs (sh_logs/sh_stop) |
🔭 planned |
See CHANGELOG.md for version history and ARCHITECTURE.md for the module/feature map.
Testing — verify it yourself
Don't take the numbers on trust — reproduce them. Everything runs locally, no account:
git clone https://github.com/vkmtx/veil-mcp && cd veil-mcp && npm install
npm run typecheck # tsc --noEmit
npm test # end-to-end smoke (228 assertions; some are platform-gated, so a single run executes a subset)
npm run backtest # byte-savings regression (a weighted bulk-condense ratio + a per-command envelope-overhead floor)
npm run bench # detailed 5-dimension benchmark (economy, latency, per-feature, condense, session)
npm run metrics # value metrics: agent-turns-saved, sandbox-escapes-blocked, signal-recall, checkpoint cost
CI (.github/workflows/ci.yml) runs the suite on macOS
and Linux (with bubblewrap + strace installed), so the Linux-only sandbox (K+)
and trace (A) paths are exercised where this dev machine (macOS) can't. The badge at
the top reflects the latest run; the claims here are whatever that pipeline asserts.
Metrics
npm run metrics quantifies the value the rest of this README describes — the moat
that raw Bash + truncation can't replicate. Numbers below are illustrative
(machine-dependent for the timing rows); reproduce them yourself.
| Metric | Result | What it measures |
|---|---|---|
| Agent turns saved | 55% fewer round-trips (11 → 5) | MCP calls collapsed by expect + effects + retry across 5 common tasks — counts calls, not bytes, so it holds as context windows grow |
| Sandbox escapes blocked | 5 / 5 (control write lands) | adversarial outside-cwd / spawned-child / symlink / network writes denied by the kernel; a legitimate in-cwd write still succeeds (selective, not deny-all) |
| Signal recall | 100% on a 10-fixture corpus | buried failures surfaced from the elided middle, incl. non-keyword crash idioms (SIGSEGV, CONFLICT, ! [rejected], timed out) |
| Checkpoint cost | clone ~1.5× faster, ~0 MB vs rsync 60 MB | CoW clone latency + disk vs the rsync mirror on a 60 MB tree (macOS / same-volume APFS) |
The deterministic rows (turns, recall) are asserted in the smoke suite from the same fixtures, so the published figures can't silently drift. On platforms without the real sandbox/clone, those rows self-report unavailable rather than printing a number.
Community
Contributions, questions, and ideas are welcome — this is an early project and a good time to shape it.
- 💬 Discussions — questions, ideas, show-and-tell.
- 🐛 Issues — bugs and feature requests (templates provided).
- 🌱 Good first issues — scoped entry points for a first PR.
- 📖 CONTRIBUTING.md — dev setup and the test gate.
- 🤝 Code of Conduct — be decent.
License
MIT — see LICENSE.
Status
v0.4 — experimental. Features I, J, H, G, M, B, K-lite, C, K, C+, plus v0.4's
disk-backed store and veil init, built and tested (228 smoke assertions +
backtest + value metrics, all green on macOS); K+ and A structured and validated on Linux CI.
A young, single-author project — judge it by the reproducible suite above, not its age.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.