Agent Arena MCP Server
Enables external agents to vet trades via a signed safety firewall, retrieve live leaderboard rankings, and list competing agents in a tournament environment.
README
Agent Arena (bitarena)
A live proving ground and safety firewall for autonomous trading agents on Bitget.
▶ Live: bitarena.vercel.app — call the signed firewall and verify a verdict yourself (browser or curl).
Built on open-source foundations (Vibe-Trading, FinRL, TradingAgents and others). See
NOTICEfor full attribution. The Arena engine, safety firewall, scoring, signed ledger, and Bitget integration are original work.
For judges — confirm it in 60 seconds
- See it live — then verify it yourself in one click: open bitarena.vercel.app; the LIVE FIREWALL badge ticks a freshly Ed25519-signed verdict on the real BTC price every few seconds. Click the badge → it verifies that live verdict's signature in your browser (Web Crypto, no server) → then hit "Tamper a byte" and watch the same signature go ✗ invalid. Trustless, tamper-evident, on live data.
- Run it:
uv venv && uv pip install -e ".[dev,api,mcp]" && uv run pytest(255 tests, offline) — ormake verifyfor the full gate (tests · lint · doc-numbers · evidence · red-team). - Verify the evidence yourself, offline:
uv run python scripts/verify_evidence.py→ re-checks every signed ledger (9,230 records) + certificate, all pinned to the published issuer. - Integrate in 5 lines:
uv run python scripts/integrate_example.py→ a third-party bot vets and offline-verifies its trades against the live deploy. - Read the threat model:
THREAT_MODEL.md— every threat mapped to the gate that stops it and the test/red-team case that proves it, with honest residual risks.
The thesis
The bottleneck in agentic trading is not alpha — it is trust. You cannot hand real capital to an autonomous agent unless you can (1) prove it is not just a lucky backtest, and (2) guarantee it physically cannot do something insane.
Everyone builds agents that generate trades. Agent Arena builds the layer that decides which agents deserve to be trusted with capital:
- A universal safety firewall. Every order from every agent passes through one
fail-closed gate that returns a signed
ALLOW/ALLOW_CAPPED/REJECTcertificate before anything reaches the exchange. No agent can blow up — and a market-wide kill-switch forces the whole fleet to de-risk-only in a fast crash. - A live tournament. Multiple autonomous agents (a debate swarm, an RL agent, a persona team, a single-LLM control) trade Bitget side by side on equal capital.
- Overfit-aware scoring. Agents are ranked with institutional rigor — Deflated Sharpe, Probability of Backtest Overfitting, walk-forward, drawdown — not raw PnL.
It is exposed over MCP, so any external agent or IDE can (a) ask the firewall to vet a trade, or (b) enter the arena and compete.
Why it spans every track
- Trading Agent — the competitors are fully autonomous perceive → decide → execute loops.
- Trading Infra — the firewall, benchmark, signed ledger, and MCP server are reusable infrastructure any developer can integrate.
- US Stock AI — the arena + firewall run across six of Bitget's tokenized US stocks (AAPL, TSLA, NVDA, MSFT, GOOGL, META).
Architecture
bitarena/
domain/ core value objects: TradeIntent, Verdict + signed cert, Mandate, market types
firewall/ Ed25519 signed certs · pure risk gates · fail-closed evaluate()
connectors/ ExchangeConnector protocol · PaperExchange · Bitget v2 REST client
perception/ technical features · Bitget Agent Hub Skills (macro/sentiment/news/onchain/technical)
agents/ swarm · regime (Playbook mirror) · persona team · Q-learning RL · momentum · buy-hold · funding-carry · Qwen LLM debate
arena/ tournament engine · per-agent portfolio/PnL · leaderboard · TrustAllocator · LiveArena (resumable live mode)
scoring/ Sharpe/Sortino/drawdown · Deflated Sharpe / PSR / PBO
ledger/ append-only Ed25519-signed trade log (Bitget-required fields, tamper-evident)
mcp/ MCP server: vet_trade(), get_leaderboard(), list_agents()
api/ FastAPI: /firewall /verify /pubkey /leaderboard /live /ledger /debate (+ serves the UI)
research/ funding-carry edge study (walk-forward + Deflated Sharpe)
web/ production single-page UI: firewall · arena · ledger · debate · verify
playbook/ four published Bitget GetAgent Playbooks — see playbook/PUBLISHED.md
Quickstart
Fastest path (needs uv): make setup then make demo (tests + signed verdict +
red-team), or make serve for the UI + API at http://localhost:8000. Or manually:
# 1. environment (uv recommended) — api+mcp extras let the full suite run
uv venv
uv pip install -e ".[dev,api,mcp]"
# 2. run the test suite (offline, no network, no keys needed)
uv run pytest
# 3. run a tournament on real Bitget data (trade logs + leaderboard)
uv run python scripts/run_arena.py --source bitget --instrument perp --bars 1000
# 4. try the firewall directly (signed verdict)
uv run python scripts/demo_firewall.py --symbol BTCUSDT --side buy --notional 999999
# 5. red-team the firewall (proves 0 unsafe orders pass)
uv run python scripts/redteam.py
# 6. trust allocator: fund agents by verified performance vs equal-weight
uv run python scripts/allocator_demo.py --regime
Deploy the firewall to a public URL in minutes — see DEPLOY.md.
For live Bitget data / orders, copy .env.example to .env and fill in your
Bitget API keys (read permission is enough for market data and the read-only
arena; trade permission — ideally on a dedicated sub-account — is needed for live
order placement).
Run the API and MCP server
uv pip install -e ".[api,mcp]"
uv run uvicorn bitarena.api.app:app --port 8000 # UI at / · HTTP: /health /firewall /verify /pubkey /leaderboard /live /ledger /debate
uv run python -m bitarena.mcp.server # MCP (stdio): vet_trade, get_leaderboard, list_agents
Connect the MCP server from Claude Desktop / Cursor / Codex — add this to your MCP client
config (e.g. claude_desktop_config.json), pointing --directory at your clone:
{
"mcpServers": {
"bitarena": {
"command": "uv",
"args": ["--directory", "/path/to/bitarena", "run", "python", "-m", "bitarena.mcp.server"]
}
}
}
Then ask your agent to "vet a BTCUSDT buy of $50 through the bitarena firewall" — it calls
vet_trade and gets back a signed verdict. No Bitget keys needed for the offline path.
Live mode (paper → live): run the arena continuously on real Bitget data — each call
processes new candles and persists state (portfolios + signed ledgers + cursor), so it
resumes across runs. Schedule it (cron / a deployed worker) and GET /live serves the
continuously-growing tournament:
uv run python scripts/live_step.py --symbol BTCUSDT --instrument perp --state evidence/live
Vet a trade over HTTP:
curl -s localhost:8000/firewall \
-H 'content-type: application/json' \
-d '{"agent_id":"my-agent","symbol":"BTCUSDT","side":"buy","notional_usd":50}'
Integrate in Python — a third-party bot vets every trade in a few lines (no Arena code beyond the client), against the public deploy or your own host:
from bitarena.client import FirewallClient
fw = FirewallClient("https://bitarena.vercel.app")
v = fw.vet("BTCUSDT", "buy", notional_usd=50)
if v.allowed: # ALLOW / ALLOW_CAPPED
place_my_order("BTCUSDT", "buy", v.effective_notional_usd)
assert v.verify(fw.issuer_key()) # signature intact AND signed by this arena — offline
Full runnable example: uv run python scripts/integrate_example.py (hits the live deploy).
Bring your own agent — the arena is an open platform: any object with an agent_id and a
decide(obs) -> TradeIntent | None competes, firewall-gated and overfit-scored like the
built-ins. That's the entire contract:
class MeanReversionAgent: # ~15 lines, no arena internals
agent_id = "my-mean-reversion"
def decide(self, obs):
candles = obs.market.get_candles(obs.symbol, obs.instrument, limit=20)
if len(candles) < 20:
return None
sma = sum(c.close for c in candles) / len(candles)
target = obs.equity_usd * 0.5 if obs.price < sma else 0.0 # long below SMA, else flat
return rebalance_to_target(agent_id=self.agent_id, obs=obs, target_notional_signed=target)
Drop it into the agents=[...] list and it competes. Runnable: make custom-agent
(scripts/custom_agent_example.py).
Verify it yourself — every certificate is independently checkable, with no trust in
this server. The Verify tab checks the Ed25519 signature
entirely in your browser (Web Crypto) and pins the embedded key to the published issuer —
the certificate never leaves your machine. Offline, scripts/verify_cert.py and
FirewallClient.verify() need nothing but the cert; POST /verify and GET /pubkey are the
server-side equivalents:
uv run python scripts/demo_firewall.py --symbol BTCUSDT --side buy --notional 50 > v.json
uv run python scripts/verify_cert.py --file v.json # -> ✓ signature VALID (fully offline)
Or re-verify the entire evidence pack in one command — every signed ledger's hash-chain
and signatures, every certificate, all pinned to the published issuer (config/issuer_pubkey.hex):
uv run python scripts/verify_evidence.py
# -> ✓ 53 ledgers, 9,230 signed records, certs + red-team — signed, chained, pinned, untampered
Documentation
| Doc | What |
|---|---|
SUBMISSION.md |
The submission narrative — problem → thesis → how it works → tracks → evidence → honest self-assessment |
SUBMISSION_PACKET.md |
The actionable packet — IDs/links, per-track mapping, ready-to-paste form answers, owner checklist |
PITCH.md |
One-page judge / investor pitch |
SELF_ASSESSMENT.md |
Honest rubric-by-rubric rating (strengths + limits) |
DEMO.md |
3-minute demo storyboard |
DEPLOY.md |
Deploy the firewall to a public URL |
FRONTEND.md |
Frontend handoff spec |
evidence/README.md |
Reproducible results on real Bitget data |
playbook/PUBLISHED.md |
The four published Bitget Playbooks |
NOTICE |
Open-source attribution |
Status
Complete and tested — 255 passing tests, lint-clean, fully offline: the signed tamper-evident firewall (red-teamed, 0 unsafe orders pass), a live Bitget connector (real data verified), the arena with seven competitors (conflict-gated swarm, the published-Playbook regime mirror, persona team, Q-learning RL, momentum, buy-hold, and a funding-carry agent that harvests real perpetual funding) plus an optional live Qwen LLM debate agent, anti-overfit scoring (Deflated Sharpe / PSR / PBO), the TrustAllocator, the signed ledger, the MCP + HTTP API, an independent verifier, and the production UI. The four core mechanisms — the firewall, the signed ledger, the overfit-aware scoring, and the portfolio accounting (value conservation) — are property-tested over thousands of randomized inputs (not just hand-picked cases), and the live-data parsers are fuzz-tested against malformed exchange responses.
Four strategies are published on Bitget's GetAgent platform (real on-platform
backtests): Momentum Breakout BTC (Sharpe 1.68, PF 2.33), Momentum Breakout ETH (PF 1.42),
Adaptive Regime BTC (Sharpe 0.72, PF 1.74), and Adaptive Regime ETH (Sharpe 2.15, PF 3.34,
best risk-adjusted) — plus three more honestly withheld for underperforming on real data. A funding-carry edge is validated on real Bitget funding history; the
firewall benchmarks at ~0.1 ms per signed verdict. See
evidence/ and playbook/PUBLISHED.md.
Frontend
web/index.html is the production single-page UI (designed in Claude Design, implemented
here): an interactive firewall console, the live leaderboard, the signed ledger, the LLM
debate view, and an independent certificate verifier. The API serves it at /, and it
falls back to bundled demo data when offline.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.