GoldenMCP
A Web3 MCP evaluation marketplace enabling standardized Inspect evals with Walrus-backed results, ENS identity, Chainlink attestation, and x402 nanopayments on Arc.
Web3 MCP evaluation marketplace: standardized Inspect evals, Walrus-backed results, ENS identity, Chainlink attestation, and x402 discovery on Arc.
Evals run against live Web3 MCP servers and are scored on data accuracy, tool path, and token efficiency. The results are published to Walrus, attested by Chainlink Confidential AI, and written onchain to an MCP registry on Arc. Agents then pay a small USDC micropayment (x402) to look up the best-scoring MCP for a capability.
Bounties — find your code
Each bounty's integration lives in a small number of files. Links go straight to the relevant source on main.
ENS — MCP discovery via ENSIP-25/26
ENS names are the public identity for each scored MCP; text records point at Walrus eval blobs and the onchain registry, resolved live (no hard-coded names).
| What | Code |
|---|---|
| ENS text-record resolver (resolve_text, resolve_agent_context, resolve_eval_blob, resolve_mcp_endpoint) | packages/identity/src/goldenmcp_identity/registry.py |
| Registry SDK (ens_name field, register/lookup) | packages/identity/ |
| Live ENS resolver UI | apps/web/src/app/ens/page.tsx |
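As a rough illustration of how a text record can lead to a Walrus blob, the sketch below takes any text-record lookup function and extracts the blob ID from a walrus:// value. The record key `eval-blob`, the function name `resolve_eval_blob_id`, and the ENS name are hypothetical and used only for illustration; the real keys and resolver logic live in registry.py.

```python
from typing import Callable, Optional

# Hypothetical record key; the real key names are defined in registry.py.
EVAL_BLOB_KEY = "eval-blob"
WALRUS_SCHEME = "walrus://"


def resolve_eval_blob_id(
    get_text: Callable[[str, str], str], ens_name: str
) -> Optional[str]:
    """Return the Walrus blob ID stored in an ENS text record, or None."""
    value = (get_text(ens_name, EVAL_BLOB_KEY) or "").strip()
    if not value.startswith(WALRUS_SCHEME):
        return None  # record missing or not a walrus:// reference
    blob_id = value[len(WALRUS_SCHEME):]
    return blob_id or None


# Dict-backed stand-in for a live ENS lookup, for illustration only.
_records = {("lifi.example.eth", EVAL_BLOB_KEY): "walrus://abc123"}


def fake_get_text(name: str, key: str) -> str:
    return _records.get((name, key), "")
```

Passing the lookup as a callable keeps the parsing logic testable without a live RPC endpoint; a real resolver call can be swapped in for `fake_get_text`.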
Chainlink — CRE eval orchestration + Confidential AI attestation
A Chainlink CRE workflow orchestrates the whole pipeline: it calls the eval-runner to score an MCP, submits the score manifest to Confidential AI (CAI) for attestation, publishes to Walrus, then writes the score + attestation onchain.
The attestation is the completed TEE inference itself; there is no synthetic tx hash. The pipeline records the CAI inference_id and the bytes32 transcript hash (the enclave's response_digest, falling back to sha256(output)) onchain via recordAttestation, mirroring Chainlink's official undercollateralized-loan example.
| What | Code |
|---|---|
| CRE pipeline (eval → CAI attest → Walrus → Arc) | workflows/eval-pipeline/src/pipeline.ts |
| CRE workflow entrypoint + cron trigger | workflows/eval-pipeline/src/workflow.ts |
| CAI submit/poll + attestation parsing (caiAttest, parseCaiAttestation) | workflows/eval-pipeline/src/pipeline.ts |
| eval-runner HTTP service CRE calls | packages/eval-runner/ |
| CRE workflow config | workflows/eval-pipeline/workflow.yaml |
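The transcript-hash fallback described above (prefer the enclave's response_digest, otherwise sha256 of the output) might look roughly like this. It is a Python sketch of the idea, not the TypeScript in pipeline.ts; it assumes the digest arrives as a hex string with an optional 0x prefix, and treating a malformed or wrong-length digest as missing is an assumption of this sketch.

```python
import hashlib
from typing import Optional


def transcript_hash(response_digest: Optional[str], output: str) -> bytes:
    """Return a 32-byte hash suitable for a bytes32 contract argument."""
    if response_digest:
        hex_part = response_digest[2:] if response_digest.startswith("0x") else response_digest
        try:
            digest = bytes.fromhex(hex_part)
        except ValueError:
            digest = b""  # not valid hex: fall through to the sha256 fallback
        if len(digest) == 32:
            return digest
    # Fallback: hash the raw inference output.
    return hashlib.sha256(output.encode("utf-8")).digest()
```

Either branch returns exactly 32 bytes, so the value fits a bytes32 parameter without padding.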
Arc — x402 USDC nanopayments for MCP lookup
The marketplace MCP is x402-gated: lookups return HTTP 402 with a USDC price until a payment header is present. Scores are written to an ERC-8004-inspired registry deployed on Arc, where USDC is the native gas token.
| What | Code |
|---|---|
| x402-gated lookup server (402 challenge, price ladder, settlement) | packages/marketplace-mcp/src/goldenmcp_marketplace/app.py |
| MCP registry contract (register, updateCapabilityScore, recordAttestation) | contracts/mcp-registry/src/MCPRegistry.sol |
| Arc deploy script | contracts/mcp-registry/script/Deploy.s.sol |
| x402 lookup agent demo | demo/lookup_agent.py |
| CRE → Arc registry write (writeToArc) | workflows/eval-pipeline/src/pipeline.ts |
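The 402 gate can be pictured as a small decision function. This is a framework-free sketch rather than the code in app.py: the price ladder values, the placeholder payee address, and the function names are all hypothetical, and the sketch only checks that a payment header is present instead of verifying or settling it.

```python
from typing import Mapping, Tuple

# Hypothetical price ladder: (min_score threshold, price in USDC), highest first.
PRICE_LADDER = [(0.9, "0.003"), (0.7, "0.002"), (0.0, "0.001")]
PAYEE = "0x0000000000000000000000000000000000000000"  # placeholder address


def price_for(min_score: float) -> str:
    """Pick the price of the first ladder rung the requested min_score reaches."""
    for threshold, price in PRICE_LADDER:
        if min_score >= threshold:
            return price
    return PRICE_LADDER[-1][1]  # below every threshold: cheapest rung


def handle_lookup(headers: Mapping[str, str], min_score: float) -> Tuple[int, dict]:
    """Return (status, body): a 402 challenge unless a payment header is present."""
    if not headers.get("X-PAYMENT"):
        challenge = {
            "price_usdc": price_for(min_score),
            "payee": PAYEE,
            "network": "arc-testnet",
        }
        return 402, challenge
    # A real server would verify and settle the payment before answering.
    return 200, {"results": [], "payment_settled": True}
```

Prices are kept as strings so that small USDC amounts are not distorted by float rounding.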
Sui / Walrus — eval blob storage (not a bounty)
Walrus is the Sui-native decentralized blob store. Every score manifest and raw Inspect .eval log is written to Walrus testnet via its publisher/aggregator HTTP API, and ENS + registry records point at the resulting walrus://<blobId>. Listed here for completeness even though Sui is not a bounty.
| What | Code |
|---|---|
| Walrus publisher/aggregator client (upload, download, *_json) | packages/walrus-client/src/goldenmcp_walrus/client.py |
| walrus:// fsspec adapter + index (Inspect View log dir) | packages/walrus-client/ |
| Web demo Walrus manifest fetch | apps/web/src/lib/data.ts |
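To make the publisher/aggregator round trip concrete, the sketch below turns a publisher response into a walrus:// reference and an aggregator download URL. It assumes the response shapes (`newlyCreated` / `alreadyCertified`) and the `/v1/blobs` path as described in the public Walrus HTTP API docs; the function names are hypothetical and the project's client.py may handle this differently.

```python
def blob_id_from_store_response(payload: dict) -> str:
    """Extract the blob ID from a publisher store response (assumed shape)."""
    if "newlyCreated" in payload:
        return payload["newlyCreated"]["blobObject"]["blobId"]
    if "alreadyCertified" in payload:
        return payload["alreadyCertified"]["blobId"]
    raise ValueError("unrecognized Walrus publisher response")


def walrus_uri(blob_id: str) -> str:
    """Build the walrus://<blobId> reference stored in ENS and the registry."""
    return f"walrus://{blob_id}"


def aggregator_url(aggregator_base: str, blob_id: str) -> str:
    """Build the download URL for a blob on a given aggregator (assumed path)."""
    return f"{aggregator_base.rstrip('/')}/v1/blobs/{blob_id}"
```

Handling both response variants matters because re-uploading identical bytes yields the same content-addressed blob, in which case the publisher reports it as already certified.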
Workflow diagrams
Eval pipeline (Chainlink CRE)
A CRE cron trigger fetches the benchmark list, then runs each MCP/capability through scoring, attestation, storage, and the onchain write. The eval-runner calls are async: the pipeline kicks off a run and polls until it reaches scored / published. CAI and Arc steps are skipped when their credentials are absent, so the pipeline is simulatable without secrets.
flowchart TD
Cron([CRE cron trigger]) -->|GET /benchmarks| Runner[eval-runner HTTP]
Runner -->|benchmark list| Loop[runPipeline per benchmark]
Loop -->|"POST /eval/inspect, then poll GET /eval/runs/:id until scored"| Score[score manifest]
Runner -.->|runs Inspect eval| MCP[(Web3 MCP server)]
Score --> HasCAI{CAI configured?}
HasCAI -->|yes| CAI[Confidential AI TEE<br/>POST /v1/inference + poll/callback]
HasCAI -->|no| Pub
CAI -->|inference_id + transcript_hash| Pub
Pub["POST /eval/publish, then poll until published"] --> Walrus[(Walrus: manifest + raw .eval log)]
Walrus --> HasReg{registry set?}
HasReg -->|yes| Arc[writeToArc<br/>updateCapabilityScore + recordAttestation]
HasReg -->|no| Done([done])
Arc --> Registry[(MCPRegistry on Arc)]
Registry --> ENS[ENS records point at Walrus + registry]
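The kick-off-then-poll pattern used for the eval-runner calls can be sketched as a generic helper. This is a Python illustration of the pattern, not the CRE TypeScript itself; the helper name, the interval, and the timeout are hypothetical, and `fetch_status` stands in for a GET /eval/runs/:id call.

```python
import time
from typing import Callable, Iterable


def poll_until(
    fetch_status: Callable[[], str],
    done: Iterable[str],
    *,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a state in `done` or the timeout passes."""
    done_states = set(done)
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a canned sequence of statuses instead of real HTTP calls.
_statuses = iter(["queued", "running", "scored"])
final = poll_until(lambda: next(_statuses), {"scored"}, interval_s=0, sleep=lambda s: None)
```

Injecting `sleep` lets the loop be exercised in tests without real delays.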
x402 lookup + payment (Arc)
An agent asks the marketplace for the best MCP for a capability. The first call returns a 402 with a USDC price (it scales with min_score); the agent pays in USDC on Arc and retries with an X-PAYMENT header. The marketplace then builds a score index from the registry + Walrus and returns the top match.
sequenceDiagram
participant Agent as lookup_agent.py
participant Market as marketplace-mcp (x402)
participant Reg as MCPRegistry (Arc)
participant Wal as Walrus
Agent->>Market: POST /tools/lookup (capability, min_score)
Market-->>Agent: 402 Payment Required (price_usdc, payee, network arc-testnet)
Note over Agent: pay USDC on Arc
Agent->>Market: POST /tools/lookup + X-PAYMENT header
Note over Market: _load_index builds the score index
Market->>Reg: list_agent_ids + getCapabilityScore per capability
Market->>Wal: download_json(manifest blob)
Market->>Market: filter by min_score, sort by composite
Market-->>Agent: results[] top MCP (ens_name, mcp_endpoint, composite,<br/>attestation_id, transcript_hash) + payment_settled
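From the agent's side, the 402-then-retry exchange might be structured as below. The `post` and `pay` callables are hypothetical stand-ins for an HTTP client and an Arc USDC payment step, so this sketches the control flow only and is not the code in demo/lookup_agent.py.

```python
from typing import Callable, Mapping, Tuple

Response = Tuple[int, dict]  # (HTTP status, parsed JSON body)


def lookup_with_payment(
    post: Callable[[dict, Mapping[str, str]], Response],
    pay: Callable[[dict], str],
    capability: str,
    min_score: float,
) -> dict:
    """POST the lookup; on a 402 challenge, pay and retry once with X-PAYMENT."""
    body = {"capability": capability, "min_score": min_score}
    status, payload = post(body, {})
    if status == 402:
        payment_header = pay(payload)  # payload carries price_usdc, payee, network
        status, payload = post(body, {"X-PAYMENT": payment_header})
    if status != 200:
        raise RuntimeError(f"lookup failed with HTTP {status}")
    return payload


# Fake marketplace and payer, for illustration only.
def fake_post(body: dict, headers: Mapping[str, str]) -> Response:
    if "X-PAYMENT" not in headers:
        return 402, {"price_usdc": "0.003", "payee": "0x0", "network": "arc-testnet"}
    return 200, {"results": [{"ens_name": "lifi.example.eth"}], "payment_settled": True}


result = lookup_with_payment(fake_post, lambda challenge: "signed-payment", "quote", 0.9)
```

Retrying only once keeps the agent from paying repeatedly if the server keeps answering 402.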
Setup
Prerequisites
- Python 3.12, managed with uv (no pip)
- bun for the web app and CRE TypeScript workflow
- foundry (forge, cast) for contracts and wallet generation
- An LLM API key (e.g. Anthropic) and reachable Web3 MCP endpoints
Install
# Python toolchain + workspace
uv python install 3.12
uv sync --all-packages
# Credentials — copy and fill in
cp .env.example .env
Or bootstrap a demo machine (generates a cast wallet, sets MCP URLs, runs uv sync):
chmod +x scripts/setup_eval_env.sh
./scripts/setup_eval_env.sh # full bootstrap
./scripts/setup_eval_env.sh --check # prerequisites only
Eval chain defaults: Base (8453) for quote evals; Fraxtal (252) for odos_swap. Fund EVM_EVAL_ADDRESS on Base (and Fraxtal for Odos swaps). ENS identity uses Sepolia separately.
Run
# Unit tests
uv run pytest packages/ -v
# Run an eval against a live MCP (needs LLM key + MCP endpoints in .env)
uv run inspect eval goldenmcp/lifi_quote --model anthropic/claude-3-5-haiku-20241022
uv run inspect eval goldenmcp/odos_quote --model anthropic/claude-3-5-haiku-20241022
# eval-runner HTTP service (the API the CRE workflow calls)
uv run python -m goldenmcp_eval_runner
# Marketplace MCP (x402-gated lookup)
uv run python -m goldenmcp_marketplace
# x402 lookup agent demo (needs Arc wallet + x402)
uv run python demo/lookup_agent.py --capability quote --min-score 0.9
# Web demo (leaderboard, eval viewer, ENS resolver)
cd apps/web && bun install && bun run dev
Walrus + Inspect View
GoldenMCP stores eval logs on Walrus with an indexed walrus:// path (S3-style keys over content-addressed blobs). After the first upload, set WALRUS_INDEX_BLOB_ID in .env from the walrus_index_blob_id field printed by post_eval_walrus.py.
# Upload scored eval + raw Inspect log bytes
uv run python scripts/post_eval_walrus.py --mcp lifi --capability quote --log ./logs/your-run.json
# List logs from Walrus (same as s3:// log-dir)
uv run inspect view start --log-dir walrus://evals/goldenmcp
Inspect View requires native .eval / JSON log files at indexed paths — not score-manifest JSON alone.
Scoring
| Dimension | Weight |
|---|---|
| DataScore | 0.45 |
| PathScore | 0.35 |
| TokenEfficiency | 0.20 |
Binary fail (composite 0.0) on prompt injection, disallowed tools, or policy violations.
See docs/scoring.md.
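One plausible reading of the table is a weighted sum with a hard-fail override, sketched below. The linear combination, the assumption that each dimension lies in [0, 1], and the function and parameter names are assumptions of this sketch; docs/scoring.md is the authoritative definition.

```python
# Weights from the scoring table.
WEIGHTS = {"data": 0.45, "path": 0.35, "token_efficiency": 0.20}


def composite_score(
    data: float,
    path: float,
    token_efficiency: float,
    *,
    hard_fail: bool = False,
) -> float:
    """Weighted sum of the three dimensions, or 0.0 on a binary fail."""
    if hard_fail:
        # Prompt injection, disallowed tools, or a policy violation.
        return 0.0
    total = (
        WEIGHTS["data"] * data
        + WEIGHTS["path"] * path
        + WEIGHTS["token_efficiency"] * token_efficiency
    )
    return round(total, 6)  # trim float noise from the weighted sum
```

For example, scores of 0.9, 0.8, and 0.5 combine to 0.405 + 0.28 + 0.10 = 0.785 under this reading.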
Structure
packages/inspect-web3 Inspect tasks + scorers
packages/walrus-client walrus:// fsspec + HTTP client
packages/marketplace-mcp x402 MCP server
packages/identity ENS + registry SDK
packages/eval-runner HTTP service for CRE
apps/web Leaderboard, eval viewer, ENS resolver
workflows/eval-pipeline Chainlink CRE workflow
contracts/mcp-registry ERC-8004-inspired MCP registry (Arc)
Architecture overview: docs/architecture.md. All implementation plans: docs/plans/.
License
MIT