GoldenMCP

Web3 MCP evaluation marketplace: standardized Inspect evals, Walrus-backed results, ENS identity, Chainlink attestation, and x402 discovery on Arc.

Evals run against live Web3 MCP servers and are scored on data accuracy, tool path, and token efficiency. The results are published to Walrus, attested by Chainlink Confidential AI, and written onchain to an MCP registry on Arc. Agents then pay a small USDC micropayment (x402) to look up the best-scoring MCP for a capability.

Bounties — find your code

Each bounty's integration lives in a small number of files. Links go straight to the relevant source on main.

ENS — MCP discovery via ENSIP-25/26

ENS names are the public identity for each scored MCP; text records point at Walrus eval blobs and the onchain registry, resolved live (no hard-coded names).

What                                                                                                     Code
ENS text-record resolver (resolve_text, resolve_agent_context, resolve_eval_blob, resolve_mcp_endpoint)  packages/identity/src/goldenmcp_identity/registry.py
Registry SDK (ens_name field, register/lookup)                                                           packages/identity/
Live ENS resolver UI                                                                                     apps/web/src/app/ens/page.tsx
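
As a rough illustration of how text records can point at a Walrus blob and an MCP endpoint, the sketch below resolves two records through an injected lookup function. The record keys (`eval-blob`, `mcp-endpoint`), the function name `resolve_records`, and the name `example-mcp.eth` are hypothetical; the real keys and resolver live in registry.py and may differ.

```python
from typing import Callable, Dict, Optional

# Hypothetical text-record keys, chosen only for this sketch.
EVAL_BLOB_KEY = "eval-blob"
MCP_ENDPOINT_KEY = "mcp-endpoint"

# get_text(name, key) returns the record value, or None/"" when unset.
GetText = Callable[[str, str], Optional[str]]


def resolve_records(name: str, get_text: GetText) -> Dict[str, Optional[str]]:
    """Resolve the eval blob and MCP endpoint for an ENS name."""
    blob_uri = get_text(name, EVAL_BLOB_KEY) or None
    endpoint = get_text(name, MCP_ENDPOINT_KEY) or None
    blob_id = None
    if blob_uri and blob_uri.startswith("walrus://"):
        # Strip the scheme to recover the Walrus blob ID.
        blob_id = blob_uri[len("walrus://"):]
    return {"ens_name": name, "blob_id": blob_id, "mcp_endpoint": endpoint}


# Usage with an in-memory stand-in for a live ENS lookup.
records = {
    ("example-mcp.eth", EVAL_BLOB_KEY): "walrus://abc123",
    ("example-mcp.eth", MCP_ENDPOINT_KEY): "https://mcp.example/sse",
}
resolved = resolve_records("example-mcp.eth", lambda n, k: records.get((n, k)))
```

Passing the lookup in as a callable keeps the parsing logic testable without a network connection; in practice it would be backed by a live ENS client.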

Chainlink — CRE eval orchestration + Confidential AI attestation

A Chainlink CRE workflow orchestrates the whole pipeline: it calls the eval-runner to score an MCP, submits the score manifest to Confidential AI (CAI) for attestation, publishes to Walrus, then writes the score + attestation onchain.

The attestation is the completed TEE inference itself; there is no synthetic tx hash. The pipeline records the CAI inference_id and the bytes32 transcript hash (the enclave's response_digest, falling back to sha256(output)) onchain via recordAttestation, mirroring Chainlink's official undercollateralized-loan example.
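
The fallback rule for the transcript hash can be sketched as follows. This is a Python illustration only (the actual logic lives in the TypeScript pipeline), and it assumes response_digest arrives as a hex string encoding exactly 32 bytes, optionally prefixed with `0x`; the function name `transcript_hash` is hypothetical.

```python
import hashlib
from typing import Optional


def transcript_hash(response_digest: Optional[str], output: bytes) -> bytes:
    """Return the 32 bytes to store in a bytes32 field.

    Prefers the enclave's response_digest; falls back to sha256(output).
    Assumes response_digest, when present, is hex for exactly 32 bytes.
    """
    if response_digest:
        hex_part = response_digest[2:] if response_digest.startswith("0x") else response_digest
        digest = bytes.fromhex(hex_part)
        if len(digest) != 32:
            raise ValueError("response_digest must encode exactly 32 bytes")
        return digest
    # No enclave digest available: hash the raw inference output instead.
    return hashlib.sha256(output).digest()
```

Either branch yields exactly 32 bytes, so the value fits a bytes32 argument without padding or truncation.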

What                                                                    Code
CRE pipeline (eval → CAI attest → Walrus → Arc)                         workflows/eval-pipeline/src/pipeline.ts
CRE workflow entrypoint + cron trigger                                  workflows/eval-pipeline/src/workflow.ts
CAI submit/poll + attestation parsing (caiAttest, parseCaiAttestation)  workflows/eval-pipeline/src/pipeline.ts
eval-runner HTTP service CRE calls                                      packages/eval-runner/
CRE workflow config                                                     workflows/eval-pipeline/workflow.yaml

Arc — x402 USDC nanopayments for MCP lookup

The marketplace MCP is x402-gated: lookups return HTTP 402 with a USDC price until a payment header is present. Scores are written to an ERC-8004-inspired registry deployed on Arc, where USDC is the native gas token.

What                                                                        Code
x402-gated lookup server (402 challenge, price ladder, settlement)          packages/marketplace-mcp/src/goldenmcp_marketplace/app.py
MCP registry contract (register, updateCapabilityScore, recordAttestation)  contracts/mcp-registry/src/MCPRegistry.sol
Arc deploy script                                                           contracts/mcp-registry/script/Deploy.s.sol
x402 lookup agent demo                                                      demo/lookup_agent.py
CRE → Arc registry write (writeToArc)                                       workflows/eval-pipeline/src/pipeline.ts

Sui / Walrus — eval blob storage (not a bounty)

Walrus is the Sui-native decentralized blob store. Every score manifest and raw Inspect .eval log is written to Walrus testnet via its publisher/aggregator HTTP API, and ENS + registry records point at the resulting walrus://<blobId>. Listed here for completeness even though Sui is not a bounty.

What                                                           Code
Walrus publisher/aggregator client (upload, download, *_json)  packages/walrus-client/src/goldenmcp_walrus/client.py
walrus:// fsspec adapter + index (Inspect View log dir)        packages/walrus-client/
Web demo Walrus manifest fetch                                 apps/web/src/lib/data.ts
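
To make the publisher/aggregator round trip concrete, here is a minimal sketch of the bookkeeping around it: extracting a blob ID from a publisher response, forming the walrus:// URI, and building the aggregator download URL. It assumes the response shapes (`newlyCreated` / `alreadyCertified`) and the `/v1/blobs` path described in the public Walrus HTTP API documentation; the helper names and the example aggregator host are hypothetical and not taken from client.py.

```python
def blob_id_from_publish_response(resp: dict) -> str:
    """Extract the blob ID from a publisher store response.

    Assumes the two response shapes of the public Walrus HTTP API:
    a freshly stored blob or one that was already certified.
    """
    if "newlyCreated" in resp:
        return resp["newlyCreated"]["blobObject"]["blobId"]
    if "alreadyCertified" in resp:
        return resp["alreadyCertified"]["blobId"]
    raise ValueError("unrecognized publisher response")


def walrus_uri(blob_id: str) -> str:
    """Form the walrus://<blobId> pointer stored in ENS and the registry."""
    return f"walrus://{blob_id}"


def aggregator_url(aggregator_base: str, uri: str) -> str:
    """Turn a walrus:// URI into an aggregator download URL."""
    if not uri.startswith("walrus://"):
        raise ValueError("expected a walrus:// URI")
    blob_id = uri[len("walrus://"):]
    return f"{aggregator_base.rstrip('/')}/v1/blobs/{blob_id}"


# Usage with a stubbed publisher response (no network access).
uri = walrus_uri(blob_id_from_publish_response({"alreadyCertified": {"blobId": "abc123"}}))
url = aggregator_url("https://aggregator.example/", uri)
```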

Workflow diagrams

Eval pipeline (Chainlink CRE)

A CRE cron trigger fetches the benchmark list, then runs each MCP/capability through scoring, attestation, storage, and the onchain write. The eval-runner calls are async: the pipeline kicks off a run and polls until it reaches scored / published. CAI and Arc steps are skipped when their credentials are absent, so the pipeline is simulatable without secrets.

flowchart TD
    Cron([CRE cron trigger]) -->|GET /benchmarks| Runner[eval-runner HTTP]
    Runner -->|benchmark list| Loop[runPipeline per benchmark]

    Loop -->|"POST /eval/inspect, then poll GET /eval/runs/:id until scored"| Score[score manifest]
    Runner -.->|runs Inspect eval| MCP[(Web3 MCP server)]

    Score --> HasCAI{CAI configured?}
    HasCAI -->|yes| CAI[Confidential AI TEE<br/>POST /v1/inference + poll/callback]
    HasCAI -->|no| Pub
    CAI -->|inference_id + transcript_hash| Pub

    Pub["POST /eval/publish, then poll until published"] --> Walrus[(Walrus: manifest + raw .eval log)]
    Walrus --> HasReg{registry set?}
    HasReg -->|yes| Arc[writeToArc<br/>updateCapabilityScore + recordAttestation]
    HasReg -->|no| Done([done])
    Arc --> Registry[(MCPRegistry on Arc)]
    Registry --> ENS[ENS records point at Walrus + registry]
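
The kick-off-then-poll pattern and the credential-based skipping shown in the flowchart can be sketched in Python as below. The real workflow is TypeScript; the helper names, the `status` field, and the `failed` state are assumptions made for illustration.

```python
import time
from typing import Callable, Collection, List


def poll_until(
    fetch_run: Callable[[], dict],
    done_states: Collection[str],
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an async run until its status reaches one of done_states."""
    for _ in range(max_attempts):
        run = fetch_run()
        if run["status"] in done_states:
            return run
        if run["status"] == "failed":  # assumed terminal error state
            raise RuntimeError("eval run failed")
        sleep(interval_s)
    raise TimeoutError("run did not finish within the polling budget")


def planned_steps(cai_configured: bool, registry_set: bool) -> List[str]:
    """List the pipeline steps that run for a given configuration."""
    steps = ["score"]
    if cai_configured:
        steps.append("attest")  # skipped when CAI credentials are absent
    steps.append("publish")
    if registry_set:
        steps.append("write_to_arc")  # skipped when no registry is set
    return steps
```

With neither CAI nor a registry configured, `planned_steps(False, False)` reduces to scoring and publishing, which matches the claim that the pipeline can be simulated without secrets.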

x402 lookup + payment (Arc)

An agent asks the marketplace for the best MCP for a capability. The first call returns a 402 with a USDC price (it scales with min_score); the agent pays in USDC on Arc and retries with an X-PAYMENT header. The marketplace then builds a score index from the registry + Walrus and returns the top match.

sequenceDiagram
    participant Agent as lookup_agent.py
    participant Market as marketplace-mcp (x402)
    participant Reg as MCPRegistry (Arc)
    participant Wal as Walrus

    Agent->>Market: POST /tools/lookup (capability, min_score)
    Market-->>Agent: 402 Payment Required (price_usdc, payee, network arc-testnet)
    Note over Agent: pay USDC on Arc
    Agent->>Market: POST /tools/lookup + X-PAYMENT header

    Note over Market: _load_index builds the score index
    Market->>Reg: list_agent_ids + getCapabilityScore per capability
    Market->>Wal: download_json(manifest blob)
    Market->>Market: filter by min_score, sort by composite

    Market-->>Agent: results[] top MCP (ens_name, mcp_endpoint, composite,<br/>attestation_id, transcript_hash) + payment_settled
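
The challenge-pay-retry loop from the agent's side might look like the sketch below. Transport and payment are injected as callables so the flow runs without a network; the function names, the opaque payment proof string, and the fake price and payee values are illustrative assumptions, not the behavior of lookup_agent.py.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]
Post = Callable[[dict, Dict[str, str]], Response]  # (body, headers) -> (status, json)
Pay = Callable[[str, str], str]                    # (price_usdc, payee) -> payment proof


def lookup_with_payment(post: Post, pay: Pay, capability: str, min_score: float) -> dict:
    """Call the lookup tool, paying and retrying once if challenged with 402."""
    body = {"capability": capability, "min_score": min_score}
    status, payload = post(body, {})
    if status == 402:
        # The 402 payload carries the price and payee; pay, then retry with the proof.
        proof = pay(payload["price_usdc"], payload["payee"])
        status, payload = post(body, {"X-PAYMENT": proof})
    if status != 200:
        raise RuntimeError(f"lookup failed with HTTP {status}")
    return payload


# In-memory stand-ins for the marketplace and the wallet (illustrative values only).
def fake_market(body: dict, headers: Dict[str, str]) -> Response:
    if "X-PAYMENT" not in headers:
        return 402, {"price_usdc": "0.001", "payee": "0xPAYEE", "network": "arc-testnet"}
    return 200, {"results": [{"ens_name": "example-mcp.eth", "composite": 0.93}],
                 "payment_settled": True}


def fake_pay(price_usdc: str, payee: str) -> str:
    return f"paid:{price_usdc}:{payee}"


result = lookup_with_payment(fake_market, fake_pay, "quote", 0.9)
```

Retrying at most once keeps the agent from looping on a server that keeps rejecting the payment; a second non-200 response surfaces as an error instead.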

Setup

Prerequisites

  • Python 3.12, managed with uv (no pip)
  • bun for the web app and CRE TypeScript workflow
  • foundry (forge, cast) for contracts and wallet generation
  • An LLM API key (e.g. Anthropic) and reachable Web3 MCP endpoints

Install

# Python toolchain + workspace
uv python install 3.12
uv sync --all-packages

# Credentials — copy and fill in
cp .env.example .env

Or bootstrap a demo machine (generates a cast wallet, sets MCP URLs, runs uv sync):

chmod +x scripts/setup_eval_env.sh
./scripts/setup_eval_env.sh          # full bootstrap
./scripts/setup_eval_env.sh --check  # prerequisites only

Eval chain defaults: Base (8453) for quote evals; Fraxtal (252) for odos_swap. Fund EVM_EVAL_ADDRESS on Base (and Fraxtal for Odos swaps). ENS identity uses Sepolia separately.

Run

# Unit tests
uv run pytest packages/ -v

# Run an eval against a live MCP (needs LLM key + MCP endpoints in .env)
uv run inspect eval goldenmcp/lifi_quote --model anthropic/claude-3-5-haiku-20241022
uv run inspect eval goldenmcp/odos_quote --model anthropic/claude-3-5-haiku-20241022

# eval-runner HTTP service (the API the CRE workflow calls)
uv run python -m goldenmcp_eval_runner

# Marketplace MCP (x402-gated lookup)
uv run python -m goldenmcp_marketplace

# x402 lookup agent demo (needs Arc wallet + x402)
uv run python demo/lookup_agent.py --capability quote --min-score 0.9

# Web demo (leaderboard, eval viewer, ENS resolver)
cd apps/web && bun install && bun run dev

Walrus + Inspect View

GoldenMCP stores eval logs on Walrus with an indexed walrus:// path (S3-style keys over content-addressed blobs). After the first upload, set WALRUS_INDEX_BLOB_ID in .env from the walrus_index_blob_id field printed by post_eval_walrus.py.

# Upload scored eval + raw Inspect log bytes
uv run python scripts/post_eval_walrus.py --mcp lifi --capability quote --log ./logs/your-run.json

# List logs from Walrus (same as s3:// log-dir)
uv run inspect view start --log-dir walrus://evals/goldenmcp

Inspect View requires native .eval / JSON log files at indexed paths — not score-manifest JSON alone.
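
The idea of S3-style keys layered over content-addressed blobs can be pictured with a small in-memory index. The class and method names here are hypothetical and only illustrate the mapping; the actual fsspec adapter and its index format may differ.

```python
from typing import Dict, List


class WalrusIndex:
    """Maps human-readable, S3-style keys to content-addressed blob IDs."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, key: str, blob_id: str) -> None:
        """Record (or overwrite) which blob a key currently points at."""
        self._entries[key.strip("/")] = blob_id

    def resolve(self, key: str) -> str:
        """Return the blob ID for a key, raising KeyError if it is unknown."""
        return self._entries[key.strip("/")]

    def ls(self, prefix: str) -> List[str]:
        """List keys under a prefix, like listing an s3:// log directory."""
        norm = prefix.strip("/") + "/"
        return sorted(k for k in self._entries if k.startswith(norm))


# Usage: two logs under the same log-dir prefix.
index = WalrusIndex()
index.put("evals/goldenmcp/run-b.eval", "blob-2")
index.put("evals/goldenmcp/run-a.eval", "blob-1")
listing = index.ls("evals/goldenmcp")
```

Because blobs are immutable and addressed by content, the index is the only mutable piece, which is consistent with the README asking you to record a new WALRUS_INDEX_BLOB_ID after the first upload.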

Scoring

Dimension        Weight
DataScore        0.45
PathScore        0.35
TokenEfficiency  0.20

Binary fail (composite 0.0) on prompt injection, disallowed tools, or policy violations.
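
One plausible reading of the table is a weighted sum with a hard override for binary failures. The sketch below assumes each dimension is already normalized to the range 0 to 1 and that the composite is a plain weighted sum; docs/scoring.md is the authoritative definition, and the function name is hypothetical.

```python
WEIGHTS = {"data": 0.45, "path": 0.35, "token_efficiency": 0.20}


def composite_score(data: float, path: float, token_efficiency: float,
                    binary_fail: bool = False) -> float:
    """Weighted composite; any binary fail forces the score to 0.0."""
    if binary_fail:
        # Prompt injection, disallowed tools, or policy violations zero the run.
        return 0.0
    for value in (data, path, token_efficiency):
        if not 0.0 <= value <= 1.0:
            raise ValueError("dimension scores are assumed to lie in [0, 1]")
    return (WEIGHTS["data"] * data
            + WEIGHTS["path"] * path
            + WEIGHTS["token_efficiency"] * token_efficiency)
```

For example, scores of 0.8, 1.0, and 0.5 combine to 0.36 + 0.35 + 0.10 = 0.81, while the same run with a binary fail scores 0.0.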

See docs/scoring.md.

Structure

packages/inspect-web3     Inspect tasks + scorers
packages/walrus-client    walrus:// fsspec + HTTP client
packages/marketplace-mcp  x402 MCP server
packages/identity         ENS + registry SDK
packages/eval-runner      HTTP service for CRE
apps/web                  Leaderboard, eval viewer, ENS resolver
workflows/eval-pipeline   Chainlink CRE workflow
contracts/mcp-registry    ERC-8004-inspired MCP registry (Arc)

Architecture overview: docs/architecture.md. All implementation plans: docs/plans/.

License

MIT
