
clbench-fireworks-rft

Enables reinforcement fine-tuning of Qwen3-8B on the CLBench poker task using the eval-protocol MCP-Gym framework on Fireworks infrastructure, with structured tool calls via MCP.


Reinforcement Fine-Tuning of Qwen3-8B on the CLBench exploitable_poker task, running on Fireworks infrastructure via eval-protocol MCP-Gym.

This is a port of the sr-networks/clbench-verifiers GRPO setup (Will Brown's verifiers framework + PrimeIntellect hosted training) onto Fireworks RFT. The CLBench poker simulator, action parsing, and reward shaping carry over unchanged; only the RL-framework glue is rewritten.

Why a port and not a copy

verifiers and Fireworks RFT are different abstractions. What transfers vs. what changes:

| `verifiers` / Prime (upstream repo) | Fireworks RFT (this repo) | Status |
|---|---|---|
| `CLBenchEnv(vf.MultiTurnEnv)` (`env.py`) | `poker_adapter.py` + `poker_mcp.py` MCP gym | ported |
| `task.reset()/.step()/.get_instance_outcomes()` | `PokerEnv` wrapper + `poker_act` tool | ported |
| `rubric.py` reward functions | `reward.py` → `test_poker_rft.py` evaluator | ported |
| `parsing.py` + `guided_json` (vLLM) | MCP tool params are the `PokerAction` schema | obsolete (tool calling enforces structure) |
| `cl-benchmark` poker task | imported unchanged | no change |
| `vf.RLTrainer` GRPO + local vLLM | `firectl create reinforcement-fine-tuning-job` | replaced by the platform |
| TOML configs + `prime train` | RFT job flags (see Launch) | re-expressed |

Key win: the `poker_act` tool's parameters are exactly the `PokerAction` fields (`action` / `thinking` / `amount`), so the model emits structured tool calls instead of free-text JSON. Malformed-JSON parse failures therefore no longer occur, and the old `guided_json` + `parse_failure_penalty` machinery is no longer needed.
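To make the "parameters are the schema" idea concrete, here is a framework-free sketch of the argument shape. It uses the plain `int = -1` sentinel from Gotchas rather than `Optional[int]`. Several details are assumptions: the action vocabulary (`fold`/`check`/`call`/`raise`), the field types, and the validation rules. The real tool is registered on `PokerMcp` through FastMCP, which is not shown here.

```python
from dataclasses import dataclass

# Assumed action vocabulary; the real CLBench task may use different names.
VALID_ACTIONS = {"fold", "check", "call", "raise"}
NO_AMOUNT = -1  # sentinel used instead of Optional[int] (see Gotchas)


@dataclass(frozen=True)
class PokerAction:
    action: str
    thinking: str
    amount: int = NO_AMOUNT


def poker_act(action: str, thinking: str = "", amount: int = NO_AMOUNT) -> PokerAction:
    """Hypothetical tool body: tool-call arguments map 1:1 onto PokerAction fields."""
    action = action.strip().lower()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action == "raise" and amount <= 0:
        raise ValueError("raise needs a positive amount")
    if action != "raise":
        amount = NO_AMOUNT  # drop stray amounts on non-raise actions
    return PokerAction(action=action, thinking=thinking, amount=amount)
```

With typed arguments there is no JSON string left to parse. The only remaining checks are semantic ones, such as an unknown action or a raise with no amount. An environment could score those as illegal actions rather than parse failures.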

Files

| File | Role |
|---|---|
| `poker_adapter.py` | `PokerEnv` + `PokerAdapter`: wraps the CLBench task (the `env.py` port) |
| `poker_mcp.py` | `PokerMcp(McpGym)`: registers the `poker_act` tool; control-plane reward/termination |
| `server.py` | MCP-Gym server launcher (`python server.py --port N`) |
| `reward.py` | evaluator scoring (the `rubric.py` port): mean instance reward + illegal-action penalty |
| `test_poker_rft.py` | `@evaluation_test` binding dataset + gym + model + reward |
| `make_dataset.py` | generates `poker_dataset.jsonl` (one EvaluationRow per seed) |
| `poker_dataset.jsonl` | 64-seed training dataset (regenerate with `make_dataset.py`) |
| `requirements.txt` | deps Fireworks installs into the rollout container |
| `validate_connection.py` | optional connectivity/structured-output smoke test (needs a served model) |
| `setup.sh` | installs deps + creates the `bin/python` 3.11 shim |
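The `reward.py` row ("mean instance reward + illegal-action penalty") can be read as the function below. This is a sketch only. The function name `score_episode`, the flat 0.1 penalty per illegal action, and treating an empty episode as 0.0 are hypothetical choices; `reward.py` has the real weights.

```python
from statistics import fmean


def score_episode(instance_rewards: list[float], illegal_actions: int,
                  penalty_per_illegal: float = 0.1) -> float:
    """Hypothetical scoring: mean per-instance reward minus a flat penalty per illegal action."""
    # No completed instances: start from a neutral 0.0 before applying penalties.
    base = fmean(instance_rewards) if instance_rewards else 0.0
    return base - penalty_per_illegal * illegal_actions
```

Averaging over instances keeps the score on the same scale regardless of how many hands a seed plays. The penalty term then pushes the policy away from moves the simulator rejects.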

Setup

```bash
./setup.sh                                  # deps + bin/python shim
export FIREWORKS_API_KEY="fw_..."           # https://fireworks.ai/account/api-keys
firectl set-api-key "$FIREWORKS_API_KEY"
export PATH="$PWD/bin:$PATH"                # python3.11 shim first (gym spawns `python server.py`)
```

firectl (Go binary) install, needed before the `firectl set-api-key` step: brew tap fw-ai/firectl && brew trust fw-ai/firectl && brew install fw-ai/firectl/firectl
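The gym launches the server as a bare `python server.py`. A quick way to confirm the shim is in effect is to resolve `python` on `PATH` and ask that interpreter for its version. This is a diagnostic sketch that is not part of the repo. The helper name `python_on_path` is hypothetical, and the 3.11 floor mirrors the shim described above.

```python
import shutil
import subprocess

# sys.stdout.write works on both Python 2 and 3, unlike print(a, b).
VERSION_PROBE = "import sys; sys.stdout.write('%d %d' % sys.version_info[:2])"


def python_on_path(minimum: tuple = (3, 11)) -> str:
    """Return the `python` the gym would spawn; raise RuntimeError if it is missing or too old."""
    exe = shutil.which("python")
    if exe is None:
        raise RuntimeError('no `python` on PATH; did you export PATH="$PWD/bin:$PATH"?')
    major, minor = subprocess.run(
        [exe, "-c", VERSION_PROBE], capture_output=True, text=True, check=True
    ).stdout.split()
    found = (int(major), int(minor))
    if found < tuple(minimum):
        raise RuntimeError(f"{exe} is Python {found[0]}.{found[1]}; need {minimum[0]}.{minimum[1]}+")
    return exe
```

Call `python_on_path()` from the same shell where you exported `PATH`. A `RuntimeError` means the gym would start the wrong interpreter.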

Launch an RFT job

There are two paths. The direct firectl path (A) is the one we actually used, because it avoids an eval-protocol CLI bug (see Gotchas).

A) Upload the evaluator, then launch with firectl ← used

```bash
# 1. upload the evaluator (env + reward) so Fireworks builds the rollout container
eval-protocol create rft \
  --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --dry-run --skip-validation -y          # uploads evaluator; ignore the poller timeout

# 2. confirm evaluator is ACTIVE, upload dataset, create the job
firectl create dataset clbench-poker-qwen3-8b-data poker_dataset.jsonl
firectl create reinforcement-fine-tuning-job \
  --base-model accounts/fireworks/models/qwen3-8b \
  --dataset clbench-poker-qwen3-8b-data \
  --evaluator accounts/<ACCOUNT>/evaluators/test-poker-rftpytest-poker-rft \
  --output-model clbench-poker-qwen3-8b \
  --epochs 2 --learning-rate 1e-6 --temperature 1.0 \
  --max-output-tokens 1024 --response-candidates-count 8
```
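When relaunching with different hyperparameters, it can help to build the step-2 command from a small config instead of editing flags by hand. The sketch below reuses only the flags shown above. The `build_rft_command` and `as_shell` helpers and the config keys are hypothetical; the keys follow the Prime TOML names where one exists.

```python
import shlex

# Hypothetical run config mirroring run 1.
RUN1 = {
    "base_model": "accounts/fireworks/models/qwen3-8b",
    "dataset": "clbench-poker-qwen3-8b-data",
    "evaluator": "test-poker-rftpytest-poker-rft",
    "output_model": "clbench-poker-qwen3-8b",
    "epochs": 2,
    "learning_rate": "1e-6",  # kept as a string so the flag matches the command above verbatim
    "temperature": 1.0,
    "max_tokens": 1024,
    "rollouts_per_example": 8,
}


def build_rft_command(cfg: dict, account: str) -> list:
    """Assemble the `firectl create reinforcement-fine-tuning-job` argv from step 2."""
    return [
        "firectl", "create", "reinforcement-fine-tuning-job",
        "--base-model", cfg["base_model"],
        "--dataset", cfg["dataset"],
        "--evaluator", f"accounts/{account}/evaluators/{cfg['evaluator']}",
        "--output-model", cfg["output_model"],
        "--epochs", str(cfg["epochs"]),
        "--learning-rate", str(cfg["learning_rate"]),
        "--temperature", str(cfg["temperature"]),
        "--max-output-tokens", str(cfg["max_tokens"]),
        "--response-candidates-count", str(cfg["rollouts_per_example"]),  # GRPO group size
    ]


def as_shell(cmd: list) -> str:
    """Quote the argv for a copy-pasteable preview before running it."""
    return shlex.join(cmd)
```

Print `as_shell(build_rft_command(RUN1, "<ACCOUNT>"))` to review the command. Then pass the list to `subprocess.run(..., check=True)` once it looks right.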

B) Pure eval-protocol (once the poller bug is fixed upstream)

```bash
eval-protocol create rft --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --training-config-output-model clbench-poker-qwen3-8b \
  --training-config-epochs 2 --training-config-learning-rate 1e-6 \
  --inference-parameters-temperature 1.0 --inference-parameters-max-output-tokens 1024 \
  --inference-parameters-response-candidates-count 8
```

Config mapping from the Prime TOML

- `rollouts_per_example=8` → `--response-candidates-count 8` (the GRPO group size)
- `temperature=1.0` → `--temperature 1.0`
- `max_tokens=1024` → `--max-output-tokens 1024`
- `enable_thinking=false` → baked into the gym prompt (no flag)
- `guided_json` → comes for free with the tool-call schema (no flag)
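Why `--response-candidates-count` acts as the GRPO group size: for each prompt (here, one poker seed) the trainer samples that many rollouts and scores each one relative to its own group. The sketch below uses the textbook GRPO normalization, advantage = (reward − group mean) / (group std + eps), with the population std. It is an illustration only, not Fireworks' internal implementation, whose exact normalization is not documented here.

```python
from statistics import fmean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]
```

If all 8 candidates for a seed earn the same reward, every advantage is zero and that seed contributes no learning signal. A larger group makes it more likely that at least one rollout differs.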

Training runs

See RUNS.md for the full log. Summary:

| Run | Job ID | Base | Output model | Epochs | Candidates | Status |
|---|---|---|---|---|---|---|
| 1 | `hj1u6nxa` | qwen3-8b (free) | clbench-poker-qwen3-8b | 2 | 8 | launched 2026-06-25, RUNNING |

Monitor: firectl get reinforcement-fine-tuning-job <job-id> · dashboard: https://app.fireworks.ai/dashboard

Gotchas (hard-won)

- `from __future__ import annotations` breaks eval-protocol. It turns every annotation into a string, so FastMCP tool registration (which ends up calling `issubclass("str", Context)`) and the `@evaluation_test` signature validator both fail. Do not use it in `poker_mcp.py` or `test_poker_rft.py`.
- FastMCP (this version) crashes on `Optional[int]` tool params while locating the `Context` arg, so `poker_act` uses a plain `int = -1` sentinel instead.
- firectl does not read `FIREWORKS_API_KEY` automatically; run `firectl set-api-key` first. (`firectl whoami` additionally needs OIDC sign-in; you can ignore it.)
- `eval-protocol create rft` has a poller bug: it polls `…/evaluators/<file>.py::<func>`, and the `.py::` makes the URL malformed. The result is an HTTP 400 and a false 10-minute "evaluator not ready" timeout, even though the evaluator is already ACTIVE. Launch via firectl instead.
- On macOS, `python` is often Python 2.7 or missing entirely. The gym spawns `python server.py`, so `bin/python` must shim to the 3.11 interpreter that has the deps, and `bin/` must be first on `PATH`.
- Rollouts run on Fireworks in a container built from `requirements.txt`, so a local serverless deployment of the base model is not required for training (only for local pytest rollouts).
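The first gotcha is easy to reproduce without eval-protocol or FastMCP. The sketch compiles the same function with and without `from __future__ import annotations` (PEP 563). With the import, the parameter annotation changes from the class `str` to the string `'str'`. Any registration code that calls `issubclass(...)` on such an annotation then raises `TypeError`. `Context` here is a local stand-in class, not FastMCP's, and `raw_annotation` is a hypothetical helper.

```python
import inspect

# A tiny stand-in module; `Context` is a local class, not FastMCP's.
SOURCE = """
class Context:
    pass

def poker_act(action: str, amount: int = -1, ctx: Context = None) -> str:
    return action
"""


def raw_annotation(source: str, param: str, postponed: bool) -> object:
    """Compile `source`, optionally under PEP 563, and return one parameter's raw annotation."""
    if postponed:
        source = "from __future__ import annotations\n" + source
    namespace: dict = {}
    # dont_inherit=True so the caller's own __future__ flags cannot leak in.
    exec(compile(source, "<sketch>", "exec", dont_inherit=True), namespace)
    return inspect.signature(namespace["poker_act"]).parameters[param].annotation


def is_class_annotation(annotation: object) -> bool:
    """False means an issubclass() check on this annotation would raise TypeError."""
    return inspect.isclass(annotation)
```

`raw_annotation(SOURCE, "action", postponed=False)` returns the class `str`, while `postponed=True` returns the string `'str'`. Calling `issubclass('str', object)` on the latter raises `TypeError`, which is the same failure mode as the tool-registration crash.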
