clbench-fireworks-rft

Enables reinforcement fine-tuning of Qwen3-8B on the CLBench poker task using the eval-protocol MCP-Gym framework on Fireworks infrastructure, with structured tool calls via MCP.

Reinforcement Fine-Tuning of Qwen3-8B on the CLBench exploitable_poker task, running on Fireworks infrastructure via eval-protocol MCP-Gym.

This is a port of the sr-networks/clbench-verifiers GRPO setup (Will Brown's verifiers framework + PrimeIntellect hosted training) onto Fireworks RFT. The CLBench poker simulator, action parsing, and reward shaping carry over unchanged; only the RL-framework glue is rewritten.


Why a port and not a copy

verifiers and Fireworks RFT are different abstractions. What transfers vs. what changes:

| verifiers / Prime (upstream repo) | Fireworks RFT (this repo) | status |
| --- | --- | --- |
| CLBenchEnv(vf.MultiTurnEnv) (env.py) | poker_adapter.py + poker_mcp.py MCP gym | ported |
| task.reset()/.step()/.get_instance_outcomes() | PokerEnv wrapper + poker_act tool | ported |
| rubric.py reward fns | reward.py + test_poker_rft.py evaluator | ported |
| parsing.py + guided_json (vLLM) | MCP tool params are the PokerAction schema | obsolete (tool-calling enforces structure) |
| cl-benchmark poker task | imported unchanged | no change |
| vf.RLTrainer GRPO + local vLLM | firectl create reinforcement-fine-tuning-job | platform |
| TOML configs + prime train | RFT job flags (see Launch) | re-expressed |

Key win: because the poker_act tool's parameters are exactly the PokerAction fields (action / thinking / amount), the model emits structured tool calls and malformed-JSON parse failures become impossible — the old guided_json + parse_failure_penalty machinery is no longer needed.
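
To illustrate why a tool schema can replace the parsing step, here is a minimal sketch of a validator over the three fields named above. The action vocabulary (`fold`, `call`, `raise`), the dataclass layout, and the `validate_action` helper are assumptions for illustration, not the repo's actual definitions. The `-1` default for `amount` mirrors the sentinel described under Gotchas.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary; the real PokerAction schema may differ.
ALLOWED_ACTIONS = {"fold", "call", "raise"}
NO_AMOUNT = -1  # plain int sentinel rather than Optional[int]


@dataclass(frozen=True)
class PokerAction:
    action: str
    thinking: str = ""
    amount: int = NO_AMOUNT


def validate_action(args: dict) -> PokerAction:
    """Build a PokerAction from tool-call arguments, rejecting illegal ones."""
    act = PokerAction(**args)  # unknown keys raise TypeError
    if act.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {act.action!r}")
    if act.action == "raise" and act.amount <= 0:
        raise ValueError("raise needs a positive amount")
    return act


print(validate_action({"action": "raise", "thinking": "value bet", "amount": 40}))
```

With this shape, the arguments arrive already split into named fields, so the only remaining checks are semantic ones (is the action legal?), not syntactic ones (is the JSON well formed?).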


Files

| file | role |
| --- | --- |
| poker_adapter.py | PokerEnv + PokerAdapter: wraps the CLBench task (the env.py port) |
| poker_mcp.py | PokerMcp(McpGym): registers the poker_act tool, control-plane reward/termination |
| server.py | MCP-Gym server launcher (python server.py --port N) |
| reward.py | evaluator scoring (the rubric.py port): mean instance reward + illegal-action penalty |
| test_poker_rft.py | @evaluation_test binding dataset + gym + model + reward |
| make_dataset.py | generates poker_dataset.jsonl (one EvaluationRow per seed) |
| poker_dataset.jsonl | 64-seed training dataset (regenerate with make_dataset.py) |
| requirements.txt | deps Fireworks installs into the rollout container |
| validate_connection.py | optional connectivity/structured-output smoke test (needs a served model) |
| setup.sh | installs deps + creates the bin/python 3.11 shim |
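
The scoring rule listed for reward.py (mean instance reward plus an illegal-action penalty) could be sketched as follows. This is not the repo's code: the penalty weight, the per-action subtraction, and the `score_rollout` name are assumptions chosen only to make the idea concrete.

```python
from statistics import fmean

# Hypothetical weight; the value used in reward.py is not stated here.
ILLEGAL_ACTION_PENALTY = 0.1


def score_rollout(instance_rewards: list[float], illegal_actions: int) -> float:
    """Mean reward over poker instances, minus a penalty per illegal action."""
    if not instance_rewards:
        return 0.0  # assumption: an empty rollout scores zero
    mean_reward = fmean(instance_rewards)
    return mean_reward - ILLEGAL_ACTION_PENALTY * illegal_actions


print(score_rollout([1.0, 0.0, 0.5], illegal_actions=0))
print(score_rollout([1.0, 0.0, 0.5], illegal_actions=2))
```

Whether the real evaluator clamps the result or scales the penalty differently is not covered by this sketch.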

Setup

./setup.sh                                  # deps + bin/python shim
export FIREWORKS_API_KEY="fw_..."           # https://fireworks.ai/account/api-keys
firectl set-api-key "$FIREWORKS_API_KEY"
export PATH="$PWD/bin:$PATH"                 # python3.11 shim first (gym spawns `python server.py`)
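
Because the gym spawns `python server.py`, it can help to confirm which interpreter `python` resolves to after the PATH change. The following is a small sketch of such a check; the helper names are hypothetical and it is not part of setup.sh.

```python
import shutil
import subprocess


def interpreter_version(exe: str) -> tuple[int, int]:
    """Ask an interpreter for its (major, minor) version."""
    # sys.stdout.write with %-formatting behaves the same on Python 2 and 3.
    probe = "import sys; sys.stdout.write('%d %d' % sys.version_info[:2])"
    out = subprocess.run(
        [exe, "-c", probe], capture_output=True, text=True, check=True
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


def check_python_on_path() -> str:
    """Describe what a bare `python` command would run."""
    exe = shutil.which("python")
    if exe is None:
        return "no `python` on PATH"
    major, minor = interpreter_version(exe)
    return f"{exe} -> Python {major}.{minor}"


print(check_python_on_path())
```

If the reported path is not the bin/python shim, the gym's subprocess would start under a different interpreter than the one holding the dependencies.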

firectl (Go binary) install: brew tap fw-ai/firectl && brew trust fw-ai/firectl && brew install fw-ai/firectl/firectl


Launch an RFT job

Two paths. The direct firectl path is what we actually used (it avoids a CLI bug — see Gotchas).

A) Upload the evaluator, then launch with firectl ← used

# 1. upload the evaluator (env + reward) so Fireworks builds the rollout container
eval-protocol create rft \
  --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --dry-run --skip-validation -y          # uploads evaluator; ignore the poller timeout

# 2. confirm evaluator is ACTIVE, upload dataset, create the job
firectl create dataset clbench-poker-qwen3-8b-data poker_dataset.jsonl
firectl create reinforcement-fine-tuning-job \
  --base-model accounts/fireworks/models/qwen3-8b \
  --dataset clbench-poker-qwen3-8b-data \
  --evaluator accounts/<ACCOUNT>/evaluators/test-poker-rftpytest-poker-rft \
  --output-model clbench-poker-qwen3-8b \
  --epochs 2 --learning-rate 1e-6 --temperature 1.0 \
  --max-output-tokens 1024 --response-candidates-count 8

B) Pure eval-protocol (once the poller bug is fixed upstream)

eval-protocol create rft --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --training-config-output-model clbench-poker-qwen3-8b \
  --training-config-epochs 2 --training-config-learning-rate 1e-6 \
  --inference-parameters-temperature 1.0 --inference-parameters-max-output-tokens 1024 \
  --inference-parameters-response-candidates-count 8

Config mapping from the Prime TOML

rollouts_per_example=8 → --response-candidates-count 8 (GRPO group size) · temperature=1.0 → --temperature 1.0 · max_tokens=1024 → --max-output-tokens 1024 · enable_thinking=false is baked into the gym prompt · guided_json → tool-call schema (free).
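
As a rough illustration of this mapping, the sketch below turns a dict of Prime-style keys into the firectl flags used in the Launch section. The `to_firectl_flags` helper and the dict layout are hypothetical. Keys without a flag equivalent (such as enable_thinking) are skipped, since they are handled in the gym prompt or the tool schema.

```python
# Left: key names quoted from the Prime TOML. Right: firectl flags from Launch.
FLAG_FOR_KEY = {
    "rollouts_per_example": "--response-candidates-count",
    "temperature": "--temperature",
    "max_tokens": "--max-output-tokens",
}


def to_firectl_flags(cfg: dict) -> list[str]:
    """Translate known config keys into a flat list of CLI arguments."""
    flags: list[str] = []
    for key, flag in FLAG_FOR_KEY.items():
        if key in cfg:
            flags += [flag, str(cfg[key])]
    return flags


prime_cfg = {
    "rollouts_per_example": 8,
    "temperature": 1.0,
    "max_tokens": 1024,
    "enable_thinking": False,  # no flag: handled in the gym prompt
}
print(" ".join(to_firectl_flags(prime_cfg)))
```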


Training runs

See RUNS.md for the full log. Summary:

| run | job id | base | output model | epochs | candidates | status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | hj1u6nxa | qwen3-8b (free) | clbench-poker-qwen3-8b | 2 | 8 | launched 2026-06-25, RUNNING |

Monitor: firectl get reinforcement-fine-tuning-job <job-id> · dashboard: https://app.fireworks.ai/dashboard


Gotchas (hard-won)

  • from __future__ import annotations breaks eval-protocol. It stringifies annotations, so FastMCP tool registration (issubclass("str", Context)) and the @evaluation_test signature validator both fail. Do not use it in poker_mcp.py or test_poker_rft.py.
  • FastMCP (this version) crashes on Optional[int] tool params while locating the Context arg. poker_act uses a plain int = -1 sentinel instead.
  • firectl needs firectl set-api-key; it does not read FIREWORKS_API_KEY automatically. (firectl whoami additionally needs OIDC signin — ignore it.)
  • eval-protocol create rft has a poller bug: it polls …/evaluators/<file>.py::<func> — the .py:: makes the URL malformed → HTTP 400 → false 10-minute "evaluator not ready" timeout. The evaluator is actually ACTIVE; launch via firectl.
  • macOS python is often 2.7. The gym spawns python server.py, so bin/python must shim to the 3.11 interpreter that has the deps and be first on PATH.
  • Rollouts run on Fireworks, in a container built from requirements.txt — so a local serverless deployment of the base model is not required for training (only for local pytest rollouts).
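
The first gotcha can be reproduced with the standard library alone. The sketch below compiles the same function twice, once with and once without the future import, and compares what ends up in `__annotations__`. The `poker_act` signature here is a stand-in, not the real tool definition.

```python
SOURCE = """
{header}
def poker_act(action: str, amount: int = -1) -> str:
    return action
"""


def annotation_for(header: str):
    """Compile the stand-in function and return its `action` annotation."""
    namespace: dict = {}
    exec(SOURCE.format(header=header), namespace)
    return namespace["poker_act"].__annotations__["action"]


eager = annotation_for("")
deferred = annotation_for("from __future__ import annotations")

print(eager is str)    # True: the annotation is the class itself
print(repr(deferred))  # 'str': the annotation is now a plain string

try:
    issubclass(deferred, object)  # same kind of check a registry might run
except TypeError as exc:
    print("issubclass fails:", exc)
```

Any library that inspects annotations with `issubclass` and does not first resolve them (for example via `typing.get_type_hints`) would hit the same TypeError.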
