claude-code-codex-agents

Enables Claude Code to delegate tasks to OpenAI's Codex CLI (GPT-5.4) with structured execution traces, parallel execution, session persistence, and adversarial code review.

A Japanese version of this README is also available.

Give Claude Code structured Codex traces, not raw output.

For Claude Code users who want GPT-5.4 as a real tool: claude-code-codex-agents parses the entire JSONL event stream from Codex CLI and returns a structured execution report -- which tools it used, which files it touched, how long it took, and what went wrong. No other Codex MCP bridge does this.

Architecture Overview

graph LR
    A["Claude Code<br/>(Opus 4.6)"] -->|MCP Protocol| B["claude-code-codex-agents<br/>MCP Server"]
    B -->|"subprocess + stdin"| C[Codex CLI]
    C -->|JSONL stream| B
    C -->|API call| D["OpenAI API<br/>(GPT-5.4)"]
    B -->|Structured Report| A

Without vs With claude-code-codex-agents

Without -- You call Codex CLI and get a wall of text. You don't know which tools it used, which files it changed, or whether it actually succeeded.

With claude-code-codex-agents -- Claude Code gets a structured execution trace:

[Codex gpt-5.4] Completed

⏱ Execution time: 8.3s
🧵 Thread: 019d436e-4c39-7093-b7ed-f8a26aca7938

📦 Tools used (3):
  ✅ read_file — src/auth.py
  ✅ edit_file — src/auth.py
  ✅ shell — python -m pytest tests/

📁 Files touched (1):
  • src/auth.py

━━━ Codex Response ━━━
Fixed the authentication logic. Token validation order was incorrect.
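
A report like the one above could be assembled from the event stream roughly as follows. This is a minimal sketch, not the project's `parse_jsonl_events()`: the event schema (`type`, `tool`, `target`, `ok`, `text`, `message`) and the `Trace` class are assumptions made for illustration, and the real Codex CLI JSONL schema may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Aggregated view of one Codex run (hypothetical structure)."""
    tools: list = field(default_factory=list)   # (tool, target, ok) tuples
    files: list = field(default_factory=list)   # unique file paths, in order seen
    errors: list = field(default_factory=list)
    response: str = ""


def parse_events(stream: str) -> Trace:
    """Fold a JSONL stream into a Trace, skipping lines that are not JSON objects."""
    trace = Trace()
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if not isinstance(event, dict):
            continue
        kind = event.get("type")  # assumed field name
        if kind == "tool_call":
            tool, target = event.get("tool", "?"), event.get("target", "")
            trace.tools.append((tool, target, event.get("ok", True)))
            if tool in ("read_file", "edit_file") and target not in trace.files:
                trace.files.append(target)
        elif kind == "error":
            trace.errors.append(event.get("message", ""))
        elif kind == "message":
            trace.response = event.get("text", "")
    return trace


sample = "\n".join([
    '{"type": "tool_call", "tool": "read_file", "target": "src/auth.py"}',
    '{"type": "tool_call", "tool": "edit_file", "target": "src/auth.py"}',
    'not json',
    '{"type": "tool_call", "tool": "shell", "target": "python -m pytest tests/"}',
    '{"type": "message", "text": "Fixed the authentication logic."}',
])
trace = parse_events(sample)
print(len(trace.tools), trace.files)
```

Skipping malformed lines instead of raising keeps one bad event from discarding the rest of the trace.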

Why claude-code-codex-agents?

There are 6+ Codex MCP bridges on GitHub. Here's what makes this one different:

| | Other bridges | claude-code-codex-agents |
|---|---|---|
| Output | Raw text dump | Structured trace (tools, files, timing, errors) |
| Parallel tasks | 1 at a time | Up to 6 simultaneous |
| Session continuity | Stateless | threadId persistence across calls |
| Security | Pass-through | 3-tier sandbox + terminal injection prevention |
| Tests | Few or none | 59 tests (parsing, security, sessions, edge cases, agent lifecycle) |
| Review | Basic or none | Adversarial Review Loop (GPT-5.4 challenges Claude's code) |

Key Features

  • Full JSONL Trace Parsing -- Every Codex event (tool calls, file ops, errors) parsed into a structured report
  • Parallel Execution -- Run up to 6 Codex tasks simultaneously via parallel_execute
  • Session Management -- Continue previous threads with session_continue (threadId persistence)
  • Agent Lifecycle -- Run Codex as a background Claude Code-style worker via spawn_codex_agent, send_codex_agent_input, and wait_codex_agent
  • Adversarial Review Loop -- GPT-5.4 reviews Claude's code from a different perspective
  • Sandbox Security -- 3-tier policy (read-only / workspace-write / danger-full-access) + terminal injection prevention
  • Cross-Model Discussion -- Get GPT-5.4's opinion on design decisions via discuss
  • Zero External Dependencies -- Just FastMCP + Codex CLI. No databases, no Docker, no config files
  • Japanese Native -- Full Japanese prompt and report support
  • 59 Tests -- Comprehensive coverage including security, parsing, session management, agent lifecycle, and edge cases

Quick Start

1. Install Codex CLI

npm install -g @openai/codex
codex login

2. Install claude-code-codex-agents

git clone https://github.com/tsunamayo7/claude-code-codex-agents.git
cd claude-code-codex-agents
uv sync

3. Add to your MCP client

Claude Code (~/.claude/settings.json):

{
  "mcpServers": {
    "claude-code-codex-agents": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/claude-code-codex-agents", "python", "server.py"],
      "env": { "PYTHONUTF8": "1" }
    }
  }
}

<details> <summary><b>Cursor</b> (~/.cursor/mcp.json)</summary>

{
  "mcpServers": {
    "claude-code-codex-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/claude-code-codex-agents", "python", "server.py"],
      "env": { "PYTHONUTF8": "1" }
    }
  }
}

</details>

<details> <summary><b>VS Code / Windsurf</b></summary>

Add to your MCP settings:

{
  "claude-code-codex-agents": {
    "command": "uv",
    "args": ["run", "--directory", "/path/to/claude-code-codex-agents", "python", "server.py"],
    "env": { "PYTHONUTF8": "1" }
  }
}

</details>

Tools

| Tool | Description | Sandbox |
|---|---|---|
| execute | Delegate tasks to Codex with structured trace report | workspace-write |
| trace_execute | Same as execute, plus full event timeline | workspace-write |
| parallel_execute | Run up to 6 tasks simultaneously | read-only |
| review | Adversarial code review by GPT-5.4 | read-only |
| explain | Code explanation (brief/medium/detailed) | read-only |
| generate | Code generation with optional file output | workspace-write |
| discuss | Get GPT-5.4's perspective on design decisions | read-only |
| session_continue | Continue a previous Codex thread | workspace-write |
| session_list | List session history with thread IDs | - |
| spawn_codex_agent | Launch a background Codex worker with default / explorer / worker roles | role-based |
| send_codex_agent_input | Continue a background Codex worker with follow-up instructions | same as agent |
| wait_codex_agent | Wait for an agent turn and fetch the last structured result | - |
| list_codex_agents | Inspect tracked background Codex agents | - |
| close_codex_agent | Close an idle Codex agent | - |
| status | Check Codex CLI status and auth | - |

Claude Code-Style Agents

The new agent lifecycle tools let Claude Code treat Codex more like a persistent sub-agent than a one-shot CLI call.

  • Use spawn_codex_agent to start a background worker with a role preset: default for balanced execution, explorer for read-heavy investigation, worker for implementation.
  • Use send_codex_agent_input to continue the same worker after you read its last result.
  • Use wait_codex_agent to poll for completion without blocking other work.
  • Use list_codex_agents and close_codex_agent to manage idle workers.

Real-World Example: Adversarial Code Review

Claude Code writes code, then asks GPT-5.4 to review it:

[Codex Review] GPT-5.4 Review Result

⏱ Execution time: 15.7s

━━━ Codex Response ━━━
- [CRITICAL] `run(cmd)` calls `os.system(cmd)` directly -- command injection
  if `cmd` contains user input. Use `subprocess.run([...], shell=False)`.

- [WARNING] `divide(a, b)` raises ZeroDivisionError when b == 0.
  Add a pre-check or explicit error message.

- [INFO] No type hints on function signatures. Add `def divide(a: float,
  b: float) -> float:` for readability.
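
The findings above could be addressed along these lines. The code under review is not shown in this README, so the function bodies here are assumptions; only the names `run` and `divide` and the suggested fixes come from the review text.

```python
import subprocess
import sys


def run(args: list[str]) -> int:
    """Run a command given as an argument list and return its exit code.

    No shell is involved, so shell metacharacters in arguments are not interpreted.
    """
    return subprocess.run(args, shell=False, check=False).returncode


def divide(a: float, b: float) -> float:
    """Divide a by b, failing with an explicit message instead of ZeroDivisionError."""
    if b == 0:
        raise ValueError("divide(): b must be non-zero")
    return a / b


exit_code = run([sys.executable, "-c", "pass"])
print(exit_code, divide(6, 3))
```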

Real-World Example: Parallel Execution

Run multiple analysis tasks simultaneously:

[Parallel Execution Complete] 3 tasks

━━━ Task 1 ✅ ━━━
Instruction: Analyze src/auth.py for security issues
⏱ 5.2s
...

━━━ Task 2 ✅ ━━━
Instruction: Review database query patterns in src/db.py
⏱ 7.8s
...

━━━ Task 3 ✅ ━━━
Instruction: Check error handling in src/api.py
⏱ 4.1s
...
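
One common way to cap concurrency at six is a semaphore around each task. This is a sketch under assumptions, not the project's `parallel_execute`: `run_task` is a hypothetical placeholder for one Codex invocation, and queueing tasks beyond the cap (rather than rejecting them) is a design choice made here for illustration.

```python
import asyncio

MAX_PARALLEL = 6  # the cap stated in this README


async def run_task(instruction: str) -> str:
    """Placeholder for one Codex invocation (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"done: {instruction}"


async def run_all(instructions: list[str]) -> list[str]:
    """Run tasks concurrently, never more than MAX_PARALLEL at once."""
    gate = asyncio.Semaphore(MAX_PARALLEL)

    async def guarded(instruction: str) -> str:
        async with gate:
            return await run_task(instruction)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(i) for i in instructions))


results = asyncio.run(run_all([
    "Analyze src/auth.py for security issues",
    "Review database query patterns in src/db.py",
    "Check error handling in src/api.py",
]))
print(results)
```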

Architecture

sequenceDiagram
    participant C as Claude Code
    participant H as claude-code-codex-agents
    participant X as Codex CLI
    participant O as OpenAI API

    C->>H: MCP tool call (execute)
    H->>H: _validate() + _enforce_sandbox()
    H->>X: subprocess (stdin prompt)
    X->>O: API request (GPT-5.4)
    O-->>X: Response
    X-->>H: JSONL event stream
    H->>H: parse_jsonl_events() → CodexTrace
    H->>H: _sanitize() → format_report()
    H-->>C: Structured report

Security Model

| Sandbox Mode | File Write | Shell Exec | Use Case |
|---|---|---|---|
| read-only | Blocked | Blocked | Review, explain, discuss |
| workspace-write | CWD only | Allowed | Execute, generate |
| danger-full-access | Anywhere | Allowed | Full system access (use with caution) |

Additional protections:

  • ANSI/OSC escape sequence sanitization (terminal injection prevention)
  • Input validation on all parameters
  • Process kill on timeout
  • --ephemeral flag (no persistent Codex state)
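
Escape-sequence sanitization of the kind listed above is typically a regular-expression pass over the model's output before it reaches the terminal. The following is a minimal sketch, not the project's `_sanitize()`; the patterns cover OSC, CSI, and two-character escapes, and the choice to keep only tab and newline among control characters is an assumption.

```python
import re

_ESCAPES = re.compile(
    r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC ... terminated by BEL or ST
    r"|\x1b\[[0-?]*[ -/]*[@-~]"           # CSI sequences (colors, cursor moves)
    r"|\x1b[@-Z\\-_]"                     # remaining two-character escapes
)
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")  # everything except \t and \n


def sanitize(text: str) -> str:
    """Strip terminal escape sequences, then any leftover control characters."""
    return _CONTROL.sub("", _ESCAPES.sub("", text))


print(sanitize("\x1b[31mred\x1b[0m"))
print(sanitize("\x1b]0;spoofed title\x07ok"))
```

Removing whole sequences first, and stray control bytes second, avoids leaving fragments such as `[31m` in the output.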

Development

# Setup
git clone https://github.com/tsunamayo7/claude-code-codex-agents.git
cd claude-code-codex-agents
uv sync --extra dev

# Run tests (59 tests)
uv run pytest tests/ -v

# Run server directly
uv run python server.py

Project structure: Single file (server.py, ~820 lines). Easy to read, modify, and contribute to.

Use Cases

  1. Cross-Model Code Review -- Claude writes code, GPT-5.4 reviews it. Reduces single-model bias.
  2. Parallel Codebase Analysis -- Analyze up to 6 files simultaneously, get structured reports for each.
  3. Design Discussion -- Get GPT-5.4's alternative perspective on architectural decisions via discuss.
  4. Session-Based Refactoring -- Large refactoring across multiple session_continue calls with context preservation.
  5. AI Second Opinion -- When Claude's answer seems off, ask GPT-5.4 for a sanity check.

Requirements

  • Python 3.12+
  • Codex CLI (npm install -g @openai/codex)
  • OpenAI account (Codex CLI must be authenticated via codex login)
  • uv (recommended) or pip

Related Projects

Helix Ecosystem

  • helix-ai-studio — All-in-one AI chat studio with 7 providers, RAG, MCP tools, and pipeline
  • helix-pilot — GUI automation MCP server — AI controls Windows desktop via local Vision LLM
  • helix-agent — Extend Claude Code with local Ollama models — cut token costs by 60-80%
  • helix-sandbox — Secure sandbox MCP server — Docker + Windows Sandbox

License

MIT
