RegexForge
RegexForge gives AI agents a reliable way to get a regex without asking an LLM to hallucinate one. Pass in labeled examples (strings that should match, strings that shouldn't) plus an optional description; get back the regex, a proof matrix showing it handles every example, and a backtracking-risk audit flagging catastrophic-backtracking patterns. Pure symbolic synthesis over a template bank with
README
regexforge
An MCP (Model Context Protocol) server that synthesizes production-grade regexes from labeled examples. Zero LLM at serve time — pure symbolic synthesis over a template bank with a character-class inference fallback. Every response includes a proof matrix and a backtracking-risk audit.
- MCP transport: HTTP (streamable) · Protocol version:
2024-11-05 - MCP endpoint:
https://regexforge.jason-12c.workers.dev/mcp - MCP manifest: https://regexforge.jason-12c.workers.dev/.well-known/mcp.json
The MCP Tool
regexforge exposes a single MCP tool. An AI client (Claude Desktop, Cline, Continue, Cursor, or any MCP-aware agent) calls it the same way it would call any other MCP tool — tools/call over JSON-RPC 2.0.
regexforge_synth
Synthesize a battle-tested regex from labeled examples.
Input schema (what the model supplies):
{
"type": "object",
"required": ["examples"],
"properties": {
"description": {
"type": "string",
"description": "Optional natural-language description of the target pattern. Used only for tie-breaking when multiple templates fit."
},
"examples": {
"type": "array",
"minItems": 2,
"maxItems": 100,
"items": {
"type": "object",
"required": ["text", "match"],
"properties": {
"text": { "type": "string", "maxLength": 2048 },
"match": { "type": "boolean", "description": "true if the regex should match this string; false if it should NOT match." }
}
}
}
}
}
Output schema:
{
"regex": "string",
"flags": "string",
"source": "template | char_class",
"template_name": "string (if source=template)",
"test_matrix": [
{ "text": "string", "expected": "boolean", "actual": "boolean", "pass": "boolean" }
],
"all_pass": "boolean",
"backtrack_risk": "none | low | high",
"backtrack_reasons": [ "string" ],
"candidates_considered": "integer",
"candidates_passing": "integer",
"notes": [ "string" ]
}
Errors return JSON-RPC error objects with structured remediation:
not_expressible(HTTP 422) — examples imply a non-regular language (balanced-parens, counting, etc.).no_credits(HTTP 402) — wallet empty, purchase via/v1/credits.missing_input(HTTP 400) —fixfield tells you exactly what's missing.
Connecting from MCP Clients
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"regexforge": {
"transport": {
"type": "http",
"url": "https://regexforge.jason-12c.workers.dev/mcp"
},
"headers": {
"Authorization": "Bearer YOUR_API_KEY_HERE"
}
}
}
}
Get an API key with: curl -X POST https://regexforge.jason-12c.workers.dev/v1/keys (free, 50 credits on signup).
Python (official mcp SDK)
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
async def main():
url = "https://regexforge.jason-12c.workers.dev/mcp"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
async with streamablehttp_client(url, headers=headers) as (read, write, _):
async with ClientSession(read, write) as session:
await session.initialize()
tools = await session.list_tools()
result = await session.call_tool(
"regexforge_synth",
arguments={
"description": "ISO 8601 date like 2024-12-30",
"examples": [
{"text": "2024-12-30", "match": True},
{"text": "2023-01-01", "match": True},
{"text": "12/30/2024", "match": False},
{"text": "abc", "match": False},
],
},
)
print(result.content[0].text)
TypeScript (official @modelcontextprotocol/sdk)
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(
new URL("https://regexforge.jason-12c.workers.dev/mcp"),
{ requestInit: { headers: { Authorization: "Bearer YOUR_API_KEY" } } }
);
const client = new Client({ name: "demo", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);
const res = await client.callTool({
name: "regexforge_synth",
arguments: {
description: "ethereum wallet address",
examples: [
{ text: "0x8ABCE477e22B76121f04c6c6a69eE2e6a12De53e", match: true },
{ text: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913", match: true },
{ text: "0x123", match: false },
{ text: "xyz", match: false },
],
},
});
console.log(res.content[0].text);
Raw JSON-RPC over HTTP
If you'd rather speak the protocol directly:
# 1. initialize
curl -X POST https://regexforge.jason-12c.workers.dev/mcp \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_KEY" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize"}'
# 2. list tools
curl -X POST https://regexforge.jason-12c.workers.dev/mcp \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_KEY" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
# 3. call the tool
curl -X POST https://regexforge.jason-12c.workers.dev/mcp \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_KEY" \
-d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"regexforge_synth","arguments":{"examples":[{"text":"2024-12-30","match":true},{"text":"abc","match":false}]}}}'
What happens under the hood
- Template bank match — tests ~65 pre-compiled battle-tested regex templates (email, UUID v4, ISO date, semver, ETH address, US phone, SHA-256, base64, US zip, MAC address, etc.) against every example. Any template that classifies all examples correctly is a candidate.
- Tie-break — if multiple templates pass, picks the one whose keywords best match the caller's
description, breaking further ties by pattern length. - Character-class inference fallback — if no template fits, extracts the longest common prefix and suffix from the positive examples, infers the middle as a union of character classes
[a-z0-9-]{n,m}with length bounds, then verifies the synthesized pattern rejects every negative example. - Return with proof — the response includes the full
test_matrixso the caller (model) can verify every example classifies correctly before using the regex in code. - Backtracking audit — a static pass over the returned regex flags nested quantifiers, backreferences, and lookaround that might cause catastrophic backtracking.
All deterministic. No LLM call at serve time. Typical latency <100 ms.
Example agent workflow
An AI agent writing code that needs to parse user-supplied strings calls regexforge_synth instead of generating the regex itself (which weaker models get wrong constantly):
Model thinks: "I need to parse this SKU-1234-AB pattern. Let me not hallucinate a regex."
Calls:
regexforge_synth({ description: "SKU like SKU-1234-AB", examples: [<3 positives, 5 negatives>] })Gets back:
{ regex: "^SKU-[-0-9A-Z]{7}$", all_pass: true, backtrack_risk: "none", source: "char_class" }Pastes
"^SKU-[-0-9A-Z]{7}$"into its code with confidence that every known example classifies correctly.
This is a single tool call instead of: (a) LLM writes a regex, (b) LLM writes test cases, (c) LLM simulates regex execution, (d) LLM second-guesses and rewrites, … which burns 10x the tokens and still gets it wrong.
Auth & pricing
- Programmatic key issuance:
POST /v1/keys→{ key, credits: 50 }. No email. No captcha. Agents mint their own. - Per-call cost: $0.002. Packs: starter ($5 / 2,500), scale ($50 / 30k), bulk ($500 / 350k).
- Payment (agent-autonomous):
POST /v1/credits { "pack": "starter" }returns a real Stripe Checkout URL + x402 USDC-on-Base headers. Complete payment, thenPOST /v1/credits/verify { "session_id": "cs_..." }credits the key. - Every error response includes a structured
fixfield telling the agent exactly what to change.
Other discovery surfaces (agent-only, machine-readable)
| Endpoint | Format |
|---|---|
GET /.well-known/ai-plugin.json |
OpenAI plugin manifest |
GET /.well-known/mcp.json |
MCP server manifest (tools + transport) |
GET /llms.txt |
llms.txt standard |
GET /openapi.json |
OpenAPI 3.1 |
GET /v1/pricing |
Machine-readable pricing |
GET /v1/errors |
Full error code catalog |
GET / |
Root index of all of the above |
Implementation
- Transport: HTTP streamable (MCP spec
2024-11-05), JSON-RPC 2.0 - Deployment: Cloudflare Workers (cold start <10 ms)
- Built on:
@walko/agent-microsaas— a skeleton that handles the MCP transport, discovery manifests, bearer-key auth, and credit ledger. regexforge itself is ~300 LOC of pure synthesis logic.
License
Apache-2.0.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.