rocq-piler

rocq-piler

MCP server that bridges LLMs with the Rocq (Coq) proof assistant via LSP, enabling interactive theorem proving with AI.

Category
Visit Server

README

πŸ€– rocq-piler

rocq-piler

Let rocq-piler do the heavy lifting for your proofs.

Overview

This project demonstrates how a Language Server Protocol (LSP) interface to the Rocq proof assistant (formerly Coq) enables neurosymbolic programming: the seamless integration of neural networks (LLMs) with symbolic reasoning systems (proof assistants).

What is Neurosymbolic Programming?

Neurosymbolic programming combines:

  • Neural/Statistical AI (LLMs): Pattern recognition, natural language understanding, heuristic search
  • Symbolic AI (Proof assistants): Logical reasoning, formal verification, guaranteed correctness

This approach leverages the complementary strengths of both paradigms:

Neural (LLM) Symbolic (Rocq)
Intuition & pattern matching Rigorous logical deduction
Generates candidate solutions Verifies correctness
Probabilistic & approximate Deterministic & exact
Learns from examples Reasons from axioms
Fast but fallible Slow but infallible

How LSP Enables Neurosymbolic Rocq Programming

The Traditional Problem

Historically, interacting with proof assistants required:

  • Deep knowledge of proof tactics and syntax
  • Understanding of internal proof state representations
  • Manual, tedious proof construction
  • Limited automation capabilities

This created a barrier between AI systems (which excel at pattern matching and generation) and proof assistants (which excel at verification).

The LSP Solution

The Language Server Protocol provides a standardized, machine-friendly interface that:

  1. Exposes Proof State: Query current goals, hypotheses, and proof context at any position
  2. Enables Interactive Editing: Apply tactics and immediately observe their effects
  3. Supports Speculative Execution: Try tactics without committing to file edits (PΓ©tanque API)
  4. Provides Structured Feedback: Parse errors, type information, and verification results

This transforms the proof assistant into a verification oracle that an LLM can query interactively.

In this implementation, the bridge is also project-aware: it can detect Coq/Rocq project roots, reopen documents under the correct workspace, and keep rocq-lsp aligned with _CoqProject, _RocqProject, or dune-project layouts.

The Neurosymbolic Loop

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   LLM       β”‚  ← Reads: proof goals, context, error messages
β”‚  (Neural)   β”‚  β†’ Generates: candidate tactics, proof strategies
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       ↓ (proposes tactics)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LSP Server  β”‚  ← Bridges neural and symbolic systems
β”‚  (Interface) β”‚  β†’ Translates between LLM and Rocq
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       ↓ (executes & verifies)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    Rocq      β”‚  ← Verifies: type-checks, proves correctness
β”‚  (Symbolic)  β”‚  β†’ Returns: success/failure, updated goals
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Workflow:

  1. LLM reads proof goal via LSP (coq_open_goals)
  2. LLM generates candidate tactic(s) based on patterns learned from training
  3. Tactics are applied via LSP (coq_insert_tactic)
  4. Rocq verifies the tactic is type-correct and logically sound
  5. If successful: proof progresses (new subgoals or QED)
  6. If failed: error message guides LLM to try alternatives
  7. Loop continues until proof is complete

Why This Is Powerful

This architecture achieves:

Guided Generation: The LLM doesn't need to be "correct" β€” only creative. Rocq acts as a discriminator that filters out invalid proofs.

Incremental Verification: Each step is verified immediately, preventing cascading errors.

Explainable AI: Every LLM decision is justified by a formal proof that can be audited by humans.

Learning from Mistakes: Structured error feedback allows the LLM to refine its search strategy.

Safe Automation: The LLM can explore aggressively because Rocq guarantees soundness.

Key Capabilities Enabled by LSP

1. Goal-Directed Search

The LLM can:

  • Query the current proof goal: "Prove that forall n, n + 0 = n"
  • Understand the hypothesis context: "Given: n : nat"
  • Generate relevant tactics: "I'll try induction on n"

LSP Tool: coq_open_goals β€” returns structured goal representation

2. Speculative Execution

The LLM can:

  • Try multiple tactics without modifying the file
  • Quickly discard failures
  • Only commit tactics that successfully reduce the goal

LSP Tools: coq_get_state_at_pos, coq_run_tactic, coq_goals_for_state (PΓ©tanque API)

3. Interactive Proof Refinement

The LLM can:

  • Incrementally build proofs tactic-by-tactic
  • Respond to verification failures by adjusting strategy
  • Learn which patterns work in different contexts

LSP Tool: coq_insert_tactic β€” applies tactic and returns updated goals

4. Error-Driven Learning

The LLM can:

  • Receive structured error messages from Rocq
  • Understand type mismatches, unification failures, etc.
  • Adjust tactics based on specific error feedback

LSP Feature: All tools return structured error information

The Hash-Driven Workflow

rocq-piler uses a content-addressed admit system: every open goal in a proof has a unique hash computed from its goal text. This turns proof construction into a clean cycle:

focus_proof β†’ get hashes β†’ insert_tactic(admit_hash=X) β†’ repeat until Qed

Example: Proving (True /\ True) /\ True in 3 calls

1. focus_proof(file="Thm.v", name="deep_conj")
   ← 1 admit: abc123  L7: (True / True) / True

2. insert_tactic(admit_hash="abc123", tactic="split.\n- admit.\n- admit.")
   ← 2 admits remaining:
      def456  L8: True / True   (hyps: empty, goal: True / True)
      ghi789  L9: True           (hyps: empty, goal: True)

3. insert_tactic(admit_hash="def456", tactic="split; exact I.")
   insert_tactic(admit_hash="ghi789", tactic="exact I.")
   ← Qed applied

Why hashes? Every admit with the same goal text shares the same hash. Close four True admits at once β€” across any bullet depth β€” in a single call. The LLM reads hashes from focus_proof, writes tactics for them, and counts down to Qed.

Example: Dependent Type Progress Theorem in 9 calls

See examples/dep_vec.v for a full dependently-typed language (Nat + Vec n) with both preservation and progress theorems proved entirely via hash-driven MCP calls:

1. focus_proof β†’ hash for the theorem goal
2. insert_tactic(admit_hash=X, tactic="{induction + 7 stubs}") β†’ 7 case hashes with hyps+goal inline
3-9. insert_tactic(admit_hash=<case>, tactic="...") Γ— 7 β†’ Qed

Each response shows hypotheses + goal per admit, so the LLM writes each case's tactic without extra focus_proof calls.

MCP Tools

Tool Description
focus_proof One-stop shop: proof state, script, all open admits with hashes + hyps + goals
insert_tactic Insert a tactic; pass admit_hash to target specific admits by hash; multi-line blocks supported
try_step Speculatively execute a tactic without modifying the file
open_goals Quick goal query with compact/Prev mode options
proof_state Full proof context including name, statements, goal state
undo_step Roll back the last N edit operations
reset_proof Wipe a proof body to fresh Admitted.
add_lemma / delete_lemma Insert/remove lemma stubs
delete_step Remove the last tactic line from a proof
edit_file Find-and-replace text edits with LSP re-sync
check_file Force full document checking
check_range Get diagnostics for a specific line range
search_lemmas / inspect_term / inspect_about / locate_term / require_lib Knowledge & search
snap_state / exec_tactic / state_goals Low-level PΓ©tanque API

Current Server Capabilities

  • Content-addressed admits: every open goal gets a hash β€” same goal text = same hash, close all matching admits at once across any bullet depth
  • Hypotheses + goal per admit: focus_proof and insert_tactic responses include full hypothesis context for each open admit
  • Multi-line stub insertion: open a proof, insert N bulleted admits, and get N hashes back β€” all in one insert_tactic call
  • Re-seal with multi-goal expansion: when a tactic produces N > 1 goals, they are optionally expanded to N individually addressable bulleted admits
  • Auto-Qed: proof automatically closed when all tactic admits and focused goals are eliminated; guarded against premature firing when background goals remain
  • Dynamic workspace switching: automates restarting rocq-lsp under different project roots
  • Speculative execution: try tactics safely via try_step or PΓ©tanque state APIs before committing
  • Bullet-aware proof navigation: focus_proof reports bullet stack depth, given-up counts, and sibling context
  • Safe undo/reset: tracks edit operations, supports multi-step undo

Benefits of This Approach

For AI/LLM Developers

  • Structured Interface: No need to parse Rocq syntax or output
  • Immediate Feedback: Know instantly if a tactic succeeds
  • Safe Exploration: Can't generate unsound proofs
  • Reduced Search Space: Type system eliminates invalid candidates

For Proof Engineers

  • AI Assistance: Let LLMs handle tedious proof details
  • Guaranteed Soundness: All proofs are formally verified
  • Interactive Refinement: Guide AI when it gets stuck
  • Explainable Results: Every proof step is auditable

For Researchers

  • Benchmark Platform: Test AI proof capabilities
  • Formal ML: Train models on verified code
  • Safe Code Generation: Generate formally verified programs
  • Hybrid Intelligence: Study human-AI-proof collaboration

Real-World Applications

1. Automated Theorem Proving

LLMs can tackle routine lemmas while experts focus on complex proofs.

2. Formally Verified Software

Generate code with machine-checked correctness proofs (e.g., crypto, compilers).

3. Mathematical Formalization

Convert natural language math into Rocq, assisted by LLMs.

4. Education

Interactive tutoring systems that teach both intuition (LLM) and rigor (Rocq).

5. AI Safety Research

Build verifiably safe AI systems with formal guarantees.

Getting Started

Prerequisites

# Install Rocq and rocq-lsp
opam install coq-lsp  # rocq-lsp coming soon

# Verify installation
coq-lsp --version  # or rocq-lsp --version

Installation

cd rocq-piler
npm install
npm run build

Usage with OpenCode

Add to your MCP settings:

{
  "mcpServers": {
    "rocq-piler": {
      "command": "node",
      "args": [
        "/path/to/rocq-piler/dist/index.js",
        "--workspace-root",
        "/path/to/your/rocq/project"
      ]
    }
  }
}

The server can start with one workspace root and later switch automatically when a tool call targets a file inside a different Rocq project.

Then in OpenCode:

You: "Help me prove that addition is commutative in Rocq"

OpenCode: [Uses coq_open_goals to see the goal]
OpenCode: [Uses coq_insert_tactic to try "induction n"]
OpenCode: [Iteratively builds the proof using LSP feedback]
OpenCode: "Proof complete! Here's what I did..."

Real Proof Example: Dependent Type System

See examples/dep_vec.v for a complete, working example of a simple dependently-typed language:

Language: Nat + Vec n β€” base type Nat, dependent Vec n of length n
Theorems proved via MCP tools:

  • βœ… Preservation (7 cases): has_type t T β†’ step t t' β†’ has_type t' T β€” proved in 9 MCP calls
  • βœ… Progress (7 cases): has_type t T β†’ value t ∨ βˆƒ t', step t t' β€” proved in 9 MCP calls

Both proofs use the same pattern: focus_proof β†’ multi-line stub β†’ case hashes with hyps+goal β†’ close by hash one-by-one.

A larger PCF+References benchmark is in benchmarks/complete/pcf_ref.v:

  • βœ… 9 helper lemmas + preservation theorem (21 cases)
  • βœ… Full heap semantics with store extension, weakening, and substitution
  • βœ… All cases closed with Qed
  • βœ… Total cost: 4 cents ($0.0351 actual, 50 API calls, DeepSeek V4)

Benchmarks

Benchmark Theorem Cases Lemmas API Calls Cost
Dep Vec (dep_vec.v) Preservation + Progress 7+7 0 18 β€”
PCF + Ref (pcf_ref.v) Preservation 21 9 50 $0.04

PCF + References (Type Preservation)

A PCF language extended with mutable references (allocation, dereference, assignment), heap semantics, and store typing. The preservation theorem establishes that well-typed programs remain well-typed after a step, with the store typing possibly extended.

Auxiliary lemmas proved: extends_refl, extends_trans, nth_error_extends, store_weakening, shift_typing, subst_typing, heap_ok_extends, heap_lookup_has_type, heap_ok_update

Source: benchmarks/incomplete/pcf_ref.v (challenge) β†’ benchmarks/complete/pcf_ref.v (solution)

Additional Examples

  • examples/dep_vec.v β€” Dependent type system (Nat + Vec n) with preservation and progress theorems, proved hash-driven
  • examples/test_issues.v β€” PCF + References (21-case preservation theorem)
  • examples/example.v β€” Simple examples for each MCP tool

The Future of Neurosymbolic Programming

This LSP-based architecture represents a paradigm shift:

From: "AI generates code, humans verify"
To: "AI and proof assistants collaborate in real-time"

From: "LLMs are black boxes"
To: "Every LLM decision has a formal proof"

From: "Formal verification is for experts"
To: "LLMs make formal methods accessible"

What's Already Possible

As demonstrated by the proofs in this repository:

  • βœ… Hash-driven proof construction: focus_proof β†’ get hashes β†’ insert_tactic(admit_hash=X) β†’ repeat until Qed
  • βœ… Multi-case induction proofs: 7 cases closed in 9 total MCP calls with full hypothesis context per case
  • βœ… Multi-bullet depth: same hash works across -/+/* bullet levels β€” close all matching admits in one call
  • βœ… Dependent type progress + preservation: Nat + Vec n language fully verified
  • βœ… Interactive refinement: speculative execution via try_step with immediate feedback

What's Coming

As LLMs improve and proof assistants expose richer APIs:

  • Automated formalization of mathematical papers
  • AI-assisted discovery of new theorems
  • Verified AI systems with formal safety guarantees
  • Natural language to formal proof translation
  • Large-scale formal verification of real-world software

Learn More

  • Specification: See MCP_COQ_LSP_SPEC.md
  • Implementation: See docs/IMPLEMENTATION.md
  • Quick Start: See docs/QUICKSTART.md
  • Project config detection: See docs/PROJECT-CONFIG-DETECTION.md
  • Rocq Documentation: https://rocq-prover.org/
  • rocq-lsp Project: https://github.com/ejgallego/coq-lsp

Contributing

This is a research prototype demonstrating neurosymbolic programming principles. Contributions welcome:

  • Add support for other proof assistants (Lean, Isabelle)
  • Improve tactic suggestion heuristics
  • Build benchmark suites for AI proof capabilities
  • Develop training datasets from verified code

License

MIT License - See LICENSE file for details


Built with: TypeScript, rocq-lsp, Model Context Protocol

Design: Content-addressed admits, hash-driven workflow, full hypothesis context per goal

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured