OmniWeave
A high-performance, polyglot code-analysis graph for coding agents that enables cross-language and cross-process code relationship queries in sub-millisecond time.
<div align="center">
Weave a whole repository — across languages, across processes — into one navigable graph.
The relationships that matter most to an agent are exactly the ones a language server and grep can't follow: a Python orchestrator that shells out to an R script, a Snakemake rule that runs an external binary, an S4 method dispatched at runtime. OmniWeave makes those hops first-class, typed, traversable edges — answered in a single sub-millisecond query.
</div>
Why OmniWeave
A coding agent already has grep and an LSP. OmniWeave earns its place by winning exactly where they stop:
- Cross-language. An LSP is scoped to one language. OmniWeave links a `.smk` rule, a `.nf` process, or a plain Python function to the R / Python / Perl script it runs across the process boundary, and on into that script's functions and methods.
- Cross-process. A subprocess argument is opaque to a language server. OmniWeave resolves `subprocess.run([...])`, `os.system(...)`, `child_process.*`, `exec.Command`, and workflow directives into a real edge you can traverse both ways.
- Dynamic dispatch. Runtime dispatch is invisible to static call graphs. OmniWeave models R's S4 `setGeneric`/`setMethod` as a dispatch graph (class → method → generic) and routes a bare generic call to the right entry point.
- Token economy. One typed, traversable answer instead of a dozen `grep` passes the agent has to re-parse. The graph is built by relationship, not padded by language count.

Honest by construction. Every inferred edge carries a `provenance` and a `confidence`. What can't be known statically (a runtime-built path, NSE, runtime dispatch) is skipped, never guessed. The agent is never handed a fabricated edge it might trust.
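To make the `provenance` / `confidence` idea concrete, here is a minimal Python sketch of what such an edge record could look like and how a consumer might filter on it. The field layout, the `"literal-argv"` label, the 0.0 to 1.0 scale, and the `infer_script_edge` helper are all assumptions made for illustration; they are not OmniWeave's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str          # e.g. "calls", "crossLang", "invokes"
    provenance: str    # how the edge was inferred (hypothetical label)
    confidence: float  # hypothetical 0.0-1.0 scale


def infer_script_edge(caller: str, script_arg: Optional[str]) -> Optional[Edge]:
    """Emit an edge only when the script path is a static literal."""
    if script_arg is None:
        # The path is built at runtime: skip it rather than guess.
        return None
    return Edge(caller, script_arg, "crossLang", "literal-argv", 0.9)


candidates = [
    infer_script_edge("run_analysis", "scripts/deseq.R"),
    infer_script_edge("run_dynamic", None),  # unresolvable, so no edge at all
]
edges = [e for e in candidates if e is not None]

# A consumer can decide how much inferred structure to trust.
trusted = [e for e in edges if e.confidence >= 0.8]
print(trusted)
```

The point of the sketch is the `None` branch: an unresolvable relationship produces no edge, so nothing downstream can mistake a guess for a fact.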
Does an agent actually do better with OmniWeave?
Not a claim — a measurement. An A/B benchmark across 3 rounds, 8 real repositories, 24 headless runs (Claude Sonnet, identical prompts). The only variable is whether OmniWeave's MCP graph is attached; both arms keep the same built-in grep / read / bash. Tool-calls are the reliable effort signal (token cost is prompt-cache-sensitive); both are reported.
| Query · repo | Correct? | Tool calls (with / without) | Cost |
|---|---|---|---|
| Single-point lookup · small repos (≤ 450 files) | tie | 17 / 31 (−45%) | ≈ tie |
| Reverse / multi-hop · small repos | tie | 16 / 34 (−53%) | −16% |
| Reverse blast-radius · django (3,005 files) | tie | 2 / 31 (−94%) | −64% |
| Reverse blast-radius · vscode (11,538 files) | tie | 2 / 47 (−96%) | −76% |
On vscode, the plain grep / read agent reached the same correct answer, but spent 47 tool calls, 1.13 M input tokens, and ~6 minutes brute-force-reading files to map every call site back to its enclosing function. With OmniWeave: 2 calls, 95 K tokens, 77 seconds, because one structural query replaces a file-by-file sweep. The bigger the repo, the more grep's read budget grows (31 calls on django, 47 on vscode), while OmniWeave's tool-call count stayed flat at 2 in both runs.
What this honestly shows. Correctness was a tie in every tier above: on greppable, uniquely-named queries, a thorough grep/read agent stays complete even at 11.5 K-file scale. OmniWeave's edge here is effort, tokens, latency, and cost, not exclusive correctness, and that edge widens with scale. Correctness only diverges on the structurally-ungreppable query (a runtime-dispatch target behind an ambiguous name, a cross-process bridge, a transitive blast radius), which is precisely the terrain OmniWeave's typed edges are built for. (Small sample; the A/B harness is included under scripts/agent-eval/ and is fully reproducible.)
Performance
Performance is a design constraint here, not an afterthought.
| | |
|---|---|
| Reads | Sub-millisecond. The graph is a local SQLite database (`node:sqlite`, WAL), so reads never block the writer. |
| Indexing | ~100 files in under 350 ms on real repositories. A pool of WebAssembly tree-sitter workers parses in parallel and recycles memory on a fixed cadence so long runs stay flat. |
| Footprint | 100% local. No daemon to babysit, no cloud round-trip, no embeddings service. The index lives next to your code and stays fresh through an incremental file watcher. |
| Hot paths | Audited for worst-case behavior. The script-path scanner that runs on every source file at index time is provably linear: a deliberately-crafted adversarial input that took 97 seconds under a naive regex resolves in 0.1 ms here. |
| Degradation | Bounded everywhere it matters: parse timeouts, per-function fan-out caps, worker recycling, and a 2-second-debounced watcher with a staleness banner instead of a silent stale read. |
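The WAL behavior behind the "Reads" row can be observed with a small sketch. It uses Python's built-in `sqlite3` module rather than `node:sqlite`, and the `edges` table with its columns is an assumption for illustration, not OmniWeave's real schema.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; an in-memory one cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "graph.db")

writer = sqlite3.connect(db_path)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE edges (source TEXT, target TEXT, kind TEXT)")
writer.execute(
    "INSERT INTO edges VALUES ('run_analysis', 'scripts/deseq.R', 'crossLang')"
)
writer.commit()

# Start a second write and leave its transaction open (uncommitted).
writer.execute("INSERT INTO edges VALUES ('star_align', 'STAR', 'invokes')")

# A separate reader connection is not blocked; it sees the last committed state.
reader = sqlite3.connect(db_path)
visible = reader.execute("SELECT COUNT(*) FROM edges").fetchall()[0][0]

# Once the writer commits, a fresh read picks up the new row.
writer.commit()
after_commit = reader.execute("SELECT COUNT(*) FROM edges").fetchall()[0][0]

print(journal_mode, visible, after_commit)

reader.close()
writer.close()
```

In WAL mode readers work from a snapshot of the last commit, which is why the first count excludes the in-flight insert instead of waiting on it.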
Capabilities
1. Cross-language / cross-process edges (crossLang)
From any indexed file — Python, JavaScript/TypeScript, Go, or a workflow rule — OmniWeave follows a shell-out to the local script it runs:
```python
import subprocess

def run_analysis(counts, out):
    subprocess.run(["Rscript", "scripts/deseq.R", counts, out])  # → crossLang → scripts/deseq.R
```

```text
callees(run_analysis)     → scripts/deseq.R   # the R script it runs
callers(scripts/deseq.R)  → run_analysis      # every site that runs it
```
It handles the idioms real code actually uses — array and flat-string forms, the f"{sys.path[0]}/tool.py" "this-directory" dispatcher pattern, top-level __main__ entry points — and it rejects the ones it can't resolve (interpolated basenames, variable paths, an interpreter that's merely echo'd).
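The accept/reject behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not OmniWeave's resolver: the interpreter set, the suffix list, the interpolation markers, and the `resolve_script` name are assumptions, non-literal arguments are represented as `None`, and the sketch rejects every interpolated path, so it does not cover the `f"{sys.path[0]}/tool.py"` dispatcher pattern that OmniWeave handles.

```python
import shlex
from typing import Optional, Sequence, Union

INTERPRETERS = {"Rscript", "python", "python3", "perl"}  # assumed set
SCRIPT_SUFFIXES = (".R", ".py", ".pl")                   # assumed set
DYNAMIC_MARKERS = ("{", "$", "`")  # interpolation this sketch refuses to guess at

Command = Union[str, Sequence[Optional[str]]]


def resolve_script(command: Command) -> Optional[str]:
    """Return the script path a shell-out runs, or None if it is not static."""
    if isinstance(command, str):
        try:
            argv = shlex.split(command)  # flat-string form
        except ValueError:               # unbalanced quotes and similar
            return None
    else:
        argv = list(command)             # array form; None marks a variable

    # The interpreter must be the command itself, not a later argument,
    # so "echo Rscript x.R" is rejected.
    if len(argv) < 2 or argv[0] not in INTERPRETERS:
        return None

    script = argv[1]
    if not isinstance(script, str):      # variable path
        return None
    if any(marker in script for marker in DYNAMIC_MARKERS):
        return None                      # interpolated path
    return script if script.endswith(SCRIPT_SUFFIXES) else None


print(resolve_script(["Rscript", "scripts/deseq.R", None, None]))
print(resolve_script("echo Rscript scripts/deseq.R"))
```

Only the script position needs to be a literal; trailing arguments such as `counts` and `out` can stay unknown without blocking the edge.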
2. Multiple-dispatch semantic graph
R's S4 object system dispatches at runtime. OmniWeave makes the static skeleton navigable: setMethod becomes a method node wired to its class (contains) and its generic (overrides), and a bare dispersions(x) call routes to the generic — with the concrete dispatch targets one hop away along the dispatch graph. The pattern generalizes to any multiple-dispatch or virtual-method language.
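As a rough picture of that shape, the following Python sketch stores `contains` and `overrides` edges as triples and walks them from a generic back to its methods and classes. The node names (including the hypothetical `CountSet` class) and the `dispatch_targets` helper are illustrative assumptions, not OmniWeave's internal representation.

```python
from collections import defaultdict

# (source, edge kind, target) triples; all node names are illustrative.
edges = [
    ("DESeqDataSet", "contains", "dispersions,DESeqDataSet"),   # class → method
    ("dispersions,DESeqDataSet", "overrides", "dispersions"),   # method → generic
    ("CountSet", "contains", "dispersions,CountSet"),
    ("dispersions,CountSet", "overrides", "dispersions"),
]

# Index edges by (target, kind) so the graph can be walked backwards.
incoming = defaultdict(list)
for src, kind, dst in edges:
    incoming[(dst, kind)].append(src)


def dispatch_targets(generic: str) -> dict:
    """Map each method overriding `generic` to the classes that contain it."""
    return {
        method: sorted(incoming[(method, "contains")])
        for method in sorted(incoming[(generic, "overrides")])
    }


# A bare call to the generic resolves to the generic node; the concrete
# candidates are one hop back along the "overrides" edges.
print(dispatch_targets("dispersions"))
```

Nothing here picks a single runtime target; the sketch only exposes the statically known candidates, which matches the "static skeleton" framing above.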
3. Workflow data-flow DAG
Snakemake and Nextflow pipelines become a graph: each rule/process is a step, its input:/output: files are shared artifact nodes, and a producer and consumer that name the same path land on the same node — so the pipeline DAG is navigable with the standard callers/callees tools.
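The shared-artifact idea can be sketched as follows: each file path is one node, and two steps are linked when one writes the path the other reads. The rule names, file paths, and the `upstream` / `downstream` helpers are assumptions for illustration; how OmniWeave maps these directions onto `callers` / `callees` is not specified here.

```python
from collections import defaultdict

# Hypothetical pipeline: rule name -> declared input and output paths.
rules = {
    "star_index": {"input": ["genome.fa"], "output": ["index/SA"]},
    "star_align": {"input": ["index/SA", "reads.fq"], "output": ["aligned.bam"]},
    "count_reads": {"input": ["aligned.bam"], "output": ["counts.tsv"]},
}

# One artifact node per path: a producer and a consumer that name the same
# path meet on the same dictionary key.
producers = defaultdict(list)
consumers = defaultdict(list)
for rule, io in rules.items():
    for path in io["output"]:
        producers[path].append(rule)
    for path in io["input"]:
        consumers[path].append(rule)


def downstream(rule: str) -> list:
    """Steps that read something this rule writes."""
    return sorted({c for path in rules[rule]["output"] for c in consumers[path]})


def upstream(rule: str) -> list:
    """Steps that write something this rule reads."""
    return sorted({p for path in rules[rule]["input"] for p in producers[path]})


print(upstream("star_align"), downstream("star_align"))
```

Inputs with no producer, such as `genome.fa` and `reads.fq` here, simply have no upstream step, which marks them as external inputs to the pipeline.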
4. External-tool graph (invokes)
A pipeline step that runs an external binary (bwa, samtools, STAR) gets an edge to a shared tool node:
```text
callers(STAR) → star_index, star_align, …   # every step in the pipeline that runs STAR
```
This is the cross-process hop no language server can follow and that local-script analysis doesn't cover.
Use it from an agent
OmniWeave is MCP-native. Point your agent at it and it gains a code-intelligence toolset:
The four core tools — explore, node, search, callers — are exposed by default; the rest are opt-in via the OMNIWEAVE_MCP_TOOLS allowlist (fewer tools = fewer mis-picks).
| Tool | Answers |
|---|---|
| `explore` | "How does X work / survey this area / trace this flow?" The primary tool: one capped call returns the relevant symbols' source grouped by file and rides the polyglot edges (dispatch, cross-process, workflow) where `callers` and an LSP stop |
| `search` | "What is the symbol named X?" (just kind + location + signature) |
| `callers` / `callees` | "What calls this?" / "What does this call?" Every call site with file:line, including cross-language and cross-process hops and callback registrations |
| `node` | "Show me this symbol's (or file's) source + its caller/callee trail and blast radius". A drop-in for Read on indexed files |
| `impact` | "What would changing this break?" |
| `files` / `status` | directory listing · index health |
```shell
omniweave serve --mcp        # stdio MCP server
omniweave init -i            # index the current repo
omniweave callers <symbol>   # or query directly from the CLI
```
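Conceptually, the "What would changing this break?" question is reverse reachability over call edges: start at the changed symbol and collect everything that can reach it. The Python sketch below shows that traversal on a toy graph; the symbol names and the `blast_radius` helper are hypothetical, and the real `impact` tool may weigh or cap results differently.

```python
from collections import deque

# Toy call graph: caller -> callees (illustrative names only).
calls = {
    "main": ["run_analysis", "load_config"],
    "nightly_job": ["run_analysis"],
    "run_analysis": ["scripts/deseq.R"],
    "load_config": [],
}


def blast_radius(target: str, graph: dict) -> set:
    """Every node that directly or transitively calls `target`."""
    # Invert the graph once so callers can be looked up by callee.
    reverse = {}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller not in seen:  # the seen set also guards against cycles
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(blast_radius("scripts/deseq.R", calls)))
```

Because cross-process edges sit in the same graph as ordinary calls, changing an R script surfaces the Python callers two hops away in a single traversal.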
Quick start
```shell
git clone https://github.com/SolvingLab/OmniWeave.git
cd OmniWeave
npm install && npm run build   # tsc + vendored tree-sitter WASM (Node ≥ 22.5 for node:sqlite)
node dist/bin/omniweave.js init -i
node dist/bin/omniweave.js serve --mcp
```
Engineering
- Hand-written extractors, no `.scm`. Each language is a focused TypeScript walker, so adding a language or a relationship is a small, testable change, not a grammar rewrite.
- Eval-gated. A recall/precision harness with edge, reachability, and negative assertions guards every capability: red before the feature, green after, with teeth that fail if a target regresses. 1490 unit tests, 25 evaluation gates, zero known false positives across six real repositories.
- A §1.5 benchmark (`npm run benchmark`) measures, honestly, the bounded class of queries where the graph wins, ties, or loses against `grep`/LSP, including the ones it loses.
```text
extraction  (WASM tree-sitter workers)
  → graph       (node:sqlite + FTS5)
  → resolution  (name + import + framework resolvers, dispatch & cross-language synthesizers)
  → MCP server
```
Scope
OmniWeave is a general code-analysis graph. Bioinformatics — R/S4, Snakemake/Nextflow, mixed tool-and-data pipelines — is its proving ground precisely because it is the hardest polyglot, cross-process terrain there is: general engine, proven on the hardest domain.
License & acknowledgments
MIT — see LICENSE. OmniWeave builds on the foundation of the open-source codegraph project (MIT); the extraction/graph/MCP core is inherited, and the cross-language, cross-process, dispatch, workflow, and tool layers are OmniWeave's own.