codebase-index
AST-aware codebase indexing with semantic search, exposed as an MCP server. Enables semantic search and file context retrieval across your codebase using natural language queries.
Think of it as Cursor's codebase awareness, but for Claude Code (or any MCP client).
What it does: Parses your codebase with tree-sitter, chunks it intelligently (functions, components, hooks, stores, configs, classes with method extraction, types), embeds locally with Ollama, stores in LanceDB, and exposes semantic search via MCP.
Quick Start
# Prerequisites
brew install ollama
ollama serve
ollama pull nomic-embed-text
# Clone the tool (one-time)
git clone https://github.com/LevelPanic/codebase-index.git ~/codebase-index
cd ~/codebase-index && npm install
# In your repo
cd ~/your-project
npx tsx ~/codebase-index/src/cli/index.ts init # generates config + wires .mcp.json
npx tsx ~/codebase-index/src/cli/index.ts index --full # build the index
The init command automatically:
- Creates codebase-index.config.json with detected file patterns
- Adds .codebase-index/ to .gitignore
- Wires the MCP server into .mcp.json with the correct path
Restart Claude Code, and the search_codebase and get_file_context tools become available.
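The wiring step can be pictured as merging one entry into the mcpServers map of .mcp.json. The following is a minimal sketch, not the tool's actual code: the entry name codebase-index, the addServer helper, and the exact command and args are assumptions inferred from the serve command in the CLI section.

```typescript
// Hypothetical sketch: shape of a project-level .mcp.json and a helper that
// adds a stdio server entry without clobbering entries that already exist.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// `toolDir` is wherever codebase-index was cloned (for example ~/codebase-index).
function addServer(existing: Partial<McpConfig>, toolDir: string): McpConfig {
  return {
    ...existing,
    mcpServers: {
      ...(existing.mcpServers ?? {}),
      // Assumed entry name and command; the real init output may differ.
      "codebase-index": {
        command: "npx",
        args: ["tsx", `${toolDir}/src/cli/index.ts`, "serve"],
      },
    },
  };
}

const merged = addServer(
  { mcpServers: { other: { command: "node", args: ["other.js"] } } },
  "/home/me/codebase-index",
);
```

If the generated entry ever needs checking by hand, comparing .mcp.json against a shape like this is a reasonable first step.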
How It Works
Your repo files
│
▼
tree-sitter AST parsing
│
▼
Smart chunks (functions, components, hooks, stores, configs, classes, types, Prisma models)
│
▼
Ollama embeddings (nomic-embed-text, local, free)
│
▼
LanceDB vector storage (just files on disk)
│
▼
MCP server (stdio) → search_codebase / get_file_context
- Chunking is AST-aware, not naive line splitting. Each function, component, type definition, and Prisma model is its own chunk.
- Context-enriched: function chunks include referenced type definitions inline, so embeddings capture the full picture.
- Smart truncation: large chunks keep signature + head + tail instead of being cut off at the bottom (preserves return statements and JSX output).
- Class method extraction: large classes are split into individual method chunks instead of one truncated blob.
- Chunk type detection: React hooks (useXxx), Zustand/Redux stores, config objects, and barrel files are all detected and tagged.
- Small type batching: tiny adjacent type aliases are merged into a single chunk to reduce embedding calls.
- Embeddings are local: Ollama runs on your machine. No API keys, no network, no cost.
- Storage is embedded: LanceDB is just a directory on disk. No database server.
- Branch-aware: when on a feature branch, search results for modified files return live content from disk.
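To make the smart truncation idea concrete, here is a minimal sketch of a signature + head + tail strategy. The function name truncateChunk, the default line counts, and the marker text are all assumptions for illustration; the tool's real thresholds are not documented here.

```typescript
// Hypothetical sketch of "signature + head + tail" truncation.
// Names and default sizes are assumptions, not the tool's real values.
function truncateChunk(source: string, headLines = 20, tailLines = 10): string {
  const lines = source.split("\n");
  // The +1 accounts for the signature line, which is always kept.
  if (lines.length <= 1 + headLines + tailLines) return source;

  const signature = lines[0];
  const head = lines.slice(1, 1 + headLines);
  const tail = lines.slice(lines.length - tailLines);
  const omitted = lines.length - 1 - headLines - tailLines;

  // The marker tells the reader (and the embedding model) that the middle is elided.
  return [signature, ...head, `  // ... ${omitted} lines omitted ...`, ...tail].join("\n");
}
```

Keeping the tail is what preserves return statements and JSX output, which usually sit at the end of a function body.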
Configuration
The config file codebase-index.config.json goes at your repo root. It's optional — the tool works with zero config on any TypeScript/JavaScript project.
{
  "include": ["src/**/*.{ts,tsx,js,jsx}"],
  "exclude": ["**/node_modules/**", "**/dist/**"],
  "output": ".codebase-index",
  "baseBranch": "main",
  "embedding": {
    "provider": "ollama",
    "url": "http://localhost:11434",
    "model": "nomic-embed-text"
  },
  "tags": []
}
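Since the config is optional, a partial file presumably gets merged over built-in defaults. The sketch below shows one way that could work. The type names, the resolveConfig helper, and the choice of the example values above as defaults are assumptions, not the tool's actual source.

```typescript
// Hypothetical types mirroring the JSON example; not the tool's real definitions.
interface TagRule { pattern?: string; contentPattern?: string; value: string }
interface TagConfig { name: string; defaultValue: string; rules: TagRule[] }
interface EmbeddingConfig { provider: string; url: string; model: string }

interface CodebaseIndexConfig {
  include: string[];
  exclude: string[];
  output: string;
  baseBranch: string;
  embedding: EmbeddingConfig;
  tags: TagConfig[];
}

// Users may override any top-level key, and any subset of `embedding`.
type UserConfig = Partial<Omit<CodebaseIndexConfig, "embedding">> & {
  embedding?: Partial<EmbeddingConfig>;
};

// Assumed defaults, copied from the example config.
const DEFAULTS: CodebaseIndexConfig = {
  include: ["src/**/*.{ts,tsx,js,jsx}"],
  exclude: ["**/node_modules/**", "**/dist/**"],
  output: ".codebase-index",
  baseBranch: "main",
  embedding: { provider: "ollama", url: "http://localhost:11434", model: "nomic-embed-text" },
  tags: [],
};

function resolveConfig(user: UserConfig = {}): CodebaseIndexConfig {
  return {
    ...DEFAULTS,
    ...user,
    // Merge `embedding` key by key so a partial override keeps the other defaults.
    embedding: { ...DEFAULTS.embedding, ...user.embedding },
  };
}
```

With this shape, a config file containing only a different baseBranch or embedding model would still resolve to a complete configuration.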
Tags
Tags let you add custom metadata dimensions to your chunks. Each tag has rules that match against file paths or content, and the tag values become filterable in the MCP search tool.
{
  "tags": [
    {
      "name": "layer",
      "defaultValue": "other",
      "rules": [
        { "pattern": "app/api/", "value": "backend" },
        { "pattern": "components/", "value": "frontend" },
        { "pattern": "lib/", "value": "shared" }
      ]
    },
    {
      "name": "domain",
      "defaultValue": "general",
      "rules": [
        { "pattern": "billing", "value": "billing" },
        { "pattern": "auth", "value": "auth" },
        { "contentPattern": "Stripe|stripe", "value": "billing" }
      ]
    }
  ]
}
With these tags configured, search_codebase automatically gets layer and domain as optional filter parameters.
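The rule evaluation can be sketched as follows. The matching semantics are assumptions, since they are not spelled out above: pattern is treated as a substring of the file path, contentPattern as a regular expression tested against the chunk text, and the first matching rule wins. The resolveTag helper is hypothetical.

```typescript
interface TagRule { pattern?: string; contentPattern?: string; value: string }
interface TagConfig { name: string; defaultValue: string; rules: TagRule[] }

// Assumed semantics: substring match on the path, regex match on the content,
// first matching rule wins, otherwise fall back to defaultValue.
function resolveTag(tag: TagConfig, filePath: string, content: string): string {
  for (const rule of tag.rules) {
    if (rule.pattern !== undefined && filePath.includes(rule.pattern)) return rule.value;
    if (rule.contentPattern !== undefined && new RegExp(rule.contentPattern).test(content)) {
      return rule.value;
    }
  }
  return tag.defaultValue;
}

// The "domain" tag from the example above.
const domain: TagConfig = {
  name: "domain",
  defaultValue: "general",
  rules: [
    { pattern: "billing", value: "billing" },
    { pattern: "auth", value: "auth" },
    { contentPattern: "Stripe|stripe", value: "billing" },
  ],
};

const byContent = resolveTag(domain, "src/lib/payments.ts", "import Stripe from 'stripe'");
const byPath = resolveTag(domain, "src/auth/login.ts", "");
const fallback = resolveTag(domain, "src/utils/date.ts", "export const x = 1");
```

Under these assumptions, a file outside any billing directory still gets tagged billing when its content mentions Stripe.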
Monorepo Example
{
  "include": [
    "apps/**/*.{ts,tsx,js,jsx}",
    "packages/**/*.{ts,tsx,js,jsx}",
    "apps/**/schema.prisma"
  ],
  "tags": [
    {
      "name": "platform",
      "defaultValue": "all",
      "rules": [
        { "pattern": "facebook", "value": "meta" },
        { "pattern": "tiktok", "value": "tiktok" },
        { "pattern": "google-ads", "value": "google" }
      ]
    },
    {
      "name": "app",
      "defaultValue": "unknown",
      "rules": [
        { "pattern": "apps/web/", "value": "web" },
        { "pattern": "apps/api/", "value": "api" },
        { "pattern": "packages/", "value": "packages" }
      ]
    }
  ]
}
CLI
Run commands via npx tsx ~/codebase-index/src/cli/index.ts <command> from your repo directory. Or create a shell alias:
alias cbi="npx tsx ~/codebase-index/src/cli/index.ts"
cbi init [--force]    Generate config + wire .mcp.json
cbi index             Incremental index (changed files only)
cbi index --full      Full reindex (drop and rebuild)
cbi stats             Show index statistics
cbi serve             Start MCP server (stdio)
Indexing
- Full index: Parses all files, embeds everything, rebuilds the database. ~15-20 min for ~5K files on Apple Silicon.
- Incremental index: Only re-embeds files changed since last indexed commit. ~10 seconds for typical daily changes.
- Stats: Shows chunk counts broken down by type and configured tags.
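The incremental path depends on knowing which files changed since the last indexed commit. A minimal sketch of that lookup is below, assuming it is based on git diff --name-only. The changedFilesSince helper and the injected runGit function are hypothetical, and how the tool actually stores the last indexed commit is not covered here.

```typescript
// Hypothetical sketch. `runGit` is injected so the logic stays testable; in
// Node it could wrap child_process.execFileSync("git", args, { encoding: "utf8" }).
type RunGit = (args: string[]) => string;

function changedFilesSince(lastIndexedCommit: string, runGit: RunGit): string[] {
  // --name-only prints one path per line for every file that differs.
  // Deleted files are listed too, so a real indexer would also need to
  // drop their chunks rather than try to re-embed them.
  const out = runGit(["diff", "--name-only", `${lastIndexedCommit}..HEAD`]);
  return out
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

Only the returned paths would need re-parsing and re-embedding, which is why a typical daily update is so much faster than a full index.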
MCP Tools
search_codebase
Semantic search across the codebase. Returns relevant code chunks ranked by similarity.
| Parameter | Type | Description |
|---|---|---|
| query | string | Natural language search query |
| limit | number | Max results (default 10) |
| <tag_name> | enum | Filter by any configured tag |
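MCP clients normally build the tool call for you, but it can help to see what travels over stdio. The sketch below builds a JSON-RPC tools/call request for search_codebase. The searchRequest helper is hypothetical, and the domain filter assumes the tags from the Tags example are configured.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: `filters` holds tag name/value pairs such as { domain: "billing" }.
function searchRequest(
  id: number,
  query: string,
  filters: Record<string, string> = {},
  limit = 10,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_codebase", arguments: { query, limit, ...filters } },
  };
}

const request = searchRequest(1, "where do we verify Stripe webhooks?", { domain: "billing" });
const wireMessage = JSON.stringify(request);
```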
get_file_context
Get all indexed chunks for a specific file. Shows the file's structure — functions, components, types.
| Parameter | Type | Description |
|---|---|---|
| file_path | string | Relative path from repo root |
Chunk Types
The chunker detects and labels each chunk:
| Type | What it captures |
|---|---|
| function | Functions and arrow functions |
| component | React components (JSX-returning functions in .tsx/.jsx) |
| hook | React hooks (useXxx naming convention) |
| store | Zustand/Redux stores (create(), *Store, *Slice) |
| config | Plain object/array literals (route maps, constants, configs) |
| class | Class overview with method listing |
| method | Individual methods extracted from large classes |
| type | Type aliases, interfaces, enums |
| model | Prisma model/enum/type blocks |
| summary | File-level overview (imports + export listing) |
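The name-based parts of this table can be illustrated with a small classifier. This is a sketch under assumptions: the classifyFunction helper is hypothetical, it ignores the create() check that needs the AST, and the precedence (store before hook, so that a name like useCartStore counts as a store) is a guess rather than the tool's documented behavior.

```typescript
type ChunkType =
  | "function" | "component" | "hook" | "store" | "config"
  | "class" | "method" | "type" | "model" | "summary";

// Hypothetical heuristic covering only the function-like chunk types.
function classifyFunction(name: string, filePath: string, returnsJsx: boolean): ChunkType {
  // Assumed precedence: the *Store / *Slice suffix is checked before the useXxx prefix.
  if (/(Store|Slice)$/.test(name)) return "store";
  if (/^use[A-Z]/.test(name)) return "hook";
  if (returnsJsx && /\.(tsx|jsx)$/.test(filePath)) return "component";
  return "function";
}
```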
Prerequisites
- Node.js 18+
- Git — the repo must be a git repository with at least one commit
- Ollama running locally with an embedding model (start ollama serve, then run ollama pull nomic-embed-text in a second terminal, since serve stays in the foreground)
- C++ compiler (for the tree-sitter native module: Xcode CLI tools on macOS, build-essential on Linux)
How Freshness Works
The index tracks the base branch (default: main). When you're on a feature branch:
- Search results are returned from the index as normal
- If a result points to a file modified on your branch (git diff main...HEAD), the tool reads the live file from disk and re-parses it
- You get current content, not stale indexed content
This means the index only needs to track main — feature branch changes are always live.
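The freshness check can be sketched as a post-processing step over search results. Everything named here is hypothetical: SearchResult, refreshResults, and the injected readLive callback are illustrations, and the sketch swaps in whole-file content where the real tool re-parses the file into chunks.

```typescript
interface SearchResult {
  filePath: string;
  content: string;
}

// `modifiedOnBranch` would hold the paths printed by
// `git diff --name-only main...HEAD`; `readLive` would read the file from disk.
function refreshResults(
  results: SearchResult[],
  modifiedOnBranch: Set<string>,
  readLive: (filePath: string) => string,
): SearchResult[] {
  return results.map((result) =>
    modifiedOnBranch.has(result.filePath)
      ? { ...result, content: readLive(result.filePath) } // stale: replace with live content
      : result, // untouched on this branch: the indexed content is still current
  );
}
```

The three-dot form main...HEAD compares against the merge base, so only changes made on the feature branch count as modified.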
Using in Multiple Repos
Clone once, use everywhere. Run init in each repo — it auto-detects the structure and wires everything up:
cd ~/project-a && npx tsx ~/codebase-index/src/cli/index.ts init
cd ~/project-b && npx tsx ~/codebase-index/src/cli/index.ts init
Each repo gets its own config, its own .codebase-index/ directory, and its own .mcp.json entry.
License
MIT