codetex-mcp


A commit-aware code context manager for LLMs. Indexes Git repositories into a multi-tier knowledge hierarchy — repo overviews, file summaries, and symbol details — stored in SQLite with vector search. Serves context to LLM clients via the Model Context Protocol (MCP) or a local CLI.

What It Does

codetex builds a structured, searchable index of your codebase that LLMs can query on demand:

  • Tier 1 — Repo Overview: Purpose, architecture, directory structure, key technologies, entry points
  • Tier 2 — File Summaries: Per-file purpose, public interfaces, dependencies, roles
  • Tier 3 — Symbol Details: Function/class signatures, parameters, return types, call relationships

Summaries are generated by an LLM (Anthropic Claude). Embeddings are computed locally with sentence-transformers for semantic search. Everything is stored in a single SQLite database with sqlite-vec for vector queries.

Incremental sync means only changed files are re-analyzed when you update your code.

Requirements

Installation

# With pip
pip install codetex-mcp

# With uv (recommended)
uv tool install codetex-mcp

Quick Start

1. Set your Anthropic API key

# Via environment variable
export ANTHROPIC_API_KEY=sk-ant-...

# Or via config
codetex config set llm.api_key sk-ant-...

2. Add a repository

# Local repo
codetex add /path/to/your/project

# Remote repo (clones to ~/.codetex/repos/)
codetex add https://github.com/user/repo.git

3. Index it

# Preview what indexing will cost (no API calls)
codetex index my-project --dry-run

# Build the full index
codetex index my-project

4. Query your codebase

# Repo overview (Tier 1)
codetex context my-project

# File summary (Tier 2)
codetex context my-project --file src/auth/login.py

# Symbol detail (Tier 3)
codetex context my-project --symbol authenticate_user

# Semantic search
codetex context my-project --query "how is authentication implemented?"

5. Keep it up to date

# Incremental sync — only re-analyzes changed files
codetex sync my-project

MCP Server Setup

The MCP server lets LLM clients such as Claude Code, Cursor, and Windsurf query your indexed codebases directly.

Claude Code

Add to your Claude Code MCP settings (~/.claude/claude_desktop_config.json):

{
  "mcpServers": {
    "codetex": {
      "command": "codetex",
      "args": ["serve"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

If you installed with uv tool, use the full path:

{
  "mcpServers": {
    "codetex": {
      "command": "/path/to/codetex",
      "args": ["serve"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Find the path with `which codetex` or `uv tool dir`.

Other MCP Clients

Any client that supports MCP stdio transport can use codetex. The server command is:

codetex serve

Available MCP Tools

Once connected, the LLM has access to 7 tools:

| Tool | Description |
| --- | --- |
| get_repo_overview | Tier 1 repo overview (architecture, technologies, entry points) |
| get_file_context | Tier 2 file summary with symbol list |
| get_symbol_detail | Tier 3 full symbol detail (signature, params, relationships) |
| search_context | Semantic search across all indexed context |
| get_repo_status | Index status (staleness, file/symbol counts, last indexed) |
| sync_repo | Trigger incremental sync from within the LLM session |
| list_repos | List all registered repositories |
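For orientation, here is a minimal sketch of the message an MCP client sends over stdio when it invokes one of these tools. The `tools/call` method and the JSON-RPC 2.0 envelope come from the MCP specification; the `repo_name` argument key is a hypothetical placeholder, since the tools' parameter names are not listed here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(message)


# "repo_name" is an assumed argument key, used only for illustration.
print(tool_call("get_repo_overview", {"repo_name": "my-project"}))
```

In practice an MCP client library builds these messages for you; the sketch only shows what travels over the wire.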

CLI Reference

codetex add <target>

Register a git repository. Accepts a local path or remote URL.

codetex add .                                    # Current directory
codetex add /path/to/repo                        # Local path
codetex add https://github.com/user/repo.git     # Remote (clones locally)
codetex add git@github.com:user/repo.git         # SSH remote

codetex index <repo-name>

Build a full index for a registered repository.

codetex index my-project                # Full index
codetex index my-project --dry-run      # Preview (files, symbols, estimated LLM calls/tokens)
codetex index my-project --path src/    # Index only files under src/

codetex sync <repo-name>

Incremental sync to the current HEAD. Only files changed since the last indexed commit are re-analyzed.

codetex sync my-project                 # Sync changes
codetex sync my-project --dry-run       # Preview what would change
codetex sync my-project --path src/     # Sync only changes under src/
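One plausible way to find the files changed since the last indexed commit is to parse `git diff --name-status`. The sketch below is an assumption about the approach, not codetex's actual sync implementation; `parse_name_status` and `changed_since` are hypothetical names.

```python
import subprocess


def parse_name_status(output: str) -> dict:
    """Split `git diff --name-status` output into paths to re-analyze and paths to drop."""
    changes = {"reanalyze": [], "remove": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status.startswith("D"):
            # Deleted file: its index entries are stale.
            changes["remove"].append(paths[0])
        elif status.startswith("R"):
            # Rename lines carry the old path, then the new path.
            changes["remove"].append(paths[0])
            changes["reanalyze"].append(paths[1])
        else:
            # Added, modified, or copied: analyze the resulting path.
            changes["reanalyze"].append(paths[-1])
    return changes


def changed_since(repo_dir: str, indexed_commit: str) -> dict:
    """Run git in repo_dir and classify changes between indexed_commit and HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", f"{indexed_commit}..HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

Keeping the parser separate from the `git` call makes the classification logic easy to test without a repository.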

codetex context <repo-name>

Query indexed context at any tier.

codetex context my-project                              # Tier 1: repo overview
codetex context my-project --file src/main.py           # Tier 2: file summary
codetex context my-project --symbol MyClass             # Tier 3: symbol detail
codetex context my-project --query "error handling"     # Semantic search
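Conceptually, the `--query` mode ranks stored embeddings by similarity to the embedded query. The toy sketch below illustrates that idea with 3-dimensional vectors and cosine similarity in plain Python. It is an illustration only: codetex performs this step with sqlite-vec over 384-dimensional embeddings, and the distance metric it uses is not stated here, so cosine similarity is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k keys whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), key) for key, vec in indexed.items()]
    return [key for _, key in sorted(scored, reverse=True)[:k]]


# Made-up vectors standing in for real file-summary embeddings.
indexed = {
    "src/auth/login.py": [0.9, 0.1, 0.0],
    "src/db/models.py": [0.1, 0.9, 0.1],
    "src/auth/tokens.py": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], indexed))
```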

codetex status <repo-name>

Show index status: indexed commit, current HEAD, staleness, file/symbol counts, token usage.

codetex list

List all registered repositories with their index status.

codetex config show

Display the current configuration.

codetex config set <key> <value>

Update a configuration value.

codetex config set llm.api_key sk-ant-...
codetex config set llm.model claude-sonnet-4-5-20250929
codetex config set indexing.max_file_size_kb 1024
codetex config set indexing.max_concurrent_llm_calls 10

Configuration

Configuration is loaded in layers (last wins):

  1. Defaults — sensible out-of-the-box values
  2. TOML file (~/.codetex/config.toml)
  3. Environment variables — override everything

Config file

# ~/.codetex/config.toml

[storage]
data_dir = "~/.codetex"                  # Base directory for DB and cloned repos

[llm]
provider = "anthropic"                   # LLM provider (currently: anthropic)
model = "claude-sonnet-4-5-20250929"     # Model used for summarization
api_key = "sk-ant-..."                   # Anthropic API key

[indexing]
max_file_size_kb = 512                   # Skip files larger than this
max_concurrent_llm_calls = 5             # Parallel LLM requests during indexing
tier1_rebuild_threshold = 0.10           # Rebuild repo overview if >=10% of files changed on sync

[embedding]
model = "all-MiniLM-L6-v2"              # Sentence-transformers model for embeddings
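The `tier1_rebuild_threshold` comment above suggests a simple ratio check during sync. A minimal sketch, assuming the ratio is changed files over total indexed files (the denominator is not specified here, and `should_rebuild_overview` is a hypothetical name):

```python
def should_rebuild_overview(changed_files: int, total_files: int, threshold: float = 0.10) -> bool:
    """Return True when the share of changed files reaches the threshold."""
    if total_files == 0:
        # Nothing indexed yet, so there is no overview to invalidate.
        return False
    return changed_files / total_files >= threshold


print(should_rebuild_overview(9, 100))   # below 10%
print(should_rebuild_overview(10, 100))  # exactly 10%
```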

Environment variables

| Variable | Maps to | Example |
| --- | --- | --- |
| ANTHROPIC_API_KEY | llm.api_key | sk-ant-... |
| CODETEX_DATA_DIR | storage.data_dir | /custom/path |
| CODETEX_LLM_PROVIDER | llm.provider | anthropic |
| CODETEX_LLM_MODEL | llm.model | claude-sonnet-4-5-20250929 |
| CODETEX_MAX_FILE_SIZE_KB | indexing.max_file_size_kb | 1024 |
| CODETEX_MAX_CONCURRENT_LLM | indexing.max_concurrent_llm_calls | 10 |
| CODETEX_TIER1_THRESHOLD | indexing.tier1_rebuild_threshold | 0.15 |
| CODETEX_EMBEDDING_MODEL | embedding.model | all-MiniLM-L6-v2 |
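The "last wins" layering can be sketched as three successive merges. This is an illustration, not codetex's loader: `load_config` is a hypothetical name, the default values are copied from the sample config file above on the assumption that it shows defaults, and the file layer is passed in as an already-parsed dict (for example from `tomllib.load`).

```python
DEFAULTS = {
    "storage": {"data_dir": "~/.codetex"},
    "llm": {"provider": "anthropic", "model": "claude-sonnet-4-5-20250929", "api_key": None},
    "indexing": {
        "max_file_size_kb": 512,
        "max_concurrent_llm_calls": 5,
        "tier1_rebuild_threshold": 0.10,
    },
    "embedding": {"model": "all-MiniLM-L6-v2"},
}

# Environment variable -> (section, key, type used to parse the string value)
ENV_MAP = {
    "ANTHROPIC_API_KEY": ("llm", "api_key", str),
    "CODETEX_DATA_DIR": ("storage", "data_dir", str),
    "CODETEX_LLM_PROVIDER": ("llm", "provider", str),
    "CODETEX_LLM_MODEL": ("llm", "model", str),
    "CODETEX_MAX_FILE_SIZE_KB": ("indexing", "max_file_size_kb", int),
    "CODETEX_MAX_CONCURRENT_LLM": ("indexing", "max_concurrent_llm_calls", int),
    "CODETEX_TIER1_THRESHOLD": ("indexing", "tier1_rebuild_threshold", float),
    "CODETEX_EMBEDDING_MODEL": ("embedding", "model", str),
}


def load_config(file_values: dict, environ: dict) -> dict:
    """Merge defaults, then file values, then environment variables (last wins)."""
    # Copy each section so DEFAULTS is never mutated.
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in file_values.items():
        config.setdefault(section, {}).update(values)
    for var, (section, key, cast) in ENV_MAP.items():
        if var in environ:
            config[section][key] = cast(environ[var])
    return config
```

Calling `load_config(parsed_toml, os.environ)` would then yield the effective configuration, with any set environment variable overriding the file.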

File Exclusion

Files are filtered through multiple stages:

  1. Default excludes: node_modules/, __pycache__/, .git/, dist/, build/, .venv/, *.lock, *.min.js, *.pyc, *.so, etc.
  2. .gitignore — standard gitignore rules from your repo
  3. .codetexignore — same syntax as .gitignore, placed in your repo root. Use !pattern to un-ignore files
  4. File size — files exceeding max_file_size_kb are skipped
  5. Binary detection — files with null bytes in the first 8 KB are skipped
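The stages above can be sketched as a single check that reports why a file would be skipped. This is a simplified illustration with hypothetical names (`skip_reason`, `looks_binary`): it covers only the default excludes listed above, the size limit, and the null-byte test, and leaves out the .gitignore and .codetexignore stages.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Optional

# Subset of the default excludes named above; the full list is longer.
DEFAULT_EXCLUDED_DIRS = {"node_modules", "__pycache__", ".git", "dist", "build", ".venv"}
DEFAULT_EXCLUDED_GLOBS = ["*.lock", "*.min.js", "*.pyc", "*.so"]


def looks_binary(head: bytes) -> bool:
    """Treat a file as binary if a null byte appears in its first 8 KB."""
    return b"\x00" in head[:8192]


def skip_reason(path: str, size_bytes: int, head: bytes, max_file_size_kb: int = 512) -> Optional[str]:
    """Return the stage that excludes the file, or None if it should be indexed."""
    parts = PurePosixPath(path).parts
    if any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1]):
        return "default exclude"
    if any(fnmatch(parts[-1], pattern) for pattern in DEFAULT_EXCLUDED_GLOBS):
        return "default exclude"
    if size_bytes > max_file_size_kb * 1024:
        return "file size"
    if looks_binary(head):
        return "binary"
    return None
```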

Language Support

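| Language | Tree-sitter (full AST) | Fallback (regex) |
| --- | --- | --- |
| Python | Yes | Yes |
| JavaScript | Yes | Yes |
| TypeScript | Yes | Yes |
| Go | Yes | Yes |
| Rust | Yes | Yes |
| Java | Yes | Yes |
| Ruby | Yes | Yes |
| C/C++ | Yes | Yes |
| All others | No | Yes |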

Tree-sitter grammars for all 8 languages are installed automatically. For other languages, the fallback parser uses regex patterns to extract functions, classes, and imports.
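To give a feel for regex-based extraction, here is a minimal sketch that pulls functions, classes, and imports out of Python-style source. The patterns and the `extract_symbols` name are illustrative assumptions, not the patterns codetex ships, and a real fallback parser would need per-language rules.

```python
import re

# Line-anchored patterns; [ \t]* allows indentation without crossing lines.
FUNCTION = re.compile(r"^[ \t]*(?:async[ \t]+)?def[ \t]+(\w+)[ \t]*\(", re.MULTILINE)
CLASS = re.compile(r"^[ \t]*class[ \t]+(\w+)", re.MULTILINE)
IMPORT = re.compile(
    r"^[ \t]*(?:from[ \t]+([\w.]+)[ \t]+import|import[ \t]+([\w.]+))", re.MULTILINE
)


def extract_symbols(source: str) -> dict:
    """Collect function names, class names, and imported modules from source text."""
    # Each import match fills exactly one of its two groups.
    imports = [from_mod or plain_mod for from_mod, plain_mod in IMPORT.findall(source)]
    return {
        "functions": FUNCTION.findall(source),
        "classes": CLASS.findall(source),
        "imports": imports,
    }
```

Unlike a full AST, this approach cannot tell a method from a top-level function or see through multi-line constructs, which is the trade-off the fallback column in the table reflects.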

Architecture

CLI (Typer) ──┐
              ├──▶ Core Services (Indexer, Syncer, ContextStore, SearchEngine)
MCP (FastMCP)─┘         │              │              │
                    Analysis        LLM Provider    Embeddings
                 (tree-sitter +    (Anthropic)    (sentence-transformers)
                  regex fallback)       │              │
                         └──────────────┴──────────────┘
                                        │
                                   SQLite + sqlite-vec
  • Two entry points (CLI and MCP server) share the same core service layer
  • No DI framework — services are wired via a create_app() factory
  • All core services are async — CLI bridges with asyncio.run()
  • Embeddings are local — no external API calls for vector search (model auto-downloads on first run, ~90 MB)
  • Single SQLite database — 6 main tables + 2 vector tables (384-dimensional embeddings)
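The factory-plus-bridge pattern described in the bullets above might look roughly like the sketch below. Only `create_app()`, `asyncio.run()`, and the service names ContextStore and SearchEngine come from this README; the class bodies, the `App` container, and `cli_context` are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


class ContextStore:
    """Stand-in for the SQLite-backed store; holds one canned overview."""

    def __init__(self) -> None:
        self._overviews = {"my-project": "Tier 1 overview placeholder"}

    async def get_overview(self, repo: str) -> str:
        return self._overviews[repo]


class SearchEngine:
    """Stand-in service that depends on the store."""

    def __init__(self, store: ContextStore) -> None:
        self.store = store


@dataclass
class App:
    store: ContextStore
    search: SearchEngine


def create_app() -> App:
    """Wire services by hand: build each dependency once and pass it along."""
    store = ContextStore()
    return App(store=store, search=SearchEngine(store))


def cli_context(repo: str) -> str:
    """Synchronous CLI entry point bridging into the async service layer."""
    app = create_app()
    return asyncio.run(app.store.get_overview(repo))


print(cli_context("my-project"))
```

An MCP entry point could call the same `create_app()` and await the services directly, since it already runs inside an event loop.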

Development

git clone https://github.com/mrosata/codetex-mcp.git
cd codetex-mcp

# Install dependencies (including dev)
uv sync

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=codetex_mcp

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/

Releasing

Releases are automated via GitHub Actions and python-semantic-release. Version bumps are driven by conventional commit messages on main.

Commit message format

| Prefix | Effect | Example |
| --- | --- | --- |
| fix: ... | Patch bump (0.1.0 → 0.1.1) | fix: handle missing gitignore |
| feat: ... | Minor bump (0.1.0 → 0.2.0) | feat: add Ruby tree-sitter support |
| feat!: ... | Major bump (0.1.0 → 1.0.0) | feat!: redesign context API |
| docs:, chore:, ci:, test:, refactor: | No release | docs: update README |

A BREAKING CHANGE: line in the commit body also triggers a major bump.
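The rules above can be expressed as a small classifier. This sketch mirrors only the table and the BREAKING CHANGE rule stated here; it is not python-semantic-release's actual parser, and `bump_for` is a hypothetical name.

```python
import re
from typing import Optional

# type, optional (scope), optional "!", then ": " and the description
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<breaking>!)?:\s")


def bump_for(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None for a commit message."""
    header, _, body = message.partition("\n")
    match = HEADER.match(header)
    if match is None:
        return None  # not a conventional commit
    breaking_body = any(line.startswith("BREAKING CHANGE:") for line in body.splitlines())
    if match["breaking"] or breaking_body:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"])


print(bump_for("feat: add Ruby tree-sitter support"))
```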

How it works

  1. Push or merge a PR to main
  2. CI runs lint, type check, and tests
  3. The release workflow analyzes commits since the last tag
  4. If a version bump is needed, it:
    • Updates the version in pyproject.toml
    • Creates a git tag (e.g., v0.2.0)
    • Publishes a GitHub Release with a changelog
    • Builds and publishes the package to PyPI

Manual release (not recommended)

If you need to release without the automation:

uv build
uv publish

License

MIT
