rag
A CLI tool and MCP server that turns markdown documentation into a searchable, queryable knowledge base.
It chunks .md files by heading, embeds them via Ollama, stores the vectors in LanceDB, and exposes search and retrieval-augmented generation (RAG) through both a terminal CLI and an MCP server.
Prerequisites
Minimum hardware
| Component | Requirement |
|---|---|
| RAM | 4 GB (8 GB for larger doc sets) |
| CPU | Any x86-64 or ARM64, 2+ cores |
| GPU | Optional. Any NVIDIA GPU with 2+ GB VRAM. CPU-only fallback is functional but slower |
| Disk | 100 MB for index (scales with doc count) |
Indexing 5000 chunks takes about 25 s on an RTX 3060 and about 3 min on CPU only.
Install
git clone https://github.com/FrameMuse/llm-rag.git
cd llm-rag
bun install
Add a shell alias:
alias rag='bun /path/to/llm-rag/scripts/cli.ts'
Quick start
cd my-docs-project
rag init # create .rag/ project scope
rag index # chunk, embed, index all .md files
rag mcp search "..." # semantic search
rag mcp query "..." # RAG: synthesize answer from docs
Commands
| Command | Description |
|---|---|
| rag init | Create .rag/ config, mcp.json, .gitignore |
| rag index | Chunk files by heading, embed via Ollama, store in LanceDB |
| rag serve | Start MCP server (STDIO) for current .rag/ scope |
| rag mcp <tool> | One-shot CLI proxy for MCP tools |
| rag info | Show index statistics |
| rag help | Show usage |
rag mcp tools
| Tool | Usage | Description |
|---|---|---|
| search | rag mcp search "query" [--limit N] | Semantic vector search |
| query | rag mcp query "question" | RAG: retrieve chunks, synthesize answer |
| list-documents | rag mcp list-documents | List all indexed files |
| get-document | rag mcp get-document <path> | Show full document content |
| config | rag mcp config | Print mcp.json for opencode.json adoption |
Project scope (.rag/)
project/
├── .rag/
│   ├── config.json      # { name, embedModel, ragModel, pattern }
│   ├── mcp.json         # MCP config snippet for opencode.json
│   ├── .gitignore       # *
│   └── data/lancedb/    # Vector index (generated by rag index)
├── *.md
└── ...
Each project keeps its index local. rag discovers .rag/ by walking up from the current directory until it finds one (the same way git locates .git/).
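The upward search described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name findProjectRoot and its injectable exists parameter are hypothetical, and the real lookup may differ in detail (for example in how it handles symlinks).

```typescript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: returns the nearest ancestor directory that contains
// a .rag/ entry, or null if the filesystem root is reached without a match.
function findProjectRoot(
  startDir: string,
  exists: (path: string) => boolean = existsSync, // injectable for testing
): string | null {
  let dir = resolve(startDir);
  while (true) {
    if (exists(join(dir, ".rag"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // dirname of the root is the root itself
    dir = parent;
  }
}
```

Calling findProjectRoot(process.cwd()) from any subdirectory of a project would return the directory that holds .rag/, which is why commands can be run from anywhere inside the project tree.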
MCP integration
Register in opencode.json:
{
  "mcp": {
    "my-docs": {
      "type": "local",
      "command": ["rag", "serve"],
      "cwd": "/path/to/project",
      "enabled": true
    }
  }
}
Run rag mcp config from the project directory to print the snippet with cwd pre-filled.
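For readers who generate this registration from a script, the shape of the snippet can be sketched like this. The helper name mcpSnippet and the McpServerEntry type are hypothetical; only the field names and values are taken from the example above, and the exact output of rag mcp config may differ.

```typescript
// Shape of one server entry, mirroring the opencode.json example above.
interface McpServerEntry {
  type: "local";
  command: string[];
  cwd: string;
  enabled: boolean;
}

// Hypothetical helper: builds the JSON snippet for a given project name and path.
function mcpSnippet(name: string, cwd: string): string {
  const entry: McpServerEntry = {
    type: "local",
    command: ["rag", "serve"],
    cwd,
    enabled: true,
  };
  return JSON.stringify({ mcp: { [name]: entry } }, null, 2);
}

console.log(mcpSnippet("my-docs", "/path/to/project"));
```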
Architecture
flowchart LR
MD[.md files] --> Chunker
Chunker -->|heading split| Chunks
Chunks -->|Ollama embed| Vectors
Vectors -->|store| LanceDB
Query -->|embed| LanceDB
LanceDB -->|search| Results
Question -->|embed + search| Context
Context -->|Ollama chat| Answer
- Chunker: splits by ##/### headings, preserves heading hierarchy, merges tiny sections
- Embedder: Ollama /api/embed in batches of 20, truncates to 500 tokens per chunk
- Store: LanceDB embedded vector database (no external server)
- RAG: retrieve top 8 chunks, build context prompt, call Ollama chat for synthesis
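The components listed above can be sketched in a few small functions. This is an illustration under stated assumptions, not the project's source: the names Chunk, chunkByHeading, toBatches, and buildPrompt are hypothetical, the minChars threshold for "tiny" sections is invented, the prompt wording is a placeholder, and the sketch ignores headings that appear inside fenced code. Only the ##/### split, the batch size of 20, and the top-8 cutoff come from the description.

```typescript
interface Chunk {
  headings: string[]; // heading path, e.g. ["Install", "Linux"]
  text: string;
}

// Split on ## and ### headings, tracking the heading path for each section.
// Sections shorter than minChars (an assumed threshold) are merged into the
// previous chunk instead of being emitted on their own.
function chunkByHeading(markdown: string, minChars = 40): Chunk[] {
  const chunks: Chunk[] = [];
  let path: string[] = [];
  let current: Chunk = { headings: [], text: "" };

  const flush = (): void => {
    const text = current.text.trim();
    if (text.length === 0) return;
    const prev = chunks[chunks.length - 1];
    if (prev && text.length < minChars) {
      prev.text += "\n" + text; // merge tiny section into its predecessor
    } else {
      chunks.push({ headings: current.headings, text });
    }
  };

  for (const line of markdown.split("\n")) {
    const match = /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      const depth = match[1].length - 2; // 0 for ##, 1 for ###
      path = [...path.slice(0, depth), match[2].trim()];
      current = { headings: path, text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  flush();
  return chunks;
}

// Group items into fixed-size batches; each batch would be sent to the
// embedding endpoint in a single request.
function toBatches<T>(items: T[], size = 20): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build a context prompt from the best-ranked chunks (already sorted by
// similarity). The instruction text is a placeholder, not the tool's prompt.
function buildPrompt(question: string, ranked: Chunk[], topK = 8): string {
  const context = ranked
    .slice(0, topK)
    .map((c, i) => `[${i + 1}] ${c.headings.join(" > ")}\n${c.text}`)
    .join("\n\n");
  return `Answer using only the context below.\n\n${context}\n\nQuestion: ${question}`;
}
```

Keeping the heading path on each chunk lets the prompt show where a passage came from, which tends to help the chat model cite the right section.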
Configuration
.rag/config.json:
{
  "name": "my-docs",
  "embedModel": "nomic-embed-text",
  "ragModel": "llama3.2:3b",
  "pattern": "*.md"
}
Missing models are pulled automatically. To use different models, set them via rag init or edit config.json directly.
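The config file maps naturally onto a small typed structure. The sketch below assumes that the values in the example double as defaults when a field is omitted; that fallback behaviour, the RagConfig type, and the parseConfig helper are all hypothetical and not confirmed by the project.

```typescript
// Fields as they appear in .rag/config.json.
interface RagConfig {
  name: string;
  embedModel: string;
  ragModel: string;
  pattern: string;
}

// Assumed defaults, copied from the example config above.
const DEFAULTS = {
  embedModel: "nomic-embed-text",
  ragModel: "llama3.2:3b",
  pattern: "*.md",
};

// Hypothetical parser: requires a name and fills in any missing field.
function parseConfig(json: string): RagConfig {
  const raw = JSON.parse(json) as Partial<RagConfig>;
  if (typeof raw.name !== "string" || raw.name.length === 0) {
    throw new Error('config.json: "name" must be a non-empty string');
  }
  return {
    name: raw.name,
    embedModel: raw.embedModel ?? DEFAULTS.embedModel,
    ragModel: raw.ragModel ?? DEFAULTS.ragModel,
    pattern: raw.pattern ?? DEFAULTS.pattern,
  };
}
```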
License
MIT