local_lense
A production-ready RAG (Retrieval-Augmented Generation) system that enables semantic search across local documentation using vector embeddings and similarity search. Built with TypeScript, this tool demonstrates modern AI integration patterns including vector databases, embedding generation, and MCP (Model Context Protocol) tooling.
Perfect for: Engineering teams needing intelligent documentation search, knowledge bases, or RAG system implementations.
What is local_lense?
local_lense is a RAG (Retrieval-Augmented Generation) powered documentation search tool that:
- Indexes your local documentation - Processes markdown, HTML, JSON, YAML, and text files to create a searchable vector index
- Semantic search - Uses vector embeddings to find relevant content based on meaning, not just keywords
- Cursor integration - Exposes search capabilities via MCP so Cursor AI can search your docs
- Fast and local - Everything runs locally with Qdrant vector database
- Extensible - Supports custom source processors for indexing content from web, databases, or other sources
How it works
local_lense uses a RAG architecture:
1. Indexing Phase:
   - Scans your configured documentation directory
   - Splits documents into chunks
   - Generates vector embeddings using transformer models
   - Stores the embeddings in the Qdrant vector database
2. Search Phase:
   - Takes a natural language query
   - Generates an embedding for the query
   - Searches Qdrant for similar document chunks
   - Returns relevant sections with relevance scores
3. Refresh Mechanism:
   - Uses a single "docs" collection that is dropped and re-indexed on initialization
   - A simple, reliable approach to keeping the index current
4. MCP Integration (Future):
   - Exposes search as MCP tools
   - Cursor AI can query your docs directly
   - Seamless integration with your workflow
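The indexing phase above can be sketched as code. This is a hypothetical illustration of the chunking step, not local_lense's actual implementation; the chunk size and overlap values are assumptions chosen for the example.

```typescript
// Hypothetical sketch of the indexing phase's chunking step: split a
// document into fixed-size, overlapping chunks before embedding.
// chunkSize and overlap are illustrative defaults, not real settings.
export function chunkText(
  text: string,
  chunkSize = 512,
  overlap = 64
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    // Take the next window of text
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward, keeping `overlap` characters of context
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Overlapping chunks help preserve context that would otherwise be cut at chunk boundaries, at the cost of some duplicated content in the index.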
Prerequisites
- Node.js (v18 or higher)
- Docker and Docker Compose (for Qdrant vector database)
- TypeScript (installed as dev dependency)
Quick Start
1. Clone the repository
git clone <repository-url>
cd local_lense
2. Install dependencies
npm install
3. Start Qdrant vector database
docker-compose up -d
This starts a Qdrant container on localhost:6333. The data persists in a Docker volume.
4. Configure your documentation path
Edit configs.json:
{
  "sourcePath": "/Users/username/Documents/my-docs",
  "searchResultLimit": 3
}
- sourcePath: Full absolute path to your documentation directory (avoid ~; see the Configuration section below)
- searchResultLimit: Maximum number of search results to return
5. Build the project
npm run build
6. Run indexing and search
Currently, the tool runs as a test script. Edit src/main.ts to configure your search query, then:
npm run dev
Configuration
configs.json
The main user configuration file located in the project root:
- sourcePath (string, required): Path to your documentation directory
  - Important: Use full absolute paths; avoid ~ (tilde) for home directory expansion
  - Example: use "/Users/username/Documents/my-docs" instead of "~/Documents/my-docs"
  - Full paths ensure reliable operation across different contexts and environments
- searchResultLimit (number, optional): Maximum number of results per search
  - Default: 3
- keywordBoost (boolean, optional): Enable keyword-based score boosting to improve relevance with local embedding models
  - Boosts scores when query keywords appear in document content or file paths
  - Default: true
- keywordBoostWeight (number, optional): Strength of keyword boosting (0.0 to 1.0)
  - Higher values increase the boost effect
  - Default: 0.2 (20% boost weight)
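The keywordBoost behavior described above can be sketched as follows. This is a hypothetical illustration of the idea (raise a chunk's similarity score when query keywords appear in its content or path); the function and field names are assumptions, not local_lense's real API.

```typescript
// Hypothetical sketch of keyword-based score boosting. A chunk's raw
// vector-similarity score is raised by up to boostWeight, scaled by the
// fraction of query keywords found in its content or source path.
interface ScoredChunk {
  sourceLocation: string;
  content: string;
  score: number; // raw vector-similarity score in [0, 1]
}

export function applyKeywordBoost(
  result: ScoredChunk,
  query: string,
  boostWeight = 0.2
): number {
  const keywords = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (keywords.length === 0) return result.score;
  const haystack =
    (result.content + " " + result.sourceLocation).toLowerCase();
  const hits = keywords.filter((k) => haystack.includes(k)).length;
  // Scale the boost by the fraction of matched keywords, cap at 1.0
  const boost = boostWeight * (hits / keywords.length);
  return Math.min(1, result.score * (1 + boost));
}
```

With the default weight of 0.2, a chunk matching every query keyword gets at most a 20% score increase, which nudges ranking without letting keyword matches overwhelm semantic similarity.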
Note: Collection management is handled automatically by the system. The system uses a single "docs" collection that is always dropped and re-indexed on initialization.
Docker Compose
The docker-compose.yaml file configures Qdrant:
- Port: 6333 (Qdrant HTTP API)
- Storage: persistent volume qdrant_storage
- Health checks: automatic container health monitoring
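The setup described above corresponds to a compose file along these lines. This is an illustrative sketch, not the repository's actual docker-compose.yaml; consult the real file for the authoritative configuration, including its health-check definition.

```yaml
# Illustrative docker-compose.yaml matching the description above.
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"   # Qdrant HTTP API
    volumes:
      - qdrant_storage:/qdrant/storage   # persistent index data

volumes:
  qdrant_storage:
```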
Supported File Types
The default FileSourceProcessor (see src/ragIndexer/implementations/fileSourceProcessor.ts) supports the following file types:
Fully Supported
- Markdown: .md, .markdown
- HTML: .html, .htm
- JSON: .json
- YAML: .yaml, .yml
- Text: .txt, .text
Other Files
Files with unsupported extensions are processed as ContentType.OTHER. While they will be indexed, the content may not be optimally formatted for search.
The processor recursively scans directories and automatically detects file types based on their extensions. All supported files are read as UTF-8 text.
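Extension-based detection like the above can be sketched as a simple lookup table. This is a hypothetical illustration mirroring the supported-type list; the enum members and mapping are assumed names, so see src/ragIndexer/types.ts for the real definitions.

```typescript
// Hypothetical sketch of extension-based content-type detection,
// mirroring the supported file types listed above.
export enum ContentType {
  MARKDOWN = "markdown",
  HTML = "html",
  JSON = "json",
  YAML = "yaml",
  TEXT = "text",
  OTHER = "other",
}

const EXTENSION_MAP: Record<string, ContentType> = {
  ".md": ContentType.MARKDOWN,
  ".markdown": ContentType.MARKDOWN,
  ".html": ContentType.HTML,
  ".htm": ContentType.HTML,
  ".json": ContentType.JSON,
  ".yaml": ContentType.YAML,
  ".yml": ContentType.YAML,
  ".txt": ContentType.TEXT,
  ".text": ContentType.TEXT,
};

export function detectContentType(filePath: string): ContentType {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  // Unknown or missing extensions fall through to OTHER
  return EXTENSION_MAP[ext] ?? ContentType.OTHER;
}
```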
Custom Source Processors
local_lense uses a pluggable source processor architecture. While the default implementation processes local files, you can implement custom source processors to index content from other sources.
Implementing a Custom Processor
To create a custom source processor, implement the ISourceProcessor interface (see src/ragIndexer/types.ts):
import { ISourceProcessor, SourceItem } from './types';

export class MyCustomProcessor implements ISourceProcessor {
  private items: SourceItem[] = [];

  public get sourceItems(): ReadonlyArray<SourceItem> {
    // Expose the items produced by the last process() call
    return this.items;
  }

  public process(): ReadonlyArray<SourceItem> {
    // Fetch and process content from your source, then return an array
    // of SourceItem objects with:
    // - sourceLocation: identifier (file path, URL, etc.)
    // - contentType: ContentType enum value
    // - content: the actual content string
    this.items = [];
    return this.items;
  }
}
Example Use Cases for Custom Processors
- Web Scraping: Index content from websites or web APIs
- Database Sources: Query and index content from databases
- Cloud Storage: Index documents from Google Drive, Dropbox, etc.
- RSS Feeds: Index blog posts or news articles
- Git Repositories: Index code documentation from git repos
See src/ragIndexer/types.ts for the complete interface definition and type definitions.
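To make the interface concrete, here is a hypothetical, self-contained processor that serves a fixed list of notes from memory. The interface and enum are re-declared locally so the example runs on its own; in a real processor you would import them from src/ragIndexer/types.ts, and the note content here is purely illustrative.

```typescript
// Minimal, self-contained example of a custom source processor.
// Types are re-declared locally for the sketch; use the real ones
// from src/ragIndexer/types.ts in an actual implementation.
enum ContentType {
  MARKDOWN = "markdown",
  OTHER = "other",
}

interface SourceItem {
  sourceLocation: string;
  contentType: ContentType;
  content: string;
}

interface ISourceProcessor {
  readonly sourceItems: ReadonlyArray<SourceItem>;
  process(): ReadonlyArray<SourceItem>;
}

class InMemoryNotesProcessor implements ISourceProcessor {
  private items: SourceItem[] = [];

  public get sourceItems(): ReadonlyArray<SourceItem> {
    return this.items;
  }

  public process(): ReadonlyArray<SourceItem> {
    // A real processor would fetch from the web, a database, etc.
    this.items = [
      {
        sourceLocation: "memory://notes/welcome",
        contentType: ContentType.MARKDOWN,
        content: "# Welcome\nThese notes are indexed from memory.",
      },
    ];
    return this.items;
  }
}
```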
Using in Cursor
Prerequisites: Before using local_lense in Cursor, ensure Docker is running and Qdrant is started:
# In your local_lense directory
docker-compose up -d
This starts the Qdrant vector database on localhost:6333, which the MCP tool requires.
Configuration: Add local_lense to your Cursor MCP server settings:
{
"mcpServers": {
"local_lense": {
"command": "node",
"args": ["/full/path/to/your/local_lense/build/index.js"],
"env": {}
}
}
}
Important Notes:
- Use the full absolute path to build/index.js in the args field
- Avoid hyphens in repository/directory names (use underscores instead) due to Cursor MCP configuration parsing issues
- See "When local_lense Works Best" below for important usage limitations and recommendations
When local_lense Works Best
Reliable Usage
When documentation is indexed from paths OUTSIDE Cursor's working directory:
- Examples: /Users/name/Documents/my-docs, /Users/name/notes, or any directory separate from your codebase
- Cursor will use the MCP tool because its built-in tools can't access those paths
Limited Usage
When documentation is indexed from paths WITHIN Cursor's working directory:
- Cursor may use built-in grep instead of the MCP tool
- This is an acceptable limitation: built-in tools handle searches within the working directory
- Work around it with more explicit queries, e.g. "use 'search' from your registered MCP tools with query: {query}"
Best Practices
- Directory Placement: index documentation from directories outside your project workspace for the most reliable MCP tool usage
- Path Configuration: always use full absolute paths in configs.json (see the Configuration section above)
- Repository Naming: use underscores instead of hyphens in directory names to avoid MCP path parsing issues
Example Use Cases
- Engineering Documentation: Search team wikis, architecture docs, API documentation
- Personal Knowledge Base: Index your notes, research, and personal documentation
- Project Documentation: Quick access to project-specific docs and guides
- Research Notes: Semantic search across research papers and notes
Architecture
┌─────────────────────────────────────────────────────────┐
│ RAG Pipeline │
└─────────────────────────────────────────────────────────┘
Indexing Flow:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Documents │ --> │ Chunking │ --> │ Embeddings │
│ (MD/HTML) │ │ Strategy │ │ Generation │
└──────────────┘ └──────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ Qdrant │
│ Vector Store │
└──────┬───────┘
│
Search Flow: │
┌──────────────┐ ┌──────────────┐ │
│ User Query │ --> │ Embed │ ----------┘
│ (Natural Lang)│ │ Query │
└──────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ Similarity │
│ Search │
└──────┬───────┘
│
▼
┌──────────────┐
│ Ranked │
│ Results │
└──────────────┘
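The search flow in the diagram can be sketched in miniature. In local_lense the similarity search is delegated to Qdrant; this hypothetical in-memory version only illustrates the ranking step (cosine similarity over stored vectors, top-k results), and all names are assumptions for the example.

```typescript
// Hypothetical in-memory sketch of the search flow: compare a query
// embedding to stored chunk vectors and return the top-k ranked results.
// Qdrant performs this step in the real system.
interface IndexedChunk {
  sourceLocation: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

export function rankChunks(
  queryVector: number[],
  chunks: IndexedChunk[],
  limit = 3
): Array<{ sourceLocation: string; score: number }> {
  return chunks
    .map((c) => ({
      sourceLocation: c.sourceLocation,
      score: cosineSimilarity(queryVector, c.vector),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

The `limit` parameter plays the role of searchResultLimit from configs.json: it caps how many ranked chunks are returned.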
Troubleshooting
Qdrant connection errors
- Ensure Docker is running: docker ps
- Check the Qdrant container: docker-compose ps
- Verify Qdrant is responding on port 6333: curl http://localhost:6333/health
Path not found errors
- Verify the sourcePath in configs.json exists (see the Configuration section for path requirements)
- Check file permissions
- Ensure the path is accessible from the local_lense working directory
Empty search results
- Run indexing first: await ragIndexer.init() in main.ts
- Verify documents were processed (check the Qdrant dashboard at http://localhost:6333/dashboard)
- Ensure the "docs" collection exists and contains indexed documents
Build errors
- Ensure TypeScript is installed: npm install
- Check your Node.js version: node --version (should be v18+)
- Clear the build cache: rm -rf build && npm run build
MCP tool not being used
- Symptom: Cursor uses grep instead of local_lense search tool
- Cause: Documentation path is within Cursor's working directory
- Solution: Move documentation to a separate directory or accept the limitation
Repository name with hyphens causes path truncation
- Symptom: the MCP server path gets truncated (e.g., local-lense becomes local)
- Cause: Cursor's MCP server configuration has issues parsing paths containing hyphens
- Solution: use underscores instead of hyphens in repository/directory names (e.g., local_lense instead of local-lense)
- Note: this is a Cursor MCP configuration limitation, not a local_lense issue
Development
Project Structure
local_lense/
├── src/
│ ├── main.ts # Entry point (test script)
│ ├── services/ # Core services
│ │ ├── configService.ts # Configuration management
│ │ └── embedService.ts # Embedding generation
│ ├── ragIndexer/ # Indexing logic
│ │ ├── ragIndexer.ts
│ │ └── implementations/
│ │ └── fileSourceProcessor.ts
│ └── ragSearch/ # Search logic
│ ├── ragSearch.ts
│ └── implementations/
│ ├── qdrantVectorSearchService.ts
│ ├── qdrantVectorCollectionService.ts
│ └── qdrantVectorStorageService.ts
├── configs.json # Configuration file
├── docker-compose.yaml # Qdrant setup
└── package.json
Building
npm run build
Output goes to build/ directory.
Running Development Mode
npm run dev
Uses tsx to run TypeScript directly without building.
Roadmap
- [ ] MCP server implementation for Cursor integration
- [ ] Relevance score tuning and filtering
License
ISC