local_lense

A local RAG-powered documentation search system that uses vector embeddings and Qdrant to enable semantic search across markdown, HTML, and other file formats. It provides an MCP interface for AI tools like Cursor to intelligently query and retrieve information from local knowledge bases.

A production-ready RAG (Retrieval-Augmented Generation) system that enables semantic search across local documentation using vector embeddings and similarity search. Built with TypeScript, this tool demonstrates modern AI integration patterns including vector databases, embedding generation, and MCP (Model Context Protocol) tooling.

Perfect for: Engineering teams needing intelligent documentation search, knowledge bases, or RAG system implementations.

What is local_lense?

local_lense is a RAG (Retrieval-Augmented Generation) powered documentation search tool that:

  • Indexes your local documentation - Processes markdown, HTML, JSON, YAML, and text files to create a searchable vector index
  • Semantic search - Uses vector embeddings to find relevant content based on meaning, not just keywords
  • Cursor integration - Exposes search capabilities via MCP so Cursor AI can search your docs
  • Fast and local - Everything runs locally with Qdrant vector database
  • Extensible - Supports custom source processors for indexing content from web, databases, or other sources

How it works

local_lense uses a RAG (Retrieval-Augmented Generation) architecture:

  1. Indexing Phase:

    • Scans your configured documentation directory
    • Splits documents into chunks
    • Generates vector embeddings using transformer models
    • Stores embeddings in Qdrant vector database
  2. Search Phase:

    • Takes a natural language query
    • Generates an embedding for the query
    • Searches Qdrant for similar document chunks
    • Returns relevant sections with relevance scores
  3. Refresh Mechanism:

    • Uses a single "docs" collection that is dropped and re-indexed on initialization
    • Rebuilding from scratch keeps the index in sync with the files on disk, at the cost of re-embedding every document on each start
  4. MCP Integration (Future):

    • Exposes search as MCP tools
    • Cursor AI can query your docs directly
    • Seamless integration with your workflow
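
The indexing and search phases above can be sketched end to end in a few lines of TypeScript. This is a toy illustration under stated assumptions, not local_lense's implementation: embed() is a hashed bag-of-words stand-in for a transformer model, an in-memory array replaces the Qdrant "docs" collection, and the function names (chunkDocument, embed, buildIndex, search) are hypothetical.

```typescript
// Toy sketch of the pipeline: chunk -> embed -> store -> search.
// Assumptions: embed() is NOT a transformer model, and the array returned by
// buildIndex() stands in for the Qdrant "docs" collection.

interface Chunk {
  sourceLocation: string;
  text: string;
  vector: number[];
}

interface SearchHit {
  sourceLocation: string;
  text: string;
  score: number;
}

const DIMENSIONS = 64;

// Split a document into chunks on blank lines (one simple chunking strategy).
function chunkDocument(content: string): string[] {
  return content
    .split(/\n\s*\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Hash each lowercase word into a fixed-size vector, then L2-normalize it.
function embed(text: string): number[] {
  const vector = new Array<number>(DIMENSIONS).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) % DIMENSIONS;
    }
    vector[hash] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// For normalized vectors, cosine similarity is just the dot product.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Indexing phase: always build a fresh index, mirroring drop-and-re-index.
function buildIndex(docs: Record<string, string>): Chunk[] {
  const index: Chunk[] = [];
  for (const [sourceLocation, content] of Object.entries(docs)) {
    for (const text of chunkDocument(content)) {
      index.push({ sourceLocation, text, vector: embed(text) });
    }
  }
  return index;
}

// Search phase: embed the query, score every chunk, keep the top `limit` hits.
function search(index: Chunk[], query: string, limit = 3): SearchHit[] {
  const queryVector = embed(query);
  return index
    .map(({ sourceLocation, text, vector }) => ({
      sourceLocation,
      text,
      score: cosine(queryVector, vector),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Calling buildIndex again with the current files replaces the whole index, which is the same trade-off the drop-and-re-index refresh makes: no stale chunks, but a full rebuild each time.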

Prerequisites

  • Node.js (v18 or higher)
  • Docker and Docker Compose (for Qdrant vector database)
  • TypeScript (installed as dev dependency)

Quick Start

1. Clone the repository

git clone <repository-url>
cd local_lense

2. Install dependencies

npm install

3. Start Qdrant vector database

docker-compose up -d

This starts a Qdrant container on localhost:6333. The data persists in a Docker volume.

4. Configure your documentation path

Edit configs.json:

{
  "sourcePath": "/Users/username/Documents/my-docs",
  "searchResultLimit": 3
}

  • sourcePath: Full absolute path to your documentation directory; ~ expansion is not reliable in every context (see the Configuration section)
  • searchResultLimit: Maximum number of search results to return

5. Build the project

npm run build

6. Run indexing and search

Currently, the tool runs as a test script. Edit src/main.ts to configure your search query, then:

npm run dev

Configuration

configs.json

The main user configuration file located in the project root:

  • sourcePath (string, required): Path to your documentation directory

    • Important: Use full absolute paths - avoid using ~ (tilde) for home directory expansion
    • Example: Use "/Users/username/Documents/my-docs" instead of "~/Documents/my-docs"
    • Full paths ensure reliable operation across different contexts and environments
  • searchResultLimit (number, optional): Maximum number of results per search

    • Default: 3
  • keywordBoost (boolean, optional): Enable keyword-based score boosting to improve relevance with local embedding models

    • Boosts scores when query keywords appear in document content or file paths
    • Default: true
  • keywordBoostWeight (number, optional): Controls the strength of keyword boosting (0.0 to 1.0)

    • Higher values increase the boost effect
    • Default: 0.2 (20% boost weight)
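
To make the two boost options concrete, here is one way keyword boosting could work. The formula is an assumption chosen for illustration (the README does not state the exact calculation), and extractKeywords and applyKeywordBoost are hypothetical names: the vector score is scaled up in proportion to the fraction of query keywords found in the chunk content or file path.

```typescript
// Hypothetical boost formula (an assumption, not taken from local_lense):
//   boosted = score * (1 + weight * fractionOfQueryKeywordsFound)

// Lowercase the query, keep alphanumeric words longer than two characters,
// and remove duplicates.
function extractKeywords(query: string): string[] {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return Array.from(new Set(words.filter((word) => word.length > 2)));
}

function applyKeywordBoost(
  score: number,
  query: string,
  content: string,
  sourceLocation: string,
  weight = 0.2, // mirrors the documented keywordBoostWeight default
): number {
  const keywords = extractKeywords(query);
  if (keywords.length === 0) return score;

  // A keyword counts as matched if it appears in the content or the path.
  const haystack = `${content} ${sourceLocation}`.toLowerCase();
  const matched = keywords.filter((keyword) => haystack.includes(keyword)).length;

  return score * (1 + weight * (matched / keywords.length));
}
```

With this formula a weight of 0.2 raises a score by at most 20%, when every query keyword is present, and a weight of 0 leaves the vector score unchanged.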

Note: Collection management is handled automatically by the system. The system uses a single "docs" collection that is always dropped and re-indexed on initialization.
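
The options and defaults listed above could be loaded along these lines. This is a minimal sketch, not the project's configService: the LocalLenseConfig interface and parseConfig function are hypothetical, and the validation rules are assumptions derived from the option descriptions.

```typescript
// Hypothetical shape of a fully resolved configs.json.
interface LocalLenseConfig {
  sourcePath: string;
  searchResultLimit: number;
  keywordBoost: boolean;
  keywordBoostWeight: number;
}

// Parse the JSON text of configs.json and fill in the documented defaults.
function parseConfig(json: string): LocalLenseConfig {
  const raw = JSON.parse(json) as Partial<LocalLenseConfig>;

  // sourcePath is the only required option.
  if (typeof raw.sourcePath !== "string" || raw.sourcePath.length === 0) {
    throw new Error("configs.json: sourcePath is required");
  }

  const searchResultLimit = raw.searchResultLimit ?? 3;
  if (!Number.isInteger(searchResultLimit) || searchResultLimit < 1) {
    throw new Error("configs.json: searchResultLimit must be a positive integer");
  }

  const keywordBoostWeight = raw.keywordBoostWeight ?? 0.2;
  if (keywordBoostWeight < 0 || keywordBoostWeight > 1) {
    throw new Error("configs.json: keywordBoostWeight must be between 0.0 and 1.0");
  }

  return {
    sourcePath: raw.sourcePath,
    searchResultLimit,
    keywordBoost: raw.keywordBoost ?? true,
    keywordBoostWeight,
  };
}
```

Failing fast on an out-of-range value is a design choice of this sketch; clamping the value instead would be equally reasonable.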

Docker Compose

The docker-compose.yaml file configures Qdrant:

  • Port: 6333 (Qdrant HTTP API)
  • Storage: Persistent volume qdrant_storage
  • Health checks: Automatic container health monitoring

Supported File Types

The default FileSourceProcessor (see src/ragIndexer/implementations/fileSourceProcessor.ts) supports the following file types:

Fully Supported

  • Markdown: .md, .markdown
  • HTML: .html, .htm
  • JSON: .json
  • YAML: .yaml, .yml
  • Text: .txt, .text

Other Files

Files with unsupported extensions are processed as ContentType.OTHER. While they will be indexed, the content may not be optimally formatted for search.

The processor recursively scans directories and automatically detects file types based on their extensions. All supported files are read as UTF-8 text.
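
Extension-based detection can be pictured as a simple lookup table. The sketch below is illustrative only: ContentType.OTHER is named in this README, but the other enum members, the string values, and the detectContentType helper are assumptions rather than the contents of fileSourceProcessor.ts.

```typescript
// Only OTHER is confirmed by the README; the remaining members are assumed.
enum ContentType {
  MARKDOWN = "markdown",
  HTML = "html",
  JSON = "json",
  YAML = "yaml",
  TEXT = "text",
  OTHER = "other",
}

// Extensions from the "Fully Supported" list, mapped to a content type.
const EXTENSION_TO_TYPE: Partial<Record<string, ContentType>> = {
  ".md": ContentType.MARKDOWN,
  ".markdown": ContentType.MARKDOWN,
  ".html": ContentType.HTML,
  ".htm": ContentType.HTML,
  ".json": ContentType.JSON,
  ".yaml": ContentType.YAML,
  ".yml": ContentType.YAML,
  ".txt": ContentType.TEXT,
  ".text": ContentType.TEXT,
};

// Return the lowercase extension including the dot, or "" if there is none.
// Dotfiles such as ".gitignore" are treated as having no extension.
function fileExtension(filePath: string): string {
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot).toLowerCase() : "";
}

// Anything not in the table falls back to ContentType.OTHER.
function detectContentType(filePath: string): ContentType {
  return EXTENSION_TO_TYPE[fileExtension(filePath)] ?? ContentType.OTHER;
}
```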

Custom Source Processors

local_lense uses a pluggable source processor architecture. While the default implementation processes local files, you can implement custom source processors to index content from other sources.

Implementing a Custom Processor

To create a custom source processor, implement the ISourceProcessor interface (see src/ragIndexer/types.ts):

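// Assumes ContentType is exported from ./types alongside the interface
import { ContentType, ISourceProcessor, SourceItem } from './types';

export class MyCustomProcessor implements ISourceProcessor {
    // Items produced by the most recent process() call
    private items: SourceItem[] = [];

    public get sourceItems(): ReadonlyArray<SourceItem> {
        // Return processed source items
        return this.items;
    }

    public process(): ReadonlyArray<SourceItem> {
        // Fetch and process content from your source, then build SourceItem objects with:
        // - sourceLocation: identifier (file path, URL, etc.)
        // - contentType: ContentType enum value
        // - content: the actual content string
        this.items = [
            {
                // Placeholder values for illustration only
                sourceLocation: 'https://example.com/docs/intro',
                contentType: ContentType.OTHER,
                content: 'Content fetched from your source',
            },
        ];
        return this.items;
    }
}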

Example Use Cases for Custom Processors

  • Web Scraping: Index content from websites or web APIs
  • Database Sources: Query and index content from databases
  • Cloud Storage: Index documents from Google Drive, Dropbox, etc.
  • RSS Feeds: Index blog posts or news articles
  • Git Repositories: Index code documentation from git repos

See src/ragIndexer/types.ts for the complete interface definition and type definitions.

Using in Cursor

Prerequisites: Before using local_lense in Cursor, ensure Docker is running and Qdrant is started:

# In your local_lense directory
docker-compose up -d

This starts the Qdrant vector database on localhost:6333, which the MCP tool requires.

Configuration: Add local_lense to your Cursor MCP server settings:

{
  "mcpServers": {
    "local_lense": {
      "command": "node",
      "args": ["/full/path/to/your/local_lense/build/index.js"],
      "env": {}
    }
  }
}

Important Notes:

  • Use the full absolute path to build/index.js in the args field
  • Avoid hyphens in repository/directory names (use underscores instead) due to Cursor MCP configuration parsing issues
  • See "When local_lense Works Best" below for important usage limitations and recommendations
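
The first two notes can be checked mechanically before editing the Cursor settings. The helper below is a hypothetical convenience, not part of local_lense; it only flags the two issues described above and does not verify that the file exists.

```typescript
// Return a list of warnings for the path used in the MCP "args" field.
// An empty list means neither of the documented pitfalls was detected.
function checkMcpEntryPath(entryPath: string): string[] {
  const warnings: string[] = [];

  // Accept POSIX absolute paths ("/...") and Windows drive paths ("C:\...").
  const isAbsolute = entryPath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(entryPath);
  if (!isAbsolute) {
    warnings.push("Use the full absolute path to build/index.js.");
  }

  // Hyphens anywhere in the path are reported as causing truncation in Cursor.
  if (entryPath.includes("-")) {
    warnings.push("Path contains a hyphen; rename the directory to use underscores.");
  }

  return warnings;
}
```

For example, checkMcpEntryPath("~/local-lense/build/index.js") reports both problems, while "/full/path/to/your/local_lense/build/index.js" reports none.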

When local_lense Works Best

Reliable Usage

When documentation is indexed from paths OUTSIDE Cursor's working directory:

  • Examples: /Users/name/Documents/my-docs, /Users/name/notes, separate directory from codebase
  • Cursor will use the MCP tool because built-in tools can't access those paths

Limited Usage

When documentation is indexed from paths WITHIN Cursor's working directory:

  • Cursor may use built-in grep instead of the MCP tool
  • This is an acceptable limitation: built-in tools can already search files in the working directory
  • To work around it, name the tool explicitly in your prompt, e.g. "use 'search' from your registered mcp tools with query: {query}"

Best Practices

  • Directory Placement: Index documentation from directories outside your project workspace for most reliable MCP tool usage
  • Path Configuration: Always use full absolute paths in configs.json (see Configuration section above)
  • Repository Naming: Use underscores instead of hyphens in directory names to avoid MCP path parsing issues

Example Use Cases

  • Engineering Documentation: Search team wikis, architecture docs, API documentation
  • Personal Knowledge Base: Index your notes, research, and personal documentation
  • Project Documentation: Quick access to project-specific docs and guides
  • Research Notes: Semantic search across research papers and notes

Architecture

┌─────────────────────────────────────────────────────────┐
│                    RAG Pipeline                         │
└─────────────────────────────────────────────────────────┘

Indexing Flow:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Documents   │ --> │   Chunking   │ --> │  Embeddings  │
│  (MD/HTML)   │     │   Strategy   │     │  Generation  │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                   │
                                                   ▼
                                          ┌──────────────┐
                                          │   Qdrant     │
                                          │ Vector Store │
                                          └──────┬───────┘
                                                 │
Search Flow:                                      │
┌──────────────┐     ┌──────────────┐           │
│ User Query   │ --> │   Embed      │ ----------┘
│(Natural Lang)│     │   Query      │
└──────────────┘     └──────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │ Similarity   │
                     │   Search     │
                     └──────┬───────┘
                            │
                            ▼
                     ┌──────────────┐
                     │  Ranked      │
                     │  Results     │
                     └──────────────┘

Troubleshooting

Qdrant connection errors

  • Ensure Docker is running: docker ps
  • Check Qdrant container: docker-compose ps
  • Verify Qdrant responds on port 6333: curl http://localhost:6333/healthz

Path not found errors

  • Verify sourcePath in configs.json exists (see Configuration section for path requirements)
  • Check file permissions
  • Ensure the path is accessible from the local_lense working directory

Empty search results

  • Run indexing first: await ragIndexer.init() in main.ts
  • Verify documents were processed (check Qdrant dashboard at http://localhost:6333/dashboard)
  • Ensure the "docs" collection exists and contains indexed documents
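
The last check can also be scripted against Qdrant's REST API, which returns collection details from GET /collections/{name}. The sketch below is an assumption-laden helper rather than project code: getDocsCollectionStatus and the FetchLike type are hypothetical, and the fetch function is passed in so the logic can be exercised without a running Qdrant instance.

```typescript
// Minimal shape of the fetch function this sketch needs, so a fake can be
// injected for testing. The global fetch in Node.js 18+ satisfies it.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

interface CollectionStatus {
  exists: boolean;
  pointsCount: number;
}

// Ask Qdrant whether the collection exists and how many points it holds.
async function getDocsCollectionStatus(
  fetchImpl: FetchLike,
  baseUrl = "http://localhost:6333",
  collection = "docs",
): Promise<CollectionStatus> {
  const response = await fetchImpl(`${baseUrl}/collections/${collection}`);

  // Qdrant answers 404 when the collection has not been created yet.
  if (response.status === 404) {
    return { exists: false, pointsCount: 0 };
  }
  if (!response.ok) {
    throw new Error(`Qdrant returned HTTP ${response.status}`);
  }

  const body = (await response.json()) as {
    result?: { points_count?: number | null };
  };
  return { exists: true, pointsCount: body.result?.points_count ?? 0 };
}
```

Against a live instance this would be called as getDocsCollectionStatus(fetch). A result of exists: true with pointsCount 0 suggests the collection was created but no documents were indexed, which points back at sourcePath.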

Build errors

  • Ensure TypeScript is installed: npm install
  • Check Node.js version: node --version (should be v18+)
  • Clear build cache: rm -rf build && npm run build

MCP tool not being used

  • Symptom: Cursor uses grep instead of local_lense search tool
  • Cause: Documentation path is within Cursor's working directory
  • Solution: Move documentation to a separate directory or accept the limitation

Repository name with hyphens causes path truncation

  • Symptom: MCP server path gets truncated (e.g., local-lense becomes local)
  • Cause: Cursor's MCP server configuration has issues parsing paths containing hyphens
  • Solution: Use underscores instead of hyphens in repository/directory names (e.g., local_lense instead of local-lense)
  • Note: This is a Cursor MCP configuration limitation, not a local_lense issue

Development

Project Structure

local_lense/
├── src/
│   ├── main.ts                    # Entry point (test script)
│   ├── services/                  # Core services
│   │   ├── configService.ts       # Configuration management
│   │   └── embedService.ts        # Embedding generation
│   ├── ragIndexer/                # Indexing logic
│   │   ├── ragIndexer.ts
│   │   └── implementations/
│   │       └── fileSourceProcessor.ts
│   └── ragSearch/                 # Search logic
│       ├── ragSearch.ts
│       └── implementations/
│           ├── qdrantVectorSearchService.ts
│           ├── qdrantVectorCollectionService.ts
│           └── qdrantVectorStorageService.ts
├── configs.json                   # Configuration file
├── docker-compose.yaml            # Qdrant setup
└── package.json

Building

npm run build

Output goes to build/ directory.

Running Development Mode

npm run dev

Uses tsx to run TypeScript directly without building.

Roadmap

  • [ ] MCP server implementation for Cursor integration
  • [ ] Relevance score tuning and filtering

License

ISC
