ChromaDB Local MCP Server

Provides AI assistants with persistent memory through local ChromaDB vector storage, featuring automated file ingestion and batch processing for over 70 file types. It enables advanced vector search, EXIF metadata extraction for photos, and duplicate file detection across local directories.


ChromaDB MCP Server 🧠

A Model Context Protocol (MCP) server that gives AI assistants persistent memory through ChromaDB vector storage. Now with EXIF extraction, Watch Folders, and Duplicate Detection - the ultimate tool for creators!

✨ Features

Core

  • Persistent AI Memory: Your AI assistant remembers past conversations and solutions
  • Vector Search: Find similar code patterns, configurations, and documentation instantly
  • Local First: Run everything on your own hardware, no cloud dependencies

🚀 Batch Processing

  • Fast Batch Ingest: Process entire directories in seconds (500+ files)
  • 77 File Types: Photos, CAD, documents, data files, code
  • Quick Load/Unload: Temporary collections for rapid workflows
  • Export/Import: Backup and transfer collections as JSON

📸 Photo Features (NEW in v3.0)

  • EXIF Extraction: Camera, lens, exposure, GPS location, date taken
  • Search by Camera: "Find photos shot with my Canon 5D"
  • Search by Location: GPS coordinates embedded and searchable
  • Search by Date: "Find photos from vacation 2024"

👁️ Watch Folders (NEW in v3.0)

  • Auto-Ingest: Drop files in watched folders, auto-add to ChromaDB
  • Hands-Free: Perfect for incoming photo dumps, downloads
  • Filter by Type: Watch only for specific file types

🔍 Duplicate Detection (NEW in v3.0)

  • Find Duplicates: Hash-based detection across directories
  • Reclaim Space: See exactly how much space duplicates waste
  • Compare Files: Check if two files are identical
  • Perceptual Hashing: Find similar (not just identical) images
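
To illustrate how perceptual hashing differs from exact hashing, here is a minimal sketch of an average hash (aHash) with a Hamming-distance comparison. It is not taken from this project's code: the function names, the bit-string representation, and the similarity threshold of 5 are all assumptions, and the input is assumed to be an already downscaled grayscale thumbnail.

```javascript
// Average hash (aHash) over a small grayscale grid.
// `pixels` is a flat array of brightness values (0-255), e.g. an 8x8 thumbnail.
function averageHash(pixels) {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  // One bit per pixel: "1" if the pixel is at least as bright as the mean.
  return pixels.map((p) => (p >= mean ? "1" : "0")).join("");
}

// Number of positions at which two equal-length bit strings differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Two images count as "similar" when few bits differ.
// The default threshold is an illustrative assumption.
function looksSimilar(pixelsA, pixelsB, maxDistance = 5) {
  return hammingDistance(averageHash(pixelsA), averageHash(pixelsB)) <= maxDistance;
}
```

Small brightness changes leave most bits unchanged, so near-identical images land within a few bits of each other, whereas a cryptographic hash would differ completely.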

🚀 Quick Start

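Prerequisites

  • Bun, to install dependencies and run the server
  • Docker, to run the ChromaDB container
  • Claude Desktop, as the MCP client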

Installation

  1. Clone the repository

    git clone https://github.com/vespo92/chromadblocal-mcp-server.git
    cd chromadblocal-mcp-server
    
  2. Install dependencies

    bun install
    
  3. Start ChromaDB

    docker run -d \
      --name chromadb-local \
      -p 8001:8000 \
      -v ~/chromadb-data:/chroma/chroma \
      -e IS_PERSISTENT=TRUE \
      chromadb/chroma:latest
    
  4. Initialize collections

    bun run setup
    
  5. Configure Claude Desktop

    Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):

    {
      "mcpServers": {
        "chromadb-context": {
          "command": "bun",
          "args": ["run", "/path/to/chromadb-mcp-server/index.js"],
          "env": {
            "CHROMADB_URL": "http://localhost:8001"
          }
        }
      }
    }
    
  6. Restart Claude Desktop and start building your knowledge base!
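
Before restarting Claude Desktop, it can help to confirm that the ChromaDB container from step 3 is reachable. The following is a minimal sketch, not part of this repository: `heartbeatUrls` and `checkChroma` are hypothetical helpers, and the heartbeat paths are an assumption (newer Chroma releases serve `/api/v2/heartbeat`, older ones `/api/v1/heartbeat`), so both are tried.

```javascript
// Build candidate heartbeat URLs for a ChromaDB server.
// Assumption: the server exposes /api/v2/heartbeat (newer) or /api/v1/heartbeat (older).
function heartbeatUrls(baseUrl) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return [`${base}/api/v2/heartbeat`, `${base}/api/v1/heartbeat`];
}

// Try each URL in turn; resolve to the first one that answers with HTTP 2xx, else null.
async function checkChroma(baseUrl = process.env.CHROMADB_URL || "http://localhost:8001") {
  for (const url of heartbeatUrls(baseUrl)) {
    try {
      const res = await fetch(url);
      if (res.ok) return url;
    } catch {
      // Connection refused or similar: try the next candidate.
    }
  }
  return null;
}
```

Calling `checkChroma().then(console.log)` should print the URL that responded, or `null` if the container is not running or the port mapping differs from `8001:8000`.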

💬 Usage Examples

Once configured, interact naturally with your AI:

Store Knowledge

  • "Store this Docker configuration in ChromaDB for future reference"
  • "Save this React component pattern with tags: hooks, authentication"
  • "Remember this solution for GPU passthrough issues"

Retrieve Information

  • "Search ChromaDB for Python async examples"
  • "Find similar component patterns to this one"
  • "What solutions do we have for Docker networking issues?"

Build Context

  • "Add this API documentation to the project_docs collection"
  • "Store these test patterns for our testing suite"

🚀 Batch File Processing

The killer feature! Process hundreds of files in seconds for AI-powered search and retrieval.

Quick Load Workflow (Fastest)

Perfect for "load, process, discard" workflows:

You: "Quick load my photos from /home/photos/vacation2024"
AI: Creates temp collection, ingests 500 photos in seconds
You: "Find photos with mountains or beaches"
AI: Returns matching photos with metadata
You: "Unload the collection"
AI: Cleans up, frees memory
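
Behind the conversation above, the assistant would issue MCP tool calls. The sketch below builds them as JSON-RPC 2.0 `tools/call` requests, the request shape defined by the MCP specification. The tool and parameter names come from the MCP Tools section of this README; the collection name `vacation2024_tmp`, the `limit` value, and passing `categories` as an array are illustrative assumptions.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request in the shape the MCP specification defines.
let nextId = 1;
function toolCall(name, args) {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// The load / search / unload conversation expressed as three tool calls.
// "vacation2024_tmp" is a made-up collection name for illustration.
const workflow = [
  toolCall("quick_load", {
    path: "/home/photos/vacation2024",
    name: "vacation2024_tmp",
    categories: ["images"], // assumed to accept an array
  }),
  toolCall("search_context", {
    query: "mountains or beaches",
    collection: "vacation2024_tmp",
    limit: 10,
  }),
  toolCall("unload_collection", { collection: "vacation2024_tmp" }),
];

console.log(JSON.stringify(workflow, null, 2));
```

In practice the MCP client constructs these messages; the sketch only makes the order of operations explicit.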

Supported File Types

| Category | Extensions | Metadata Extracted |
|----------|------------|--------------------|
| Images | .jpg, .jpeg, .png, .heic, .raw, .cr2, .nef, .arw, .tiff, .gif, .webp | Dimensions, size, format |
| CAD | .stl, .obj, .dxf, .dwg, .step, .iges, .fbx, .blend, .skp, .scad | Vertices, faces, format |
| Documents | .pdf, .txt, .md, .doc, .docx, .rtf | Full text content |
| Data | .json, .yaml, .xml, .csv, .toml, .ini | Parsed content |
| Code | .js, .ts, .py, .go, .rs, .java, .cpp, .c, .php, .rb + 20 more | Full source code |
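
A category filter like the one used by `scan_directory` and `batch_ingest` can be approximated with an extension lookup. This is a sketch under assumptions: `CATEGORIES` and `categorize` are hypothetical names, and the code list covers only the ten extensions named above, not the "+ 20 more".

```javascript
const path = require("node:path");

// Extension lists copied from the supported file types listed above.
const CATEGORIES = {
  images: [".jpg", ".jpeg", ".png", ".heic", ".raw", ".cr2", ".nef", ".arw", ".tiff", ".gif", ".webp"],
  cad: [".stl", ".obj", ".dxf", ".dwg", ".step", ".iges", ".fbx", ".blend", ".skp", ".scad"],
  documents: [".pdf", ".txt", ".md", ".doc", ".docx", ".rtf"],
  data: [".json", ".yaml", ".xml", ".csv", ".toml", ".ini"],
  code: [".js", ".ts", ".py", ".go", ".rs", ".java", ".cpp", ".c", ".php", ".rb"],
};

// Invert the table into a single extension -> category map for O(1) lookups.
const EXT_TO_CATEGORY = new Map(
  Object.entries(CATEGORIES).flatMap(([category, exts]) => exts.map((ext) => [ext, category]))
);

// Return the category for a file path, or null for unsupported extensions.
function categorize(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return EXT_TO_CATEGORY.get(ext) ?? null;
}
```

Lowercasing the extension means camera output such as `IMG_001.JPG` is still classified as an image.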

Batch Processing Examples

"Scan /projects/cad-files to see what's there"
"Batch ingest all STL files from /3d-prints into the 'print_library' collection"
"Quick load my Downloads folder, find anything mentioning 'invoice'"
"Export the photo_archive collection to backup.json"
"Import backup.json into a new collection called 'restored_photos'"

Processing Speed

  • Quick Load: ~200 files in 2-3 seconds
  • Batch Ingest: ~500 files in 5-10 seconds (with full metadata)
  • Concurrent Processing: 10-20 parallel file operations
  • No external dependencies: Pure JavaScript/Bun processing
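
The "10-20 parallel file operations" figure suggests a bounded worker pool. The sketch below shows one common way to cap concurrency in plain JavaScript; `mapWithConcurrency` is a hypothetical helper and may not match how the batch processor is actually written.

```javascript
// Run `worker` over `items` with at most `limit` operations in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners and wait for all of them to drain the queue.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

A call such as `mapWithConcurrency(files, 10, ingestOne)` (with `ingestOne` standing in for the per-file work) keeps ten files in flight without opening every file at once.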

📚 Available Collections

| Collection | Description | Use Case |
|------------|-------------|----------|
| home_automation | Smart home configs & automations | Home Assistant, IoT scripts |
| code_snippets | Reusable code patterns | Functions, hooks, utilities |
| configurations | System & app configs | Docker, Kubernetes, services |
| troubleshooting | Problem solutions | Fixes, workarounds, debugging |
| project_docs | Project documentation | APIs, architecture, guides |
| learning_notes | Learning insights | Tutorials, concepts, notes |

🛠️ MCP Tools

search_context

Search for relevant information across collections

Parameters:
- query: Search query
- collection: (optional) Specific collection to search
- limit: (optional) Number of results

store_context

Store new information with metadata

Parameters:
- content: The content to store
- metadata: Tags, categories, descriptions
- collection: Target collection

list_collections

List all available collections and their metadata

find_similar_patterns

Find code patterns similar to provided example

Batch Processing Tools

scan_directory

Preview files in a directory before ingesting

Parameters:
- path: Directory to scan
- categories: Filter by type (images, cad, documents, data, code)
- extensions: Filter by extension (.jpg, .stl, etc.)
- recursive: Include subdirectories (default: true)

batch_ingest

Bulk ingest files into ChromaDB with full metadata

Parameters:
- path: Source directory
- collection: Target collection name
- categories: File types to include
- max_files: Limit number of files

quick_load

🚀 FAST: Rapidly load files for temporary processing

Parameters:
- path: Directory to load
- name: Collection name (auto-generated if omitted)
- categories: File types to include

unload_collection

Delete a collection (cleanup after quick_load)

Parameters:
- collection: Name of collection to delete

export_collection

Export collection to JSON file

Parameters:
- collection: Collection to export
- output_path: File path for JSON output

import_collection

Import collection from JSON file

Parameters:
- input_path: JSON file to import
- collection: Override collection name
- overwrite: Delete existing first (default: false)

get_collection_info

Get detailed stats about a collection

Parameters:
- collection: Collection name

ingest_file

Ingest a single file with metadata extraction

Parameters:
- path: File to ingest
- collection: Target collection

list_file_types

Show all supported file extensions

EXIF & Photo Tools

extract_exif

Extract detailed EXIF metadata from photos

Parameters:
- path: Path to JPEG or TIFF image
Returns: Camera, lens, exposure, GPS, date taken
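
EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), so making them searchable usually means converting to signed decimal degrees. The sketch below shows that conversion; `dmsToDecimal` is a hypothetical helper, and the format in which this server actually returns GPS data is not documented here.

```javascript
// Convert an EXIF GPS coordinate (degrees, minutes, seconds + hemisphere ref)
// into signed decimal degrees. South and West are negative.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const decimal = degrees + minutes / 60 + seconds / 3600;
  return ref === "S" || ref === "W" ? -decimal : decimal;
}

// Example values: 40° 26' 46" N, 79° 58' 56" W
const lat = dmsToDecimal([40, 26, 46], "N");
const lon = dmsToDecimal([79, 58, 56], "W");
console.log(lat.toFixed(4), lon.toFixed(4)); // 40.4461 -79.9822
```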

Watch Folder Tools

watch_folder

Start auto-ingesting new files from a folder

Parameters:
- path: Folder to watch
- collection: Target collection (default: auto_ingest)
- categories: File types to watch
- include_exif: Extract EXIF from photos (default: true)
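
For readers curious how a watch folder can work, here is a minimal sketch built on Node's `fs.watch`, which Bun also provides. It is an assumption about the general approach, not this project's implementation: `shouldIngest`, `startWatcher`, and the `onFile` callback are hypothetical names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Decide whether a changed file should be ingested, based on its extension.
function shouldIngest(filename, extensions) {
  return extensions.includes(path.extname(filename).toLowerCase());
}

// Watch `dir` and call `onFile(fullPath)` for matching files.
// `onFile` is a hypothetical callback standing in for the real ingest step.
function startWatcher(dir, extensions, onFile) {
  const watcher = fs.watch(dir, (eventType, filename) => {
    if (!filename || !shouldIngest(filename, extensions)) return;
    const fullPath = path.join(dir, filename);
    // fs.watch also fires on deletes and renames, so confirm the file exists.
    if (fs.existsSync(fullPath)) onFile(fullPath);
  });
  return () => watcher.close(); // call the returned function to stop watching
}
```

`fs.watch` can report the same file more than once while it is being written, so a real implementation would likely debounce events or skip files it has already ingested.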

stop_watch

Stop watching a folder

Parameters:
- path: Folder to stop watching

list_watchers

List all active folder watchers

Duplicate Detection Tools

find_duplicates

Scan directory for duplicate files

Parameters:
- path: Directory to scan
- hash_method: "partial" (fast), "full" (thorough), "perceptual" (images)
- categories: File types to check
Returns: Duplicate groups with wasted space info
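
The difference between the "partial" and "full" hash methods can be sketched as follows. This is an illustration under assumptions rather than the project's code: the SHA-256 algorithm, the 64 KiB partial-read cutoff, and the `hashFile` and `findDuplicateGroups` names are all hypothetical.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// Hash a file's contents. "partial" reads only the first 64 KiB (an assumed
// cutoff) and mixes in the file size; "full" hashes every byte.
function hashFile(filePath, method = "full") {
  const hash = crypto.createHash("sha256");
  if (method === "partial") {
    const size = fs.statSync(filePath).size;
    const buffer = Buffer.alloc(Math.min(size, 64 * 1024));
    const fd = fs.openSync(filePath, "r");
    try {
      if (buffer.length > 0) fs.readSync(fd, buffer, 0, buffer.length, 0);
    } finally {
      fs.closeSync(fd);
    }
    hash.update(String(size)).update(buffer);
  } else {
    hash.update(fs.readFileSync(filePath));
  }
  return hash.digest("hex");
}

// Group paths by hash and keep only groups with more than one member.
function findDuplicateGroups(paths, method = "full") {
  const byHash = new Map();
  for (const p of paths) {
    const key = hashFile(p, method);
    byHash.set(key, [...(byHash.get(key) ?? []), p]);
  }
  return [...byHash.values()]
    .filter((group) => group.length > 1)
    .map((files) => {
      const size = fs.statSync(files[0]).size;
      // Every copy beyond the first is reclaimable space.
      return { files, wastedBytes: size * (files.length - 1) };
    });
}
```

A partial hash is much cheaper on large files but can report false positives when files share a size and a common prefix, which is why a full hash is the safer choice before deleting anything.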

compare_files

Check if two files are duplicates

Parameters:
- file1: First file path
- file2: Second file path

find_collection_duplicates

Find duplicate entries in a ChromaDB collection

Parameters:
- collection: Collection name

🔧 Configuration

Environment Variables

CHROMADB_URL=http://localhost:8001  # ChromaDB server URL

Custom Collections

Add new collections in setup-home-collections.js:

await createCollection('ml_experiments', {
  description: 'Machine learning experiments and results'
});
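
The body of `createCollection` is not shown in this README. One plausible shape, assuming it wraps the `chromadb` JavaScript client's `getOrCreateCollection` call, is sketched below. Unlike the two-argument call above, this version takes the client as an explicit first argument, and `fakeClient` is a stand-in so the sketch runs without a server.

```javascript
// A possible shape for a createCollection helper. `client` is expected to be a
// chromadb ChromaClient instance (or anything exposing getOrCreateCollection).
async function createCollection(client, name, metadata = {}) {
  // getOrCreateCollection makes the setup script safe to run more than once.
  return client.getOrCreateCollection({ name, metadata });
}

// Stand-in client used only so this sketch runs without a ChromaDB server.
const fakeClient = {
  collections: new Map(),
  async getOrCreateCollection({ name, metadata }) {
    if (!this.collections.has(name)) this.collections.set(name, { name, metadata });
    return this.collections.get(name);
  },
};

createCollection(fakeClient, "ml_experiments", {
  description: "Machine learning experiments and results",
}).then((collection) => console.log(collection.name));
```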

📦 Project Structure

chromadb-mcp-server/
├── index.js                    # MCP server with 22 tools
├── batch-processor.js          # Fast batch file processing engine
├── exif-extractor.js           # EXIF metadata extraction for photos
├── watch-folder.js             # Auto-ingest watch folder system
├── duplicate-detector.js       # Duplicate file detection
├── setup-home-collections.js   # Collection initialization
├── test-chromadb.js            # Connection test script
├── test-mcp.js                 # MCP functionality test
├── test-batch-processor.js     # Batch processing tests
├── HOME-AI-SETUP.md            # Detailed setup guide
├── package.json                # Project dependencies
└── README.md                   # This file

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

See CONTRIBUTING.md for more details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Anthropic for the MCP specification
  • Chroma for the excellent vector database
  • The open-source community for inspiration and support

🚀 What's Next?

  • ✅ Export/import collections DONE!
  • ✅ Batch file processing DONE!
  • ✅ EXIF metadata extraction DONE in v3.0!
  • ✅ Watch folders / auto-ingest DONE in v3.0!
  • ✅ Duplicate detection DONE in v3.0!
  • Cloud sync capabilities
  • Multi-user support
  • Web UI for collection management
  • AI-powered image descriptions (what's in the photo)
  • 3D print analysis (volume, time estimates)

Built with ❤️ for the Home AI Community
