ChromaDB Local MCP Server

Provides AI assistants with persistent memory through local ChromaDB vector storage, featuring automated file ingestion and batch processing for over 70 file types. It enables advanced vector search, EXIF metadata extraction for photos, and duplicate file detection across local directories.

ChromaDB MCP Server 🧠

A Model Context Protocol (MCP) server that gives AI assistants persistent memory through ChromaDB vector storage. Version 3.0 adds EXIF extraction, watch folders, and duplicate detection.

✨ Features

Core

Persistent AI Memory: Your AI assistant remembers past conversations and solutions
Vector Search: Find similar code patterns, configurations, and documentation instantly
Local First: Run everything on your own hardware, no cloud dependencies

🚀 Batch Processing

Fast Batch Ingest: Process entire directories in seconds (500+ files)
77 File Types: Photos, CAD, documents, data files, code
Quick Load/Unload: Temporary collections for rapid workflows
Export/Import: Backup and transfer collections as JSON

📸 Photo Features (NEW in v3.0)

EXIF Extraction: Camera, lens, exposure, GPS location, date taken
Search by Camera: "Find photos shot with my Canon 5D"
Search by Location: GPS coordinates embedded and searchable
Search by Date: "Find photos from vacation 2024"

👁️ Watch Folders (NEW in v3.0)

Auto-Ingest: Drop files in watched folders, auto-add to ChromaDB
Hands-Free: Perfect for incoming photo dumps, downloads
Filter by Type: Watch only for specific file types

🔍 Duplicate Detection (NEW in v3.0)

Find Duplicates: Hash-based detection across directories
Reclaim Space: See exactly how much space duplicates waste
Compare Files: Check if two files are identical
Perceptual Hashing: Find similar (not just identical) images
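
As a rough illustration of how perceptual hashing can find similar (not just identical) images, here is a minimal average-hash sketch. It is an assumption for illustration only: this README does not say which perceptual hash `duplicate-detector.js` uses, and the function names and the 5-bit threshold below are hypothetical. The input is assumed to be an image that has already been decoded, converted to grayscale, and downscaled to 8x8.

```javascript
// Average hash (aHash): one bit per pixel, set when the pixel is brighter than the mean.
// Input: 64 grayscale brightness values (0-255) from an 8x8 downscaled image.
function averageHash(pixels) {
  if (pixels.length !== 64) throw new Error("expected 64 grayscale values (8x8)");
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return pixels.map((p) => (p > mean ? "1" : "0")).join("");
}

// Number of differing bits between two equal-length hashes.
function hammingDistance(hashA, hashB) {
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    if (hashA[i] !== hashB[i]) distance++;
  }
  return distance;
}

// Hypothetical threshold: treat images as similar when at most 5 of 64 bits differ.
function looksSimilar(pixelsA, pixelsB, maxDistance = 5) {
  return hammingDistance(averageHash(pixelsA), averageHash(pixelsB)) <= maxDistance;
}

// Synthetic data: a dark-to-bright gradient, a slightly brightened copy, and a mirrored copy.
const original = Array.from({ length: 64 }, (_, i) => (i % 8) * 32);
const brightened = original.map((p) => Math.min(255, p + 10));
const mirrored = Array.from({ length: 64 }, (_, i) => (7 - (i % 8)) * 32);

console.log(looksSimilar(original, brightened)); // true: same structure, different brightness
console.log(looksSimilar(original, mirrored)); // false: every bit flips
```

A byte-level hash would treat the brightened copy as a different file; the perceptual hash ignores the uniform brightness shift because each pixel is compared against the image's own mean.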

🚀 Quick Start

Prerequisites

Bun (JavaScript runtime)
Docker (for ChromaDB)
Claude Desktop (or any MCP client)

Installation

Clone the repository

git clone https://github.com/vespo92/chromadblocal-mcp-server.git chromadb-mcp-server
cd chromadb-mcp-server

Install dependencies
```
bun install
```

Start ChromaDB

docker run -d \
  --name chromadb-local \
  -p 8001:8000 \
  -v ~/chromadb-data:/chroma/chroma \
  -e IS_PERSISTENT=TRUE \
  chromadb/chroma:latest

Initialize collections
```
bun run setup
```

Configure Claude Desktop

On macOS, add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "chromadb-context": {
      "command": "bun",
      "args": ["run", "/path/to/chromadb-mcp-server/index.js"],
      "env": {
        "CHROMADB_URL": "http://localhost:8001"
      }
    }
  }
}
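
If the config file already lists other MCP servers, the entry above needs to be merged into the existing `mcpServers` object rather than pasted over it. The following is a minimal sketch of that merge on an in-memory object; `addChromaServer` is a hypothetical helper, not part of this project, and reading or writing the file itself is left out.

```javascript
// Merge the chromadb-context entry into an existing Claude Desktop config object
// without dropping servers that are already configured. Returns a new object.
function addChromaServer(config, scriptPath, chromaUrl = "http://localhost:8001") {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "chromadb-context": {
        command: "bun",
        args: ["run", scriptPath],
        env: { CHROMADB_URL: chromaUrl },
      },
    },
  };
}

// Hypothetical existing config with one unrelated server.
const existing = { mcpServers: { "other-server": { command: "node", args: ["other.js"] } } };
const merged = addChromaServer(existing, "/path/to/chromadb-mcp-server/index.js");
console.log(JSON.stringify(merged, null, 2));
```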

Restart Claude Desktop and start building your knowledge base!

💬 Usage Examples

Once configured, interact naturally with your AI:

Store Knowledge

"Store this Docker configuration in ChromaDB for future reference"
"Save this React component pattern with tags: hooks, authentication"
"Remember this solution for GPU passthrough issues"

Retrieve Information

"Search ChromaDB for Python async examples"
"Find similar component patterns to this one"
"What solutions do we have for Docker networking issues?"

Build Context

"Add this API documentation to the project_docs collection"
"Store these test patterns for our testing suite"

🚀 Batch File Processing

The killer feature: ingest hundreds of files in seconds for AI-powered search and retrieval.

Quick Load Workflow (Fastest)

Perfect for "load, process, discard" workflows:

You: "Quick load my photos from /home/photos/vacation2024"
AI: Creates temp collection, ingests 500 photos in seconds
You: "Find photos with mountains or beaches"
AI: Returns matching photos with metadata
You: "Unload the collection"
AI: Cleans up, frees memory
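
Behind that conversation, the MCP client issues one `tools/call` request per step. The sketch below writes out the three requests for illustration, using the tool and parameter names documented in this README. The collection name `vacation_tmp`, the `limit` value, and passing `categories` as an array of strings are assumptions; in practice Claude Desktop (or another MCP client) builds these requests for you.

```javascript
// Build an MCP `tools/call` JSON-RPC request.
let nextId = 1;
function toolCall(name, args) {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// The "load, process, discard" workflow as three tool calls.
// "vacation_tmp" is a hypothetical collection name.
function quickLoadWorkflow(path, query, collection = "vacation_tmp") {
  return [
    toolCall("quick_load", { path, name: collection, categories: ["images"] }),
    toolCall("search_context", { query, collection, limit: 10 }),
    toolCall("unload_collection", { collection }),
  ];
}

for (const request of quickLoadWorkflow("/home/photos/vacation2024", "mountains or beaches")) {
  console.log(JSON.stringify(request));
}
```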

Supported File Types

| Category | Extensions | Metadata Extracted |
| --- | --- | --- |
| Images | .jpg, .jpeg, .png, .heic, .raw, .cr2, .nef, .arw, .tiff, .gif, .webp | Dimensions, size, format |
| CAD | .stl, .obj, .dxf, .dwg, .step, .iges, .fbx, .blend, .skp, .scad | Vertices, faces, format |
| Documents | .pdf, .txt, .md, .doc, .docx, .rtf | Full text content |
| Data | .json, .yaml, .xml, .csv, .toml, .ini | Parsed content |
| Code | .js, .ts, .py, .go, .rs, .java, .cpp, .c, .php, .rb + 20 more | Full source code |
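
To make the category filter concrete, here is a minimal sketch of mapping a filename to one of the categories in the table above. The extension lists are copied from that table; the function names are hypothetical and the actual lookup in `batch-processor.js` may differ.

```javascript
// Extension lists copied from the table above. The code category contains only
// the extensions the table names explicitly (the "+ 20 more" are not listed).
const CATEGORIES = {
  images: [".jpg", ".jpeg", ".png", ".heic", ".raw", ".cr2", ".nef", ".arw", ".tiff", ".gif", ".webp"],
  cad: [".stl", ".obj", ".dxf", ".dwg", ".step", ".iges", ".fbx", ".blend", ".skp", ".scad"],
  documents: [".pdf", ".txt", ".md", ".doc", ".docx", ".rtf"],
  data: [".json", ".yaml", ".xml", ".csv", ".toml", ".ini"],
  code: [".js", ".ts", ".py", ".go", ".rs", ".java", ".cpp", ".c", ".php", ".rb"],
};

// Return the category for a filename, or null when the extension is unknown.
function categoryOf(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a bare dotfile name such as ".gitignore"
  const ext = filename.slice(dot).toLowerCase();
  for (const [category, extensions] of Object.entries(CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return null;
}

// Keep only the files whose category is in the requested list.
function filterByCategories(filenames, categories) {
  return filenames.filter((name) => categories.includes(categoryOf(name)));
}

console.log(filterByCategories(["IMG_0001.JPG", "bracket.stl", "notes.md", "archive.zip"], ["images", "cad"]));
```

Matching is case-insensitive so that camera-generated names like `IMG_0001.JPG` land in the images category.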

Batch Processing Examples

"Scan /projects/cad-files to see what's there"
"Batch ingest all STL files from /3d-prints into the 'print_library' collection"
"Quick load my Downloads folder, find anything mentioning 'invoice'"
"Export the photo_archive collection to backup.json"
"Import backup.json into a new collection called 'restored_photos'"

Processing Speed

Quick Load: ~200 files in 2-3 seconds
Batch Ingest: ~500 files in 5-10 seconds (with full metadata)
Concurrent Processing: 10-20 parallel file operations
No external dependencies: Pure JavaScript/Bun processing
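
The concurrency figure above suggests a bounded pool of parallel file operations. The sketch below shows one common way to cap in-flight work in plain JavaScript; it is an illustration under that assumption, not the code in `batch-processor.js`, and `mapWithConcurrency` and `fakeProcess` are hypothetical names.

```javascript
// Run `worker` over `items` with at most `limit` operations in flight.
// Results keep the order of `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
  return results;
}

// Stand-in for reading and parsing a file; a real worker would do file I/O.
const fakeProcess = (name) =>
  new Promise((resolve) => setTimeout(() => resolve(name.toUpperCase()), 5));

mapWithConcurrency(["a.stl", "b.stl", "c.stl"], 2, fakeProcess).then((out) => console.log(out));
```

A fixed number of lanes keeps memory and open file handles bounded regardless of how many files the directory contains.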

📚 Available Collections

| Collection | Description | Use Case |
| --- | --- | --- |
| `home_automation` | Smart home configs & automations | Home Assistant, IoT scripts |
| `code_snippets` | Reusable code patterns | Functions, hooks, utilities |
| `configurations` | System & app configs | Docker, Kubernetes, services |
| `troubleshooting` | Problem solutions | Fixes, workarounds, debugging |
| `project_docs` | Project documentation | APIs, architecture, guides |
| `learning_notes` | Learning insights | Tutorials, concepts, notes |

🛠️ MCP Tools

`search_context`

Search for relevant information across collections

Parameters:
- query: Search query
- collection: (optional) Specific collection to search
- limit: (optional) Number of results
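
Conceptually, a vector search ranks stored items by how close their embeddings are to the query's embedding and returns the top `limit`. The sketch below illustrates that idea with cosine similarity on toy vectors. It is illustrative only: ChromaDB performs the ranking internally, and cosine similarity is an assumption here, not necessarily the distance metric this server's collections use.

```javascript
// Cosine similarity between two equal-length vectors (0 when either is all zeros).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `limit` stored items most similar to the query embedding.
function topMatches(queryEmbedding, items, limit = 5) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
const stored = [
  { id: "docker-compose-notes", embedding: [0.9, 0.1, 0.0] },
  { id: "react-auth-hook", embedding: [0.1, 0.9, 0.2] },
  { id: "docker-network-fix", embedding: [0.8, 0.2, 0.1] },
];
console.log(topMatches([1, 0, 0], stored, 2));
```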

`store_context`

Store new information with metadata

Parameters:
- content: The content to store
- metadata: Tags, categories, descriptions
- collection: Target collection

`list_collections`

List all available collections and their metadata

`find_similar_patterns`

Find code patterns similar to provided example

Batch Processing Tools

`scan_directory`

Preview files in a directory before ingesting

Parameters:
- path: Directory to scan
- categories: Filter by type (images, cad, documents, data, code)
- extensions: Filter by extension (.jpg, .stl, etc.)
- recursive: Include subdirectories (default: true)

`batch_ingest`

Bulk ingest files into ChromaDB with full metadata

Parameters:
- path: Source directory
- collection: Target collection name
- categories: File types to include
- max_files: Limit number of files

`quick_load`

🚀 FAST: Rapidly load files for temporary processing

Parameters:
- path: Directory to load
- name: Collection name (auto-generated if omitted)
- categories: File types to include

`unload_collection`

Delete a collection (cleanup after quick_load)

Parameters:
- collection: Name of collection to delete

`export_collection`

Export collection to JSON file

Parameters:
- collection: Collection to export
- output_path: File path for JSON output

`import_collection`

Import collection from JSON file

Parameters:
- input_path: JSON file to import
- collection: Override collection name
- overwrite: Delete existing first (default: false)

`get_collection_info`

Get detailed stats about a collection

Parameters:
- collection: Collection name

`ingest_file`

Ingest a single file with metadata extraction

Parameters:
- path: File to ingest
- collection: Target collection

`list_file_types`

Show all supported file extensions

EXIF & Photo Tools

`extract_exif`

Extract detailed EXIF metadata from photos

Parameters:
- path: Path to JPEG or TIFF image
Returns: Camera, lens, exposure, GPS, date taken
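
EXIF stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference (`N`/`S`, `E`/`W`), so they need converting to signed decimal degrees before they are useful as searchable metadata. The sketch below shows that conversion. It assumes the EXIF rational values have already been reduced to plain numbers; the function names and the `latitude`/`longitude` output keys are hypothetical, not the fields `exif-extractor.js` is documented to return.

```javascript
// Convert [degrees, minutes, seconds] plus a hemisphere reference to signed decimal degrees.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  return ref === "S" || ref === "W" ? -value : value;
}

// Map the standard EXIF GPS tags to a flat metadata object.
function gpsToMetadata(exifGps) {
  return {
    latitude: dmsToDecimal(exifGps.GPSLatitude, exifGps.GPSLatitudeRef),
    longitude: dmsToDecimal(exifGps.GPSLongitude, exifGps.GPSLongitudeRef),
  };
}

// Example: 40°26'46" N, 79°58'56" W
console.log(
  gpsToMetadata({
    GPSLatitude: [40, 26, 46],
    GPSLatitudeRef: "N",
    GPSLongitude: [79, 58, 56],
    GPSLongitudeRef: "W",
  })
);
```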

Watch Folder Tools

`watch_folder`

Start auto-ingesting new files from a folder

Parameters:
- path: Folder to watch
- collection: Target collection (default: auto_ingest)
- categories: File types to watch
- include_exif: Extract EXIF from photos (default: true)

`stop_watch`

Stop watching a folder

Parameters:
- path: Folder to stop watching

`list_watchers`

List all active folder watchers

Duplicate Detection Tools

`find_duplicates`

Scan directory for duplicate files

Parameters:
- path: Directory to scan
- hash_method: "partial" (fast), "full" (thorough), "perceptual" (images)
- categories: File types to check
Returns: Duplicate groups with wasted space info
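
The difference between "partial" and "full" hashing, and how wasted space can be computed, can be sketched as follows. This is an assumption-laden illustration rather than the code in `duplicate-detector.js`: it works on in-memory byte arrays instead of files, the 4096-byte partial window is hypothetical, and FNV-1a is used as a small non-cryptographic stand-in where a real tool would more likely use a cryptographic digest such as SHA-256.

```javascript
// FNV-1a (32-bit): small non-cryptographic hash, used here only as a stand-in.
function fnv1a(bytes) {
  let hash = 0x811c9dc5;
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// "partial" hashes only the first `partialBytes` bytes; "full" hashes every byte.
// The file size is part of the key in both modes.
function fileKey(file, method, partialBytes = 4096) {
  const data = method === "partial" ? file.bytes.subarray(0, partialBytes) : file.bytes;
  return `${file.bytes.length}:${fnv1a(data)}`;
}

// Group files by key and report groups with more than one member.
function findDuplicateGroups(files, method = "full") {
  const groups = new Map();
  for (const file of files) {
    const key = fileKey(file, method);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(file);
  }
  return [...groups.values()]
    .filter((group) => group.length > 1)
    .map((group) => ({
      paths: group.map((f) => f.path),
      // Every copy beyond the first is reclaimable space.
      wastedBytes: (group.length - 1) * group[0].bytes.length,
    }));
}

const encode = (text) => new TextEncoder().encode(text);
const files = [
  { path: "/photos/a.jpg", bytes: encode("same bytes") },
  { path: "/backup/a-copy.jpg", bytes: encode("same bytes") },
  { path: "/photos/b.jpg", bytes: encode("other data") },
];
console.log(findDuplicateGroups(files));
```

Partial hashing reads less data and is therefore faster, but two files that share a size and an identical leading window are reported as duplicates even if they differ later. Confirming candidates with a full hash avoids that.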

`compare_files`

Check if two files are duplicates

Parameters:
- file1: First file path
- file2: Second file path

`find_collection_duplicates`

Find duplicate entries in a ChromaDB collection

Parameters:
- collection: Collection name

🔧 Configuration

Environment Variables

CHROMADB_URL=http://localhost:8001  # ChromaDB server URL
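
A server process can read this variable and validate it before connecting. The following is a minimal sketch of doing so; `resolveChromaUrl` and its return shape are hypothetical, and falling back to `http://localhost:8001` when the variable is unset is an assumption based on the Docker port mapping above, not documented behavior of `index.js`.

```javascript
// Read CHROMADB_URL from the environment, validate it, and split it into parts.
function resolveChromaUrl(env = process.env) {
  const raw = env.CHROMADB_URL || "http://localhost:8001"; // assumed default
  const url = new URL(raw); // throws TypeError on a malformed URL
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`CHROMADB_URL must use http or https, got ${url.protocol}`);
  }
  const ssl = url.protocol === "https:";
  return {
    host: url.hostname,
    // URL.port is empty when the scheme's default port is used.
    port: Number(url.port) || (ssl ? 443 : 80),
    ssl,
  };
}

console.log(resolveChromaUrl({ CHROMADB_URL: "http://localhost:8001" }));
```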

Custom Collections

Add new collections in setup-home-collections.js:

await createCollection('ml_experiments', {
  description: 'Machine learning experiments and results'
});

📦 Project Structure

chromadb-mcp-server/
├── index.js                    # MCP server with 22 tools
├── batch-processor.js          # Fast batch file processing engine
├── exif-extractor.js           # EXIF metadata extraction for photos
├── watch-folder.js             # Auto-ingest watch folder system
├── duplicate-detector.js       # Duplicate file detection
├── setup-home-collections.js   # Collection initialization
├── test-chromadb.js            # Connection test script
├── test-mcp.js                 # MCP functionality test
├── test-batch-processor.js     # Batch processing tests
├── HOME-AI-SETUP.md            # Detailed setup guide
├── package.json                # Project dependencies
└── README.md                   # This file

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

See CONTRIBUTING.md for more details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Anthropic for the MCP specification
Chroma for the excellent vector database
The open-source community for inspiration and support

🚀 What's Next?

✅ ~~Export/import collections~~ DONE!
✅ ~~Batch file processing~~ DONE!
✅ ~~EXIF metadata extraction~~ DONE in v3.0!
✅ ~~Watch folders / auto-ingest~~ DONE in v3.0!
✅ ~~Duplicate detection~~ DONE in v3.0!
Cloud sync capabilities
Multi-user support
Web UI for collection management
AI-powered image descriptions (what's in the photo)
3D print analysis (volume, time estimates)

Built with ❤️ for the Home AI Community
