ChromaDB Local MCP Server
Provides AI assistants with persistent memory through local ChromaDB vector storage, featuring automated file ingestion and batch processing for over 70 file types. It enables advanced vector search, EXIF metadata extraction for photos, and duplicate file detection across local directories.
ChromaDB MCP Server
A Model Context Protocol (MCP) server that gives AI assistants persistent memory through ChromaDB vector storage. Version 3.0 adds EXIF extraction, watch folders, and duplicate detection.
⨠Features
Core
- Persistent AI Memory: Your AI assistant remembers past conversations and solutions
- Vector Search: Find similar code patterns, configurations, and documentation instantly
- Local First: Run everything on your own hardware, no cloud dependencies
Batch Processing
- Fast Batch Ingest: Process entire directories in seconds (500+ files)
- 77 File Types: Photos, CAD, documents, data files, code
- Quick Load/Unload: Temporary collections for rapid workflows
- Export/Import: Backup and transfer collections as JSON
Photo Features (NEW in v3.0)
- EXIF Extraction: Camera, lens, exposure, GPS location, date taken
- Search by Camera: "Find photos shot with my Canon 5D"
- Search by Location: GPS coordinates embedded and searchable
- Search by Date: "Find photos from vacation 2024"
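EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference (N, S, E, or W). To search by location numerically, those values are usually converted to signed decimal degrees. The sketch below shows that conversion; `dmsToDecimal` is a hypothetical helper for illustration, not a function taken from exif-extractor.js.

```javascript
// Convert an EXIF-style GPS value (degrees, minutes, seconds + hemisphere)
// into signed decimal degrees. Hypothetical helper, not the project's code.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const decimal = degrees + minutes / 60 + seconds / 3600;
  // South latitudes and West longitudes are negative by convention.
  return ref === 'S' || ref === 'W' ? -decimal : decimal;
}

const lat = dmsToDecimal([40, 26, 46], 'N');
const lon = dmsToDecimal([79, 58, 56], 'W');
console.log(lat.toFixed(4), lon.toFixed(4));
```

Storing the decimal values as numeric metadata is one way to support range filters on latitude and longitude.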
Watch Folders (NEW in v3.0)
- Auto-Ingest: Drop files in watched folders, auto-add to ChromaDB
- Hands-Free: Perfect for incoming photo dumps, downloads
- Filter by Type: Watch only for specific file types
Duplicate Detection (NEW in v3.0)
- Find Duplicates: Hash-based detection across directories
- Reclaim Space: See exactly how much space duplicates waste
- Compare Files: Check if two files are identical
- Perceptual Hashing: Find similar (not just identical) images
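Hash-based detection can be sketched as follows: hash each file's contents, group paths that share a digest, and count every copy beyond the first as wasted space. This is a minimal illustration using Node-compatible built-in modules (which Bun also provides); the function names and the choice of SHA-256 are assumptions, not details taken from duplicate-detector.js.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hash a file's full contents with SHA-256 and return the hex digest.
function hashFile(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// Group files by content hash. Returns only groups with two or more files,
// plus the bytes that would be freed by keeping one copy per group.
function findDuplicates(filePaths) {
  const byHash = new Map();
  for (const filePath of filePaths) {
    const digest = hashFile(filePath);
    if (!byHash.has(digest)) byHash.set(digest, []);
    byHash.get(digest).push(filePath);
  }

  const groups = [];
  let wastedBytes = 0;
  for (const [hash, files] of byHash) {
    if (files.length < 2) continue;
    const size = fs.statSync(files[0]).size;
    wastedBytes += size * (files.length - 1);
    groups.push({ hash, size, files });
  }
  return { groups, wastedBytes };
}
```

Reading whole files into memory keeps the sketch short; for large photo or CAD files, streaming the contents into the hash would be the safer choice.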
Quick Start
Prerequisites
- Bun (JavaScript runtime)
- Docker (for ChromaDB)
- Claude Desktop (or any MCP client)
Installation
1. Clone the repository

       git clone https://github.com/vespo92/chromadblocal-mcp-server.git chromadb-mcp-server
       cd chromadb-mcp-server

2. Install dependencies

       bun install

3. Start ChromaDB

       docker run -d \
         --name chromadb-local \
         -p 8001:8000 \
         -v ~/chromadb-data:/chroma/chroma \
         -e IS_PERSISTENT=TRUE \
         chromadb/chroma:latest

4. Initialize collections

       bun run setup

5. Configure Claude Desktop

   Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

       {
         "mcpServers": {
           "chromadb-context": {
             "command": "bun",
             "args": ["run", "/path/to/chromadb-mcp-server/index.js"],
             "env": {
               "CHROMADB_URL": "http://localhost:8001"
             }
           }
         }
       }

6. Restart Claude Desktop and start building your knowledge base!
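If claude_desktop_config.json already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that on the parsed config object; `addChromaServer` is a hypothetical helper, and reading or writing the file itself is left out.

```javascript
// Merge the chromadb-context entry into an existing Claude Desktop config
// object without dropping other configured servers. Hypothetical helper.
function addChromaServer(config, indexPath, chromaUrl = 'http://localhost:8001') {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      'chromadb-context': {
        command: 'bun',
        args: ['run', indexPath],
        env: { CHROMADB_URL: chromaUrl },
      },
    },
  };
}

// Example: an existing config that already has one (made-up) server entry.
const existing = { mcpServers: { other: { command: 'node', args: ['other.js'] } } };
const updated = addChromaServer(existing, '/path/to/chromadb-mcp-server/index.js');
console.log(JSON.stringify(updated, null, 2));
```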
Usage Examples
Once configured, interact naturally with your AI:
Store Knowledge
- "Store this Docker configuration in ChromaDB for future reference"
- "Save this React component pattern with tags: hooks, authentication"
- "Remember this solution for GPU passthrough issues"
Retrieve Information
- "Search ChromaDB for Python async examples"
- "Find similar component patterns to this one"
- "What solutions do we have for Docker networking issues?"
Build Context
- "Add this API documentation to the project_docs collection"
- "Store these test patterns for our testing suite"
Batch File Processing
The killer feature! Ingest large numbers of files in seconds for AI-powered search and retrieval.
Quick Load Workflow (Fastest)
Perfect for "load, process, discard" workflows:
You: "Quick load my photos from /home/photos/vacation2024"
AI: Creates temp collection, ingests 500 photos in seconds
You: "Find photos with mountains or beaches"
AI: Returns matching photos with metadata
You: "Unload the collection"
AI: Cleans up, frees memory
Supported File Types
| Category | Extensions | Metadata Extracted |
|---|---|---|
| Images | .jpg, .jpeg, .png, .heic, .raw, .cr2, .nef, .arw, .tiff, .gif, .webp | Dimensions, size, format |
| CAD | .stl, .obj, .dxf, .dwg, .step, .iges, .fbx, .blend, .skp, .scad | Vertices, faces, format |
| Documents | .pdf, .txt, .md, .doc, .docx, .rtf | Full text content |
| Data | .json, .yaml, .xml, .csv, .toml, .ini | Parsed content |
| Code | .js, .ts, .py, .go, .rs, .java, .cpp, .c, .php, .rb + 20 more | Full source code |
Batch Processing Examples
"Scan /projects/cad-files to see what's there"
"Batch ingest all STL files from /3d-prints into the 'print_library' collection"
"Quick load my Downloads folder, find anything mentioning 'invoice'"
"Export the photo_archive collection to backup.json"
"Import backup.json into a new collection called 'restored_photos'"
Processing Speed
- Quick Load: ~200 files in 2-3 seconds
- Batch Ingest: ~500 files in 5-10 seconds (with full metadata)
- Concurrent Processing: 10-20 parallel file operations
- No external dependencies: Pure JavaScript/Bun processing
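Bounded concurrency of the kind described above (10-20 parallel file operations) is commonly implemented as a small worker pool: a fixed number of runners pull the next item from a shared index until the list is exhausted. The sketch below is an assumption about how this could look, not the code in batch-processor.js; `mapWithConcurrency` is a hypothetical name.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

A call such as `mapWithConcurrency(paths, 15, ingestOne)` would then process 15 files at a time, where `ingestOne` stands in for whatever per-file function the caller supplies.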
Available Collections
| Collection | Description | Use Case |
|---|---|---|
| home_automation | Smart home configs & automations | Home Assistant, IoT scripts |
| code_snippets | Reusable code patterns | Functions, hooks, utilities |
| configurations | System & app configs | Docker, Kubernetes, services |
| troubleshooting | Problem solutions | Fixes, workarounds, debugging |
| project_docs | Project documentation | APIs, architecture, guides |
| learning_notes | Learning insights | Tutorials, concepts, notes |
MCP Tools
search_context
Search for relevant information across collections
Parameters:
- query: Search query
- collection: (optional) Specific collection to search
- limit: (optional) Number of results
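MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object. The sketch below builds such a request for search_context using the parameter names listed above; the helper name and the argument types (a string query, a numeric limit) are assumptions.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the search_context tool.
// Optional arguments are omitted when not supplied.
function buildSearchRequest(id, query, { collection, limit } = {}) {
  const args = { query };
  if (collection !== undefined) args.collection = collection;
  if (limit !== undefined) args.limit = limit;
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'search_context', arguments: args },
  };
}

const request = buildSearchRequest(1, 'Docker networking issues', {
  collection: 'troubleshooting',
  limit: 5,
});
console.log(JSON.stringify(request, null, 2));
```

In normal use the MCP client (for example Claude Desktop) builds this message itself; constructing it by hand is mainly useful when testing the server directly.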
store_context
Store new information with metadata
Parameters:
- content: The content to store
- metadata: Tags, categories, descriptions
- collection: Target collection
list_collections
List all available collections and their metadata
find_similar_patterns
Find code patterns similar to provided example
Batch Processing Tools
scan_directory
Preview files in a directory before ingesting
Parameters:
- path: Directory to scan
- categories: Filter by type (images, cad, documents, data, code)
- extensions: Filter by extension (.jpg, .stl, etc.)
- recursive: Include subdirectories (default: true)
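A directory preview of this kind amounts to a recursive walk that classifies each file by extension. The sketch below illustrates the idea with a small subset of the extension table from Supported File Types; the function names and the returned shape are assumptions, not the server's actual output format.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Small subset of the supported extensions, for illustration only.
const CATEGORIES = {
  images: ['.jpg', '.jpeg', '.png'],
  cad: ['.stl', '.obj'],
  documents: ['.pdf', '.txt', '.md'],
};

// Return the category for a file path, or null if the extension is unknown.
function categoryOf(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  for (const [category, extensions] of Object.entries(CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return null;
}

// Walk `dir` and return matching file paths grouped by category.
function scanDirectory(dir, { categories = Object.keys(CATEGORIES), recursive = true } = {}) {
  const found = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!recursive) continue;
      const nested = scanDirectory(fullPath, { categories, recursive });
      for (const [category, files] of Object.entries(nested)) {
        (found[category] ??= []).push(...files);
      }
    } else if (entry.isFile()) {
      const category = categoryOf(fullPath);
      if (category && categories.includes(category)) {
        (found[category] ??= []).push(fullPath);
      }
    }
  }
  return found;
}
```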
batch_ingest
Bulk ingest files into ChromaDB with full metadata
Parameters:
- path: Source directory
- collection: Target collection name
- categories: File types to include
- max_files: Limit number of files
quick_load
FAST: Rapidly load files for temporary processing
Parameters:
- path: Directory to load
- name: Collection name (auto-generated if omitted)
- categories: File types to include
unload_collection
Delete a collection (cleanup after quick_load)
Parameters:
- collection: Name of collection to delete
export_collection
Export collection to JSON file
Parameters:
- collection: Collection to export
- output_path: File path for JSON output
import_collection
Import collection from JSON file
Parameters:
- input_path: JSON file to import
- collection: Override collection name
- overwrite: Delete existing first (default: false)
get_collection_info
Get detailed stats about a collection
Parameters:
- collection: Collection name
ingest_file
Ingest a single file with metadata extraction
Parameters:
- path: File to ingest
- collection: Target collection
list_file_types
Show all supported file extensions
EXIF & Photo Tools
extract_exif
Extract detailed EXIF metadata from photos
Parameters:
- path: Path to JPEG or TIFF image
Returns: Camera, lens, exposure, GPS, date taken
Watch Folder Tools
watch_folder
Start auto-ingesting new files from a folder
Parameters:
- path: Folder to watch
- collection: Target collection (default: auto_ingest)
- categories: File types to watch
- include_exif: Extract EXIF from photos (default: true)
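A folder watcher with a type filter can be sketched with the built-in `fs.watch` API, which Node provides and Bun implements. The names `shouldIngest` and `watchFolder` are hypothetical, and the real watch-folder.js may work differently (for example by polling or debouncing).

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Decide whether a newly seen file should be ingested, based on extension.
function shouldIngest(filename, allowedExtensions) {
  if (!filename) return false; // some platforms do not report a filename
  return allowedExtensions.includes(path.extname(filename).toLowerCase());
}

// Watch `dir` and call `onFile(fullPath)` for each matching new file.
// Returns the watcher so the caller can stop it with watcher.close().
function watchFolder(dir, allowedExtensions, onFile) {
  return fs.watch(dir, (eventType, filename) => {
    if (eventType !== 'rename' || !shouldIngest(filename, allowedExtensions)) return;
    const fullPath = path.join(dir, filename);
    // 'rename' fires for both creation and deletion, so confirm the file exists.
    if (fs.existsSync(fullPath)) onFile(fullPath);
  });
}
```

`fs.watch` events vary by platform, and a file can be reported before it has finished copying, so a production watcher would likely wait for the file size to settle before ingesting.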
stop_watch
Stop watching a folder
Parameters:
- path: Folder to stop watching
list_watchers
List all active folder watchers
Duplicate Detection Tools
find_duplicates
Scan directory for duplicate files
Parameters:
- path: Directory to scan
- hash_method: "partial" (fast), "full" (thorough), "perceptual" (images)
- categories: File types to check
Returns: Duplicate groups with wasted space info
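The source does not say which perceptual algorithm the "perceptual" method uses. As an illustration of the general idea, the sketch below implements average hash (aHash), one common technique: reduce the image to an 8x8 grayscale thumbnail, set one bit per pixel depending on whether it is at least as bright as the mean, and compare hashes by Hamming distance. Decoding and resizing the image are out of scope here, so the input is assumed to be 64 grayscale values.

```javascript
// Average hash (aHash): one bit per pixel, set when the pixel is at least
// as bright as the mean. Input: grayscale values of an 8x8 thumbnail.
function averageHash(pixels) {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return pixels.map((p) => (p >= mean ? 1 : 0));
}

// Number of differing bits between two equal-length hashes.
function hammingDistance(a, b) {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Synthetic 8x8 "images": a left-to-right gradient, a slightly brightened
// copy of it, and its inverse.
const original = Array.from({ length: 64 }, (_, i) => (i % 8) * 32);
const brighter = original.map((p) => Math.min(255, p + 10));
const inverted = original.map((p) => 255 - p);

console.log(hammingDistance(averageHash(original), averageHash(brighter))); // 0
console.log(hammingDistance(averageHash(original), averageHash(inverted))); // 64
```

A small distance suggests visually similar images; the threshold for treating two images as near-duplicates is a tuning choice.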
compare_files
Check if two files are duplicates
Parameters:
- file1: First file path
- file2: Second file path
find_collection_duplicates
Find duplicate entries in a ChromaDB collection
Parameters:
- collection: Collection name
Configuration
Environment Variables
CHROMADB_URL=http://localhost:8001 # ChromaDB server URL
Custom Collections
Add new collections in setup-home-collections.js:
await createCollection('ml_experiments', {
  description: 'Machine learning experiments and results'
});
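The source does not show how `createCollection` is implemented. One plausible shape, sketched below, is a thin wrapper over the chromadb JavaScript client's `getOrCreateCollection({ name, metadata })` method. Here the client is passed in explicitly, which differs from the two-argument call above and is purely an assumption made so the sketch stays self-contained.

```javascript
// Hypothetical shape for a createCollection helper. `client` is assumed to
// expose getOrCreateCollection({ name, metadata }), as the chromadb
// JavaScript client does; the real setup script may differ.
async function createCollection(client, name, metadata = {}) {
  const collection = await client.getOrCreateCollection({ name, metadata });
  console.log(`Collection ready: ${name}`);
  return collection;
}
```

Using get-or-create rather than plain create means rerunning the setup script does not fail on collections that already exist.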
Project Structure
chromadb-mcp-server/
├── index.js                    # MCP server with 22 tools
├── batch-processor.js          # Fast batch file processing engine
├── exif-extractor.js           # EXIF metadata extraction for photos
├── watch-folder.js             # Auto-ingest watch folder system
├── duplicate-detector.js       # Duplicate file detection
├── setup-home-collections.js   # Collection initialization
├── test-chromadb.js            # Connection test script
├── test-mcp.js                 # MCP functionality test
├── test-batch-processor.js     # Batch processing tests
├── HOME-AI-SETUP.md            # Detailed setup guide
├── package.json                # Project dependencies
└── README.md                   # This file
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
See CONTRIBUTING.md for more details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Anthropic for the MCP specification
- Chroma for the excellent vector database
- The open-source community for inspiration and support
What's Next?
- Export/import collections (DONE!)
- Batch file processing (DONE!)
- EXIF metadata extraction (DONE in v3.0!)
- Watch folders / auto-ingest (DONE in v3.0!)
- Duplicate detection (DONE in v3.0!)
- Cloud sync capabilities
- Multi-user support
- Web UI for collection management
- AI-powered image descriptions (what's in the photo)
- 3D print analysis (volume, time estimates)
Built with ❤️ for the Home AI Community