Calibre RAG MCP Server
Enables semantic search and contextual conversations with your Calibre ebook library using vector-based RAG technology. Supports project-based organization, multi-format book processing, and OCR capabilities for enhanced content extraction and retrieval.
Enhanced Calibre MCP server with RAG (Retrieval-Augmented Generation) capabilities for project-based vector search and contextual conversations.
Features
- RAG-Enhanced Search: Vector-based semantic search using FAISS and Transformers
- Project-Based Organization: Create isolated vector search projects for different contexts
- Multi-Format Support: Process books in various formats (EPUB, PDF, MOBI, etc.)
- OCR Capabilities: Extract text from images and scanned PDFs using Tesseract
- Advanced Text Processing: Natural language processing for better content understanding
- Windows-Focused: Designed specifically for Windows environments
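RAG pipelines typically split each book's extracted text into overlapping chunks before embedding, so that a search hit points to a passage rather than a whole book. The README does not document how this server chunks text, so the sketch below is an assumption: `chunkText` is a hypothetical helper, and the character-based `chunkSize` and `overlap` defaults are illustrative only.

```javascript
// Hypothetical helper: split extracted book text into overlapping chunks.
// chunkSize and overlap are measured in characters; the values the actual
// server uses are not documented, so these defaults are assumptions.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 2)); // → [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.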
Technologies Used
- Vector Search: FAISS for efficient similarity search
- Embeddings: Xenova Transformers (Transformers.js) for local embedding generation
- OCR: Tesseract for optical character recognition
- PDF Processing: Multiple PDF parsing libraries (pdf-parse, pdf-poppler, pdf2pic)
- Image Processing: Sharp for image manipulation
- NLP: Natural language processing with multiple libraries
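Conceptually, semantic search compares a query embedding against every stored chunk embedding and returns the closest matches. The sketch below is a brute-force stand-in for a flat FAISS inner-product index: it normalizes vectors so the dot product equals cosine similarity, then keeps the top `k`. It does not use FAISS itself, the server may use a different index type, and the 3-dimensional toy vectors stand in for real Transformers.js embeddings.

```javascript
// Normalize a vector to unit length so that dot product == cosine similarity.
function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}

function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Exhaustive (brute-force) top-k search: score every stored vector against
// the query and keep the k best. A flat FAISS index performs the same
// exhaustive comparison, with heavily optimized native code.
function searchTopK(index, query, k) {
  const q = normalize(query);
  return index
    .map((entry) => ({ id: entry.id, score: dot(normalize(entry.vector), q) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy data: ids and vectors are made up for illustration.
const index = [
  { id: "book1-chunk0", vector: [1, 0, 0] },
  { id: "book1-chunk1", vector: [0.9, 0.1, 0] },
  { id: "book2-chunk0", vector: [0, 1, 0] },
];
console.log(searchTopK(index, [1, 0, 0], 2).map((r) => r.id));
// → [ 'book1-chunk0', 'book1-chunk1' ]
```

Exhaustive search is exact but scales linearly with the number of chunks; this cost is what FAISS's optimized and approximate indexes are designed to reduce for large libraries.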
Prerequisites
- Node.js >= 16.0.0
- Calibre installed on Windows
- ImageMagick (optional, for enhanced image processing)
- Tesseract OCR (optional, for text extraction from images)
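Before installing, you can check these prerequisites with a short Node.js script. This is a hypothetical helper, not part of the repository. It assumes ImageMagick 7, whose command-line entry point is `magick`, and it treats a tool as present if `<tool> --version` runs successfully from PATH.

```javascript
const { execFileSync } = require("node:child_process");

// Parse the major version from a string like "v18.19.0" (leading "v" optional).
function nodeMajor(versionString) {
  const match = /^v?(\d+)\./.exec(versionString);
  if (!match) throw new Error(`Unrecognized version: ${versionString}`);
  return Number(match[1]);
}

// Return true if `command --version` runs, i.e. the tool is reachable via PATH.
function isOnPath(command) {
  try {
    execFileSync(command, ["--version"], { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

console.log("Node.js >= 16:", nodeMajor(process.version) >= 16);
console.log("Tesseract on PATH:", isOnPath("tesseract"));
console.log("ImageMagick on PATH:", isOnPath("magick"));
```

Calibre itself is not checked here, because the README does not say whether the server needs Calibre's command-line tools on PATH or only the library folder.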
Installation
- Clone this repository:
git clone https://github.com/yourusername/calibre-rag-mcp-nodejs.git
cd calibre-rag-mcp-nodejs
- Install dependencies:
npm install
- Run setup (Windows):
setup.bat
Configuration
The server automatically detects your Calibre library location. For custom configurations, edit the settings in server.js; CONFIG.md documents the available options.
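The README does not describe how detection works. One plausible approach, sketched below, relies on the fact that every Calibre library folder contains Calibre's `metadata.db` database: check candidate folders in order and take the first one that has it. The candidate list is an assumption: `CALIBRE_LIBRARY_PATH` is a hypothetical environment variable, and `Calibre Library` in the home folder is only a guess at the default location.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// A Calibre library is a folder containing Calibre's metadata.db SQLite
// database, so accept the first candidate folder that has one.
function findCalibreLibrary(candidates) {
  for (const dir of candidates) {
    if (dir && fs.existsSync(path.join(dir, "metadata.db"))) return dir;
  }
  return null;
}

// Hypothetical search order: an explicit environment variable first,
// then a guessed default folder in the user's home directory.
const candidates = [
  process.env.CALIBRE_LIBRARY_PATH, // hypothetical variable name
  path.join(os.homedir(), "Calibre Library"),
];
console.log(findCalibreLibrary(candidates) ?? "No Calibre library found");
```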
Usage
Starting the Server
npm start
Available Tools
- search: Semantic search across your ebook library
- fetch: Retrieve specific content from books
- list_projects: List all RAG projects
- create_project: Create a new RAG project
- add_books_to_project: Add books to a project for vectorization
- search_project_context: Search within specific projects
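Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for `search_project_context`. The argument names are hypothetical, because the README does not document the tools' input schemas; a client can read the real schemas from the server's `tools/list` response.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request as used by the Model Context
// Protocol. The tool's argument names below are hypothetical.
function makeToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = makeToolCall(1, "search_project_context", {
  project_name: "my-research", // hypothetical argument
  query: "How does the author define entropy?", // hypothetical argument
  limit: 5, // hypothetical argument
});
console.log(JSON.stringify(request));
```

In normal use your MCP client (for example, a desktop AI assistant) sends these messages for you after its `initialize` handshake; building one by hand is mainly useful for debugging.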
Example MCP Configuration
Add to your MCP client configuration:
{
  "mcpServers": {
    "calibre-rag": {
      "command": "node",
      "args": ["path/to/calibre-rag-mcp-nodejs/server.js"]
    }
  }
}
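On Windows, backslashes in a JSON path must be escaped (`C:\\Users\\...`), or you can use forward slashes. To avoid hand-escaping, a small hypothetical script like the one below can print the configuration entry with the absolute path to `server.js`; `JSON.stringify` handles the escaping. It assumes you run it from the repository root.

```javascript
const path = require("node:path");

// Hypothetical helper: build the "mcpServers" entry with an absolute path
// to server.js. JSON.stringify escapes Windows backslashes automatically.
function makeMcpConfig(repoDir) {
  return {
    mcpServers: {
      "calibre-rag": {
        command: "node",
        args: [path.resolve(repoDir, "server.js")],
      },
    },
  };
}

// Run from the repository root to print a ready-to-paste config block.
console.log(JSON.stringify(makeMcpConfig(process.cwd()), null, 2));
```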
Project Structure
calibre-rag-mcp-nodejs/
├── server.js # Main MCP server
├── package.json # Dependencies and scripts
├── setup.bat # Windows setup script
├── test-*.js # Various test files
├── projects/ # RAG projects storage
├── CONFIG.md # Configuration documentation
├── USAGE_EXAMPLES.md # Usage examples
└── QUICK_TEST.md # Quick testing guide
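The `projects/` folder stores each RAG project separately, which keeps searches in one project from returning chunks that belong to another. The in-memory sketch below illustrates that isolation. `ProjectStore` and its methods are hypothetical names that loosely mirror the `create_project`, `add_books_to_project`, `search_project_context`, and `list_projects` tools; the real server persists projects to disk and uses vector search rather than the keyword match shown here.

```javascript
// Hypothetical in-memory model of project isolation: each project owns
// its own list of chunks, so a search never crosses project boundaries.
class ProjectStore {
  constructor() {
    this.projects = new Map(); // project name -> array of chunk entries
  }

  createProject(name) {
    if (this.projects.has(name)) throw new Error(`Project exists: ${name}`);
    this.projects.set(name, []);
  }

  addChunks(name, chunks) {
    const entries = this.projects.get(name);
    if (!entries) throw new Error(`Unknown project: ${name}`);
    entries.push(...chunks);
  }

  // Case-insensitive keyword match as a placeholder for vector search.
  search(name, term) {
    const entries = this.projects.get(name) ?? [];
    const needle = term.toLowerCase();
    return entries.filter((c) => c.text.toLowerCase().includes(needle));
  }

  listProjects() {
    return [...this.projects.keys()];
  }
}

const store = new ProjectStore();
store.createProject("physics");
store.createProject("history");
store.addChunks("physics", [{ bookId: 1, text: "Entropy always increases" }]);
store.addChunks("history", [{ bookId: 2, text: "The entropy of empires" }]);
console.log(store.search("physics", "entropy").length); // → 1
console.log(store.listProjects()); // → [ 'physics', 'history' ]
```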
Testing
Run the test suite:
npm test
Individual test files:
- test-enhanced-server.js: Enhanced server functionality
- test-ocr-full.js: OCR capabilities
- test-pdf-approaches.js: PDF processing
- test-enhanced-auto.js: Automated testing
Requirements
System Requirements
- Windows 10/11
- Node.js 16+
- Calibre installed
- At least 4GB RAM (8GB+ recommended for large libraries)
Optional Dependencies
- ImageMagick (for enhanced image processing)
- Tesseract OCR (for text extraction from scanned documents)
Troubleshooting
Common Issues
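- FAISS Installation: If FAISS fails to install, make sure a C++ build toolchain is installed (for example, Visual Studio Build Tools with the "Desktop development with C++" workload), since the FAISS bindings are a native module
- Tesseract Not Found: Install Tesseract and add its installation folder (the one containing tesseract.exe) to PATH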
- Memory Issues: Reduce batch sizes for large document processing
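Reducing the batch size helps because only one batch of text and embeddings needs to be in memory at a time. The README does not say where the server's batch size is configured, so the sketch below only illustrates the idea: `processInBatches` is a hypothetical helper, and the worker is a stand-in for real work such as calling the embedding model.

```javascript
// Hypothetical helper: process items in fixed-size batches so that only
// one batch of intermediate data is held in memory at a time.
async function processInBatches(items, batchSize, worker) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Await each batch before starting the next to cap peak memory use.
    results.push(...(await worker(batch)));
  }
  return results;
}

// Usage with a stand-in worker that upper-cases each item.
processInBatches(["a", "b", "c", "d", "e"], 2, async (batch) =>
  batch.map((s) => s.toUpperCase()),
).then((out) => console.log(out)); // → [ 'A', 'B', 'C', 'D', 'E' ]
```

A smaller batch lowers peak memory but means more round trips to the embedding model, so the right size depends on available RAM and library size.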
Debug Mode
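Enable verbose logging by setting the DEBUG environment variable (Command Prompt syntax shown; in PowerShell, use $env:DEBUG="calibre-rag:*" instead of set):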
set DEBUG=calibre-rag:*
npm start
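The `calibre-rag:*` value follows the namespace convention of the `debug` npm package, which the server presumably uses (an assumption; the README does not name the logging library). The simplified re-implementation below shows how such patterns select namespaces: patterns are separated by commas or spaces, `*` is a wildcard, and a leading `-` excludes a namespace. It is an illustration of the convention, not the package's own code, and the namespace names in the usage lines are hypothetical.

```javascript
// Simplified sketch of DEBUG-style namespace matching.
function isEnabled(namespace, debugEnv = process.env.DEBUG || "") {
  // Escape regex metacharacters except "*", then turn "*" into ".*".
  const toRegExp = (p) =>
    new RegExp("^" + p.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$");
  const patterns = debugEnv.split(/[\s,]+/).filter(Boolean);
  const skips = patterns.filter((p) => p.startsWith("-")).map((p) => toRegExp(p.slice(1)));
  const names = patterns.filter((p) => !p.startsWith("-")).map(toRegExp);
  // Exclusions win over inclusions.
  if (skips.some((re) => re.test(namespace))) return false;
  return names.some((re) => re.test(namespace));
}

console.log(isEnabled("calibre-rag:ocr", "calibre-rag:*")); // → true
console.log(isEnabled("calibre-rag:ocr", "calibre-rag:*,-calibre-rag:ocr")); // → false
```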
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
Licensed under the Apache License 2.0. See LICENSE file for details.
Support
For issues and questions, please open an issue on GitHub.
Changelog
v1.0.0
- Initial release with RAG capabilities
- Project-based vector search
- Multi-format document support
- OCR integration
- Windows optimization