MCP VectorStore Server

MCP VectorStore Server

Provides advanced document search and processing capabilities through vector stores, including PDF processing, semantic search, web search integration, and file operations. Enables users to create searchable document collections and retrieve relevant information using natural language queries.

Category
Visit Server

README

MCP VectorStore Server

A Model Context Protocol (MCP) server that provides advanced vector store operations for document search, PDF processing, and information retrieval. This server wraps the functionality from vectorstore.py into a standardized MCP interface.

Features

  • Vector Store Operations: Create, search, and manage document vector stores
  • PDF Processing: Extract and index content from PDF documents using LLMSherpa
  • Semantic Search: Advanced document search using HuggingFace embeddings
  • Web Search Integration: Google, Wikipedia, and DuckDuckGo search capabilities
  • File Operations: Read and process local files
  • Mathematical Calculations: Built-in calculator functionality

Prerequisites

System Requirements

  • Python: 3.8 or higher
  • Operating System: Linux, macOS, or Windows
  • Memory: Minimum 4GB RAM (8GB+ recommended for large document collections)
  • Storage: At least 2GB free space for models and vector stores
  • Network: Internet connection for downloading models and web searches

Optional GPU Support

For improved performance with large document collections:

  • CUDA: 11.8 or higher
  • GPU: NVIDIA GPU with 4GB+ VRAM
  • cuDNN: Compatible version for your CUDA installation

Installation

Step 1: Clone or Download the Repository

# If you have the files locally, navigate to the directory
cd /path/to/McpDocServer

# Or clone from a repository (if available)
# git clone <repository-url>
# cd McpDocServer

Step 2: Create a Virtual Environment

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
# On Linux/macOS:
source venv/bin/activate

# On Windows:
# venv\Scripts\activate

Step 3: Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt

Step 4: Install LLMSherpa (Optional but Recommended)

For optimal PDF processing, install LLMSherpa locally:

# Install LLMSherpa
pip install llmsherpa

# Start the LLMSherpa server (in a separate terminal)
llmsherpa --port 5001

Step 5: Download Embedding Models

The server will automatically download the required embedding model on first use, but you can pre-download it:

# Download the embedding model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"

Configuration

Environment Variables

Create a .env file in the project directory:

# LLMSherpa API URL (use local if available, otherwise cloud)
LLMSHERPA_API_URL=http://localhost:5001/api/parseDocument?renderFormat=all

# Vector store directory
VECTORSTORE_DIR=/path/to/your/documents

# User agent for web scraping
USER_AGENT=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

# Optional: CUDA device for GPU acceleration
CUDA_VISIBLE_DEVICES=0

Directory Structure

Prepare your document directory:

your_documents/
├── pdfs/
│   ├── document1.pdf
│   ├── document2.pdf
│   └── ...
├── text_files/
│   ├── notes.txt
│   └── ...
└── other_documents/
    └── ...

Usage

Starting the MCP Server

# Make the server executable
chmod +x mcp_vectorstore_server.py

# Start the server on linux
python /home/em/McpDocServer/mcp_vectorstore_server.py
or windows with wsl
wsl -d Ubuntu-24.04 bash -c "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"

Using with MCP Clients

0. Claude Desktop

Add to your MCP configuration:

{
  "mcpServers": {
    "vectorstore": {
      "command": "python",
      "args": ["/home/em/McpDocServer/mcp_vectorstore_server.py"],
      "env": {
        "PYTHONPATH": "/home/em/McpDocServer/McpDocServer"
      }
    }
  }
}

1. GitHub Copilot

  1. Click on Configure Tools in the GitHub Copilot Chat window:<br>
  2. Click on Add More Tools in the top search bar.<br>
  3. Click on Add MCP Server in the top search bar.<br>
  4. Click on command (stdio) in the top search bar.<br>
  5. Enter command to run:<br>
  6. python /home/em/McpDocServer/mcp_vectorstore_server.py<br> or on windows: wsl -d Ubuntu-24.04 /mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh
  7. Enter mcp server id / name e.g. McpDocServer-19be5552<br>
  8. Configure settings.json<br>
{
    "security.workspace.trust.untrustedFiles": "open",
    "python.defaultInterpreterPath": "/mnt/c/Users/emanu/Desktop/LLM/venv/venv/bin/python",
    "terminal.integrated.inheritEnv": false,
    "git.openRepositoryInParentFolders": "never",
    "terminal.integrated.scrollback": 100000,
    "mcp": {
        "servers": {
            "McpDocServer-19be5552": {
                "type": "stdio",
                "command": "python",
                "args": [
                    "/mnt/c/Users/emanu/Desktop/McpDocServer/mcp_vectorstore_server.py"
                ]
            }
        }
    }
}
  1. Check if the following tools are available in the mcp server tool list when you click on Configure Tools in the GitHub Copilot Chat window and scroll to bottom:<br>     vectorstore_search<br>     vectorstore_create<br>     vectorstore_info<br>     vectorstore_clear<br>     read_file<br>     google_search<br>     wikipedia_search<br>     duckduckgo_search<br>     calculate<br>
  2. Select Agent mode in GitHub Copilot Chat window and use vectorstore_search to get information:<br> use vectorstore_search to get information on unit testing<br> 11)Confirm tool call usage.

2. Continue MCP CLient

name: McpDocServer
version: 1.0.1
schema: v1
mcpServers:
  - name: McpDocServer
    command: wsl -d Ubuntu-24.04
    args:
      - "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
    env: {}
    mcp_timeout: 180 # set timeout to 180 sec
    timeout: 9999
    connectionTimeout: 120000  # 120 seconds = 2 minutes

3. Other MCP Clients

Configure your MCP client to use the server:

# Example with a generic MCP client
mcp-client --server python --args /path/to/McpDocServer/mcp_vectorstore_server.py

Available Tools

Vector Store Operations

vectorstore_search

Search the vector store for relevant documents.

Parameters:

  • query (string, required): Search query
  • k (integer, optional): Number of results (default: 2)

Example:

{
  "name": "vectorstore_search",
  "arguments": {
    "query": "machine learning algorithms",
    "k": 5
  }
}

vectorstore_create

Create a new vector store from documents in a directory.

Parameters:

  • directory_path (string, required): Path to directory containing documents

Example:

{
  "name": "vectorstore_create",
  "arguments": {
    "directory_path": "/home/user/documents/research_papers"
  }
}

vectorstore_info

Get information about the current vector store.

Example:

{
  "name": "vectorstore_info",
  "arguments": {}
}

vectorstore_clear

Clear all documents from the vector store.

Example:

{
  "name": "vectorstore_clear",
  "arguments": {}
}

File Operations

read_file

Read the contents of a file on the system.

Parameters:

  • filename (string, required): Path to the file to read

Example:

{
  "name": "read_file",
  "arguments": {
    "filename": "/home/user/documents/notes.txt"
  }
}

Web Search Operations

google_search

Search Google for information.

Parameters:

  • query (string, required): Search query
  • max_results (integer, optional): Maximum number of results (default: 3)

Example:

{
  "name": "google_search",
  "arguments": {
    "query": "latest AI developments 2024",
    "max_results": 5
  }
}

wikipedia_search

Search Wikipedia for information.

Parameters:

  • query (string, required): Search query

Example:

{
  "name": "wikipedia_search",
  "arguments": {
    "query": "artificial intelligence"
  }
}

duckduckgo_search

Search DuckDuckGo for information.

Parameters:

  • query (string, required): Search query

Example:

{
  "name": "duckduckgo_search",
  "arguments": {
    "query": "privacy-focused search engines"
  }
}

Utility Operations

calculate

Perform mathematical calculations.

Parameters:

  • operation (string, required): Mathematical operation to perform

Example:

{
  "name": "calculate",
  "arguments": {
    "operation": "2 + 2 * 3"
  }
}

Resources

The server provides the following resources:

vectorstore://info

Returns information about the current vector store in JSON format.

Example Response:

{
  "num_documents": 150,
  "directory": "/home/user/documents",
  "embeddings_model": "sentence-transformers/all-mpnet-base-v2"
}

Troubleshooting

Common Issues

1. Import Errors

Problem: ModuleNotFoundError for various packages Solution: Ensure all dependencies are installed:

pip install -r requirements.txt

2. CUDA/GPU Issues

Problem: CUDA-related errors Solution: Install CPU-only versions:

pip uninstall faiss-gpu torch
pip install faiss-cpu

3. LLMSherpa Connection Issues

Problem: Cannot connect to LLMSherpa API Solution:

  • Start LLMSherpa server: llmsherpa --port 5001
  • Or use cloud API by updating the URL in the code

4. Memory Issues

Problem: Out of memory errors with large documents Solution:

  • Reduce chunk size in the text splitter
  • Use smaller embedding models
  • Process documents in batches

5. Permission Issues

Problem: Cannot read files or directories Solution: Check file permissions:

chmod 644 /path/to/documents/*
chmod 755 /path/to/documents/

Performance Optimization

For Large Document Collections

  1. Use GPU acceleration:

    # In vectorstore.py, ensure CUDA is enabled
    model_kwargs={'device': 'cuda'}
    
  2. Optimize chunk size:

    # Adjust in PDFVectorStoreTool.__init__
    chunk_size=1000,  # Smaller chunks for better performance
    chunk_overlap=100,
    
  3. Batch processing:

    # Process documents in smaller batches
    batch_size = 10
    

For Better Search Results

  1. Adjust similarity threshold:

    # In vectorstore_search method
    similarity_threshold = 0.7
    
  2. Use different embedding models:

    # Try different models for better results
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # Faster
    model_name="sentence-transformers/all-mpnet-base-v2"  # Better quality
    

Development

Project Structure

McpDocServer/
├── mcp_vectorstore_server.py  # Main MCP server
├── vectorstore.py             # Original vectorstore implementation
├── requirements.txt           # Python dependencies
├── README.md                 # This documentation
└── .env                      # Environment variables (create this)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Testing

# Run basic functionality tests
python -c "
from mcp_vectorstore_server import *
print('Server imports successfully')
"

# Test vector store operations
python -c "
from vectorstore import PDFVectorStoreTool
tool = PDFVectorStoreTool()
print(f'Vector store initialized with {tool.vectorstore_get_num_items()} documents')
"

License

This project is provided as-is for educational and research purposes. Please ensure you comply with the licenses of all included dependencies.

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review the error logs
  3. Ensure all dependencies are correctly installed
  4. Verify your system meets the requirements

Changelog

Version 1.0.0

  • Initial release
  • MCP server implementation
  • Vector store operations
  • Web search integration
  • File operations
  • Mathematical calculations

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured