AVS Document Search System

A vector search system that enables semantic retrieval of document chunks using MongoDB Atlas Vector Search and Voyage AI embeddings, allowing users to search documents by meaning rather than just keywords.

MCP Document Search System

A vector search system for document retrieval using MongoDB Atlas Vector Search and Voyage AI embeddings.

The included sample data is for Atlas Vector Search.

Features

  • Ingests and chunks markdown documents with hierarchical headers
  • Generates embeddings using Voyage AI's contextual embeddings API
  • Stores documents and embeddings in MongoDB with parent-child relationships
  • Provides a FastMCP server for semantic document search
  • Supports configurable vector dimensions and chunking strategies
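As a rough illustration of header-aware chunking with parent-child links, here is a minimal sketch. It is not the code in ingest_docs.py: the function name `chunk_markdown` and the `parent_id`, `headers`, and `content` field names are assumptions made for this example.

```python
import re

# Matches ATX-style markdown headers such as "## Setup".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_markdown(text, parent_id):
    """Split markdown into one chunk per header section.

    Each chunk records the chain of headers above it and the ID of the
    parent document it came from (hypothetical field names).
    """
    chunks = []
    header_path = []  # (level, title) pairs for the current position
    body = []         # lines collected since the last header

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({
                "parent_id": parent_id,
                "headers": [title for _, title in header_path],
                "content": content,
            })
        body.clear()

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any section at the same or a deeper level.
            while header_path and header_path[-1][0] >= level:
                header_path.pop()
            header_path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

This sketch treats every line starting with `#` as a header, so a `#` comment inside a fenced code block would be split incorrectly; a real ingester would need to track code fences.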

Available MCP Tools

The document search server provides these tools:

  1. search_documents_vector(query: str, limit: int = 5)

    • Primary search method using vector similarity
    • Returns document chunks with metadata and similarity scores
    • Best for semantic/meaning-based queries
  2. search_documents_lexicaly(query: str, limit: int = 1)

    • Fallback search using lexical/text matching
    • Returns full parent documents with search scores
    • Useful when vector search doesn't find good matches
  3. get_parent_document(parent_id: str)

    • Retrieves the complete parent document by ID
    • Returns original content and file path
    • Use after search to get full context for a chunk
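For orientation, a vector search tool like the first one above would typically run a `$vectorSearch` aggregation stage. The sketch below only builds such a pipeline as plain Python data. The index name, the `embedding` path, and the projected field names are assumptions, not values taken from this repository.

```python
def build_vector_search_pipeline(query_vector, limit=5,
                                 index_name="vector_index",
                                 path="embedding"):
    """Build an Atlas Vector Search aggregation pipeline.

    index_name, path, and the projected fields are hypothetical; use
    whatever your collection and search index actually define.
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results for better recall,
                # capped at the documented maximum of 10000.
                "numCandidates": min(limit * 20, 10000),
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "content": 1,
                "parent_id": 1,
                # Expose the similarity score alongside each chunk.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The result could be passed to `collection.aggregate(...)` in pymongo, with `query_vector` being the embedding of the user's query. A returned `parent_id` is what a follow-up call to get_parent_document would take.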

Prerequisites

  • Python 3.10+
  • MongoDB Atlas cluster with vector search enabled
  • Voyage AI API key

Installation

  1. Clone the repository:
git clone https://github.com/patw/avs-document-search.git
cd avs-document-search
  2. Install dependencies:
pip install -r requirements.txt
  3. Create a .env file based on sample.env with your credentials

Usage

  1. Ingest documents in the docs/ directory:
python ingest_docs.py
  2. Run the search server:
python avs-mcp.py

Running the search server on its own does little beyond verifying that your MongoDB URI is correct. To use it, plug this MCP server into an MCP client such as Claude Desktop. Here's a sample config:

{
  "mcpServers": {
    "Atlas Vector Search Docs": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "fastmcp, pymongo, requests",
        "fastmcp",
        "run",
        "<path to>/avs-docs-mcp/avs-mcp.py"
      ]
    }
  }
}

Configuration

Copy sample.env to .env and edit it to configure:

  • MongoDB connection string
  • Database and collection names
  • Voyage AI API key
  • Vector dimensions (256 default)
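To show how these settings might be read at startup, here is a minimal sketch that loads them from environment variables. The variable names (`MONGODB_URI`, `MONGODB_DATABASE`, `MONGODB_COLLECTION`, `VOYAGE_API_KEY`, `VECTOR_DIMENSIONS`) and the default database and collection names are assumptions for illustration; the names the project actually uses are in sample.env.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    database: str
    collection: str
    voyage_api_key: str
    vector_dimensions: int = 256  # default stated in the list above


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ).

    All variable names here are hypothetical; check sample.env.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("MONGODB_URI", "VOYAGE_API_KEY")
               if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return Settings(
        mongodb_uri=env["MONGODB_URI"],
        database=env.get("MONGODB_DATABASE", "docs"),
        collection=env.get("MONGODB_COLLECTION", "chunks"),
        voyage_api_key=env["VOYAGE_API_KEY"],
        vector_dimensions=int(env.get("VECTOR_DIMENSIONS", "256")),
    )
```

Failing fast on a missing connection string or API key makes misconfiguration visible before any ingest or search call is attempted.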

Future Improvements

  • Implement hybrid search combining vector and text search using $rankFusion (when MongoDB 8.1 is GA on Atlas)
  • Support additional file formats (PDF, Word, etc.) with Docling
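Until `$rankFusion` is available, the idea behind it can be approximated on the client. `$rankFusion` is based on reciprocal rank fusion, and the sketch below applies that technique to two ranked lists of document IDs in plain Python. The function name and the constant `k=60` (a commonly used value) are choices made for this example, not part of this project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with rank starting at 1. Returns (doc_id, score) pairs sorted
    from best to worst.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result IDs from a vector search and a lexical search.
vector_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
```

Documents that rank well in both lists rise to the top, so here "b" and "a" come out ahead of "d" and "c", which each appear in only one list.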

Contributing

Pull requests are welcome! For major changes, please open an issue first.

Author

Pat Wendorf
pat.wendorf@mongodb.com
GitHub: patw

License

MIT
