MCP Server with FAISS for RAG

This project provides a proof-of-concept implementation of a Machine Conversation Protocol (MCP) server that allows an AI agent to query a vector database and retrieve relevant documents for Retrieval-Augmented Generation (RAG).

Features

- FastAPI server with MCP endpoints
- FAISS vector database integration
- Document chunking and embedding
- GitHub Move file extraction and processing
- LLM integration for complete RAG workflow
- Simple client example
- Sample documents

Installation

Using pipx (Recommended)

pipx is a tool to help you install and run Python applications in isolated environments.

First, install pipx if you don't have it:

# On macOS
brew install pipx
pipx ensurepath

# On Ubuntu/Debian
sudo apt update
sudo apt install python3-pip python3-venv
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# On Windows with pip
pip install pipx
pipx ensurepath

Install the MCP Server package directly from the project directory:

# Navigate to the directory containing the mcp_server folder
cd /path/to/mcp-server-project

# Install in editable mode
pipx install -e .

(Optional) Configure environment variables:
- Copy .env.example to .env
- Add your GitHub token for higher rate limits: GITHUB_TOKEN=your_token_here
- Add your OpenAI or other LLM API key for RAG integration: OPENAI_API_KEY=your_key_here
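
As a rough illustration of how these variables might be consumed, the sketch below reads them from the process environment. The helper name `load_settings` and the returned dictionary keys are hypothetical and not taken from this project; only the variable names `GITHUB_TOKEN` and `OPENAI_API_KEY` come from the steps above.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the optional credentials described above.

    Hypothetical helper: the project may read its configuration differently.
    """
    env = os.environ if env is None else env
    # Treat empty strings the same as missing values.
    github_token = env.get("GITHUB_TOKEN") or None
    openai_key = env.get("OPENAI_API_KEY") or None
    return {
        "github_token": github_token,
        "openai_api_key": openai_key,
        # Without a key, the RAG command is described as using a simulated LLM.
        "use_simulated_llm": openai_key is None,
    }
```

Passing a plain dictionary instead of the real environment, for example `load_settings({"GITHUB_TOKEN": "abc"})`, makes the behavior easy to check without touching your shell configuration.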

Manual Installation

If you prefer not to use pipx:

1. Clone the repository
2. Install dependencies:

cd mcp_server
pip install -r requirements.txt

Usage with pipx

After installing with pipx, you'll have access to the following commands:

Downloading Move Files from GitHub

# Download Move files with default settings
mcp-download --query "use sui" --output-dir docs/move_files

# Download with more options
mcp-download --query "module sui::coin" --max-results 50 --new-index --verbose

Improved GitHub Search and Indexing (Recommended)

# Search GitHub and index files with default settings
mcp-search-index --keywords "sui move"

# Search multiple keywords and customize options
mcp-search-index --keywords "sui move,move framework" --max-repos 30 --output-results --verbose

# Save search results and use a custom index location
mcp-search-index --keywords "sui coin,sui::transfer" --index-file custom/path/index.bin --output-results

The mcp-search-index command provides enhanced GitHub repository search capabilities:

- Searches repositories first, then recursively extracts Move files
- Supports multiple search keywords (comma-separated)
- Keeps only Move files that contain "use sui" references
- Always rebuilds the vector database after downloading
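
The keyword handling and file filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `parse_keywords` and `is_sui_move_file` are hypothetical, and the filter is assumed to be a plain substring check.

```python
from typing import List


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated --keywords value into clean search terms."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_sui_move_file(path: str, content: str) -> bool:
    """Keep only .move files whose text contains a "use sui" reference.

    Assumption: a simple substring match; the real filter may be stricter.
    """
    return path.endswith(".move") and "use sui" in content
```

For example, `parse_keywords("sui move,move framework")` yields two search terms, and a file named `coin.move` containing `use sui::coin;` passes the filter.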

Indexing Move Files

# Index files in the default location
mcp-index

# Index with custom options
mcp-index --docs-dir path/to/files --index-file path/to/index.bin --verbose

Querying the Vector Database

# Basic query
mcp-query "What is a module in Sui Move?"

# Advanced query with options
mcp-query "How do I define a struct in Sui Move?" -k 3 -f

Using RAG with LLM Integration

# Basic RAG query (falls back to a simulated LLM if no API key is provided)
mcp-rag "What is a module in Sui Move?"

# Using with a specific LLM API
mcp-rag "How do I define a struct in Sui Move?" --api-key your_api_key --top-k 3

# Output as JSON for further processing
mcp-rag "What are the benefits of sui::coin?" --output-json > rag_response.json

Running the Server

# Start the server with default settings
mcp-server

# Start with custom settings
mcp-server --host 127.0.0.1 --port 8080 --index-file custom/path/index.bin
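
For readers who want to see how flags like these are typically wired up, here is a small argparse sketch. The flag names come from the command above; the function name `parse_server_args` and all default values are assumptions for illustration only (port 8000 merely mirrors the address used elsewhere in this README).

```python
import argparse
from typing import List, Optional


def parse_server_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the server flags shown above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="mcp-server")
    # Defaults below are assumptions, not confirmed project defaults.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--index-file", default="data/faiss_index.bin")
    return parser.parse_args(argv)
```

Calling `parse_server_args(["--port", "8080"])` returns a namespace whose `port` is the integer 8080 and whose `index_file` keeps the assumed default.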

Manual Usage (without pipx)

Starting the server

cd mcp_server
python main.py

The server will start on http://localhost:8000

Downloading Move Files from GitHub

To download Move files from GitHub and populate your vector database:

# Download Move files with default query "use sui"
./run.sh --download-move

# Customize the search query
./run.sh --download-move --github-query "module sui::coin" --max-results 50

# Download, index, and start the server
./run.sh --download-move --index

You can also use the Python script directly:

python download_move_files.py --query "use sui" --output-dir docs/move_files

Indexing documents

Before querying, you need to index your documents. Place text files (.txt), Markdown files (.md), or Move files (.move) in the docs directory.

To index the documents, you can either:

Use the run script with the --index flag:

./run.sh --index

Use the index script directly:

python index_move_files.py --docs-dir docs/move_files --index-file data/faiss_index.bin
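
To make the indexing step more concrete, the sketch below collects files with the three supported extensions and splits their text into overlapping chunks. It is only an illustration: the names `find_documents` and `chunk_text`, the character-based chunking, and the chunk sizes are all assumptions, and the project's own document processor may work differently.

```python
from pathlib import Path
from typing import Iterator, List

SUPPORTED_EXTENSIONS = {".txt", ".md", ".move"}  # extensions named in this README


def find_documents(docs_dir: str) -> List[Path]:
    """Recursively list indexable files under docs_dir, sorted for stable output."""
    root = Path(docs_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Yield overlapping character windows; the sizes are illustrative defaults."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
```

Each chunk would then be embedded and added to the vector index; overlapping windows reduce the chance that a relevant passage is cut in half at a chunk boundary.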

Querying documents

You can use the local query script:

python local_query.py "What is RAG?"

# With more options
python local_query.py -k 3 -f "How to define a struct in Sui Move?"

Using RAG with LLM Integration

# Direct RAG query with an LLM
python rag_integration.py "What is a module in Sui Move?" --index-file data/faiss_index.bin

# With API key (if you have one)
OPENAI_API_KEY=your_key_here python rag_integration.py "How do coins work in Sui?"

MCP API Endpoint

The MCP API endpoint is available at /mcp/action. It supports the following actions:

- retrieve_documents: Retrieve relevant documents for a query
- index_documents: Index documents from a directory

Example:

curl -X POST "http://localhost:8000/mcp/action" -H "Content-Type: application/json" -d '{"action_type": "retrieve_documents", "payload": {"query": "What is RAG?", "top_k": 3}}'
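
The same request can be issued from Python using only the standard library. The sketch below mirrors the curl example; the function names `build_retrieve_request` and `send` are hypothetical, and the assumption that the server replies with a JSON object is not confirmed by this README.

```python
import json
import urllib.request


def build_retrieve_request(query: str, top_k: int = 3,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    body = {
        "action_type": "retrieve_documents",
        "payload": {"query": query, "top_k": top_k},
    }
    return urllib.request.Request(
        url=f"{base_url}/mcp/action",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request; assumes (not confirmed here) that the reply is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_retrieve_request("What is RAG?"))` would perform the call. Keeping request construction separate from sending makes the payload easy to inspect without a live server.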

Complete RAG Pipeline

The full RAG (Retrieval-Augmented Generation) pipeline works as follows:

1. Search Query: The user submits a question
2. Retrieval: The system searches the vector database for relevant documents
3. Context Formation: Retrieved documents are formatted into a prompt
4. LLM Generation: The prompt is sent to an LLM with the retrieved context
5. Enhanced Response: The LLM provides an answer based on the retrieved information

This workflow is fully implemented in the rag_integration.py module, which can be used either through the command line or as a library in your own applications.
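
The pipeline steps can be illustrated with a self-contained toy version. This is not the code in rag_integration.py: a bag-of-words counter stands in for a real embedding model, a brute-force cosine search stands in for the FAISS index, and `simulated_llm` is a placeholder generator. All function names and the prompt wording are assumptions.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words "embedding"; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_:]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 3) -> List[str]:
    """Brute-force similarity search; a stand-in for the FAISS index."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Format the retrieved documents and the question into one prompt."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context, start=1))
    return f"Answer using only this context:\n{numbered}\n\nQuestion: {query}"


def simulated_llm(prompt: str) -> str:
    """Placeholder generator used when no real LLM is configured."""
    return f"(simulated answer based on a {len(prompt)}-character prompt)"


def answer(query: str, documents: List[str],
           llm: Callable[[str], str] = simulated_llm, top_k: int = 3) -> str:
    """Run retrieval, context formation, and generation in sequence."""
    return llm(build_prompt(query, retrieve(query, documents, top_k)))
```

Passing the generator as a callable keeps the retrieval and prompt-building steps independent of any particular LLM provider, so a real API client can be swapped in for `simulated_llm` without changing the rest.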

GitHub Move File Extraction

The system can extract Move files from GitHub based on search queries. It implements two methods:

- GitHub API (preferred): Requires a GitHub token for higher rate limits
- Web scraping fallback: Used when the API method fails or when no token is provided

To configure your GitHub token, set it in the .env file or as an environment variable:

GITHUB_TOKEN=your_github_token_here
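
The two-method strategy described in this section could be structured roughly as follows. This is a sketch of the fallback pattern only: `fetch_move_files` and its callable parameters are hypothetical, and the real extractor in utils/github_extractor.py may be organized differently.

```python
from typing import Callable, List, Optional


def fetch_move_files(query: str,
                     api_fetch: Callable[[str, str], List[str]],
                     scrape_fetch: Callable[[str], List[str]],
                     token: Optional[str] = None) -> List[str]:
    """Use the API when a token is available; otherwise, or on failure, scrape."""
    if token:
        try:
            return api_fetch(query, token)
        except Exception as exc:  # broad on purpose in this sketch
            print(f"API method failed ({exc}); falling back to scraping")
    return scrape_fetch(query)
```

Injecting the two fetchers as callables lets you exercise the fallback logic with simple stubs, without making any network requests.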

Project Structure

mcp_server/
├── __init__.py            # Package initialization
├── main.py                # Main server file
├── mcp_api.py             # MCP API implementation
├── index_move_files.py    # File indexing utility
├── local_query.py         # Local query utility
├── download_move_files.py # GitHub Move file extractor
├── rag_integration.py     # LLM integration for RAG
├── pyproject.toml         # Package configuration
├── requirements.txt       # Dependencies
├── .env.example           # Example environment variables
├── README.md              # This file
├── data/                  # Storage for the FAISS index
├── docs/                  # Sample documents
│   └── move_files/        # Downloaded Move files
├── models/                # Model implementations
│   └── vector_store.py    # FAISS vector store implementation
└── utils/
    ├── document_processor.py  # Document processing utilities
    └── github_extractor.py    # GitHub file extraction utilities

Extending the Project

To extend this proof-of-concept:

- Add authentication and security features
- Implement more sophisticated document processing
- Add support for more document types
- Integrate with other LLM providers
- Add monitoring and logging
- Improve the Move language parsing for more structured data extraction

License

MIT
