Markdown RAG

A Retrieval Augmented Generation system that enables AI assistants to perform semantic searches and manage document indices for markdown files. It supports PostgreSQL with pgvector and integrates both Google Gemini and Ollama for intelligent embedding generation.

A Retrieval Augmented Generation (RAG) system for markdown documentation with intelligent rate limiting and MCP server integration.

Features

Semantic Search: Vector-based similarity search using Google Gemini or Ollama embeddings
Markdown-Aware Chunking: Intelligent document splitting that preserves semantic boundaries
Rate Limiting: Sophisticated sliding window algorithm with token counting and batch optimization
MCP Server: Model Context Protocol server for AI assistant integration
PostgreSQL Vector Store: Scalable storage using pgvector extension
Incremental Updates: Smart deduplication prevents reprocessing existing documents
Production Ready: Type-safe configuration, comprehensive logging, and error handling
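
As a rough illustration of what markdown-aware chunking means, the sketch below splits a document at headings so that no chunk crosses a section boundary, and it ignores `#` lines inside fenced code blocks. The function name `split_by_headings` and the chunk layout are hypothetical; the project's own splitter may work differently, for example by also enforcing a chunk size and overlap.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_by_headings(text: str) -> list[dict[str, str]]:
    """Split markdown into one chunk per heading-delimited section."""
    chunks: list[dict[str, str]] = []
    heading = ""
    buffer: list[str] = []
    in_fence = False

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            body = "\n".join(buffer).strip()
            if body:
                chunks.append({"heading": heading, "text": body})
            heading, buffer = match.group(1).strip(), []
        else:
            buffer.append(line)

    # Flush the final section.
    body = "\n".join(buffer).strip()
    if body:
        chunks.append({"heading": heading, "text": body})
    return chunks
```

Keeping the heading next to each chunk gives the embedding model some context about where the text came from.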

Installation

git clone https://github.com/yourusername/markdown-rag.git

Prerequisites

Python 3.11+
PostgreSQL 12+ with pgvector extension installed
Google Gemini API key (if using Google embeddings)
Ollama (if using local embeddings)
MCP-compatible client (Claude Desktop, Cline, etc.)

Quick Start

1. (Optional) Set Up PostgreSQL

createdb embeddings

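If you skip this step, the tool creates a database for you; per the POSTGRES_DB default it is named [engine]_embeddings. If you create your own database under another name, such as embeddings above, set POSTGRES_DB to match. The pgvector extension is enabled automatically the first time you run the tool.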

2. Ingest Documents

cd markdown-rag
# Use Google Gemini
uv run markdown-rag /path/to/docs --command ingest --engine google
# Or use Ollama
uv run markdown-rag /path/to/docs --command ingest --engine ollama

Required environment variables (create .env or export):

POSTGRES_PASSWORD=your_password
GOOGLE_API_KEY=your_gemini_api_key  # Only if using Google engine

3. Configure MCP Client

Add to your MCP client configuration (e.g., claude_desktop_config.json). The client will automatically start the server.

Minimal configuration:

{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_PASSWORD": "your_password",
        "GOOGLE_API_KEY": "your_api_key"
      }
    }
  }
}

Full configuration:

{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_USER": "postgres_username",
        "POSTGRES_PASSWORD": "your_password",
        "DISABLED_TOOLS": "delete_document,update_document",
        "CHUNK_OVERLAP": "50",
        "GOOGLE_API_KEY": "your_api_key",
        "GOOGLE_MODEL": "models/gemini-embedding-001",
        "RATE_LIMIT_REQUESTS_PER_DAY": "1000",
        "RATE_LIMIT_REQUESTS_PER_MINUTE": "100",
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "mxbai-embed-large"
      }
    }
  }
}
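
Hand-edited JSON is easy to break with a missing comma, a trailing comma, or a comment, none of which JSON allows. One way to avoid that is to generate the file; the sketch below builds the same structure as the configurations above and serializes it with the standard library. The helper name `build_mcp_config` is hypothetical, and converting every env value to a string is an assumption based on MCP clients generally expecting string-valued environment entries.

```python
import json


def build_mcp_config(project_dir: str, docs_dir: str, env: dict) -> dict:
    """Return an mcpServers entry that launches markdown-rag through uv."""
    return {
        "mcpServers": {
            "markdown-rag": {
                "command": "uv",
                "args": [
                    "run",
                    "--directory",
                    project_dir,
                    "markdown-rag",
                    docs_dir,
                    "--command",
                    "mcp",
                ],
                # Environment values are passed as strings.
                "env": {key: str(value) for key, value in env.items()},
            }
        }
    }


config = build_mcp_config(
    "/absolute/path/to/markdown-rag",
    "/absolute/path/to/docs",
    {"POSTGRES_PASSWORD": "your_password", "CHUNK_OVERLAP": 50},
)
print(json.dumps(config, indent=2))
```

The printed output can be pasted into the client configuration file.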

4. Query via MCP

The server exposes several tools:

query

Semantic search over documentation
Arguments: query (string), num_results (integer, optional, default: 4)

list_documents

List all ingested documents
Arguments: none

delete_document

Remove a document from the index
Arguments: filename (string)

update_document

Re-ingest a specific document
Arguments: filename (string)

refresh_index

Scan directory and ingest new/modified files
Arguments: none
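
To make the query tool's behavior more concrete, here is a pure-Python sketch of similarity ranking: score each stored chunk against the query embedding and keep the best num_results. In the project itself this ranking is delegated to PostgreSQL with pgvector, so the functions `cosine_similarity` and `top_k` below are illustrative only and not part of the codebase.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    indexed: list[tuple[str, list[float]]],
    num_results: int = 4,
) -> list[tuple[float, str]]:
    """Return the num_results most similar (score, text) pairs, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:num_results]
```

The default of 4 mirrors the num_results default documented for the query tool.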

To disable tools (e.g., in production), set the DISABLED_TOOLS environment variable to a comma-separated list of tool names:

DISABLED_TOOLS=delete_document,update_document,refresh_index
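
A comma-separated value like the one above could be interpreted as in the following sketch, which strips whitespace, ignores empty entries, and returns the tools that remain enabled. The tool names come from the list above; the function name `enabled_tools` and the choice to silently ignore unknown names are assumptions, not documented behavior.

```python
ALL_TOOLS = (
    "query",
    "list_documents",
    "delete_document",
    "update_document",
    "refresh_index",
)


def enabled_tools(disabled_value: str = "") -> list[str]:
    """Return the tools left after removing those named in DISABLED_TOOLS."""
    disabled = {
        name.strip() for name in disabled_value.split(",") if name.strip()
    }
    return [tool for tool in ALL_TOOLS if tool not in disabled]
```

With the example value shown above, only query and list_documents would remain available, which suits a read-only deployment.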

Configuration

Environment Variables

Variable	Default	Required	Description
`POSTGRES_USER`	`postgres`	No	PostgreSQL username
`POSTGRES_PASSWORD`	-	Yes	PostgreSQL password
`POSTGRES_HOST`	`localhost`	No	PostgreSQL host
`POSTGRES_PORT`	`5432`	No	PostgreSQL port
`POSTGRES_DB`	`[engine]_embeddings`	No	Database name
`GOOGLE_API_KEY`	-	Yes*	Google Gemini API key (*if using Google)
`GOOGLE_MODEL`	`models/gemini...`	No	Google embedding model
`OLLAMA_HOST`	`http://localhost...`	No	Ollama host URL
`OLLAMA_MODEL`	`mxbai-embed-large`	No	Ollama embedding model
`RATE_LIMIT_REQUESTS_PER_MINUTE`	`100`	No	Max API requests per minute
`RATE_LIMIT_REQUESTS_PER_DAY`	`1000`	No	Max API requests per day
`DISABLED_TOOLS`	-	No	Comma-separated list of tools to disable
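
The defaults and required flags in the table can be read as the following sketch, which resolves settings from an environment mapping using only the standard library. The `Settings` class and `load_settings` function are hypothetical (the project's config.py may be structured differently), and substituting the engine name into the [engine]_embeddings default is an interpretation of the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    postgres_user: str
    postgres_password: str
    postgres_host: str
    postgres_port: int
    postgres_db: str
    requests_per_minute: int
    requests_per_day: int


def load_settings(engine: str, env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from env, applying the documented defaults."""
    password = env.get("POSTGRES_PASSWORD")
    if not password:
        raise ValueError("POSTGRES_PASSWORD is required")
    if engine == "google" and not env.get("GOOGLE_API_KEY"):
        raise ValueError("GOOGLE_API_KEY is required for the google engine")
    return Settings(
        postgres_user=env.get("POSTGRES_USER", "postgres"),
        postgres_password=password,
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        # Assumed reading of the "[engine]_embeddings" default.
        postgres_db=env.get("POSTGRES_DB", f"{engine}_embeddings"),
        requests_per_minute=int(env.get("RATE_LIMIT_REQUESTS_PER_MINUTE", "100")),
        requests_per_day=int(env.get("RATE_LIMIT_REQUESTS_PER_DAY", "1000")),
    )
```

Failing early on a missing required variable gives a clearer error than a connection or API failure later in the run.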

Command Line Options

uv run markdown-rag <directory> [OPTIONS]

Arguments:

<directory>: Path to markdown files directory (required)

Options:

-c, --command {ingest|mcp}: Operation mode (default: mcp)
- ingest: Process and store documents
- mcp: Start MCP server for queries
-e, --engine {google|ollama}: Embedding engine (default: google)
-l, --level {debug|info|warning|error}: Logging level (default: warning)

Examples:

uv run markdown-rag ./docs --command ingest --level info --engine ollama

uv run markdown-rag /var/docs -c ingest -l debug -e google
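
For readers who want to see how these options fit together, the sketch below reproduces the documented arguments, choices, and defaults with the standard library's argparse. It is an illustration only; the project's real CLI parsing may be implemented with a different library, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented markdown-rag options."""
    parser = argparse.ArgumentParser(prog="markdown-rag")
    parser.add_argument("directory", help="Path to markdown files directory")
    parser.add_argument(
        "-c", "--command", choices=["ingest", "mcp"], default="mcp",
        help="Operation mode",
    )
    parser.add_argument(
        "-e", "--engine", choices=["google", "ollama"], default="google",
        help="Embedding engine",
    )
    parser.add_argument(
        "-l", "--level", choices=["debug", "info", "warning", "error"],
        default="warning", help="Logging level",
    )
    return parser
```

Parsing the second example above with this parser yields command ingest, level debug, and engine google.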

Architecture

System Components

The following diagram shows how the system components interact:

graph TD
    A[MCP Client<br/>Claude, ChatGPT, etc.] --> B[FastMCP Server<br/>Tool: query]
    B --> C[MarkdownRAG]
    C --> D[Text Splitters]
    C --> E[Rate Limited Embeddings]
    E --> F[Google Gemini<br/>Embeddings API]
    C --> G[PostgreSQL<br/>+ pgvector]

Rate Limiting Strategy

The system implements a sliding-window algorithm that enforces two windows at once, one per minute and one per day:

Request Limits: Tracks requests per minute and per day
Token Limits: Counts tokens before API calls
Batch Optimization: Calculates maximum safe batch sizes
Smart Waiting: Minimal delays with automatic retry
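
The request-limit part of this strategy can be sketched as follows, assuming request timestamps are kept in a deque and both windows are checked before each call. The class name `SlidingWindowLimiter`, its methods, and the injectable clock are hypothetical and not taken from rate_limiter.py; token counting and batch sizing are left out.

```python
import time
from collections import deque
from typing import Callable

MINUTE = 60.0
DAY = 86_400.0


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait for a free slot."""

    def __init__(
        self,
        per_minute: int = 100,
        per_day: int = 1000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = ((MINUTE, per_minute), (DAY, per_day))
        self._stamps: deque = deque()
        self._clock = clock

    def wait_time(self) -> float:
        """Seconds until a request fits inside both windows (0.0 if it fits now)."""
        now = self._clock()
        # Timestamps older than the longest window can never matter again.
        while self._stamps and now - self._stamps[0] >= DAY:
            self._stamps.popleft()
        delay = 0.0
        for window, limit in self._limits:
            recent = [t for t in self._stamps if now - t < window]
            if len(recent) >= limit:
                # This timestamp must leave the window before a slot opens.
                blocking = recent[len(recent) - limit]
                delay = max(delay, blocking + window - now)
        return delay

    def record(self) -> None:
        """Register that a request was just sent."""
        self._stamps.append(self._clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`. Because the window slides with each timestamp, the wait is only as long as needed for the oldest blocking request to expire.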

See the architecture documentation (docs/architecture.md) for detailed diagrams.

Development

Setup Development Environment

git clone https://github.com/yourusername/markdown-rag.git
cd markdown-rag
uv sync

Run Linters

uv run ruff check .

uv run mypy .

Code Style

This project follows:

Linting: Ruff with Google docstring convention
Type Checking: mypy with strict settings
Line Length: 79 characters
Import Sorting: Alphabetical with isort

Project Structure

markdown-rag/
├── src/markdown_rag/
│   ├── __init__.py
│   ├── main.py              # Entry point and MCP server
│   ├── config.py            # Environment and CLI configuration
│   ├── models.py            # Pydantic data models
│   ├── rag.py               # Core RAG logic
│   ├── embeddings.py        # Rate-limited embeddings wrapper
│   └── rate_limiter.py      # Rate limiting algorithm
├── docs/
│   ├── api-reference.md     # API documentation
│   ├── architecture.md      # Architecture documentation
│   ├── mcp-integration.md   # MCP server integration guide
│   └── user-guide.md        # User guide
├── pyproject.toml           # Project configuration
├── .env                     # Environment variables (not in git)
└── README.md

Troubleshooting

Common Issues

"Failed to start store: connection refused"

PostgreSQL is not running or the connection settings are wrong. Check POSTGRES_HOST, POSTGRES_PORT, POSTGRES_USER, and POSTGRES_PASSWORD in your environment.

"Rate limit exceeded"

Lower the rate limits in your environment variables so that they stay within your API quota:

RATE_LIMIT_REQUESTS_PER_MINUTE=50
RATE_LIMIT_REQUESTS_PER_DAY=500

"pgvector extension not found"

The pgvector PostgreSQL extension is not installed. Follow the pgvector installation guide for your platform.

"Skipping all files (already in vector store)"

Expected behavior. The system prevents duplicate ingestion.
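
One common way to implement this kind of deduplication is to compare a content hash of each file against what was stored at ingestion time, as in the sketch below. How the project actually detects existing or modified documents is not stated here, so the hash-based approach and the names `content_hash` and `files_to_ingest` are assumptions for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_ingest(
    files: dict[str, str], stored_hashes: dict[str, str]
) -> list[str]:
    """Return filenames that are new or whose content changed since ingestion."""
    return [
        name
        for name, text in files.items()
        if stored_hashes.get(name) != content_hash(text)
    ]
```

If every file is skipped but you expected changes to be picked up, the update_document and refresh_index tools described above re-ingest a specific document or scan for new and modified files.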

Logging

Pass --level debug to see detailed log output while diagnosing a problem:

uv run markdown-rag ./docs --command ingest --level debug

Security

Best Practices

Never commit .env files - Add to .gitignore
Use environment variables for all secrets
Restrict database access - Use firewall rules
Rotate API keys regularly
Use read-only database users for query-only deployments

Secrets Management

All secrets use SecretStr type to prevent accidental logging:

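from pydantic import SecretStr

api_key = SecretStr("secret_value")
print(api_key)                     # masked: prints **********
print(api_key.get_secret_value())  # explicit access: prints secret_value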

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make changes and add tests
Run linters (uv run ruff check .)
Run type checks (uv run mypy .)
Commit changes (git commit -m 'feat: add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Commit Message Format

Follow conventional commits:

feat: add new feature
fix: resolve bug
docs: update documentation
refactor: improve code structure
test: add tests
chore: update dependencies
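
If you want to check a commit subject locally before pushing, a small pattern like the one below covers the six types listed above, with an optional scope in parentheses. The function `is_conventional` is a hypothetical helper, not something shipped with the repository, and the pattern is a simplified subset of the conventional commits format.

```python
import re

COMMIT_RE = re.compile(
    r"^(feat|fix|docs|refactor|test|chore)"  # commit type
    r"(\([\w-]+\))?"                         # optional scope, e.g. (rag)
    r"!?: \S.*$"                             # optional "!", then ": description"
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the expected format."""
    return bool(COMMIT_RE.match(subject))
```

For example, "feat: add amazing feature" passes, while "Add amazing feature" does not.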

TODOs

Management of embeddings store via MCP tool.
Add support for other embeddings models.
Add support for other vector stores.

License

This project is licensed under the MIT License.

Acknowledgments

LangChain - RAG framework
Google Gemini - Embedding model
pgvector - Vector similarity search
FastMCP - MCP server framework

Support

Documentation: docs/architecture.md
Issues: GitHub Issues
Discussions: GitHub Discussions
