Gemini CLI RAG MCP
This project builds a standalone RAG service that transforms the static gemini-cli documentation into a dynamic, queryable tool. The service exposes this knowledge via the Model Context Protocol (MCP), making it accessible to any integrated client. Environments like gemini-cli, VS Code, or Cursor can therefore give developers instant, accurate answers in natural language, directly within their workflow, accelerating learning and letting them intuitively leverage the tool's full potential.
Table of Contents
- Project Overview
- Features
- System Architecture
- Getting Started
- Usage
- How It Works
- Scripts
- Dependencies
Project Overview
This project integrates a RAG pipeline and consists of three main components:
- Data Extraction and Processing: Python scripts that extract content from all markdown files in the `gemini-cli/docs` directory and its sub-directories, process them, and create a vector store.
- MCP Server: A Python-based MCP server that exposes the vector store as a queryable tool.
- MCP Clients (Gemini CLI, VS Code, Claude Code, Windsurf, Cursor, etc.): most notably the official Gemini CLI, which can connect to the MCP server to answer questions about its documentation.
Features
- RAG-based Q&A: Ask questions about the Gemini CLI in natural language and get answers based on its official documentation.
- Local Vector Store: The entire documentation is stored and indexed locally using `SKLearnVectorStore`.
- Extensible: The MCP server can be easily extended with new tools and data sources.
System Architecture
The system is composed of the following parts:
- `extract.py`: Walks through the `gemini-cli/docs` directory, finds all `.md` files, and concatenates their content into a single `gemini_cli_docs.txt` file.
- `create_vectorstore.py`: Loads the `gemini_cli_docs.txt` file, splits it into chunks, and creates a `gemini_cli_vectorstore.parquet` file using `HuggingFaceEmbeddings` and `SKLearnVectorStore`.
- `gemini_cli_mcp.py`: Runs a `FastMCP` server that loads the vector store and exposes two endpoints:
  - `gemini_cli_query_tool(query: str)`: A tool that takes a user query, retrieves relevant documents from the vector store, and returns them.
  - `docs://gemini-cli/full`: A resource that returns the entire content of the `gemini_cli_docs.txt` file.
- `gemini-cli/`: The official Gemini CLI, which can be configured to use the MCP server.
Getting Started
Prerequisites
- Python 3.13
- Node.js 18+
- An existing `gemini-cli` installation. If you don't have one, clone the official repository:

```
git clone https://github.com/google-gemini/gemini-cli.git
```
Installation
1. Clone the repository:

```
git clone https://github.com/your-username/gemini-cli-rag-mcp.git
cd gemini-cli-rag-mcp
```

2. Install Python dependencies:

```
pip install -r requirements.txt
```

3. Prepare the documentation data: run the `extract.py` script to gather all the markdown documentation into a single file.

```
python extract.py
```

4. Create the vector store: run the `create_vectorstore.py` script to create the vector store from the documentation file.

```
python create_vectorstore.py
```
Usage
Before running with Docker, try running the MCP server in dev mode to test it:

```
mcp dev gemini_cli_mcp.py
```

In the MCP Inspector that opens, type `python` in the Command field, type `gemini_cli_mcp.py` in the Arguments field, and press Connect.
1. Run the MCP Service with Docker
The most efficient way to run the MCP server is with Docker Compose. This starts a container in the background and keeps it ready for Gemini CLI to connect to.
```
docker-compose up -d
```
The container will keep running, but the Python MCP script itself will only be executed on-demand by Gemini CLI.
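The repository's actual `docker-compose.yml` is not reproduced here; the sketch below only illustrates the idea, assuming an image built from a local `Dockerfile`. The container name must match the one referenced in the Gemini CLI configuration in the next step.

```yaml
# Hypothetical docker-compose.yml sketch; the project's real file may differ.
# The container only needs to stay alive so Gemini CLI can run the MCP
# script on demand with `docker exec`.
services:
  mcp:
    build: .
    container_name: gemini-cli-mcp-container
    command: tail -f /dev/null  # keep-alive; gemini_cli_mcp.py runs on demand
```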
2. Configure Gemini CLI
To make Gemini CLI aware of your local MCP server, you need to create a configuration file.
Inside the `.gemini` directory, add the following content to the `settings.json` file:

```json
{
  "mcpServers": {
    "local_rag_server": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "gemini-cli-mcp-container",
        "python",
        "gemini_cli_mcp.py"
      ]
    }
  }
}
```

This configuration tells Gemini CLI how to launch your MCP server using `docker exec`.
Note: To use it in VS Code, open Settings, search for 'mcp', and click on settings.json. Then switch Copilot to Agent mode and ask it to set up the gemini-cli-mcp server, giving the JSON above as context.
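As an illustration only (the exact setting shape depends on your VS Code version, so treat this as an assumption), the equivalent server entry in VS Code's `settings.json` might look like this:

```json
{
  "mcp": {
    "servers": {
      "local_rag_server": {
        "command": "docker",
        "args": ["exec", "-i", "gemini-cli-mcp-container", "python", "gemini_cli_mcp.py"]
      }
    }
  }
}
```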
3. Ask Questions
After restarting your terminal for the changes to take effect, simply run `gemini`. It will automatically discover the `local_rag_server` and use its tools when needed.
Example:

```
How do I customize my gemini-cli?
```

or something more specific:

```
My gemini cli is not showing an interactive prompt when I run it on my build server, it just exits. I have a CI_TOKEN environment variable set. Why is this happening and how can I fix it?
```
How It Works
Data Extraction and Vectorization
The `extract.py` script recursively finds all markdown files in the `gemini-cli/docs` directory. It reads their content and combines it into a single text file, `gemini_cli_docs.txt`.
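The core of the script is a straightforward directory walk. A minimal sketch (details such as file ordering and separators are assumptions, not necessarily the project's exact code):

```python
# Sketch of the extraction step: walk gemini-cli/docs, collect every .md file,
# and concatenate the contents into one text file.
import os

DOCS_DIR = "gemini-cli/docs"
OUTPUT_FILE = "gemini_cli_docs.txt"

with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
    for root, _dirs, files in os.walk(DOCS_DIR):
        for name in sorted(files):
            if name.endswith(".md"):
                path = os.path.join(root, name)
                with open(path, "r", encoding="utf-8") as f:
                    out.write(f.read())
                    out.write("\n\n")  # blank line between documents
```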
The `create_vectorstore.py` script then takes this text file and (as sketched below):

- Loads the document.
- Splits it into smaller, overlapping chunks using `RecursiveCharacterTextSplitter`.
- Uses `HuggingFaceEmbeddings` (with the `BAAI/bge-large-en-v1.5` model) to create embeddings for each chunk.
- Stores these embeddings in a `SKLearnVectorStore`, which is persisted to `gemini_cli_vectorstore.parquet`.
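A minimal sketch of that pipeline, assuming a recent LangChain package layout (import paths vary across LangChain versions, and the chunk size and overlap values here are illustrative, not necessarily the project's):

```python
# Sketch of the vectorization step using LangChain.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import SKLearnVectorStore

# Load the concatenated documentation produced by extract.py.
with open("gemini_cli_docs.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Split into overlapping chunks so retrieval can return focused passages.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)

# Embed each chunk and persist the store to a parquet file.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
store = SKLearnVectorStore.from_texts(
    texts=chunks,
    embedding=embeddings,
    persist_path="gemini_cli_vectorstore.parquet",
    serializer="parquet",
)
store.persist()
```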
MCP Server
The `gemini_cli_mcp.py` script creates a `FastMCP` server (sketched below). This server defines a tool, `gemini_cli_query_tool`, which can be called by the Gemini CLI or VS Code/Cursor/etc. When this tool is invoked, it:

- Loads the persisted `SKLearnVectorStore`.
- Uses the vector store as a retriever to find the most relevant document chunks for the given query.
- Returns the content of these chunks to the Gemini CLI.
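A minimal sketch of such a server, using the `FastMCP` class from the official `mcp` Python SDK. The retrieval depth (`k`) is an illustrative choice, and import paths again depend on your LangChain version:

```python
# Sketch of the MCP server: expose the vector store as an MCP tool and the
# raw documentation as an MCP resource, served over stdio.
from mcp.server.fastmcp import FastMCP
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import SKLearnVectorStore

mcp = FastMCP("gemini-cli-rag")

# Re-open the persisted store created by create_vectorstore.py.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
store = SKLearnVectorStore(
    embedding=embeddings,
    persist_path="gemini_cli_vectorstore.parquet",
    serializer="parquet",
)
retriever = store.as_retriever(search_kwargs={"k": 3})

@mcp.tool()
def gemini_cli_query_tool(query: str) -> str:
    """Retrieve the documentation chunks most relevant to the query."""
    docs = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in docs)

@mcp.resource("docs://gemini-cli/full")
def full_docs() -> str:
    """Return the entire concatenated documentation file."""
    with open("gemini_cli_docs.txt", "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run(transport="stdio")
```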
Gemini CLI Integration
The Gemini CLI is designed to be extensible through MCP servers. The CLI discovers available tools by connecting to servers defined in the `mcpServers` object in a `settings.json` file (either in the project's `.gemini` directory or in the user's home `~/.gemini` directory).
Gemini CLI supports three transport mechanisms for communication:
- Stdio Transport: Spawns a subprocess and communicates with it over `stdin` and `stdout`. This is the method used in this project, via the `command` property in `settings.json`.
- SSE Transport: Connects to a Server-Sent Events (SSE) endpoint, defined with a `url` property.
- Streamable HTTP Transport: Uses HTTP streaming for communication, configured with an `httpUrl` property.
By using the `docker exec` command, we are leveraging the stdio transport to create a direct communication channel with the Python script inside the container.
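For comparison, hypothetical `settings.json` entries for the two network transports (the server names and URLs are placeholders, not part of this project):

```json
{
  "mcpServers": {
    "sse_example": {
      "url": "http://localhost:8000/sse"
    },
    "http_example": {
      "httpUrl": "http://localhost:8000/mcp"
    }
  }
}
```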
Scripts
- `extract.py`: Extracts documentation from markdown files.
- `create_vectorstore.py`: Creates the vector store.
- `gemini_cli_mcp.py`: Runs the MCP server.
Dependencies
Python
The main Python dependencies are listed in `requirements.txt`:

- `langchain`: For text splitting, vector stores, and embeddings.
- `tiktoken`: For token counting.
- `sentence-transformers`: For the embedding model.
- `scikit-learn`: For the vector store.
- `mcp`: For the MCP server.
- `fastapi`: For the MCP server.
Node.js
The project relies on the `gemini-cli` package and its dependencies. See `gemini-cli/package.json` for more details.