
LlamaCloud MCP Server
A local MCP server that integrates with Claude Desktop, enabling RAG capabilities to provide Claude with up-to-date private information from custom LlamaCloud indices.
LlamaIndex MCP demos
This repo demonstrates both how to create an MCP server using LlamaCloud and how to use LlamaIndex as an MCP client.
LlamaCloud as an MCP server
To provide a local MCP server that can be used by a client like Claude Desktop, you can use mcp-server.py. You can use this to provide a tool that uses RAG to give Claude up-to-the-second private information for answering questions. You can provide as many of these tools as you want.
Set up your LlamaCloud index
- Get a LlamaCloud account
- Create a new index with any data source you want. In our case we used Google Drive and provided a subset of the LlamaIndex documentation as a source. You could also upload documents directly to the index if you just want to test it out.
- Get an API key from the LlamaCloud UI
Set up your MCP server
- Clone this repository
- Create a .env file and add two environment variables:
  - LLAMA_CLOUD_API_KEY: The API key you got in the previous step
  - OPENAI_API_KEY: An OpenAI API key. This is used to power the RAG query. You can use any other LLM if you don't want to use OpenAI.
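Before starting the server, it can help to confirm that both variables are actually set. The following is a minimal sketch and not part of the repository: the helper name `missing_env_vars` is hypothetical, and it assumes the values from `.env` have already been loaded into the process environment (for example by a dotenv loader or by your shell).

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current process; an empty list means both keys are present.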
Now let's look at the code. First you instantiate an MCP server:
from mcp.server.fastmcp import FastMCP

mcp = FastMCP('llama-index-server')
Then you define your tool using the @mcp.tool() decorator:
import os
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

@mcp.tool()
def llama_index_documentation(query: str) -> str:
    """Search the llama-index documentation for the given query."""
    index = LlamaCloudIndex(
        name="mcp-demo-2",
        project_name="Rando project",
        organization_id="e793a802-cb91-4e6a-bd49-61d0ba2ac5f9",
        api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
    )
    # Append extra instructions to the user's query before running RAG.
    response = index.as_query_engine().query(query + " Be verbose and include code examples.")
    return str(response)
Here our tool is called llama_index_documentation; it instantiates a LlamaCloud index called mcp-demo-2 and then uses it as a query engine to answer the query, including some extra instructions in the prompt. The steps for setting up your LlamaCloud index are in the "Set up your LlamaCloud index" section above.
Finally, you run the server:
if __name__ == "__main__":
    mcp.run(transport="stdio")
Note the stdio transport, which is used to communicate with Claude Desktop.
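With the stdio transport, the client launches the server as a subprocess and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin and stdout. The sketch below only illustrates the rough shape of a `tools/call` request for the tool defined above. The helper name `make_tool_call`, the request id, and the example query are illustrative; in practice Claude Desktop and the MCP SDK build and parse these messages for you.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the only raw newline
    # in the result is the message delimiter appended here.
    return json.dumps(message) + "\n"


# Hypothetical request for the tool defined in mcp-server.py.
line = make_tool_call(
    1, "llama_index_documentation", {"query": "How do I create an index?"}
)
```

This is why the server should not print anything else to stdout: stray output would be mixed into the message stream.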
Configure Claude Desktop
- Install Claude Desktop
- In the menu bar choose Claude -> Settings -> Developer -> Edit Config. This will open a config file that you can edit in your preferred text editor.
- You'll want your config to look something like this (make sure to replace $YOURPATH with the path to the repository):
{
  "mcpServers": {
    "llama_index_docs_server": {
      "command": "poetry",
      "args": [
        "--directory",
        "$YOURPATH/llamacloud-mcp",
        "run",
        "python",
        "$YOURPATH/llamacloud-mcp/mcp-server.py"
      ]
    }
  }
}
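If you would rather generate this config than hand-edit the paths, the structure can be built programmatically. This is a minimal sketch, not something the repository ships: the function name `build_claude_config` and the path `/path/to/llamacloud-mcp` are placeholders, and the result still needs to be pasted or merged into the config file that Claude Desktop opens.

```python
import json
from pathlib import Path


def build_claude_config(repo_path):
    """Return the Claude Desktop config dict for the given repository path."""
    repo = Path(repo_path)
    return {
        "mcpServers": {
            "llama_index_docs_server": {
                "command": "poetry",
                "args": [
                    "--directory",
                    str(repo),
                    "run",
                    "python",
                    str(repo / "mcp-server.py"),
                ],
            }
        }
    }


# Placeholder path for illustration; substitute your own clone location.
config = build_claude_config("/path/to/llamacloud-mcp")
config_json = json.dumps(config, indent=2)
```

Printing `config_json` gives text in the same shape as the config shown above, with the repository path filled in.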
Make sure to restart Claude Desktop after configuring the file.
Now you're ready to query! You should see a tool icon with your server listed underneath the query box in Claude Desktop.
LlamaIndex as an MCP client
LlamaIndex also has an MCP client integration, meaning you can turn any MCP server into a set of tools that can be used by an agent. You can see this in mcp-client.py, where we use the BasicMCPClient to connect to our local MCP server.
To keep the demo simple, we are using the same MCP server we just set up above. Ordinarily, you would not use MCP to connect LlamaCloud to a LlamaIndex agent; you would use QueryEngineTool and pass it directly to the agent.
Set up your MCP server
To provide a local MCP server that can be used by an HTTP client, we need to slightly modify mcp-server.py to use the run_sse_async method instead of run. You can find this in mcp-http-server.py.
mcp = FastMCP('llama-index-server', port=8000)
asyncio.run(mcp.run_sse_async())
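Because the client connects over HTTP, the server has to be running and listening before the client starts. A quick way to check this is to try opening a TCP connection to the port. The helper below is a hypothetical convenience, not part of the repository, and it only confirms that something is listening on the port, not that it is the MCP server.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. connection refused or
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the server above, `is_port_open("localhost", 8000)` should return True once mcp-http-server.py is running.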
Get your tools from the MCP server
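from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Connect to the SSE endpoint exposed by mcp-http-server.py.
mcp_client = BasicMCPClient("http://localhost:8000/sse")
mcp_tool_spec = McpToolSpec(
    client=mcp_client,
    # Optional: Filter the tools by name
    # allowed_tools=["tool1", "tool2"],
)
tools = mcp_tool_spec.to_tool_list()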
Create an agent and ask a question
import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
agent = FunctionAgent(
    tools=tools,
    llm=llm,
    system_prompt="You are an agent that knows how to build agents in LlamaIndex.",
)

async def run_agent():
    response = await agent.run("How do I instantiate an agent in LlamaIndex?")
    print(response)

if __name__ == "__main__":
    asyncio.run(run_agent())
You're all set! You can now use the agent to answer questions from your LlamaCloud index.