LLM Graph Builder MCP
Enables Claude to automatically extract entities and relationships from URLs, PDFs, and YouTube videos to build structured knowledge graphs in Neo4j. It supports custom schemas, academic citation extraction, and community detection for advanced research and content analysis.
Build knowledge graphs from any URL using Claude Desktop and Neo4j.
What is this?
This Model Context Protocol (MCP) server enables Claude to automatically extract entities and relationships from unstructured text and build knowledge graphs in Neo4j. Simply give Claude a URL (Wikipedia article, PDF, web page, YouTube video) and ask it to build a knowledge graph - it handles the rest.
Perfect for: Research, Zotero integrations, academic papers, content analysis, and building structured knowledge from unstructured sources.
What's Included
This repository is a complete, ready-to-use package containing:
- llm_graph_builder_mcp/ - The MCP server code
- llm-graph-builder/ - Neo4j's LLM Graph Builder backend (June 24, 2025, commit 4d7bb5e8)
Both are included so you get a tested, working version out of the box. Just clone once and you're ready to go!
Why include the backend?
- Guaranteed compatibility - this MCP is tested with this exact backend version
- Zero configuration headaches - everything just works together
- If Neo4j updates their backend, you still have a working version
Features
- Multi-source support: Wikipedia, PDFs, web pages, YouTube videos
- Academic mode: Extract citations, authors, journals, and bibliographic data
- Custom schemas: Define allowed entity types and relationships
- Community detection: Find clusters and groups in your knowledge graph
- Zero modifications: Works with unmodified llm-graph-builder backend
- Local processing: Your data, your Neo4j instance, your control
Quick Start
Prerequisites
- Neo4j database - Get a free instance at Neo4j AuraDB
- Create an instance and note your connection URI, username, and password
- OpenAI API key - Get one here
- Python 3.10+ with uv - Install uv
- Claude Desktop - Download here
Step 1: Clone This Repository
# Clone the entire project (includes both MCP and backend)
git clone https://github.com/henrardo/llm-graph-builder-mcp.git
cd llm-graph-builder-mcp
Your directory structure will be:
llm-graph-builder-mcp/ # The MCP server
llm-graph-builder/ # The backend (included)
Step 2: Set Up the Backend
# Navigate to backend
cd llm-graph-builder/backend
# Create environment file
cp example.env .env
Edit .env with your credentials:
# Neo4j Connection (from your AuraDB instance)
NEO4J_URI=neo4j+s://your-instance-id.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-auradb-password
NEO4J_DATABASE=neo4j
# OpenAI Configuration
LLM_MODEL_CONFIG_openai_gpt_4.1=gpt-4-turbo-2024-04-09,sk-your-openai-api-key
Install and start the backend:
# Create virtual environment
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
uv pip install -r requirements.txt
# Start the backend server
uvicorn score:app --reload --port 8000
Keep this terminal running. The backend must be running for the MCP to work.
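Before moving on, it can help to confirm that something is actually listening on the backend port. The following is a minimal sketch, not part of this project: it only tests whether a TCP connection to the port succeeds, and it assumes the host `localhost` and port `8000` from the uvicorn command above. The function name `is_port_open` is made up for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Port 8000 matches the uvicorn command above.
    if is_port_open("localhost", 8000):
        print("Something is listening on localhost:8000")
    else:
        print("Nothing is listening on localhost:8000; is the backend running?")
```

An open port only shows that a process is listening; it does not prove that the process is the graph builder backend or that its Neo4j credentials are valid.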
Step 3: Install the MCP
Open a new terminal (keep the backend running in the first one) and start from the directory that contains your clone:
# Navigate back to the MCP directory
cd llm-graph-builder-mcp
# Install the MCP
uvx --from . llm-graph-builder-mcp
Step 4: Configure Claude Desktop
Edit your Claude Desktop config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add this configuration:
{
"mcpServers": {
"llm-graph-builder": {
"command": "uvx",
"args": [
"--from",
"/absolute/path/to/llm-graph-builder-mcp",
"llm-graph-builder-mcp"
],
"env": {
"NEO4J_URI": "neo4j+s://your-instance-id.databases.neo4j.io",
"NEO4J_USERNAME": "neo4j",
"NEO4J_PASSWORD": "your-auradb-password",
"NEO4J_DATABASE": "neo4j",
"GRAPH_BUILDER_URL": "http://localhost:8000"
}
}
}
}
Important:
- Replace /absolute/path/to/ with the full path to your llm-graph-builder-mcp directory
  - Run pwd in the llm-graph-builder-mcp directory to get this path
  - Example: /Users/yourname/projects/llm-graph-builder-mcp
- Use the same credentials as in your backend .env file
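The most common mistakes in this file are a leftover placeholder path, a relative or `~` path, and missing credentials. The sketch below checks for those three things in an already-parsed config. It is an illustration only: `check_config` and `REQUIRED_ENV` are hypothetical names, and the list of required keys is simply taken from the example configuration above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Keys copied from the example configuration above.
REQUIRED_ENV = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "GRAPH_BUILDER_URL",
)


def check_config(config: dict, server_name: str = "llm-graph-builder") -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no mcpServers entry named {server_name!r}"]

    problems = []
    args = server.get("args", [])
    # The project directory is the argument that follows "--from".
    if "--from" in args and args.index("--from") + 1 < len(args):
        path = args[args.index("--from") + 1]
        is_absolute = (
            PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()
        )
        if not is_absolute:
            problems.append(f"path is not absolute: {path}")
        elif "/absolute/path/to/" in path:
            problems.append("placeholder path was not replaced")
    else:
        problems.append('missing "--from <path>" in args')

    env = server.get("env", {})
    for key in REQUIRED_ENV:
        if not env.get(key):
            problems.append(f"missing env value: {key}")
    return problems
```

To use it, parse the config file with `json.loads` and pass the resulting dictionary to `check_config`. The sketch does not verify that the path exists or that the credentials work.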
Step 5: Restart Claude Desktop
Completely quit and restart Claude Desktop for the changes to take effect.
Step 6: Test It
In Claude Desktop, try:
Build a knowledge graph from this Wikipedia article:
https://en.wikipedia.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy
Claude should now use the MCP to build a knowledge graph in your Neo4j database!
Usage Examples
Basic Usage
Build a knowledge graph from this Wikipedia article:
https://en.wikipedia.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy
Academic Papers (with citations)
Build a knowledge graph from this PDF with bibliographic extraction:
https://example.com/research-paper.pdf
Custom Schema
Build a knowledge graph from this article with these entities:
- Nodes: Person, Organization, Location, Event
- Relationships: Person WORKS_FOR Organization, Person ATTENDED Event
https://example.com/article
With Community Detection
Build a knowledge graph from this page and enable community detection:
https://en.wikipedia.org/wiki/Renaissance
Querying the Graph
This MCP builds graphs. To query them, use the separate mcp-neo4j-cypher server.
After building a graph, ask Claude:
"What entities are connected to Arthur Dent?"
"Show me all the citations in my research papers"
"Find communities in the knowledge graph"
Tool Reference
build_knowledge_graph_from_url
Extracts entities and relationships from a URL and builds a knowledge graph.
Parameters:
- url (required): URL to process (Wikipedia, PDF, web page, YouTube)
- model (optional): LLM model to use (default: openai_gpt_4.1)
- allowed_nodes (optional): Comma-separated entity types (e.g., "Person,Organization,Location")
- allowed_relationships (optional): Relationship triples (e.g., "Person,WORKS_FOR,Organization")
- enable_communities (optional): Enable community detection (default: false)
- extract_bibliographic_info (optional): Extract academic citations and references (default: false)
Supported Sources
| Type | Example | Notes |
|---|---|---|
| Wikipedia | https://en.wikipedia.org/wiki/... | Any language supported |
| PDF URLs | https://example.com/paper.pdf | Full text extraction |
| Web pages | https://example.com/article | Any accessible page |
| YouTube | https://www.youtube.com/watch?v=... | Extracts from transcript |
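One way to picture how a URL maps onto the four source types in the table is a small classifier based on the host name and path. This is a sketch under assumptions, not the MCP's actual routing logic, which this README does not show. The function name `classify_source` and the returned labels are invented for the example.

```python
from urllib.parse import urlparse


def classify_source(url: str) -> str:
    """Guess which supported source type a URL belongs to."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()

    # Any language edition, e.g. en.wikipedia.org or de.wikipedia.org.
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return "wikipedia"
    if host in {"youtube.com", "youtu.be"} or host.endswith(".youtube.com"):
        return "youtube"
    # Only the path is inspected, so query strings do not interfere.
    if path.endswith(".pdf"):
        return "pdf"
    return "web"
```

A PDF served from a URL without a `.pdf` suffix would be classified as `web` here; catching that case would require inspecting the response's `Content-Type` header.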
Architecture
Claude Desktop
↓ MCP Protocol
llm-graph-builder-mcp (this repo)
↓ HTTP
llm-graph-builder backend (FastAPI)
↓ Cypher
Neo4j Database
↑ Cypher Queries
mcp-neo4j-cypher (separate MCP)
↑ MCP Protocol
Claude Desktop
Research & Zotero Integration
This MCP is perfect for academic research workflows:
- Export PDF URLs from your Zotero library
- Ask Claude to process them with bibliographic extraction
- Query relationships between papers, authors, and concepts
- Discover connections in your research
Example:
"Build knowledge graphs from these papers with bibliographic extraction:
- https://paper1.pdf
- https://paper2.pdf
- https://paper3.pdf
Then show me how they cite each other and what common themes they share."
Backend Version & Updates
This repository includes llm-graph-builder from June 24, 2025 (commit 4d7bb5e8). This version is tested and fully compatible with the MCP.
Using the Included Backend (Recommended)
The included backend is frozen at a known-good version. This ensures:
- Everything works out of the box
- No compatibility issues
- Predictable behavior
Using a Newer Backend Version
If you want to use the latest llm-graph-builder:
# Remove the included backend
rm -rf llm-graph-builder
# Clone the latest version
git clone https://github.com/neo4j-labs/llm-graph-builder.git
# Follow the same setup steps in Step 2
Note: Newer versions should work (the MCP uses standard endpoints), but haven't been tested. If you encounter issues, revert to the included version.
Troubleshooting
Backend won't start
cd llm-graph-builder/backend
source .venv/bin/activate
uvicorn score:app --reload --port 8000
Claude doesn't see the MCP
- Check config path is correct (use absolute path, not ~)
- Completely quit and restart Claude Desktop (not just close the window)
- Check Claude logs: ~/Library/Logs/Claude/mcp*.log (macOS)
- Verify the MCP path in config matches your actual directory
"Model not found" error
Make sure your backend .env has:
LLM_MODEL_CONFIG_openai_gpt_4.1=gpt-4-turbo-2024-04-09,YOUR-API-KEY
Backend shows "Connection refused"
- Ensure the backend is running on port 8000
- Check GRAPH_BUILDER_URL in Claude config matches the backend URL
- Backend must be running before you use the MCP
Empty graph / few entities
- Enable extract_bibliographic_info for academic papers
- Check OpenAI API key is valid and has credits
- Verify Neo4j connection in backend .env
- For PDFs: URL must be directly accessible (no authentication required)
Cache issues after updates
# Clear uvx cache
uv cache clean llm-graph-builder-mcp --force
# Completely quit and restart Claude Desktop
Development
# Install in development mode
git clone https://github.com/henrardo/llm-graph-builder-mcp.git
cd llm-graph-builder-mcp
uv pip install -e .
How It Works
PDF URLs
The MCP automatically detects PDF URLs, downloads them, and uploads to the backend for full-text extraction using PyMuPDF. No binary garbage, just clean text.
Academic Extraction
When extract_bibliographic_info=true, the MCP instructs the LLM to specifically extract:
- Authors, titles, journals, years, DOIs
- Citations and references
- Research concepts and methods
- Relationships: AUTHORED, CITES, PUBLISHED_IN, DISCUSSES
Schema Specification
Define allowed entities and relationships to guide extraction:
allowed_nodes: "Person,Organization,Product"
allowed_relationships: "Person,FOUNDED,Organization,Organization,PRODUCES,Product"
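The allowed_relationships value above appears to be a flat, comma-separated list read three items at a time as (source, TYPE, target). Assuming that reading, the sketch below splits such a string into triples and reports node types that are used in a relationship but not declared in allowed_nodes. The helper names `parse_relationships` and `undeclared_nodes` are hypothetical and not part of the MCP.

```python
def parse_relationships(spec: str) -> list[tuple[str, str, str]]:
    """Split "A,REL,B,C,REL2,D" into [(A, REL, B), (C, REL2, D)]."""
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if len(parts) % 3 != 0:
        raise ValueError(
            f"expected a multiple of 3 comma-separated items, got {len(parts)}"
        )
    return [(parts[i], parts[i + 1], parts[i + 2]) for i in range(0, len(parts), 3)]


def undeclared_nodes(allowed_nodes: str, triples: list[tuple[str, str, str]]) -> set[str]:
    """Return node types that appear in a triple but not in allowed_nodes."""
    declared = {n.strip() for n in allowed_nodes.split(",") if n.strip()}
    used = {source for source, _, _ in triples} | {target for _, _, target in triples}
    return used - declared


if __name__ == "__main__":
    triples = parse_relationships(
        "Person,FOUNDED,Organization,Organization,PRODUCES,Product"
    )
    print(triples)
    print(undeclared_nodes("Person,Organization,Product", triples))
```

Running a check like this before sending a schema catches a dropped item, which would otherwise shift every later triple by one position.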
Zero Backend Modifications
This MCP works with the unmodified llm-graph-builder backend. It uses compatibility tricks (like sending a space character for optional parameters) to work seamlessly with the original code.
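The space-character trick can be sketched as follows: any optional value the caller leaves unset is replaced by a single space before the request is built, so the backend still receives a non-empty field. This is an illustration under assumptions. The field names below reuse the tool's parameter names and may differ from what the backend actually expects, and `build_form_fields` is a hypothetical helper.

```python
from typing import Optional

# A single space, standing in for "no value" as described above.
PLACEHOLDER = " "


def build_form_fields(
    url: str,
    allowed_nodes: Optional[str] = None,
    allowed_relationships: Optional[str] = None,
) -> dict[str, str]:
    """Build request fields, substituting a space for unset optional values."""

    def or_placeholder(value: Optional[str]) -> str:
        # None and the empty string are both treated as "not provided".
        return value if value else PLACEHOLDER

    return {
        "url": url,
        "allowed_nodes": or_placeholder(allowed_nodes),
        "allowed_relationships": or_placeholder(allowed_relationships),
    }


if __name__ == "__main__":
    print(build_form_fields("https://example.com/article"))
```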
Security
Never commit:
- API keys (OpenAI, etc.)
- Database passwords
- Real Neo4j URIs
All credentials should be in .env files or Claude Desktop config (both gitignored).
License
Apache License 2.0 - see LICENSE file for details.
This project includes the Neo4j LLM Graph Builder, which is also licensed under Apache License 2.0.
Contributing
Contributions welcome! This project aims to be a clean wrapper with zero backend modifications required.
Open an issue or pull request on GitHub.
Credits
- Neo4j LLM Graph Builder by Neo4j, Inc.
- FastMCP by Marvin
- Model Context Protocol by Anthropic