MCP-based Knowledge Graph Construction System

An automated system that processes raw text through a three-stage pipeline to assess data quality, enhance information, and generate structured knowledge graphs. It provides tools for triple extraction with confidence scoring and creates interactive HTML visualizations of the resulting graph.

A fully automated knowledge graph construction system built on the Model Context Protocol (MCP), implementing a sophisticated 3-stage data processing pipeline for intelligent knowledge extraction and graph generation.

Overview

This project implements an advanced knowledge graph construction system that automatically processes raw text data through three intelligent stages:

  1. Data Quality Assessment - Evaluates completeness, consistency, and relevance
  2. Knowledge Completion - Enhances low-quality data using LLM and external knowledge bases
  3. Knowledge Graph Construction - Builds structured knowledge graphs with confidence scoring

The system is built on the MCP (Model Context Protocol) architecture, providing a clean client-server interface for seamless integration and scalability.

Key Features

Fully Automated Processing

  • Zero Manual Intervention: Automatically detects data quality and processing needs
  • Intelligent Pipeline: Adapts processing strategy based on input data characteristics
  • Real-time Processing: Immediate knowledge graph generation from raw text

3-Stage Processing Pipeline

Stage 1: Data Quality Assessment

  • Completeness Analysis: Evaluates entity and relationship coverage
  • Consistency Checking: Detects semantic conflicts and contradictions
  • Relevance Scoring: Assesses information relevance and meaningfulness
  • Quality Threshold: Automatically determines if data needs enhancement

Stage 2: Knowledge Completion (for low-quality data)

  • Entity Enhancement: Completes missing entity information
  • Relationship Inference: Adds missing relationships between entities
  • Conflict Resolution: Corrects semantic inconsistencies
  • Format Normalization: Standardizes data format and structure
  • Implicit Knowledge Inference: Extracts hidden knowledge patterns

Stage 3: Knowledge Graph Construction

  • Rule-based Extraction: Fast, deterministic triple generation
  • LLM-enhanced Processing: Advanced semantic understanding and relationship inference
  • Confidence Scoring: Assigns reliability scores to extracted knowledge
  • Interactive Visualization: Generates interactive HTML visualizations of the graph
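
To make the rule-based path in Stage 3 more concrete, here is a minimal sketch that matches two surface patterns and attaches a fixed confidence to each rule. The patterns, the `Triple` type, the `extract_triples` name, and the confidence values are all assumptions chosen for illustration; they are not taken from kg_utils.py.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    confidence: float


# Hypothetical rules: (compiled pattern, predicate label, rule confidence).
# Subject and object may not cross a comma or a full stop.
RULES = [
    (re.compile(r"(?P<s>[^,,。.]+?)位于(?P<o>[^,,。.]+)"), "位于", 0.9),
    (re.compile(r"(?P<s>[^,,。.]+?)的CEO是(?P<o>[^,,。.]+)"), "CEO", 0.9),
]


def extract_triples(text: str) -> list[Triple]:
    """Apply every rule to the text and collect the matches as triples."""
    triples = []
    for pattern, predicate, confidence in RULES:
        for m in pattern.finditer(text):
            triples.append(
                Triple(m["s"].strip(), predicate, m["o"].strip(), confidence)
            )
    return triples


print(extract_triples("苹果公司的CEO是蒂姆·库克"))
```

A deterministic pass like this is cheap and predictable, which is presumably why it is paired with an LLM pass for relations that no fixed pattern covers.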

MCP Architecture

  • Client-Server Design: Clean separation of concerns
  • Standardized Protocol: Built on MCP for interoperability
  • Tool-based Interface: Modular, extensible tool system
  • Async Processing: High-performance asynchronous operations

Requirements

  • Python: 3.11 or higher
  • UV Package Manager: For dependency management
  • OpenAI-compatible API: For LLM integration (DeepSeek, OpenAI, etc.)

Quick Start

1. Clone and Setup

git clone https://github.com/turambar928/MCP_based_KG_construction.git
cd MCP_based_KG_construction

# Install dependencies
uv sync

2. Environment Configuration

Create a .env file with your API configuration:

OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.siliconflow.cn/v1  # or your preferred endpoint
OPENAI_MODEL=Qwen/QwQ-32B                      # or your preferred model

Supported API Providers:

  • OpenAI: https://api.openai.com/v1
  • DeepSeek: https://api.deepseek.com
  • SiliconFlow: https://api.siliconflow.cn/v1
  • Any OpenAI-compatible endpoint

3. Start the MCP Server

uv run kg_server.py

The server will start and listen for MCP client connections.

4. Running Tests

There are three ways to test the system:

a. Using MCP Inspector

npx -y @modelcontextprotocol/inspector uv run kg_server.py

After running this command, click the link that appears after "MCP Inspector is up and running at" to open the MCP Inspector in your browser. Once opened:

  1. Click "Connect"
  2. Select "Tools" from the top menu
  3. Choose "build_knowledge_graph" from the list of tools
  4. Enter your text in the left panel to generate the knowledge graph

b. Using Client Code

uv run kg_client.py

After the connection is successful, enter your text to view the results.

c. Using Mainstream MCP Tools (Cursor, Cherry Studio, etc.)

Example: Running in Cherry Studio

In settings, select MCP servers, click "Add Server" (import from JSON). Here's the configuration JSON (make sure to modify the local path):

{
  "mcpServers": {
    "kg_server": {
      "command": "uv",
      "args": [
        "--directory",
        "D:/mcp_getting_started",
        "run",
        "kg_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
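
Since the only machine-specific value in this configuration is the project directory, one option is to generate the JSON rather than edit it by hand. The sketch below is an illustration only: the `cherry_studio_config` helper is a hypothetical name, not part of this project.

```python
import json
from pathlib import Path


def cherry_studio_config(project_dir: str) -> str:
    """Return the MCP server JSON with the project directory filled in."""
    config = {
        "mcpServers": {
            "kg_server": {
                "command": "uv",
                # Forward slashes avoid backslash escaping in JSON strings.
                "args": [
                    "--directory",
                    Path(project_dir).as_posix(),
                    "run",
                    "kg_server.py",
                ],
                "env": {},
                "disabled": False,
                "autoApprove": [],
            }
        }
    }
    return json.dumps(config, indent=2)


print(cherry_studio_config("D:/mcp_getting_started"))
```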

After enabling this MCP server, you can use it in Cherry Studio.

Usage Guide

Interactive Client Commands

Once the client is running, you can use these commands:

# Build knowledge graph from text
build <your_text_here>

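# Example usage (the input means: "Peking University is a well-known institution of higher education in China, located in Haidian District, Beijing")
build 北京大学是中国著名的高等教育机构,位于北京市海淀区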

# Run demonstration examples
demo

# Exit the client
quit

Programmatic Usage

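import asyncio

from kg_client import KnowledgeGraphClient


async def main():
    client = KnowledgeGraphClient()
    await client.connect_to_server()
    try:
        # Build a knowledge graph; the input means "Apple's CEO is Tim Cook"
        result = await client.build_knowledge_graph(
            "苹果公司的CEO是蒂姆·库克",
            output_file="my_graph.html"
        )
        print(f"Generated graph: {result}")
    finally:
        # Always release the server connection, even if the build fails
        await client.cleanup()


if __name__ == "__main__":
    asyncio.run(main())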

Example Outputs

High-Quality Input

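Input: "北京大学是中国著名的高等教育机构,位于北京市海淀区。" ("Peking University is a well-known institution of higher education in China, located in Haidian District, Beijing.")
Processing: Direct Stage 3 (high quality detected)
Output:
- Entities: [北京大学 (Peking University), 中国 (China), 高等教育机构 (institution of higher education), 北京市 (Beijing), 海淀区 (Haidian District)]
- Triples: [(北京大学, 是 (is), 高等教育机构), (北京大学, 位于 (located in), 海淀区), ...]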
- Visualization: Interactive HTML graph

Low-Quality Input (Incomplete)

Input: "李华去巴黎" ("Li Hua goes to Paris")
Processing:
- Stage 1: Detects incomplete information
- Stage 2: Enhances with "巴黎位于法国" ("Paris is located in France"), "李华是人" ("Li Hua is a person")
- Stage 3: Builds enhanced knowledge graph
Output: Enriched knowledge graph with inferred relationships

Low-Quality Input (Conflicting)

Input: "巴黎市是德国城市。" ("Paris is a German city.")
Processing:
- Stage 1: Detects semantic conflict
- Stage 2: Corrects to "巴黎是法国城市" ("Paris is a French city")
- Stage 3: Builds corrected knowledge graph
Output: Corrected and enhanced knowledge graph

MCP Tools API

The system exposes the following MCP tools for integration:

build_knowledge_graph

Description: Complete pipeline for knowledge graph construction with automatic quality assessment and enhancement.

Parameters:

  • text (string): Input text to process
  • output_file (string, optional): HTML visualization output filename (default: "knowledge_graph.html")

Returns: JSON object containing:

  • success (boolean): Processing success status
  • entities (array): Extracted entities
  • triples (array): Generated knowledge triples
  • confidence_scores (array): Confidence scores for each triple
  • visualization_file (string): Path to generated HTML visualization
  • processing_stages (object): Details of each processing stage

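Example (abbreviated; each triple carries its own confidence value here, and confidence_scores and processing_stages are not shown):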

{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {
      "subject": "北京大学",
      "predicate": "是",
      "object": "高等教育机构",
      "confidence": 0.95
    }
  ],
  "visualization_file": "knowledge_graph.html"
}
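
A client will usually want to check the success flag and keep only the triples it trusts. The following is a minimal sketch of that step, using the example response above as sample data; the `confident_triples` helper and the 0.8 cutoff are assumptions for illustration, not part of the tool's API.

```python
import json

# Sample payload copied from the example response.
RESPONSE = """
{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {"subject": "北京大学", "predicate": "是",
     "object": "高等教育机构", "confidence": 0.95}
  ],
  "visualization_file": "knowledge_graph.html"
}
"""


def confident_triples(
    raw: str, min_confidence: float = 0.8
) -> list[tuple[str, str, str]]:
    """Parse a tool response and keep triples at or above the cutoff."""
    data = json.loads(raw)
    if not data.get("success"):
        raise ValueError("knowledge graph construction failed")
    return [
        (t["subject"], t["predicate"], t["object"])
        for t in data.get("triples", [])
        if t.get("confidence", 0.0) >= min_confidence
    ]


print(confident_triples(RESPONSE))
```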

Project Structure

├── kg_server.py              # Main MCP server implementation
├── kg_client.py              # Interactive client for testing
├── kg_utils.py               # Core knowledge graph construction utilities
├── kg_visualizer.py          # HTML visualization generator
├── data_quality.py           # Stage 1: Data quality assessment
├── knowledge_completion.py   # Stage 2: Knowledge completion and enhancement
├── pyproject.toml            # Project dependencies and configuration
├── .env                      # Environment variables (API keys)
└── README.md                 # This file

Core Components

  • kg_server.py: MCP server that orchestrates the 3-stage pipeline
  • kg_client.py: Command-line client for interactive testing and batch processing
  • kg_utils.py: Knowledge graph construction engine with rule-based and LLM-enhanced extraction
  • kg_visualizer.py: Generates interactive HTML visualizations using Plotly
  • data_quality.py: Implements quality assessment algorithms for completeness, consistency, and relevance
  • knowledge_completion.py: Handles knowledge enhancement and conflict resolution

Advanced Features

Quality Assessment Metrics

  • Completeness Score: Based on entity coverage and relationship density
  • Consistency Score: Detects semantic conflicts and contradictions
  • Relevance Score: Evaluates information meaningfulness
  • Composite Quality Score: Weighted combination of all metrics
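
One plausible reading of the composite score is a weighted sum that is compared against the quality threshold. The sketch below assumes this; the weights, the `QualityScores` type, and the function names are hypothetical, and only the 0.5 default mirrors the QUALITY_THRESHOLD setting documented in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScores:
    completeness: float
    consistency: float
    relevance: float


# Assumed weights; they sum to 1 so the composite stays in [0, 1].
WEIGHTS = {"completeness": 0.4, "consistency": 0.4, "relevance": 0.2}


def composite_quality(scores: QualityScores) -> float:
    """Weighted combination of the three per-dimension scores."""
    return (
        WEIGHTS["completeness"] * scores.completeness
        + WEIGHTS["consistency"] * scores.consistency
        + WEIGHTS["relevance"] * scores.relevance
    )


def needs_enhancement(scores: QualityScores, threshold: float = 0.5) -> bool:
    """Return True when the input should go through Stage 2."""
    return composite_quality(scores) < threshold


print(needs_enhancement(QualityScores(0.2, 0.3, 0.9)))
```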

Knowledge Enhancement Strategies

  • Entity Completion: Adds missing entity attributes and types
  • Relationship Inference: Discovers implicit relationships
  • Conflict Resolution: Corrects factual inconsistencies
  • Format Normalization: Standardizes entity and relationship representations

Visualization Features

  • Interactive Network Graph: Clickable nodes and edges
  • Entity Clustering: Groups related entities by type
  • Confidence Visualization: Color-coded confidence levels
  • Export Options: HTML, PNG, SVG formats
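
Color-coding by confidence can be as simple as interpolating between two colors. The function below is an illustrative sketch, not the mapping kg_visualizer.py uses: the red-to-green scale and the `confidence_color` name are assumptions.

```python
def confidence_color(confidence: float) -> str:
    """Map a confidence in [0, 1] to a hex color from red (low) to green (high)."""
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range values
    red = round(255 * (1.0 - c))
    green = round(255 * c)
    return f"#{red:02x}{green:02x}00"


for value in (0.0, 0.5, 0.95, 1.0):
    print(value, confidence_color(value))
```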

Technical Details

Processing Pipeline

  1. Input Validation: Checks text format and encoding
  2. Quality Assessment: Multi-dimensional quality scoring
  3. Conditional Enhancement: Applies enhancement only when needed
  4. Graph Construction: Rule-based + LLM hybrid approach
  5. Confidence Calculation: Bayesian confidence scoring
  6. Visualization Generation: Interactive HTML output
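
The control flow of these steps, in particular the conditional enhancement, can be sketched as follows. The stage functions are passed in as plain callables so the example stays self-contained; `run_pipeline` and its parameter names are hypothetical, and the real orchestration lives in kg_server.py.

```python
from typing import Callable


def run_pipeline(
    text: str,
    assess: Callable[[str], float],
    enhance: Callable[[str], str],
    build: Callable[[str], dict],
    threshold: float = 0.5,
) -> dict:
    """Run assessment, optional enhancement, and graph construction."""
    if not text or not text.strip():
        raise ValueError("input text is empty")
    stages = ["quality_assessment"]
    score = assess(text)
    if score < threshold:
        # Stage 2 runs only for low-quality input.
        text = enhance(text)
        stages.append("knowledge_completion")
    graph = build(text)
    stages.append("graph_construction")
    return {"quality_score": score, "stages": stages, "graph": graph}


# Stub stages standing in for the real implementations.
result = run_pipeline(
    "李华去巴黎",
    assess=lambda t: 0.3,
    enhance=lambda t: t + "。巴黎位于法国。",
    build=lambda t: {"text": t},
)
print(result["stages"])
```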

Performance Characteristics

  • Processing Speed: ~1-3 seconds per text input
  • Memory Usage: ~50-100MB for typical workloads
  • Scalability: Async architecture supports concurrent processing
  • Accuracy: 85-95% entity extraction, 80-90% relationship accuracy
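
Concurrent processing with an async architecture typically means running several requests at once while capping how many are in flight. The sketch below shows that pattern with asyncio; `build_graph` is a stand-in for a real call to the build_knowledge_graph tool, and `build_many` and the concurrency limit are assumptions for illustration.

```python
import asyncio


async def build_graph(text: str) -> dict:
    """Stand-in for a call to the build_knowledge_graph tool."""
    await asyncio.sleep(0.01)  # simulates I/O such as an LLM request
    return {"text": text, "success": True}


async def build_many(texts: list[str], max_concurrency: int = 4) -> list[dict]:
    """Process texts concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> dict:
        async with semaphore:
            return await build_graph(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(t) for t in texts))


results = asyncio.run(build_many(["巴黎位于法国", "李华去巴黎"]))
print(results)
```

Capping concurrency matters in practice because most LLM endpoints enforce rate limits.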

Development

Running Tests

Refer to the "Running Tests" section above for three different testing methods:

  • MCP Inspector (recommended for visual testing)
  • Client code (for programmatic testing)
  • Mainstream MCP tools (for integration testing)

# Quick test with demonstration examples
uv run kg_client.py
# Then type: demo

# Test with custom input
uv run kg_client.py "Your test text here"

Adding New Features

  1. Custom Quality Metrics: Extend data_quality.py
  2. New Enhancement Strategies: Modify knowledge_completion.py
  3. Additional Visualization: Enhance kg_visualizer.py
  4. New MCP Tools: Add tools to kg_server.py

Configuration Options

Environment variables in .env:

# Required
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_api_endpoint
OPENAI_MODEL=your_model_name

# Optional
QUALITY_THRESHOLD=0.5          # Quality threshold for enhancement
MAX_ENTITIES=50                # Maximum entities per graph
VISUALIZATION_WIDTH=1200       # HTML visualization width
VISUALIZATION_HEIGHT=800       # HTML visualization height
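
As a sketch of how these settings might be read in Python, the snippet below validates the required variables and applies the documented defaults to the optional ones. It assumes the variables are already in the process environment (for example, loaded from .env beforehand); the `Settings` class and `load_settings` function are hypothetical names, not the project's own code.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    quality_threshold: float = 0.5
    max_entities: int = 50
    visualization_width: int = 1200
    visualization_height: int = 800


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        api_key=env["OPENAI_API_KEY"],
        base_url=env["OPENAI_BASE_URL"],
        model=env["OPENAI_MODEL"],
        quality_threshold=float(env.get("QUALITY_THRESHOLD", "0.5")),
        max_entities=int(env.get("MAX_ENTITIES", "50")),
        visualization_width=int(env.get("VISUALIZATION_WIDTH", "1200")),
        visualization_height=int(env.get("VISUALIZATION_HEIGHT", "800")),
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests.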

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Submit a pull request with detailed description

Troubleshooting

Common Issues

  1. Port Already in Use (the commands below are for Windows)

    # Find the process using the port
    netstat -ano | findstr :6277
    # Kill the process
    taskkill /PID <process_id> /F

  2. Insufficient API Balance

    • Check the API configuration in the .env file
    • Ensure the API account has sufficient balance

  3. Dependency Installation Issues

    uv sync --reinstall

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions, issues, or contributions:

  • 📧 Email: tzf9282003@163.com
  • 🐛 Issues: GitHub Issues
  • 📖 Documentation: See KNOWLEDGE_GRAPH_README.md for detailed technical documentation
