
LangExtract MCP Server
A FastMCP server that enables AI assistants to extract structured information from unstructured text using Google's langextract library through a secure, optimized Model Context Protocol interface.
Tools
extract_from_text
Extracts structured information from unstructured text using Large Language Models, guided by user-defined instructions and examples, via Google's langextract library. Each extraction is mapped to its exact location in the source text for precise source grounding.
Args:
- text: The text to extract information from
- prompt_description: Clear instructions for what to extract
- examples: List of example extractions to guide the model
- config: Configuration parameters for the extraction
Returns: Dictionary containing extracted entities with source locations and metadata
Raises: ToolError if extraction fails due to invalid parameters or API issues
extract_from_url
Downloads text content from the specified URL and extracts structured information using Large Language Models. Ideal for processing web articles, documents, or any text accessible via HTTP/HTTPS.
Args:
- url: URL to download text from (must start with http:// or https://)
- prompt_description: Clear instructions for what to extract
- examples: List of example extractions to guide the model
- config: Configuration parameters for the extraction
Returns: Dictionary containing extracted entities with source locations and metadata
Raises: ToolError if the URL is invalid, the download fails, or extraction fails
save_extraction_results
Saves extraction results in JSONL (JSON Lines) format, a line-delimited format commonly used for structured data that can be reloaded for visualization or further processing.
Args:
- extraction_results: Results from extract_from_text or extract_from_url
- output_name: Name for the output file (without the .jsonl extension)
- output_dir: Directory to save the file (default: current directory)
Returns: Dictionary with the file path and save confirmation
Raises: ToolError if the save operation fails
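JSONL stores one JSON object per line, which makes results easy to append, stream, and reload. A minimal sketch of the round trip with Python's standard library (the records shown are illustrative, not the tool's exact output schema):

```python
import json
import tempfile
from pathlib import Path

# Illustrative records; real files come from save_extraction_results.
records = [
    {"extraction_class": "medication", "extraction_text": "amoxicillin"},
    {"extraction_class": "dosage", "extraction_text": "500mg"},
]

path = Path(tempfile.mkdtemp()) / "results.jsonl"

# Write: one JSON object per line.
with path.open("w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read back: parse each non-empty line independently.
loaded = [
    json.loads(line)
    for line in path.read_text(encoding="utf-8").splitlines()
    if line.strip()
]
assert loaded == records
```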
generate_visualization
Creates a self-contained, interactive HTML file that shows extracted entities highlighted in their original text context, with color coding and hover details; it can handle thousands of entities.
Args:
- jsonl_file_path: Path to the JSONL file containing extraction results
- output_html_path: Optional path for the HTML output (default: auto-generated)
Returns: Dictionary with the HTML file path and generation details
Raises: ToolError if visualization generation fails
list_supported_models
Lists all supported language models and their characteristics. The server currently supports Google Gemini models only, optimized for reliable structured extraction with schema constraints.
Returns: Dictionary containing model information and recommendations
get_server_info
Reports the server version, capabilities, and configuration.
Returns: Dictionary containing server information and capabilities
Overview
LangExtract is a Python library that uses LLMs to extract structured information from text documents while maintaining precise source grounding. This MCP server exposes langextract's capabilities through the Model Context Protocol with advanced performance optimizations and enterprise-grade security.
The server includes intelligent caching, persistent connections, and server-side credential management to provide optimal performance in long-running environments like Claude Code while maintaining complete security isolation.
Quick Setup for Claude Code
Prerequisites
- Claude Code installed and configured
- Google Gemini API key (Get one here)
- Python 3.10 or higher
Installation
Install directly into Claude Code using the built-in MCP management:
claude mcp add langextract-mcp -e LANGEXTRACT_API_KEY=your-gemini-api-key -- uv run --with fastmcp fastmcp run src/langextract_mcp/server.py
The server will automatically start and integrate with Claude Code. No additional configuration is required.
Verification
After installation, verify the integration by asking Claude Code:
Use the get_server_info tool to show the LangExtract server capabilities
You should see output indicating the server is running with optimization features enabled.
Available Tools
The server provides six MCP tools optimized for text extraction workflows:
Core Extraction
- extract_from_text - Extract structured information from provided text
- extract_from_url - Extract information from web content
- save_extraction_results - Save results to JSONL format
- generate_visualization - Create interactive HTML visualizations
Server Information
- list_supported_models - View available language models and recommendations
- get_server_info - Check server status and capabilities
Usage Examples
Basic Text Extraction
Ask Claude Code to extract information using natural language:
Extract medication information from this text: "Patient prescribed 500mg amoxicillin twice daily for infection"
Use these examples to guide the extraction:
- Text: "Take 250mg ibuprofen every 4 hours"
- Expected: medication=ibuprofen, dosage=250mg, frequency=every 4 hours
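Claude Code turns a request like the one above into a structured call to extract_from_text. A sketch of what the arguments might look like, using the parameter names documented for the tool; the exact shape of the examples field is an assumption, so treat it as illustrative:

```python
# Hypothetical payload for the extract_from_text tool. Parameter names
# follow the tool's documentation; the example structure is illustrative.
arguments = {
    "text": "Patient prescribed 500mg amoxicillin twice daily for infection",
    "prompt_description": "Extract medication name, dosage, and frequency",
    "examples": [
        {
            "text": "Take 250mg ibuprofen every 4 hours",
            "extractions": [
                {"extraction_class": "medication", "extraction_text": "ibuprofen"},
                {"extraction_class": "dosage", "extraction_text": "250mg"},
                {"extraction_class": "frequency", "extraction_text": "every 4 hours"},
            ],
        }
    ],
    "config": {"model_id": "gemini-2.5-flash"},
}

# A useful sanity check: every example snippet should actually contain
# the spans it labels, since extractions are grounded in their source text.
for example in arguments["examples"]:
    for ex in example["extractions"]:
        assert ex["extraction_text"] in example["text"]
```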
Advanced Configuration
For complex extractions, specify configuration parameters:
Extract character emotions from Shakespeare using:
- Model: gemini-2.5-pro for better literary analysis
- Multiple passes: 3 for comprehensive extraction
- Temperature: 0.2 for consistent results
URL Processing
Extract information directly from web content:
Extract key findings from this research paper: https://arxiv.org/abs/example
Focus on methodology, results, and conclusions
Supported Models
This server currently supports Google Gemini models only, optimized for reliable structured extraction with advanced schema constraints:
- gemini-2.5-flash - Recommended default: optimal balance of speed, cost, and quality
- gemini-2.5-pro - Best for complex reasoning and analysis tasks requiring highest accuracy
The server uses persistent connections, schema caching, and connection pooling for optimal performance with Gemini models. Support for additional providers may be added in future versions.
Configuration Reference
Environment Variables
Set during installation or in server environment:
LANGEXTRACT_API_KEY=your-gemini-api-key # Required
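Credentials stay server-side: the server reads the key from its own environment, so clients never pass API keys through the MCP interface. A minimal sketch of such a startup check (the function name and error message are illustrative, not the server's actual code):

```python
import os

def load_api_key() -> str:
    # Hypothetical startup check: read the Gemini key from the server's
    # environment and fail fast if it is missing.
    api_key = os.environ.get("LANGEXTRACT_API_KEY")
    if not api_key:
        raise RuntimeError("LANGEXTRACT_API_KEY is not set; see Installation above")
    return api_key
```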
Tool Parameters
Configure extraction behavior through tool parameters:
{
"model_id": "gemini-2.5-flash", # Language model selection
"max_char_buffer": 1000, # Text chunk size
"temperature": 0.5, # Sampling temperature (0.0-1.0)
"extraction_passes": 1, # Number of extraction attempts
"max_workers": 10 # Parallel processing threads
}
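A client can merge overrides with these defaults and sanity-check the documented ranges before calling a tool. A sketch under the assumption that the defaults and bounds above are authoritative (merge_config is a hypothetical helper, not part of the server API):

```python
# Defaults taken from the configuration reference above.
DEFAULTS = {
    "model_id": "gemini-2.5-flash",
    "max_char_buffer": 1000,
    "temperature": 0.5,
    "extraction_passes": 1,
    "max_workers": 10,
}

def merge_config(overrides: dict) -> dict:
    # Hypothetical client-side helper: start from the documented defaults,
    # apply user overrides, then enforce the documented parameter ranges.
    config = {**DEFAULTS, **overrides}
    if not 0.0 <= config["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if config["extraction_passes"] < 1 or config["max_workers"] < 1:
        raise ValueError("extraction_passes and max_workers must be at least 1")
    return config
```

For example, merge_config({"temperature": 0.2, "extraction_passes": 3}) keeps the default model while lowering the temperature and adding passes.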
Output Format
All extractions return consistent structured data:
{
"document_id": "doc_123",
"total_extractions": 5,
"extractions": [
{
"extraction_class": "medication",
"extraction_text": "amoxicillin",
"attributes": {"type": "antibiotic"},
"start_char": 25,
"end_char": 35
}
],
"metadata": {
"model_id": "gemini-2.5-flash",
"extraction_passes": 1,
"temperature": 0.5
}
}
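The start_char and end_char offsets let a client verify each extraction against the source document. In the sample output above they behave as inclusive offsets (25 through 35 covers the eleven characters of "amoxicillin"), which the sketch below assumes; confirm the convention against real output before relying on it:

```python
# Verify source grounding: an extraction's offsets should recover its text
# from the original document. end_char is treated as inclusive here, to
# match the sample output above.
text = "Patient prescribed 500mg amoxicillin twice daily for infection"

extraction = {
    "extraction_class": "medication",
    "extraction_text": "amoxicillin",
    "start_char": 25,
    "end_char": 35,
}

span = text[extraction["start_char"] : extraction["end_char"] + 1]
assert span == extraction["extraction_text"], (span, extraction)
```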
Use Cases
LangExtract MCP Server supports a wide range of use cases across multiple domains:
- Healthcare and life sciences: extract medications, dosages, and treatment protocols from clinical notes; structure radiology and pathology reports; process research papers and clinical trial data
- Legal and compliance: extract contract terms, parties, and obligations; analyze regulatory documents, compliance reports, and case law
- Research and academia: extract methodologies, findings, and citations from papers; analyze survey responses and interview transcripts; process historical and archival materials
- Business intelligence: extract insights from customer feedback and reviews; analyze news articles and market reports; process financial documents and earnings reports
Support and Documentation
Primary Resources:
- LangExtract Documentation - Core library reference
- FastMCP Documentation - MCP server framework
- Model Context Protocol - Protocol specification