Tagging MCP
MCP server for tagging CSV rows using polar_llama with parallel LLM inference.
Overview
This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
Features
- Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
- Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, and Groq
- Structured Output: Uses Pydantic models for consistent, type-safe results
- Flexible Taxonomy: Define custom tag lists for your use case
- Optional Reasoning: Include confidence levels and explanations for tags
Installation
Prerequisites
- Python 3.12+
- UV package manager
- API key for at least one LLM provider
Environment Setup
- Clone this repository
- Create a .env file with your API keys:

  ANTHROPIC_API_KEY=your_key_here
  OPENAI_API_KEY=your_key_here
  GEMINI_API_KEY=your_key_here
  GROQ_API_KEY=your_key_here
Claude Desktop Configuration
Option 1: Local Development (Recommended)
Run directly without containers:
{
"mcpServers": {
"tagging-mcp": {
"command": "uv",
"args": ["run", "fastmcp", "run", "/path/to/tagging_mcp/tagging.py"]
}
}
}
Option 2: Container Deployment
- Build the container:

  container build -t tagging_mcp .

- Configure Claude Desktop:

  {
    "mcpServers": {
      "tagging-mcp": {
        "command": "container",
        "args": ["run", "--interactive", "tagging_mcp"]
      }
    }
  }
Available Tools
tag_csv
Simple tagging with a list of categories. Perfect for basic classification tasks.
Parameters:
- csv_path (str): Path to the CSV file to tag
- taxonomy (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])
- text_column (str, optional): Column containing text to analyze (default: "text")
- provider (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
- model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
- api_key (str, optional): API key if not set via environment variable
- output_path (str, optional): Path to save tagged CSV
- include_reasoning (bool, optional): Include detailed reasoning and reflection (default: false)
- field_name (str, optional): Name for the classification field (default: "category")
Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
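For reference, a basic tag_csv invocation might pass arguments like the following (a sketch only; the file paths and column name are placeholders):

{
  "csv_path": "/path/to/data.csv",
  "taxonomy": ["technology", "business", "science"],
  "text_column": "description",
  "provider": "groq",
  "model": "llama-3.3-70b-versatile",
  "output_path": "/path/to/tagged_output.csv",
  "include_reasoning": false,
  "field_name": "category"
}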
tag_csv_advanced
Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.
Parameters:
- csv_path (str): Path to the CSV file to tag
- taxonomy (Dict): Full taxonomy dictionary with field definitions and value descriptions
- text_column (str, optional): Column containing text to analyze (default: "text")
- provider (str, optional): LLM provider (default: "groq")
- model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
- api_key (str, optional): API key if not set via environment variable
- output_path (str, optional): Path to save tagged CSV
- include_reasoning (bool, optional): Include detailed reasoning (default: false)
Example Taxonomy:
{
"sentiment": {
"description": "The emotional tone of the text",
"values": {
"positive": "Text expresses positive emotions or favorable opinions",
"negative": "Text expresses negative emotions or unfavorable opinions",
"neutral": "Text is factual and objective"
}
},
"urgency": {
"description": "How urgent the content is",
"values": {
"high": "Requires immediate attention",
"medium": "Should be addressed soon",
"low": "Can be addressed at any time"
}
}
}
Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
preview_csv
Preview the first few rows of a CSV file to understand its structure.
Parameters:
- csv_path (str): Path to the CSV file
- rows (int, optional): Number of rows to preview (default: 5)
Returns: Dictionary with columns, row count, and preview data
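The exact keys depend on the implementation, but a hypothetical preview_csv response for a small file might look like:

{
  "columns": ["id", "description", "date"],
  "row_count": 1250,
  "preview": [
    {"id": 1, "description": "Quarterly earnings beat expectations", "date": "2024-01-15"},
    {"id": 2, "description": "New open-source database released", "date": "2024-01-16"}
  ]
}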
get_tagging_info
Get information about the tagging MCP server and supported providers.
Returns: Server metadata, supported providers, features, and available tools
Example Usage
Basic Tagging
- Preview your CSV:

  Use preview_csv with csv_path="/path/to/data.csv"

- Simple category tagging:

  Use tag_csv with:
  - csv_path="/path/to/data.csv"
  - taxonomy=["technology", "business", "science", "politics"]
  - text_column="description"
  - output_path="/path/to/tagged_output.csv"

- Include reasoning for transparency (see the argument sketch after this list):

  Use tag_csv with:
  - csv_path="/path/to/data.csv"
  - taxonomy=["urgent", "normal", "low_priority"]
  - field_name="priority"
  - include_reasoning=true
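Expressed as raw tool arguments, that last call might look roughly like this (a sketch; the file path is a placeholder):

{
  "csv_path": "/path/to/data.csv",
  "taxonomy": ["urgent", "normal", "low_priority"],
  "field_name": "priority",
  "include_reasoning": true
}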
Advanced Multi-Field Tagging
For complex classification with multiple dimensions:
Use tag_csv_advanced with:
- csv_path="/path/to/support_tickets.csv"
- taxonomy={
"department": {
"description": "Which department should handle this",
"values": {
"sales": "Product inquiries and purchases",
"support": "Technical issues and bugs",
"billing": "Payment and account questions"
}
},
"priority": {
"description": "How urgent this is",
"values": {
"urgent": "Service down or critical issue",
"high": "Significant problem",
"normal": "Standard request"
}
}
}
- text_column="ticket_description"
- output_path="/path/to/classified_tickets.csv"
Output Structure
Basic Tagging Output
- Original CSV columns
- {field_name}: The selected tag
- confidence: Confidence score (0.0 to 1.0)
- thinking: Reasoning for each possible value (if include_reasoning=true)
- reflection: Overall analysis (if include_reasoning=true)
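As an illustration, one tagged row (with the default field_name of "category" and reasoning enabled) might come back as something like:

{
  "text": "New quantum chip announced by research lab",
  "category": "technology",
  "confidence": 0.92,
  "thinking": "...reasoning for each candidate tag...",
  "reflection": "...overall analysis of the text..."
}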
Advanced Tagging Output
- Original CSV columns
- For each taxonomy field:
  - {field_name}: Selected value
  - {field_name}_confidence: Confidence score
  - {field_name}_thinking: Reasoning dict (if enabled)
  - {field_name}_reflection: Analysis (if enabled)
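Using the sentiment/urgency taxonomy shown earlier, a single classified row might look roughly like this (values invented for illustration, reasoning columns omitted):

{
  "text": "The outage is still unresolved and customers are upset",
  "sentiment": "negative",
  "sentiment_confidence": 0.95,
  "urgency": "high",
  "urgency_confidence": 0.88
}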
Supported LLM Providers
- Groq (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768
- Claude (Anthropic): claude-3-5-sonnet-20241022, claude-3-opus-20240229
- OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
- Gemini: gemini-1.5-pro, gemini-1.5-flash
- AWS Bedrock: anthropic.claude-3-sonnet, anthropic.claude-3-haiku
Key Features
- ✨ Detailed Reasoning: For each tag, see why the model chose it
- 🔍 Reflection: Model reflects on its analysis
- 📊 Confidence Scores: Know how confident each classification is (0.0-1.0)
- ⚡ Parallel Processing: All rows processed concurrently
- 🎯 Error Detection: Automatic error tracking and reporting
- 🔧 Flexible: Simple list or complex multi-field taxonomies
License
MIT