SegmentMCP
Transforms natural language queries into SQL for customer segmentation, enabling democratized access to customer data insights.
README
SegmentMCP - AI-Powered Customer Segmentation Server
π― Overview
SegmentMCP is an intelligent Model Context Protocol (MCP) server that transforms natural language queries into actionable customer segments. It bridges the gap between business stakeholders who think in plain English and technical systems that require structured SQL queries, enabling democratized access to customer data insights.
π Problem Statement
The Challenge
Modern businesses struggle with customer segmentation due to several key barriers:
- Technical Complexity: Marketing teams need SQL knowledge to create customer segments
- Time-to-Insight: Manual query writing and validation takes hours or days
- Error-Prone Process: Hand-written SQL queries often contain syntax errors or logical mistakes
- Limited Accessibility: Only technical users can create and modify customer segments
- Inconsistent Results: Different team members create different queries for similar business requirements
The Solution
SegmentMCP eliminates these barriers by providing:
- Natural Language Interface: "Find married customers over 30 with housing loans"
- Automated SQL Generation: AI-powered query creation with optimization
- Built-in Validation: Automatic query testing and error detection
- Processing Transparency: Complete breakdown of all processing steps in response
- Integration Ready Architecture: Framework for connecting to downstream systems
ποΈ Architecture
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β MCP Client βββββΆβ MCP Server βββββΆβ Kaggle Dataset β
β (Claude/Custom) β β (FastMCP) β β (45K+ records) β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β
βΌ
ββββββββββββββββββββ
β Agent Pipeline β
β β
β 1. Intent Parser β β GPT-4.1
β 2. Data Mapper β β Rule-based
β 3. Query Gen β β GPT-4.1
β 4. Validator β β Rule-based
β 5. Activator β β Simulation
ββββββββββββββββββββ
Agent-Based Processing Pipeline
- Intent Parser Agent - Converts natural language to structured criteria using GPT-4.1
- Data Mapper Agent - Maps business terms to database schema fields
- Query Generator Agent - Creates optimized SQL queries with AI assistance
- Validation Agent - Tests queries for syntax, performance, and safety
- Activation Agent - Executes segments and integrates with downstream systems
π Project Structure
SegmentMCP/
βββ agents/ # AI processing agents
β βββ intent_parser.py # Natural language β criteria
β βββ data_mapper.py # Business terms β DB fields
β βββ query_generator.py # Criteria β SQL
β βββ validation_agent.py # SQL validation & testing
β βββ activation_agent.py # Segment execution
βββ database/
β βββ kaggle_connector.py # Dataset management
βββ models/
β βββ schemas.py # Pydantic data models
βββ data/ # Dataset storage
β βββ bank-full.csv # Bank customer dataset
β βββ bank_deposit.db # SQLite database
βββ main.py # Core MCP server
βββ demo_server.py # Demo mode (no OpenAI)
βββ http_server.py # Direct HTTP API
βββ http_wrapper.py # MCP protocol wrapper
βββ demo_http_wrapper.py # Demo HTTP wrapper
βββ config.py # Configuration management
βββ generate_claude_config.py # Claude Desktop setup
βββ validate_schemas.py # Schema validation
βββ validate_config_usage.py # Config usage checker
βββ test_config_integration.py # Config testing
βββ requirements.txt # Dependencies
π Dataset Information
Bank Customer Dataset
- Source: Kaggle "Bank Term Deposit Subscription" dataset
- Records: 45,211 bank customers
- Columns: 17 attributes including demographics, financial, and campaign data
- Format: CSV with semicolon delimiter
- Storage: Automatically converted to SQLite for querying
Key Data Fields
- Demographics: age, job, marital status, education
- Financial: balance, housing loan, personal loan, default status
- Campaign: contact method, duration, campaign number, previous contacts
- Target: term deposit subscription (yes/no)
Data Processing
- Automatic CSV detection and loading
- SQLite conversion for efficient querying
- Schema introspection and validation
- Sample data generation for testing
Current Data Source Support
- Primary: Single Kaggle dataset connector
- Architecture: Extensible connector pattern ready for additional sources
- Storage: Local SQLite database with CSV import
- Future: Framework supports PostgreSQL, MySQL, BigQuery connectors
β¨ Features
π§ Intelligent Query Processing
- Natural Language Understanding: Parse complex business requirements using GPT-4.1
- Context-Aware Mapping: Automatically map business terms to database fields
- Query Optimization: Generate efficient SQL with automatic LIMIT clauses and optimization
- Error Prevention: Built-in validation prevents dangerous operations (DELETE, UPDATE, DROP)
π Safety & Validation
- Operation Restrictions: Blocks DELETE, UPDATE, DROP operations
- Performance Limits: Automatic LIMIT clauses and row count warnings
- Syntax Validation: Pre-execution query testing
- Input Validation: Query structure and content validation
- Read-Only Access: Database operations limited to SELECT statements
π Integration Ready
- MCP Protocol: Native support for AI assistant integration
- REST API: HTTP endpoints for web applications
- Extensible Architecture: Framework ready for multiple data sources
- Simulated Downstream Activation: Returns integration targets for CRM, email, analytics systems
Note: Current version simulates downstream integrations. Real API connections require additional development.
π Comprehensive Results
- Sample Data Preview: See actual customer records before activation
- Processing Transparency: Complete breakdown of all agent processing steps with timestamps
- Performance Metrics: Processing time tracking, query execution time, and row count estimates
- Confidence Scoring: Basic confidence reporting and ambiguous term detection
- Validation Results: Detailed validation reports with warnings and error detection
- Schema Information: Complete database schema with sample values and data types
π οΈ Implementation Use Cases
1. Marketing Campaign Management
{
"query": "High-value customers who haven't been contacted in 6 months",
"use_case": "Re-engagement campaign targeting",
"output": "Segment for email marketing platform"
}
2. Risk Assessment
{
"query": "Customers with loans but negative balance trends",
"use_case": "Credit risk monitoring",
"output": "Alert list for risk management team"
}
3. Product Recommendations
{
"query": "Young professionals without housing loans",
"use_case": "Mortgage product targeting",
"output": "Prospect list for sales team"
}
4. Customer Success
{
"query": "Long-term customers with declining engagement",
"use_case": "Churn prevention",
"output": "Priority list for customer success managers"
}
5. Compliance Reporting
{
"query": "All customers contacted more than regulatory limit",
"use_case": "Compliance monitoring",
"output": "Audit report for regulatory team"
}
π Output Utilization
Integration Architecture (Framework Ready)
The system provides a foundation for integrating with downstream systems:
Current Implementation
- Simulated Activations: Returns list of target systems for segments (simulation only)
- Processing Results: Complete customer data and SQL queries for export
- Segment Storage: In-memory segment management with unique IDs
- API Structure: Framework ready for webhook and API integrations
Integration Framework (Not Yet Implemented)
- CRM Systems: Architecture supports Salesforce, HubSpot, Pipedrive integration
- Email Platforms: Framework ready for Mailchimp, SendGrid connections
- Ad Platforms: Structure prepared for Facebook, Google, LinkedIn APIs
- Analytics Tools: Design supports Tableau, Power BI data export
Note: Current version provides the framework and simulated responses. Real API integrations require additional development work.
Business Process Integration
Marketing Workflows
Natural Language Query β Segment Creation β Campaign Launch β Performance Tracking
Sales Processes
Lead Qualification β Segment Assignment β Automated Outreach β Conversion Tracking
Customer Success
Health Score Monitoring β Risk Segment Identification β Intervention Campaigns β Retention Metrics
π Getting Started
Prerequisites
- Python 3.8+
- OpenAI API Key
- Kaggle API credentials (optional, for dataset access)
Installation
- Clone the repository
git clone https://github.com/tejasayya/SegmentMCP.git
cd SegmentMCP
- Install dependencies
pip install -r requirements.txt
- Configure environment
# Create .env file
OPENAI_API_KEY=your_openai_api_key_here
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_KEY=your_kaggle_key
- Generate portable Claude Desktop config
python generate_claude_config.py
This creates claude_mcp_config_generated.json with correct paths for your system.
What this does:
- Generates cross-platform Claude Desktop configuration
- Auto-detects project paths and data directories
- Validates configuration and reports issues
- Creates portable config that works on any system
- Validate schemas and configuration (optional)
python validate_schemas.py # Validate data schemas
python validate_config_usage.py # Check config usage
python test_config_integration.py # Test config integration
These validate schemas, check configuration usage, and test integration.
π§ͺ Validation & Testing Tools
The project includes comprehensive validation and testing infrastructure:
Schema Validation
python validate_schemas.py
- Validates all Pydantic schemas
- Tests error cases and edge conditions
- Generates schema documentation
- Ensures data model consistency
Configuration Validation
python validate_config_usage.py
- Checks all config values are actually used
- Identifies unused configuration
- Validates config value ranges
- Tests environment variable overrides
Integration Testing
python test_config_integration.py
- Tests agent configuration loading
- Validates config integration across components
- Checks environment variable support
- Tests configuration validation logic
Direct Testing
python test_demo_direct.py # Test demo server directly
python test_http_requests.py # Test HTTP endpoints
python test_mcp_client.py # Test MCP protocol
- Choose your server mode
Note on OpenAI Version: If you encounter OpenAI compatibility issues, you may need to upgrade:
pip install openai>=2.0.0 # Upgrade from 1.35.15 if needed
π― Server Options
Option 1: Full MCP Server (Production)
python main.py
- β Complete AI-powered natural language processing
- β Requires OpenAI API key
- β For Claude Desktop integration
- β Full agent pipeline with GPT
Option 2: Demo Mode (No OpenAI Required)
python demo_server.py
- β Rule-based query parsing (no AI)
- β Works without OpenAI API key
- β Good for testing and development
- β Limited to predefined patterns
Option 3: HTTP Testing Interfaces
For Postman/HTTP API testing, choose one approach:
A) Direct HTTP Server (Recommended for development)
python http_server.py
# Server runs on http://localhost:8001
- Pros: Fast, reliable, easy debugging, direct method calls
- Cons: Bypasses MCP protocol validation
- Use for: Daily development, Postman testing, rapid iteration
B) MCP Protocol Wrapper (Protocol validation)
python http_wrapper.py
# Server runs on http://localhost:8001
- Pros: Tests actual MCP implementation, protocol-compliant, validates MCP server
- Cons: More complex, subprocess overhead, harder debugging
- Use for: Validating MCP server works correctly, protocol testing
C) Demo HTTP Wrapper (No OpenAI)
python demo_http_wrapper.py
# Server runs on http://localhost:8002
- Pros: Works without OpenAI API, good for basic testing, no API costs
- Cons: Limited to rule-based parsing, no AI capabilities
- Use for: Testing without API costs, basic functionality validation
π€ Which Server Should You Use?
| Use Case | Recommended Server | Why |
|---|---|---|
| Claude Desktop Integration | main.py |
Full MCP protocol with AI |
| Development/Testing | http_server.py |
Fast HTTP testing with Postman |
| MCP Protocol Validation | http_wrapper.py |
Ensures MCP server works correctly |
| No OpenAI API Key | demo_server.py or demo_http_wrapper.py |
Works without API costs |
| Production Deployment | main.py |
Complete feature set |
- Test the API
curl -X POST "http://localhost:8001/create-segment" \
-H "Content-Type: application/json" \
-d '{"query": "Married customers with age over 30"}'
MCP Integration
For AI assistant integration, run the MCP server:
python main.py
π§ Architecture Decisions
Why Multiple Server Files?
This project provides multiple ways to run the server to address different development and deployment needs:
Core MCP Server (main.py)
- Purpose: Production MCP server for Claude Desktop
- Features: Full AI pipeline with OpenAI integration
- Protocol: Pure MCP via stdio
Demo Version (demo_server.py)
- Purpose: Development without API costs
- Features: Rule-based parsing, no OpenAI dependency
- Why: Allows testing core functionality without API keys
HTTP Interfaces - Two Approaches
Direct Integration (http_server.py)
- Approach: Directly imports and uses
SegmentationMCPServerclass - Reasoning: Faster development, easier debugging, reliable for testing
- Trade-off: Bypasses MCP protocol, but better for HTTP API needs
Protocol Wrapper (http_wrapper.py)
- Approach: Starts MCP server as subprocess, communicates via JSON-RPC
- Reasoning: Tests actual MCP implementation, validates protocol compliance
- Trade-off: More complex, but ensures MCP server actually works
Why Both HTTP Approaches?
- Development Speed:
http_server.pyfor fast iteration and Postman testing - Protocol Validation:
http_wrapper.pyto ensure MCP server works correctly - Different Needs: Direct calls vs. protocol testing serve different purposes
OpenAI Version Compatibility
Issue: The project initially used openai>=1.30.0,<1.36.0 but users may encounter compatibility issues.
Solution: Upgrade to openai>=2.0.0 if you face initialization errors:
pip install openai>=2.0.0
Why: Newer OpenAI versions have different client initialization patterns and better stability.
π‘ API Reference
Create Segment
POST /create-segment
Create a customer segment from natural language description.
Request Body:
{
"query": "Description of desired customer segment in plain English"
}
Response:
{
"status": "success",
"segment_id": "SEG_ABCD1234",
"customer_count": 1500,
"downstream_systems": ["CRM_System", "Email_Marketing_Platform", "Ad_Platform"],
"generated_query": "SELECT * FROM bank_customers WHERE marital = 'married' AND age > 30 LIMIT 1000",
"validation_sample": [
{"age": 35, "job": "management", "marital": "married", "balance": 2143, "housing": "yes"},
{"age": 42, "job": "technician", "marital": "married", "balance": 1506, "housing": "no"}
],
"estimated_rows": 1500,
"processing_steps": {
"intent_parsing": {
"parsed_criteria": {
"conditions": [{"field": "marital", "operator": "=", "value": "married"}, {"field": "age", "operator": ">", "value": 30}],
"logical_operators": ["AND"]
},
"confidence": 0.9,
"ambiguous_terms": [],
"parsing_notes": ["Successfully parsed natural language query"],
"timestamp": "2024-01-15T10:30:01Z",
"processing_time_ms": 1250
},
"data_mapping": {
"business_terms": {"age": "age", "marital": "marital"},
"table_mappings": {"customers": "bank_customers"},
"field_mappings": {"marital": "marital", "age": "age"},
"timestamp": "2024-01-15T10:30:02Z",
"processing_time_ms": 150
},
"query_generation": {
"sql_query": "SELECT * FROM bank_customers WHERE marital = 'married' AND age > 30 LIMIT 1000",
"optimized": true,
"estimated_rows": 1500,
"tables_used": ["bank_customers"],
"optimization_notes": ["Added LIMIT clause for safety"],
"timestamp": "2024-01-15T10:30:03Z",
"processing_time_ms": 800
},
"validation": {
"is_valid": true,
"issues": [],
"warnings": ["Query returns large number of rows: 1500"],
"sample_data": [
{"age": 35, "job": "management", "marital": "married", "balance": 2143},
{"age": 42, "job": "technician", "marital": "married", "balance": 1506}
],
"row_count": 1500,
"timestamp": "2024-01-15T10:30:04Z",
"processing_time_ms": 200
}
}
}
Get Segment Info
GET /segment/{segment_id}
Retrieve information about a created segment.
Get Database Schema
GET /schema
Get current database schema information.
Health Check
GET /health
Server health status endpoint.
βοΈ Advanced Configuration
Environment Variables
All configuration values support environment variable overrides:
# Model Configuration
export OPENAI_MODEL="gpt-4.1"
export OPENAI_TEMPERATURE="0.1"
export OPENAI_MAX_TOKENS="1000"
# Agent-Specific Models
export INTENT_PARSER_MODEL="gpt-4.1"
export QUERY_GENERATOR_MODEL="gpt-4.1"
# Performance Settings
export MAX_QUERY_ROWS="1000"
export DEFAULT_QUERY_LIMIT="1000"
export VALIDATION_SAMPLE_SIZE="5"
export MAX_SAFE_ROWS="100000"
export WARNING_ROW_THRESHOLD="50000"
# Timeouts
export INTENT_PARSER_TIMEOUT="15"
export QUERY_GENERATOR_TIMEOUT="20"
export VALIDATION_TIMEOUT="10"
export ACTIVATION_TIMEOUT="25"
Agent Configuration
Each agent loads configuration automatically:
- Intent Parser: Model selection, temperature, timeout settings
- Query Generator: Model, optimization rules, query limits, safety settings
- Validator: Performance thresholds, sample sizes, row count limits
- Activator: Timeout settings, downstream system configuration
Basic Environment Variables
OPENAI_API_KEY: Required for AI-powered query generationOPENAI_MODEL: Model to use (default: gpt-4.1)KAGGLE_USERNAME: For dataset accessKAGGLE_KEY: Kaggle API keyDATABASE_PATH: Path to local database fileMAX_QUERY_ROWS: Maximum rows per query (default: 1000)
π Example Queries
Basic Segmentation
"Customers over 25 years old"
"Married customers with housing loans"
"High balance customers without personal loans"
Advanced Criteria
"Customers contacted more than 3 times but never converted"
"Young professionals with tertiary education and no defaults"
"Retired customers with high balances who were contacted in May"
Business-Specific Terms
"High-value prospects for mortgage products"
"At-risk customers for retention campaigns"
"Premium customers for exclusive offers"
π‘οΈ Security Considerations
Data Protection
- No sensitive data stored in logs
- Read-only database access (SELECT operations only)
- Query validation prevents dangerous operations
- Local data processing (no external data transmission)
Access Control
- OpenAI API key required for AI features
- Local file system access only
- Framework ready for authentication systems
Compliance
- Local data processing maintains privacy
- Processing transparency for audit requirements
- Framework supports compliance features
β οΈ Current Limitations
What's Simulated (Not Real)
- Downstream Integrations: Returns system names but doesn't actually connect to CRM/email platforms
- Multi-Database: Only supports single Kaggle dataset, not multiple data sources
- Advanced Security: Basic validation only, not full parameterized queries
What's Real & Working
- MCP Protocol: Full implementation with Claude Desktop integration
- AI Processing: Real GPT-4.1 integration for natural language processing
- SQL Generation: Actual query creation and validation
- HTTP APIs: Working REST endpoints for testing and integration
- Comprehensive Validation: Extensive testing and validation infrastructure
π Support
Built-in Validation Tools
python validate_schemas.py- Comprehensive schema validationpython validate_config_usage.py- Configuration usage analysispython test_config_integration.py- Integration testingpython generate_claude_config.py- Setup assistance
Community
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.