Product MCP Server
Enables natural language product management (CRUD) with multiple classification methods, including LLM-based and fast ML classifiers, for product creation, listing, updating, and deletion via a chat interface.
README
Product MCP Server
A microservice-based product management system using MCP (Model Context Protocol) with intelligent tool selection and argument extraction.
Architecture
- MCP Server: FastAPI-based product CRUD operations (Port 9000)
- Chat API: Natural language interface with multiple classification methods (Port 8000)
- Ollama: Local LLM server for embeddings and text generation (Port 11434)
- PostgreSQL: Database for product storage (Port 5432)
Models Used
- nomic-embed-text: For vector embeddings and semantic search
- llama3.1:8b: For tool selection and argument extraction from natural language
- CSV-based ML Classifier: Fast pattern-based classification with dynamic argument extraction
- Joint Intent-Slot Classifier: Advanced ML-based classification for complex queries
API Endpoints
Chat API (Port 8000)
POST /chat/v1- Uses configurable NLP model (llama3.1:8b, qwen2.5:3b-instruct-q4_K_M, etc.) for tool selection and argument extractionPOST /chat/v2- Uses joint intent-slot classification with advanced ML approachPOST /chat/v3- Uses CSV-based ML classifier with dynamic argument extractionGET /chat/tools- List available toolsGET /chat/test- Health check
Endpoint Evolution: v1 → v2 → v3
The chat endpoints represent an evolution in classification approaches:
v1 (Configurable NLP):
- Why: Unified approach using configurable NLP models via environment variables
- Pros: Flexible model selection (llama3.1:8b, qwen2.5:3b-instruct-q4_K_M, etc.), handles all scenarios
- Cons: Speed depends on chosen model (llama3.1:8b = slow, qwen2.5:3b = fast)
- Use case: All scenarios - model choice based on requirements
v2 (Joint Intent-Slot Classification):
- Why: Needed faster, more structured approach than v1
- Pros: Fast, handles intent classification and argument extraction simultaneously
- Cons: Requires pattern training, less flexible than LLM
- Use case: Standard queries with predictable patterns
v3 (CSV-based ML Classifier):
- Why: Needed fastest possible response for production use
- Pros: Sub-second response, rule-based with ML fallback
- Cons: Requires explicit pattern training, least flexible
- Use case: High-throughput production scenarios
MCP Server (Port 9000)
- Product CRUD operations via MCP protocol
- OpenAPI spec available at
/openapi.json - Health check at
/health
Quick Start
-
Build and start services:
docker-compose up -d -
Test the v1 endpoint (configurable NLP model):
curl -X POST "http://localhost:8000/chat/v1" \ -H "Content-Type: application/json" \ -d '{"message": "add Smartphone Pro, with touchscreen, price $1249"}' -
Expected response (minimal format for all endpoints):
{ "message": "product created", "data": { "id": 1, "name": "Smartphone Pro", "price": 1249.0, "description": "with touchscreen" } }
Features
Multiple Classification Methods
- v1: Configurable NLP model classification (llama3.1:8b, qwen2.5:3b-instruct-q4_K_M, etc.) with configurable timeout
- v2: Joint intent-slot classification with advanced ML approach
- v3: CSV-based ML classifier with dynamic argument extraction (fastest)
Response Formats
All endpoints now use the minimal format for clean, consistent responses:
{
"message": "product created|updated|deleted|retrieved|products retrieved",
"data": { /* actual product data */ }
}
Tool Selection Capabilities
- ✅ Intelligent tool selection using multiple ML approaches
- ✅ Automatic argument extraction from natural language
- ✅ Dynamic argument extraction with regex patterns
- ✅ Recent product filtering for "show recent" queries
- ✅ Fast CSV-based classification for quick responses
- ✅ Advanced joint classification for complex queries
Supported Operations
- product.create: Add new products with name, price, description
- product.list: List all products (with pagination and recent_only filter)
- product.get: Get specific product by ID
- product.update: Update existing products
- product.delete: Delete products
Special Features
- Recent Products: Queries like "show recent product created" return only the most recent product
- Dynamic Extraction: Automatically extracts names, prices, IDs, and descriptions from natural language
- Fallback Handling: Graceful fallbacks when primary classification methods fail
Performance Optimizations
Recent Improvements
- ✅ Unified response format: All endpoints use minimal response format
- ✅ Automated model training: CSV classifier trained during Docker build
- ✅ Dynamic argument extraction: Enhanced with regex-based extraction
- ✅ Recent product support: Special handling for "recent" queries
- ✅ Optimized timeouts: 300-second timeout for Llama 3.1 operations
- ✅ Pre-trained models: Uses existing
csv_classifier.pklwith latest patterns
Build Performance
- Model Training: Automated during container build process
- Pattern Updates: Latest patterns automatically included in builds
- Fast Startup: Pre-loaded models for immediate availability
Development
Rebuilding after changes:
# Stop and rebuild specific service
docker-compose stop chat-api
docker-compose build chat-api
docker-compose up -d chat-api
# Or rebuild all services
docker-compose down
docker-compose up --build -d
Performance Comparison
Response Time Analysis (Query: "add Ismartphone Xy5, best one yet and price is $1592")
| Endpoint | Method | Response Time | Status | Product Created | Features | Best For |
|---|---|---|---|---|---|---|
/v1 |
Llama 3.1 | ~3m 1s | ✅ Success | ID 34 (name: ✓, price: ✓, desc: ✓) | Full LLM reasoning | Complex queries |
/v2 |
Joint Intent-Slot | ~0.21s | ✅ Success | ID 35 (name: ✓, price: ✓, desc: ✗) | Advanced ML classification | Research/advanced |
/v3 |
CSV-based ML | ~0.19s | ✅ Success | ID 36 (name: ✓, price: ✓, desc: ✗) | Pattern matching + regex | Production use |
/v4 |
Qwen2.5-3B (q4) | ~1m 36s | ✅ Success | ID 37 (name: ✓, price: ✓, desc: ✓) | Balanced intelligence/speed | Balanced scenarios |
Performance Characteristics
v1 (Llama 3.1):
- Response Time: ~3m 1s
- Status: ✅ Success (with 5-minute timeout)
- Product Accuracy: Perfect (name: ✓, price: ✓, description: ✓)
- Use Case: Complex queries requiring deep reasoning
- Trade-off: Maximum intelligence, slowest speed
v2 (Joint Intent-Slot):
- Response Time: ~0.21s
- Status: ✅ Success
- Product Accuracy: Good (name: ✓, price: ✓, description: ✗ - includes price info)
- Use Case: Standard queries with predictable patterns
- Trade-off: Fast, requires pattern training
v3 (CSV-based ML):
- Response Time: ~0.19s
- Status: ✅ Success
- Product Accuracy: Good (name: ✓, price: ✓, description: ✗ - includes price info)
- Use Case: High-throughput production scenarios
- Trade-off: Fastest, least flexible
v4 (Qwen2.5-3B q4):
- Response Time: ~1m 36s
- Status: ✅ Success
- Product Accuracy: Perfect (name: ✓, price: ✓, description: ✓)
- Use Case: Balanced scenarios requiring both speed and intelligence
- Trade-off: Moderate speed, good flexibility, reasonable resource usage
Product Creation Analysis
Query: "add Ismartphone Xy5, best one yet and price is $1592"
| Endpoint | Name Extraction | Price Extraction | Description Extraction | Overall Accuracy |
|---|---|---|---|---|
| v1 | ✅ "Ismartphone Xy5" | ✅ 1592.0 | ✅ "best one yet" | 100% |
| v2 | ✅ "Ismartphone Xy5" | ✅ 1592.0 | ❌ "best one yet and price is $1592" (includes price) | 67% |
| v3 | ✅ "Ismartphone Xy5" | ✅ 1592.0 | ❌ "best one yet and price is $1592" (includes price) | 67% |
| v4 | ✅ "Ismartphone Xy5" | ✅ 1592.0 | ✅ "best one yet" | 100% |
Note: All endpoints successfully created products in the database. v1 and v4 correctly extracted the description as "best one yet". v2 and v3 incorrectly included price information in the description.
Endpoint Evolution Summary
| Version | Evolution Reason | Performance | Intelligence | Flexibility |
|---|---|---|---|---|
| v1 | Started with LLM for maximum flexibility | Slow (3m 1s) | High | High |
| v2 | Needed faster, structured approach | Fast (0.21s) | Medium | Medium |
| v3 | Needed fastest possible response | Fastest (0.19s) | Medium | Low |
| v4 | Balanced approach between speed and intelligence | Medium (1m 36s) | High | High |
Troubleshooting
Common Issues
- Model not found: Ensure Ollama is running and
llama3.1:8bis pulled - Timeout errors: v1 endpoint has 300-second timeout for complex queries
- Build issues: Check that all services are properly stopped before rebuilding
Logs
# View chat-api logs
docker-compose logs -f chat-api
# View mcp-api logs
docker-compose logs -f mcp-api
# View ollama logs
docker-compose logs -f ollama
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.