MKMChat
AI assistant and MCP server for Mortal Kombat Mobile that provides intelligent team suggestions, mechanic explanations, and context-aware chat via local LLMs and RAG.
MKMChat (Mortal Kombat Mobile Assistant)
MKMChat is a local-first AI assistant and Model Context Protocol (MCP) server for Mortal Kombat Mobile. It combines a Python search backend, RAG retrieval, local LLM execution through Ollama, and a Laravel + Livewire web interface.
Main Features
- Intelligent Team Suggestion: Assemble custom 3-character team compositions complete with character class analyses, passive synergy ratings, and specific equipment cards recommended for every slot.
- Mechanic Explanation Flow: Explain complex gameplay mechanics (e.g., Snare, Power Drain, Oblivion) by generating a clear definition and practical combat recommendations in a structured format.
- AI-Powered Conversation Chat: Enjoy natural, context-aware Q&A about game mechanics, strategy, tier rankings, and character matchups with chat history persistence.
- Reasoning Model Support: Optimized dynamically for reasoning models like DeepSeek-R1 and OpenAI o1/o3, adapting parameters (context limits, temperature) for deep analytical outputs.
- MCP-Compatible Tool Server: Exposes rich game tools directly to LLM clients (like Claude Desktop, Cursor, or AI agents) via Model Context Protocol.
- Harmonious Light & Dark Modes: Responsive, visual-first Laravel Livewire web interface featuring glassmorphic designs, vibrant color schemes, and seamless dark/light theme toggles.
Advanced Hybrid RAG System
The retrieval pipeline indexes game data with set awareness, chunks it finely, normalizes text, and ranks results with hybrid semantic and keyword scoring:
flowchart TD
A[Game Data<br>TSV + TXT] --> B[Set-Aware Indexer]
B --> C[Precise Chunking]
C --> D[Text Normalization]
D -- "sentence-transformers<br>all-MiniLM-L6-v2" --> E[Embeddings Cache<br>.rag_cache/]
E --> F[Hybrid Retrieval<br>Semantic + Keyword Boost]
F --> G[Ollama Assistant<br>Prompt Assembly]
- Set-Aware Indexing: The indexer automatically scans for character set affiliations (e.g., {{Friendship}} or {{Brutality}} tags) and injects mutual cross-references. Retrieving one item naturally surfaces details and names of its set partners (e.g., retrieving Baraka's Horde Chef's Delight also surfaces Horde Chef's Paraphernalia).
- True Cosine Similarity: Vector embeddings (both query and document vectors) are mathematically $L_2$-normalized upon creation and query time. Dot-product computation of these normalized vectors yields mathematically precise cosine similarity scores strictly bounded within $[-1.0, 1.0]$.
- Hybrid Retrieval (Lexical Keyword Boosting): Vector embeddings (all-MiniLM-L6-v2) are combined with a specialized keyword-matching reranker (_apply_keyword_boost()). Exact matches on character names, rarity tiers, and major gameplay terms receive an intelligent boost, ensuring high semantic recall without losing keyword precision.
- Typography Resiliency: Input queries are automatically normalized (e.g., converting curly quotes ’, “, ” to straight quotes ', ") to prevent matching failures caused by different keyboard inputs.
- Granular Chunking: Glossary definitions are chunked term-by-term, and gameplay data is chunked line-by-line. This avoids oversized search spaces ("fat chunks") and provides highly targeted context snippets.
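The normalization, cosine-scoring, and keyword-boost steps in the list above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's code: the function names, the whitespace tokenizer, and the 0.05 boost weight are all assumptions, it is not the actual _apply_keyword_boost() implementation, and hand-written toy vectors stand in for all-MiniLM-L6-v2 embeddings.

```python
import math

# Assumed mapping: curly single/double quotes -> straight quotes.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalize_text(text):
    """Replace curly quotes with straight ones and collapse whitespace."""
    return " ".join(text.translate(_QUOTES).split())


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def keyword_boost(query, doc_text, score, boost=0.05):
    """Add a fixed bonus (hypothetical weight) per query term found verbatim in the document."""
    doc_terms = set(normalize_text(doc_text).lower().split())
    query_terms = set(normalize_text(query).lower().split())
    return score + boost * len(query_terms & doc_terms)


def rank(query, query_vec, docs):
    """Return (score, text) pairs sorted best-first; docs is a list of (text, vector)."""
    q = l2_normalize(query_vec)
    scored = [
        (keyword_boost(query, text, cosine(q, l2_normalize(vec))), text)
        for text, vec in docs
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy vectors chosen so both documents tie on cosine similarity.
    docs = [
        ("Power Drain glossary entry", [2.0, 1.0, 0.0]),
        ("Baraka Horde Chef's Delight", [1.0, 2.0, 0.0]),
    ]
    for score, text in rank("Baraka\u2019s delight", [3.0, 3.0, 0.0], docs):
        print(f"{score:.4f}  {text}")
```

In this sketch the two documents have identical cosine scores, so the exact match on "delight" decides the order. Note that a boosted score can exceed 1.0; only the raw cosine term is bounded by $[-1.0, 1.0]$.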
Architecture
- mkmchat/: Python core package (asynchronous FastAPI + Uvicorn HTTP server, MCP server implementation, and local vector RAG system).
- webapp/: Laravel + Livewire web UI consuming the Python API through the secure MkmApiService wrapper.
- docker-compose.yml: Full-stack container orchestration linking ollama, python-api (internal network), and webapp (host exposed).
High-Performance Asynchronous Architecture
The Python API has been completely migrated to a fully asynchronous runtime stack:
- Asynchronous Web Core: Replaced the custom BaseHTTPRequestHandler with FastAPI running under Uvicorn. All API routes (/suggest-team, /ask-question, /explain-mechanic, /chat, /health, /) are non-blocking async def handlers.
- Concurrency Verification: Concurrent requests are interleaved on Uvicorn's event loop. Under load tests, executing 5 concurrent API requests yields a total wall-clock execution time that matches the latency of a single request (~4.75 seconds), achieving 80%+ latency savings compared to blocking synchronous designs.
- Security & Reliability: Implemented native FastAPI exception handlers that keep error responses in a fixed HTTP contract ({"error": "details"} output format), a custom sliding-window rate-limiting dependency keyed by client IP, and secure API key authentication headers.
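The concurrency figure above follows from simple arithmetic: if 5 requests of about 4.75 s each ran back to back they would take about 23.75 s, so finishing all of them in about 4.75 s saves 1 - 4.75/23.75 = 80%. The sketch below reproduces that effect with plain asyncio, using asyncio.sleep as a stand-in for an I/O-bound call. The function names and the 0.1 s delay are illustrative assumptions, not the project's handlers or load test.

```python
import asyncio
import time


async def fake_request(delay):
    """Stand-in for one I/O-bound API call, such as awaiting an LLM response."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return "ok"


async def run_concurrently(n, delay):
    """Run n simulated requests at once and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(delay) for _ in range(n)))
    return time.perf_counter() - start


def ideal_savings(n):
    """Fraction of time saved versus running n equal-latency requests sequentially."""
    return 1 - 1 / n


if __name__ == "__main__":
    elapsed = asyncio.run(run_concurrently(n=5, delay=0.1))
    print(f"5 concurrent requests took {elapsed:.2f}s")
    print(f"ideal savings vs. sequential: {ideal_savings(5):.0%}")
```

The ideal figure only holds while every request spends its time awaiting I/O; CPU-bound work inside a handler, or a model backend that serializes inference, would reduce the gain.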
Running with Docker (Recommended)
1) Prepare Environment Files
From the project root:
cp .env.docker.example .env.docker
cp webapp/.env.docker.example webapp/.env.docker
2) Configure Your Environment
At a minimum, ensure these match where required:
- MKM_API_KEY: A strong random string shared by the backend and Laravel services for authentication.
- OLLAMA_MODEL: The default model tag to use (e.g., llama3.2:3b or deepseek-r1:14b-fit).
- MKM_DEBUG_PROMPTS: Set to true to write detailed, fully redacted LLM prompts and responses to debug_llm.log for easy tuning.
3) Start the Stack
docker compose up -d --build
Dedicated GPU / High VRAM Tuning (WSL & Linux)
To deploy a high-performance DeepSeek-R1 (14B) model optimized for local consumption on GPU-enabled environments:
docker compose exec ollama sh -lc "cat > /tmp/Modelfile.deepseek14b-fit << 'EOF'
FROM deepseek-r1:14b
PARAMETER num_ctx 512
PARAMETER num_batch 32
PARAMETER num_predict 800
PARAMETER use_mmap true
PARAMETER temperature 0.2
EOF
ollama create deepseek-r1:14b-fit -f /tmp/Modelfile.deepseek14b-fit"
Once generated, select deepseek-r1:14b-fit in the web application model selector dropdown!
4) Endpoints
- Web Application: http://localhost:8000
- Ollama Interface: http://localhost:11434
Note: The Python API is internal to the Docker network (http://python-api:8080) and is not published directly to the host for maximum container-level security.
5) Operations & Logs
docker compose ps # Check service status
docker compose logs -f python-api # Live Python server logs
docker compose logs -f webapp # Live Web UI logs
docker compose logs -f ollama # Live Ollama inference logs
docker compose down # Tear down all containers
Running Without Docker
1. Python API & RAG Backend
Ensure you have Python 3.10+ installed:
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
pip install -e .
ollama pull llama3.2:3b
python -m mkmchat http
2. Laravel Web Application
Ensure PHP 8.2+ and Composer are installed:
cd webapp
cp .env.example .env
composer install
npm install
npm run build
php artisan key:generate
php artisan migrate
php artisan serve
Environment Variables
Core Variables (.env.docker)
| Variable | Description | Default |
|---|---|---|
| OLLAMA_BASE_URL | Base URL of the Ollama server | http://ollama:11434 |
| OLLAMA_MODEL | Default LLM model tag | llama3.2:3b |
| MKM_HTTP_HOST | Host binding for Python API | 0.0.0.0 |
| MKM_API_KEY | Strong bearer authorization key | change-me-in-production |
| MKM_DEBUG_PROMPTS | Write fully redacted prompt history to log file | false |
| MKM_MECHANIC_RAG_TOP_K | Number of passages to search for explanations | 16 |
| MKM_MECHANIC_RAG_MAX_PASSAGES | Max context passages to include | 8 |
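As a rough illustration of how these variables and defaults could be consumed, the sketch below reads them into a typed settings object. The Settings class and load_settings function are hypothetical names used for this example; the package's real configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables in the table above."""
    ollama_base_url: str
    ollama_model: str
    http_host: str
    api_key: str
    debug_prompts: bool
    mechanic_rag_top_k: int
    mechanic_rag_max_passages: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from env, falling back to the documented default."""
    return Settings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.2:3b"),
        http_host=env.get("MKM_HTTP_HOST", "0.0.0.0"),
        api_key=env.get("MKM_API_KEY", "change-me-in-production"),
        # Treat only the literal string "true" (any case) as enabled.
        debug_prompts=env.get("MKM_DEBUG_PROMPTS", "false").strip().lower() == "true",
        mechanic_rag_top_k=int(env.get("MKM_MECHANIC_RAG_TOP_K", "16")),
        mechanic_rag_max_passages=int(env.get("MKM_MECHANIC_RAG_MAX_PASSAGES", "8")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the loader easy to test, for example load_settings({"OLLAMA_MODEL": "deepseek-r1:14b-fit"}).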
Testing
Run RAG and model connection checks directly inside the containers:
Quick Smoke Test
docker compose exec webapp curl -X POST http://python-api:8080/explain-mechanic \
-H "Content-Type: application/json" \
-H "X-API-Key: <your-mkm-api-key>" \
-d '{"mechanic":"power drain","model":"llama3.2:3b"}'
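The same smoke test can be scripted in Python with only the standard library. The sketch below builds a request equivalent to the curl call above (same path, headers, and JSON body); the helper names and the 120-second timeout are assumptions for illustration, and the python-api hostname resolves only from inside the Docker network.

```python
import json
import urllib.request


def build_explain_request(base_url, api_key, mechanic, model):
    """Build a POST request matching the curl smoke test above."""
    payload = json.dumps({"mechanic": mechanic, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/explain-mechanic",
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request, timeout=120.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_explain_request(
        "http://python-api:8080", "<your-mkm-api-key>", "power drain", "llama3.2:3b"
    )
    print(req.get_method(), req.full_url)
    # Uncomment when running inside the Docker network with a valid key:
    # print(send(req))
```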
Python RAG Verification
To manually test semantic search quality, cache validation, and keyword boosting:
# Inside the virtual environment
python tests/test_rag.py
License
This project is licensed under the GNU GPL v3. See the LICENSE file for details.