mcp-ai-detection
Enables multi-tier AI-detection screening on academic papers by extracting text from .tex and .docx files, splitting into standard sections, and running a pipeline of statistical and LLM-based analysis.
Open-source, MIT-licensed MCP server for multi-tier AI-detection screening of
academic papers. It accepts .tex and .docx files, extracts clean text, splits
the text into standard paper sections, and runs a three-tier risk pipeline.
AI detection is screening, not proof. Reports include limits, threats to validity, and a final recommendation framed as decision support.
Features
- MCP tools: extract_text, split_sections, full_pipeline
- Input: LaTeX .tex and Word .docx
- Text extraction: Pandoc for LaTeX when installed, robust fallback cleaner, python-docx for Word
- Narrative/structured split: tables, formulas, captions, references, keyword lines, markdown tables, and dense math lines are excluded from the main authorship score
- Section splitting: Abstract, Introduction, Methods, Results, Discussion, Conclusion
- Tier 1 offline: burstiness, lexical diversity, AI-like connectives, n-gram repetition, sentence-length variance, repeated patterns, hedging, example density
- Optional Tier 1 local LLM through Ollama with gemma4:e4b by default
- Tier 2 local Gemma adjudicator through Ollama: rubric-based JSON screening calibrated with Tier 1 metrics, no paid API keys
- Tier 3 open-source ensemble hooks: DetectGPT, Fast-DetectGPT, NPR command adapters plus built-in proxy analysis for repetition, lexical diversity, and semantic coherence
- JSON and Markdown reports with executive summary, section breakdown, section x tier score table, narrative score, structured-content diagnostic, limits, and recommendation
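To make two of the Tier 1 signals listed above concrete, here is a minimal sketch of burstiness and lexical diversity. The formulas are assumptions chosen for illustration (coefficient of variation of sentence length, and type-token ratio), and the function names are hypothetical; the project's own metrics may be defined differently.

```python
import re
from statistics import mean, pstdev


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.

    Assumed definition: higher values mean more uneven sentence lengths.
    """
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def lexical_diversity(text: str) -> float:
    """Type-token ratio: distinct lowercase words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


sample = "Short one. This sentence is quite a bit longer than the first. Tiny."
print(round(burstiness(sample), 3), round(lexical_diversity(sample), 3))
```

Both functions need only the standard library, which matches the offline nature of Tier 1.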
Install
python -m pip install -e .
Pandoc is optional but recommended for LaTeX:
# macOS
brew install pandoc
# Ubuntu/Debian
sudo apt-get install pandoc
MCP server
Run with stdio transport:
python -m mcp_ai_detection.server
Example MCP client config:
{
  "mcpServers": {
    "ai-detection": {
      "command": "python",
      "args": ["-m", "mcp_ai_detection.server"],
      "env": {
        "LOCAL_LLM_MODEL": "gemma4:e4b"
      }
    }
  }
}
Tools
extract_text
{
  "file_path": "paper.tex",
  "prefer_pandoc": true
}
Returns clean text, word count, extractor used, and warnings.
split_sections
{
  "text": "Abstract\n...\nIntroduction\n..."
}
Returns detected standard sections with line ranges and word counts.
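The idea behind section splitting can be sketched as heading detection over lines. This is an illustration only: the function name, the heading regex, and the output field names (name, start_line, end_line, word_count) are assumptions, not the server's actual implementation or schema.

```python
import re

STANDARD_SECTIONS = (
    "abstract", "introduction", "methods", "results", "discussion", "conclusion",
)

# A heading is a line holding only a section name, optionally numbered ("2. Methods").
HEADING_RE = re.compile(
    r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?(" + "|".join(STANDARD_SECTIONS) + r")\s*$",
    re.IGNORECASE,
)


def split_standard_sections(text: str) -> list[dict]:
    """Return one dict per detected heading with 1-based line ranges."""
    lines = text.splitlines()
    sections = []
    for number, line in enumerate(lines, start=1):
        match = HEADING_RE.match(line)
        if match:
            sections.append({"name": match.group(1).title(), "start_line": number})
    for i, section in enumerate(sections):
        # A section runs until the line before the next heading, or to the end.
        if i + 1 < len(sections):
            end = sections[i + 1]["start_line"] - 1
        else:
            end = len(lines)
        body = lines[section["start_line"]:end]  # lines after the heading
        section["end_line"] = end
        section["word_count"] = sum(len(body_line.split()) for body_line in body)
    return sections


print(split_standard_sections("Abstract\nWe study things.\nIntroduction\nPrior work exists here."))
```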
full_pipeline
{
  "file_path": "paper.docx",
  "use_llm": false,
  "tier2_provider": "gemma-local",
  "early_stop": true
}
Runs extraction, sectioning, Tier 1 statistics, conditional Tier 2 Gemma/Ollama,
conditional Tier 3, then returns report_json and report_markdown.
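Most MCP clients build the request for you, but it can help to see where the argument objects above end up. The sketch below wraps them in a JSON-RPC 2.0 tools/call message, which is how the Model Context Protocol invokes a tool. The helper name build_tool_call is hypothetical, and initialization and transport framing are left out.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(
    "full_pipeline",
    {
        "file_path": "paper.docx",
        "use_llm": False,
        "tier2_provider": "gemma-local",
        "early_stop": True,
    },
)
print(request)
```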
CLI
python -m mcp_ai_detection.cli paper.tex --markdown report.md --json report.json
Configuration
Environment variables:
LOCAL_LLM_MODEL=gemma4:e4b
OLLAMA_HOST=http://localhost:11434
OLLAMA_KEEP_ALIVE=30m
HTTP_TIMEOUT_SECONDS=120
TIER1_LLM_WEIGHT=0.6
TIER1_STATS_WEIGHT=0.4
DETECTGPT_CMD=
FAST_DETECTGPT_CMD=
NPR_CMD=
METHODS_WEIGHT_REDUCTION=0.75
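One plausible reading of the two Tier 1 weights is a weighted average of the statistical score and the optional local LLM score. The sketch below assumes that reading; the function names are hypothetical and the server's actual blending rule may differ.

```python
import os


def load_tier1_weights(env=os.environ) -> tuple[float, float]:
    """Read the Tier 1 weights, falling back to the documented defaults."""
    llm_weight = float(env.get("TIER1_LLM_WEIGHT", "0.6"))
    stats_weight = float(env.get("TIER1_STATS_WEIGHT", "0.4"))
    return llm_weight, stats_weight


def blend_tier1(stats_score: float, llm_score=None, env=os.environ) -> float:
    """Assumed rule: weighted average, or stats only when no LLM score exists."""
    llm_weight, stats_weight = load_tier1_weights(env)
    total = llm_weight + stats_weight
    if llm_score is None or total == 0:
        return stats_score
    return (llm_weight * llm_score + stats_weight * stats_score) / total


print(blend_tier1(0.2, 0.8, env={}))
```

Dividing by the sum of the weights keeps the result in the same range as the inputs even if the two values do not add up to 1.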
Tier 2 uses the local Ollama model named by LOCAL_LLM_MODEL. Recommended:
ollama pull gemma4:e4b
ollama serve
Check that Ollama is using the GPU:
ollama ps
The PROCESSOR column should show 100% GPU for loaded models.
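To check that the configured model answers with JSON before running the pipeline, a request to Ollama's /api/generate endpoint can be sketched as follows. The helper names and the prompt are hypothetical; the rubric prompt the server actually sends is not shown here.

```python
import json
import os
import urllib.request


def build_generate_payload(prompt: str, env=os.environ) -> dict:
    """Request body for Ollama's /api/generate, using the variables above."""
    return {
        "model": env.get("LOCAL_LLM_MODEL", "gemma4:e4b"),
        "prompt": prompt,
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "stream": False,    # return one response object instead of chunks
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE", "30m"),
    }


def ask_ollama(prompt: str, env=os.environ) -> dict:
    """POST the prompt to a running Ollama server and parse its JSON reply."""
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    timeout = float(env.get("HTTP_TIMEOUT_SECONDS", "120"))
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(prompt, env)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With stream=False the generated text is in the "response" field.
    return json.loads(body["response"])
```

Calling ask_ollama('Reply with {"ok": true} as JSON.') requires ollama serve to be running and the model to be pulled.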
External Tier 3 commands receive section text on stdin and should return JSON:
{
  "score": 0.72,
  "confidence": 0.64,
  "details": {
    "model": "your-detector"
  }
}
If commands are not configured, built-in proxy scorers keep the pipeline fully offline and deterministic.
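A minimal sketch of an external Tier 3 adapter that follows this stdin-to-JSON contract is shown below. The scoring rule is a toy placeholder (share of repeated words), not DetectGPT, Fast-DetectGPT, or NPR, and the names score_text and toy-repetition are hypothetical.

```python
import json
import sys


def score_text(text: str) -> dict:
    """Toy placeholder: share of repeated words, with length-based confidence."""
    words = text.lower().split()
    details = {"model": "toy-repetition"}
    if not words:
        return {"score": 0.0, "confidence": 0.0, "details": details}
    repeated_share = 1.0 - len(set(words)) / len(words)
    confidence = min(1.0, len(words) / 200)  # arbitrary cap, illustration only
    return {
        "score": round(repeated_share, 4),
        "confidence": round(confidence, 4),
        "details": details,
    }


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the section text from stdin and write one JSON object to stdout."""
    json.dump(score_text(stdin.read()), stdout)
```

Saved as a script that calls main() at the bottom, an adapter like this could be wired in by setting, for example, DETECTGPT_CMD to the command that runs it. A real adapter would replace score_text with a call into the detector.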
Thresholds
- < 0.3: low
- 0.3-0.6: medium
- >= 0.6: high
- Tier 2 early stop: probability < 0.4
- Sections below 80 narrative words are marked insufficient_evidence and are excluded from the document-level narrative score
Methods sections get reduced Tier 3 weight by default to lower false positives from formulaic scientific prose.
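The threshold rules can be sketched as three small functions. The cut-offs come from the list above; the function names, the input field names (narrative_words, score), and the word-weighted average used for the document-level score are assumptions made for illustration.

```python
from typing import Optional


def risk_level(score: float) -> str:
    """Map a score in [0, 1] to the documented risk bands."""
    if score < 0.3:
        return "low"
    if score < 0.6:
        return "medium"
    return "high"


def should_stop_after_tier2(probability: float, early_stop: bool = True) -> bool:
    """Skip Tier 3 when early stop is on and Tier 2 probability is below 0.4."""
    return early_stop and probability < 0.4


def document_narrative_score(sections: list[dict], min_words: int = 80) -> Optional[float]:
    """Assumed aggregation: word-weighted mean over sections with enough text."""
    eligible = [s for s in sections if s["narrative_words"] >= min_words]
    if not eligible:
        return None  # every section was insufficient_evidence
    total_words = sum(s["narrative_words"] for s in eligible)
    weighted = sum(s["score"] * s["narrative_words"] for s in eligible)
    return weighted / total_words


sections = [
    {"narrative_words": 100, "score": 0.2},
    {"narrative_words": 300, "score": 0.6},
    {"narrative_words": 50, "score": 0.9},  # below 80 words, excluded
]
print(risk_level(document_narrative_score(sections)))
```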
Development
Run offline tests:
python -m unittest discover -s tests
Run lint if dev extras are installed:
ruff check .
gemma3:4b is a smaller fallback for slower machines:
LOCAL_LLM_MODEL=gemma3:4b