mcp-ai-detection

Enables multi-tier AI-detection screening on academic papers by extracting text from .tex and .docx files, splitting into standard sections, and running a pipeline of statistical and LLM-based analysis.

Open-source (MIT-licensed) MCP server for multi-tier AI-detection screening of academic papers. It accepts .tex and .docx files, extracts clean text, splits it into standard paper sections, and runs a three-tier risk pipeline.

AI detection is screening, not proof. Reports include limits, threats to validity, and a final recommendation framed as decision support.

Features

  • MCP tools: extract_text, split_sections, full_pipeline
  • Input: LaTeX .tex and Word .docx
  • Text extraction: Pandoc for LaTeX when installed, robust fallback cleaner, python-docx for Word
  • Narrative/structured split: tables, formulas, captions, references, keyword lines, markdown tables, and dense math lines are excluded from the main authorship score
  • Section splitting: Abstract, Introduction, Methods, Results, Discussion, Conclusion
  • Tier 1 offline: burstiness, lexical diversity, AI-like connectives, n-gram repetition, sentence-length variance, repeated patterns, hedging, example density
  • Optional Tier 1 local LLM through Ollama with gemma4:e4b by default
  • Tier 2 local Gemma adjudicator through Ollama: rubric-based JSON screening calibrated with Tier 1 metrics, no paid API keys
  • Tier 3 open-source ensemble hooks: DetectGPT, Fast-DetectGPT, NPR command adapters plus built-in proxy analysis for repetition, lexical diversity, and semantic coherence
  • JSON and Markdown reports with executive summary, section breakdown, section x tier score table, narrative score, structured-content diagnostic, limits, and recommendation
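To make the Tier 1 signals more concrete, here is a minimal sketch of three of them: sentence-length variance, burstiness, and lexical diversity. The function names and the exact formulas (burstiness as the coefficient of variation of sentence length, lexical diversity as a type-token ratio) are assumptions for illustration and may differ from what the package computes.

```python
import re
import statistics


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tier1_stats(text: str) -> dict[str, float]:
    """Compute three illustrative style signals for one section (hypothetical helper)."""
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())

    mean_len = statistics.fmean(lengths) if lengths else 0.0
    # Population variance of sentence length in words.
    variance = statistics.pvariance(lengths) if len(lengths) > 1 else 0.0
    # Assumption: burstiness = standard deviation / mean of sentence length.
    burstiness = (variance ** 0.5) / mean_len if mean_len else 0.0
    # Assumption: lexical diversity = unique words / total words.
    diversity = len(set(words)) / len(words) if words else 0.0

    return {
        "sentence_count": float(len(sentences)),
        "sentence_length_variance": variance,
        "burstiness": burstiness,
        "lexical_diversity": diversity,
    }
```

Uniform sentence lengths give a burstiness of zero under this definition, while a mix of short and long sentences pushes it up. Any single signal is weak on its own, which is consistent with the pipeline combining several of them.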

Install

python -m pip install -e .

Pandoc is optional but recommended for LaTeX:

# macOS
brew install pandoc

# Ubuntu/Debian
sudo apt-get install pandoc

MCP server

Run with stdio transport:

python -m mcp_ai_detection.server

Example MCP client config:

{
  "mcpServers": {
    "ai-detection": {
      "command": "python",
      "args": ["-m", "mcp_ai_detection.server"],
      "env": {
        "LOCAL_LLM_MODEL": "gemma4:e4b"
      }
    }
  }
}

Tools

extract_text

{
  "file_path": "paper.tex",
  "prefer_pandoc": true
}

Returns clean text, word count, extractor used, and warnings.
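As a rough idea of what a fallback cleaner might do when Pandoc is not installed, the sketch below strips common LaTeX markup with regular expressions. It is not the package's implementation: the function name, the handled commands, and the result keys (text, word_count, extractor, warnings) are assumptions chosen to mirror the description above.

```python
import re


def fallback_clean_latex(source: str) -> dict:
    """Strip common LaTeX markup with regexes (hypothetical stand-in for Pandoc)."""
    warnings = []
    # Drop comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Keep only the document body when the environment is present.
    body = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", text, re.S)
    if body:
        text = body.group(1)
    else:
        warnings.append("no document environment found")
    # Remove structured environments that should not count as narrative prose.
    text = re.sub(
        r"\\begin\{(equation|align|figure|table)\*?\}.*?\\end\{\1\*?\}",
        " ", text, flags=re.S,
    )
    # Remove inline math.
    text = re.sub(r"\$[^$]*\$", " ", text)
    # Turn section commands into plain heading lines.
    text = re.sub(r"\\(?:sub)*section\*?\{([^}]*)\}", r"\n\1\n", text)
    # Unwrap simple formatting commands, keeping their argument.
    text = re.sub(r"\\(?:textbf|textit|emph)\{([^}]*)\}", r"\1", text)
    # Drop any remaining command together with one optional and one braced argument.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?(?:\{[^}]*\})?", " ", text)
    # Tidy leftover braces and whitespace.
    text = re.sub(r"[{}]", "", text)
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n\n", text).strip()
    return {
        "text": text,
        "word_count": len(text.split()),
        "extractor": "fallback",
        "warnings": warnings,
    }
```

A regex cleaner like this misses nested braces and custom macros, which is one reason Pandoc is the recommended path for LaTeX input.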

split_sections

{
  "text": "Abstract\n...\nIntroduction\n..."
}

Returns detected standard sections with line ranges and word counts.
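The sketch below shows one way heading-based splitting with line ranges and word counts could work. It assumes headings sit on their own line, optionally numbered, and the output keys (name, start_line, end_line, word_count) are hypothetical; the real tool may detect headings differently.

```python
import re

SECTION_NAMES = (
    "Abstract", "Introduction", "Methods", "Results", "Discussion", "Conclusion",
)
# A heading is a line holding only a section name, with an optional
# leading number ("2. Methods") and an optional plural "s" ("Conclusions").
HEADING_RE = re.compile(
    r"^\s*(?:\d+\.?\s+)?(" + "|".join(SECTION_NAMES) + r")s?\s*$",
    re.IGNORECASE,
)


def split_sections_sketch(text: str) -> list[dict]:
    """Split text into standard sections with 1-based line ranges and word counts."""
    sections = []
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADING_RE.match(line)
        if match:
            if current is not None:
                sections.append(current)
            current = {
                "name": match.group(1).capitalize(),
                "start_line": number,
                "end_line": number,
                "body": [],
            }
        elif current is not None:
            current["body"].append(line)
            current["end_line"] = number
    if current is not None:
        sections.append(current)
    return [
        {
            "name": s["name"],
            "start_line": s["start_line"],
            "end_line": s["end_line"],
            "word_count": len(" ".join(s["body"]).split()),
        }
        for s in sections
    ]
```

Text that appears before the first recognized heading is ignored in this sketch.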

full_pipeline

{
  "file_path": "paper.docx",
  "use_llm": false,
  "tier2_provider": "gemma-local",
  "early_stop": true
}

Runs extraction, sectioning, Tier 1 statistics, conditional Tier 2 Gemma/Ollama, conditional Tier 3, then returns report_json and report_markdown.

CLI

python -m mcp_ai_detection.cli paper.tex --markdown report.md --json report.json

Configuration

Environment variables:

LOCAL_LLM_MODEL=gemma4:e4b
OLLAMA_HOST=http://localhost:11434
OLLAMA_KEEP_ALIVE=30m
HTTP_TIMEOUT_SECONDS=120
TIER1_LLM_WEIGHT=0.6
TIER1_STATS_WEIGHT=0.4

DETECTGPT_CMD=
FAST_DETECTGPT_CMD=
NPR_CMD=
METHODS_WEIGHT_REDUCTION=0.75

Tier 2 uses the local Ollama model named by LOCAL_LLM_MODEL. Recommended:

ollama pull gemma4:e4b
ollama serve

Check that Ollama is using the GPU:

ollama ps

The PROCESSOR column should show 100% GPU for loaded models.

External Tier 3 commands receive section text on stdin and should return JSON:

{
  "score": 0.72,
  "confidence": 0.64,
  "details": {
    "model": "your-detector"
  }
}

If commands are not configured, built-in proxy scorers keep the pipeline fully offline and deterministic.
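For the external command contract described above (section text on stdin, JSON on stdout), a minimal adapter could look like the following. The scoring logic is a toy placeholder based on word repetition, and the names score_text, main, and "toy-repetition" are hypothetical. A real adapter would call an actual detector at that point.

```python
import json
import sys


def score_text(text: str) -> dict:
    """Toy placeholder: use the share of repeated words as the score."""
    model = {"model": "toy-repetition"}  # hypothetical detector name
    words = text.lower().split()
    if not words:
        return {"score": 0.0, "confidence": 0.0, "details": model}
    repeated_share = 1.0 - len(set(words)) / len(words)
    # Assumption: confidence grows with text length and saturates at 200 words.
    confidence = min(1.0, len(words) / 200)
    return {
        "score": round(repeated_share, 4),
        "confidence": round(confidence, 4),
        "details": model,
    }


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one section from stdin and write a single JSON object to stdout."""
    json.dump(score_text(stdin.read()), stdout)
```

When saved as a standalone script, add the usual `if __name__ == "__main__": main()` guard at the end and point one of the Tier 3 command variables at it. Keeping the streams as parameters makes the adapter easy to exercise without a subprocess.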

Thresholds

  • < 0.3: low
  • 0.3 to < 0.6: medium
  • >= 0.6: high
  • Tier 2 early stop: probability < 0.4
  • Sections below 80 narrative words are marked insufficient_evidence and are excluded from the document-level narrative score

Methods sections get reduced Tier 3 weight by default to lower false positives from formulaic scientific prose.

Development

Run offline tests:

python -m unittest discover -s tests

Run lint if dev extras are installed:

ruff check .

gemma3:4b is a smaller fallback for slower machines:

LOCAL_LLM_MODEL=gemma3:4b
