mcp-common-crawl

MCP server for Common Crawl CDX that enables backlink discovery, expired domain finding, and competitor gap analysis without requiring API keys.

Built by Artur Ferreira @ The GEO Lab · 𝕏 @TheGEO_Lab · LinkedIn · Reddit

MCP server for Common Crawl CDX: backlink discovery, expired domain finder, competitor gap analysis. Free alternative to Ahrefs/Semrush backlink APIs ($100+/month).

Tools

| Tool | Description |
| --- | --- |
| discover_backlinks | Find backlinks to any domain across 3 CC indexes |
| find_expired | Search for expired/parked domains in a niche via CC CDX |
| check_domain | Deep single-domain check: live/expired/parked + CC page count |
| competitor_gap | Find domains linking to competitors but not to you |
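The idea behind competitor_gap can be read as a set difference: domains that refer to at least one competitor, minus domains that already refer to you. The sketch below is only an illustration of that idea. The function names `normaliseDomain` and `linkGap` are hypothetical, and this README does not show how the server actually computes the gap.

```typescript
// Hypothetical helpers for illustration; not taken from the mcp-common-crawl source.

// Normalise a hostname so "WWW.Example.com" and "example.com" compare equal.
function normaliseDomain(domain: string): string {
  return domain.trim().toLowerCase().replace(/^www\./, "");
}

// Return domains that link to at least one competitor but not to your site.
function linkGap(
  yourReferrers: string[],
  competitorReferrers: string[][],
): string[] {
  const mine = new Set(yourReferrers.map(normaliseDomain));
  const gap = new Set<string>();
  for (const referrers of competitorReferrers) {
    for (const domain of referrers) {
      const d = normaliseDomain(domain);
      if (!mine.has(d)) gap.add(d);
    }
  }
  // Sort so the output is stable regardless of input order.
  return Array.from(gap).sort();
}
```

Normalising before comparing keeps a referrer seen as both `www.b.com` and `b.com` from being reported as a gap.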

Features

✅ Production-tested: patterns used in production at TheGEOLab

Install

# Claude Code
claude mcp add common-crawl -- npx mcp-common-crawl

# Or in .mcp.json
{
  "mcpServers": {
    "common-crawl": {
      "command": "npx",
      "args": ["mcp-common-crawl"]
    }
  }
}

No API Keys Required

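Common Crawl is a free, open web archive. No API keys, no accounts, no paid tiers. The public index server is a shared resource and may throttle heavy use, so keep request volume modest.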
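To make the "no keys" point concrete, here is a minimal sketch of an unauthenticated CDX lookup. It assumes the public index list at `https://index.commoncrawl.org/collinfo.json` and the standard CDX query parameters `url`, `matchType`, `output`, and `limit`. The names `pickRecentIndexes`, `buildCdxUrl`, and `fetchCaptures` are hypothetical and may not match what this server does internally.

```typescript
// One entry of https://index.commoncrawl.org/collinfo.json (only the fields used here).
interface CrawlIndex {
  id: string;        // e.g. "CC-MAIN-2024-10"
  "cdx-api": string; // e.g. "https://index.commoncrawl.org/CC-MAIN-2024-10-index"
}

// Assumption: ids of the form CC-MAIN-YYYY-WW sort chronologically as plain strings.
function pickRecentIndexes(all: CrawlIndex[], count = 3): CrawlIndex[] {
  return [...all]
    .sort((a, b) => (a.id < b.id ? 1 : a.id > b.id ? -1 : 0))
    .slice(0, count);
}

// Build a CDX query URL for captures under a domain. No credentials are attached.
function buildCdxUrl(index: CrawlIndex, domain: string, limit = 100): string {
  const params = new URLSearchParams({
    url: domain,
    matchType: "domain",
    output: "json",
    limit: String(limit),
  });
  return `${index["cdx-api"]}?${params.toString()}`;
}

// Fetch captures with native fetch(). With output=json the server
// returns one JSON object per line rather than a single JSON array.
async function fetchCaptures(index: CrawlIndex, domain: string): Promise<unknown[]> {
  const res = await fetch(buildCdxUrl(index, domain));
  if (res.status === 404) return []; // treated here as "no captures found"
  if (!res.ok) throw new Error(`CDX request failed: ${res.status}`);
  const text = await res.text();
  return text
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}
```

A caller would load `collinfo.json` once, pass the parsed array to `pickRecentIndexes`, and call `fetchCaptures` for each of the returned indexes.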

Usage

> find backlinks to thegeolab.net using Common Crawl
> search for expired domains in the "seo tools" niche
> check if example.com is expired or parked
> find link gap between my site and competitors

Important Notes

  • Uses native fetch() for CC CDX (axios returns 404 on CC CDX, a known issue)
  • Queries the 3 most recent CC indexes for best coverage
  • Expired domain detection: a request that fails with ECONNREFUSED or ENOTFOUND marks the domain as expired; parked domains are detected by matching parked-page patterns in the response
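The detection rule in the last note can be sketched as follows. This is an illustration under assumptions, not the server's code: the phrases in `PARKED_PATTERNS` are hypothetical examples (the real pattern list is not shown in this README), and the function names are made up for the sketch. It relies on Node's native fetch() rejecting with an error whose `cause` carries the system error code.

```typescript
type DomainStatus = "expired" | "parked" | "live";

// Error codes the notes above treat as "expired".
const EXPIRED_CODES = new Set<string>(["ECONNREFUSED", "ENOTFOUND"]);

// Hypothetical parked-page phrases, for illustration only.
const PARKED_PATTERNS: RegExp[] = [
  /this domain is for sale/i,
  /buy this domain/i,
  /domain (is )?parked/i,
];

// Pull a system error code out of a fetch() rejection (or a plain Node error).
function errorCode(err: unknown): string | undefined {
  const e = err as { code?: string; cause?: { code?: string } } | null;
  return e?.cause?.code ?? e?.code;
}

// Map a connection failure to "expired", or undefined if the failure is something else.
function classifyFailure(err: unknown): DomainStatus | undefined {
  const code = errorCode(err);
  return code !== undefined && EXPIRED_CODES.has(code) ? "expired" : undefined;
}

// Classify a page that did load: parked if any pattern matches, otherwise live.
function classifyBody(html: string): DomainStatus {
  return PARKED_PATTERNS.some((p) => p.test(html)) ? "parked" : "live";
}

// Full check: try to load the home page, then apply the two rules above.
async function checkDomainStatus(domain: string): Promise<DomainStatus> {
  try {
    const res = await fetch(`http://${domain}/`);
    return classifyBody(await res.text());
  } catch (err) {
    const status = classifyFailure(err);
    if (status !== undefined) return status;
    throw err; // timeouts, TLS errors, etc. are left to the caller
  }
}
```

Keeping the two classifiers as pure functions makes the rule easy to test without touching the network.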

Attributions & Licence

Built and maintained by Artur Ferreira @ TheGEOLab.

Email: artur@thegeolab.net

Best Practice Attribution

This MCP server was built following the open source Best Practice Approach: reading community work for inspiration, then writing original content, and crediting every source.

Based on:

Data source:

Backlink analysis concepts inspired by:

  • Ahrefs: backlink discovery and competitor gap methodology
  • Semrush: backlink analytics and domain comparison
  • Majestic: historic backlink index concepts

Technical decisions:

  • Native fetch() used instead of axios for CC CDX queries (axios returns 404 on CC CDX from inside Express, a persistent debugging issue documented in geolab-backlinks)

All server code is original writing. No files were copied or adapted from any source. MIT licence.


Found this useful? ⭐ Star the repo and connect: 🌐 thegeolab.net · 𝕏 @TheGEO_Lab · LinkedIn · Reddit

Related Repos

Licence

MIT (see LICENSE)
