Shark-no-Kari

Shark-no-Kari

Remote MCP server for web scraping with anti-bot evasion. Provides stealth HTTP fetching, headless browser with Cloudflare bypass, CSS selectors, YouTube transcripts, and Markdown conversion.

Category
Visit Server

README

<div align="center">

Shark-no-Kari (狩り - the hunt)

Remote MCP server for web scraping with anti-bot evasion

Stealth HTTP fetching · Headless browser · Cloudflare bypass · CSS selectors · YouTube transcripts · Markdown conversion

License: MIT Python 3.12+ MCP Scrapling

Claude's built-in web_fetch fails on GitHub blob URLs, Cloudflare-protected sites, and JS-rendered pages. Shark-no-Kari is a remote MCP server that gives Claude powerful web tools powered by Scrapling — a fast HTTP fetcher with stealth headers, a real headless browser with anti-bot evasion, structured element extraction, and YouTube transcript fetching. Deploy it on any VPS with Docker, add the URL as a custom connector in claude.ai, and Claude can fetch pages that were previously unreachable.

</div>


Table of Contents


Tools

Tool Description Speed
fetch_page Fast HTTP request with stealth headers. Works for GitHub, docs, static pages ~1-2s
stealth_fetch_page Real headless browser with anti-bot evasion. Bypasses Cloudflare, renders JS ~5-15s
extract_elements Fetch a page and extract multiple elements via CSS selectors as structured JSON ~1-2s
get_youtube_transcript Fetch YouTube video transcripts/captions with language fallback ~1-3s

fetch_page and stealth_fetch_page support:

  • css_selector — extract specific elements instead of the full page
  • to_markdown — convert HTML to readable Markdown (default: true)
  • Automatic truncation at 80,000 characters

get_youtube_transcript supports:

  • Standard watch URLs, youtu.be short links, and /shorts/ URLs
  • Preferred language with automatic fallback to the first available language
  • Automatic truncation at 80,000 characters

Quick Start

# Create a directory and grab the required files
mkdir shark-no-kari && cd shark-no-kari
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/docker-compose.yml
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/Caddyfile

# Configure — edit .env with your domain and optional API key
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/.env.example
mv .env.example .env

# Start (pulls the pre-built image from ghcr.io)
docker compose up -d

Then add https://your-domain.com/mcp as a custom connector in claude.ai settings.


Deployment

Prerequisites

Requirement Details
VPS Ubuntu 24.04, 1 GB+ RAM (headless browser needs ~512 MB)
Domain A subdomain you can point to the server
Claude account Pro or Max (free accounts get 1 custom connector)

1. Provision a VPS

Any provider works (Hetzner, DigitalOcean, Vultr, Linode, etc.).

  • OS: Ubuntu 24.04
  • RAM: 1 GB minimum
  • Recommended: Hetzner Cloud CPX22 in Falkenstein (tested, low latency to Anthropic EU)

2. Point your domain

Create an A record for your subdomain (e.g. kari.snowy-burbot.com) pointing to the server's IPv4 address. Wait for DNS propagation.

3. Bootstrap the VPS

ssh root@<server-ip>
curl -sSL https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/scripts/setup-vps.sh | bash

Or run the commands from scripts/setup-vps.sh manually. This installs Docker and creates /opt/shark-no-kari.

4. Deploy

cd /opt/shark-no-kari

# Grab the required files
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/docker-compose.yml
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/Caddyfile

# Configure environment
curl -LO https://raw.githubusercontent.com/HaiNick/Shark-no-Kari/main/.env.example
mv .env.example .env
# Auth is disabled by default — Caddy IP allowlist handles security (see below)

# Start (pulls the pre-built image from ghcr.io)
docker compose up -d

First pull downloads ~2 GB (browser binaries are included in the image). Verify:

docker compose logs -f shark-no-kari
curl -s https://kari.snowy-burbot.com/mcp | head

5. Connect to Claude

  1. Open claude.ai
  2. Go to Settings -> Connectors
  3. Click Add custom connector
  4. Enter the URL: https://kari.snowy-burbot.com/mcp
  5. Click Add

The tools now appear in the Search and tools menu in any conversation.


Usage in Claude

Once connected, Claude can call the tools automatically when web_fetch fails, or you can ask directly:

"Use the scrapling fetcher to get the contents of https://github.com/user/repo/blob/main/README.md"

"Stealth-fetch https://example.com and extract all elements matching .product-card"


Security

IP Allowlisting

Authentication is disabled (MCP_API_KEY is empty). Instead, Caddy restricts access to Anthropic's outbound IP range via an IP allowlist. Only requests from claude.ai's servers can reach the MCP endpoint; all others receive a 403.

@blocked not remote_ip 160.79.104.0/21
respond @blocked 403

See Anthropic's IP ranges documentation for the latest list.

DNS Rebinding Protection Workaround

The MCP SDK includes DNS rebinding protection that rejects requests where the Host header doesn't match the server's expected hostname. Since Caddy proxies to the Docker container as shark-no-kari:8000, the Host header won't match. The Caddyfile includes a header rewrite to satisfy the SDK's check:

reverse_proxy shark-no-kari:8000 {
    header_up Host localhost:8000
}

Optional Bearer Token Auth

If you prefer token-based auth (e.g. when not using Caddy), set MCP_API_KEY in .env:

MCP_API_KEY=$(openssl rand -hex 32)

Every request must then include an Authorization: Bearer <key> header. Claude sends this automatically if you configure it in the connector's advanced settings.

For a stronger alternative that gives you per-user login and consent, see OAuth (OIDC) Authentication below.


OAuth (OIDC) Authentication

Shark-no-Kari supports optional OAuth 2.1 authentication federated to a self-hosted Pocket ID instance via OIDC. When enabled, Claude opens a browser tab the first time it connects and you log in through Pocket ID. The MCP server issues its own short-lived JWTs to Claude; Pocket ID is only involved during the browser-based login.

Auth Mode Comparison

Mode How it works When to use
IP allowlist (default) Caddy blocks all non-Anthropic IPs. No token needed. Simplest. Fine if you only use claude.ai.
Bearer token (MCP_API_KEY) Static secret in the Authorization header. When you need a token-gated endpoint without OIDC.
OIDC (OIDC_ENABLED=true) OAuth 2.1 + Pocket ID browser login. Per-user consent, short-lived tokens. Strongest. Recommended for multi-user or internet-facing deployments.

OIDC_ENABLED and MCP_API_KEY are mutually exclusive. The server raises a startup error if both are set.

Pocket ID Client Setup

In your Pocket ID admin panel, create a new OIDC application:

Field Value
Name Shark-no-Kari (or any label)
Redirect URI https://kari.snowy-burbot.com/auth/callback
Scopes openid, profile, email
PKCE Enabled (S256)
Public client Off (confidential client with a secret)

Copy the generated Client ID and Client Secret for the next step.

Enable OIDC in .env

Generate the two key material values:

# JWT signing key (for fastmcp's own tokens issued to MCP clients):
openssl rand -hex 32

# Fernet encryption key (for the on-disk DCR registration store):
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Then set these variables in .env:

OIDC_ENABLED=true
MCP_API_KEY=                          # must be empty when OIDC is on
OIDC_CONFIG_URL=https://id.snowy-burbot.com/.well-known/openid-configuration
OIDC_CLIENT_ID=<paste from Pocket ID>
OIDC_CLIENT_SECRET=<paste from Pocket ID>
OIDC_BASE_URL=https://kari.snowy-burbot.com
JWT_SIGNING_KEY=<output of openssl command>
STORAGE_ENCRYPTION_KEY=<output of python command>

Restart the stack: docker compose up -d

On the first connection from Claude, it will open a browser tab. Log in with your Pocket ID account and click Allow. Subsequent connections reuse the cached token silently.

Verification

After deploying with OIDC enabled, run scripts/verify-oidc.sh:

bash scripts/verify-oidc.sh https://kari.snowy-burbot.com

Expected output shape:

==> Protected resource metadata
{
  "resource": "https://kari.snowy-burbot.com",
  "authorization_servers": ["https://kari.snowy-burbot.com"],
  ...
}

==> Authorization server metadata
{
  "issuer": "https://kari.snowy-burbot.com",
  "authorization_endpoint": "https://kari.snowy-burbot.com/authorize",
  "token_endpoint": "https://kari.snowy-burbot.com/token",
  ...
}

==> Unauthenticated /mcp must 401 with WWW-Authenticate
www-authenticate: Bearer realm="..."

If the third check produces no output, the server is not returning a WWW-Authenticate header, which means OIDC is not active.


Architecture

Tech Stack

Layer Technology Why
MCP Server FastMCP (PrefectHQ) MCP SDK with built-in OAuth 2.1 / OIDCProxy
Scraping Scrapling Stealth headers + headless browser (Camoufox)
HTML→MD html2text Clean Markdown conversion
ASGI Server uvicorn Fast async Python server
Reverse Proxy Caddy Auto HTTPS, IP allowlisting, OAuth path pass-through
Auth (OIDC) OIDCProxy + Pocket ID OAuth 2.1 with self-hosted OIDC provider
OAuth store py-key-value-aio FileTreeStore + Fernet encryption Encrypted on-disk DCR registration persistence
Deployment Docker + Docker Compose Single command setup

Project Structure

Shark-no-Kari/
├── src/
│   └── server.py            MCP server (FastMCP + Scrapling tools + optional OIDC auth)
├── tests/
│   └── test_server.py       pytest test suite for all tools + OIDC startup tests
├── scripts/
│   ├── setup-vps.sh         Bootstrap script for a fresh Ubuntu VPS
│   └── verify-oidc.sh       Smoke-test script for OIDC endpoints
├── .github/
│   └── workflows/
│       └── docker-publish.yml  CI: build & push Docker image to ghcr.io
├── Caddyfile                 Reverse proxy config (auto HTTPS + IP allowlist + OAuth paths)
├── docker-compose.yml        Orchestrates MCP server + Caddy + oauth_state volume
├── Dockerfile                Builds the MCP server image (Python 3.12 + browser deps)
├── pyproject.toml            pytest configuration
├── requirements.txt          Python dependencies
├── .env.example              Environment variable template (includes OIDC vars)
├── LICENSE                   MIT
└── .gitignore

How It Works

                    claude.ai
                       │
                       │ HTTPS (Streamable HTTP MCP)
                       ▼
                ┌─────────────┐
                │    Caddy    │  Auto HTTPS + IP allowlist (160.79.104.0/21)
                │   :80/:443  │  header_up Host localhost:8000
                └──────┬──────┘
                       │
                       ▼
              ┌──────────────────┐
              │  shark-no-kari   │  FastMCP server (uvicorn :8000)
              │                  │
              │  fetch_page()    │  → Scrapling Fetcher (stealth HTTP)
              │  stealth_fetch() │  → Scrapling StealthyFetcher (headless browser)
              │  extract_elems() │  → Multi-selector structured extraction
              │  yt_transcript() │  → YouTube transcript API
              │                  │
              │  html2text       │  → Markdown conversion + truncation
              └────────┬─────────┘
                       │ PROXY_URL fallback (on direct fetch failure)
                       ▼
              ┌──────────────────┐
              │  nordlynx-proxy  │  NordVPN WireGuard tunnel (sidecar)
              │  kari-nordlynx   │  dante SOCKS5 :1080 + tinyproxy :8888
              └──────────────────┘

Configuration

Environment Variables

Variable Default Description
MCP_API_KEY (empty) Bearer token for auth. Empty = disabled (use Caddy IP allowlist). Cannot be set together with OIDC_ENABLED=true.
PROXY_URL socks5h://kari-nordlynx:1080 SOCKS5 proxy fallback — retries via proxy when direct requests fail. Defaults to the bundled nordlynx-proxy sidecar container
NORDVPN_TOKEN (empty) NordVPN access token from Nord Account > Manual setup. Required for proxy fallback via nordlynx-proxy
NORDVPN_COUNTRY Germany NordVPN server country used by nordlynx-proxy for the WireGuard tunnel
HOST 0.0.0.0 Server bind address
PORT 8000 Server port
OIDC_ENABLED false Set to true to activate OAuth 2.1 via OIDC. All OIDC_* vars and JWT_SIGNING_KEY, STORAGE_ENCRYPTION_KEY are required when enabled.
OIDC_CONFIG_URL (empty) URL of the OIDC provider's discovery document (e.g. https://id.example.com/.well-known/openid-configuration)
OIDC_CLIENT_ID (empty) Client ID from your Pocket ID application
OIDC_CLIENT_SECRET (empty) Client secret from your Pocket ID application
OIDC_BASE_URL (empty) Public base URL of this server (e.g. https://kari.snowy-burbot.com). Used for redirect URIs.
JWT_SIGNING_KEY (empty) Secret for signing JWTs issued to MCP clients. Generate with openssl rand -hex 32.
STORAGE_ENCRYPTION_KEY (empty) Fernet key for encrypting the on-disk OAuth client store. Generate with python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())".

Caddyfile

The Caddyfile is mounted read-only into the Caddy container. Edit it to change your domain or IP allowlist:

kari.snowy-burbot.com {
    @blocked not remote_ip 160.79.104.0/21
    respond @blocked 403

    reverse_proxy shark-no-kari:8000 {
        header_up Host localhost:8000
    }
}

Local Development

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
scrapling install    # Downloads browser binaries (Camoufox, Playwright)

# Run the server (no auth for local dev)
python src/server.py
# Server starts on http://localhost:8000/mcp

Test with the MCP Inspector:

npx @modelcontextprotocol/inspector http://localhost:8000/mcp

Docker

Run

docker compose up -d            # Pull and start in background
docker compose logs -f          # View logs
docker compose down             # Stop

Images are built automatically by CI and pushed to ghcr.io/hainick/shark-no-kari.

Services

Service Image Ports Purpose
shark-no-kari ghcr.io/hainick/shark-no-kari:latest 8000 (internal) MCP server
nordlynx-proxy edgd1er/nordlynx-proxy:latest 1080, 8888 (internal) NordVPN WireGuard tunnel + local SOCKS5/HTTP proxy
caddy caddy:2-alpine 80, 443 Reverse proxy, auto HTTPS, ACL

Dockerfile

Multi-step build:

  1. Install system libraries required by headless browsers (NSS, GTK, etc.)
  2. Install Python dependencies from requirements.txt
  3. Run scrapling install to download browser binaries (Camoufox, Playwright)
  4. Copy src/ and start with python src/server.py

The container uses shm_size: 512mb for headless browser shared memory.


Testing

# Install dev dependencies
pip install -r requirements.txt

# Run the test suite
pytest tests/ -v

The test suite covers all four tools with mocked external calls (no network requests needed).


Resource Usage

State RAM
Idle ~80 MB
During fetch_page ~120 MB
During stealth_fetch_page ~400-600 MB

1 GB RAM handles this fine for personal use. For concurrent stealth requests, consider 2 GB+.


Troubleshooting

Build fails downloading browser binaries

The scrapling install step downloads ~200 MB of browser binaries. If it times out, retry:

docker compose build --no-cache

stealth_fetch_page crashes with shared memory errors

Ensure shm_size: 512mb is set in docker-compose.yml (it is by default). Headless browsers need more shared memory than Docker's default 64 MB.

Caddy returns 403

Your IP is not in the allowlist. If testing from your machine, temporarily add your IP to the Caddyfile:

@blocked not remote_ip 160.79.104.0/21 YOUR.IP.HERE/32
respond @blocked 403

MCP SDK rejects requests with "Host header mismatch"

The header_up Host localhost:8000 directive in the Caddyfile handles this. If you've customised the Caddyfile, ensure the header rewrite is present. See DNS Rebinding Protection Workaround.

Sites block requests with HTTP/2 PROTOCOL_ERROR

Some sites (e.g. xda-developers.com) reject requests from datacenter IPs at the protocol level. The stack ships with a bundled nordlynx-proxy sidecar container that runs the NordVPN Linux client internally, opens a NordLynx (WireGuard) tunnel, and exposes a local SOCKS5 proxy on port 1080 that only the shark-no-kari container can reach over the Docker network. Requests are first tried directly, and only retried through the proxy if the direct request fails or returns a non-200 status.

To enable it, set a NordVPN access token in .env (get one from Nord Account > Manual setup by clicking "Get Access token"):

NORDVPN_TOKEN=your-access-token-here
NORDVPN_COUNTRY=Germany
PROXY_URL=socks5h://kari-nordlynx:1080

Use socks5h:// (not socks5://) so the proxy handles DNS resolution. If you don't want to run a proxy, leave NORDVPN_TOKEN empty and remove or comment out the nordlynx-proxy service and the depends_on block in docker-compose.yml.

nordlynx-proxy fails to connect

  • Check logs: docker compose logs kari-nordlynx for auth errors
  • Verify the token is valid at https://my.nordaccount.com/dashboard/nordvpn/manual-configuration/
  • Try a different country in .env: NORDVPN_COUNTRY=Netherlands
  • The container needs privileged: true, NET_ADMIN + SYS_MODULE capabilities, and the net.ipv6.conf.all.disable_ipv6 + net.ipv4.conf.all.rp_filter sysctls (all preconfigured in docker-compose.yml). Privileged mode is required by the Debian-based nordlynx-proxy image because the NordVPN client writes sysctls itself at connect time

Claude doesn't see the tools

  1. Verify the server is running: curl -s https://your-domain.com/mcp | head
  2. Check the connector status in claude.ai Settings -> Connectors
  3. Try removing and re-adding the connector

Acknowledgements

  • Scrapling — the scraping engine that powers both tools (stealth HTTP + headless browser via Camoufox)
  • FastMCP — official Python MCP SDK that makes building MCP servers painless
  • Caddy — automatic HTTPS and dead-simple reverse proxy config
  • html2text — clean HTML-to-Markdown conversion
  • uvicorn — fast ASGI server
  • youtube-transcript-api — YouTube transcript/caption fetching
  • nordlynx-proxy — NordVPN WireGuard tunnel in Docker with local SOCKS5/HTTP proxy

License

This project is licensed under the MIT License.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured