web-speed-agent

web-speed-agent

The MCP for the Web Speed Agent SDK that enables post-auth agents.

Category
Visit Server

README

web-speed-agent

PyPI version Python License: GPL v3

Local browser automation + Web Speed API integration for authenticated web extraction.

Point an AI agent at any website — including ones that require login — and get back clean, structured data. Credentials stay on your machine. Only extracted HTML goes to the server.

pip install web-speed-agent
playwright install chromium

Want to use this with Claude, Gemini, or other AI clients?

Check out the MCP Server Installation Guide — it's the easiest way to let AI agents log in and extract data through natural language.


How it works

Your machine                           Web Speed server
─────────────────────────────────      ──────────────────────────
Playwright browser (local)
  ↓ navigates, logs in, clicks
  ↓ gets page HTML
  ↓ (no passwords sent)
agent.extract(html)         ────────→  Advanced extraction engine
                            ←────────  Structured JSON

Credentials never leave your machine. The server only sees HTML.


Quickstart

import asyncio
from web_speed_agent import Agent

async def main():
    agent = Agent(api_key="wsp_...")       # or set WEBSPEED_API_KEY env var

    # Public pages — no browser needed
    result = await agent.map("https://techcrunch.com/some-article/")
    print(result["article"]["sections"])

    # Authenticated pages — browser runs locally
    agent.store_credential("mysite", "me@example.com", "mypassword")

    async with agent.browser(session_name="mysite") as browser:
        page = await browser.new_page()
        await page.goto("https://mysite.com/login")

        username, password = agent.get_credential("mysite")
        await page.fill('[name="email"]', username)
        await page.fill('[name="password"]', password)
        await page.click('button[type="submit"]')
        await page.wait_for_load_state("networkidle")

        # Now on a logged-in page — extract it
        html = await page.content()
        result = await agent.extract(html, page_type="listing")
        print(result["listing"]["items"])

asyncio.run(main())

Get an API key at getwebspeed.io.


Installation

Requirements: Python 3.10+, a Web Speed API key

pip install web-speed-agent
playwright install chromium
export WEBSPEED_API_KEY="wsp_..."

Core concepts

Agent

The main class. Manages credentials, browser sessions, and API calls.

from web_speed_agent import Agent

# API key from argument
agent = Agent(api_key="wsp_...")

# API key from environment variable (recommended)
# export WEBSPEED_API_KEY="wsp_..."
agent = Agent()

# Use as async context manager (auto-closes HTTP client)
async with Agent() as agent:
    ...

Extracting public pages

No browser needed for pages that don't require login:

# Fetch + extract in one call
result = await agent.map("https://example.com/article")

# With JavaScript rendering (for heavy SPAs)
result = await agent.map("https://example.com/spa", js=True)

Extracting authenticated pages

Use a local browser session. The browser runs on your machine:

async with agent.browser(session_name="mysite") as browser:
    page = await browser.new_page()
    await page.goto("https://mysite.com/dashboard")
    html = await page.content()

result = await agent.extract(html)

The session_name persists cookies to ~/.webspeed/sessions/<name>/ so subsequent runs skip the login step.


Credential management

Credentials are stored in your system keychain (macOS Keychain, Windows Credential Manager, Linux secret-tool). They are never sent to Web Speed servers.

# Store once
agent.store_credential("mysite", "me@example.com", "mypassword")

# Retrieve anywhere
username, password = agent.get_credential("mysite")

# Remove
agent.delete_credential("mysite")

Extraction output

The server returns page-type-aware structured data:

# Article
result = await agent.extract(html, page_type="article")
# result["page_type"]    → "article"
# result["title"]        → "Article Title"
# result["author"]       → "Jane Smith"
# result["published_date"] → "2026-05-06"
# result["article"]["sections"] → [{"heading": "...", "paragraphs": [...]}]
# result["article"]["links"]    → [{"text": "...", "url": "..."}]

# Product
result = await agent.extract(html, page_type="product")
# result["product"]["name"]         → "Wireless Headphones"
# result["product"]["price"]        → "$99.99"
# result["product"]["availability"] → "In Stock"
# result["product"]["rating"]       → "4.5"
# result["product"]["specs"]        → {"Battery": "30h", ...}

# Listing (search results, category pages)
result = await agent.extract(html, page_type="listing")
# result["listing"]["items"] → [{"title": "...", "url": "...", "price": "..."}]

# Auto-detect (default)
result = await agent.extract(html)
# result["page_type"] → "article" | "product" | "listing" | "other"

All results include engine: "advanced" — 60–85% more token-efficient than raw HTML.


Examples

Price monitor

import asyncio
from web_speed_agent import Agent

async def check_price(url: str, site_name: str) -> str:
    async with Agent() as agent:
        agent.store_credential(site_name, "me@example.com", "password", overwrite=True)

        async with agent.browser(session_name=site_name) as browser:
            page = await browser.new_page()

            # Login
            await page.goto(f"https://{site_name}.com/login")
            user, pwd = agent.get_credential(site_name)
            await page.fill('[name="email"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Check product
            await page.goto(url)
            await page.wait_for_load_state("networkidle")
            html = await page.content()

        result = await agent.extract(html, page_type="product")
        return result.get("product", {}).get("price", "unknown")

price = asyncio.run(check_price("https://example.com/product/123", "example"))
print(f"Current price: {price}")

Read a private dashboard

import asyncio
from web_speed_agent import Agent

async def get_dashboard_data():
    async with Agent() as agent:
        async with agent.browser(session_name="analytics") as browser:
            page = await browser.new_page()

            # Login (first run only — session persists after)
            creds = agent.get_credential("analytics")
            if not creds:
                agent.store_credential("analytics", "me@company.com", "password")
                creds = agent.get_credential("analytics")

            await page.goto("https://analytics.company.com/login")
            await page.fill('[name="email"]', creds[0])
            await page.fill('[name="password"]', creds[1])
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Navigate to dashboard
            await page.goto("https://analytics.company.com/dashboard")
            await page.wait_for_selector(".metrics-table", timeout=10000)
            html = await page.content()

        result = await agent.extract(html)
        return result

asyncio.run(get_dashboard_data())

Multi-page scrape while logged in

import asyncio
from web_speed_agent import Agent

async def scrape_inbox():
    async with Agent() as agent:
        async with agent.browser(session_name="webmail") as browser:
            page = await browser.new_page()

            # Login
            await page.goto("https://mail.example.com/login")
            user, pwd = agent.get_credential("webmail")
            await page.fill('[name="username"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Scrape multiple pages
            emails = []
            for page_num in range(1, 4):
                await page.goto(f"https://mail.example.com/inbox?page={page_num}")
                await page.wait_for_load_state("networkidle")
                html = await page.content()
                result = await agent.extract(html, page_type="listing")
                emails.extend(result.get("listing", {}).get("items", []))

        return emails

asyncio.run(scrape_inbox())

AI agent integration (MCP)

The included MCP server lets Claude Desktop, Gemini CLI, and any MCP-compatible agent use the SDK directly. The agent can log in, navigate, click, and extract — all through natural language.

Start the MCP server:

WEBSPEED_API_KEY="wsp_..." python3 agent_mcp_server.py

Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_..."
      }
    }
  }
}

Add to Gemini CLI (~/.gemini/settings.json):

{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3.11",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_...",
        "PYTHONPATH": "/path/to/web-speed-agent"
      }
    }
  }
}

Then tell the agent:

"Store my credentials for united — username me@example.com, password mypassword"

"Log into united.com and find me the cheapest flight from SFO to JFK next Friday"

Available MCP tools:

Tool Description
store_credential Save login to system keychain
login Open browser + sign in
navigate Go to a URL in the active session
extract_page Get structured data from current page
click Click a button or link
fill_field Type into a form field
submit_form Submit a form
close_browser End the browser session
account_info Check API credit balance

API reference

Agent

Agent(
    api_key: str | None = None,
    server_url: str | None = None,
    config_dir: str = "~/.webspeed",
    headless: bool = True,
)
Parameter Description
api_key Web Speed API key. Falls back to WEBSPEED_API_KEY env var.
server_url Override API server URL. Default: https://api.getwebspeed.io.
config_dir Directory for config, sessions, and logs. Default: ~/.webspeed.
headless Run browser headlessly. Default: True.

agent.browser()

agent.browser(
    session_name: str | None = None,
    headless: bool | None = None,
    proxy: str | None = None,
) -> ManagedBrowser

Returns an async context manager. Inside the block, call .new_page() to get a Playwright Page.

Parameter Description
session_name Persist cookies to ~/.webspeed/sessions/<name>/. None = no persistence.
headless Override instance headless for this session.
proxy Proxy URL e.g. "socks5://localhost:1080".

Session names must be alphanumeric + hyphens/underscores, max 64 chars.


agent.extract()

await agent.extract(
    html: str,
    page_type: str = "auto",
) -> dict

Sends HTML to the Web Speed API. Costs 1 credit.

Parameter Description
html Raw HTML string (e.g. from page.content()).
page_type "article", "product", "listing", or "auto".

agent.map()

await agent.map(
    url: str,
    js: bool = False,
) -> dict

Fetches and extracts a public URL via the server. No local browser needed. Costs 1 credit.

Parameter Description
url Page URL. Must be http:// or https://.
js Render JavaScript before extracting.

agent.account()

await agent.account() -> dict

Returns: credits, tier, status, lifetime (total/hits/misses).


agent.store_credential()

agent.store_credential(
    site: str,
    username: str,
    password: str,
    overwrite: bool = False,
) -> None

Saves to system keychain. Raises CredentialError if credential exists and overwrite=False.


agent.get_credential()

agent.get_credential(site: str) -> tuple[str, str] | None

Returns (username, password) or None if not found.


agent.delete_credential()

agent.delete_credential(site: str) -> None

Removes credential from keychain.


Exceptions

from web_speed_agent import (
    WebSpeedError,          # Base exception
    AuthenticationError,    # Invalid/missing API key
    InsufficientCreditsError, # No credits remaining
    APIError,               # API returned 4xx/5xx
    RateLimitError,         # 429 Too Many Requests
    CredentialError,        # Keychain error
    BrowserError,           # Playwright error
    NetworkError,           # Timeout or DNS failure
    PlaywrightNotInstalledError, # Run: playwright install chromium
)
from web_speed_agent import Agent, InsufficientCreditsError, NetworkError

try:
    result = await agent.extract(html)
except InsufficientCreditsError:
    print("Out of credits — top up at getwebspeed.io")
except NetworkError as e:
    print(f"Connection failed: {e}")

Configuration

Environment variables

Variable Description
WEBSPEED_API_KEY API key (recommended over config file)
WEBSPEED_SERVER_URL Override server URL (must be https://)

Config file

~/.webspeed/config.yaml — created automatically on first run. Permissions set to 0o600 (owner-only).

api:
  server_url: https://api.getwebspeed.io
  timeout: 30

browser:
  headless: true

Session files

Persisted browser sessions are stored in ~/.webspeed/sessions/<name>/storage.json.

  • Permissions: 0o600 (owner-only)
  • Contains: cookies, localStorage, sessionStorage
  • Safe to delete: agent will re-authenticate on next run

Security

What leaves your machine

When you call agent.extract(html), the page HTML is sent to the Web Speed API for processing. Everything else stays local.

Data Where it goes
Login credentials Never leave your machine (system keychain only)
Browser cookies / session Never leave your machine (local Playwright)
Page HTML Sent over HTTPS to Web Speed API for extraction
Extracted JSON Returned to you

HTML scrubbing (on by default)

Before any HTML is transmitted, the SDK automatically scrubs it locally:

  • Inline <script> and <style> blocks removed
  • Hidden form fields with auth-related names (csrf, token, nonce, session, etc.) have their values blanked
  • Sensitive <meta> content attributes cleared
  • HTML comments removed

Visible content — text, links, tables, headings, product data — is untouched.

# Default: scrubbing is on
result = await agent.extract(html)

# Turn off only if the page has no sensitive data
result = await agent.extract(html, scrub=False)

# Or scrub manually and inspect before sending
from web_speed_agent import scrub
clean_html = scrub(raw_html)
print(clean_html)  # inspect what will be sent
result = await agent.extract(clean_html, scrub=False)

Server-side data handling

  • HTML processed in-memory only — never written to disk, never logged, never cached
  • Auth-gated pages never cached — pages requiring login are explicitly excluded from the shared registry
  • Usage logs store only: a hash of your API key, a hash of the URL (or "sdk-extract"), timestamp, and detected page type — no content
  • No raw HTML in error responses — exceptions are sanitized before any error is returned

Other protections

  • Credentials stored in system keychain, never in files, never sent to servers
  • Session files written with 0o600 permissions (owner-only read/write)
  • Config directory created with 0o700 permissions
  • TLS always verifiedverify=True on all HTTP calls, cannot be disabled
  • HTTPS enforcedserver_url must start with https://, plain HTTP rejected
  • Path traversal prevention — session names validated against [a-zA-Z0-9_-] allowlist
  • No credential logging — passwords never appear in logs or error messages

License

GNU General Public License v3.0 — see LICENSE.

Web Speed API usage is subject to the Web Speed Terms of Service.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured