merch-connector
An MCP server that gives AI agents eyes on any e-commerce storefront, enabling scraping, analysis, and comparison through the Model Context Protocol.
Scrape product listings, extract facets, badges, sort options, and B2B signals; run AI-powered merchandising audits; compare two storefronts side-by-side; detect what changed between visits; and build persistent memory about sites — all through the Model Context Protocol.
Why merch-connector?
E-commerce merchandising analysis is manual, repetitive, and fragmented. A merchandiser might spend hours clicking through competitor sites, checking if filters work, comparing product grids, and noting what's changed. AI agents can do this work — but they can't see storefronts the way shoppers do.
merch-connector bridges that gap. It gives any MCP-compatible AI agent (Claude, custom agents, etc.) the ability to:
- Browse any storefront with a stealth headless browser that handles bot protection
- Extract structured product data, facets, performance metrics, and page structure
- Analyze merchandising quality through five expert personas or a full roundtable debate
- Remember site quirks across sessions so the agent gets smarter over time
- Track changes across visits — new products, price moves, facet/sort changes
Quick start
```bash
npx merch-connector
```
The server communicates over stdio and is designed to be launched by an MCP client, not run standalone.
Configuration
Add to your Claude Desktop claude_desktop_config.json or Claude Code .mcp.json:
```json
{
  "mcpServers": {
    "merch-connector": {
      "command": "npx",
      "args": ["-y", "merch-connector"],
      "env": {
        "ANTHROPIC_API_KEY": "your_key_here"
      }
    }
  }
}
```
To enable Firecrawl (a fallback scraper for bot-protected sites like Ferguson/Akamai) or pass any other env vars, add them to the `env` block:
```json
"env": {
  "ANTHROPIC_API_KEY": "your_key_here",
  "FIRECRAWL_API_KEY": "fc-..."
}
```
Or install globally: npm install -g merch-connector
Environment variables
| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | One of these | Anthropic Claude API key |
| `GEMINI_API_KEY` | One of these | Google Gemini API key |
| `OPENAI_API_KEY` | One of these | OpenAI or OpenAI-compatible API key |
| `OPENAI_BASE_URL` | No | Base URL for OpenAI-compatible endpoint. Defaults to `https://api.openai.com/v1` |
| `MODEL_PROVIDER` | No | Force `"anthropic"`, `"gemini"`, `"openai"`, or `"ollama"`. Auto-detected if omitted. |
| `MODEL_NAME` | No | Override default model. Required when using Ollama. |
| `OPENAI_VISION` | No | Set `"true"` to pass screenshots to OpenAI-compatible vision models |
| `FIRECRAWL_API_KEY` | No | Enables Firecrawl as a fallback scraper in `acquire`; only used when Puppeteer is blocked by a WAF (0 products + FCP=0). Puppeteer always runs first. |
| `MERCH_CONNECTOR_DATA_DIR` | No | Custom path for site memory files. Default: `~/.merch-connector/data/` |
| `TOOL_TIMEOUT_MS` | No | AI tool timeout in ms. Default: `120000` (2 min) |
| `MERCH_LOG_FILE` | No | Path to NDJSON log file. If set, every server log entry is appended. |
| `LIGHTPANDA_CDP_URL` | No | Connect to an external Lightpanda/Chrome CDP endpoint instead of launching Puppeteer's bundled Chromium. Server-side optimization only; standard npx users can ignore this. |
You only need an API key for AI-powered tools (ask_page, merch_roundtable, analyze_products). Scraping tools work without one.
Using Ollama (local models)
```json
{
  "mcpServers": {
    "merch-connector": {
      "command": "npx",
      "args": ["-y", "merch-connector"],
      "env": {
        "MODEL_PROVIDER": "ollama",
        "MODEL_NAME": "qwen2.5:14b",
        "OPENAI_BASE_URL": "http://localhost:11434/v1"
      }
    }
  }
}
```
AI analysis tools degrade gracefully if no provider is configured — scraping still works, analysis returns an error instead of crashing.
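As a rough illustration of how provider selection from these variables could work, here is a minimal sketch. The function name `detectProvider` and the key precedence (Anthropic, then Gemini, then OpenAI) are assumptions made for this example; the server's actual auto-detection order is not documented here.

```javascript
// Hypothetical helper, not the server's actual code.
// Assumption: when MODEL_PROVIDER is unset, keys are checked in the
// order Anthropic -> Gemini -> OpenAI. The real precedence may differ.
function detectProvider(env) {
  const forced = (env.MODEL_PROVIDER || "").toLowerCase();
  if (forced) {
    const known = ["anthropic", "gemini", "openai", "ollama"];
    if (!known.includes(forced)) {
      throw new Error(`Unknown MODEL_PROVIDER: ${forced}`);
    }
    // The environment variable table says MODEL_NAME is required for Ollama.
    if (forced === "ollama" && !env.MODEL_NAME) {
      throw new Error("MODEL_NAME is required when MODEL_PROVIDER=ollama");
    }
    return forced;
  }
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.GEMINI_API_KEY) return "gemini";
  if (env.OPENAI_API_KEY) return "openai";
  // No provider configured: scraping tools can still run,
  // and analysis tools should report an error instead of crashing.
  return null;
}
```

A caller could check for `null` before invoking an analysis tool and skip straight to the scrape-only path.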
Tools
| Tool | Description | Needs AI key? |
|---|---|---|
| `acquire` | Primary scraping tool. One-pass audit payload: products, facets, screenshots, performance, trust signals, navigation, data quality, analytics, and PDP samples in a single call | No |
| `analyze_products` | Run persona analysis on pre-scraped data. Pass a products/facets JSON payload (from `acquire`, a CSV export, or any source) and get the full 5-persona analysis without touching a browser | Yes |
| `merch_roundtable` | Three expert personas analyze in parallel, then a moderator synthesizes consensus (results stream as each persona completes) | Yes |
| `ask_page` | Scrape a page and ask any question about it in plain language | Yes |
| `compare_storefronts` | Structured side-by-side diff of two URLs: facet gaps, trust signals, sort options, B2B mode, performance | No |
| `scrape_pdp` | Scrape a single product detail page: description fill rate, image count, reviews, spec table, cross-sell modules, CTA text, price | No |
| `get_category_sample` | Sample PDPs from a category page using spread/random/top strategy | No |
| `interact_with_page` | Execute one or more search/click actions in sequence, then extract the result | No |
| `site_memory` | Read/write persistent notes and learned data about any domain | No |
| `clear_session` | Reset stored cookies and page cache for a domain | No |
| `save_eval` | Persist a roundtable run as a structured eval record with convergence score | No |
| `list_evals` | Retrieve eval history for a domain or all domains | No |
| `get_logs` | Retrieve recent server log entries from the in-memory buffer, filterable by level or tool name | No |
| `scrape_page` | (Deprecated; use `acquire`) Raw structured extraction from any category page | No |
Examples
acquire
Pull everything needed for a full storefront audit in one call
```json
{
  "url": "https://www.zappos.com/women/CK_XARC81wHAAQHiAgMBAhg.zso",
  "pdp_sample": 2
}
```
Returns the complete audit payload: products with trust signals, facets, sort, navigation structure, data quality scores, analytics platform detection, performance timings, desktop + mobile screenshots, and 2 sampled PDPs — ready for the plugin to score.
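For readers driving the server without a ready-made client, the sketch below shows how those arguments would be wrapped in an MCP `tools/call` JSON-RPC request. The helper name `buildToolCall` is made up for this example, and a real client must complete the MCP `initialize` handshake before sending tool calls.

```javascript
// Hypothetical helper: wraps tool arguments in a JSON-RPC 2.0 "tools/call" request.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "acquire", {
  url: "https://www.zappos.com/women/CK_XARC81wHAAQHiAgMBAhg.zso",
  pdp_sample: 2,
});

// On the stdio transport each message is a single line of JSON
// terminated by a newline.
const line = JSON.stringify(request) + "\n";
```

In practice an MCP client library or the MCP Inspector builds this framing for you; the sketch is only meant to show what goes over the wire.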
ask_page
"Recommend facet changes for this laptop category page"
```json
{
  "url": "https://www.insight.com/en_US/shop/category/notebooks/store.html",
  "question": "Recommend facet changes?"
}
```
Brand/Manufacturer — Most glaring omission. 50 products span 6+ brands (HP, Lenovo, Apple, Microsoft, Dell, Crucial). B2B buyers with vendor agreements need this as facet #1.
Price range buckets are misaligned. "Below $50" (2 items) signals category contamination — confirmed by a Crucial RAM stick appearing in laptop results. Clean up category mapping and re-bucket starting at $500.
merch_roundtable
The roundtable scrapes once, then runs three AI analyses in parallel followed by a moderator synthesis:
- Floor Walker — reacts as a real shopper ("I can't find Dell laptops without scrolling through 50 products")
- Auditor — evaluates Trust/Guidance/Persuasion/Friction ("0% facet detection rate, title normalization at 70%")
- Scout — identifies competitive gaps ("every competitor in B2B tech has brand filtering as facet #1")
- Moderator — synthesizes consensus, surfaces disagreements, produces prioritized recommendations
B2B Auditor automatically substitutes for Auditor when B2B signals are detected.
Personas
Five expert lenses for merchandising analysis. Use individually via ask_page or merch_roundtable.
| Persona | Role | Voice |
|---|---|---|
| Floor Walker | A shopper visiting for the first time | First-person, casual, instinctive — "I don't know what button to click" |
| Auditor | Compliance analyst with a framework | Metric-driven, precise — "Fill rate is 82%, 3/10 titles lack brand prefix" |
| Scout | VP of Merchandising at a competitor | Strategic, comparative — "This is table-stakes for the category" |
| B2B Auditor | Procurement buyer evaluating a vendor | Process-driven — scores steps-to-PO, spec completeness, pricing transparency, self-serve viability |
| Conversion Architect | CRO specialist mapping the purchase funnel | Analytical, hypothesis-driven — "checkout button is below the fold on mobile, estimated −8% conversion" |
Each persona returns score (0–100), severity (1–5), findings[] (3–5 concrete observations), and uniqueInsight — the one thing only that lens would catch.
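A consumer of persona output might want to check that shape before comparing results across personas. The following is a minimal sketch; `validatePersonaResult` is a hypothetical helper, and treating `severity` as an integer is an assumption.

```javascript
// Hypothetical validator for the persona result shape described above.
// Assumption: severity is an integer; score may be any number in range.
function validatePersonaResult(result) {
  const errors = [];
  if (!Number.isFinite(result.score) || result.score < 0 || result.score > 100) {
    errors.push("score must be a number from 0 to 100");
  }
  if (!Number.isInteger(result.severity) || result.severity < 1 || result.severity > 5) {
    errors.push("severity must be an integer from 1 to 5");
  }
  if (!Array.isArray(result.findings) || result.findings.length < 3 || result.findings.length > 5) {
    errors.push("findings must contain 3 to 5 observations");
  }
  if (typeof result.uniqueInsight !== "string" || result.uniqueInsight.trim() === "") {
    errors.push("uniqueInsight must be a non-empty string");
  }
  return errors; // an empty array means the result matches the schema
}
```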
Architecture
```
MCP Client (Claude, etc.)
    |
    | stdio (JSON-RPC)
    |
merch-connector (Node.js MCP server)
    |
    +-- acquire.js        One-pass audit entry point; Puppeteer-first waterfall (Firecrawl fallback for WAF-blocked sites)
    +-- scraper.js        Puppeteer + stealth plugin, structure detection, PageFingerprint
    +-- analyzer.js       Multi-provider AI (Anthropic / Gemini / OpenAI), 5 personas
    +-- network-intel.js  XHR interception, 35-platform fingerprint, dataLayer/GA4 parsing
    +-- site-memory.js    Persistent per-domain JSON store + change detection snapshots
    +-- eval-store.js     JSONL eval index + full run storage, convergence scoring
    +-- prompts/          Persona prompt files (floor-walker, auditor, scout, b2b-auditor, conversion-architect)
```
- Scraping: Puppeteer with stealth plugin bypasses bot detection. Two-pass heuristic structure detection finds product grids on unknown sites. Extracts products, facets, trust signals (ratings, badges, stock warnings), performance timing, and screenshots. Firecrawl integration (`FIRECRAWL_API_KEY`) provides LLM-based extraction as a fallback path for bot-protected sites where Puppeteer is blocked.
- Network intelligence: Intercepts XHR/fetch during page load to fingerprint the commerce stack (Algolia, Bloomreach, SFCC, Shopify, Elasticsearch, and 30+ more). When a high-confidence match is found, extracts product and facet data directly from the API response, bypassing DOM parsing failures on enterprise storefronts.
- Analysis: Three-provider AI. Anthropic uses `tool_choice` forcing for structured JSON; Gemini uses `responseSchema`; OpenAI-compatible uses function calling with a JSON-prompt fallback. Dynamic imports load only the needed SDK. `ask_page` uses Haiku-class models for fast Q&A; persona analysis uses Sonnet-class.
- Personas: Five expert lenses. `merch_roundtable` runs Floor Walker, Auditor, and Scout in parallel, then passes results to a moderator that synthesizes consensus and disagreements. B2B Auditor auto-substitutes for Auditor when B2B mode is detected.
- Memory: Auto-learns site patterns on every scrape. Normalized snapshots enable change detection across visits: price moves, new/removed products, facet/sort changes. Manual notes persist across sessions.
- Evals: Two-tier storage: compact JSONL index (100 runs/domain) + full run JSON (10/domain). Convergence score (0–100) measures inter-persona agreement. Dedup hashing prevents double-saves.
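The Puppeteer-first waterfall can be pictured with a small decision function. This is a sketch under stated assumptions, not the code in `acquire.js`: `chooseScraper` is a hypothetical name, and the block signal (0 products and FCP of 0) is taken from the `FIRECRAWL_API_KEY` description in the environment variable table.

```javascript
// Hypothetical sketch of the fallback decision; the real logic in
// acquire.js may use additional signals.
function chooseScraper(puppeteerResult, env) {
  // Block signal as documented for FIRECRAWL_API_KEY: 0 products + FCP=0.
  const looksBlocked =
    puppeteerResult.products.length === 0 &&
    puppeteerResult.performance.fcp === 0;
  // Firecrawl is only an option when a key is configured.
  if (looksBlocked && env.FIRECRAWL_API_KEY) {
    return "firecrawl";
  }
  return "puppeteer";
}
```

The return value mirrors the `scraper` field of the `acquire` payload, which reports which path produced the data.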
Development
```bash
git clone https://github.com/grahamton/merchGent.git
cd merchGent
npm install
cp .env.example .env   # fill in at least one AI API key
```
Running tests
```bash
npm test                              # scrape-only (no API key needed)
npm run test:audit                    # full merchandising audit
npm run test:persona                  # single persona (floor_walker)
npm run test:roundtable               # all 3 personas + moderator
node test/smoke.js --b2b              # B2B validation: Insight.com laptops + b2b_auditor
node test/smoke.js --ask "question"   # ask anything about a page
node test/smoke.js --url https://...  # override default URL
node test/protocol.js                 # MCP protocol compliance (no browser/API key needed)
```
MCP Inspector
```bash
npx @modelcontextprotocol/inspector -- node bin/merch-connector.js
```
Opens a browser UI where you can call any tool interactively.
Tool reference
acquire
One-pass audit payload. The primary tool in v2 — replaces the multi-step scrape_page + analysis workflow. Returns everything the audit pipeline needs in a single call.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Full URL to acquire |
| `pdp_sample` | No | Number of PDP samples to include (0–5, default 2). Auto-selects median-priced + premium (80th percentile) products. |
Returns:
- `page`: title, metaDescription, pageType, breadcrumb, h1
- `commerce`: mode (B2B/B2C/Hybrid), platform, priceTransparency, loginRequired
- `products[]`: normalized with trust signals, B2B/B2C indicators, description quality
- `facets[]`, `sort`: filter panel and sort state
- `navigation`: hasFilterPanel, filterPanelPosition, hasStickyNav, breadcrumbPresent
- `trustSignals`: ratingsOnCards, freeShippingPromised, returnPolicyVisible, urgencyMessaging
- `dataQuality`: descriptionFillRate, ratingFillRate, priceFillRate
- `analytics`: platform detection, GTM containers, ecommerce tracking status, productImpressionsFiring
- `performance`: fcp, lcp, cls, domContentLoaded, loadComplete
- `pdpSamples[]`: sampled PDP detail pages
- `screenshots`: desktop + mobile base64 JPEG
- `warnings[]`: structured quality flags with severity
- `scraper`: `"firecrawl"` or `"puppeteer"` (which path was used)
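To make the `pdp_sample` auto-selection concrete, here is one way a median-priced and an 80th-percentile pick could be computed. It is a sketch only: `pickPdpSamples` is a hypothetical name, products are assumed to carry a numeric `price`, and the nearest-rank percentile rule is an assumption rather than the server's documented method.

```javascript
// Hypothetical sketch of the two documented picks (median + 80th percentile).
// Assumptions: each product has a numeric `price`; percentiles use a simple
// nearest-rank rule on the sorted list.
function pickPdpSamples(products) {
  const priced = products
    .filter((p) => Number.isFinite(p.price))
    .sort((a, b) => a.price - b.price);
  if (priced.length === 0) return [];

  // Item at quantile q, clamped to the last index.
  const at = (q) => priced[Math.min(priced.length - 1, Math.floor(q * priced.length))];

  const median = at(0.5);
  const premium = at(0.8);
  // On very small listings both picks can land on the same product.
  return median === premium ? [median] : [median, premium];
}
```

How the server chooses samples for `pdp_sample` values other than 2 is not described above, so the sketch does not attempt it.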
scrape_page
(Deprecated — use acquire) Raw structured extraction. Returns products (title, price, stock, CTA, description, B2B/B2C signals, trust signals), facets/filters, sort options, B2B mode + conflict score, page metadata, performance timing, data layers, interactable elements, and PageFingerprint. On repeat visits, also returns a changes diff.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Full URL to scrape |
| `depth` | No | Pagination pages to follow (1–5, default 1) |
| `max_products` | No | Max products per page (default 10) |
| `include_screenshot` | No | Include base64 JPEG desktop screenshot (default false) |
| `mobile_screenshot` | No | Also capture a 390×844 (iPhone 14) mobile screenshot (default false) |
Trust signals per product: star rating, review count, sale badge + text, best seller flag, stock warning ("Only 3 left"), sustainability label, raw badge texts.
compare_storefronts
Scrape two URLs concurrently and return a structured diff. No AI call — pure structural analysis.
| Parameter | Required | Description |
|---|---|---|
| `url_a` | Yes | First URL (your site or baseline) |
| `url_b` | Yes | Second URL (competitor or variant) |
| `max_products` | No | Max products per page (default 10) |
Returns: product count delta, facet gap analysis (onlyInA / onlyInB / shared count), trust signal coverage per site, sort option gaps, B2B mode + conflict score for each, performance delta (FCP + full load).
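The facet gap part of that diff is essentially a set comparison. A minimal sketch follows; `facetGaps` is a hypothetical helper, facets are assumed to be objects with a `name`, and matching names case-insensitively is an assumption.

```javascript
// Hypothetical sketch of a facet gap analysis between two storefronts.
// Assumption: facets are matched by trimmed, lowercased name.
function facetGaps(facetsA, facetsB) {
  const norm = (name) => name.trim().toLowerCase();
  const namesA = new Map(facetsA.map((f) => [norm(f.name), f.name]));
  const namesB = new Map(facetsB.map((f) => [norm(f.name), f.name]));

  const onlyInA = [...namesA].filter(([key]) => !namesB.has(key)).map(([, name]) => name);
  const onlyInB = [...namesB].filter(([key]) => !namesA.has(key)).map(([, name]) => name);
  const shared = [...namesA.keys()].filter((key) => namesB.has(key)).length;

  return { onlyInA, onlyInB, shared };
}

const gaps = facetGaps(
  [{ name: "Brand" }, { name: "Price" }, { name: "Color" }],
  [{ name: "brand" }, { name: "Size" }]
);
// gaps.onlyInA holds "Price" and "Color", gaps.onlyInB holds "Size", gaps.shared is 1
```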
interact_with_page
Execute one or more search/click actions in sequence, then extract the resulting page.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Full URL to load |
| `actions` | One of these | Array of `{ action, selector?, value? }` for multi-step flows |
| `action` | One of these | Single action shorthand: `"search"` or `"click"` |
| `selector` | Depends | CSS selector (required for click) |
| `value` | Depends | Text to type (required for search) |
| `include_screenshot` | No | Include screenshot of result |
Multi-step example: [{ "action": "search", "value": "laptop" }, { "action": "click", "selector": ".filter-in-stock" }]
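Because the tool accepts either the `actions` array or the single-action shorthand, a caller-side helper can normalize both into one list and enforce the "required for click" and "required for search" rules from the table. `normalizeActions` is a hypothetical name, and preferring `actions` when both forms are supplied is an assumption.

```javascript
// Hypothetical sketch: normalize interact_with_page input into an actions array.
// Assumption: when both forms are given, the `actions` array wins.
function normalizeActions(params) {
  let actions = [];
  if (Array.isArray(params.actions) && params.actions.length > 0) {
    actions = params.actions;
  } else if (params.action) {
    actions = [{ action: params.action, selector: params.selector, value: params.value }];
  }
  if (actions.length === 0) {
    throw new Error('Provide either "actions" or "action"');
  }

  actions.forEach((step, i) => {
    if (step.action === "click") {
      if (!step.selector) throw new Error(`Step ${i}: click requires a selector`);
    } else if (step.action === "search") {
      if (!step.value) throw new Error(`Step ${i}: search requires a value`);
    } else {
      throw new Error(`Step ${i}: unknown action "${step.action}"`);
    }
  });
  return actions;
}
```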
ask_page
Scrape + AI Q&A. The model sees full product data, facets, performance, and a screenshot. Supports Anthropic (Haiku), Gemini, and OpenAI-compatible providers.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Full URL to scrape and ask about |
| `question` | Yes | Plain language question |
| `depth` | No | Pagination pages (default 1) |
| `max_products` | No | Max products per page (default 10) |
merch_roundtable
Multi-persona analysis with moderator synthesis. Floor Walker, Auditor, and Scout run in parallel — each result is streamed as a notifications/message as it completes. B2B Auditor auto-substitutes for Auditor when B2B signals are detected.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Full URL to analyze |
| `depth` | No | Pagination pages (default 1) |
| `max_products` | No | Max products per page (default 10) |
Returns: perspectives (each persona's typed result), debate.consensus, debate.disagreements, debate.finalRecommendations (with impact + endorsing personas).
site_memory
Persistent per-domain memory. Auto-accumulates on every scrape.
| Parameter | Required | Description |
|---|---|---|
| `action` | Yes | `"read"`, `"write"`, `"list"`, or `"delete"` |
| `url` | Depends | Any URL on the domain (required for read/write/delete) |
| `note` | No | Text note to append (with write) |
| `key` | No | Custom field name (with write) |
| `value` | No | Value for the field (with write + key) |
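Since memory is keyed per domain and any URL on the domain is accepted, some mapping from URL to domain key has to happen. The sketch below shows one plausible mapping; `domainKey` and `memoryFilePath` are hypothetical helpers, and both the `www.` stripping and the `<domain>.json` file naming are assumptions, not the documented on-disk layout.

```javascript
// Hypothetical sketch: derive a per-domain key from any URL on that domain.
// Assumption: a leading "www." is dropped so www and bare hosts share memory.
function domainKey(url) {
  const { hostname } = new URL(url); // the URL parser lowercases the hostname
  return hostname.replace(/^www\./, "");
}

// Assumption: one JSON file per domain inside the data directory
// (MERCH_CONNECTOR_DATA_DIR, default ~/.merch-connector/data/).
function memoryFilePath(url, dataDir) {
  return `${dataDir.replace(/\/+$/, "")}/${domainKey(url)}.json`;
}
```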
clear_session
Reset cookies and cached page data for a domain.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | Any URL on the domain to clear |
save_eval
Persist the most recent roundtable or audit run as a structured eval record. Reads from the session persona cache — no data round-trip through the model. Must call merch_roundtable on the same URL first.
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | URL of the run to save (must match a cached session) |
| `note` | No | Optional free-text annotation |
Returns: eval ID, convergence score (0–100 inter-persona agreement), top concerns per persona, moderator summary excerpt, dedup hash.
list_evals
Retrieve eval history for a domain or all domains.
| Parameter | Required | Description |
|---|---|---|
| `url` | No | Filter to a specific domain. Omit to return all domains with eval history. |
get_logs
Retrieve recent server log entries from the in-memory circular buffer (500 entries).
| Parameter | Required | Description |
|---|---|---|
| `level` | No | Filter by level: `"error"`, `"warn"`, `"info"`, `"debug"` |
| `tool` | No | Filter by tool name (e.g. `"merch_roundtable"`) |
| `limit` | No | Max entries to return (default 50) |
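The behavior of a bounded log buffer with these filters can be sketched in a few lines. `LogBuffer` is a hypothetical class written for illustration; the entry fields `level` and `tool`, and the exact-match treatment of the level filter, are assumptions.

```javascript
// Hypothetical sketch of a bounded in-memory log buffer.
// It uses a plain array that drops the oldest entry when full, which
// behaves like a circular buffer for this purpose.
class LogBuffer {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.entries = [];
  }

  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // evict the oldest entry
    }
  }

  // Assumption: `level` is an exact match, not a minimum severity.
  query({ level, tool, limit = 50 } = {}) {
    const matches = this.entries.filter(
      (e) => (!level || e.level === level) && (!tool || e.tool === tool)
    );
    return limit > 0 ? matches.slice(-limit) : [];
  }
}
```

Calling `query({ level: "error", tool: "merch_roundtable" })` on such a buffer would return the most recent matching entries, oldest first.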
History
v2.0.14 — Ollama local provider support + graceful no-AI degradation
- Ollama support: `MODEL_PROVIDER=ollama` routes through the OpenAI-compatible API at `http://localhost:11434/v1`; no API key required; `MODEL_NAME` selects the local model
- Graceful degradation: `ask_page` and `merch_roundtable` now return raw scrape data + a setup hint when no AI provider is configured, instead of throwing
- `hasProvider()` export: callers can gate on AI availability before invoking analysis
- Docs: `.env.example` and `CLAUDE.md` updated with Ollama configuration examples
v2.0.13 — Layered data quality model + Firecrawl schema refinement
- Data quality model: `acquire` now returns `dataQuality.overall.usabilityTier` (full/degraded/minimal/failed) and `dataQuality.dimensions` with graded description tiers (`empty`, `spec`, `thin`, `rich`), separating extraction confidence from site quality
- Commerce-mode-aware warnings: `generateWarnings()` uses B2C/B2B/Hybrid threshold maps; new codes: `LOW_DESCRIPTION_FILL_CRITICAL`, `DESCRIPTIONS_SPEC_ONLY`, `RATINGS_ABSENT`, `PRICING_INCONSISTENT`, `EXTRACTION_CONFIDENCE_LOW`, `FACETS_MINIMAL`
- Firecrawl schema: `description` → `cardSubtitle` internally with visual hierarchy cues + few-shot examples; remapped back to `description` in the payload (no breaking change)
- Fixed: Puppeteer `extractionConfidence` false positive when `structureConfidence` is null; now falls back to product-count + priceFillRate signals
v2.0.12 — MCP-026: stabilize Firecrawl product description extraction
- MCP-026: `description` field in Firecrawl `EXTRACT_SCHEMA` now carries a JSON Schema annotation explaining what to look for (subtitle text, attribute summaries, model/color/finish specs visible on the card); `acquireWithFirecrawl()` also passes an explicit `prompt` to the extract call. This eliminates non-deterministic empty-description runs caused by the LLM not knowing category cards carry spec text rather than marketing copy
v2.0.11 — MCP-020/023/024/025: breadcrumb heuristic, Hybrid detection, star rating guard, PDP timeout
- MCP-024: `starRating` guard added; values above 5 are discarded (review count bleed); `ratingEl` now prefers `content` attribute (schema.org) before falling back to `aria-label`/`innerText`
- MCP-020: `getBreadcrumb()` gets two new fallback passes: `data-testid` breadcrumb variants (React/Next.js), then a URL-depth heuristic over `nav a`/`header a` elements to recover multi-level paths like Ferguson's 4-level hierarchy
- MCP-023: `PRO_TRADE_PATTERN` extended with `are you a pro`, `pro login`, `become a pro`; `hasProTradeCta()` now checks `pageText`, `page.h1`, and `page.title`; Firecrawl path falls back to testing full `raw.content` when nav items are sparse
- MCP-025: Per-PDP `AbortSignal.timeout(12000)` added to Firecrawl PDP path; the 12s cap per PDP keeps total `acquire` wall time under 60s (Claude Desktop client limit); timed-out PDPs fall through to Puppeteer fallback
v2.0.10 — MCP-017–023: data extraction gaps and Firecrawl routing
- MCP-017: PDP sub-scrapes now route through Firecrawl first (bypasses WAF/Akamai); fall back to Puppeteer per-URL; `PDP_SAMPLES_BLOCKED` warning emitted when all PDPs fail
- MCP-018/023: `freeShippingPromised` now checks `trustBadges[]` in addition to `b2cIndicators`; `commerce.mode` upgraded B2C→Hybrid when Pro/Trade pricing CTAs are detected in page interactables or nav items
- MCP-019/021: New warnings: `FACETS_INCOMPLETE` when Firecrawl returns fewer than 4 facets; `PERFORMANCE_UNAVAILABLE` (info) when Firecrawl is active scraper
- MCP-020/022: Breadcrumb selector expanded to capture `span`/`li`/schema.org elements with dedup + separator filtering; `ratingFillRate` now requires `rating > 0` (zero-star no longer counted as filled)
v2.0.9 — Bot-block resilience: blocked/blockType/fallbackSuggestions
- `acquire` now surfaces block state explicitly: top-level `blocked` (bool) and `blockType` (`WAF` | `TIMEOUT` | `EMPTY_RENDER`) are set whenever `FIRECRAWL_FAILED`, `LOW_CARD_CONFIDENCE`, or `NO_PRODUCTS_FOUND` warnings are present, so the skill layer can branch without parsing `warnings[]`
- `fallbackSuggestions[]`: three pre-computed search strings (`site:`, keyword, `cache:`) derived from the input URL, ready to pass to a search fallback workflow
- Blocked responses skip the cache; retries after stealth changes or a different entry point always get a fresh scrape attempt
v2.0.8 — MCP-002: facet extraction for Shopify/Allbirds filter patterns
- Strategy 2 expanded: candidate selector list now includes `form[action*="filter"]`, `[class*="FilterPanel"]`, `[class*="filter-panel"]` and similar patterns that headless Shopify storefronts use; previously missed because filters weren't inside `aside`/`nav`/`sidebar` elements
- Strategy 3 added: dedicated `<details>`-based extractor for Shopify filter groups (Allbirds and similar) where each facet is a standalone `<details>` with a `<summary>` label and checkbox inputs; no shared sidebar container required
v2.0.7 — MCP-016: acquire silent timeout fix + progress logging
- Silent hang fixed: Firecrawl mobile screenshot call had no timeout; bot-blocked URLs caused the entire `acquire` handler to freeze indefinitely with zero log output; added `timeout: 30000` to the mobile scrape call
- Progress logging: `acquire` now emits `sendLog` entries at every major step (Firecrawl start/complete, Puppeteer start/complete, PDP sampling start/complete, cache hit) so timeouts are diagnosable from `get_logs`
- `sendLog` wired into acquire: passed via `sessionOps` from `index.js`; no circular dependency, no architectural change
v2.0.6 — Fix acquire screenshot crash when using Firecrawl
- Root cause: Firecrawl returns `screenshot` as a CDN URL, not base64; the MCP SDK's base64 validator rejected it, crashing every `acquire` call when `FIRECRAWL_API_KEY` is set
- Fix: `acquire` handler now detects URL-format screenshots, fetches and converts to base64 before sending as MCP image content items
v2.0.5 — Fix dotenv stdout corruption on startup
- MCP JSON-RPC broken by dotenv v17: dotenv v17.3+ prints a `[dotenv@17.x]` banner to stdout by default; on a stdio transport this corrupted the JSON-RPC stream before the first message was parsed
- Fix: Added `quiet: true` to the user config fallback `loadEnv` call in `index.js`; both dotenv calls are now silent on startup
v2.0.4 — Fix acquire field truncation
- Root cause of missing fields: Screenshot base64 was included in the JSON text payload AND as a separate image content item; the duplicate filled the MCP token budget before `performance`, `trustSignals`, `analytics`, `navigation`, `dataQuality`, `pdpSamples`, and `warnings` appeared in the serialized output
- Fix: Screenshots are now stripped from the JSON text and sent only as image content items; all 7 structured fields are now fully visible to the MCP client on every acquire call
v2.0.3 — MCP-013 API key fix + user config fallback
- MCP-013 root cause: `plugin.json` was explicitly setting `ANTHROPIC_API_KEY=""` and `FIRECRAWL_API_KEY=""`, overriding system env vars before they reached the server; fixed in plugin v0.5.1
- User config fallback: Server now loads `~/.merch-connector/.env` as a fallback for any env var that is absent or empty, so API keys survive npx cache clears and work regardless of how the launcher passes env vars
- Deduped imports: Consolidated `fs` imports in the `index.js` startup block
v2.0.2 — MCP-014 acquire field fixes
- `trustSignals.avgRating`: Renamed from `avgRatingAcrossProducts` to match the field name the plugin audit command expects; the mismatch was causing silent scoring failures on every acquire call
- Warning severity values: Remapped from `"high"`/`"medium"`/`"low"` to `"error"`/`"warn"` across all `warnings[]` entries to match the plugin's expected enum
v2.0.1 — Model alias fix + full multi-provider ask_page
- MCP-013: Replaced retired `claude-3-5-sonnet-latest` alias with `claude-sonnet-4-6` across all Anthropic calls; fixes `ask_page`, `merch_roundtable`, and all persona analysis tools that were returning 404 errors
- ask_page multi-provider: Added full Gemini and OpenAI-compatible implementations (were placeholder stubs). Anthropic path now uses Haiku-class model for fast, cost-effective Q&A; all persona analysis continues to use Sonnet.
- MCP-015 docs: `FIRECRAWL_API_KEY` documented in README and CLAUDE.md; configuration example updated with env passthrough pattern
v2.0.0 — acquire tool: one-pass v2 architecture
- New `acquire` tool: Single call replaces the 6–8 step `scrape_page` + analysis workflow; returns products, facets, screenshots, performance, trust signals, navigation, data quality, PDP samples, analytics, and `warnings[]` in one payload
- Firecrawl integration: LLM extraction via Firecrawl as primary scraper with automatic Puppeteer fallback; `scraper` field reports which path was used and any fallback reason
- `audit_storefront` retired: Returns a hard error directing callers to `acquire`; `scrape_page` marked deprecated with log warning
- Protocol tests updated: 34/34 passing; `acquire` in tool list, `audit_storefront` absent, `scrape_page` deprecation asserted
v1.9.2 — MCP-002 & MCP-005 fixes, roundtable refactor, B2B persona routing
- MCP-002: Restored `extractFacetsGeneric` fallback + `hasFacetStructure` structural scoring bonus (+20); added nested wrapper key support (`response.*`, `data.*`) and wired generic extraction as a fallback in `extractFromBestApi`; "Unknown Facet" no longer appears when XHR data is available
- MCP-005: Mobile screenshots now dismiss OneTrust, Cookiebot, and TrustArc consent overlays before capture; blank-image threshold raised to 20 KB to reliably reject consent-blocked frames
- Roundtable refactor: Collapsed per-provider per-persona duplicates into generic dispatch functions (~1000 lines removed); `merch_roundtable` auto-substitutes the B2B auditor persona when B2B signals are detected
v1.9.1 — Bug fixes from Cowork plugin QA sweep
- CSS selector safety: `compare_storefronts` no longer crashes on Tailwind JIT arbitrary-value class names; all class-to-selector conversions now use `CSS.escape()`
- Paint timing: FCP and first-paint captured via pre-navigation `PerformanceObserver`; no longer returns 0 on SPA category pages
- Mobile screenshot: renders in a fresh browser page with UA + viewport set before navigation, fixing blank white screen on UA-gated SPAs
- PDP `pageType`: URL pattern signals (`/product/`, `/p/`, `/buy/product/`, `/pdp/`) now take priority over DOM product-count heuristics, fixing misclassification on PDPs with related-product carousels
- AI timeout resilience: `audit_storefront` and `merch_roundtable` cap the product payload sent to AI at 20 items, reducing prompt size and inference time
- `scrape_pdp` price extraction: falls back to CTA button text when no dedicated price element is found; `hasReviews` and `specTable.present` now require count > 0
- Facet resolution: "Unknown Facet" placeholders replaced with real names from intercepted XHR when a search API is detected
- `get_category_sample`: error response now includes `reason` and `suggestion` when no product URLs are found
v1.9.0 — PDP sampling, smarter facets, B2B fingerprint depth
- `scrape_pdp` tool: dedicated PDP scraper returning description fill rate, image count, review schema, spec table, cross-sell modules, CTA text, and primary/sale prices; purpose-built for single product pages
- `get_category_sample` tool: scrapes a category page and runs `scrape_pdp` in parallel on a spread/random/top selection of products; one call for a multi-PDP spot check
- Facet detection hardened: Strategy 1 now skips parent containers that wrap multiple filter groups (fixes the "all filters collapsed into one facet" bug on obfuscated-class sites like Zappos); Strategy 2 replaced with heading-to-heading tree walker so filter groups segment correctly regardless of CSS class names
- B2B fingerprint depth: three new fingerprint fields (`contractPricingVisible`, `loginRequired`, `accountPersonalization`); `audit_storefront` now uses a dedicated `AUDIT_TIMEOUT_MS` (default 240s); PageSpeed Insights Core Web Vitals available via `include_pagespeed: true` on `scrape_page`
v1.8.0 — Persona architecture v2
- PA-2 Fingerprint context injection: every persona now receives a `## Page Intelligence (pre-scan)` block prepended to its prompt (pageType, platform, commerceMode, trust signal inventory, top risks, and recommended personas) so the AI orients before reading raw product data
- PA-4 Unified base schema: all personas return `score` (0–100), `severity` (1–5), `findings[]` (3–5 observations), `uniqueInsight`, enabling structured cross-persona comparison
- PA-5 Smart auto-selection: `audit_storefront` accepts `persona: "auto"`; `selectPersonas(fingerprint)` picks the best-fit lens based on pageType and commerceMode
- PA-6 Conversion Architect: new CRO persona maps funnel stages, catalogs friction inventory, identifies top drop-off risk, generates A/B hypotheses with estimated lift ranges
- Perf: roundtable log entries no longer embed full result objects; `get_logs` payload reduced ~95% for cached re-runs
v1.7.0 — PageFingerprint + synchronous moderator
- PA-3 Synchronous moderator: `merch_roundtable` now awaits the moderator synthesis before returning; `debate.consensus` and `debate.finalRecommendations[]` are guaranteed in the tool response
- PA-1 PageFingerprint: every scrape result now includes a `fingerprint` field with no extra AI call: `pageType`, `platform`, `commerceMode`, `priceTransparency`, `trustSignalInventory`, `discoveryQuality`, `funnelReadiness`, `topRisks[]`, `recommendedPersonas[]`
- Category contamination detector: `scrape_page` returns `contamination: { detected, suspectCount, suspects[] }` when off-category products appear in results
- `get_logs` tool + file logging: retrieves recent server log entries from an in-memory buffer (500 entries), filterable by level and tool name; set `MERCH_LOG_FILE` for NDJSON file logging
v1.6.4
save_eval now works with all tool types, not just merch_roundtable. Convergence score returns null (not 0) for single-persona runs. Auto-detects toolName from whichever persona cache slots are populated.
v1.6.3 — Eval store
Two new tools (save_eval, list_evals) add persistent run tracking. Convergence score (0–100) measures inter-persona agreement on top concerns. Two-tier storage: compact JSONL index (100/domain) + full run JSON (10/domain). Dedup hashing prevents double-saving identical runs.
v1.6.2
Roundtable personas now run in parallel via Promise.all, cutting wall-clock time from ~90s to ~30s. Persona results are written to cache the moment each resolves, so a retry after a timeout picks up where it left off.
v1.6.0 — Network Intelligence Layer
Every scrape_page call now intercepts XHR/fetch responses and fingerprints the commerce stack from 35 platform signatures: Elasticsearch, Algolia, Coveo, Lucidworks Fusion, Bloomreach, Searchspring, SFCC, SAP Hybris, Shopify, Bazaarvoice, and more. When a high-confidence API match is found (≥70%), products and facets are extracted directly from the API response. Deep dataLayer/digitalData parsing surfaces GA4 events, GTM container IDs, A/B experiment assignments, and user segments. Discovered API endpoints are persisted to site memory so the discovery pass only runs once per domain.
v1.5.0 — Scraper expansion
Per-product trust signals (ratings, badges, stock warnings), sort order detection, b2bMode + b2bConflictScore, change detection on repeat visits. New compare_storefronts tool. Multi-step interact_with_page actions array. Optional mobile screenshot. Roundtable streams each persona result as it completes.
v1.4.0
10-minute in-memory page cache. ask_page, audit_storefront, and merch_roundtable reuse recent scrape results, cutting latency in half. Configurable TOOL_TIMEOUT_MS.
v1.3.0
OpenAI-compatible provider support (OpenAI, Groq, Together AI, any OpenAI-compatible endpoint). OPENAI_VISION=true for multimodal models.
v1.2.0
Complete rewrite — lean MCP server replacing the original React + Express UI. Four expert personas, roundtable mode, persistent site memory, dual AI provider support (Anthropic + Gemini).
v1.0.0
Original React + Express application with Gemini-powered merchandising analysis.
License
MIT