SF Architect MCP

An MCP server that scrapes, indexes, and serves the Salesforce Architect documentation locally — enabling fast, offline, RAG-powered search and retrieval for AI coding assistants.

Built with Model Context Protocol, cheerio, SQLite, and Turndown.


What it does

  • Scrapes architect.salesforce.com using fetch + cheerio (the site is server-side rendered)
  • Indexes content into a local SQLite database with section-aware chunking
  • Searches using multi-term keyword scoring with section and language filters
  • Supports 17 languages mirroring the site's locale structure
  • Exposes MCP tools, resources, and prompts so your AI assistant can navigate and query the docs naturally
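As a rough illustration of what multi-term keyword scoring with section and language filters could look like, here is a minimal sketch. The `Chunk` shape, the `searchChunks` name, and the scoring formula (hit count plus a bonus per distinct matched term) are assumptions made for this example, not the server's actual implementation.

```typescript
// Hypothetical chunk shape; the real schema in src/db may differ.
interface Chunk {
  url: string;
  section: string;
  language: string;
  text: string;
}

interface SearchFilters {
  section?: string;
  language?: string;
}

// Count non-overlapping occurrences of `term` in `text` (both lowercased by the caller).
function countOccurrences(text: string, term: string): number {
  if (term.length === 0) return 0;
  let count = 0;
  let pos = text.indexOf(term);
  while (pos !== -1) {
    count++;
    pos = text.indexOf(term, pos + term.length);
  }
  return count;
}

// Assumed scoring: for each query term found in a chunk, add its hit count plus 1,
// so chunks that match more distinct terms rank above chunks repeating a single term.
function searchChunks(
  chunks: Chunk[],
  query: string,
  filters: SearchFilters = {},
): { chunk: Chunk; score: number }[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return chunks
    .filter((c) => !filters.section || c.section === filters.section)
    .filter((c) => !filters.language || c.language === filters.language)
    .map((chunk) => {
      const text = chunk.text.toLowerCase();
      let score = 0;
      for (const term of terms) {
        const hits = countOccurrences(text, term);
        if (hits > 0) score += hits + 1;
      }
      return { chunk, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
}
```

Calling `searchChunks(chunks, "data model", { language: "en" })` would return only English chunks that contain at least one of the two terms, best match first.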

Prerequisites

  • Node.js ≥ 18
  • No browser binary required — scraping uses plain HTTP requests

Installation

git clone https://github.com/morettimarco/salesforce_architect_MCP.git
cd salesforce_architect_MCP
npm install
npm run build

The compiled server will be at dist/index.js.


Setup in your coding agent

Replace /absolute/path/to/sf-architect-mcp with the absolute path of your clone. If you used the clone command above, the directory is named salesforce_architect_MCP.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Claude Code (CLI)

claude mcp add sf-architect node /absolute/path/to/sf-architect-mcp/dist/index.js

Or edit ~/.claude/settings.json (global) or .claude/settings.json (project-level):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Cursor

Edit .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

VS Code (GitHub Copilot)

Edit .vscode/mcp.json in your workspace:

{
  "servers": {
    "sf-architect": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

First-time setup

Once the server is running in your agent, use the scrape-docs prompt or call the tool directly:

scrape_full  →  language_filter: "en"  (or your preferred language)

This fetches the sitemap, scrapes all pages with fetch + cheerio (no headless browser involved), converts them to Markdown, and stores them in a local SQLite database at ~/.sf-architect-mcp/sf-architect.db.

A full English scrape takes roughly 2–3 minutes (≈ 115 pages, 3 concurrent requests).
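The "3 concurrent requests" figure implies some form of bounded concurrency pool. The sketch below shows one common way to build it; `runPool` and its signature are hypothetical names chosen for illustration, and the real scraper.ts may be organised differently.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
// Note: a rejected worker rejects the whole pool; a scraper would normally
// catch errors inside `worker` and record the URL as failed instead.
async function runPool<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  // Claiming is safe without locks because JavaScript runs this code on one thread.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

With this shape, a scrape over a list of URLs could be expressed as `runPool(urls, 3, fetchPage)`, where `fetchPage` stands for whatever function fetches and parses a single page.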


Tools

| Tool | Description |
| --- | --- |
| scrape_full | Wipe the database and re-scrape everything from scratch |
| scrape_incremental | Scrape only new, changed, or previously failed pages |
| search_architect_docs | Keyword search across indexed content with relevance scoring |
| read_architect_page | Read a page's full Markdown content by URL (supports max_chars) |
| read_architect_page_summary | Lightweight summary: title, headings, word count, 500-char preview |
| get_section_summary | Page count, total words, and title list for a section |
| list_architect_sections | List all indexed sections with page counts |
| export_architect_section | Export a full section to a single Markdown file on disk |
| get_scrape_status | Database stats: page counts, sections, last run, pending/failed URLs |

Resources

Attach these to your conversation context for orientation:

| URI | Description |
| --- | --- |
| sf-architect://guide/usage | Recommended workflows and tool usage tips |
| sf-architect://data/languages | All supported language codes with display names |
| sf-architect://data/sections | Live section index with current page counts |

Prompts

Pre-built workflow templates:

| Prompt | Arguments | What it does |
| --- | --- | --- |
| scrape-docs | mode: full or incremental | Presents available languages, asks which to scrape, then runs the appropriate tool |
| research-topic | topic: string | Searches, summarizes relevant pages, and synthesizes findings with citations |
| export-section | section: string | Verifies the section exists, shows a summary, then exports to Markdown |

Supported languages

| Code | Language |
| --- | --- |
| en | English (default) |
| de | German |
| fr | French |
| jp | Japanese |
| zh-cn | Chinese (Simplified) |
| zh-tw | Chinese (Traditional) |
| dk | Danish |
| es | Spanish |
| fi | Finnish |
| it | Italian |
| kr | Korean |
| nl | Dutch |
| no | Norwegian |
| pt-br | Portuguese (Brazil) |
| ru | Russian |
| se | Swedish |
| es-mx | Spanish (Mexico) |
| all | All languages |

Note: Not all languages are available for all sections. The sitemap at scrape time determines what's actually published.
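To make the language codes concrete, here is a hedged sketch of how a language could be derived from a page URL. It assumes the locale code appears as the first path segment and that English pages carry no prefix; both are assumptions about the site's URL layout, and `detectLanguage` and `matchesLanguageFilter` are hypothetical names rather than the functions in url-utils.ts.

```typescript
// Non-default language codes from the table above ("en" is the fallback).
const LANGUAGE_CODES = new Set([
  "de", "fr", "jp", "zh-cn", "zh-tw", "dk", "es", "fi",
  "it", "kr", "nl", "no", "pt-br", "ru", "se", "es-mx",
]);

// Assumption: the locale, when present, is the first path segment of the URL.
function detectLanguage(url: string): string {
  const segments = new URL(url).pathname.split("/").filter((s) => s.length > 0);
  const first = segments.length > 0 ? segments[0].toLowerCase() : undefined;
  return first !== undefined && LANGUAGE_CODES.has(first) ? first : "en";
}

// "all" accepts every page; any other filter must equal the detected language.
function matchesLanguageFilter(url: string, filter: string): boolean {
  return filter === "all" || detectLanguage(url) === filter;
}
```

A sitemap filter built this way would keep a URL only when `matchesLanguageFilter(url, languageFilter)` is true.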


Configuration

The database is stored at ~/.sf-architect-mcp/sf-architect.db by default.

Override with an environment variable:

SF_ARCHITECT_DB_DIR=/custom/path node dist/index.js

Exported Markdown files go to ~/.sf-architect-mcp/exports/{section}-{language}.md unless you specify a custom path.
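The path rules above can be sketched as a pair of small helpers. The function names are hypothetical, and the README does not say whether SF_ARCHITECT_DB_DIR also moves the exports directory, so this sketch keeps exports under the default home location.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Directory holding the database: the SF_ARCHITECT_DB_DIR override when set
// and non-empty, otherwise ~/.sf-architect-mcp.
function resolveDbDir(
  env: Record<string, string | undefined>,
  homeDir: string = os.homedir(),
): string {
  const override = env["SF_ARCHITECT_DB_DIR"];
  return override !== undefined && override.length > 0
    ? override
    : path.join(homeDir, ".sf-architect-mcp");
}

// Full path of the SQLite file inside the resolved directory.
function resolveDbFile(
  env: Record<string, string | undefined>,
  homeDir: string = os.homedir(),
): string {
  return path.join(resolveDbDir(env, homeDir), "sf-architect.db");
}

// Default export target: ~/.sf-architect-mcp/exports/{section}-{language}.md.
// Assumption: the env override does not apply here (the README does not specify).
function defaultExportFile(
  section: string,
  language: string,
  homeDir: string = os.homedir(),
): string {
  return path.join(homeDir, ".sf-architect-mcp", "exports", `${section}-${language}.md`);
}
```

In a Node process, `resolveDbFile(process.env)` would yield the database path for the current environment.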


Architecture

src/
├── index.ts          # MCP server — tools, resources, prompts
├── scraper.ts        # fetch + cheerio scraper, concurrency pool
├── sitemap.ts        # Sitemap fetching and URL filtering
├── types.ts          # Shared TypeScript types
├── db/
│   ├── database.ts   # sql.js SQLite, persistence, schema
│   ├── ingest.ts     # Page upsert, chunk sync, scrape run tracking
│   └── queries.ts    # Search, read, export, stats
└── utils/
    ├── chunker.ts    # Section-aware text chunking (1500 chars, 200 overlap)
    └── url-utils.ts  # Language detection from URL path segments
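For readers curious about the chunker entry in the tree, the following is a minimal sketch of section-aware chunking with a 1500-character window and 200-character overlap. It assumes a section starts at a Markdown heading or at a line opening with a bold label such as **Definition:**. The function names, regexes, and `TextChunk` shape are illustrative assumptions, not the contents of chunker.ts.

```typescript
interface TextChunk {
  heading: string; // Section label the chunk belongs to ("" before any heading)
  text: string;
}

const HEADING = /^(#{1,6})\s+(.*)$/;          // e.g. "## Overview"
const BOLD_MARKER = /^\*\*([^*]+):\*\*\s*(.*)$/; // e.g. "**Definition:** text"

// Slice `text` into windows of `size` chars, each starting `size - overlap` after the previous one.
function splitWithOverlap(text: string, size = 1500, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    pieces.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return pieces;
}

// Split Markdown into sections first, then window each section independently,
// so no chunk straddles a section boundary.
function chunkMarkdown(markdown: string, size = 1500, overlap = 200): TextChunk[] {
  const sections: { heading: string; lines: string[] }[] = [{ heading: "", lines: [] }];
  for (const line of markdown.split("\n")) {
    const heading = HEADING.exec(line);
    const bold = heading ? null : BOLD_MARKER.exec(line);
    if (heading) {
      sections.push({ heading: heading[2].trim(), lines: [] });
    } else if (bold) {
      // Text following the bold label on the same line becomes the first body line.
      sections.push({ heading: bold[1].trim(), lines: bold[2] ? [bold[2]] : [] });
    } else {
      sections[sections.length - 1].lines.push(line);
    }
  }
  const chunks: TextChunk[] = [];
  for (const section of sections) {
    const body = section.lines.join("\n").trim();
    for (const text of splitWithOverlap(body, size, overlap)) {
      chunks.push({ heading: section.heading, text });
    }
  }
  return chunks;
}
```

Windowing per section keeps each chunk attributable to a single heading, which is what makes section filters on search results meaningful.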

Key design decisions:

  • fetch + cheerio — the site is fully server-side rendered, no headless browser needed. ~10x faster and cross-platform with zero native dependencies
  • sql.js (WASM SQLite) — in-process, zero native dependencies, serialized to disk after every 10 pages
  • Section-aware chunking — detects both Markdown headings and bold-text markers (**Definition:**) which the site uses instead of semantic HTML headings
  • LRU cache on search results (500 entries, 5-minute TTL)
  • Content hash comparison for incremental scraping — only re-indexes pages whose content has actually changed
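The content-hash comparison behind incremental scraping can be illustrated with a short sketch. SHA-256 and the `contentHash` / `decideAction` names are assumptions for this example; the README does not state which hash function or decision logic the server uses.

```typescript
import { createHash } from "node:crypto";

type ScrapeAction = "insert" | "update" | "skip";

// Hex digest of the page's Markdown. SHA-256 is an assumed choice.
function contentHash(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// Compare the hash stored for a URL against freshly scraped content:
// no stored hash means a new page, an equal hash means nothing to re-index.
function decideAction(storedHash: string | undefined, freshMarkdown: string): ScrapeAction {
  if (storedHash === undefined) return "insert";
  return storedHash === contentHash(freshMarkdown) ? "skip" : "update";
}
```

Only pages that come back as "insert" or "update" would need their chunks rebuilt, which is where the time saving of an incremental run comes from.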

License

MIT
