SF Architect MCP
An MCP server that scrapes, indexes, and serves the Salesforce Architect documentation locally — enabling fast, offline, RAG-powered search and retrieval for AI coding assistants.
Built with Model Context Protocol, cheerio, SQLite, and Turndown.
What it does
- Scrapes architect.salesforce.com using fetch + cheerio (the site is server-side rendered)
- Indexes content into a local SQLite database with section-aware chunking
- Searches using multi-term keyword scoring with section and language filters
- Supports 17 languages mirroring the site's locale structure
- Exposes MCP tools, resources, and prompts so your AI assistant can navigate and query the docs naturally
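The README does not spell out the scoring formula, so the following is only a minimal sketch of what multi-term keyword scoring with section and language filters could look like. The `Chunk` shape and the `scoreChunk` and `search` functions are hypothetical names, and counting term occurrences is an assumed scoring rule rather than the server's actual one.

```typescript
// Hypothetical chunk shape; the real schema lives in src/db and may differ.
interface Chunk {
  url: string;
  section: string;
  language: string;
  text: string;
}

// Assumed scoring rule: count case-insensitive occurrences of each query term.
function scoreChunk(chunk: Chunk, terms: string[]): number {
  const haystack = chunk.text.toLowerCase();
  let score = 0;
  for (const term of terms) {
    let from = 0;
    while (true) {
      const idx = haystack.indexOf(term, from);
      if (idx === -1) break;
      score += 1;
      from = idx + term.length;
    }
  }
  return score;
}

// Apply the optional filters first, then rank the remaining chunks by score.
function search(
  chunks: Chunk[],
  query: string,
  filters: { section?: string; language?: string } = {},
): Array<{ chunk: Chunk; score: number }> {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return chunks
    .filter((c) => !filters.section || c.section === filters.section)
    .filter((c) => !filters.language || c.language === filters.language)
    .map((chunk) => ({ chunk, score: scoreChunk(chunk, terms) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
}
```

Filtering before scoring keeps the ranking step limited to chunks that can actually be returned.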
Prerequisites
- Node.js ≥ 18
- No browser binary required — scraping uses plain HTTP requests
Installation
git clone https://github.com/morettimarco/salesforce_architect_MCP.git
cd salesforce_architect_MCP
npm install
npm run build
The compiled server will be at dist/index.js.
Setup in your coding agent
Replace /absolute/path/to/sf-architect-mcp with the absolute path to your clone of this repo (the clone command above creates a directory named salesforce_architect_MCP).
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}
Claude Code (CLI)
claude mcp add sf-architect node /absolute/path/to/sf-architect-mcp/dist/index.js
Or edit ~/.claude/settings.json (global) or .claude/settings.json (project-level):
{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}
Cursor
Edit .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):
{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}
Windsurf
Edit ~/.codeium/windsurf/mcp_config.json:
{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}
VS Code (GitHub Copilot)
Edit .vscode/mcp.json in your workspace:
{
  "servers": {
    "sf-architect": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}
First-time setup
Once the server is running in your agent, use the scrape-docs prompt or call the tool directly:
scrape_full → language_filter: "en" (or your preferred language)
This fetches the sitemap, scrapes all pages with fetch + cheerio (no headless browser is involved), converts them to Markdown with Turndown, and stores them in a local SQLite database at ~/.sf-architect-mcp/sf-architect.db.
A full English scrape takes roughly 2–3 minutes (≈ 115 pages, 3 concurrent requests).
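The "3 concurrent requests" figure suggests a small worker pool. The sketch below shows one common way to cap concurrency with plain Promises; `runPool` is a hypothetical helper name and this is not necessarily how scraper.ts implements its pool.

```typescript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// Results are returned in the same order as the input items.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Usage with the built-in fetch (Node 18+); `urls` is a placeholder list:
// const pages = await runPool(urls, 3, async (u) => (await fetch(u)).text());
```

A rejected worker rejects the whole pool in this sketch. A scraper that tracks failed URLs, as get_scrape_status implies, would more likely catch errors per page and record them.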
Tools
| Tool | Description |
|---|---|
| scrape_full | Wipe the database and re-scrape everything from scratch |
| scrape_incremental | Scrape only new, changed, or previously failed pages |
| search_architect_docs | Keyword search across indexed content with relevance scoring |
| read_architect_page | Read a page's full Markdown content by URL (supports max_chars) |
| read_architect_page_summary | Lightweight summary: title, headings, word count, 500-char preview |
| get_section_summary | Page count, total words, and title list for a section |
| list_architect_sections | List all indexed sections with page counts |
| export_architect_section | Export a full section to a single Markdown file on disk |
| get_scrape_status | Database stats: page counts, sections, last run, pending/failed URLs |
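To illustrate how scrape_incremental could decide that a page has changed, here is a minimal sketch of a content-hash comparison. The choice of SHA-256, the `contentHash` and `needsReindex` names, and the in-memory `Map` standing in for the database are all assumptions. The README only states that content hashes are compared.

```typescript
import { createHash } from "node:crypto";

// Hash the converted Markdown so formatting-identical pages compare equal.
function contentHash(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// `stored` is a hypothetical stand-in for the hash saved per URL in SQLite.
// A page needs re-indexing when it is new or its hash differs.
function needsReindex(stored: Map<string, string>, url: string, markdown: string): boolean {
  return stored.get(url) !== contentHash(markdown);
}
```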
Resources
Attach these to your conversation context for orientation:
| URI | Description |
|---|---|
| sf-architect://guide/usage | Recommended workflows and tool usage tips |
| sf-architect://data/languages | All supported language codes with display names |
| sf-architect://data/sections | Live section index with current page counts |
Prompts
Pre-built workflow templates:
| Prompt | Arguments | What it does |
|---|---|---|
| scrape-docs | mode: full or incremental | Presents available languages, asks which to scrape, then runs the appropriate tool |
| research-topic | topic: string | Searches, summarizes relevant pages, and synthesizes findings with citations |
| export-section | section: string | Verifies the section exists, shows a summary, then exports to Markdown |
Supported languages
| Code | Language |
|---|---|
| en | English (default) |
| de | German |
| fr | French |
| jp | Japanese |
| zh-cn | Chinese (Simplified) |
| zh-tw | Chinese (Traditional) |
| dk | Danish |
| es | Spanish |
| fi | Finnish |
| it | Italian |
| kr | Korean |
| nl | Dutch |
| no | Norwegian |
| pt-br | Portuguese (Brazil) |
| ru | Russian |
| se | Swedish |
| es-mx | Spanish (Mexico) |
| all | All languages |
Note: Not all languages are available for all sections. The sitemap at scrape time determines what's actually published.
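As a rough illustration of language detection from URL path segments, the sketch below assumes the locale code appears as the first path segment and that URLs without one are English. Both the URL layout and the `detectLanguage` name are assumptions, and the example paths in the comments are made up.

```typescript
// Non-default codes from the table above; "en" is the fallback.
const LANGUAGE_CODES = new Set([
  "de", "fr", "jp", "zh-cn", "zh-tw", "dk", "es", "fi",
  "it", "kr", "nl", "no", "pt-br", "ru", "se", "es-mx",
]);

// Assumed layout: https://architect.salesforce.com/<code>/... for localized pages.
function detectLanguage(pageUrl: string): string {
  const segments = new URL(pageUrl).pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0];
  if (first !== undefined && LANGUAGE_CODES.has(first.toLowerCase())) {
    return first.toLowerCase();
  }
  return "en";
}

// detectLanguage("https://architect.salesforce.com/de/some-page")  -> "de"
// detectLanguage("https://architect.salesforce.com/some-page")     -> "en"
```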
Configuration
The database is stored at ~/.sf-architect-mcp/sf-architect.db by default.
Override with an environment variable:
SF_ARCHITECT_DB_DIR=/custom/path node dist/index.js
Exported Markdown files go to ~/.sf-architect-mcp/exports/{section}-{language}.md unless you specify a custom path.
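The database location described above can be resolved in a few lines. This sketch assumes that SF_ARCHITECT_DB_DIR names a directory and that the file name sf-architect.db stays fixed; `resolveDbPath` is a hypothetical helper, not necessarily a function in database.ts.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Prefer the override directory when set and non-empty, else ~/.sf-architect-mcp.
function resolveDbPath(env: Record<string, string | undefined> = process.env): string {
  const override = env["SF_ARCHITECT_DB_DIR"];
  const dir = override !== undefined && override.length > 0
    ? override
    : join(homedir(), ".sf-architect-mcp");
  return join(dir, "sf-architect.db");
}
```

Passing the environment as a parameter keeps the function easy to test without mutating process.env.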
Architecture
src/
├── index.ts # MCP server — tools, resources, prompts
├── scraper.ts # fetch + cheerio scraper, concurrency pool
├── sitemap.ts # Sitemap fetching and URL filtering
├── types.ts # Shared TypeScript types
├── db/
│ ├── database.ts # sql.js SQLite, persistence, schema
│ ├── ingest.ts # Page upsert, chunk sync, scrape run tracking
│ └── queries.ts # Search, read, export, stats
└── utils/
  ├── chunker.ts # Section-aware text chunking (1500 chars, 200 overlap)
  └── url-utils.ts # Language detection from URL path segments
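The chunking parameters listed for chunker.ts (1500 characters, 200 overlap) can be illustrated with a short sketch. It assumes sections start at Markdown headings or at bold markers such as `**Definition:**` at the beginning of a line, and that each section is then cut into fixed-size windows. The function names and the exact boundary rules are assumptions.

```typescript
const CHUNK_SIZE = 1500;
const OVERLAP = 200;

// Start a new section at a Markdown heading or a line-leading bold marker.
function splitSections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    const startsSection = /^#{1,6}\s/.test(line) || /^\*\*[^*]+:\*\*/.test(line);
    if (startsSection && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections.filter((s) => s.trim().length > 0);
}

// Slide a window of `size` characters, stepping by `size - overlap`.
function chunkSection(text: string, size = CHUNK_SIZE, overlap = OVERLAP): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

function chunkMarkdown(markdown: string): string[] {
  return splitSections(markdown).flatMap((s) => chunkSection(s));
}
```

Chunking within sections rather than across them keeps each chunk tied to a single heading, which is what makes section filters meaningful at search time.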
Key design decisions:
- fetch + cheerio: the site is fully server-side rendered, so no headless browser is needed. Roughly 10x faster than a headless browser, and cross-platform with zero native dependencies
- sql.js (WASM SQLite): in-process, zero native dependencies, serialized to disk after every 10 pages
- Section-aware chunking: detects both Markdown headings and bold-text markers (`**Definition:**`), which the site uses instead of semantic HTML headings
- LRU cache on search results (500 entries, 5-minute TTL)
- Content hash comparison for incremental scraping: only pages whose content has actually changed are re-indexed
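The search-result cache described above (500 entries, 5-minute TTL) could be built on a `Map`, which iterates in insertion order. The following is a minimal sketch under that assumption; `LruTtlCache` is a hypothetical class name, and the injectable clock exists only to make the behaviour easy to test.

```typescript
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries = 500, ttlMs = 5 * 60 * 1000, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in insertion order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A cache key for search would need to cover the query together with its section and language filters, otherwise filtered and unfiltered results could collide.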
License
MIT