SF Architect MCP

An MCP server that scrapes, indexes, and serves the Salesforce Architect documentation locally — enabling fast, offline, RAG-powered search and retrieval for AI coding assistants.

Built with Model Context Protocol, cheerio, SQLite, and Turndown.

What it does

Scrapes architect.salesforce.com using fetch + cheerio (the site is server-side rendered)
Indexes content into a local SQLite database with section-aware chunking
Searches using multi-term keyword scoring with section and language filters
Supports 17 languages mirroring the site's locale structure
Exposes MCP tools, resources, and prompts so your AI assistant can navigate and query the docs naturally
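
As a rough illustration of what multi-term keyword scoring with section and language filters can look like, the sketch below counts term occurrences per chunk and weights title hits more heavily. The `Chunk` shape, the `searchChunks` name, and the 3x title weight are assumptions made for this example; they are not taken from the server's source.

```typescript
// Hypothetical chunk shape; the real schema may differ.
interface Chunk {
  url: string;
  title: string;
  section: string;
  language: string;
  text: string;
}

interface SearchOptions {
  section?: string;
  language?: string;
  limit?: number;
}

// Count non-overlapping occurrences of `term` in `haystack` (both already lowercase).
function countOccurrences(haystack: string, term: string): number {
  if (term.length === 0) return 0;
  let count = 0;
  let pos = haystack.indexOf(term);
  while (pos !== -1) {
    count += 1;
    pos = haystack.indexOf(term, pos + term.length);
  }
  return count;
}

// Score = body hits + 3 * title hits, summed over all query terms (assumed weights).
function searchChunks(
  chunks: Chunk[],
  query: string,
  opts: SearchOptions = {},
): Array<{ chunk: Chunk; score: number }> {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return chunks
    .filter((c) => opts.section === undefined || c.section === opts.section)
    .filter((c) => opts.language === undefined || c.language === opts.language)
    .map((chunk) => {
      const body = chunk.text.toLowerCase();
      const title = chunk.title.toLowerCase();
      let score = 0;
      for (const term of terms) {
        score += countOccurrences(body, term) + 3 * countOccurrences(title, term);
      }
      return { chunk, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.limit ?? 10);
}
```

Filtering before scoring keeps the scoring loop off chunks that the section or language filter would discard anyway.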

Prerequisites

Node.js ≥ 18
No browser binary required — scraping uses plain HTTP requests

Installation

git clone https://github.com/morettimarco/salesforce_architect_MCP.git
cd salesforce_architect_MCP
npm install
npm run build

The compiled server will be at dist/index.js.

Setup in your coding agent

Replace /absolute/path/to/sf-architect-mcp with the absolute path of your clone (the git clone command above creates a directory named salesforce_architect_MCP).

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS; on Windows the file is at %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Claude Code (CLI)

claude mcp add sf-architect node /absolute/path/to/sf-architect-mcp/dist/index.js

Or edit ~/.claude/settings.json (global) or .claude/settings.json (project-level):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Cursor

Edit .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "sf-architect": {
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

VS Code (GitHub Copilot)

Edit .vscode/mcp.json in your workspace:

{
  "servers": {
    "sf-architect": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/sf-architect-mcp/dist/index.js"]
    }
  }
}

First-time setup

Once the server is running in your agent, use the scrape-docs prompt or call the tool directly:

scrape_full  →  language_filter: "en"  (or your preferred language)

This fetches the sitemap, scrapes all pages with fetch + cheerio, converts them to Markdown, and stores them in a local SQLite database at ~/.sf-architect-mcp/sf-architect.db.

A full English scrape takes roughly 2–3 minutes (≈ 115 pages, 3 concurrent requests).
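
The "3 concurrent requests" figure suggests a small worker pool. Below is a minimal sketch of such a pool, assuming a generic async `worker` function. The `runPool` name and its signature are hypothetical and are not taken from scraper.ts.

```typescript
type PoolResult<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the input order; a failed item is recorded instead of aborting the run.
async function runPool<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Array<PoolResult<R>>> {
  const results: Array<PoolResult<R>> = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.max(0, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Recording failures per item, rather than rejecting the whole run, fits the idea that failed URLs can be retried by a later incremental scrape.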

Tools

Tool	Description
`scrape_full`	Wipe the database and re-scrape everything from scratch
`scrape_incremental`	Scrape only new, changed, or previously failed pages
`search_architect_docs`	Keyword search across indexed content with relevance scoring
`read_architect_page`	Read a page's full Markdown content by URL (supports `max_chars`)
`read_architect_page_summary`	Lightweight summary: title, headings, word count, 500-char preview
`get_section_summary`	Page count, total words, and title list for a section
`list_architect_sections`	List all indexed sections with page counts
`export_architect_section`	Export a full section to a single Markdown file on disk
`get_scrape_status`	Database stats: page counts, sections, last run, pending/failed URLs
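
For scrape_incremental, each URL has to be classified as new, changed, previously failed, or unchanged. One way to do that is to compare a hash of the freshly converted Markdown with the hash stored on the previous run. The sketch below assumes SHA-256 and a hypothetical `StoredPage` record; the README does not say which hash function or schema the server uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of what was stored for a URL on the previous scrape.
interface StoredPage {
  contentHash: string;
  status: "ok" | "failed";
}

type ScrapeDecision = "new" | "changed" | "retry" | "unchanged";

// SHA-256 is an assumption; the README only mentions a "content hash".
function hashContent(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// Decide what an incremental run should do with a page it has just fetched.
function decide(stored: StoredPage | undefined, freshMarkdown: string): ScrapeDecision {
  if (stored === undefined) return "new";
  if (stored.status === "failed") return "retry";
  return stored.contentHash === hashContent(freshMarkdown) ? "unchanged" : "changed";
}
```

Only the "new", "changed", and "retry" outcomes would need re-chunking and re-indexing; "unchanged" pages can be skipped.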

Resources

Attach these to your conversation context for orientation:

URI	Description
`sf-architect://guide/usage`	Recommended workflows and tool usage tips
`sf-architect://data/languages`	All supported language codes with display names
`sf-architect://data/sections`	Live section index with current page counts

Prompts

Pre-built workflow templates:

Prompt	Arguments	What it does
`scrape-docs`	`mode`: `full` or `incremental`	Presents available languages, asks which to scrape, then runs the appropriate tool
`research-topic`	`topic`: string	Searches, summarizes relevant pages, and synthesizes findings with citations
`export-section`	`section`: string	Verifies the section exists, shows a summary, then exports to Markdown

Supported languages

Code	Language
`en`	English (default)
`de`	German
`fr`	French
`jp`	Japanese
`zh-cn`	Chinese (Simplified)
`zh-tw`	Chinese (Traditional)
`dk`	Danish
`es`	Spanish
`fi`	Finnish
`it`	Italian
`kr`	Korean
`nl`	Dutch
`no`	Norwegian
`pt-br`	Portuguese (Brazil)
`ru`	Russian
`se`	Swedish
`es-mx`	Spanish (Mexico)
`all`	All languages

Note: Not all languages are available for all sections. The sitemap at scrape time determines what's actually published.
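
To show how a language filter could be applied to sitemap URLs, here is a small sketch. It assumes the locale code, when present, is the first path segment and that URLs without a recognised locale segment are English. Both points are assumptions about the site's URL layout, and `detectLanguage` and `matchesLanguageFilter` are hypothetical names.

```typescript
// Language codes from the table above ("all" is a filter value, not a locale).
const LANGUAGE_CODES = new Set([
  "en", "de", "fr", "jp", "zh-cn", "zh-tw", "dk", "es", "fi", "it",
  "kr", "nl", "no", "pt-br", "ru", "se", "es-mx",
]);

// Assumption: the locale is the first path segment; anything else counts as English.
function detectLanguage(url: string): string {
  const segments = new URL(url).pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0]?.toLowerCase();
  return first !== undefined && LANGUAGE_CODES.has(first) ? first : "en";
}

// "all" accepts every URL; any other filter must match the detected language.
function matchesLanguageFilter(url: string, filter: string): boolean {
  return filter === "all" || detectLanguage(url) === filter;
}
```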

Configuration

The database is stored at ~/.sf-architect-mcp/sf-architect.db by default.

Override with an environment variable:

SF_ARCHITECT_DB_DIR=/custom/path node dist/index.js

Exported Markdown files go to ~/.sf-architect-mcp/exports/{section}-{language}.md unless you specify a custom path.
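
The path rules above can be sketched as follows. The function names are hypothetical, and two details are assumptions: an empty SF_ARCHITECT_DB_DIR is treated as unset, and the export location follows the README literally (the README does not say whether the environment variable also moves exports).

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Database directory: SF_ARCHITECT_DB_DIR if set, otherwise ~/.sf-architect-mcp.
function resolveDataDir(env: Env = process.env, home: string = homedir()): string {
  const override = env.SF_ARCHITECT_DB_DIR;
  // Assumption: an empty or whitespace-only value counts as "not set".
  if (override !== undefined && override.trim() !== "") return override;
  return join(home, ".sf-architect-mcp");
}

function resolveDbPath(env: Env = process.env, home: string = homedir()): string {
  return join(resolveDataDir(env, home), "sf-architect.db");
}

// Export target: the caller's custom path if given, else the documented default.
function resolveExportPath(
  section: string,
  language: string,
  customPath?: string,
  home: string = homedir(),
): string {
  if (customPath !== undefined) return customPath;
  return join(home, ".sf-architect-mcp", "exports", `${section}-${language}.md`);
}
```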

Architecture

src/
├── index.ts          # MCP server — tools, resources, prompts
├── scraper.ts        # fetch + cheerio scraper, concurrency pool
├── sitemap.ts        # Sitemap fetching and URL filtering
├── types.ts          # Shared TypeScript types
├── db/
│   ├── database.ts   # sql.js SQLite, persistence, schema
│   ├── ingest.ts     # Page upsert, chunk sync, scrape run tracking
│   └── queries.ts    # Search, read, export, stats
└── utils/
    ├── chunker.ts    # Section-aware text chunking (1500 chars, 200 overlap)
    └── url-utils.ts  # Language detection from URL path segments
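
To make the chunker.ts comment concrete, here is a minimal sketch of section-aware chunking with a 1500-character window and 200-character overlap. It treats Markdown headings and line-leading bold labels such as **Definition:** as section boundaries. The regular expressions, the `TextChunk` shape, and the function names are assumptions for illustration, not the project's implementation.

```typescript
interface TextChunk {
  heading: string;
  text: string;
}

const HEADING_LINE = /^#{1,6}\s+(.+)$/;          // e.g. "## Title"
const BOLD_MARKER = /^\*\*([^*]+):\*\*\s*(.*)$/; // e.g. "**Definition:** text"

// Slide a window of `size` characters, stepping by `size - overlap`.
function splitWithOverlap(text: string, size = 1500, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    pieces.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return pieces;
}

// Split into sections first, then window each section separately,
// so no chunk mixes text from two different headings.
function chunkMarkdown(markdown: string, size = 1500, overlap = 200): TextChunk[] {
  const chunks: TextChunk[] = [];
  let heading = "";
  let lines: string[] = [];

  // Emit the section collected so far; sections with no body text are dropped.
  const flush = (): void => {
    const text = lines.join("\n").trim();
    for (const piece of splitWithOverlap(text, size, overlap)) {
      chunks.push({ heading, text: piece });
    }
    lines = [];
  };

  for (const line of markdown.split("\n")) {
    const h = HEADING_LINE.exec(line);
    const b = BOLD_MARKER.exec(line);
    if (h) {
      flush();
      heading = h[1].trim();
    } else if (b) {
      flush();
      heading = b[1].trim();
      if (b[2]) lines.push(b[2]); // keep text that follows the bold label
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, which reduces the chance that a phrase straddling a boundary is missed by search.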

Key design decisions:

fetch + cheerio: the site is fully server-side rendered, so no headless browser is needed. Roughly 10x faster than driving a headless browser, and cross-platform with zero native dependencies
sql.js (WASM SQLite): in-process, zero native dependencies, serialized to disk after every 10 pages
Section-aware chunking: detects both Markdown headings and bold-text markers (**Definition:**), which the site uses instead of semantic HTML headings
LRU cache on search results (500 entries, 5-minute TTL)
Content hash comparison for incremental scraping — only re-indexes pages whose content has actually changed
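
The search-result cache described above (500 entries, 5-minute TTL) could be built on a `Map`, which iterates in insertion order. The sketch below is one possible shape, not the server's code: the `LruTtlCache` class, its injectable clock, and the string key are assumptions.

```typescript
// LRU cache whose entries also expire after `ttlMs`.
// A Map iterates in insertion order, so its first key is the least recently used.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries = 500, ttlMs = 5 * 60 * 1000, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for testing
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A cache key would need to cover everything that affects the result, for example the query text together with the section and language filters.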

License

MIT
