DataSF MCP Server
A Model Context Protocol (MCP) server that provides LLMs with seamless access to San Francisco's open data portal (DataSF), powered by the Socrata platform.
Overview
This MCP server enables AI assistants like Claude to search, explore, and query San Francisco's public datasets through a simple, standardized interface. It handles the complexity of the Socrata API, provides intelligent column name correction, and includes schema caching for optimal performance.
Key Features
- Dataset Search & Discovery - Find datasets by keywords or browse by category
- Schema Retrieval - Get column names and data types before querying
- SoQL Query Execution - Run SQL-like queries against any dataset
- Fuzzy Column Matching - Auto-corrects typos in column names
- Schema Caching - Reduces API calls with intelligent caching
- Optional Authentication - Supports Socrata App Tokens for higher rate limits
- Property-Based Testing - Comprehensive correctness guarantees
Available Tools
1. search_datasf
Search for datasets by keywords.
Parameters:
- `query` (string, required): Search keywords (1-500 characters)
- `limit` (number, optional): Max results (default: 5, max: 20)
Example:
Search for police incident datasets
2. list_datasf
Browse available datasets, optionally filtered by category.
Parameters:
- `category` (string, optional): Filter by category
- `limit` (number, optional): Max results (default: 5, max: 20)
Example:
List recent public safety datasets
3. get_schema
Get the schema (columns and data types) for a specific dataset.
Parameters:
- `dataset_id` (string, required): Dataset 4x4 ID (format: `xxxx-xxxx`)
Example:
Get the schema for dataset wg3w-h783
4. query_datasf
Execute a SoQL (Socrata Query Language) query against a dataset.
Parameters:
- `dataset_id` (string, required): Dataset 4x4 ID
- `soql` (string, required): SoQL query (1-4000 characters)
- `auto_correct` (boolean, optional): Enable column name correction (default: true)
Example:
Query dataset wg3w-h783: SELECT incident_category, COUNT(*) GROUP BY incident_category LIMIT 10
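Under the hood, a SoQL query like the one above is sent to Socrata's Resource API, which accepts a full SoQL statement in the `$query` URL parameter. The helper below is an illustrative sketch, not the server's actual code:

```typescript
// Sketch: build the Resource API request URL for a SoQL query.
// The endpoint shape (/resource/{id}.json with $query) is Socrata's
// documented convention; the function name is hypothetical.
function buildQueryUrl(datasetId: string, soql: string): string {
  const base = `https://data.sfgov.org/resource/${datasetId}.json`;
  const params = new URLSearchParams({ $query: soql });
  return `${base}?${params.toString()}`;
}

const url = buildQueryUrl(
  "wg3w-h783",
  "SELECT incident_category, COUNT(*) GROUP BY incident_category LIMIT 10"
);
```

Note that `URLSearchParams` percent-encodes the query, so spaces and special characters are safe to pass through as-is.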
Installation
Prerequisites
- Node.js 18 or higher
- npm or yarn
Local Setup (Optional)
If you want to run or modify the server locally:
- Clone the repository:
git clone https://github.com/fwextensions/datasf-mcp.git
cd datasf-mcp
- Install dependencies:
npm install
- Run the server:
npm start
The server uses tsx to run TypeScript directly without a build step.
Usage
Testing with MCP Inspector
For the MCP Inspector, you'll need to use the local installation:
# First, clone and install locally
git clone https://github.com/fwextensions/datasf-mcp.git
cd datasf-mcp
npm install
# Then run the inspector
npx -y @modelcontextprotocol/inspector tsx src/index.ts
In the inspector UI, use:
- Command: `tsx`
- Arguments: `src/index.ts` (or an absolute path if running from outside the directory)
Quick Start with npx (Recommended)
The easiest way to use the server is directly from GitHub using npx:
{
"mcpServers": {
"datasf": {
"command": "npx",
"args": ["-y", "github:fwextensions/datasf-mcp"],
"env": {
"SOCRATA_APP_TOKEN": "your-optional-token"
}
}
}
}
This will automatically download and run the latest version from GitHub without any manual installation.
Local Installation
Alternatively, clone and install locally:
git clone https://github.com/fwextensions/datasf-mcp.git
cd datasf-mcp
npm install
Then use the absolute path in your MCP configuration (see below).
Configuration for Claude Desktop
Add to your Claude Desktop config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Option 1: Using npx (recommended)
{
"mcpServers": {
"datasf": {
"command": "npx",
"args": ["-y", "github:fwextensions/datasf-mcp"],
"env": {
"SOCRATA_APP_TOKEN": "your-optional-token"
}
}
}
}
Option 2: Using local installation
{
"mcpServers": {
"datasf": {
"command": "npx",
"args": ["tsx", "/absolute/path/to/datasf-mcp/src/index.ts"],
"env": {
"SOCRATA_APP_TOKEN": "your-optional-token"
}
}
}
}
Important: Replace /absolute/path/to/datasf-mcp with the actual full path to where you cloned this project.
Configuration for Kiro IDE
Create or edit .kiro/settings/mcp.json:
Option 1: Using npx from GitHub (recommended)
{
"mcpServers": {
"datasf": {
"command": "npx",
"args": ["-y", "github:fwextensions/datasf-mcp"],
"env": {
"SOCRATA_APP_TOKEN": "your-optional-token"
},
"disabled": false,
"autoApprove": []
}
}
}
Option 2: Using local installation
{
"mcpServers": {
"datasf": {
"command": "npx",
"args": ["tsx", "src/index.ts"],
"env": {
"SOCRATA_APP_TOKEN": "your-optional-token"
},
"disabled": false,
"autoApprove": []
}
}
}
Getting a Socrata App Token
The server works without authentication for public data, but an App Token increases rate limits:
- Visit https://data.sfgov.org/
- Sign up for a free account
- Navigate to Developer Settings
- Create a new App Token
- Add it to your MCP configuration
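Socrata documents the `X-App-Token` request header for token-authenticated requests. A minimal sketch of how a client might attach it (the helper name is an assumption, not the server's actual code):

```typescript
// Sketch: build request headers, adding X-App-Token only when a token is set.
// Requests without a token still work for public data, just at lower rate limits.
function socrataHeaders(appToken?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (appToken) {
    headers["X-App-Token"] = appToken;
  }
  return headers;
}
```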
Development
Project Structure
datasf-mcp-server/
├── src/
│   ├── index.ts          # MCP server entry point
│   ├── socrataClient.ts  # Socrata API client
│   ├── validator.ts      # Input validation with Zod
│   ├── fuzzyMatcher.ts   # Column name auto-correction
│   ├── cache.ts          # Schema caching
│   ├── errorHandler.ts   # Error handling utilities
│   └── __tests__/
│       └── property/     # Property-based tests
├── dist/                 # Compiled JavaScript output
├── package.json
└── tsconfig.json
Available Scripts
- `npm run build` - Compile TypeScript to JavaScript
- `npm start` - Run the compiled server
- `npm test` - Run all tests
- `npm run test:watch` - Run tests in watch mode
Running Tests
npm test
The project uses property-based testing with fast-check to ensure correctness across a wide range of inputs.
Architecture
The server follows a modular architecture:
- MCP Server - Handles protocol communication via stdio
- Socrata Client - Manages HTTP requests to Socrata APIs
- Validator - Validates all inputs using Zod schemas
- Fuzzy Matcher - Corrects column name typos using Fuse.js
- Schema Cache - Caches dataset schemas in memory (5-minute TTL)
- Error Handler - Classifies and formats errors for LLM consumption
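The schema cache described above can be sketched as a small in-memory map with a 5-minute TTL. The real implementation lives in `src/cache.ts`; the names and shape below are illustrative only:

```typescript
// Sketch of an in-memory schema cache with a 5-minute TTL.
type ColumnInfo = { name: string; dataType: string };

class SchemaCache {
  private entries = new Map<string, { schema: ColumnInfo[]; expiresAt: number }>();

  constructor(private ttlMs = 5 * 60 * 1000) {}

  get(datasetId: string): ColumnInfo[] | undefined {
    const entry = this.entries.get(datasetId);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(datasetId); // evict stale entry
      return undefined;
    }
    return entry.schema;
  }

  set(datasetId: string, schema: ColumnInfo[]): void {
    this.entries.set(datasetId, { schema, expiresAt: Date.now() + this.ttlMs });
  }
}
```

A cache hit skips the Views API round trip entirely, which matters when a conversation issues several queries against the same dataset.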
Example Queries
Once configured in your LLM, you can ask questions like:
- "Search for datasets about housing in San Francisco"
- "What's the schema for the police incidents dataset (wg3w-h783)?"
- "Show me the top 10 incident categories from the police incidents dataset"
- "Find all building permits issued in 2024"
- "What datasets are available about transportation?"
API Endpoints Used
The server interacts with three Socrata APIs:
- Discovery API: `https://api.us.socrata.com/api/catalog/v1` - Dataset search and browsing
- Views API: `https://data.sfgov.org/api/views/{id}.json` - Schema retrieval
- Resource API: `https://data.sfgov.org/resource/{id}.json` - Data querying
Error Handling
The server provides descriptive error messages for:
- Validation errors - Invalid input format or length
- Not found - Dataset doesn't exist
- Rate limiting - Too many requests (add App Token to resolve)
- Timeouts - Request exceeded 30 seconds
- API errors - Socrata-specific errors (e.g., SoQL syntax errors)
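One plausible way to map HTTP responses onto the categories above is by status code; the server's actual `errorHandler.ts` may classify differently, so treat this as a hedged sketch:

```typescript
// Sketch: classify an HTTP status into one of the error categories above.
type ErrorKind = "validation" | "not_found" | "rate_limited" | "timeout" | "api_error";

function classifyStatus(status: number): ErrorKind {
  if (status === 400) return "validation";   // invalid input (or SoQL syntax)
  if (status === 404) return "not_found";    // dataset doesn't exist
  if (status === 429) return "rate_limited"; // too many requests
  if (status === 408 || status === 504) return "timeout";
  return "api_error";                        // other Socrata-side errors
}
```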
Contributing
Contributions are welcome! The project uses:
- TypeScript for type safety
- Zod for runtime validation
- fast-check for property-based testing
- Vitest as the test runner
License
MIT
Troubleshooting
Server not starting
- Ensure you ran `npm run build` first
- Check that Node.js 18+ is installed
Tools not showing up in LLM
- Verify the path in your config is absolute
- Restart your LLM application after adding the config
- Check the LLM's logs for connection errors
Rate limiting errors
- Add a Socrata App Token to your configuration
- Reduce the frequency of requests
Column name errors in queries
- Use `get_schema` first to see valid column names
- Enable `auto_correct: true` (the default) for automatic typo correction
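The auto-correction idea is simple: compare a misspelled column name against the dataset's known columns and suggest the closest match. The project uses Fuse.js; the dependency-free sketch below substitutes plain edit distance to show the concept, and all names in it are hypothetical:

```typescript
// Levenshtein edit distance between two strings (dynamic programming).
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Suggest the closest known column for a misspelled name, if close enough.
function correctColumn(input: string, columns: string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDist = maxDistance + 1;
  for (const col of columns) {
    const d = editDistance(input.toLowerCase(), col.toLowerCase());
    if (d < bestDist) {
      best = col;
      bestDist = d;
    }
  }
  return best;
}
```

For example, `correctColumn("incident_catagory", schemaColumns)` would suggest `incident_category`, which is the kind of typo `auto_correct` is meant to absorb.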