MCP Data Catalog
A Model Context Protocol (MCP) server that provides AI assistants with structured access to tabular datasets from CSV files. Query, filter, and retrieve data through a clean, type-safe interface.
Features (MVP)
✅ 4 MCP Tools
- `list_datasets` - List all available datasets
- `describe_dataset` - Get schema and field information
- `query_dataset` - Query with filters, projections, and limits
- `get_by_id` - Retrieve a specific row by lookup key
✅ Type-Safe Schema
- String, number, boolean, and enum field types
- Field validation and type checking
- Required field enforcement
✅ Filtering (MVP)
- `eq` (equals)
- `contains` (case-insensitive substring)
- `and` (multiple conditions)
✅ Smart Limits
- Per-dataset row limits
- Truncation indicators
- Configurable defaults
✅ Hot Reload
- Config changes apply automatically (1-3ms)
- No server restart needed
- Invalid configs are rejected safely
✅ Stable MVP
- Hexagonal architecture
- Comprehensive test coverage
- Type-safe implementation
- Production-quality error handling
Quick Start
1. Install
npm install
npm run build
2. Configure
Copy an example configuration:
cp examples/config/typical.json config/datasets.json
Or create your own:
{
"datasets": [
{
"id": "users",
"name": "User Directory",
"schema": {
"fields": [
{ "name": "id", "type": "number", "required": true },
{ "name": "name", "type": "string", "required": true },
{ "name": "email", "type": "string", "required": true },
{ "name": "role", "type": "enum", "values": ["admin", "user", "guest"] }
],
"visibleFields": ["id", "name", "email", "role"]
},
"source": {
"type": "csv",
"path": "./data/users.csv"
},
"lookupKey": "id",
"limits": {
"maxRows": 100,
"defaultRows": 20
}
}
]
}
3. Run
npm run dev
The server starts on stdio and exposes 4 MCP tools.
Usage
MCP Client Configuration
Add to your MCP client config (e.g., Claude Desktop):
{
"mcpServers": {
"catalog": {
"command": "node",
"args": ["/path/to/catalog-mcp/dist/index.js"],
"env": {
"CONFIG_PATH": "/path/to/config/datasets.json"
}
}
}
}
Note: Replace `/path/to/...` with your actual local file paths. The MCP server runs as a Node.js process and reads the `CONFIG_PATH` environment variable at startup.
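For orientation, the startup behavior amounts to roughly the following sketch; the error handling shown here is illustrative, not necessarily how the server reports a missing path:

```typescript
// Minimal sketch of reading CONFIG_PATH at startup.
// The error message and exit behavior are illustrative only.
import { resolve } from "node:path";

const configPath = process.env.CONFIG_PATH;
if (!configPath) {
  console.error("[ERROR] CONFIG_PATH is not set; point it at your datasets.json");
  process.exit(1);
}
console.error(`[INFO] Loading dataset catalog from ${resolve(configPath)}`);
```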
Available Tools
1. list_datasets
List all configured datasets.
Request:
{
"tool": "list_datasets"
}
Response:
{
"datasets": [
{
"id": "users",
"name": "User Directory"
},
{
"id": "products",
"name": "Product Catalog"
}
]
}
2. describe_dataset
Get detailed schema information for a dataset.
Request:
{
"tool": "describe_dataset",
"arguments": {
"datasetId": "users"
}
}
Response:
{
"id": "users",
"name": "User Directory",
"fields": [
{
"name": "id",
"type": "number",
"required": true
},
{
"name": "name",
"type": "string",
"required": true
},
{
"name": "role",
"type": "enum",
"values": ["admin", "user", "guest"]
}
],
"lookupKey": "id",
"limits": {
"maxRows": 100,
"defaultRows": 20
}
}
3. query_dataset
Query a dataset with optional filters, field projection, and limits.
Simple Query:
{
"tool": "query_dataset",
"arguments": {
"datasetId": "users",
"limit": 10
}
}
With Filter:
{
"tool": "query_dataset",
"arguments": {
"datasetId": "users",
"filters": {
"field": "role",
"op": "eq",
"value": "admin"
},
"limit": 10
}
}
With Multiple Filters:
{
"tool": "query_dataset",
"arguments": {
"datasetId": "users",
"filters": {
"and": [
{ "field": "role", "op": "eq", "value": "admin" },
{ "field": "name", "op": "contains", "value": "smith" }
]
}
}
}
With Field Projection:
{
"tool": "query_dataset",
"arguments": {
"datasetId": "users",
"fields": ["id", "name"],
"limit": 10
}
}
Response:
{
"rows": [
{ "id": 1, "name": "Alice Smith", "email": "alice@example.com", "role": "admin" },
{ "id": 6, "name": "Frank Miller", "email": "frank@example.com", "role": "admin" }
],
"fields": ["id", "name", "email", "role"],
"rowsReturned": 2,
"totalRows": 2,
"truncated": false
}
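The `rowsReturned`, `totalRows`, and `truncated` fields follow from the rules described under Smart Limits; a small illustrative sketch of how they could be derived (not the server's actual code):

```typescript
// Illustrative sketch: derive the response's limit fields from the
// dataset's configured limits and an optional requested limit.
interface Limits { maxRows: number; defaultRows: number }

function applyLimit<T>(rows: T[], limits: Limits, requested?: number) {
  const effective = Math.min(requested ?? limits.defaultRows, limits.maxRows);
  const page = rows.slice(0, effective);
  return {
    rows: page,
    rowsReturned: page.length,
    totalRows: rows.length,
    truncated: page.length < rows.length,
  };
}

// applyLimit(matches, { maxRows: 100, defaultRows: 20 }, 10)
```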
4. get_by_id
Retrieve a single row by its lookup key.
Request:
{
"tool": "get_by_id",
"arguments": {
"datasetId": "users",
"id": "1"
}
}
Response:
{
"row": {
"id": 1,
"name": "Alice Smith",
"email": "alice@example.com",
"role": "admin"
}
}
If not found:
{
"row": null
}
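Note that the lookup value arrives as a JSON string (`"1"`) while the `id` field is typed as a number, so the lookup has to account for the field type. A purely illustrative sketch of that kind of coercion (not the server's actual implementation):

```typescript
// Illustrative sketch: match a string lookup value against a typed field.
// Not the server's actual implementation.
type FieldType = "string" | "number" | "boolean" | "enum";

function matchesLookup(cell: unknown, rawId: string, type: FieldType): boolean {
  switch (type) {
    case "number":
      return cell === Number(rawId);
    case "boolean":
      return cell === (rawId === "true");
    default:
      return cell === rawId;
  }
}

// matchesLookup(1, "1", "number") === true
```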
Filter Operators
The MVP supports three operators:
eq (equals)
Exact match (case-sensitive for strings).
{
"field": "role",
"op": "eq",
"value": "admin"
}
contains (substring)
Case-insensitive substring search.
{
"field": "name",
"op": "contains",
"value": "smith"
}
and (conjunction)
All conditions must be true.
{
"and": [
{ "field": "role", "op": "eq", "value": "admin" },
{ "field": "active", "op": "eq", "value": true }
]
}
Filter Schema
All filter expressions follow this canonical JSON structure:
Simple filter:
{
"field": "fieldName",
"op": "eq" | "contains",
"value": any
}
Compound filter (and):
{
"and": [
{ "field": "...", "op": "...", "value": ... },
{ "field": "...", "op": "...", "value": ... }
]
}
Post-MVP: additional operators (ne, gt, gte, lt, lte, in, or) are planned and will use the same structure.
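For reference, the filter grammar above maps naturally onto a small TypeScript union plus an evaluator. This is a sketch of the MVP semantics, not the server's actual source:

```typescript
// Illustrative shapes for the filter expressions above, with a small
// in-memory evaluator: eq is an exact match, contains is a
// case-insensitive substring check, and requires all conditions to hold.
type SimpleFilter = { field: string; op: "eq" | "contains"; value: unknown };
type CompoundFilter = { and: FilterExpr[] };
type FilterExpr = SimpleFilter | CompoundFilter;

type Row = Record<string, unknown>;

function evaluate(filter: FilterExpr, row: Row): boolean {
  if ("and" in filter) {
    return filter.and.every((f) => evaluate(f, row));
  }
  const cell = row[filter.field];
  switch (filter.op) {
    case "eq":
      return cell === filter.value;
    case "contains":
      return String(cell).toLowerCase().includes(String(filter.value).toLowerCase());
  }
}

// Example: rows.filter((row) => evaluate({ field: "role", op: "eq", value: "admin" }, row))
```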
Configuration
Dataset Structure
{
"datasets": [
{
"id": "string", // Unique identifier
"name": "string", // Display name
"schema": {
"fields": [ // Field definitions
{
"name": "string", // Field name (matches CSV header)
"type": "string", // string | number | boolean | enum
"required": boolean, // Optional, default: false
"values": ["..."] // Required for enum type
}
],
"visibleFields": ["..."] // Fields accessible in queries
},
"source": {
"type": "csv", // Only CSV in MVP
"path": "string" // Relative path to CSV file
},
"lookupKey": "string", // Optional, field name for get_by_id
"limits": {
"maxRows": number, // Hard limit
"defaultRows": number // Default when not specified
}
}
]
}
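The same structure expressed as illustrative TypeScript types (names mirror the JSON keys above; they are not the project's exported types):

```typescript
// Illustrative types mirroring the configuration structure above.
type FieldType = "string" | "number" | "boolean" | "enum";

interface FieldDefinition {
  name: string;          // Must match a CSV header (case-sensitive)
  type: FieldType;
  required?: boolean;    // Defaults to false
  values?: string[];     // Required when type === "enum"
}

interface DatasetConfig {
  id: string;
  name: string;
  schema: {
    fields: FieldDefinition[];
    visibleFields: string[];
  };
  source: { type: "csv"; path: string };
  lookupKey?: string;
  limits: { maxRows: number; defaultRows: number };
}

interface CatalogConfig {
  datasets: DatasetConfig[];
}
```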
Field Types
| Type | Description | Example Values |
|---|---|---|
| `string` | Text data | `"Alice"`, `"alice@example.com"` |
| `number` | Numeric data | `42`, `99.99`, `-5` |
| `boolean` | True/false | `true`, `false` |
| `enum` | Predefined values | `"admin"` (must be in `values` array) |
Configuration Format
Configuration uses JSON format. This is the primary and recommended format for the MVP.
Note: YAML support may be added in future versions, but JSON remains the canonical format.
Configuration Validation
The server validates configuration on startup and rejects invalid configs:
✅ Checks performed:
- All required fields present
- Field types are valid
- Enum fields have non-empty `values` arrays
- `visibleFields` reference existing fields
- `lookupKey` references an existing field
- Dataset IDs are unique
- Limits are valid (positive, maxRows ≥ defaultRows)
- CSV files exist and are readable
Fail-Fast: Invalid configuration prevents server startup with clear error messages.
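Zod (listed under Credits) lends itself to exactly this kind of fail-fast validation. A simplified sketch of such a schema, with cross-field checks abbreviated (illustrative, not the project's actual schema):

```typescript
// Simplified sketch of fail-fast config validation with Zod.
import { z } from "zod";

const fieldSchema = z.object({
  name: z.string().min(1),
  type: z.enum(["string", "number", "boolean", "enum"]),
  required: z.boolean().optional(),
  values: z.array(z.string()).optional(),
}).refine((f) => f.type !== "enum" || (f.values?.length ?? 0) > 0, {
  message: "enum fields need a non-empty values array",
});

const datasetSchema = z.object({
  id: z.string().min(1),
  name: z.string().min(1),
  schema: z.object({
    fields: z.array(fieldSchema).min(1),
    visibleFields: z.array(z.string()),
  }),
  source: z.object({ type: z.literal("csv"), path: z.string() }),
  lookupKey: z.string().optional(),
  limits: z.object({ maxRows: z.number().positive(), defaultRows: z.number().positive() })
    .refine((l) => l.maxRows >= l.defaultRows, { message: "maxRows must be >= defaultRows" }),
});

const configSchema = z.object({ datasets: z.array(datasetSchema) })
  .refine((c) => new Set(c.datasets.map((d) => d.id)).size === c.datasets.length, {
    message: "dataset IDs must be unique",
  });

// Fail fast on startup:
// const parsed = configSchema.safeParse(JSON.parse(rawConfig));
// if (!parsed.success) { console.error(parsed.error.format()); process.exit(1); }
```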
CSV File Format
Requirements
- Header row with column names (first row)
- Column names must match field definitions (case-sensitive)
- Data types must match field types
- UTF-8 encoding
- Standard CSV format (comma-delimited)
Example
id,name,email,role,active
1,Alice Smith,alice@example.com,admin,true
2,Bob Johnson,bob@example.com,user,true
3,Charlie Brown,charlie@example.com,guest,false
Type Formatting
Boolean: Must be true or false (lowercase)
active
true
false
Number: Integers or decimals
price,quantity
99.99,5
149.99,10
Enum: Must match one of the configured values
role
admin
user
guest
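The formatting rules above imply a per-cell coercion step when rows are read; a simplified sketch of what that could look like (illustrative, not the server's actual parser):

```typescript
// Illustrative sketch of coercing raw CSV strings into typed values
// according to a field definition.
type FieldType = "string" | "number" | "boolean" | "enum";

interface FieldDef {
  name: string;
  type: FieldType;
  values?: string[];
}

function coerceCell(raw: string, field: FieldDef): string | number | boolean {
  switch (field.type) {
    case "number": {
      const n = Number(raw);
      if (Number.isNaN(n)) throw new Error(`${field.name}: "${raw}" is not a number`);
      return n;
    }
    case "boolean":
      if (raw !== "true" && raw !== "false") {
        throw new Error(`${field.name}: expected "true" or "false", got "${raw}"`);
      }
      return raw === "true";
    case "enum":
      if (!field.values?.includes(raw)) {
        throw new Error(`${field.name}: "${raw}" is not one of ${field.values?.join(", ")}`);
      }
      return raw;
    default:
      return raw;
  }
}

// coerceCell("99.99", { name: "price", type: "number" }) === 99.99
```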
Hot Reload
Configuration changes are detected automatically:
- Edit `config/datasets.json`
- Save the file
- Changes apply in 1-3ms (catalog swap only)
- Invalid changes are rejected (keeps current config)
Watch the logs:
[INFO] Config reloaded successfully in 2ms
How it works:
- Config file is watched for changes
- On change: validates new config
- If valid: atomically swaps to new catalog
- If invalid: preserves current state, logs error
No server restart needed!
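Chokidar (listed under Credits) supports this watch → validate → swap pattern directly. A minimal sketch, where `Catalog` and `validateConfig` are illustrative placeholders rather than the project's actual identifiers:

```typescript
// Minimal sketch of the watch -> validate -> swap flow with chokidar.
import { readFile } from "node:fs/promises";
import chokidar from "chokidar";

type Catalog = { datasets: unknown[] };

// Placeholder for the fail-fast validation step; throws on invalid config.
declare function validateConfig(raw: unknown): Catalog;

const configPath = process.env.CONFIG_PATH ?? "./config/datasets.json";
let catalog: Catalog = { datasets: [] };

chokidar.watch(configPath).on("change", async () => {
  const started = Date.now();
  try {
    const next = validateConfig(JSON.parse(await readFile(configPath, "utf-8")));
    catalog = next; // atomic swap: queries see either the old or the new catalog
    console.error(`[INFO] Config reloaded successfully in ${Date.now() - started}ms`);
  } catch (err) {
    // Invalid edit: keep serving the previous catalog instead of crashing.
    console.error(`[ERROR] Reload rejected, keeping current config: ${err}`);
  }
});
```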
AI Usage Guidelines
This server is optimized for local, schema-aware access to CSV-backed reference data – the kind of data I use for project design, exploration, documentation aggregation, and hobby systems. For high-volume or mission-critical production workloads, you would typically pair LLMs with a dedicated database-backed MCP server and keep this catalog focused on lightweight, structured datasets close to the agent.
Dataset Design for AI
Keep the following guidelines in mind when designing datasets.
Keep datasets focused:
- Small, single-purpose tables work better than large multi-purpose sheets
- Separate reference data (IDs, names, codes) from descriptive content
- Break complex domains into multiple related datasets
Optimize for token efficiency:
- Use `visibleFields` to expose only necessary columns
- Keep field names short but meaningful
- Prefer IDs and codes over long text fields for filtering
Design for stable querying:
- Use consistent, stable identifiers (numeric IDs, SKUs, codes)
- Avoid relying on free-text names as lookup keys
- Normalize categorical data (use enums, not free text)
Structure for filtering:
- Tag-based fields enable flexible queries (`status`, `category`, `type`)
- Use enums for controlled vocabularies
- Boolean flags for common filters (`active`, `published`, `available`)
Example patterns:
- Index dataset: IDs, names, tags, status (small, frequently queried)
- Detail dataset: Full records with all fields (queried by ID)
- Reference dataset: Lookup tables, enums, measurement scales (small, stable)
Architecture
This project follows Hexagonal Architecture for clean separation of concerns:
┌─────────────────────────────────────────────────────┐
│ Primary Adapters (MCP) │
│ ↓ │
│ Primary Ports (Tools) │
│ ↓ │
│ Use Cases (Business Logic) │
│ ↓ │
│ Domain Layer (Pure Logic) │
│ ↓ │
│ Secondary Ports (Storage) │
│ ↓ │
│ Secondary Adapters (CSV, Config) │
└─────────────────────────────────────────────────────┘
Key Principles:
- Domain layer has zero external dependencies
- Dependencies point inward (adapters → domain)
- Business logic is isolated and testable
- Easy to extend with new adapters
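In code, this separation boils down to use cases depending on port interfaces rather than on concrete adapters. An illustrative sketch (interface and class names are assumptions, not the project's actual identifiers):

```typescript
// Illustrative sketch of the hexagonal layering.
type Row = Record<string, unknown>;

// Secondary port: the domain's view of storage, with no CSV or fs imports.
interface DatasetRepository {
  loadRows(datasetId: string): Promise<Row[]>;
}

// Use case: pure orchestration, testable with an in-memory fake repository.
class QueryDataset {
  constructor(private readonly repo: DatasetRepository) {}

  async execute(datasetId: string, limit: number): Promise<Row[]> {
    const rows = await this.repo.loadRows(datasetId);
    return rows.slice(0, limit);
  }
}

// Secondary adapter: the only place that knows rows come from CSV files.
class CsvDatasetRepository implements DatasetRepository {
  async loadRows(_datasetId: string): Promise<Row[]> {
    // ...read and parse the configured CSV file for this dataset...
    return [];
  }
}

// The MCP adapter (primary side) wires tool handlers to use cases like QueryDataset.
```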
See docs/dev/mcp-data-catalog.md for detailed architecture documentation.
Examples
The examples/ directory contains:
Configurations
- `minimal.json` - Single dataset, basic features
- `typical.json` - Multiple datasets, common patterns
- `advanced.json` - Complex scenarios with many features
Datasets
- `minimal.csv` - 5 rows, 2 columns
- `sample-users.csv` - 10 users with roles
- `sample-products.csv` - 15 products with categories
- `employees.csv` - 15 employees with departments
- `inventory.csv` - 20 inventory items
- `orders.csv` - 20 customer orders
Try them:
cp examples/config/typical.json config/datasets.json
npm run dev
See examples/README.md for detailed documentation.
Development
Setup
npm install
npm run build
Run in Development Mode
npm run dev
Run Tests
# All tests
npm test
# With coverage
npm run test:coverage
# Watch mode
npm run test:watch
Build for Production
npm run build
Output in dist/ directory.
Testing
Test Coverage:
- Comprehensive test suite with high coverage
- Unit tests for domain logic and use cases
- Integration tests for MCP tools and hot reload
- Both statement and branch coverage tracked
Test Structure:
test/
├── unit/
│ ├── domain/ # Domain services and value objects
│ └── use-cases/ # Use case orchestration
└── integration/
├── mcp-tools.test.ts # All 4 MCP tools
├── config-hot-reload.test.ts # Hot reload behavior
├── csv-hot-visibility.test.ts # Visible fields
└── limits-and-truncation.test.ts # Row limits
Performance
Characteristics:
- Config reload: 1-3ms (catalog swap only)
- CSV load: 5-10ms per file (varies with size)
- Query execution: 1-2ms for in-memory operations
- Memory: O(n) where n = dataset size
CSV Loading Behavior:
- CSV files are read on-demand for each query
- No in-memory caching of CSV data
- This keeps memory usage low but includes file I/O in query latency
- Config catalog is cached; only CSV data is loaded per-query
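A rough sketch of that on-demand read path (deliberately naive: the comma split ignores quoted fields, which a real CSV parser handles):

```typescript
// Rough sketch of the per-query, on-demand CSV read described above.
import { readFile } from "node:fs/promises";

type Row = Record<string, string>;

async function loadCsv(path: string): Promise<Row[]> {
  const text = await readFile(path, "utf-8");          // file I/O on every query
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]));
  });
}

// Each query pays this read cost (~5-10ms per file), keeping memory usage flat.
```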
Scalability:
- Suitable for datasets up to ~100K rows
- Query latency includes file read time (~5-10ms per CSV)
- For high-performance or large datasets, use database backends (post-MVP)
- Consider dataset design: multiple small CSVs better than one large CSV
Roadmap
✅ MVP Complete (All 6 Phases)
- [x] Hexagonal architecture
- [x] 4 MCP tools (list, describe, query, get_by_id)
- [x] MVP filter operators (eq, contains, and)
- [x] Type validation (string, number, boolean, enum)
- [x] Hot reload support
- [x] Comprehensive test coverage
- [x] Error handling and logging
- [x] Complete documentation (README, dev guide, examples)
🚀 Post-MVP Features
- [ ] Additional filter operators (ne, gt, gte, lt, lte, in, or)
- [ ] Sorting (ORDER BY)
- [ ] Complex types (arrays, nested objects)
- [ ] Multiple data sources (PostgreSQL, SQLite, JSON)
- [ ] Aggregations (COUNT, SUM, AVG, etc.)
- [ ] Full-text search
- [ ] Caching layer for performance
- [ ] GraphQL-style query language
See docs/project-plans/project-plan-v1.final.md for details.
Documentation
- Project Overview - Development approach and engineering narrative
- Developer Guide - Architecture and internals
- Configuration Reference - Complete config documentation
- Examples - Sample datasets and configs
- Project Plan - MVP scope and roadmap
- Phase Execution - Implementation tracking
Contributing
Contributions welcome! This project follows:
- Hexagonal architecture - Keep domain pure
- Test-driven development - Write tests first
- Type safety - Leverage TypeScript
- Clean code - Follow existing patterns
See .clinerules/core-rules.md for architectural guidelines.
Troubleshooting
Server won't start
Check configuration:
node dist/index.js
# Look for validation errors in output
Common issues:
- CSV file path incorrect
- Field type mismatch
- Missing required fields
- Duplicate dataset IDs
Queries return no results
Verify dataset:
{
"tool": "describe_dataset",
"arguments": { "datasetId": "your-dataset" }
}
Check:
- Dataset ID is correct
- CSV file has data
- Filters match field types
- Field names are in `visibleFields`
Hot reload not working
Verify file watching:
- Config file path is correct
- File system permissions allow reading
- Check server logs for reload confirmation
License
MIT
Credits
Built with:
- TypeScript
- Model Context Protocol SDK
- Zod - Schema validation
- Chokidar - File watching
- Jest - Testing
Project Status
Version: 1.0.0-mvp
Status: Stable MVP
Last Updated: 2025-11-30
Note: Test and performance numbers in badges reflect the state at release. Designed for production-style workloads, but validate performance and fit for your specific environment.
All 6 phases complete:
- ✅ Phase 1: Skeleton & Config
- ✅ Phase 2: Core Use Cases
- ✅ Phase 3: Hot Reload
- ✅ Phase 4: MCP Adapter
- ✅ Phase 5: Hardening & Testing
- ✅ Phase 6: Documentation
See docs/execution/master-checklist.md for detailed progress.
Using MCP to run code via e2b.