data-filter-mcp

Local MCP server that registers restricted Python filters and runs them against local JSON, YAML, and TXT files, enabling safe data filtering and file conversion.

What it does

  • register_filter accepts Python source code with exactly one top-level function: def filter_item(data):
  • run_filter loads a local file, passes the loaded document into filter_item(data), and returns the text from result_text
  • convert_file loads a local file, passes it into filter_item(data), and writes the returned text to another local file
  • Registered filters live only in memory and expire automatically based on server TTL settings
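
As a minimal sketch of the contract described above, the snippet below defines a filter_item function and calls it directly on an in-memory document. The sample data and its field names (items, name, score) are invented for illustration, and whether every construct used here passes the server's whitelist is an assumption. In real use the function source would be passed to register_filter, and the server would supply data from the loaded file.

```python
# Hypothetical filter: the field names "items", "name", and "score" are
# illustrative only and do not come from the README.
def filter_item(data):
    # Keep items whose score is at least 50 and return one name per line.
    names = [item["name"] for item in data["items"] if item.get("score", 0) >= 50]
    return "\n".join(names)


# Local dry run with made-up data; the server would load this from a file.
sample = {
    "items": [
        {"name": "alpha", "score": 72},
        {"name": "beta", "score": 31},
        {"name": "gamma", "score": 50},
    ]
}
result = filter_item(sample)
print(result)
```

The function returns a string, matching the example filter later in this README, which also returns text.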

What filter code may use

Filter bodies are AST-validated against a whitelist. Beyond a curated set of builtins (len, sorted, max, min, range, enumerate, zip, sum, any, all, type conversions, etc.) and safe string/dict/list methods, filters may use lambda expressions and the curated modules listed below. Modules are exposed by their canonical names (math, datetime, hashlib, etc.). Filesystem, process, network, and unsafe serialization modules (os, pathlib, shutil, subprocess, socket, urllib, pickle, etc.) are intentionally not available.

  • lambda expressions: typically as key= arguments, e.g. sorted(data, key=lambda item: item.get("score")). Lambda bodies are validated by the same rules as the rest of the filter.
  • json: json.loads, json.dumps.
  • yaml: yaml.safe_load, yaml.safe_dump. The unsafe yaml.load / yaml.dump are intentionally not exposed.
  • re: re.match, re.search, re.fullmatch, re.findall, re.sub, re.subn, re.compile, re.escape, plus Match / Pattern methods (group, groups, groupdict, start, end, span).
  • math: numeric helpers such as math.ceil, math.floor, math.sqrt, math.log, math.exp, math.pow, math.factorial, math.gcd, math.lcm, math.isfinite, math.isclose.
  • statistics: aggregates such as statistics.mean, statistics.median, statistics.stdev, statistics.variance, statistics.quantiles.
  • datetime: datetime.datetime.fromisoformat, datetime.datetime.now, datetime.timedelta, datetime.timezone.utc, and instance methods such as isoformat, strftime, timestamp, weekday, total_seconds. General instance attribute reads such as dt.year and dt.month are not supported by the current policy.
  • decimal: decimal.Decimal(...), quantize, normalize, to_eng_string, to_integral_value.
  • collections: collections.Counter, collections.defaultdict, collections.OrderedDict, collections.deque, plus methods such as most_common, elements, popleft, appendleft, rotate.
  • itertools: chain, chain.from_iterable, islice, takewhile, dropwhile, groupby, starmap, accumulate, combinations, permutations, product, filterfalse.
  • functools: reduce, partial, cmp_to_key, wraps. Caching decorators such as lru_cache and cache are intentionally not exposed because they can retain process-local state across filter calls.
  • operator: itemgetter, methodcaller, and arithmetic/comparison helpers such as add, mul, lt, eq, gt. attrgetter is intentionally not exposed.
  • textwrap: fill, wrap, shorten, indent, dedent.
  • html: html.escape, html.unescape.
  • base64: b64encode, b64decode, urlsafe_b64encode, urlsafe_b64decode, b32encode, b32decode, b16encode, b16decode.
  • hashlib: hashlib.sha256, hashlib.sha1, hashlib.md5, hashlib.blake2b, hashlib.new, plus hash object methods such as hexdigest, digest, update.
  • ipaddress: ip_address, ip_network, ip_interface, IPv4Network, IPv6Network, plus methods such as supernet, subnets, hosts, overlaps, subnet_of, supernet_of. General instance attribute reads such as addr.is_private and addr.compressed are not supported by the current policy.
  • unicodedata: category, name, lookup, numeric, digit, decimal, bidirectional, combining, mirrored.
  • difflib: get_close_matches, ndiff, unified_diff, context_diff, SequenceMatcher.
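
To illustrate how the modules listed above might be combined, here is a small sketch of a filter that uses collections.Counter and json.dumps. The record field name (status) and the sample data are invented. The import lines are there only so the snippet runs on its own: since filter source must contain exactly one top-level function and modules are exposed by their canonical names, the registered source would presumably omit them (an assumption).

```python
# These imports exist only for this local dry run (see the note above).
import collections
import json


def filter_item(data):
    # Count how often each (hypothetical) "status" value occurs.
    counts = collections.Counter(record["status"] for record in data)
    # most_common(2) yields the two most frequent (value, count) pairs.
    top = [{"status": s, "count": c} for s, c in counts.most_common(2)]
    return json.dumps(top)


# Made-up input standing in for a loaded JSON document.
sample = [
    {"status": "ok"},
    {"status": "error"},
    {"status": "ok"},
    {"status": "ok"},
    {"status": "error"},
    {"status": "timeout"},
]
summary = filter_item(sample)
print(summary)
```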

Note: re.compile runs against patterns supplied by filter code, so a pathological pattern can stall the server (ReDoS). Some helpers such as difflib.SequenceMatcher can also be CPU-heavy on large inputs. Treat filter source as trusted-but-restricted.
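
The AST validation mentioned at the start of this section can be pictured with a much simplified sketch. This is not the server's implementation: the helper name check_filter_source and the FORBIDDEN_NODES tuple are hypothetical, and the sketch uses a tiny blocklist of node types where the real server uses a whitelist. It only demonstrates the general technique of parsing the source with the standard ast module and inspecting the tree before anything is executed.

```python
import ast

# Assumption: a tiny illustrative blocklist. The real server validates
# against a whitelist whose exact contents are not reproduced here.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal)


def check_filter_source(source: str) -> list[str]:
    """Return a list of problems found in the filter source (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    # The README requires exactly one top-level function: filter_item(data).
    body = tree.body
    if len(body) != 1 or not isinstance(body[0], ast.FunctionDef):
        problems.append("source must contain exactly one top-level function")
    else:
        func = body[0]
        if func.name != "filter_item":
            problems.append("function must be named filter_item")
        if [arg.arg for arg in func.args.args] != ["data"]:
            problems.append("function must take a single parameter named data")

    # Walk every node and reject constructs on the illustrative blocklist.
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            problems.append(f"forbidden construct: {type(node).__name__}")
    return problems


good_source = "def filter_item(data):\n    return str(len(data))\n"
bad_source = "import os\ndef filter_item(data):\n    return os.getcwd()\n"
print(check_filter_source(good_source))
print(check_filter_source(bad_source))
```

A structural check like this does nothing about the runtime costs noted above, such as pathological regular expressions or CPU-heavy helpers.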

Run with uvx

Once the package is published to PyPI, start the server with:

uvx data-filter-mcp --filter-ttl-seconds 3600 --cleanup-interval-seconds 60

Show the available CLI flags with:

uvx data-filter-mcp --help
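
The --filter-ttl-seconds and --cleanup-interval-seconds flags suggest an in-memory store whose entries carry an expiry time and are swept periodically. The sketch below shows one way such a store could work. The FilterRegistry class, its method names, and the use of a UUID as the filter identifier are all assumptions made for illustration, not the server's actual code.

```python
import time
import uuid


class FilterRegistry:
    """Hypothetical in-memory store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # filter_id -> (source, expires_at)

    def register(self, source):
        """Store the source and return a new identifier for it."""
        filter_id = uuid.uuid4().hex
        self._entries[filter_id] = (source, self.clock() + self.ttl_seconds)
        return filter_id

    def get(self, filter_id):
        """Return the stored source, or None if unknown or already expired."""
        entry = self._entries.get(filter_id)
        if entry is None or entry[1] <= self.clock():
            return None
        return entry[0]

    def cleanup(self):
        """Drop expired entries and return how many were removed."""
        now = self.clock()
        expired = [fid for fid, (_, exp) in self._entries.items() if exp <= now]
        for fid in expired:
            del self._entries[fid]
        return len(expired)
```

In this sketch, a server would call cleanup() once per cleanup interval, while get() also checks the expiry time so that an expired filter is never returned between sweeps.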

Restricting file access with --workdir

By default the server can read any file on the local filesystem. Use one or more --workdir flags to restrict file reads to specific directories:

uvx data-filter-mcp \
  --filter-ttl-seconds 3600 \
  --cleanup-interval-seconds 60 \
  --workdir /Users/me/project \
  --workdir /tmp/data

Rules:

  • Each --workdir value must be an absolute path to an existing directory.
  • run_filter will only accept files located inside the allowed directories.
  • If no --workdir flags are provided, no restrictions are applied (backward compatible).
  • convert_file always requires at least one --workdir because it writes to disk.
  • convert_file requires the destination path to be inside an allowed workdir.
  • convert_file creates missing destination parent directories automatically.
  • convert_file refuses to replace an existing destination file unless overwrite is true.
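
The containment rule for reads can be sketched with pathlib. The helper name is_inside_workdirs is hypothetical and the server's real check may differ. The point of the sketch is that paths should be resolved before comparison, so that ".." segments and symlinks cannot be used to step outside an allowed directory.

```python
from pathlib import Path


def is_inside_workdirs(file_path, workdirs):
    """Return True if file_path resolves to a location inside any workdir."""
    # An empty list means "no restriction", mirroring the documented
    # default for reads.
    if not workdirs:
        return True
    # resolve() collapses ".." segments and follows symlinks before comparing.
    resolved = Path(file_path).resolve()
    for workdir in workdirs:
        root = Path(workdir).resolve()
        if root in resolved.parents:
            return True
    return False


allowed = ["/Users/me/project", "/tmp/data"]
print(is_inside_workdirs("/tmp/data/items.json", allowed))
print(is_inside_workdirs("/tmp/data/../other/items.json", allowed))
```

For convert_file the empty-list case would have to be rejected instead, since the rules above state that it always requires at least one --workdir.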

Writing transformed files with convert_file

Use convert_file when the filtered output should be persisted instead of returned inline to the model. The tool accepts:

  • filter_id — an identifier returned by register_filter
  • source_file_path — absolute path to the json/yaml/txt file to load
  • destination_file_path — absolute path where the returned text should be saved
  • file_type — optional source file type override (json, yaml, or txt)
  • overwrite — optional boolean, default false

Example flow:

def filter_item(data):
    return "\n".join(data["items"])

Then call convert_file with a source such as /tmp/data/items.json and a destination such as /tmp/data/out/items.txt. The result is written as UTF-8 text. The returned metadata includes the resolved source and destination paths, the effective source file type, bytes_written, and whether an existing file was overwritten.
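
The flow above can be approximated locally to see what to expect before involving the server. The convert_locally helper below is a hypothetical stand-in for convert_file that handles JSON sources only and skips the workdir checks. Apart from bytes_written, the metadata key names are guesses, since the README describes the returned fields without naming them.

```python
import json
import tempfile
from pathlib import Path


def filter_item(data):
    return "\n".join(data["items"])


def convert_locally(source_file_path, destination_file_path, overwrite=False):
    """Hypothetical stand-in for convert_file, limited to JSON sources."""
    source = Path(source_file_path)
    destination = Path(destination_file_path)

    # Refuse to replace an existing file unless overwrite is true.
    existed = destination.exists()
    if existed and not overwrite:
        raise FileExistsError(f"{destination} exists and overwrite is false")

    data = json.loads(source.read_text(encoding="utf-8"))
    text = filter_item(data)

    # Create missing parent directories, then write the result as UTF-8.
    destination.parent.mkdir(parents=True, exist_ok=True)
    payload = text.encode("utf-8")
    destination.write_bytes(payload)

    # Only "bytes_written" is named in the README; other keys are assumed.
    return {
        "source_file_path": str(source),
        "destination_file_path": str(destination),
        "file_type": "json",
        "bytes_written": len(payload),
        "overwritten": existed,
    }


# Dry run inside a temporary directory that plays the role of a workdir.
with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "items.json"
    src.write_text(json.dumps({"items": ["a", "b"]}), encoding="utf-8")
    dst = Path(workdir) / "out" / "items.txt"
    first = convert_locally(src, dst)
    second = convert_locally(src, dst, overwrite=True)
    written = dst.read_text(encoding="utf-8")
print(first["bytes_written"], second["overwritten"])
```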

Example MCP client configuration:

{
  "mcpServers": {
    "data-filter": {
      "command": "uvx",
      "args": [
        "data-filter-mcp",
        "--filter-ttl-seconds",
        "3600",
        "--cleanup-interval-seconds",
        "60",
        "--workdir",
        "/Users/me/project",
        "--workdir",
        "/tmp/data"
      ]
    }
  }
}

Run locally

python server.py --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
python -m data_filter_mcp.server --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
.venv/bin/data-filter-mcp --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
