data-filter-mcp
Local MCP server that registers restricted Python filters and runs them against local JSON, YAML, and TXT files, enabling safe data filtering and file conversion.
What it does
- register_filter accepts Python source code with exactly one top-level function: def filter_item(data):
- run_filter loads a local file, passes the loaded document into filter_item(data), and returns the text from result_text
- convert_file loads a local file, passes it into filter_item(data), and writes the returned text to another local file
- Registered filters live only in memory and expire automatically based on server TTL settings
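The "exactly one top-level function" rule for register_filter can be illustrated with Python's ast module. This is a minimal sketch, not the server's validator: check_filter_shape is a hypothetical helper, and it covers only the shape of the source, not the whitelist of names and calls.

```python
import ast

REQUIRED_NAME = "filter_item"  # the single top-level function named in the README


def check_filter_shape(source: str) -> None:
    """Raise ValueError unless source is exactly one top-level def filter_item(data).

    Hypothetical helper: it checks only the outer shape of the source.
    """
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one top-level function")
    func = tree.body[0]
    if func.name != REQUIRED_NAME:
        raise ValueError(f"function must be named {REQUIRED_NAME}")
    params = [arg.arg for arg in func.args.args]
    if params != ["data"]:
        raise ValueError("function must take a single parameter named data")


good_source = '''
def filter_item(data):
    return ", ".join(sorted(data["items"]))
'''

check_filter_shape(good_source)  # passes silently
```

Source with an extra top-level statement, such as an import ahead of the function, fails the first check in this sketch.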
What filter code may use
Filter bodies are AST-validated against a whitelist. In addition to a curated set of builtins (len, sorted, max, min, range, enumerate, zip, sum, any, all, conversions, etc.) and safe string/dict/list methods, filters may also use a curated set of standard-library modules. Modules are exposed by their canonical names (math, datetime, hashlib, etc.). Filesystem, process, network, and unsafe serialization modules (os, pathlib, shutil, subprocess, socket, urllib, pickle, etc.) are intentionally not available.
- lambda expressions: typically as key= arguments, e.g. sorted(data, key=lambda item: item.get("score")). Lambda bodies are validated by the same rules as the rest of the filter.
- json: json.loads, json.dumps.
- yaml: yaml.safe_load, yaml.safe_dump. The unsafe yaml.load / yaml.dump are intentionally not exposed.
- re: re.match, re.search, re.fullmatch, re.findall, re.sub, re.subn, re.compile, re.escape, plus Match/Pattern methods (group, groups, groupdict, start, end, span).
- math: numeric helpers such as math.ceil, math.floor, math.sqrt, math.log, math.exp, math.pow, math.factorial, math.gcd, math.lcm, math.isfinite, math.isclose.
- statistics: aggregates such as statistics.mean, statistics.median, statistics.stdev, statistics.variance, statistics.quantiles.
- datetime: datetime.datetime.fromisoformat, datetime.datetime.now, datetime.timedelta, datetime.timezone.utc, and instance methods such as isoformat, strftime, timestamp, weekday, total_seconds. General instance attribute reads such as dt.year and dt.month are not supported by the current policy.
- decimal: decimal.Decimal(...), quantize, normalize, to_eng_string, to_integral_value.
- collections: collections.Counter, collections.defaultdict, collections.OrderedDict, collections.deque, plus methods such as most_common, elements, popleft, appendleft, rotate.
- itertools: chain, chain.from_iterable, islice, takewhile, dropwhile, groupby, starmap, accumulate, combinations, permutations, product, filterfalse.
- functools: reduce, partial, cmp_to_key, wraps. Caching decorators such as lru_cache and cache are intentionally not exposed because they can retain process-local state across filter calls.
- operator: itemgetter, methodcaller, and arithmetic/comparison helpers such as add, mul, lt, eq, gt. attrgetter is intentionally not exposed.
- textwrap: fill, wrap, shorten, indent, dedent.
- html: html.escape, html.unescape.
- base64: b64encode, b64decode, urlsafe_b64encode, urlsafe_b64decode, b32encode, b32decode, b16encode, b16decode.
- hashlib: hashlib.sha256, hashlib.sha1, hashlib.md5, hashlib.blake2b, hashlib.new, plus hash object methods such as hexdigest, digest, update.
- ipaddress: ip_address, ip_network, ip_interface, IPv4Network, IPv6Network, plus methods such as supernet, subnets, hosts, overlaps, subnet_of, supernet_of. General instance attribute reads such as addr.is_private and addr.compressed are not supported by the current policy.
- unicodedata: category, name, lookup, numeric, digit, decimal, bidirectional, combining, mirrored.
- difflib: get_close_matches, ndiff, unified_diff, context_diff, SequenceMatcher.
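As an illustration of how the modules listed above might combine in one filter, here is a sketch that summarizes a list of records. The input shape and field names (items, tag, score) are hypothetical, and whether every construct used here passes the server's validator is an assumption. The imports are included only so the sketch runs on its own.

```python
# Imports are here only so the sketch runs standalone. The README says the
# server exposes these modules by name; it does not say whether filter
# source may contain import statements.
import collections
import json
import statistics


def filter_item(data):
    # Hypothetical input shape: {"items": [{"tag": str, "score": number}, ...]}
    items = data["items"]
    tag_counts = collections.Counter(item.get("tag") for item in items)
    scores = [item.get("score") for item in items]
    summary = {
        "top_tags": tag_counts.most_common(2),
        "mean_score": statistics.mean(scores),
        "best": sorted(items, key=lambda item: item.get("score"), reverse=True)[0],
    }
    # Return text, matching the README's description of filter results.
    return json.dumps(summary, sort_keys=True)


sample = {"items": [
    {"tag": "a", "score": 3},
    {"tag": "b", "score": 5},
    {"tag": "a", "score": 4},
]}
print(filter_item(sample))
```

The filter uses method calls such as item.get(...) and most_common(...) rather than attribute reads, in line with the policy notes on datetime and ipaddress above.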
Note: re.compile runs against patterns supplied by filter code, so a pathological pattern can stall the server (ReDoS). Some helpers such as difflib.SequenceMatcher can also be CPU-heavy on large inputs. Treat filter source as trusted-but-restricted.
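To make the ReDoS risk concrete, the sketch below times a pattern with nested quantifiers against an equivalent pattern with a single quantifier. It runs outside the server as a plain script (the time module is not in the list of exposed modules), and the patterns and input length are illustrative choices.

```python
import re
import time

# Nested quantifiers: on a failing input the engine can try exponentially
# many ways to split the run of "a" characters before giving up.
RISKY = re.compile(r"(a+)+$")
# Matches the same strings, but with a single quantifier.
SAFER = re.compile(r"a+$")


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for pattern.match(text)."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


# Keep the input short: the work for RISKY grows roughly exponentially
# with the number of "a" characters.
text = "a" * 18 + "b"
for label, pattern in (("risky", RISKY), ("safer", SAFER)):
    result, elapsed = time_match(pattern, text)
    print(label, result, f"{elapsed:.4f}s")
```

Both patterns fail to match this input; the difference is how long each takes to fail, which is why filter source should come only from callers you trust.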
Run with uvx
Once the package is published to PyPI, start the server with:
uvx data-filter-mcp --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
Show the available CLI flags with:
uvx data-filter-mcp --help
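The --filter-ttl-seconds and --cleanup-interval-seconds flags suggest an in-memory store with per-entry expiry and a periodic sweep. The sketch below shows one way such a store could work. FilterRegistry and its methods are hypothetical names, and the exact semantics of the two flags are an assumption, not taken from the server's code.

```python
import time
import uuid


class FilterRegistry:
    """In-memory store whose entries expire ttl_seconds after registration."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # filter_id -> (expires_at, source)

    def register(self, source):
        filter_id = uuid.uuid4().hex
        self._entries[filter_id] = (self._clock() + self.ttl_seconds, source)
        return filter_id

    def get(self, filter_id):
        entry = self._entries.get(filter_id)
        if entry is None or entry[0] <= self._clock():
            # Treat an expired entry the same as a missing one.
            self._entries.pop(filter_id, None)
            raise KeyError(f"unknown or expired filter: {filter_id}")
        return entry[1]

    def cleanup(self):
        """Drop expired entries and return how many were removed.

        A server could call this once per cleanup interval.
        """
        now = self._clock()
        expired = [fid for fid, (expires_at, _) in self._entries.items()
                   if expires_at <= now]
        for fid in expired:
            del self._entries[fid]
        return len(expired)
```

Checking expiry in get as well as in cleanup means a filter stops being usable at its TTL even if the sweep has not run yet.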
Restricting file access with --workdir
By default the server can read any file on the local filesystem. Use one or
more --workdir flags to restrict file reads to specific directories:
uvx data-filter-mcp \
--filter-ttl-seconds 3600 \
--cleanup-interval-seconds 60 \
--workdir /Users/me/project \
--workdir /tmp/data
Rules:
- Each --workdir value must be an absolute path to an existing directory.
- run_filter will only accept files located inside the allowed directories.
- If no --workdir flags are provided, no restrictions are applied (backward compatible).
- convert_file always requires at least one --workdir because it writes to disk.
- convert_file requires the destination path to be inside an allowed workdir.
- convert_file creates missing destination parent directories automatically.
- convert_file refuses to replace an existing destination file unless overwrite is true.
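The containment rule can be sketched with pathlib. This is an illustration of the idea, not the server's implementation: is_allowed is a hypothetical helper, and resolving paths before comparing them is an assumption about how ".." segments and symlinks would be handled.

```python
from pathlib import Path


def is_allowed(file_path, workdirs):
    """Return True if file_path resolves inside any workdir.

    With no workdirs configured, everything is allowed, mirroring the
    "no --workdir flags means no restrictions" rule for reads.
    """
    if not workdirs:
        return True
    resolved = Path(file_path).resolve()
    for workdir in workdirs:
        try:
            # relative_to raises ValueError when resolved is outside workdir.
            resolved.relative_to(Path(workdir).resolve())
            return True
        except ValueError:
            continue
    return False


print(is_allowed("/tmp/data/items.json", ["/tmp/data"]))
print(is_allowed("/tmp/data/../secret.json", ["/tmp/data"]))
```

Comparing resolved paths rather than raw strings keeps a path such as /tmp/data/../secret.json from passing a simple prefix check.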
Writing transformed files with convert_file
Use convert_file when the filtered output should be persisted instead of returned
inline to the model. The tool accepts:
- filter_id: an identifier returned by register_filter
- source_file_path: absolute path to the json/yaml/txt file to load
- destination_file_path: absolute path where the returned text should be saved
- file_type: optional source file type override (json, yaml, or txt)
- overwrite: optional boolean, default false
Example flow:
def filter_item(data):
    return "\n".join(data["items"])
Then call convert_file with a source such as /tmp/data/items.json and a
destination such as /tmp/data/out/items.txt. The result is written as UTF-8
text. The returned metadata includes the resolved source and destination paths,
the effective source file type, bytes_written, and whether an existing file was
overwritten.
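The flow described above can be approximated locally to see what ends up on disk. This is a sketch under stated assumptions: convert_json_file is a hypothetical stand-in that handles JSON sources only, it skips the workdir check, and every metadata key other than bytes_written is an invented name rather than the server's schema.

```python
import json
import tempfile
from pathlib import Path


def filter_item(data):
    return "\n".join(data["items"])


def convert_json_file(source_path, destination_path, overwrite=False):
    """Hypothetical stand-in for convert_file, limited to JSON sources."""
    source = Path(source_path).resolve()
    destination = Path(destination_path).resolve()
    existed = destination.exists()
    if existed and not overwrite:
        raise FileExistsError(f"{destination} exists and overwrite is false")
    data = json.loads(source.read_text(encoding="utf-8"))
    payload = filter_item(data).encode("utf-8")  # result is written as UTF-8 text
    destination.parent.mkdir(parents=True, exist_ok=True)  # create missing parents
    destination.write_bytes(payload)
    # Key names other than bytes_written are assumptions, not the server's schema.
    return {
        "source_file_path": str(source),
        "destination_file_path": str(destination),
        "file_type": "json",
        "bytes_written": len(payload),
        "overwritten": existed,
    }


with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "items.json"
    src.write_text(json.dumps({"items": ["alpha", "beta"]}), encoding="utf-8")
    dst = Path(workdir) / "out" / "items.txt"
    meta = convert_json_file(src, dst)
    print(meta["bytes_written"], dst.read_text(encoding="utf-8").splitlines())
```

Calling convert_json_file a second time with the same destination raises FileExistsError unless overwrite=True is passed, which mirrors the overwrite rule listed for convert_file.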
Example MCP client configuration:
{
  "mcpServers": {
    "data-filter": {
      "command": "uvx",
      "args": [
        "data-filter-mcp",
        "--filter-ttl-seconds",
        "3600",
        "--cleanup-interval-seconds",
        "60",
        "--workdir",
        "/Users/me/project",
        "--workdir",
        "/tmp/data"
      ]
    }
  }
}
Run locally
python server.py --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
python -m data_filter_mcp.server --filter-ttl-seconds 3600 --cleanup-interval-seconds 60
.venv/bin/data-filter-mcp --filter-ttl-seconds 3600 --cleanup-interval-seconds 60