M4: A Toolbox for LLMs on Clinical Data

<p align="center"> <img src="webapp/public/m4_logo_transparent.png" alt="M4 Logo" width="180"/> </p>

<p align="center"> <strong>Query clinical datasets with natural language through Claude, Cursor, or any MCP client</strong> </p>

<p align="center"> <a href="https://www.python.org/downloads/"><img alt="Python" src="https://img.shields.io/badge/Python-3.10+-blue?logo=python&logoColor=white"></a> <a href="https://modelcontextprotocol.io/"><img alt="MCP" src="https://img.shields.io/badge/MCP-Compatible-green?logo=ai&logoColor=white"></a> <a href="https://github.com/hannesill/m4/actions/workflows/tests.yaml"><img alt="Tests" src="https://github.com/hannesill/m4/actions/workflows/tests.yaml/badge.svg"></a> </p>

M4 is an infrastructure layer for multimodal EHR data that provides LLM agents with a unified toolbox for querying clinical datasets. It supports tabular data and clinical notes, dynamically selecting tools by modality to query MIMIC-IV, eICU, and custom datasets through a single natural-language interface.

Usage example

M4 is a fork of the M3 project and would not be possible without it 🫶 Please cite their work when using M4!

Quickstart (3 steps)

1. Install uv

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Initialize M4

mkdir my-research && cd my-research
uv init && uv add m4-mcp
uv run m4 init mimic-iv-demo

This downloads the free MIMIC-IV demo dataset (~16MB) and sets up a local DuckDB database.

3. Connect your AI client

Claude Desktop:

uv run m4 config claude --quick

Other clients (Cursor, LibreChat, etc.):

uv run m4 config --quick

Copy the generated JSON into your client's MCP settings, restart the client, and start asking questions!
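If you prefer to script the copy step, a minimal sketch along the following lines may help. It assumes the client keeps its servers under a top-level `mcpServers` key, as Claude Desktop does. The helper name `merge_mcp_server`, the entry name `m4`, and the placeholder entry values are illustrative only; use whatever `m4 config --quick` actually prints.

```python
import json
from copy import deepcopy


def merge_mcp_server(client_config: dict, name: str, entry: dict) -> dict:
    """Return a copy of client_config with `entry` registered under mcpServers[name]."""
    merged = deepcopy(client_config)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# Placeholder entry (hypothetical): paste the real values from the generated JSON.
generated_entry = {"command": "<command from generated JSON>", "args": []}

# An existing client config that already has another server registered.
existing = {"mcpServers": {"other-server": {"command": "<other command>", "args": []}}}

updated = merge_mcp_server(existing, "m4", generated_entry)
print(json.dumps(updated, indent=2))
```

Working on a copy leaves the original config untouched, so it can be written back only after the merged result has been inspected.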

<details> <summary>Different setup options</summary>

  • If you don't want to use uv, you can just run pip install m4-mcp

  • If you want to use Docker, look at <a href="docs/DEVELOPMENT.md">docs/DEVELOPMENT.md</a> </details>

Example Questions

Once connected, try asking:

Tabular data (mimic-iv, eicu):

  • "What tables are available in the database?"
  • "Show me the race distribution in hospital admissions"
  • "Find all ICU stays longer than 7 days"
  • "What are the most common lab tests?"

Clinical notes (mimic-iv-note):

  • "Search for notes mentioning diabetes"
  • "List all notes for patient 10000032"
  • "Get the full discharge summary for this patient"
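To give a sense of what happens behind a tabular question such as "Find all ICU stays longer than 7 days", the sketch below runs the kind of SELECT an agent might generate. It uses Python's built-in `sqlite3` purely to stay self-contained (M4 itself uses DuckDB), the rows are invented toy data, and the table and column names (`icustays`, `los` in days) are assumptions based on the MIMIC-IV layout; check them with `get_table_info` on your own dataset.

```python
import sqlite3

# Toy stand-in for an ICU stays table; these rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icustays (stay_id INTEGER, subject_id INTEGER, los REAL)")
conn.executemany(
    "INSERT INTO icustays VALUES (?, ?, ?)",
    [(1, 101, 2.5), (2, 102, 9.1), (3, 103, 7.0), (4, 104, 12.4)],
)

# The kind of query an agent might issue for "ICU stays longer than 7 days".
# Assumes length of stay is stored in days in a column named `los`.
long_stays = conn.execute(
    "SELECT stay_id, subject_id, los FROM icustays WHERE los > 7 ORDER BY los DESC"
).fetchall()

for stay_id, subject_id, los in long_stays:
    print(stay_id, subject_id, los)
```

Note that a stay of exactly 7.0 days is excluded by the strict comparison; whether "longer than 7 days" should include it is a choice worth stating in the question.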

Supported Datasets

| Dataset | Modality | Size | Access | Local | BigQuery |
|---|---|---|---|---|---|
| mimic-iv-demo | Tabular | 100 patients | Free | Yes | No |
| mimic-iv | Tabular | 365k patients | PhysioNet credentialed | Yes | Yes |
| mimic-iv-note | Notes | 331k notes | PhysioNet credentialed | Yes | Yes |
| eicu | Tabular | 200k+ patients | PhysioNet credentialed | Yes | Yes |

These datasets are supported out of the box. You can also add your own datasets by following the Custom Datasets guide.

Switch datasets anytime:

m4 use mimic-iv     # Switch to full MIMIC-IV
m4 status           # Show active dataset details
m4 status --all     # List all available datasets

<details> <summary><strong>Setting up MIMIC-IV or eICU (credentialed datasets)</strong></summary>

  1. Get PhysioNet credentials: Complete the credentialing process and sign the data use agreement for the dataset.

  2. Download the data:

    # For MIMIC-IV
    wget -r -N -c -np --user YOUR_USERNAME --ask-password \
      https://physionet.org/files/mimiciv/3.1/ \
      -P m4_data/raw_files/mimic-iv
    
    # For eICU
    wget -r -N -c -np --user YOUR_USERNAME --ask-password \
      https://physionet.org/files/eicu-crd/2.0/ \
      -P m4_data/raw_files/eicu
    

    Place the downloaded data in an m4_data directory, ideally inside the project directory, and name each dataset's subdirectory after the dataset (mimic-iv or eicu).

  3. Initialize:

    m4 init mimic-iv   # or: m4 init eicu
    

This converts the CSV files to Parquet format and creates a local DuckDB database. </details>

Available Tools

M4 exposes these tools to your AI client. Tools are filtered based on the active dataset's modality.

Dataset Management:

| Tool | Description |
|---|---|
| list_datasets | List available datasets and their status |
| set_dataset | Switch the active dataset |

Tabular Data Tools (mimic-iv, mimic-iv-demo, eicu):

| Tool | Description |
|---|---|
| get_database_schema | List all available tables |
| get_table_info | Get column details and sample data |
| execute_query | Run SQL SELECT queries |
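Since `execute_query` is described as running SELECT queries only, it can be useful to pre-check a statement before sending it. The following is a deliberately conservative heuristic, not M4's actual validation (which is not described here); the function name and keyword list are assumptions made for illustration.

```python
import re

# Keywords that suggest a statement would modify data or the database itself.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def looks_like_read_only_select(sql: str) -> bool:
    """Heuristic: a single statement starting with SELECT or WITH and no write keywords."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False  # empty input, or more than one statement
    first_word = stripped.split(None, 1)[0].lower()
    if first_word not in {"select", "with"}:
        return False
    return _FORBIDDEN.search(stripped) is None


print(looks_like_read_only_select("SELECT * FROM patients LIMIT 5;"))
print(looks_like_read_only_select("SELECT 1; DELETE FROM patients"))
```

The check errs on the side of rejection: a harmless query that mentions a word like "update" in a string literal or column name is refused too, which is usually the safer failure mode for clinical data.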

Clinical Notes Tools (mimic-iv-note):

| Tool | Description |
|---|---|
| search_notes | Full-text search with snippets |
| get_note | Retrieve a single note by ID |
| list_patient_notes | List notes for a patient (metadata only) |
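The modality-based filtering described at the top of this section can be pictured roughly as follows. This is a sketch of the idea, not M4's internal implementation: the `Tool` dataclass, the `DATASET_MODALITY` mapping, and the `tools_for` helper are hypothetical, while the tool and dataset names are taken from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    modalities: frozenset  # an empty set means the tool is always exposed


ALWAYS = frozenset()
TABULAR = frozenset({"tabular"})
NOTES = frozenset({"notes"})

TOOLS = [
    Tool("list_datasets", ALWAYS),
    Tool("set_dataset", ALWAYS),
    Tool("get_database_schema", TABULAR),
    Tool("get_table_info", TABULAR),
    Tool("execute_query", TABULAR),
    Tool("search_notes", NOTES),
    Tool("get_note", NOTES),
    Tool("list_patient_notes", NOTES),
]

# Hypothetical mapping from dataset name to modality, mirroring the Supported Datasets table.
DATASET_MODALITY = {
    "mimic-iv-demo": "tabular",
    "mimic-iv": "tabular",
    "mimic-iv-note": "notes",
    "eicu": "tabular",
}


def tools_for(dataset: str) -> list[str]:
    """Return the names of tools that apply to the given dataset's modality."""
    modality = DATASET_MODALITY[dataset]
    return [t.name for t in TOOLS if not t.modalities or modality in t.modalities]


print(tools_for("mimic-iv-note"))
print(tools_for("eicu"))
```

Keeping the dataset management tools unconditional means an agent can always switch datasets, even when the active one exposes no tool it needs.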

More Documentation

| Guide | Description |
|---|---|
| Tools Reference | Detailed tool documentation |
| BigQuery Setup | Use Google Cloud for full datasets |
| Custom Datasets | Add your own PhysioNet datasets |
| Development | Contributing, testing, architecture |
| OAuth2 Authentication | Enterprise security setup |

Roadmap

M4 is designed as a growing toolbox for LLM agents working with EHR data. Planned and ongoing directions include:

  • More Tools

    • Implement tools for current modalities (e.g. statistical reports, RAG)
    • Add tools for new modalities (images, waveforms)
  • Better context handling

    • Concise, dataset-aware context for LLM agents
  • Dataset expansion

    • Out-of-the-box support for additional PhysioNet datasets
    • Improved support for institutional/custom EHR schemas
  • Evaluation & reproducibility

    • Session export and replay
    • Evaluation with the latest LLMs and smaller expert models

The roadmap reflects current development goals and may evolve as the project matures.

Troubleshooting

"Parquet not found" error:

m4 init mimic-iv-demo --force

MCP client won't connect: Check client logs (Claude Desktop: Help → View Logs) and ensure the config JSON is valid.
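A quick way to confirm the config JSON is valid is to parse it before restarting the client. The sketch below uses only Python's standard `json` module; the function name `check_config_text` is illustrative, and you would pass it the contents of your own client's config file.

```python
import json


def check_config_text(text: str) -> str:
    """Return 'ok' if text parses as a JSON object, else a short description of the problem."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return "valid JSON, but the top level is not an object"
    return "ok"


print(check_config_text('{"mcpServers": {}}'))
print(check_config_text('{"mcpServers": {},}'))  # trailing comma, a common copy-paste slip
```

Trailing commas and unclosed braces are the usual culprits after pasting a generated snippet into an existing file, and the reported line and column point at where parsing stopped.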

Need to reconfigure:

m4 config claude --quick   # Regenerate Claude Desktop config
m4 config --quick          # Regenerate generic config

Citation

M4 builds on the M3 project. Please cite:

@article{attrach2025conversational,
  title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
  author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2507.01053},
  year={2025}
}

<p align="center"> <a href="https://github.com/hannesill/m4/issues">Report an Issue</a> · <a href="docs/DEVELOPMENT.md">Contribute</a> </p>
