docx-mcp

Legal document redlining engine. Takes AI-generated changes (structured JSON) and applies them as professional tracked changes with comments inside .docx files. The output is indistinguishable from what a lawyer would produce in Microsoft Word -- proper w:ins/w:del markup, comment annotations with justification text, and preserved formatting.

Installation

Requires Python 3.14+.

uv sync

Quick start

Python API

from docx_mcp import (
    ParagraphChange, ParagraphChangeType,
    TableChange, TableChangeType,
    RedlineConfig, apply_redlines,
)

changes = [
    # Modify a body paragraph
    ParagraphChange(
        kind="paragraph",
        fragment_id="3",              # ← str (was int in v0.1.0)
        change_type=ParagraphChangeType.MODIFY,
        new_text="The Company **shall** provide written notice.",
        justification="Strengthened obligation language.",
    ),
    # Delete a paragraph
    ParagraphChange(
        kind="paragraph",
        fragment_id="5",
        change_type=ParagraphChangeType.DELETE,
        justification="Removed redundant clause.",
    ),
    # Append a new paragraph
    ParagraphChange(
        kind="paragraph",
        fragment_id="7",
        change_type=ParagraphChangeType.APPEND_AFTER,
        new_text="The foregoing shall survive termination.",
        justification="Added survival provision.",
    ),
    # Modify a header paragraph
    ParagraphChange(
        kind="paragraph",
        fragment_id="header_1.1",
        change_type=ParagraphChangeType.MODIFY,
        new_text="CONFIDENTIAL",
        justification="Updated header text.",
    ),
    # Modify a table cell
    TableChange(
        kind="table",
        table_id=2,
        row=1,
        col=1,
        change_type=TableChangeType.MODIFY_CELL,
        new_text="Updated **cell** content",
        justification="Corrected table entry.",
    ),
    # Clear a table cell
    TableChange(
        kind="table",
        table_id=2,
        row=3,
        col=2,
        change_type=TableChangeType.CLEAR_CELL,
        justification="Removed obsolete data.",
    ),
]

doc = apply_redlines("contract.docx", changes)
doc.save("contract_redlined.docx")

CLI

# Extract fragment text from a document
docx-mcp convert input.docx
docx-mcp convert input.docx --format json

# Apply changes
docx-mcp apply input.docx changes.json -o output.docx

# Validate a redlined document
docx-mcp validate output.docx

# Audit a document for structural issues
docx-mcp audit input.docx
docx-mcp audit input.docx --format json

Note: The CLI convert command extracts body content only (no headers, footers, or tables). For full-document extraction, use the MCP extract_fragments tool or the Python full_to_fragments() function.

MCP server

The library includes an MCP server so that LLM clients (Claude Desktop, Cursor, etc.) can redline .docx files directly.

# Start the server (stdio transport)
docx-mcp-server

Configure in Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "docx-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/docx-mcp", "docx-mcp-server"]
    }
  }
}

Configure in Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "docx-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/docx-mcp", "docx-mcp-server"]
    }
  }
}

Tools

Tool	Description
`extract_fragments`	Read a `.docx` and return paragraphs, tables, headers, and footers as tagged text
`apply_changes`	Apply tracked changes from an inline list and save
`apply_changes_from_file`	Apply tracked changes from a JSON file on disk
`validate_document_tool`	Run structural validation checks
`diff_fragments`	Compare two `.docx` files paragraph-by-paragraph (full document)
`audit_document_tool`	Audit a `.docx` for headers, images, tables, section breaks, and more

Resource

URI	Description
`docx-fragments://{document_path}`	Browse paragraph fragments (URL-encode the path)
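The resource URI expects the document path to be URL-encoded. A minimal sketch of building such a URI with the standard library follows; `fragments_uri` is a hypothetical helper, and encoding the slashes as well (via `safe=""`) is an assumption about what the server accepts.

```python
from urllib.parse import quote, unquote


def fragments_uri(document_path: str) -> str:
    """Build a docx-fragments:// URI with the path percent-encoded."""
    # safe="" also encodes "/" so the whole path travels as one opaque segment.
    # Assumption: the server decodes it with a standard percent-decoder.
    return "docx-fragments://" + quote(document_path, safe="")


path = "/home/me/My Contracts/nda v2.docx"
uri = fragments_uri(path)
print(uri)

# Round trip: decoding the part after the scheme restores the original path.
decoded = unquote(uri.split("://", 1)[1])
```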

Example workflow

An LLM client would typically:

1. Call extract_fragments to read the document and get fragment IDs.
2. Reason about the content and construct a list of changes.
3. Call apply_changes with the change list to produce a redlined document.
4. Optionally call diff_fragments to compare original vs. redlined output.

Concepts

Fragments

Documents are decomposed into fragments: paragraphs, tables, headers, and footers, all indexed in document order. Each fragment has a string ID.

Fragment IDs:

Pattern	Meaning	Example
`"1"`, `"2"`, …	Body paragraphs / tables	`<f=1>Introduction.</f=1>`
`"header_P.I"`	Header part P, paragraph I	`<f=header_1.3>Confidential</f=header_1.3>`
`"footer_P.I"`	Footer part P, paragraph I	`<f=footer_2.1>Page 1 of 10</f=footer_2.1>`

Tables and body paragraphs share the same ID space (they interleave in document order). Fragment "3" might be a table and fragment "4" a paragraph.
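To make the ID patterns above concrete, here is a small sketch that splits a fragment ID into its parts. `FragmentRef` and `parse_fragment_id` are illustrative names, not part of the docx_mcp API, and the use of `0` as the part index for body fragments is a convention chosen for this example only.

```python
import re
from typing import NamedTuple


class FragmentRef(NamedTuple):
    part: str        # "body", "header", or "footer"
    part_index: int  # 1-based header/footer part; 0 for body fragments
    index: int       # 1-based position within the part


_HEADER_FOOTER_ID = re.compile(r"^(header|footer)_([0-9]+)\.([0-9]+)$")
_BODY_ID = re.compile(r"^[0-9]+$")


def parse_fragment_id(fragment_id: str) -> FragmentRef:
    """Split a fragment ID such as "3" or "header_1.3" into its components."""
    match = _HEADER_FOOTER_ID.match(fragment_id)
    if match:
        return FragmentRef(match.group(1), int(match.group(2)), int(match.group(3)))
    if _BODY_ID.match(fragment_id):
        return FragmentRef("body", 0, int(fragment_id))
    raise ValueError(f"unrecognized fragment ID: {fragment_id!r}")


print(parse_fragment_id("header_1.3"))
```

A body ID alone does not say whether the fragment is a paragraph or a table; that information only comes from the extracted fragment map.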

Use extract_fragments (MCP) or full_to_fragments() (Python) to see the fragment map for any document:

<f=1>Introduction paragraph.</f=1>
<f=2>**Definitions.** The following terms shall apply.</f=2>
<table=3 rows=2 cols=3>
<cell=3.1.1 span="2">Merged Header</cell=3.1.1>
<cell=3.1.3>Header C</cell=3.1.3>
<cell=3.2.1>Data 1</cell=3.2.1>
<cell=3.2.2>Data 2</cell=3.2.2>
<cell=3.2.3>Data 3</cell=3.2.3>
</table=3>
<f=4>Closing paragraph. See [Section 2](https://example.com).</f=4>
<f=header_1.1>Confidential</f=header_1.1>
<f=footer_1.1>Page 1 of 10</f=footer_1.1>
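A client that receives this tagged text may want the paragraph fragments as a mapping from ID to text. The following is a rough sketch using a regular expression; `paragraph_fragments` is a hypothetical helper, and it assumes fragment text never contains a literal closing tag for its own ID.

```python
import re

# Matches <f=ID>text</f=ID>; the backreference forces the same ID on both tags.
_FRAGMENT = re.compile(r"<f=([^>]+)>(.*?)</f=\1>", re.DOTALL)


def paragraph_fragments(tagged: str) -> dict[str, str]:
    """Map each paragraph fragment ID to its pseudo-Markdown text."""
    return {m.group(1): m.group(2) for m in _FRAGMENT.finditer(tagged)}


sample = """<f=1>Introduction paragraph.</f=1>
<f=2>**Definitions.** The following terms shall apply.</f=2>
<table=3 rows=2 cols=3>
<cell=3.1.1 span="2">Merged Header</cell=3.1.1>
</table=3>
<f=4>Closing paragraph.</f=4>
<f=header_1.1>Confidential</f=header_1.1>"""

fragments = paragraph_fragments(sample)
for fragment_id, text in fragments.items():
    print(fragment_id, "->", text)
```

ID "3" is absent from the result because it belongs to a table, which uses `<table=N>` and `<cell=...>` tags instead.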

Tables

Simple tables

Simple (rectangular) tables are extracted as <table=N> blocks. Each cell has a cell_id in "table_id.row.col" format (e.g., "3.1.2").

Merged-cell tables

Tables with horizontally or vertically merged cells (gridSpan / vMerge) are now supported. Merge spans are shown as attributes:

span="2" — cell spans 2 columns (horizontal merge)
vspan="3" — cell spans 3 rows (vertical merge)

Spanned-over cells (positions covered by a merge) are omitted from output. For example, if cell=3.1.1 has span="2", then cell=3.1.2 does not appear.

When targeting merged cells with changes, always target the originating cell (the one with the span/vspan attribute). Targeting a spanned-over position raises a ValueError.
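The rule about originating cells can be illustrated with a short sketch that computes which grid positions a merge hides and rejects a target that falls on one. `covered_positions` and `check_target` are hypothetical helpers written for this example; the library's internal check and error message may differ.

```python
def covered_positions(row: int, col: int, span: int = 1, vspan: int = 1) -> set[tuple[int, int]]:
    """Grid positions hidden by a merge that originates at (row, col)."""
    block = {
        (r, c)
        for r in range(row, row + vspan)
        for c in range(col, col + span)
    }
    return block - {(row, col)}  # the originating cell itself stays addressable


def check_target(target: tuple[int, int], merges: dict[tuple[int, int], tuple[int, int]]) -> None:
    """Raise ValueError if target is a spanned-over position.

    merges maps an originating (row, col) to its (span, vspan).
    """
    for (row, col), (span, vspan) in merges.items():
        if target in covered_positions(row, col, span, vspan):
            raise ValueError(
                f"cell {target} is covered by the merge at {(row, col)}; target that cell instead"
            )


merges = {(1, 1): (2, 1)}     # cell 1.1 spans two columns
check_target((1, 1), merges)  # fine: this is the originating cell
```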

Skipped tables

Tables that cannot be processed (nested tables, malformed merges, tables inside headers/footers) appear as:

<table=5 skipped reason="table 5, cell 2.3 contains nested table"/>

Headers and footers

Header and footer paragraphs are extracted with prefixed fragment IDs: header_1.1, footer_2.1, etc. The first number is the 1-based part index (usually 1 for the default header/footer), the second is the 1-based paragraph index within that part.

Header/footer paragraphs can be modified, deleted, and appended to just like body paragraphs. Tables inside headers/footers are not editable and are reported as skipped elements.

Limitation: Comments on header/footer changes are not attached to the output (Word and LibreOffice do not support comment ranges in those parts). They trigger a UserWarning and are dropped.

Hyperlinks

Hyperlinks are extracted as [link text](url) inline within paragraph text. Formatting inside links is preserved: [**bold link**](url).

When modifying an existing paragraph, [text] without (url) preserves the original hyperlink URL. [text](new_url) creates a new link.

When appending new text, [text](url) creates a hyperlink. [text] without (url) produces plain text — always specify (url) on append if you want a hyperlink.
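The two hyperlink rules can be summarized in a small sketch. `find_links` and `resolve_url` are illustrative names only, and the regular expression is a simplification that assumes URLs contain no spaces or closing parentheses.

```python
import re
from typing import Optional

# [text] optionally followed by (url)
_LINK = re.compile(r"\[([^\]]+)\](?:\(([^)\s]+)\))?")


def find_links(text: str) -> list[tuple[str, Optional[str]]]:
    """Return (link text, url or None) for every bracketed span."""
    return [(m.group(1), m.group(2)) for m in _LINK.finditer(text)]


def resolve_url(url: Optional[str], original_url: Optional[str], appending: bool) -> Optional[str]:
    """Pick the URL a bracketed span ends up with, following the rules above."""
    if url is not None:
        return url            # explicit (url) always wins
    if appending:
        return None           # no original link to fall back on: plain text
    return original_url       # modify: keep the existing hyperlink target


links = find_links("See [Section 2](https://example.com) and [Annex A].")
print(links)
```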

Tracked changes policy

Documents with pre-existing tracked changes (<w:ins>, <w:del>, <w:moveFrom>, <w:moveTo>) are hard-rejected in both extract_fragments and apply_redlines. Accept or reject all changes in Word before processing.
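To check a file before handing it to the tool, one can look for revision elements directly in the package. The sketch below uses only the standard library and inspects `word/document.xml`; it does not look at headers, footers, or footnotes, and `has_tracked_changes` is a hypothetical helper rather than the library's own check.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REVISION_TAGS = {f"{{{W_NS}}}{name}" for name in ("ins", "del", "moveFrom", "moveTo")}


def has_tracked_changes(docx) -> bool:
    """True if the main document part contains any revision element.

    docx may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return any(element.tag in REVISION_TAGS for element in root.iter())


def _demo_package(body_xml: str) -> io.BytesIO:
    """Build a minimal in-memory package holding only word/document.xml (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W_NS}"><w:body>{body_xml}</w:body></w:document>',
        )
    buffer.seek(0)
    return buffer


clean = _demo_package("<w:p><w:r><w:t>Hello</w:t></w:r></w:p>")
revised = _demo_package(
    '<w:p><w:ins w:id="1" w:author="A" w:date="2024-01-01T00:00:00Z">'
    "<w:r><w:t>Hi</w:t></w:r></w:ins></w:p>"
)
print(has_tracked_changes(clean), has_tracked_changes(revised))
```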

`collapse_empty` mode

Optional mode that suppresses empty paragraphs from extraction and redlining. Produces cleaner output for LLM consumption. When enabled, it must be used consistently across extraction and redlining — mismatched values cause fragment ID misalignment.
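The misalignment risk is easy to see in a toy model. The sketch below numbers a list of paragraph texts with and without empty paragraphs; `number_fragments` is a hypothetical stand-in and only approximates how the library assigns IDs.

```python
def number_fragments(paragraphs: list[str], collapse_empty: bool) -> dict[str, str]:
    """Assign 1-based string IDs, optionally skipping empty paragraphs."""
    kept = [p for p in paragraphs if p.strip() or not collapse_empty]
    return {str(i): text for i, text in enumerate(kept, start=1)}


paragraphs = ["Intro.", "", "Clause A.", "", "Clause B."]
full = number_fragments(paragraphs, collapse_empty=False)
collapsed = number_fragments(paragraphs, collapse_empty=True)

# The same ID points at different text depending on the mode.
print(full["3"], "|", collapsed["3"])
```

A change built against the collapsed map and applied with the mode disabled would therefore land on the wrong paragraph.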

Change types

Paragraph changes

Type	Description	Requires `new_text`
`modify`	Word-level diff applied as tracked changes	Yes
`delete`	Entire paragraph marked as deleted	No
`append_after`	New paragraph inserted after the referenced fragment	Yes

Table cell changes

Type	Description	Requires `new_text`
`modify_cell`	Modify cell content (single or multi-paragraph)	Yes
`clear_cell`	Delete all content in a cell (preserves structure)	No

Cell modification uses positional alignment: if the cell has multiple paragraphs, the new text is split on newlines (\n) and each line is applied to the corresponding paragraph in order. Cell content is marked with tracked changes and comments just like paragraph modifications.
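Positional alignment can be pictured as pairing each existing cell paragraph with the line at the same position in `new_text`. This sketch is an illustration only: `align_cell_lines` is a hypothetical helper, and raising on a count mismatch is an assumption, since the text above does not say how the library handles that case.

```python
def align_cell_lines(cell_paragraphs: list[str], new_text: str) -> list[tuple[str, str]]:
    """Pair each existing paragraph with the new line at the same position."""
    new_lines = new_text.split("\n")
    if len(new_lines) != len(cell_paragraphs):
        # Assumption for this sketch: a mismatch is treated as an error.
        raise ValueError(
            f"cell has {len(cell_paragraphs)} paragraphs but new_text has {len(new_lines)} lines"
        )
    return list(zip(cell_paragraphs, new_lines))


pairs = align_cell_lines(["Net 30 days", "No discount"], "Net 45 days\n2% early-payment discount")
for old, new in pairs:
    print(f"{old!r} -> {new!r}")
```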

Blank line management

When appending new paragraphs, you can control surrounding blank lines:

ParagraphChange(
    kind="paragraph",
    fragment_id="10",
    change_type=ParagraphChangeType.APPEND_AFTER,
    new_text="New clause text here.",
    justification="Added new provision.",
    blank_lines_before=1,  # Insert 1 blank line before the new paragraph
    blank_lines_after=1,   # Insert 1 blank line after the new paragraph
)

When deleting paragraphs, you can remove trailing blank lines automatically:

ParagraphChange(
    kind="paragraph",
    fragment_id="15",
    change_type=ParagraphChangeType.DELETE,
    justification="Removed obsolete clause.",
    delete_next_blanks=1,  # Also delete the next blank paragraph
)

All blank lines are marked as tracked insertions/deletions and will appear in the redlined document.

Pseudo-Markdown

Text content uses a simplified Markdown-like format for inline formatting:

**bold**
_italic_
__underline__

Unicode characters (smart quotes, em dashes, section symbols, non-breaking spaces) are preserved as-is.

Font inheritance: When appending new paragraphs, the font family, size, and color are automatically copied from the reference paragraph's first text-bearing run. Bold, italic, and underline formatting from the pseudo-Markdown is layered on top of the inherited base formatting.
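As a rough illustration of the inline syntax, the sketch below splits pseudo-Markdown into styled runs. It is not the library's converter: `parse_runs` is a hypothetical helper, nesting is not handled, and identifiers containing underscores (such as `delete_next_blanks`) would be misread as italics.

```python
import re

# Order matters: "__" must be tried before "_" so underline wins over italic.
_INLINE = re.compile(
    r"\*\*(?P<bold>.+?)\*\*|__(?P<underline>.+?)__|_(?P<italic>.+?)_"
)


def parse_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (style, content) runs; style is plain/bold/italic/underline."""
    runs: list[tuple[str, str]] = []
    position = 0
    for match in _INLINE.finditer(text):
        if match.start() > position:
            runs.append(("plain", text[position:match.start()]))
        style = match.lastgroup  # name of the alternative that matched
        runs.append((style, match.group(style)))
        position = match.end()
    if position < len(text):
        runs.append(("plain", text[position:]))
    return runs


runs = parse_runs("The Company **shall** provide _written_ notice.")
print(runs)
```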

Changes JSON

The CLI accepts a JSON file containing either a bare array or a {"changes": [...]} wrapper.
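Accepting both shapes comes down to unwrapping the object form before use. A minimal sketch, assuming nothing about the CLI's actual implementation (`load_changes` is a hypothetical name):

```python
import json


def load_changes(raw: str) -> list[dict]:
    """Return the change list from either a bare array or a {"changes": [...]} object."""
    data = json.loads(raw)
    if isinstance(data, dict):
        data = data.get("changes")
    if not isinstance(data, list):
        raise ValueError('expected a JSON array or an object with a "changes" array')
    return data


bare = load_changes('[{"fragment_id": "1", "change_type": "delete", "justification": "x"}]')
wrapped = load_changes(
    '{"changes": [{"fragment_id": "1", "change_type": "delete", "justification": "x"}]}'
)
print(bare == wrapped)
```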

Paragraph changes example

[
  {
    "fragment_id": "1",
    "change_type": "modify",
    "new_text": "The Seller agrees to deliver within **sixty** days.",
    "justification": "Extended delivery window."
  },
  {
    "fragment_id": "3",
    "change_type": "delete",
    "justification": "Removed governing law clause.",
    "delete_next_blanks": 1
  },
  {
    "fragment_id": "5",
    "change_type": "append_after",
    "new_text": "This Agreement shall be governed by Delaware law.",
    "justification": "Added Delaware governing law.",
    "blank_lines_before": 1,
    "blank_lines_after": 0
  },
  {
    "fragment_id": "header_1.1",
    "change_type": "modify",
    "new_text": "CONFIDENTIAL",
    "justification": "Updated header marking."
  }
]

Table cell changes example

[
  {
    "cell_id": "2.1.1",
    "change_type": "modify_cell",
    "new_text": "Updated **cell** content",
    "justification": "Corrected cell value."
  },
  {
    "cell_id": "2.3.2",
    "change_type": "clear_cell",
    "justification": "Cleared obsolete data."
  }
]

Cell IDs use the format "table_id.row.col" where rows and columns are 1-based.
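A cell ID can be split back into its three numbers as follows. `parse_cell_id` is a hypothetical helper; the mapping of `"2.1.1"` to `table_id=2, row=1, col=1` follows the format described above.

```python
def parse_cell_id(cell_id: str) -> tuple[int, int, int]:
    """Split "table_id.row.col" into integers; row and col are 1-based."""
    parts = cell_id.split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"cell ID must look like 'table_id.row.col': {cell_id!r}")
    table_id, row, col = (int(p) for p in parts)
    if row < 1 or col < 1:
        raise ValueError(f"row and col are 1-based: {cell_id!r}")
    return table_id, row, col


print(parse_cell_id("2.3.2"))
```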

Validation

The validate_document() function (and docx-mcp validate CLI) checks:

Annotation ID isolation -- tracked-change and comment IDs don't collide across groups
Comment integrity -- every <w:comment> has matching range markers in the document body, and vice versa
Tracked-change attributes -- every <w:ins> and <w:del> has required w:id, w:author, and w:date
Package consistency -- content-type and relationship entries exist for comments.xml

from docx_mcp import validate_document

# doc is a document object, e.g. the one returned by apply_redlines()
result = validate_document(doc)
if not result.ok:
    for error in result.errors:
        print(error)
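To show what one of these checks involves, here is a standalone sketch of the tracked-change attribute check using only the standard library. It is an independent illustration, not the code behind `validate_document()`; `missing_revision_attrs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REVISION_TAGS = {f"{{{W_NS}}}ins": "w:ins", f"{{{W_NS}}}del": "w:del"}
REQUIRED_ATTRS = ("id", "author", "date")


def missing_revision_attrs(document_xml: str) -> list[str]:
    """List every required attribute missing from a <w:ins> or <w:del> element."""
    problems = []
    root = ET.fromstring(document_xml)
    for element in root.iter():
        label = REVISION_TAGS.get(element.tag)
        if label is None:
            continue
        for name in REQUIRED_ATTRS:
            if f"{{{W_NS}}}{name}" not in element.attrib:
                problems.append(f"<{label}> is missing w:{name}")
    return problems


xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:ins w:id="1" w:author="Reviewer" w:date="2024-01-01T00:00:00Z"/>'
    '<w:del w:id="2" w:author="Reviewer"/>'
    "</w:p></w:body></w:document>"
)
problems = missing_revision_attrs(xml)
print(problems)
```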

Architecture

The library manipulates OOXML directly via lxml (not python-docx) because python-docx has no tracked-change support. Key design decisions:

Word-level diffing via diff-match-patch with a word-to-char mapping for high-quality diffs
Conservative mutation -- only changed paragraphs are touched; everything else passes through byte-identical
Globally unique annotation IDs via a monotonic IdManager seeded from the document's existing max ID
python-docx is used only for test fixture generation, not in the library itself
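The idea behind word-level diffing can be sketched with the standard library. Note the substitution: this example uses `difflib.SequenceMatcher` over word tokens instead of diff-match-patch with a word-to-char mapping, so its output may differ from the library's on harder inputs. `tokenize` and `word_diff` are illustrative names.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Split into words, whitespace runs, and single punctuation marks."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return ("keep" | "delete" | "insert", text) operations at word granularity."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    operations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            operations.append(("keep", "".join(a[i1:i2])))
            continue
        if i2 > i1:  # tokens present only in the old text
            operations.append(("delete", "".join(a[i1:i2])))
        if j2 > j1:  # tokens present only in the new text
            operations.append(("insert", "".join(b[j1:j2])))
    return operations


old = "The Company may provide notice."
new = "The Company shall provide notice."
operations = word_diff(old, new)
for kind, text in operations:
    print(f"{kind:7}{text!r}")
```

Each "delete" would map to a `w:del` run and each "insert" to a `w:ins` run, while "keep" text stays untouched.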

Module map

src/docx_mcp/
  __init__.py        Public API
  cli.py             CLI entry point (apply, convert, validate, audit)
  models.py          Pydantic data models (ParagraphChange, TableChange, RedlineConfig, ...)
  document.py        DocxDocument: ZIP parsing, XML tree access, serialization
  converter.py       Paragraph & table XML -> pseudo-Markdown conversion
  table_utils.py     Table inspection utilities (cell access, simplicity checks)
  tokenizer.py       Word-level tokenization
  differ.py          Word-level diff engine (diff-match-patch wrapper)
  run_ops.py         Diff-to-XML-run mapping, run splitting, element building
  id_manager.py      Monotonic annotation ID allocator
  comments.py        Comment creation and range marker insertion
  redliner.py        Main orchestrator: apply_redlines()
  table_redliner.py  Table cell change application
  audit.py           Document structural audit (headers, images, tables, etc.)
  validator.py       Structural validation checks
  server.py          MCP server (FastMCP 3.x, stdio transport)
  handlers/
    modify.py        Word-level tracked changes on existing paragraphs
    delete.py        Full paragraph deletion markup
    append.py        New paragraph insertion markup

Development

# Run tests
uv run pytest tests/ -v

# Lint
uvx ruff check src/ tests/

# Auto-fix lint issues
uvx ruff check src/ tests/ --fix

# Type check
uvx ty check src/ tests/

The test suite has 431 tests covering all modules, handlers, table operations, headers/footers, hyperlinks, tracked-change rejection, merged-cell tables, section breaks, CLI, validation, and MCP server.

License

MIT
