diffgrab
Tracks web pages for changes and reports content modifications as structured diffs with snapshot history, exposed through MCP tools.
Web page change tracking with structured diffs. markgrab + snapgrab integration, MCP native.
from diffgrab import DiffTracker

tracker = DiffTracker()
await tracker.track("https://example.com")

changes = await tracker.check()
for c in changes:
    if c.changed:
        print(c.summary)       # "3 lines added, 1 lines removed in sections: Introduction."
        print(c.unified_diff)  # Standard unified diff output

await tracker.close()
Features
- Change detection — track any URL, detect content changes via content hashing
- Structured diffs — unified diff + section-level analysis (which headings changed)
- Human-readable summaries — "5 lines added, 2 removed in sections: Intro, Methods"
- Snapshot history — SQLite storage, browse past versions of any page
- markgrab powered — HTML/YouTube/PDF/DOCX extraction via markgrab
- Visual diff — optional screenshot comparison via snapgrab
- MCP server — 5 tools for Claude Code / MCP clients
- CLI included — diffgrab track, check, diff, history, untrack
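The "content hashing" step in the feature list can be illustrated with a minimal sketch. It assumes SHA-256 over lightly normalized text; the hash function and normalization diffgrab actually uses are not stated here, and the names `content_hash` and `has_changed` are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the text after light normalization."""
    # Assumption of this sketch: trailing whitespace and leading/trailing
    # blank lines are cosmetic and should not count as a content change.
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous: str, current: str) -> bool:
    """Compare two page snapshots by digest instead of by full text."""
    return content_hash(previous) != content_hash(current)
```

Comparing digests keeps the common "nothing changed" case cheap: only the stored hash needs to be read, and the full diff is computed only when the digests differ.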
How It Works
flowchart TD
A["diffgrab track URL"] --> B["Fetch initial snapshot\n(markgrab + snapgrab)"]
B --> C["Store baseline"]
C --> D["diffgrab check"]
D --> E["Fetch current page"]
E --> F{"Content\nhash match?"}
F -->|"changed"| G["Compute structured diff\n+ section analysis"]
F -->|"unchanged"| H["No changes"]
G --> I["📊 DiffResult\nadded / removed / modified"]
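The "compute structured diff + section analysis" step in the flowchart could look roughly like the sketch below. It is built on Python's standard `difflib` and is not diffgrab's implementation; `structured_diff`, `heading_for_each_line`, and the returned dictionary keys are hypothetical names chosen for illustration.

```python
import difflib


def heading_for_each_line(lines: list[str]) -> list[str]:
    """Map every line to the nearest Markdown heading at or above it."""
    current = "(no heading)"
    owners = []
    for line in lines:
        if line.startswith("#"):
            current = line.lstrip("#").strip()
        owners.append(current)
    return owners


def structured_diff(before: str, after: str) -> dict:
    """Return line counts, affected sections, and a unified diff."""
    old, new = before.splitlines(), after.splitlines()
    old_owner, new_owner = heading_for_each_line(old), heading_for_each_line(new)
    added = removed = 0
    sections: list[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        removed += i2 - i1  # lines present only in the old snapshot
        added += j2 - j1    # lines present only in the new snapshot
        for name in old_owner[i1:i2] + new_owner[j1:j2]:
            if name not in sections:
                sections.append(name)
    unified = "\n".join(
        difflib.unified_diff(old, new, "before", "after", lineterm="")
    )
    return {
        "added": added,
        "removed": removed,
        "sections": sections,
        "unified_diff": unified,
    }
```

One known limitation of this sketch: any line starting with `#` is treated as a heading, so comment lines inside code blocks would be misattributed.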
Install
pip install diffgrab
Optional extras:
pip install 'diffgrab[cli]' # CLI with click + rich
pip install 'diffgrab[visual]' # Visual diff with snapgrab
pip install 'diffgrab[mcp]' # MCP server with fastmcp
pip install 'diffgrab[all]' # Everything
Usage
Python API
import asyncio
from diffgrab import DiffTracker

async def main():
    tracker = DiffTracker()

    # Track a URL (takes initial snapshot)
    await tracker.track("https://example.com", interval_hours=12)

    # Check for changes
    changes = await tracker.check()
    for change in changes:
        if change.changed:
            print(change.summary)
            print(change.unified_diff)

    # Get diff between specific snapshots
    result = await tracker.diff("https://example.com", before_id=1, after_id=2)

    # Browse snapshot history
    history = await tracker.history("https://example.com", count=20)

    # Stop tracking
    await tracker.untrack("https://example.com")
    await tracker.close()

asyncio.run(main())
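In the example above, `tracker.close()` is skipped if an earlier call raises. A possible way to guard against that is a small wrapper with `try`/`finally`. This is a sketch only: `collect_summaries` and `run_once` are hypothetical helpers, and they rely solely on the `check()` and `close()` calls and the `changed` and `summary` fields shown in this README.

```python
import asyncio


async def collect_summaries(tracker) -> list[str]:
    """Run one check and return the summaries of pages that changed.

    `tracker` is any object exposing awaitable `check()` and `close()`
    methods, such as the DiffTracker shown above.
    """
    try:
        changes = await tracker.check()
        return [c.summary for c in changes if c.changed]
    finally:
        # Release the tracker even if check() raised.
        await tracker.close()


def run_once(tracker) -> list[str]:
    """Synchronous entry point for scripts and cron jobs."""
    return asyncio.run(collect_summaries(tracker))
```

With diffgrab installed, this would be called as `run_once(DiffTracker())`.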
Convenience Functions
from diffgrab import track, check, diff, history, untrack
await track("https://example.com")
changes = await check()
result = await diff("https://example.com")
snaps = await history("https://example.com")
await untrack("https://example.com")
CLI
# Track a URL
diffgrab track https://example.com --interval 12
# Check all tracked URLs for changes
diffgrab check
# Check a specific URL
diffgrab check https://example.com
# Show diff between snapshots
diffgrab diff https://example.com
diffgrab diff https://example.com --before 1 --after 3
# View snapshot history
diffgrab history https://example.com --count 20
# Stop tracking
diffgrab untrack https://example.com
MCP Server
Add to your Claude Code MCP config:
{
  "mcpServers": {
    "diffgrab": {
      "command": "diffgrab-mcp",
      "args": []
    }
  }
}
Or with uvx:
{
  "mcpServers": {
    "diffgrab": {
      "command": "uvx",
      "args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"]
    }
  }
}
MCP Tools:
| Tool | Description |
|---|---|
| track_url | Register a URL for change tracking |
| check_changes | Check tracked URLs for changes |
| get_diff | Get structured diff between snapshots |
| get_history | Browse snapshot history |
| untrack_url | Stop tracking a URL |
DiffResult
Every diff operation returns a DiffResult:
@dataclass
class DiffResult:
    url: str                        # The tracked URL
    changed: bool                   # Whether content changed
    added_lines: int                # Lines added
    removed_lines: int              # Lines removed
    changed_sections: list[str]     # Markdown headings with changes
    unified_diff: str               # Standard unified diff
    summary: str                    # Human-readable summary
    before_snapshot_id: int | None  # DB ID of older snapshot
    after_snapshot_id: int | None   # DB ID of newer snapshot
    before_timestamp: str           # When older snapshot was taken
    after_timestamp: str            # When newer snapshot was taken
Storage
Snapshots are stored in SQLite at ~/.local/share/diffgrab/diffgrab.db (auto-created). Custom path:
tracker = DiffTracker(db_path="/path/to/custom.db")
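To make the storage model concrete, here is a minimal sketch of a snapshot table using Python's standard `sqlite3` module. The schema, column names, and the `open_store`, `save_snapshot`, and `list_history` helpers are hypothetical; diffgrab's actual database layout is not documented in this README and may differ.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per captured version of a page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) the snapshot database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def save_snapshot(conn: sqlite3.Connection, url: str, content: str) -> int:
    """Store one snapshot and return its row ID."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO snapshots (url, content_hash, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (url, digest, content, now),
    )
    conn.commit()
    return cur.lastrowid


def list_history(conn: sqlite3.Connection, url: str, count: int = 10):
    """Return (id, content_hash, created_at) rows, newest first."""
    rows = conn.execute(
        "SELECT id, content_hash, created_at FROM snapshots "
        "WHERE url = ? ORDER BY id DESC LIMIT ?",
        (url, count),
    )
    return rows.fetchall()
```

Integer row IDs like these would correspond to the `before_id` and `after_id` arguments used when diffing two specific snapshots.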
QuartzUnit Ecosystem
| Package | Role | PyPI |
|---|---|---|
| markgrab | HTML/YouTube/PDF/DOCX to markdown | pip install markgrab |
| snapgrab | URL to screenshot + metadata | pip install snapgrab |
| docpick | OCR + LLM document extraction | pip install docpick |
| feedkit | RSS feed collection | pip install feedkit |
| diffgrab | Web page change tracking | pip install diffgrab |
| browsegrab | Browser agent for LLMs | Coming soon |
Used in
- newswatch — RSS news monitoring pipeline (feedkit → markgrab → embgrep → diffgrab)
- watchdeck — Web page monitoring with visual diffs and safety guards
License

Part of the QuartzUnit ecosystem: composable Python libraries for data collection, extraction, search, and AI agent safety.