asset-aware-mcp
Enables AI agents to precisely retrieve and analyze PDF assets (tables, figures, sections) via MCP, with knowledge graph integration for medical research.
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) and Knowledge Graph for AI Agents.
Traditional Chinese (繁體中文) · Docs Site · GitHub Wiki
Why Asset-Aware MCP?
A common misconception is that an AI agent can read image files directly from your computer. It cannot: a file path alone gives the model nothing to look at.
| Method | Can AI analyze image content? | Description |
|---|---|---|
| ❌ Provide PNG path | No | AI cannot access the local file system |
| ✅ Asset-Aware MCP | Yes | Retrieves the image as Base64 via MCP, so the model's vision input can read it directly |
Real-world Effect
# After retrieving the image via MCP, the AI can analyze it directly:
User: What is this figure about?
AI: This is the architecture diagram for Scaled Dot-Product Attention:
1. Inputs: Q (Query), K (Key), V (Value)
2. MatMul of Q and K
3. Scale (1/√dₖ)
4. Optional Mask (for decoder)
5. SoftMax normalization
6. Final MatMul with V to get the output
This is the value of Asset-Aware MCP - enabling AI Agents to truly "see" and understand charts and tables in your PDF literature.
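The Base64 route in the table above can be sketched as follows. This is a minimal illustration, not the project's implementation: `to_image_content` is a hypothetical helper, and the dictionary shape (`type`, `data`, `mimeType`) follows the image content item defined by the MCP specification.

```python
import base64


def to_image_content(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a Base64 image content item (hypothetical helper)."""
    return {
        "type": "image",
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "mimeType": mime_type,
    }


# The 8-byte PNG signature stands in for a real figure file here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
content = to_image_content(PNG_SIGNATURE)

# The client can decode the payload back to the exact original bytes.
round_trip = base64.b64decode(content["data"])
```

Because the bytes travel inside the tool result, the model never needs access to the local file system.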
Features
- Asset-Aware ETL - PDF → Markdown with a PyMuPDF-first parser and a retained Marker code path:
  - PyMuPDF (default) - Fast extraction (~50MB)
  - Marker (`use_marker=True`) - High-precision structured parsing; the code path is retained, but the packaged runtime remains on security hold in v0.7.0 until upstream `marker-pdf` supports patched Pillow
- Unified Segmentation Export - Normalized `segmentation.json` merges manifest, blocks, reading order, and persisted markdown line spans for downstream tools and extensions.
- PDF Safety/Structure/Coverage/Accessibility Audits - OpenDataloader-inspired artifact-only reports flag suspicious hidden/off-page/prompt-injection text, native structure signals, segmentation coverage gaps, and accessibility/readability readiness via the existing `document` facade. `document(op="prepare_ai")` and `document(op="auto")` expose agent-ready status and next actions without adding public tools.
- Structural Pointer Retrieval - Proxy-Pointer-inspired `document(op="pointer_index")`, `document(op="structural_retrieve")`, and `document(op="compare")` preserve section breadcrumbs, line/char/byte locators, source hashes, asset IDs, and evidence-span provenance without adding MCP tools.
- Layout Overlay Debugging - Render page overlays from `original.pdf` to inspect bbox, segment type, and reading order visually.
- On-Demand OCR Preprocessing - Optional `ocrmypdf` preprocessing path for scanned PDFs before ETL.
- Section Navigation - Dynamic hierarchy section tree through the `section` facade: browse, search, detail, content reading, and block extraction for any depth of headings.
- Async Job Pipeline - Supports asynchronous ingest, Marker-required parse, OCR, and conversion jobs with progress tracking.
- Document Manifest - Provides a structured "map" of the document for precise data access by Agents.
- LightRAG Integration - Knowledge Graph + Vector Index, supporting cross-document comparison and reasoning.
- Verified Citation Bundles - `citation_bundle`, Foam evidence packs, citation health checks, table/figure evidence notes, and claim promotion export citation-ready spans with locator, quote/hash, context, CRAAP scaffold, and verification status.
- Docx Editing (DFM) - Edit .docx files in Markdown via the Docx-Flavored Markdown format. Supports legacy `.doc`, `.odt`, and `.ods` ingest via LibreOffice auto-conversion. The balanced surface keeps 6 DOCX/DFM public entrypoints for ingest, read, save, validation, conversion, table edit planning, and Docx-A2T bridges.
- DFM Integrity Checker - Automatic validation and auto-repair at every pipeline stage (post-ingest, pre-save, post-save). Catches orphan markers, column mismatches, and format inconsistencies.
- A2T (Anything to Table) - 7 operation-based tools for building professional tables from any source (PDF assets, Knowledge Graph, URLs, user input). Features: stable row IDs, row search/filter/paging, citation coverage, artifact-only large-table render, skipped-large-table UX, Citations (AssetRef), Audit Trail, Schema Evolution, Templates, Drafting, and Token-efficient resumption.
- VS Code Management Extension - Graphical interface for monitoring server status, ingested documents, document artifacts, citation spans, and A2T tables/drafts with one-click Excel export.
- MCP Server - Exposes tools and resources to Copilot/Claude via FastMCP.
- Medical Research Focus - Optimized for medical literature, supporting Base64 image transmission for Vision AI analysis.
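To make the "line/char/byte locators" and "quote/hash" ideas in the list above concrete, here is a small sketch of how a quoted span can be pinned to its source text. The `SpanLocator` fields and the `locate_span` helper are hypothetical names chosen for illustration; the project's actual citation schema may differ.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanLocator:
    """Hypothetical locator record for one quoted evidence span."""
    line: int          # 1-based line on which the span starts
    char_start: int    # character offsets into the text
    char_end: int
    byte_start: int    # UTF-8 byte offsets into the text
    byte_end: int
    quote_sha256: str  # hash of the quoted text, for later verification


def locate_span(text: str, quote: str) -> Optional[SpanLocator]:
    """Find the first occurrence of `quote` and describe where it sits."""
    start = text.find(quote)
    if start < 0:
        return None
    end = start + len(quote)
    return SpanLocator(
        line=text.count("\n", 0, start) + 1,
        char_start=start,
        char_end=end,
        byte_start=len(text[:start].encode("utf-8")),
        byte_end=len(text[:end].encode("utf-8")),
        quote_sha256=hashlib.sha256(quote.encode("utf-8")).hexdigest(),
    )


doc = "# Methods\nDose was 5 µg/kg.\n"
loc = locate_span(doc, "5 µg/kg")
```

Character and byte offsets diverge as soon as the text contains non-ASCII characters (here `µ` takes two bytes in UTF-8), which is one reason to record both.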
Architecture
<p align="center"> <img src="docs/images/architecture-overview.jpg" alt="Asset-Aware MCP Architecture" width="700"> </p>
┌─────────────────────────────────────────────────────────┐
│                    AI Agent (Copilot)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol (Tools & Resources)
┌─────────────────────▼───────────────────────────────────┐
│            MCP Server (Modular Presentation)            │
│  ┌─────────────────────────────────────────────────┐    │
│  │  tools/: 30 public tools (balanced surface)     │    │
│  │  17 facade tools + 13 high-frequency shortcuts  │    │
│  │  compact=17; legacy/direct compatibility=63     │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │  resources/: 13 resources in 2 modules          │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                    ETL Pipeline (DDD)                   │
│       ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│       │ PyMuPDF  │   │  Asset   │   │ LightRAG │        │
│       │ Adapter  │ → │  Parser  │ → │  Index   │        │
│       └──────────┘   └──────────┘   └──────────┘        │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                      Local Storage                      │
│  ./data/                                                │
│  ├── {doc_id}/         # PDF document artifacts         │
│  ├── docx_{id}/        # Docx IR + DFM + Assets         │
│  ├── tables/           # A2T Tables (JSON/MD/XLSX)      │
│  │   └── drafts/       # Table Drafts (Persistence)     │
│  └── lightrag_db/      # Knowledge Graph                │
└─────────────────────────────────────────────────────────┘
Project Structure (DDD)
asset-aware-mcp/
├── src/
│   ├── domain/          # Domain: Entities, Value Objects, Interfaces
│   ├── application/     # Application: Doc Service, Table Service (A2T), Asset Service
│   ├── infrastructure/  # Infrastructure: PyMuPDF, LightRAG, Excel Renderer
│   └── presentation/    # Presentation: MCP Server (FastMCP)
├── data/                # Document and Asset Storage
├── docs/
│   └── spec.md          # Technical Specification
├── tests/               # Unit and Integration Tests
├── vscode-extension/    # VS Code Management Extension
└── pyproject.toml       # uv Project Config
Architecture Diagrams
Visual overview of the project. All diagrams use a consistent GitHub README style.
| Diagram | Description |
|---|---|
| 01 - System Architecture | Full stack: Telegram → Gateway → MCP Adapter → 3 MCP servers → Ollama |
| 02 - Data Layout | 30 balanced public tools + 13 resources; legacy direct tool compatibility remains available |
| 03 - PDF Ingestion Pipeline | 7-stage flow from PDF upload to knowledge graph |
| 04 - DOCX Bidirectional Edit | DOCX ingest → TableContext edit → round-trip save workflow |
| 05 - Knowledge Graph Search | Cross-document search with 3 parallel query paths |
| 06 - Installation Steps | 7-step installation from clone to verification |
| 07 - PDF ETL Pipeline | PyMuPDF default path + Marker security-hold diagnostics |
| 08 - KG Architecture | lightrag-hku 3-layer KG architecture |
| 09 - Agent Harness Concept | Assistant harness model for stateless agents |
All generation prompts are saved in docs/diagrams/ALL-PROMPTS.md for style consistency and regeneration.
Quick Start
# Install dependencies (using uv); the default install skips Marker/torch
uv sync
# v0.7.0: Marker extra is temporarily empty because marker-pdf pins
# Pillow<11 while the secure runtime requires Pillow>=12.2.0.
# Use the default PyMuPDF backend until upstream marker-pdf supports patched Pillow.
# Run MCP Server
uv run python -m src.presentation.server
# Or use the VS Code extension for graphical management
Runtime note:
The VS Code extension prefers a managed Python 3.11 runtime when launching the MCP server via version-pinned uv tool run, with Python 3.10 fallback for older machines. This avoids native package builds on end-user machines, especially macOS systems without Xcode Command Line Tools, while keeping the project itself compatible with newer Python versions.
Installation scope note:
- The VS Code extension installs once per user (global). MCP launch env defaults `DATA_DIR` to workspace `./data` and `UV_CACHE_DIR` to `DATA_DIR/.uv-cache`; Prepare Server Runtime warms a workspace `.uv-cache`, falling back to extension global storage only when no workspace is open.
- Runtime data stays with your repo: `.env` and `assetAwareMcp.dataDir` default to `./data`, so ingested assets and the uv cache used by the launched server remain scoped to the current workspace.
Marker note:
Since v0.6.28 the packaged Marker extra has intentionally stayed on security hold: upstream `marker-pdf` 1.10.2 requires `Pillow<11`, while this release pins `Pillow>=12.2.0` for patched image-processing security. Default installs use the PyMuPDF backend only. `use_marker=True` / `parse_pdf_structure` will report that Marker is unavailable until upstream Marker supports a patched Pillow range.
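The two Pillow constraints named in this note cannot both be satisfied, which a few lines of Python can confirm. This is a simplified sketch that compares plain dotted release numbers as tuples; it is not a full PEP 440 resolver, and the sample version strings are only illustrative.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '12.2.0' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def marker_allows(version: str) -> bool:
    """Constraint quoted for upstream marker-pdf: Pillow<11."""
    return parse(version) < parse("11")


def secure_runtime_allows(version: str) -> bool:
    """Constraint quoted for this release: Pillow>=12.2.0."""
    return parse(version) >= parse("12.2.0")


# Illustrative sample versions, one from each side of the gap.
samples = ["10.4.0", "11.0.0", "12.2.0"]
compatible = [v for v in samples if marker_allows(v) and secure_runtime_allows(v)]
```

Any version below 11 fails the second check and any version at or above 12.2.0 fails the first, so `compatible` stays empty until upstream widens its range.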
MCP Tools
The default runtime surface is balanced: 30 public tools that keep the full document workflow available without overwhelming agents. It is made of 17 operation-based facade tools plus 13 high-frequency shortcuts. Set `ASSET_AWARE_MCP_TOOL_SURFACE=compact` for the 17 facade-only surface, or `ASSET_AWARE_MCP_TOOL_SURFACE=legacy` / `ASSET_AWARE_MCP_ENABLE_LEGACY_TOOLS=true` for the full 63-tool compatibility inventory.
| Area | Balanced public tools |
|---|---|
| Documents, assets, evidence, conversion | document, document_asset, evidence, convert_document, ingest_documents, list_documents, parse_pdf_structure, fetch_document_asset, find_evidence_spans, verify_citation_ref, citation_bundle |
| DOCX / DFM | docx, docx_table, ingest_docx, get_docx_content, save_docx, docx_table_edit_plan |
| Sections, jobs, KG, ETL profiles | section, job, get_job_status, list_jobs, knowledge, etl_profile |
| A2T tables | plan_table, table_manage, table_data, table_cite, table_history, table_draft, discover_sources |
See MCP Tools and Tool Consolidation for operation details, shortcut rationale, and legacy direct-tool mapping.
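As a rough illustration of how the surface settings described above interact, the sketch below resolves a surface name from environment variables. `resolve_tool_surface` is a hypothetical helper, and two details are assumptions rather than documented behavior: that the legacy flag takes precedence, and that unset or unknown values fall back to `balanced`.

```python
import os
from typing import Mapping

# Tool counts as stated in this README.
SURFACE_TOOL_COUNTS = {"balanced": 30, "compact": 17, "legacy": 63}


def resolve_tool_surface(env: Mapping[str, str] = os.environ) -> str:
    """Pick a tool surface name from the environment (hypothetical helper)."""
    # Assumption: the explicit legacy flag wins over the surface name.
    legacy_flag = env.get("ASSET_AWARE_MCP_ENABLE_LEGACY_TOOLS", "")
    if legacy_flag.strip().lower() == "true":
        return "legacy"
    surface = env.get("ASSET_AWARE_MCP_TOOL_SURFACE", "").strip().lower()
    # Assumption: unset or unknown values fall back to the balanced default.
    return surface if surface in SURFACE_TOOL_COUNTS else "balanced"


default_surface = resolve_tool_surface({})
compact_surface = resolve_tool_surface({"ASSET_AWARE_MCP_TOOL_SURFACE": "compact"})
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.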
Agent handoff note:
- Use `document(op="auto", file_paths=[...])` for new PDFs, and `document(op="auto", doc_id="...")` or `document(op="prepare_ai", doc_id="...")` for existing documents.
- `document(op="prepare_ai", output_format="json")` returns the v2 readiness contract with `status`, `blockers`, `warnings`, `capabilities`, `artifacts`, `missing_audits`, `invalid_audits`, `audit_artifacts`, and `next_actions`.
- `document(op="audit", doc_id="...")` reuses current audit artifacts only when they are present and valid; pass `refresh=true` to rebuild safety, native-structure, coverage, and accessibility reports.
- Use `document(op="pointer_index")`, `document(op="structural_retrieve", query="...")`, and `document(op="compare", doc_b_id="...", criteria="...")` when an agent needs section-level structural retrieval or comparison without new public tools.
- Readiness and job-status artifact discovery are read-only, so status checks do not create document directories.
PDF audit caveat: The audit reports are inspired by OpenDataloader-style artifact workflows, but they are not a sanitizer, a PDF/UA certification, or an OpenDataloader compatibility layer. They preserve source artifacts and report conservative diagnostics for review.
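An agent consuming the readiness contract might branch on it roughly as sketched below. The field names come from the handoff note above, but their value shapes (lists of strings) and the sample payload are assumptions made for illustration; `next_step` is a hypothetical helper, not part of the server.

```python
from typing import Any


def next_step(readiness: dict[str, Any]) -> str:
    """Decide what to do next from a readiness payload (assumed shapes)."""
    blockers = readiness.get("blockers") or []
    if blockers:
        # Nothing else is worth doing while a blocker is reported.
        return f"blocked: {blockers[0]}"
    missing = readiness.get("missing_audits") or []
    invalid = readiness.get("invalid_audits") or []
    if missing or invalid:
        # The note above says refresh=true rebuilds the audit reports.
        return 'call document(op="audit", refresh=true)'
    actions = readiness.get("next_actions") or []
    return str(actions[0]) if actions else "ready"


# Hypothetical payload; real values come from document(op="prepare_ai").
sample = {"status": "needs_audit", "blockers": [], "missing_audits": ["coverage"]}
decision = next_step(sample)
```

Checking blockers first, then stale audits, then suggested actions keeps the agent from acting on a document whose diagnostics are out of date.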
Tech Stack
| Category | Technology |
|---|---|
| Language | Python 3.10+ |
| Package Manager | uv (all pip/setup-python removed) |
| ETL | PyMuPDF (fitz); Marker is temporarily on security hold |
| RAG | LightRAG (lightrag-hku) |
| MCP | FastMCP |
| Storage | Local filesystem (JSON/Markdown/PNG) |
Documentation
Installation guidance:
- Default install: `uv sync` (slim ~227 MB; no LightRAG/KG dependencies).
- LightRAG / Knowledge Graph backend (optional, since v0.6.34): `uv tool install --upgrade --python 3.11 'asset-aware-mcp[lightrag]'` for uvx/published users, or `uv sync --extra lightrag` for local source checkouts. Required before setting `ENABLE_LIGHTRAG=true`.
- VS Code extension: run the command `Asset-Aware MCP: Install LightRAG Backend` from the Command Palette; it auto-detects source vs published mode and emits the matching install command.
- OpenRouter optional preset (since v0.6.35): set `LLM_BACKEND=openrouter`, `OPENROUTER_API_KEY=...`, and optionally `OPENROUTER_MODEL=liquid/lfm-2.5-1.2b-instruct:free` for fast low-cost summaries and draft RAG answers. LightRAG retrieval still uses the configured embedding backend.
- Marker backend: temporarily disabled in v0.7.0 because `marker-pdf` pins vulnerable `Pillow<11`; the `marker`/`pdf` extras are compatibility placeholders until upstream supports patched Pillow.
- VS Code extension: `assetAwareMcp.enableMarkerBackend` is retained as a setting, but the launcher will not install `marker-pdf` while the security hold is active.
- Technical Spec - Detailed technical specification
- Architecture - System architecture
- Constitution - Project principles
- Competitive Analysis - MCP + DOCX ecosystem landscape
License