Ebook-MCP

Ebook-MCP

A MCP server that supports mainstream eBook formats including EPUB, PDF and more. Simplify your eBook user experience with LLM.

onebirdrocks

Research & Data
Visit Server

README

Ebook-MCP

<img src="logo.png" alt="ebook-mcp logo" width="132" height="132">

English | 中文

Ebook-MCP is a powerful Model Context Protocol (MCP) server for processing electronic books. Built on the Model Context Protocol, it provides a set of standardized APIs for seamless integration between LLM applications and e-book processing capabilities. Currently supports EPUB and PDF formats.

Use Cases & Value

Ebook-MCP transforms how you interact with your digital books by enabling natural language conversations with your reading materials. It seamlessly integrates with modern AI-powered IDEs like Cursor and Claude, allowing you to:

  • Smart Library Management: Simply ask "Show me all EPUB files in my downloads folder" or "Find books about GenAI in my library"
  • Interactive Reading Experience: Have natural conversations about your books:
    • "Give me a brief introduction to 'LLM Engineer Handbook'"
    • "What's covered in Chapter 3?"
    • "Summarize the key points about RAG from this book"
  • Active Learning Support: Enhance your learning through AI-powered interactions:
    • "Create a quiz based on the RAG concepts from Chapter 5"
    • "Explain the differences between the architectures discussed in this chapter"
    • "Give me practical exercises based on the concepts in this section"
  • Content Navigation: Easily navigate through your books with natural language queries:
    • "Find all sections discussing prompt engineering"
    • "Show me the chapters about fine-tuning"
    • "Take me to the part about vector databases"

By bridging the gap between traditional e-books and AI capabilities, Ebook-MCP helps readers extract more value from their digital library through intuitive, conversation-based interactions.

Features

EPUB Support

  • Extract metadata (title, author, publication date, etc.)
  • Extract table of contents
  • Extract chapter content (with Markdown output)
  • Batch process EPUB files

PDF Support

  • Extract metadata (title, author, creation date, etc.)
  • Extract table of contents
  • Extract content by page number
  • Extract content by chapter title
  • Markdown output support
  • Batch process PDF files

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/ebook-mcp.git
cd ebook-mcp
  1. Install dependencies using uv:
uv pip install -r requirements.txt

Usage

Starting the MCP Server in development mode

Run the server in development mode:

uv run mcp dev main.py

You can visit http://localhost:5173/ for testing & debuging purpose

Starting the MCP Server in Prod mode

Run the server:

uv run main.py

Config the MCP in Cursor

Add the following configuration in Cursoe


"ebook-mcp":{
            "command": "uv",
            "args": [
                "--directory",
                "/Users/onebird/github/ebook-mcp",
                "run",
                "main.py"
            ]
        }

EPUB Processing Examples

# Get all EPUB files in a directory
epub_files = get_all_epub_files("/path/to/books")

# Get EPUB metadata
metadata = get_metadata("/path/to/book.epub")

# Get table of contents
toc = get_toc("/path/to/book.epub")

# Get specific chapter content (in Markdown format)
chapter_content = get_chapter_markdown("/path/to/book.epub", "chapter_id")

PDF Processing Examples

# Get all PDF files in a directory
pdf_files = get_all_pdf_files("/path/to/books")

# Get PDF metadata
metadata = get_pdf_metadata("/path/to/book.pdf")

# Get table of contents
toc = get_pdf_toc("/path/to/book.pdf")

# Get specific page content
page_text = get_pdf_page_text("/path/to/book.pdf", 1)
page_markdown = get_pdf_page_markdown("/path/to/book.pdf", 1)

# Get specific chapter content
chapter_content, page_numbers = get_pdf_chapter_content("/path/to/book.pdf", "Chapter 1")

API Reference

EPUB APIs

get_all_epub_files(path: str) -> List[str]

Get all EPUB files in the specified directory.

get_metadata(epub_path: str) -> Dict[str, Union[str, List[str]]]

Get metadata from an EPUB file.

get_toc(epub_path: str) -> List[Tuple[str, str]]

Get table of contents from an EPUB file.

get_chapter_markdown(epub_path: str, chapter_id: str) -> str

Get chapter content in Markdown format.

PDF APIs

get_all_pdf_files(path: str) -> List[str]

Get all PDF files in the specified directory.

get_pdf_metadata(pdf_path: str) -> Dict[str, Union[str, List[str]]]

Get metadata from a PDF file.

get_pdf_toc(pdf_path: str) -> List[Tuple[str, int]]

Get table of contents from a PDF file.

get_pdf_page_text(pdf_path: str, page_number: int) -> str

Get plain text content from a specific page.

get_pdf_page_markdown(pdf_path: str, page_number: int) -> str

Get Markdown formatted content from a specific page.

get_pdf_chapter_content(pdf_path: str, chapter_title: str) -> Tuple[str, List[int]]

Get chapter content and corresponding page numbers by chapter title.

Dependencies

Key dependencies include:

  • ebooklib: EPUB file processing
  • PyPDF2: Basic PDF processing
  • PyMuPDF: Advanced PDF processing
  • beautifulsoup4: HTML parsing
  • html2text: HTML to Markdown conversion
  • pydantic: Data validation
  • fastmcp: MCP server framework

Important Notes

  1. PDF processing relies on the document's table of contents. Some features may not work if TOC is not available.
  2. For large PDF files, it's recommended to process by page ranges to avoid loading the entire file at once.
  3. EPUB chapter IDs must be obtained from the table of contents structure.

Architecture

           ┌────────────────────────────┐
           │         Agent Layer        │
           │  - Translation Strategy    │
           │  - Style Consistency Check │
           │  - LLM Call & Interaction │
           └────────────▲─────────────┘
                        │ Tool Calls
           ┌────────────┴─────────────┐
           │        MCP Tool Layer     │
           │  - extract_chapter        │
           │  - write_translated_chapter│
           │  - generate_epub          │
           └────────────▲─────────────┘
                        │ System/IO Calls
           ┌────────────┴─────────────┐
           │     System Base Layer     │
           │  - File Reading          │
           │  - ebooklib Parsing      │
           │  - File Path Storage/Check│
           └────────────────────────────┘

Contributing

We welcome Issues and Pull Requests!

Changelog

v1.0.0

  • Initial release
  • EPUB and PDF format support
  • Basic file processing APIs
           ┌────────────────────────────┐
           │         Agent Layer        │
           │  - Translation Strategy    │
           │  - Style Consistency Check │
           │  - LLM Call & Interaction │
           └────────────▲─────────────┘
                        │ Tool Calls
           ┌────────────┴─────────────┐
           │        MCP Tool Layer     │
           │  - extract_chapter        │
           │  - write_translated_chapter│
           │  - generate_epub          │
           └────────────▲─────────────┘
                        │ System/IO Calls
           ┌────────────┴─────────────┐
           │     System Base Layer     │
           │  - File Reading          │
           │  - ebooklib Parsing      │
           │  - File Path Storage/Check│
           └────────────────────────────┘

Recommended Servers

Crypto Price & Market Analysis MCP Server

Crypto Price & Market Analysis MCP Server

A Model Context Protocol (MCP) server that provides comprehensive cryptocurrency analysis using the CoinCap API. This server offers real-time price data, market analysis, and historical trends through an easy-to-use interface.

Featured
TypeScript
MCP PubMed Search

MCP PubMed Search

Server to search PubMed (PubMed is a free, online database that allows users to search for biomedical and life sciences literature). I have created on a day MCP came out but was on vacation, I saw someone post similar server in your DB, but figured to post mine.

Featured
Python
dbt Semantic Layer MCP Server

dbt Semantic Layer MCP Server

A server that enables querying the dbt Semantic Layer through natural language conversations with Claude Desktop and other AI assistants, allowing users to discover metrics, create queries, analyze data, and visualize results.

Featured
TypeScript
mixpanel

mixpanel

Connect to your Mixpanel data. Query events, retention, and funnel data from Mixpanel analytics.

Featured
TypeScript
Sequential Thinking MCP Server

Sequential Thinking MCP Server

This server facilitates structured problem-solving by breaking down complex issues into sequential steps, supporting revisions, and enabling multiple solution paths through full MCP integration.

Featured
Python
Nefino MCP Server

Nefino MCP Server

Provides large language models with access to news and information about renewable energy projects in Germany, allowing filtering by location, topic (solar, wind, hydrogen), and date range.

Official
Python
Vectorize

Vectorize

Vectorize MCP server for advanced retrieval, Private Deep Research, Anything-to-Markdown file extraction and text chunking.

Official
JavaScript
Mathematica Documentation MCP server

Mathematica Documentation MCP server

A server that provides access to Mathematica documentation through FastMCP, enabling users to retrieve function documentation and list package symbols from Wolfram Mathematica.

Local
Python
kb-mcp-server

kb-mcp-server

An MCP server aimed to be portable, local, easy and convenient to support semantic/graph based retrieval of txtai "all in one" embeddings database. Any txtai embeddings db in tar.gz form can be loaded

Local
Python
Research MCP Server

Research MCP Server

The server functions as an MCP server to interact with Notion for retrieving and creating survey data, integrating with the Claude Desktop Client for conducting and reviewing surveys.

Local
Python