CHM to Markdown Converter
DTDucas
A Python utility for converting Compiled HTML Help (CHM) files to Markdown format. This tool extracts HTML files from CHM documents and converts them to well-formatted Markdown files, making technical documentation more accessible and version control friendly.
Features
- Extracts CHM files using 7-Zip
- Converts HTML content to clean Markdown format
- Special handling for code snippets with language-specific syntax highlighting
- Preserves and fixes tables
- Updates internal links to maintain document references
- Processes files asynchronously for better performance
- Batch processing with progress reporting
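The internal-link feature can be pictured as rewriting `.htm`/`.html` link targets to `.md` in the converted Markdown. The sketch below is an illustration under that assumption only; the function name `rewrite_internal_links` and the regular expression are hypothetical and not taken from the script.

```python
import re

# Matches Markdown link targets such as (page.htm#anchor); hypothetical pattern.
_LINK_RE = re.compile(r"\]\(([^)\s]+?)\.html?(#[^)\s]*)?\)", re.IGNORECASE)

def rewrite_internal_links(markdown: str) -> str:
    """Point relative .htm/.html link targets at the matching .md file."""
    def _swap(match):
        target, anchor = match.group(1), match.group(2) or ""
        # Leave external links (http:, https:, mailto:, ...) untouched.
        if re.match(r"^[a-z][a-z0-9+.-]*:", target, re.IGNORECASE):
            return match.group(0)
        return f"]({target}.md{anchor})"
    return _LINK_RE.sub(_swap, markdown)
```

Anchors such as `#setup` are carried over unchanged, so links into a specific section keep working as long as the heading anchors survive conversion.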
Requirements
- Python 3.7+
- 7-Zip installed in the default location (C:\Program Files\7-Zip\7z.exe)
- The following Python packages:
- beautifulsoup4
- html2text
- aiofiles
Installation
- Clone or download this repository
- Install required Python packages:
pip install -r requirements.txt
Or install them directly:
pip install beautifulsoup4 html2text aiofiles
Usage
- Edit the configuration variables in the main() function of chm_to_markdown.py:
input_folder = r"C:\Path\To\Extracted\Files" # Temporary folder for extracting CHM
output_folder = r"C:\Path\To\Output\Markdown" # Where Markdown files will be saved
chm_file_path = r"C:\Path\To\Your\File.chm" # Your CHM file path
- Run the script:
python chm_to_markdown.py
- The script will:
- Clear the input and output folders
- Extract CHM files to the input folder
- Convert HTML files to Markdown
- Save the Markdown files to the output folder
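The first two steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `reset_folder`, `build_extract_command`, and `extract_chm` are hypothetical, and the sketch assumes extraction uses 7-Zip's `x` command with the `-o` (output directory) and `-y` (assume yes to all prompts) switches.

```python
import shutil
import subprocess
from pathlib import Path

def reset_folder(path: str) -> Path:
    """Delete a folder and everything in it, then recreate it empty."""
    folder = Path(path)
    if folder.exists():
        shutil.rmtree(folder)
    folder.mkdir(parents=True)
    return folder

def build_extract_command(seven_zip: str, chm_file: str, out_dir: str) -> list:
    """Build the 7-Zip command line; 'x' extracts with full paths."""
    # 7-Zip expects no space between -o and the directory.
    return [seven_zip, "x", chm_file, f"-o{out_dir}", "-y"]

def extract_chm(seven_zip: str, chm_file: str, out_dir: str) -> None:
    """Clear the target folder and extract the CHM into it."""
    reset_folder(out_dir)
    # check=True raises CalledProcessError if 7-Zip exits with an error.
    subprocess.run(build_extract_command(seven_zip, chm_file, out_dir), check=True)
```

Because both folders are cleared on every run, avoid pointing input_folder or output_folder at a directory that holds files you want to keep.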
Performance Tuning
You can adjust the following parameters in the process_folder_async() call to optimize performance for your system:
- max_workers: Number of worker threads for CPU-bound operations
- semaphore_limit: Maximum concurrent file I/O operations
- batch_size: Number of files to process in each batch
await process_folder_async(
    input_folder, output_folder, max_workers=8, semaphore_limit=20, batch_size=50
)
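How the three parameters might interact can be sketched with the standard library alone. The code below is an assumption about the general pattern (a thread pool, a semaphore, and fixed-size batches), not the script's implementation; `process_all` and `convert` are hypothetical names, and `convert` is only a stand-in for the real HTML-to-Markdown work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def convert(text: str) -> str:
    """Stand-in for the CPU-bound conversion step."""
    return text.upper()

async def process_all(items, max_workers=8, semaphore_limit=20, batch_size=50):
    """Process items in batches, bounding both threads and in-flight tasks."""
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(semaphore_limit)
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        async def handle(item):
            # At most semaphore_limit items are in flight at once; in a real
            # converter, file reads and writes would sit inside this guard.
            async with semaphore:
                # The CPU-bound work runs on one of max_workers threads.
                return await loop.run_in_executor(pool, convert, item)

        # Only batch_size tasks are created and awaited at a time.
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            results.extend(await asyncio.gather(*(handle(i) for i in batch)))
    return results
```

A call such as `asyncio.run(process_all(files, max_workers=4, batch_size=25))` would then trade some throughput for a smaller working set.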
Customization
The script provides several customization options for content conversion:
Removing Unwanted Elements
You can customize which HTML elements to remove by editing these lists:
tags_to_remove = ["iframe", "object", "script", "br", "img"]
classes_to_remove = ["collapsibleAreaRegion", "collapsibleRegionTitle", ...]
ids_to_remove = ["PageFooter"]
Code Snippets
The script handles code snippets with language-specific formatting. You can customize the language mapping:
id_to_lang = {
    "IDAB_code_Div1": "csharp",
    "IDAB_code_Div2": "vb",
    "IDAB_code_Div3": "cpp",
    "IDAB_code_Div4": "fsharp",
}
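One way such a mapping could be applied is to wrap each snippet in a fenced block labeled with the mapped language. This is a sketch only: `to_fenced_block` and the unlabeled-fence fallback for unknown ids are hypothetical, not the script's behavior.

```python
id_to_lang = {
    "IDAB_code_Div1": "csharp",
    "IDAB_code_Div2": "vb",
    "IDAB_code_Div3": "cpp",
    "IDAB_code_Div4": "fsharp",
}

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

def to_fenced_block(div_id: str, code: str, mapping=id_to_lang) -> str:
    """Wrap code in a Markdown fence labeled with the language for div_id."""
    lang = mapping.get(div_id, "")  # unknown ids get an unlabeled fence
    body = code.strip("\n")         # drop leading/trailing blank lines only
    return f"{FENCE}{lang}\n{body}\n{FENCE}\n"
```

Adding a new language would then only require a new entry in the dictionary, for example a hypothetical `"IDAB_code_Div5": "javascript"`.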
Troubleshooting
- Missing modules error: Make sure you've installed all required packages and your Python environment is correctly configured.
- 7-Zip not found: Check that 7-Zip is installed in the default location or update the path in the script.
- Permission errors: Run your terminal or command prompt with administrator privileges.
- Memory issues with large CHM files: Try reducing batch_size and max_workers so that fewer files are held in memory at once.
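For the 7-Zip case, a small lookup can make the failure easier to diagnose. The sketch below is hypothetical (`find_seven_zip` is not part of the script): it checks the default path from the Requirements section first, then falls back to searching `PATH`.

```python
import os
import shutil
from typing import Optional, Sequence

DEFAULT_7Z = r"C:\Program Files\7-Zip\7z.exe"

def find_seven_zip(candidates: Sequence[str] = (DEFAULT_7Z,)) -> Optional[str]:
    """Return the first existing 7-Zip executable, or None if none is found."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Fall back to PATH; shutil.which returns None when nothing matches.
    return shutil.which("7z")
```

If this returns `None`, 7-Zip is either not installed or installed somewhere that must be passed in explicitly.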
License
This project is open source and available under the MIT License.