MCP Duplicate File Manager Server
Enables LLM agents to find and safely delete duplicate files using content-based detection, with smart copy prioritization and dry-run mode.
A Model Context Protocol (MCP) server that lets LLM-based AI agents find and remove duplicate files. It integrates with LLM agents and other MCP clients, and combines content-based duplicate detection with safe deletion.
Features
- Exposes file management tools for LLM agents via MCP:
  - list_files: List files in directories with optional filtering
  - find_duplicates: Find duplicate files based on content hash (MD5)
  - delete_duplicates: Safely delete duplicate files with intelligent copy detection
- Smart copy detection: Prioritizes deleting files with copy indicators ("copy", "(1)", "(2)", etc.)
- Content-based duplicate detection using MD5 hashing for accuracy
- Recursive directory scanning with optional pattern matching
- Dry-run mode for safe preview before actual deletion
- Simple integration with Claude, MCP clients, or other LLM-based agents
Quickstart
1. Install & Run
Run the MCP Duplicate File Manager Server using uvx:
uvx --from git+https://github.com/deerajd/DuplicateFileRemover-MCP.git mcp-server
2. Integrate with Your LLM Agent
Add the following to your tool configuration (for Claude, MCP clients, etc.):
{
  "mcpServers": {
    "DuplicateListor": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/deerajd/DuplicateFileRemover-MCP.git",
        "mcp-server"
      ]
    }
  }
}
This configuration launches the MCP Duplicate File Manager Server as a subprocess, exposing file management tools to your LLM agent.
3. Usage
From your LLM agent, you can call:
List Files in Directory
list_files(directory="./Documents", pattern="*.pdf") # List PDF files
list_files(directory="/home/user", include_hidden=True) # Include hidden files
list_files() # List all files in current directory
Find Duplicate Files
find_duplicates(directory="./Downloads") # Find duplicates in Downloads
find_duplicates(directory="/Users/john/Desktop", recursive=False) # Non-recursive search
find_duplicates() # Find duplicates in current directory
Delete Duplicate Files
delete_duplicates(directory="./Documents", dry_run=True) # Preview what would be deleted
delete_duplicates(directory="./Documents", dry_run=False) # Actually delete duplicates
delete_duplicates(directory="./Pictures", recursive=True) # Recursive scan; preview only, since dry_run defaults to True
The server provides detailed responses including:
- List of files with metadata (name, size, type, path)
- Duplicate file groups organized by content hash
- Deletion results with preserved originals and deleted copies
- Error messages for inaccessible files or directories
API Reference
list_files(directory=".", pattern="*", include_hidden=False)
- Parameters:
  - directory (str): Directory path to list files from (default: current directory)
  - pattern (str): File pattern to match (e.g., "*.py", "*.txt") (default: "*")
  - include_hidden (bool): Whether to include hidden files (default: False)
- Returns: List of dictionaries containing file information (name, size, type, path, modified)
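The listing behavior described above can be approximated with the standard library. This is a minimal sketch, not the server's actual code: the function name list_files_sketch is hypothetical, treating a leading dot as "hidden" is an assumption, and so is the meaning of the "type" field (here, the file extension).

```python
import fnmatch
from datetime import datetime
from pathlib import Path


def list_files_sketch(directory=".", pattern="*", include_hidden=False):
    """Return one metadata dictionary per regular file directly inside `directory`."""
    entries = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        if not include_hidden and path.name.startswith("."):
            continue  # assumption: "hidden" means a dot-prefixed name
        if not fnmatch.fnmatch(path.name, pattern):
            continue  # shell-style pattern such as "*.pdf"
        stat = path.stat()
        entries.append({
            "name": path.name,
            "size": stat.st_size,
            # Assumption: "type" is the extension without the dot.
            "type": path.suffix.lstrip(".") or "unknown",
            "path": str(path),
            "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        })
    return entries
```

Calling list_files_sketch("./Documents", pattern="*.pdf") would return only the PDF entries from that directory.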
find_duplicates(directory=".", recursive=True)
- Parameters:
  - directory (str): Directory path to search for duplicates (default: current directory)
  - recursive (bool): Whether to search recursively in subdirectories (default: True)
- Returns: Dictionary with hash as key and list of duplicate file paths as values
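To illustrate the return shape, here is a minimal sketch of content-based grouping with hashlib and pathlib. The names md5_of_file and find_duplicates_sketch are hypothetical, and chunked reading and skipping unreadable files are assumptions rather than documented server behavior.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_sketch(directory: str = ".", recursive: bool = True) -> dict:
    """Map each MD5 digest to the paths sharing it, keeping only groups of two or more."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    groups = defaultdict(list)
    for path in candidates:
        if not path.is_file():
            continue
        try:
            groups[md5_of_file(path)].append(str(path))
        except OSError:
            # Assumption: unreadable files are skipped rather than aborting the scan.
            continue
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```

Files with unique content never appear in the result, so an empty dictionary means no duplicates were found.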
delete_duplicates(directory=".", recursive=True, dry_run=True)
- Parameters:
  - directory (str): Directory path to search for duplicates (default: current directory)
  - recursive (bool): Whether to search recursively in subdirectories (default: True)
  - dry_run (bool): If True, only show what would be deleted without actually deleting (default: True)
- Returns: Dictionary with information about deleted files and preserved originals
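The dry-run contract can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: delete_duplicates_sketch, the report keys, and the choose_original parameter are all hypothetical, and the function takes an already computed mapping of hash to paths instead of a directory.

```python
from pathlib import Path


def delete_duplicates_sketch(groups: dict, dry_run: bool = True, choose_original=min) -> dict:
    """Delete all but one file per duplicate group, or only report what would be deleted."""
    report = {"dry_run": dry_run, "preserved": [], "deleted": [], "errors": []}
    for paths in groups.values():
        if len(paths) < 2:
            continue  # nothing to delete in a group of one
        # Placeholder rule: `min` keeps the lexicographically first path.
        keep = choose_original(paths)
        report["preserved"].append(keep)
        for extra in (p for p in paths if p != keep):
            if not dry_run:
                try:
                    Path(extra).unlink()
                except OSError as exc:
                    report["errors"].append({"path": extra, "error": str(exc)})
                    continue
            # In dry-run mode this lists files that *would* be deleted.
            report["deleted"].append(extra)
    return report
```

Because dry_run defaults to True, a caller has to opt in explicitly before any file is removed. A chooser that prefers names without copy indicators can be passed as choose_original.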
Smart Copy Detection
The server intelligently identifies and prioritizes deletion of files with copy indicators:
- copy: Files containing "copy" in the filename
- (1), (2), etc.: Files with numbered suffixes
- _copy, -copy: Files with copy prefixes/suffixes
- duplicate, dup: Files marked as duplicates
- Multiple language variants (copie, kopia, copia)
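One way to recognize these indicators is to split the filename stem into tokens and compare them against a word list. The sketch below is an assumption about how such a check could work; has_copy_indicator and choose_original are hypothetical names, and the tie-break (shortest name, then alphabetical) is invented for illustration. It matches whole tokens, so a name like "copyright.txt" is not flagged, which may differ from the server's own matching.

```python
import re
from pathlib import Path

# Copy words taken from the list above.
COPY_WORDS = {"copy", "copie", "kopia", "copia", "duplicate", "dup"}
# Trailing numbered suffix such as "report (1)".
NUMBERED_SUFFIX = re.compile(r"\(\d+\)\s*$")


def has_copy_indicator(filename: str) -> bool:
    """Return True if the filename stem looks like a copy of another file."""
    stem = Path(filename).stem.lower()
    if NUMBERED_SUFFIX.search(stem):
        return True
    tokens = re.split(r"[^a-z0-9]+", stem)
    return any(token in COPY_WORDS for token in tokens)


def choose_original(paths):
    """Pick the path to preserve: prefer names without copy indicators."""
    def sort_key(path):
        name = Path(path).name
        # False sorts before True, so indicator-free names win.
        return (has_copy_indicator(name), len(name), path)
    return min(paths, key=sort_key)
```

For a group such as "report (1).pdf", "report.pdf", and "report copy.pdf", choose_original keeps "report.pdf" and leaves the other two as deletion candidates.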
Development
Requirements
- Python 3.8+
- hashlib (standard library) for MD5 hashing
- pathlib (standard library) for file path operations
- FastMCP for MCP server implementation
Project Structure
DuplicateListor/
├── duplicatelist.py # Main server implementation
├── main.py # Entry point
├── pyproject.toml # Project configuration
└── README.md # This file
Safety Features
- Dry-run Mode: Preview changes before actual deletion
- Smart Original Detection: Preserves files without copy indicators
- Error Handling: Graceful handling of inaccessible files
- Content-based Matching: Uses MD5 hashing for accurate duplicate detection
- Detailed Reporting: Comprehensive feedback on all operations
License
MIT License