MCP Duplicate File Manager Server

Enables LLM agents to find and safely delete duplicate files using content-based detection, with smart copy prioritization and dry-run mode.

A Model Context Protocol (MCP) server that lets LLM-based agents find and remove duplicate files. It works with any MCP-compatible client and provides content-based duplicate detection, copy-aware selection of which file to keep, and safe deletion with a dry-run preview.


Features

  • Exposes file management tools for LLM agents via MCP:
    • list_files: List files in directories with optional filtering
    • find_duplicates: Find duplicate files based on content hash (MD5)
    • delete_duplicates: Safely delete duplicate files with intelligent copy detection
  • Smart copy detection: Prioritizes deleting files with copy indicators ("copy", "(1)", "(2)", etc.)
  • Content-based duplicate detection using MD5 hashing for accuracy
  • Recursive directory scanning with optional pattern matching
  • Dry-run mode for safe preview before actual deletion
  • Simple integration with Claude, MCP clients, or other LLM-based agents

Quickstart

1. Install & Run

Run the MCP Duplicate File Manager Server using uvx:

uvx --from git+https://github.com/deerajd/DuplicateFileRemover-MCP.git mcp-server

2. Integrate with Your LLM Agent

Add the following to your tool configuration (for Claude, MCP clients, etc.):

{
  "mcpServers": {
    "DuplicateListor": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/deerajd/DuplicateFileRemover-MCP.git",
        "mcp-server"
      ]
    }
  }
}

This configuration launches the MCP Duplicate File Manager Server as a subprocess, exposing file management tools to your LLM agent.


3. Usage

From your LLM agent, you can call:

List Files in Directory

list_files(directory="./Documents", pattern="*.pdf")        # List PDF files
list_files(directory="/home/user", include_hidden=True)     # Include hidden files
list_files()                                                # List all files in current directory

Find Duplicate Files

find_duplicates(directory="./Downloads")                    # Find duplicates in Downloads
find_duplicates(directory="/Users/john/Desktop", recursive=False)  # Non-recursive search
find_duplicates()                                           # Find duplicates in current directory

Delete Duplicate Files

delete_duplicates(directory="./Documents", dry_run=True)    # Preview what would be deleted
delete_duplicates(directory="./Documents", dry_run=False)   # Actually delete duplicates
delete_duplicates(directory="./Pictures", recursive=True)   # Recursive preview (dry_run defaults to True)

The server provides detailed responses including:

  • List of files with metadata (name, size, type, path)
  • Duplicate file groups organized by content hash
  • Deletion results with preserved originals and deleted copies
  • Error handling for inaccessible files or directories

API Reference

list_files(directory=".", pattern="*", include_hidden=False)

  • Parameters:
    • directory (str): Directory path to list files from (default: current directory)
    • pattern (str): Glob pattern that file names must match (e.g., "*.py", "*.txt") (default: "*")
    • include_hidden (bool): Whether to include hidden files (default: False)
  • Returns: List of dictionaries containing file information (name, size, type, path, modified)
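
As an illustration of the parameters above, here is a minimal sketch of how such a listing could be built with the standard library. It is not the server's actual code: the function name list_files_sketch, the use of fnmatch for pattern matching, the dot-prefix rule for hidden files, and the exact value formats are all assumptions.

```python
import fnmatch
from datetime import datetime
from pathlib import Path


def list_files_sketch(directory=".", pattern="*", include_hidden=False):
    """Return metadata dicts for the files directly inside `directory`."""
    results = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        # Assumption: "hidden" means a dot-prefixed name (Unix convention).
        if not include_hidden and entry.name.startswith("."):
            continue
        # Assumption: the pattern is a shell-style glob applied to the name.
        if not fnmatch.fnmatch(entry.name, pattern):
            continue
        stat = entry.stat()
        results.append({
            "name": entry.name,
            "size": stat.st_size,                 # bytes
            "type": entry.suffix or "unknown",    # file extension
            "path": str(entry),
            "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        })
    return results
```

Calling list_files_sketch("./Documents", "*.pdf") would return one dictionary per matching PDF, each carrying the five keys named in the Returns line.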

find_duplicates(directory=".", recursive=True)

  • Parameters:
    • directory (str): Directory path to search for duplicates (default: current directory)
    • recursive (bool): Whether to search recursively in subdirectories (default: True)
  • Returns: Dictionary mapping each content hash to the list of file paths that share it
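
Content-based detection of this kind can be sketched as follows: hash every file with MD5, group paths by digest, and keep only groups with more than one member. The names md5_of_file and find_duplicates_sketch are hypothetical, and the chunked reading and error handling are assumptions rather than the server's confirmed behavior.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_sketch(directory=".", recursive=True):
    """Map each MD5 digest to the paths sharing it, omitting unique files."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    by_hash = defaultdict(list)
    for path in candidates:
        if not path.is_file():
            continue
        try:
            by_hash[md5_of_file(path)].append(str(path))
        except OSError:
            # Assumption: unreadable files are skipped rather than aborting the scan.
            continue
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Because grouping is by content alone, two files with different names or in different subdirectories land in the same group whenever their bytes are identical.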

delete_duplicates(directory=".", recursive=True, dry_run=True)

  • Parameters:
    • directory (str): Directory path to search for duplicates (default: current directory)
    • recursive (bool): Whether to search recursively in subdirectories (default: True)
    • dry_run (bool): If True, only show what would be deleted without actually deleting (default: True)
  • Returns: Dictionary with information about deleted files and preserved originals
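
The dry-run behavior described above can be illustrated with a short sketch that takes duplicate groups (hash mapped to paths, as find_duplicates returns) and either reports or performs the deletions. The function name remove_duplicates_sketch, the report keys, and the rule of keeping the first path in each group are assumptions made for illustration only.

```python
from pathlib import Path


def remove_duplicates_sketch(groups, dry_run=True):
    """Keep the first path of each group; delete (or just report) the rest."""
    report = {"dry_run": dry_run, "preserved": [], "deleted": [], "errors": []}
    for paths in groups.values():
        if not paths:
            continue
        keep, *extras = paths
        report["preserved"].append(keep)
        for extra in extras:
            if not dry_run:
                try:
                    Path(extra).unlink()
                except OSError as exc:
                    # Record the failure and move on to the next file.
                    report["errors"].append({"path": extra, "error": str(exc)})
                    continue
            # In dry-run mode this lists what *would* be deleted.
            report["deleted"].append(extra)
    return report
```

Defaulting dry_run to True means a caller has to opt in to destructive behavior explicitly, which matches the default shown in the signature above.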

Smart Copy Detection

Within each group of duplicates, the server deletes files whose names carry copy indicators first and preserves the file without one. Recognized indicators:

  • copy - Files containing "copy" in the filename
  • (1), (2), etc. - Files with numbered suffixes
  • _copy, -copy - Files with copy prefixes/suffixes
  • duplicate, dup - Files marked as duplicates
  • Multiple language variants (copie, kopia, copia)
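
The indicators listed above could be checked with a single regular expression, as in the sketch below. The pattern, the helper names has_copy_indicator and choose_original, and the tie-breaking order (shorter name, then alphabetical) are assumptions; the server's real matching rules may differ.

```python
import re
from pathlib import Path

# Assumption: indicators are matched as case-insensitive substrings of the
# file stem, plus a trailing "(n)" counter. Substring matching is loose and
# would also flag names such as "copyright.txt".
COPY_PATTERN = re.compile(
    r"(copy|copie|kopia|copia|duplicate|dup)|\(\d+\)$",
    re.IGNORECASE,
)


def has_copy_indicator(path):
    """Return True if the file name (without extension) looks like a copy."""
    return COPY_PATTERN.search(Path(path).stem) is not None


def choose_original(paths):
    """Split a duplicate group into (file to keep, files to delete)."""
    ranked = sorted(
        paths,
        # Files without an indicator sort first, then shorter names, then A-Z.
        key=lambda p: (has_copy_indicator(p), len(Path(p).name), str(p)),
    )
    return ranked[0], ranked[1:]
```

For a group such as ["a/report (1).pdf", "a/report.pdf", "a/report copy.pdf"], this ranking keeps "a/report.pdf" and marks the other two for deletion.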

Development

Requirements

  • Python 3.8+
  • hashlib (Python standard library) for MD5 hashing
  • pathlib (Python standard library) for file path operations
  • FastMCP for MCP server implementation

Project Structure

DuplicateListor/
├── duplicatelist.py             # Main server implementation
├── main.py                      # Entry point
├── pyproject.toml               # Project configuration
└── README.md                    # This file

Safety Features

  • Dry-run Mode: Preview changes before actual deletion
  • Smart Original Detection: Preserves files without copy indicators
  • Error Handling: Graceful handling of inaccessible files
  • Content-based Matching: Uses MD5 hashing for accurate duplicate detection
  • Detailed Reporting: Comprehensive feedback on all operations

License

MIT License

