
aliakseis/HandleDuplicateFiles


HandleDuplicateFiles

An efficient Windows console application to locate and deduplicate files by content. It recursively scans a directory (with optional extension filtering), groups same-sized files, compares their contents in buffered chunks, reports duplicate groups, and optionally replaces duplicate files with NTFS hard links to save space.

Features

  • Recursively enumerate files under a root folder

  • Optional, case-insensitive extension filter (e.g. .txt)

  • Skip files smaller than 16 KB

  • Group files by size, then by content using a pivot-based, buffered comparison

  • Report duplicate groups and total reclaimed bytes

  • Replace duplicates with hard links to the master file

Requirements

  • Windows 7 or later on an NTFS volume

  • A C++17-capable compiler (e.g. MSVC)

  • Windows SDK for WinAPI functions

  • Console configured for Unicode output

Build Instructions

You can compile with the Microsoft Visual C++ compiler (cl.exe) from a Developer Command Prompt:

cl /EHsc /W4 HandleDuplicateFiles.cpp

Alternatively, create a Visual Studio project:

  1. New → Visual C++ → Empty Project

  2. Add HandleDuplicateFiles.cpp to Source Files

  3. Project → Properties → Configuration Properties → General → Character Set: Use Unicode

  4. Build the solution

Usage

HandleDuplicateFiles.exe root_folder [extension_filter]

  • root_folder: The top-level directory to scan.

  • extension_filter (optional): File extension filter including the dot (e.g. .jpg, .txt). Case-insensitive. If omitted, all files ≥ 16 KB are considered.

Examples

Find duplicate .txt files in C:\Projects:

HandleDuplicateFiles.exe C:\Projects .txt

Scan all files ≥ 16 KB under D:\Media:

HandleDuplicateFiles.exe D:\Media

How It Works

  1. EnumerateFilesAndGroupBySize: Recursively visits each file, skips reparse points, filters by extension and minimum size, and buckets paths by file size.

  2. GroupFilesByContentUsingMap: For each size bucket with ≥ 2 files:

    • Picks the first file as the pivot.

    • Compares buffered chunks of the pivot against batches of up to 256 “right” files via CompareFilesBufferedAdvanced.

    • Builds a map of comparison keys (byte offset + mismatch byte) → lists of files.

    • Recursively groups non-matching subsets by their next pivot key.

    • Files matching pivot exactly join the duplicate group.

  3. Reporting: Prints each duplicate group with its byte size and computes the total bytes reclaimable by deduplication.

  4. Deduplication: For each group, the first file remains the master; the others are deleted and replaced with hard links pointing to it (unless they already share the same NTFS file ID).

Permissions & Notes

  • Must run with sufficient permissions to delete files and create hard links.

  • Hard links only work on NTFS volumes; duplicates on other file systems will be reported but not linked.

  • Large file sets can consume memory proportional to the number of open streams and grouping structures.

License

This project is released under the MIT License. See LICENSE for details.
