I needed to free up some disk space, and my disk held both Debian and Ubuntu mirrors. Most source packages are identical between the two, so hardlinking the duplicate files was a good way to save a few tens of gigabytes.
The fdupes program was the only tool I found for this that was packaged for Debian. Unfortunately, it can only list or delete duplicates, not hardlink them to each other.
I started writing a program to parse fdupes output and do the hardlinking, but it evolved into a program that did everything.
I named it dupfiles.
It has a test suite, but I don't know if it covers all cases. Probably not. It worked for me, but please be careful. Patches are most welcome.
- Git browse: http://git.liw.fi/cgi-bin/cgit/cgit.cgi/dupfiles/
- Git clone:
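To illustrate the original idea of parsing fdupes output and hardlinking the results, here is a minimal sketch. It is not the dupfiles code. It assumes the default fdupes output format (one path per line, with a blank line between sets of duplicates), and the function names and the `.hardlink-tmp` suffix are made up for this example.

```python
import os


def parse_fdupes(text):
    """Split fdupes output into groups of duplicate paths.

    Assumes the default format: one path per line, with a blank
    line separating each set of duplicates.
    """
    groups, current = [], []
    for line in text.splitlines():
        if line:
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def hardlink_group(paths):
    """Replace every path after the first with a hardlink to the first."""
    keep = paths[0]
    for dup in paths[1:]:
        if os.path.samefile(keep, dup):
            continue  # already hardlinked to each other
        # Create the link under a temporary name (hypothetical suffix),
        # then rename it over the duplicate, so the duplicate never
        # goes missing if something fails halfway.
        tmp = dup + ".hardlink-tmp"
        os.link(keep, tmp)
        os.replace(tmp, dup)
```

A sketch like this only works when all files are on the same filesystem, it breaks on filenames containing newlines, and it keeps the ownership and timestamps of the first file in each group only. Treat it with the same care as any tool that rewrites files in place.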
I wrote a script to let me run benchmarks semi-easily (it is in the source tree). I copied my laptop home directory to a server (about 140 GiB of data), and compared all four tools I know of:
- 30.8 fdupes
- 22.9 ./dupfiles
- 20.5 finddup
- 13.9 hardlink
Times are in seconds. hardlink is the clear winner in this case.
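A comparison like this can be sketched with a small timing harness. This is not the benchmark script from the source tree, just an assumed minimal version: the function names are hypothetical, and the command line for each tool has to be supplied by whoever runs it.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


def benchmark(commands):
    """Time each named command; return (seconds, name) pairs, slowest first."""
    results = [(time_command(argv), name) for name, argv in commands.items()]
    return sorted(results, reverse=True)


# Example use (the path is a placeholder; add the other tools the same way):
#
#     commands = {"fdupes": ["fdupes", "-r", "-q", "/path/to/copy"]}
#     for seconds, name in benchmark(commands):
#         print(f"{seconds:.1f} {name}")
```

Single runs like this are sensitive to what is already in the disk cache, so the order in which the tools run can affect the numbers.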
- pmatch seems very versatile