I needed to free up some disk space, and the disk held both a Debian and an Ubuntu mirror. Most source packages are identical between the two, so hardlinking the duplicate files was a good way to save a few tens of gigabytes.

The fdupes program was the only tool I found for this that was packaged for Debian. Unfortunately, it can only list or delete duplicates, not hardlink them to each other.

I started writing a program to parse fdupes output and do the hardlinking, but it evolved into a program that did everything.

I named it dupfiles.

It has a test suite, but I don't know if it covers all cases. Probably not. It worked for me, but please be careful. Patches most welcome.
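For illustration, the original plan (parse fdupes output, then hardlink each group of duplicates) might look roughly like the Python sketch below. This is not dupfiles's actual code. It assumes the default fdupes output format of one path per line with a blank line between groups, that file names contain no newlines, and that all files in a group live on the same filesystem. The function names and the temporary suffix are made up for the example.

```python
import os


def parse_fdupes(text):
    """Split fdupes output into groups of duplicate paths.

    Assumes one path per line and a blank line between groups.
    """
    groups, current = [], []
    for line in text.splitlines():
        if line:
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def hardlink_group(paths):
    """Replace every path after the first with a hard link to the first.

    The contents are trusted to be identical; nothing is re-checked here.
    """
    keep = paths[0]
    for dup in paths[1:]:
        if os.path.samefile(keep, dup):
            continue  # already the same inode, nothing to do
        # Link under a temporary name, then rename over the duplicate, so
        # the duplicate path never disappears if something fails halfway.
        tmp = dup + ".hardlink-tmp"  # hypothetical suffix for this sketch
        os.link(keep, tmp)
        os.replace(tmp, dup)
```

To use it, feed the text that fdupes prints to parse_fdupes and call hardlink_group on each group. Note that hardlinked files share ownership, permissions and timestamps, so a sketch like this is only safe where those differences don't matter.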

Benchmark results

I wrote a script to let me run benchmarks semi-easily (speed-test in the source tree). I copied my laptop's home directory to a server (about 140 GiB of data) and compared all four tools I know of:

30.8 fdupes
22.9 ./dupfiles
20.5 finddup
13.9 hardlink

Times are in seconds. hardlink is the clear winner in this case.
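A timing harness for this kind of comparison can be quite small. The sketch below is not the speed-test script from the source tree, just a rough idea of the approach: run each command once, measure wall-clock time, and list the results slowest first. The function names are made up for the example, and the command lines for each tool are left to the caller.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    # Discard the tool's output so printing it does not skew the timing.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def benchmark(commands):
    """Time each (name, argv) pair; return (seconds, name) slowest first."""
    results = [(time_command(argv), name) for name, argv in commands]
    return sorted(results, reverse=True)


def report(results):
    """Format results one per line, e.g. '30.8 fdupes'."""
    return "\n".join(f"{secs:.1f} {name}" for secs, name in results)
```

Each entry passed to benchmark is a name plus the argument list to run, so every tool can be given whatever options make it do comparable work on the same directory.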