Post-processed deduplication on Linux using reflinks

By steve, 21 January, 2020

BTRFS and XFS support reflinks: a userspace program can deduplicate data by asking the filesystem to point a block of data within one file at an existing block of identical data. The filesystem then treats the shared block as copy-on-write, so if the data is updated through one of the files, that file gets its own copy and the other file is left unchanged.

https://strugglers.net/~andy/blog/2017/01/10/xfs-reflinks-and-deduplica…
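To make the mechanism a little more concrete, here is a minimal Python sketch of asking the kernel for a whole-file reflink through the FICLONE ioctl, which is the same request "cp --reflink" relies on. The helper name reflink_or_copy is hypothetical, and the FICLONE constant is the value from linux/fs.h on common architectures such as x86 and ARM (an assumption; a few architectures encode ioctl numbers differently). If the filesystem cannot share the data, the sketch falls back to an ordinary copy.

```python
import fcntl
import os
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: the usual ioctl encoding used on x86 and ARM.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Hypothetical helper: make dst share src's data via a reflink.

    Returns True if the kernel accepted the clone request, or False
    if we had to fall back to copying the bytes.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to point dst at src's existing blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # No reflink support here, or src and dst are on
            # different filesystems: do a normal copy instead.
            shutil.copyfileobj(fsrc, fdst)
            return False
```

On BTRFS, or on XFS created with reflink support, this should return True and dst will take up almost no extra space until one of the two files is modified.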

Files can be "copied" using "cp --reflink" to create a duplicate that shares its data with the existing file. However, the main point of this page is to document the userspace tools that perform file- and block-level deduplication after the data has already been written:

Duperemove will scan files and directories at the block level for duplicate blocks and link them together.
https://github.com/markfasheh/duperemove/wiki
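As a rough illustration of what block-level scanning means, the Python sketch below hashes files in fixed-size blocks and reports blocks that occur more than once. It is not Duperemove's actual code: the function name find_duplicate_blocks, the use of SHA-256, and the 128 KiB default block size are all assumptions made for the example. It only finds candidates and never asks the filesystem to link anything.

```python
import hashlib
from collections import defaultdict


def find_duplicate_blocks(paths, block_size=128 * 1024):
    """Hypothetical helper: map block digest -> [(path, offset), ...]
    for every full-size block that appears more than once."""
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # Only full blocks are treated as candidates; a short
                # tail at the end of a file is skipped.
                if len(block) == block_size:
                    digest = hashlib.sha256(block).digest()
                    seen[digest].append((path, offset))
                offset += len(block)
    # Keep only digests seen at more than one location.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

A real tool would then hand each group of matching locations to the kernel, which compares the data itself before sharing the blocks.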

Bedup will scan a BTRFS filesystem for any common files and link them together. It is specific to BTRFS, but has the advantage that after the initial run it runs quickly, because it looks at the BTRFS generation and only scans data that has changed since the last scan.
https://github.com/g2p/bedup
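The idea of only rescanning what has changed can be sketched in a few lines of Python. Note that this is a stand-in and not how Bedup works internally: it compares file modification times against the time of the previous scan instead of using the BTRFS generation, and the function name files_changed_since is hypothetical.

```python
import os


def files_changed_since(root, last_scan_time):
    """Hypothetical helper: list files under root modified after
    last_scan_time (seconds since the epoch).

    Assumption: mtime is used here in place of the BTRFS generation.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime > last_scan_time:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return changed
```

Only the files returned here would need to be hashed again on the next run, which is why the runs after the first one can be so much faster.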
