dedup

By steve, 19 December, 2014

I recently evaluated a few Linux deduplicating filesystems, namely opendedup, lessfs and ddumbfs, as backup repository storage.

I was frustrated with opendedup because it would return "filesystem full" when there was no obvious reason why the filesystem might be full. As a result, I did not feel confident that I could monitor the system reliably enough to be alerted before the filesystem actually filled up.

Lessfs was too slow in my environment using the BerkeleyDB and TokyoCabinet backends, and I could not get the HamsterDB backend to create a clean filesystem.


By steve, 5 November, 2013

Before you begin:

As with any storage device, disk configuration is a factor, including:
• Disk speed (SSD/15k/10k/7200)
• RAID Level
• Write-back cache (Hardware RAID with BBU, Linux bcache, EMC FAST cache)
• Memory for read cache

In addition to the above, dedup appliances need RAM to hold the hash table. For SDFS the rule is:
• (volume size / chunk size) × 25 bytes. This equates to roughly 256MB per TB for a 128k chunk size, and 8GB per TB for a 4k chunk size.
• You also need CPU capacity to process the data.
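As a sanity check on the rule above, the arithmetic is easy to sketch in Python. The function name is mine, and the 25 bytes per hash-table entry is the SDFS rule as quoted above; the article's per-TB figures appear to be rounded up from these raw results, presumably for headroom:

```python
def sdfs_hash_ram_bytes(volume_bytes, chunk_bytes, bytes_per_entry=25):
    """Estimate RAM for the SDFS dedup hash table:
    (volume size / chunk size) * 25 bytes per entry."""
    return (volume_bytes // chunk_bytes) * bytes_per_entry

TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2

# 1 TB volume, 128k chunks: ~200MB raw (the rule of thumb above rounds to 256MB/TB)
print(sdfs_hash_ram_bytes(TB, 128 * 1024) / MB)  # 200.0

# 1 TB volume, 4k chunks: ~6.25GB raw (the rule of thumb above rounds to 8GB/TB)
print(sdfs_hash_ram_bytes(TB, 4 * 1024) / GB)    # 6.25
```

The takeaway is the same as in the list: shrinking the chunk size by 32× (128k to 4k) multiplies the hash-table RAM by 32×, so chunk size is the dominant knob when sizing memory.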