November 8, 2018

Remote Incremental Backups

When I put my backup system together, I had a few goals in mind

This is a pretty demanding set of requirements, but thankfully there's a filesystem that handles it all pretty well.

Enter btrfs

btrfs's snapshots and balance filters are killer features for a backup server. Here's the rundown:

The RAID1 profile, which we can apply to an existing pool with a balance convert filter, lets us guarantee that every bit of data is present on two disks, letting us survive a disk failure without data loss. The RAID1 profile in btrfs is somewhat different from a traditional RAID1 implementation like you'd find in a hardware RAID controller or mdadm. Instead of mirroring every piece of data to every disk in the array, btrfs just creates two copies of every block and makes sure they're placed on two different physical disks in the pool. This has the neat advantage of always giving you half your physical storage as usable space, even when you have many disks of different sizes (so long as the largest disk in the pool is not larger than the sum of all the other disks). This makes it really easy to build the pool out of miscellaneous disks and add more later.
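The capacity rule is easy to sanity-check with a bit of arithmetic. Here's a rough sketch (the `raid1_usable` helper is my own name, and it ignores metadata overhead and chunk granularity): usable space is the smaller of half the total and the total minus the largest disk.

```shell
#!/usr/bin/env bash
# Estimate usable btrfs RAID1 space for disks of the given sizes.
# Sizes are whole numbers in any one unit (say, GB).
# Hypothetical helper: ignores metadata overhead and chunk granularity.
raid1_usable() {
    local total=0 largest=0 size
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then largest=$size; fi
    done
    local half=$((total / 2))
    local others=$((total - largest))
    # Every block needs its second copy on a different disk, so the largest
    # disk can never hold more than all the other disks combined.
    if [ "$others" -lt "$half" ]; then
        echo "$others"
    else
        echo "$half"
    fi
}

raid1_usable 4000 2000 1000 1000   # prints 4000: half of 8000
raid1_usable 8000 2000 1000        # prints 3000: limited by the smaller disks
```

The second call shows the exception from the parenthetical: an 8000 disk paired with only 3000 of other disks can mirror just 3000.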

Snapshots of btrfs subvolumes freeze their contents, so all further modifications are stored separately without overwriting the data in the snapshot. This is a great way to supply incremental backup functionality.
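As a minimal sketch of what taking one of these snapshots looks like (the `snapshot_readonly` name and the timestamp format are my own choices, not anything btrfs requires):

```shell
#!/usr/bin/env bash
# snapshot_readonly SUBVOLUME SNAPSHOT_DIR
# Takes a read-only snapshot of SUBVOLUME inside SNAPSHOT_DIR, named after
# the current time. Hypothetical helper; needs root and both paths must be
# on the same btrfs filesystem.
snapshot_readonly() {
    local subvol="$1" dest_dir="$2"
    # -r makes the snapshot read-only, so nothing can alter it later.
    # Unchanged data is shared with the source subvolume, not copied.
    btrfs subvolume snapshot -r "$subvol" "$dest_dir/$(date +%F-%H%M%S)"
}

# Example (as root, paths are placeholders):
#   snapshot_readonly /mnt/pool/data /mnt/pool/snapshots
#   btrfs subvolume list /mnt/pool     # lists the subvolume and its snapshots
```

Each snapshot only costs as much space as the data that changes after it was taken, which is what makes keeping lots of them practical.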

Though not in the original requirements, btrfs also supplies pretty good compression functionality - great for making disk space go further in a backup system.
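To make that concrete, here's a sketch of creating a fresh pool with the RAID1 profile and mounting it with compression. The `create_pool` name, the device names, and the choice of `compress=zstd` are my assumptions (zstd needs a reasonably recent kernel; `lzo` and `zlib` are the other options).

```shell
#!/usr/bin/env bash
# create_pool MOUNTPOINT DEVICE...
# Hypothetical helper. Run as root, and note that mkfs.btrfs destroys
# whatever is on the given devices.
create_pool() {
    local mountpoint="$1"
    shift
    # RAID1 for both data (-d) and metadata (-m) across all given devices.
    mkfs.btrfs -d raid1 -m raid1 "$@" || return 1
    mkdir -p "$mountpoint" || return 1
    # Mounting any one member device mounts the whole pool.
    mount -o compress=zstd "$1" "$mountpoint"
}

# Example (as root, device names are placeholders):
#   create_pool /mnt/backups /dev/sdb /dev/sdc
```

Choosing the profile at mkfs time avoids a conversion later; on an existing single-disk filesystem you'd add a device and convert with a balance instead.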

But we still need a way to actually get the data to the backup server.

Enter rsync

We don't want to send the entire contents of a system over the wire for each backup, so we need a tool that can determine what's changed since the last backup, and only send that. rsync compares the local and remote filesystems looking for modified, new, or deleted files and applies only those changes to the backup, and its delta-transfer algorithm sends just the changed parts of files that were modified. Plus, it's cross-platform (even Windows has cwRsync) and not tied to any particular filesystem.
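If you want to see what rsync thinks has changed before letting it touch anything, a dry run does the job. A small sketch (the `preview_backup` name is mine; the flags are standard rsync options):

```shell
#!/usr/bin/env bash
# preview_backup SOURCE DEST
# Lists what rsync would transfer or delete, without changing anything.
preview_backup() {
    # -n (--dry-run) makes no changes; -i (--itemize-changes) prints one
    # line per new, modified, or deleted file; --delete includes files
    # that exist in DEST but no longer exist in SOURCE.
    rsync -a -n -i --delete "$1" "$2"
}

# Example (host and paths are placeholders):
#   preview_backup /home/me/ backup-server:/srv/backup/me/
```

Files that would be removed from the destination show up as `*deleting` lines, which is worth a look before the first real run with `--delete`.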

Putting it all together

So we throw all our disks in a server, add them all to a btrfs pool, and mount it with compression enabled. We create a directory for each machine we plan to back up, and create a subvolume called "current" in each directory. Every time we want to back up, we rsync a machine to the "current" subvolume in its directory, and then create a snapshot of "current" named after the date. On clients, doing this is something along the lines of:

rsync -aP --delete / backup-server:/mnt/backups/my-machine/current/ && ssh backup-server 'btrfs subvolume snapshot -r /mnt/backups/my-machine/current/ /mnt/backups/my-machine/$(date +%F-%H%M%S)'

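Which you can break into scripts on the client or server as necessary. If you're backing up / like this, you'll probably want to add -x (--one-file-system) or a few --exclude rules so rsync stays out of /proc, /sys, and any other mounted filesystems.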
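One way to split it up, as a sketch: a one-time server-side step that creates the per-machine layout, and a client-side function wrapping the rsync and snapshot steps. The function names and argument order are my own; the paths follow the layout described above.

```shell
#!/usr/bin/env bash
# add_machine POOL NAME   (server side, once per machine, as root)
add_machine() {
    local dir="$1/$2"
    mkdir -p "$dir" || return 1
    # "current" is the subvolume rsync writes into and snapshots come from.
    btrfs subvolume create "$dir/current"
}

# backup_machine SERVER NAME SOURCE   (client side, every backup)
# Assumes NAME contains no spaces or shell metacharacters.
backup_machine() {
    local server="$1" machine="$2" source="$3"
    local dest="/mnt/backups/$machine"

    # Step 1: sync changes into the "current" subvolume on the server.
    rsync -aP --delete "$source" "$server:$dest/current/" || return 1

    # Step 2: only after a successful sync, freeze the result. The single
    # quotes around $(date ...) make it run on the server, not the client.
    ssh "$server" "btrfs subvolume snapshot -r $dest/current/ $dest/"'$(date +%F-%H%M%S)'
}

# Example:
#   server:  add_machine /mnt/backups my-machine
#   client:  backup_machine backup-server my-machine /home/me/
```

Keeping the snapshot behind the rsync exit status means a failed or interrupted transfer never gets frozen as if it were a complete backup.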

You can do this with anything and everything. An entire filesystem, a home directory, or just your documents on as many systems as you feel like backing up. Running out of space? Add any new disk to the btrfs pool and start a rebalance, and you're good to go with extra space.
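Growing the pool comes down to two commands. A sketch, with a hypothetical `grow_pool` wrapper and a placeholder device name:

```shell
#!/usr/bin/env bash
# grow_pool MOUNTPOINT DEVICE
# Adds a disk to a mounted btrfs pool and redistributes existing data.
# Hypothetical helper; run as root.
grow_pool() {
    local mountpoint="$1" device="$2"
    btrfs device add "$device" "$mountpoint" || return 1
    # A full balance rewrites every chunk, so expect it to take a while
    # on a large pool. The filesystem stays usable in the meantime.
    btrfs balance start "$mountpoint"
}

# Example (as root):
#   grow_pool /mnt/backups /dev/sdd
#   btrfs filesystem usage /mnt/backups   # check the new capacity
```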