The Problem with Encrypted Backups
Posted: 2015-11-08
It is often said that any data you do not have backed up is data you do not care about. I agree. This adage is used to encourage people to start keeping copies of their data. However, simply making copies is not enough to make data safe. Those copies must be invulnerable to whatever catastrophe you are trying to protect your data from. Thus, when choosing a method of backing up your data, you must take into account your threat model: that is, what could go wrong, and how will your data survive it?
For example, if you are only worried about hard drive failure, RAID1 is "backup" enough. If you just want to recover from accidental file deletion or an application corrupting your files (bugs, malware, etc.), then filesystem snapshots are a reasonable solution. On the other hand, if your family photos need to survive a house fire, you need some sort of offsite backup, like giving a hard disk to a friend, or uploading to a cloud service.
My main backup solution is to synchronize my home directory between my laptop
and my home server with unison
. This works great for protecting
against several failure modes: accidental deletion, filesystem corruption, disk
failure, stolen laptop, and natural disasters at my house. I consider this to
be effectively an offsite backup because my laptop is rarely in the same
building as the server. My workstation mounts its home directory with Samba, so
it has nothing to back up. Operating system configuration is version controlled
and stored in my home directory (I currently use Ansible). The only thing left
is data on my server outside my home directory, like my web site.
I use encryption on all of my systems for various reasons. For my laptop, it is to prevent data access in case it is lost or stolen (physical security is especially hard with small self-contained machines you carry around with you). For my workstation, it is because I am not the only person with a key to my abode, so I cannot guarantee that nobody else has gained physical access to my machine. For my server, it is so I can RMA a failed disk without worrying about its contents. On all of my systems, and additional reason is to prevent someone booting a live system and installing a rootkit or keylogger. Therefore, I must use full-disk encryption and not simply use PGP with individual files.
Since I use encryption on the main copy of my data, to maintain its security I must also encrypt my backups. However, it would be a good idea to encrypt offsite backups even if my main storage was not encrypted. Data at rest is kept confidential in one of two ways: by ensuring the physical security of the computer it is stored on or by encryption. This first method is generally not possible when it comes to offsite backups (unless you, e.g., work for a large company with multiple datacenters), so that leaves you with only encryption.
Unfortunately, encrypted offsite backups present a few problems:
- They are long-lived due to the effort required to make them (slow uploads for cloud services, or having to carry a physical disk to a secure location)
- They are rarely used, since all but the most catastrophic failures can be recovered from locally
- They must be encrypted, since you cannot guarantee physical security
- The password must be unusually strong, since any attacker will have a long time to mount a brute-force attack (because, from point 3, they could just image your disk, which they will have in possesion for a while due to point 1, and you won't notice because of point 2) The combination of points 2 and 4 above mean you have to create a strong password that you will rarely use. This sounds like the perfect recipe for a password you will forget.
And here we get to the personal application--the motivation for the story. I had a blog. It had several posts. In a way, I still have the blog. It is stored on a laptop hard disk in my desk. However, that disk is encrypted with dm-crypt and a rather long passphrase. It is an offsite backup made August 2014. I have tried brute forcing anything I thought the password might be, but I have not been successful. You see, at one point, I was shuffling data around on my disks and ran out of space. I had the idea that "well, everything is backed up anyway, so I can just delete it now and restore it later when I need it." Then I forgot for a year, and by the time I needed the data again, I could not access it. So all of my previous website content is lost, probably forever, to the mathematical void. I keep the disk in hope that one day I will remember the passphrase, but I realize my chances are slim.
What did I learn from this? That an untested backup is no backup at all! Test your backups!