(July 2020)
The COVID-19 situation means that, like so many other folks, I've been working remotely from home for the last 4 months. I was told to take my work laptop (a Windows 10 machine) with me and use it every day, connecting via VPN and going through my tasks.
A lot of work has been done in 4 months. And I am backing it all up in the encrypted ZFS mirror I built with my Atomic PI.
The previous sentence says a lot. Allow me to elaborate:
My 35$ backup (and Vagrant, and Docker, and many other things...) server
I have attached two external 2TB USB drives on this tiny 35$ Linux server.
I used ZFS, the undisputed king of filesystems, to arrange the drives in a "mirror" configuration.
You can think of it like so: "if while my master reads his files I detect an error, I will get that data from the other drive - and I will re-write the data in the "bad" drive where I detected the error".
ZFS can detect errors, because it uses checksums while reading; and when it detects errors, it will transparently address them by reading from the spare storage. It will also update (rewrite) the storage where the error was detected.
ZFS would do this even if I had a single USB drive attached, provided I had set the "copies=2" option when creating the filesystem; that makes sure each data block is kept in two places on the disk. That wouldn't shield me from the whole disk going dead, though; hence why I use two drives.
ZFS supports compression, with various algorithms. I have more than 1.9x compression ratio on everything I store inside it - except two ZFS filesystems where I explicitly said I don't want that (yes, you can have many filesystems inside a single ZFS pool). Why? Because the data I store there were already compressed (videos) - why waste CPU trying to compress them?
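To make the per-filesystem settings above a bit more concrete, here is a minimal sketch. The pool name "tank", the dataset names and the choice of lz4 are all assumptions for illustration, not my actual layout. By default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: pool name, dataset names and the lz4 algorithm are assumptions.
POOL="${POOL:-tank}"

# Commands are only printed by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

make_datasets() {
    # Ordinary files: let ZFS compress them.
    run zfs create -o compression=lz4 "$POOL/documents"
    # Already-compressed media: switch compression off for this dataset only.
    run zfs create -o compression=off "$POOL/videos"
    # Single-disk protection: keep two copies of every data block.
    run zfs create -o copies=2 "$POOL/important"
    # Show the compression ratio achieved by each dataset.
    run zfs get -r compressratio "$POOL"
}

make_datasets
```

Properties like these are set per filesystem, which is why one pool can mix compressed and uncompressed datasets.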
The spare storage is used by ZFS to speed up reads, too. When multiple processes read data, both drives will be used at the same time to serve them.
To protect the sensitivity of the data, my ZFS pool is built from encrypted LUKS devices. The mirror is not built on the raw devices of the two disks; it is instead built on top of LUKS-encrypted devices that are backed by them. Simply put: if someone steals the drives, he won't be able to read anything from them.
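A rough sketch of how such a LUKS-backed mirror could be put together follows. The device paths (/dev/sdX, /dev/sdY), the mapper names (crypt1, crypt2) and the pool name "tank" are placeholders I made up for the example. Since luksFormat destroys whatever is on the disk, the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: device paths, mapper names and pool name are assumptions.
POOL="${POOL:-tank}"

# Commands are only printed by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_encrypted_mirror DISK1 DISK2
build_encrypted_mirror() {
    disk1="$1"; disk2="$2"
    # Put a LUKS header on each raw disk (this wipes existing data).
    run cryptsetup luksFormat "$disk1"
    run cryptsetup luksFormat "$disk2"
    # Unlock them; the decrypted views appear under /dev/mapper/.
    run cryptsetup open "$disk1" crypt1
    run cryptsetup open "$disk2" crypt2
    # Build the ZFS mirror on the decrypted devices, not on the raw disks.
    run zpool create "$POOL" mirror /dev/mapper/crypt1 /dev/mapper/crypt2
    run zpool status "$POOL"
}

build_encrypted_mirror /dev/sdX /dev/sdY
```

With this layering ZFS never sees the raw disks, so everything it writes, metadata included, reaches the drives encrypted.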
The encryption uses the AES instructions on the Atom CPU; I don't experience any slowdown when reading data from the ZFS pool.
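If you want to check whether a given board has those instructions, one way on x86 Linux is to look for the "aes" flag in /proc/cpuinfo. A small sketch (the function name has_aes is made up for the example):

```shell
#!/bin/sh
# has_aes [FILE]: succeed if the first "flags" line of a cpuinfo-style file
# lists "aes" as a whole word. FILE defaults to /proc/cpuinfo.
has_aes() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw aes
}

if has_aes; then
    echo "AES instructions available"
else
    echo "no aes flag found"
fi
```

For actual throughput numbers on the machine, `cryptsetup benchmark` is worth a run.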
I could speak about more things; like snapshots, and optimal use of space... But I'll just stop here.
Nothing beats ZFS. Nothing even comes close.
My work laptop has a 1TB NVMe drive.
I could try to rsync files to the ZFS server while Windows runs, but that's not really optimal. For example, Windows doesn't normally allow files opened for writing to be read by your backup program; the dreaded "sharing violation" is still there in many cases. And permissions under Windows are sometimes a mystery - files fail to open for weird reasons (e.g. because the antivirus daemon decided to wake up and do something with them).
One approach covers all angles, because it lets me recover even from a complete disk failure: boot from a Linux USB stick and take a disk image of the entire drive. I can then write that image onto a new drive and get back to my pristine OS setup.
But why take a dd image, when you can get a compressed SquashFS image instead?
Why indeed.
Boot from a Linux USB stick - I use SystemRescue - and after mounting your ZFS storage remotely (via NFS, CIFS, etc - take your pick), you execute this inside it:
mkdir empty-dir
mksquashfs empty-dir squash.img -p 'nvme0n1.img f 444 root root dd if=/dev/nvme0n1 bs=1M'
rmdir empty-dir
What this does is create a SquashFS filesystem that contains a single file: nvme0n1.img. This file's contents are obtained by the dd command; basically a bit-by-bit copy of the entire NVMe drive.
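The string passed to -p follows the pattern "name f mode uid gid command": mksquashfs runs the command and stores its output as a regular file with that name. Here is a small sketch that builds the same definition for any block device. The helper names pseudo_def and image_device are invented for the example, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: helper names are made up; the -p format is the one used above.

# Commands are only printed by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# pseudo_def DEVICE: print a pseudo-file definition that stores the raw
# contents of DEVICE as "<basename>.img", mode 444, owned by root.
pseudo_def() {
    dev="$1"
    name="$(basename "$dev").img"
    printf '%s f 444 root root dd if=%s bs=1M\n' "$name" "$dev"
}

# image_device DEVICE OUTPUT: same three steps as above, for any device.
image_device() {
    dev="$1"; out="$2"
    run mkdir empty-dir
    run mksquashfs empty-dir "$out" -p "$(pseudo_def "$dev")"
    run rmdir empty-dir
}

image_device /dev/nvme0n1 squash.img
```

Note that the dry-run output drops the quotes around the -p argument; when run for real it is passed as one single argument, as it must be.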
After a few hours, I have a squash.img in my ZFS pool. Nicely checksummed and mirrored in my two USB drives. Even though the original drive was 1TB, the SquashFS image is just 400GB.
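For completeness, going the other way (image back onto a drive) could look roughly like the sketch below. I am assuming the replacement drive shows up as /dev/nvme0n1 and is at least as large as the original; the mount point /mnt/squash is also just a placeholder. This overwrites the target drive, so the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: target device and mount point are assumptions.

# Commands are only printed by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_image SQUASH_IMAGE TARGET_DEVICE [MOUNTPOINT]
restore_image() {
    image="$1"; target="$2"; mnt="${3:-/mnt/squash}"
    run mkdir -p "$mnt"
    # Mount the SquashFS image read-only to get at the disk image inside it.
    run mount -o loop,ro "$image" "$mnt"
    # Stream the decompressed disk image back onto the raw device.
    run dd if="$mnt/nvme0n1.img" of="$target" bs=1M conv=fsync status=progress
    run umount "$mnt"
}

restore_image squash.img /dev/nvme0n1
```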
Now, I can mount the NTFS partition inside it:
$ cat /tank/ESA/Laptop/mount.sh
#!/bin/bash
df | grep /iso5 || {
    echo "[-] Mounting SquashFS..."
    mount -o loop squash.img /iso5/ || exit 1
}
df | grep /iso6 || {
    echo "[-] Mounting NTFS-3G..."
    losetup -f /iso5/nvme0n1.img
    partprobe /dev/loop0
    mount -o ro -t ntfs-3g /dev/loop0p3 /iso6 || exit 1
}
After this, I get a read-only mounted copy of my entire Windows drive on /iso6. Nice.
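The script above hardcodes /dev/loop0, which only works if that happens to be the device losetup picks. A slightly more defensive sketch asks losetup which device it chose and lets it scan the partition table itself. The function names are made up for the example; the paths follow the script above.

```shell
#!/bin/sh
# Sketch: attach a disk image read-only and mount one of its partitions.

# part_path LOOPDEV N: name of partition N on a loop device,
# following the /dev/loopXpN naming used in the script above.
part_path() {
    printf '%sp%s\n' "$1" "$2"
}

# mount_image_partition IMAGE N MOUNTPOINT (needs root)
mount_image_partition() {
    image="$1"; partnum="$2"; mnt="$3"
    # --show prints the loop device that was picked; --partscan creates the
    # per-partition nodes, so no separate partprobe call is needed.
    loopdev="$(losetup --find --show --partscan --read-only "$image")" || return 1
    mount -o ro -t ntfs-3g "$(part_path "$loopdev" "$partnum")" "$mnt"
}

# Only act when called with three arguments, e.g.:
#   sh mount-part.sh /iso5/nvme0n1.img 3 /iso6
if [ "$#" -eq 3 ]; then
    mount_image_partition "$@"
fi
```

Attaching the image read-only at the loop level is an extra safety net on top of the read-only NTFS mount.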
The only way I can survive under Windows is by working inside Linux virtual machines. Which means a lot of my data exists only inside .vmdk / .vdi files.
Guess what - no problem:
modprobe nbd
qemu-nbd -r -c /dev/nbd0 /iso6/path/to/some.vdi
mount /dev/nbd0p3 /somewhere
...and when you are done:
...
umount /somewhere
qemu-nbd -d /dev/nbd0
OK, but that's a static image - how do you update it?
There are many ways. The one I use is OverlayFS. Basically, any time I want, I rsync the new state of the data to the ZFS server, into a path that is served by both the read-only mounted folder from the SquashFS snapshot, and a writable folder that stores the differences sent by rsync.
And whenever I want, I can use the OverlayFS-mounted folder, to create an updated SquashFS image - and recompress everything into a new snapshot. Just to save some space.
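A rough sketch of that update cycle follows. Every path in it is a placeholder: /iso6 as the read-only lower layer, /tank/overlay for the writable side, and user@laptop:/data as the rsync source are assumptions for the example, not my exact setup. Whether a given filesystem can serve as the OverlayFS upper layer depends on your kernel and filesystem versions, so treat this as a starting point. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: all paths and the rsync source are assumptions.

# Commands are only printed by default; set DRY_RUN=0 to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# refresh_snapshot LOWER BASE SOURCE NEW_IMAGE
#   LOWER     read-only folder coming from the SquashFS snapshot
#   BASE      writable folder that will hold upper/, work/ and merged/
#   SOURCE    where rsync pulls the current state from
#   NEW_IMAGE SquashFS file to write the recompressed result to
refresh_snapshot() {
    lower="$1"; base="$2"; src="$3"; newimg="$4"
    run mkdir -p "$base/upper" "$base/work" "$base/merged"
    # Reads fall through to the snapshot; writes land in upper/.
    run mount -t overlay overlay \
        -o "lowerdir=$lower,upperdir=$base/upper,workdir=$base/work" \
        "$base/merged"
    # Only the differences against the snapshot end up being stored.
    run rsync -a --delete "$src/" "$base/merged/"
    # Optional: fold snapshot plus differences into a fresh image.
    run mksquashfs "$base/merged" "$newimg"
}

refresh_snapshot /iso6 /tank/overlay user@laptop:/data /tank/squash-new.img
```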
So, overall - we use a compressed version of the entire drive (which in my case was 400GB instead of the full 1TB) to mount and access the entire contents, find/grep through them, do anything we want with them... and also mount the virtualized storage contained inside them, and do anything we want inside those too.
And we can update it - while knowing that behind it, there's rock solid ZFS-based mirroring. And compression. And checksums.
I challenge you, dear reader - to point me to something that provides all the functionality described above, and that only costs 2 external USB drives, and a 35$ Single Board Computer.
P.S. My periodic pool scrubs have not found any issue yet. The system is rock solid - and I use the Atomic PI for other tasks, too - including booting up Docker containers, and even Vagrant/libvirt ones. I didn't expect that last part to work, but ZRAM (memory compression) really helps.