A simple automated backup scheme

I haven't talked about backups since more than two years, but as detailed in my previous post, I was recently most emphatically reminded that it isn't clear to most people that their data can vanish anytime. Anytime? Anytime. poof

I've detailed what I expect from a backup solution and what I've used over the years in several previous posts (see one, two, and three) and do not need to repeat that here. My current backup scheme is based on borg with support from rsync and ownCloud. The following chart (which looks way more complicated than it is) visualizes this scheme and is explained below.

../images/backup.svg.png

My home installation is on the left, separated from my office setup on the right by the dashed line in the center. The desktops (top) are equipped with an SSD containing the system and home partitions as well as an HDD for archiving, backup, and caching purposes. Active projects are stored in a dedicated ownCloud folder and synchronized between all clients, including my notebooks. The synchronization is done via the ownCloud server (the light gray box at the lower right) and is denoted by (1) in the chart. The server keeps a number of previous versions of each file, and in case a file is deleted, the latest version is kept, providing a kind of poor man's backup. But I'm using ownCloud to sync my active projects between my machines, not for its backup features.

The actual backup from the SSD to the HDD is denoted by (2) or by (2') for my notebooks (to the same HDD, but over WiFi), and is done by borg. The backup is then transferred to a NAS (the gray boxes at the bottom) by simple rsync scripts indicated by (3). All of that is orchestrated with the help of a few crontab entries – here's the one for my office system as an example:

$ crontab -l
15  7-21        *   *   *       $HOME/bin/backup.borg
30  21          *   *   *       source $HOME/.keychain/${HOSTNAME}-sh;  $HOME/bin/sync.backup
30  23          *   *   *       $HOME/bin/sync.vms
30  02          *   *   *       $HOME/bin/sync.archive

The first entry invokes my borg backup script (see below) every hour between 7:15 and 21:15 (I'm highly unlikely to work outside of this time interval). The second entry (see also below) takes care of transferring the entire backup folder on the HDD 15 min after the last backup to the NAS. Since my rsync script invokes ssh for transport, I use keychain to inform cron about the current values of the environment variables SSH_AUTH_SOCK and SSH_AGENT_PID. The third entry induces the transfer of any changed virtual machine to the internal HDD. And finally, the fourth entry syncs the archive on the internal HDD to an external one (3'). I do that since once a project is finished, the corresponding folder is moved out of the ownCloud folder to the archive, effectively taking it out from the daily backup. This way, the size of my ownCloud folder never increased beyond 3.5 GB over the past five years. Since the projects typically don't change anymore once they are in the archive, this step effectively just creates a copy of the archive folder.

What's not shown in the chart above: there's a final backup level involving the NAS. At home I do that manually (and thus much too infrequently) by rsyncing the NAS content to the HDDs I've collected over the years. ` ;) At the office, the NAS is backed up to tape every night automatically. The tapes are part of a tape library located in a different building (unfortunately, too close to survive a nuclear strike ;) ), and are kept ten years as legally required.

What does all of that mean when I prepare an important document such as a publication or a patent? Well, let's count. It doesn't matter where I start working: since the document is in my ownCloud folder, it is soon present on five different disks (two desktops and notebooks, and the ownCloud server). The backup and sync adds four more disks, and the final backup of the NAS results in two more copies (one on disk, one on tape). Altogether, within one day, my important document is automatically duplicated to ten different storage media (disks or tape) in three different locations. And when I continue working on this document the next days, my borg configuration (see below) keeps previous copies up to six month in the past (see cron mail below).

You're probably thinking that I'm a complete paranoid. 10 different storage media in 3 different locations! Crazy! Well, the way I do it, I get this kind of redundancy and the associated peace of mind for free. See for yourself:

(1) ownCloud

My employee runs an ownCloud server. I just need to install the client on all of my desktops and notebooks. If you are not as lucky: there are very affordable ownCloud or nextCloud (recommended) servers available in the interwebs.

(2) backup.borg

A simple shell script:

#!/bin/bash

#`https://github.com/borgbackup/borg <https://github.com/borgbackup/borg>`_
#`https://borgbackup.readthedocs.io/en/stable/index.html <https://borgbackup.readthedocs.io/en/stable/index.html>`_

ionice -c3 -p$$

repository="/bam/backup/attic" # directory backing up to
excludelist="/home/ob/bin/exclude_from_attic.txt"
hostname=$(echo $HOSTNAME)

notify-send "Starting backup"

          borg create --info --stats --compression lz4                  \
          $repository::$hostname-`date +%Y-%m-%d--%H:%M:%S`             \
          /home/ob                                                      \
          --exclude-from $excludelist                                   \
          --exclude-caches

notify-send "Backup complete"

          borg prune --info $repository --keep-within=1d --keep-daily=7 --keep-weekly=4 --keep-monthly=6

          borg list $repository

(3) sync.backup

A simple shell script:

#!/bin/bash

# `http://everythinglinux.org/rsync/ <http://everythinglinux.org/rsync/>`_
# `http://troy.jdmz.net/rsync/index.html <http://troy.jdmz.net/rsync/index.html>`_

ionice -c3 -p$$

RHOST=nas4711

BUSPATH=/bam/backup/attic
BUDPATH=/home/users/brandt/backup

nice -n +10 rsync -az -e 'ssh -l brandt' --stats --delete $BUSPATH $RHOST:$BUDPATH

The two other scripts in the crontab listing above are entirely analogous to the one above.

That's all. It's just these scripts and the associated crontab entries above, nothing more. And since (2) and (3) are managed by cron, I'm informed about the status of my backup every time one is performed. The list of entries you see in the mail below are the individual backups I could roll back to, or just copy individual files from after mounting the whole caboodle with 'borg mount -v /bam/backup/attic /bam/attic_mnt/' (see the screenshot below). You see how these backups are organized: hourly for the last 24h, daily for the last week, weekly for the past month, and monthly for the five months after.

From: "(Cron Daemon)" <ob@pdi282>
Subject: Cron <ob@pdi282> $HOME/bin/backup.borg

------------------------------------------------------------------------------
Archive name: pdi282-2018-12-21--14:15:01
Archive fingerprint: d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8
Time (start): Fri, 2018-12-21 14:15:01
Time (end):   Fri, 2018-12-21 14:15:21
Duration: 19.52 seconds
Number of files: 173314
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                                           Original size      Compressed size    Deduplicated size
This archive:               30.14 GB             22.17 GB              1.40 MB
All archives:              975.54 GB            715.02 GB             43.41 GB

                                           Unique chunks         Total chunks
Chunk index:                  216890              6892358
------------------------------------------------------------------------------
pdi282-2018-06-30--21:15:01          Sat, 2018-06-30 21:15:01 [113186c574898837d0fb11e6fb7b71f62b0a5422d71b627662aec0d2d6a0e0bf]
pdi282-2018-07-31--21:15:01          Tue, 2018-07-31 21:15:01 [8af0cccbab5645490fcec5e88576dad1a3fbbfd3d726a35e17851d7bec545958]
pdi282-2018-08-31--21:15:01          Fri, 2018-08-31 21:15:01 [2d763ea253d18222015d124c48826425e75b83efedeedcc11b24cf8f0d7e8899]
pdi282-2018-09-30--21:15:01          Sun, 2018-09-30 21:15:01 [39932a0d8c081bc05f9cdff54637e3962fd9e622edce8ef64160e79ae767541f]
pdi282-2018-10-31--21:15:01          Wed, 2018-10-31 21:15:02 [49386980b5270554c6c92b8397809736dea5d07c7ccb3861187a6ed5065ba7a6]
pdi282-2018-11-18--21:15:01          Sun, 2018-11-18 21:15:02 [c2eb215ce883fa5a0800a9d4b9a6c53ac82ace48151180e6a15e944dbf65e009]
pdi282-2018-11-25--21:15:01          Sun, 2018-11-25 21:15:01 [e99c2f3baed4a863b08551605eb8ebeaa5ed6a02decccdb88268c89a9b9b9cc0]
pdi282-2018-11-30--21:15:01          Fri, 2018-11-30 21:15:01 [882f6466adcbc43d7e1a12df5f38ecc9b257a436143b00711aa37e16a4dbf54d]
pdi282-2018-12-02--21:15:02          Sun, 2018-12-02 21:15:02 [7436da61af62faf21ca3f6aeb38f536ec5f1a4241e2d17c9f67271c3ba76c188]
pdi282-2018-12-09--21:15:01          Sun, 2018-12-09 21:15:02 [82e6c845601c1a12266b0b675dfeaee44cd4ab6f33dafa981f901a3e84567bbb]
pdi282-2018-12-13--21:15:01          Thu, 2018-12-13 21:15:01 [9ac3dfd4aca2e56df8927c7bc676cd476ea249f4dd2c1c39fc2a4997e0ada896]
pdi282-2018-12-14--21:15:02          Fri, 2018-12-14 21:15:02 [c8c1358f58dae6eb28bd66e9b49c7cfe237720de21214ebd99cc4b4964ec9249]
pdi282-2018-12-15--21:15:01          Sat, 2018-12-15 21:15:01 [e24d3b26dcdf81d0b0899085fb992c7a7d33d16671fba7a2c9ef1215bd3ae8fb]
pdi282-2018-12-16--21:15:01          Sun, 2018-12-16 21:15:01 [27a8a6943f1053d106ced8d40848eccbfb6c145d80d5e2a9e92f891ed98778ce]
pdi282-2018-12-17--21:15:02          Mon, 2018-12-17 21:15:02 [14118ea958387e0a606e9f627182e521b92b4e2c2dd9fb5387660b84a08971a6]
pdi282-2018-12-18--21:15:01          Tue, 2018-12-18 21:15:01 [842c3f7e301de89944d8edf7483956aff2b7cf9e15b64b327f476464825bd250]
pdi282-2018-12-19--21:15:01          Wed, 2018-12-19 21:15:01 [b7f99c56a8e6ee14559b3eddec04646c8a756515765db562c35b8fbefcd4e58e]
pdi282-2018-12-20--15:15:01          Thu, 2018-12-20 15:15:01 [e832afd41762a69cb8c5fe1c14395dde313dc4368871fd27073fdc50e9f7c6c9]
pdi282-2018-12-20--16:15:01          Thu, 2018-12-20 16:15:01 [8471ccb87d513604d31320ff91c2e0aaf0d31e5ff908ff41b8653c55ee11c1e5]
pdi282-2018-12-20--17:15:01          Thu, 2018-12-20 17:15:01 [73a3ae72815a10732fc495317a7e0f8cd9d05eb2ea862f8c01b437138ac82103]
pdi282-2018-12-20--18:15:01          Thu, 2018-12-20 18:15:01 [7eced8e18b52d00300c8f1b17e188fbfc1124dc60adf68ef2924425677615a96]
pdi282-2018-12-20--19:15:01          Thu, 2018-12-20 19:15:01 [6b7dbc4095b704209921424a52ed37d854b3a61c49cc65ac6889d215aad95a6f]
pdi282-2018-12-20--20:15:01          Thu, 2018-12-20 20:15:01 [66da0f57d6c93b149a9fdf679acf5e43fc22ce6b582db4da3ab606df741bdf82]
pdi282-2018-12-20--21:15:01          Thu, 2018-12-20 21:15:01 [1fce9aa4751be905a45ccce7fca3d44be3cf580d5e4b7c4f5091167099df57ad]
pdi282-2018-12-21--07:15:01          Fri, 2018-12-21 07:15:02 [ee551653a18d400719f9ffe1a67787326f5d5dad41be7d7b5482d5610ed86d43]
pdi282-2018-12-21--08:15:01          Fri, 2018-12-21 08:15:01 [264d7ce1dab3bc1578b521a170ee944598fa99f894d6ca273793ad14824b1689]
pdi282-2018-12-21--09:15:01          Fri, 2018-12-21 09:15:01 [b37de3616438e83c7184af57080690db3a76de521e77fd1ae6e90262f6beb1cc]
pdi282-2018-12-21--10:15:01          Fri, 2018-12-21 10:15:01 [6862d0136b2e4ac7fc0544eb74c0085e7baceca7147bd59b13cd68cbf00cb089]
pdi282-2018-12-21--11:15:01          Fri, 2018-12-21 11:15:01 [e5c6ee4ea65d6dacb34badb850353da87f9d5c19bb42e4fb3b951efecd58e64f]
pdi282-2018-12-21--12:15:01          Fri, 2018-12-21 12:15:01 [5b93f864b9422ed953c1aabb5b1b98ce9ae04fe2f584c05e91b87213082e2ff0]
pdi282-2018-12-21--13:15:01          Fri, 2018-12-21 13:15:01 [461f976422c45a7d10d38d1db097abd30a4885181ec7ea2086d05f0afd9169eb]
pdi282-2018-12-21--14:15:01          Fri, 2018-12-21 14:15:01 [d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8]

Here's a screenshot of nemo running on my notebook with an sftp connection to my office desktop, after having mounted the available backups with the command given above.

../images/backup_list.png

The age of (digital) decline

About a decade ago, AppleInsider presented an enthusiastic report on the latest innovation from the iPhone inventor:

Apple is dramatically rethinking how applications organize their documents on iPad, leaving behind the jumbled file system [...].

Outside of savvy computer users, the idea of opening a file by searching through hierarchical paths in the file system is a bit of a mystery.

Apple has already taken some steps to hide complexity in the file system in Mac OS X, [...] the iPhone similarly abstracts away the file system entirely; there is no concept of opening or saving files.

I remember reading that and being very sceptical, perhaps as one of a few, but not the only one:

Heck, my 6 year old daughter can understand the idea of saving some files to a folder with her name on it, and others to different locations.

Another lone voice in the digital wilderness:

While this might sound like some kind of user experience utopia, I have a grave concern that eliminating a file system in this manner misses a huge audience. Us.

Now, almost 10 years later, we begin to pay the price for this development. How's that? Well, my experience shows that users who grew up with iOS or Android as their prime computing environment have difficulties to grasp the basic paradigms that still dominate professional work with computers. In particular, only few young users seem to understand the concept of a file (yes, a file, but see above), file types, and file systems. Even less understood is the client-server model, a concept that is indispensable in a modern IT infrastructure.

Consequences of this erosion of knowledge range from the comical to the disastrous. As an example for the former: when I ask for original data I do not mean ASCII data embedded in an MS Word document and a photo- or micrograph embedded in a Powerpoint presentation. However, many young users do not know that a 'data.dat' or a 'photograph.tiff' are valid file formats that can be viewed and edited by suitable applications. A secretary at an associated university had the opposite tendency: she wrote invitations for seminars with Word, printed them, scanned the printout with 1200 dpi, and attached the resulting 100 MB bitmap to electronic invitations sent by e-mail.

That's funny if your e-mail account has no size limit. But even if, you may see that this development has also much less amusing consequences. On a very general level, these users are incapable to appropriately interact with professional IT infrastructures, including common desktop environments (regardless of their provenience). More specifically, users with these deficiencies should not be trusted with handling and managing important data at all. Because ... they will lose them.

At least that's what's happening here: the number of users who experience a total loss of their data increased rapidly over the past few years. In most cases, the cause was not negligence and carelessness, but an alarming level of ignorance. Often, the root cause arose simply from bypassing the infrastructure we provide, and employing the private notebook for data analysis and presentation instead of the dedicated office desktop. Now, our employees can bring their own devices if they like, but if they don't register them with our IT staff, they will be classified as guest devices that have no access to our intranet – with the rather obvious result that the data on these devices cannot be synced to the home directory of the respective user on our file server (which is part of a daily incremental backup on tape, covering every day over the last 10 years as legally required).

The users, naturally, don't find that obvious at all (although they have been informed at length about these facts). They claim to have acted in the firm believe that the data on their private notebook will be automatically backed up to “the cloud” as soon as they enter a certain “zone” around their working place. When I asked how they imagined this miracle backup would work, one of them referred to Apple commercials in which photographs were transferred from an iPhone to an iPad “magically”. “That's the state-of-the-art, right? I expected that it would be implemented here!” She also said that she imagined the mechanism to work wirelessly, but that she wouldn't care how it worked, as long as it did.

Now, when people approach me with these stories, they want (i) forgiveness and understanding and (ii) an immediate solution. Well...

../images/mccoy-doctor-not-magician.svg

Fortunately, we now have first level support consisting of an invariably cheery youth who finds these problems most entertaining. Let's see what he says in a few years from now, when pampering the “digital natives” has become the next big thing.

And let's see where we are then, with our big hopes and high flying dreams of, for example, artificial intelligence and quantum computing, autonomous electric mobility, populating the Mars, establishing controlled fusion on Earth, and controlling the world's climate. Personally, when I see the present generation of which the majority has difficulties to count to three, well, you know, I'm not all that optimistic. ;)

Living in your own private Idaho

Golem published an interview with Enigmail developer Patrick Brunschwig (the following quotes were translated by DeepL):

“Then I took a look at the different tools that existed. I didn't like Evolution that much and Mozilla was still in its infancy. Nevertheless, we used it.”

All mail clients suck. This one just sucks less. ;)

“What's really important is that PGP is integrated directly into the tools. At K-9 I have to install Openkeychain first, at Thunderbird I have to install Enigmail and then GnuPG. Just this step, to have PGP directly, without addon and without additional software installation in the tool, would simplify a lot of things.”

Well, I had PGP “directly” for the past 15 years. It's a matter of choosing the right tool for the job. ;) I never had to install GnuPG manually (in fact, it's a core package in many distributions: the package managers depend on it since the packages are PGP signed!), and both KMail and Evolution support PGP right out of the box. Mutt as well, by the way. ;) The lack of a native support is only one reason to avoid Thunderbird and its shaky PGP integration via an addon. There are others that are more basic in nature.

"Which more modern mail clients are there? There aren't that many that are still widespread. It's true that Thunderbird isn't very modern, but there are very, very few really modern mail clients. I don't know of any real alternative."

That's either ill informed or deliberately misleading. It's ok to dislike programs on a subjective basis, but one shouldn't entirely lose an objective stance. Of course there are alternatives to Thunderbird (and with native PGP support), for all operating systems. Windows: eM Client, EverDesk, and of course, The Bat! (the best E-Mail program ever, period). MacOS: Canary Mail. And Linux: KMail, Evolution.

Admittedly, the latter two are not exactly “modern”, although I'm not sure what that is supposed to mean. If “modern” refers to the interface, there's Geary, which was initially released 2012 and has recently become quite popular, having been adopted by the Gnome project and being the default mailer in ElementaryOS. Geary features a modern interface and a modern feature set: it does neither support POP3 nor PGP. Even more modern is Mailspring, the successor of Nylas Mail: it's based on Electron and thus weighs 280 MB, its interface is très chic and themable, and it requires a Customer ID to synchronize mails over the different devices modern people have. It doesn't need to bother with PGP, as there's no need for that in the modern, post-privacy world.

But I'm being sarcastic, and that's not entirely fair, since there are also developments on the other end of the spectrum. The icelandic Mailpile, for example, has privacy as its top priority, and invented an entirely new concept for that: a local webmailer. I would actually consider to use it on my notebook wouldn't the interface be the most dilettantishly designed one I have seen since Geocities.