Modern file compression

Unknown to most users, file compression silently works behind the scenes. Updates for any operating system, for example, are compressed; that happens automatically, and the user never even needs to know about it.

But sometimes, we have a choice. In Archlinux, for example, we can set the compression we'd like to use for packages created by makepkg (such as those installed from the AUR) – but how to choose between gz, bz2, xz, lrz, lzo, and z? And some backup software adds further options: Borg, for example, offers zlib, lzma, lz4, and zstd.
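In the makepkg case, the choice boils down to one or two lines in /etc/makepkg.conf. A minimal sketch (the values are examples, not a recommendation):

PKGEXT='.pkg.tar.xz'                  # the extension determines the compressor
COMPRESSXZ=(xz -c -z - --threads=0)   # flags passed to xz when the package is built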

Most surprisingly, some of these algorithms have been developed only very recently: zstd comes from Facebook (2016), and there's brotli from Google (2015) and lzfse from Apple (2015). Why do these multi-billion-dollar companies develop compression algorithms? Because of the multi-billion dollars.

Instead of testing each of these algorithms yourself, you can use lzbench. It benchmarks all open-source algorithms of the lz family on the de facto standard test set in the compression business, the Silesia corpus.

Here are three examples geared toward high compression ratio, high speed compression, and high speed decompression:

High compression ratio (<25%)

➜  lzbench -c -ebrotli,11/xz,6,9/zstd,22 silesia.tar
lzbench 1.7.3 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9814 MB/s  9852 MB/s   211947520 100.00
brotli 2017-12-12 -11    0.48 MB/s   385 MB/s    51136654  24.13
xz 5.2.3 -6              2.30 MB/s    74 MB/s    48745306  23.00
zstd 1.3.3 -22           2.30 MB/s   600 MB/s    52845025  24.93

These are single-core values. xz compression (but not decompression) profits from multithreading, while brotli and zstd do not.
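If you want to reproduce the multithreaded xz numbers, the relevant switch is -T/--threads. A sketch (thread count and file name are just examples):

xz -T0 -6 silesia.tar       # compresses on all available cores, replaces the file with silesia.tar.xz
xz -d silesia.tar.xz        # decompression runs single-threaded regardless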

High-speed compression (for compression ratios <50%)

➜  lzbench -c -elz4/lzo1x silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9861 MB/s  9768 MB/s   211947520 100.00
lz4 1.8.0                 524 MB/s  2403 MB/s   100880800  47.60
lzo1x 2.09 -12            521 MB/s   738 MB/s   103238859  48.71

High-speed decompression (> 2000 MB/s)

↪ lzbench -c -elz4/lizard,10/lzsse8,6 silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9579 MB/s 10185 MB/s   211947520 100.00
lz4 1.8.0                 525 MB/s  2421 MB/s   100880800  47.60
lizard 1.0 -10            421 MB/s  2115 MB/s   103402971  48.79
lzsse8 2016-05-14 -6     8.25 MB/s  3359 MB/s    75469717  35.61

What do we learn from these benchmarks?

  1. If we want high compression reasonably fast, nothing beats xz. It's perfect for what it's actually used for by some (all?) Linux distributions: distributing updates with acceptable computational effort over a channel with very limited bandwidth.
  2. If the distributor commands virtually unlimited resources, and compression speed is thus not an issue, brotli and zstd are clearly superior to all other choices. That's how we would like to have our updates: small and fast to decompress.
  3. If size is not of primary importance, but compression speed is, lz4 and lzo are the champions.
  4. If decompression speed is essential, lzsse8 wins. It is a lesser-known member of the lz family and not widely available, in contrast to lz4, which thus scores again.
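Translated into practice, for Borg these findings boil down to a single switch. A sketch (repository path and source directory are made up):

borg create --compression zstd,19 /path/to/repo::docs-{now} ~/documents   # small archives, fast restore
borg create --compression lz4 /path/to/repo::docs-{now} ~/documents       # fast backups, moderate size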

Weather widget

To my great dismay, Yahoo announced earlier this year that their weather API would be retired on January 3rd. And indeed, my little weather conky ceased to work on that day and thus reached its EOL. :(

That's really too bad. Common forecast sites are peppered with dozens of JavaScript snippets and take ages to open on my Mini or my Lifebook. Here's an example that I find disagreeably slow even on my desktop. Besides, I also liked my weather conky for its aesthetic merit – at least _I_ found it pleasing to look at.

Well well well. There’s no point in crying over spilled milk. What are the options? There's conkywx, of course, the script collection for true weather aficionados. I look at it every second year and always feel that it's a tad overwhelming, at least for me, both with regard to features and visuals (one, two, three).

Alternatively, I could rebuild my weather conky around a new API – perhaps even the new Yahoo API. But I don't like to register anywhere for a simple weather forecast (and certainly not at Yahoo), and I also don't want to spend more than, say, 15 minutes on setting up a new forecast.

And then I found wttr.in – the fastest weather forecast ever. It's accessible by browser or in the terminal as a curl-based service, can be configured on the fly, and its ASCII graphs breathe nerdy charm.

It's a one-liner in conky:

${execpi 3600 curl -s "wttr.in/Berlin?nT&lang=de" | head -n -2}

for a narrow (?n) and black-and-white (?T) view and

${execpi 3600 curl -s "wttr.in/Berlin?n&lang=de" | ~/.config/conky/ansito | head -n -2}

in full color thanks to ansito, which translates the ANSI color codes into their conky counterparts.

But do we actually need a desktop widget when the forecast is so readily accessible? One could simply keep an extra tab open in a browser, for example. Or one could reserve a tab for wttr.in in a Guake-style terminal. For convenience, one could define an alias (works in all major shells) that updates the weather forecast every hour:

alias wttr='watch -ct -n 3600 "curl -s wttr.in/Berlin?lang=de | head -n -2"'

Hey, I like that. :)

A simple automated backup scheme

I haven't talked about backups for more than two years, but as detailed in my previous post, I was recently most emphatically reminded that it isn't clear to most people that their data can vanish anytime. Anytime? Anytime. Poof.

I've detailed what I expect from a backup solution and what I've used over the years in several previous posts (see one, two, and three) and do not need to repeat that here. My current backup scheme is based on borg with support from rsync and ownCloud. The following chart (which looks way more complicated than it is) visualizes this scheme and is explained below.

../images/backup.svg.png

My home installation is on the left, separated from my office setup on the right by the dashed line in the center. The desktops (top) are equipped with an SSD containing the system and home partitions as well as an HDD for archiving, backup, and caching purposes. Active projects are stored in a dedicated ownCloud folder and synchronized between all clients, including my notebooks. The synchronization is done via the ownCloud server (the light gray box at the lower right) and is denoted by (1) in the chart. The server keeps a number of previous versions of each file, and in case a file is deleted, the latest version is kept, providing a kind of poor man's backup. But I'm using ownCloud to sync my active projects between my machines, not for its backup features.

The actual backup from the SSD to the HDD is denoted by (2) or by (2') for my notebooks (to the same HDD, but over WiFi), and is done by borg. The backup is then transferred to a NAS (the gray boxes at the bottom) by simple rsync scripts indicated by (3). All of that is orchestrated with the help of a few crontab entries – here's the one for my office system as an example:

$ crontab -l
15  7-21        *   *   *       $HOME/bin/backup.borg
30  21          *   *   *       source $HOME/.keychain/${HOSTNAME}-sh;  $HOME/bin/sync.backup
30  23          *   *   *       $HOME/bin/sync.vms
30  02          *   *   *       $HOME/bin/sync.archive

The first entry invokes my borg backup script (see below) every hour between 7:15 and 21:15 (I'm highly unlikely to work outside of this time interval). The second entry (see also below) takes care of transferring the entire backup folder on the HDD to the NAS 15 minutes after the last backup. Since my rsync script uses ssh for transport, I use keychain to inform cron about the current values of the environment variables SSH_AUTH_SOCK and SSH_AGENT_PID. The third entry induces the transfer of any changed virtual machine to the internal HDD. And finally, the fourth entry syncs the archive on the internal HDD to an external one (3'). I do that because once a project is finished, the corresponding folder is moved out of the ownCloud folder to the archive, effectively taking it out of the daily backup. This way, the size of my ownCloud folder has never grown beyond 3.5 GB over the past five years. Since the projects typically don't change anymore once they are in the archive, this step effectively just creates a copy of the archive folder.
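For completeness: the file sourced in the second entry is written by keychain itself; it is enough to call keychain once per login, e.g. from the shell's startup file (the key name is just an example):

eval $(keychain --eval --quiet ~/.ssh/id_rsa)   # starts (or reuses) ssh-agent and writes ~/.keychain/$HOSTNAME-sh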

What's not shown in the chart above: there's a final backup level involving the NAS. At home I do that manually (and thus much too infrequently) by rsyncing the NAS content to the HDDs I've collected over the years. ;) At the office, the NAS is backed up to tape every night automatically. The tapes are part of a tape library located in a different building (unfortunately, too close to survive a nuclear strike ;) ), and are kept for ten years as legally required.

What does all of that mean when I prepare an important document such as a publication or a patent? Well, let's count. It doesn't matter where I start working: since the document is in my ownCloud folder, it is soon present on five different disks (two desktops and notebooks, and the ownCloud server). The backup and sync add four more disks, and the final backup of the NAS results in two more copies (one on disk, one on tape). Altogether, within one day, my important document is automatically duplicated to ten different storage media (disks or tape) in three different locations. And when I continue working on this document over the next days, my borg configuration (see below) keeps previous copies reaching up to six months into the past (see the cron mail below).

You're probably thinking that I'm completely paranoid. Ten different storage media in three different locations! Crazy! Well, the way I do it, I get this kind of redundancy and the associated peace of mind for free. See for yourself:

(1) ownCloud

My employer runs an ownCloud server. I just need to install the client on all of my desktops and notebooks. If you are not as lucky: there are very affordable ownCloud or Nextcloud (recommended) servers available on the interwebs.

(2) backup.borg

A simple shell script:

#!/bin/bash

# https://github.com/borgbackup/borg
# https://borgbackup.readthedocs.io/en/stable/index.html

ionice -c3 -p$$

repository="/bam/backup/attic" # directory backing up to
excludelist="/home/ob/bin/exclude_from_attic.txt"
hostname=$HOSTNAME

notify-send "Starting backup"

borg create --info --stats --compression lz4                    \
    $repository::$hostname-`date +%Y-%m-%d--%H:%M:%S`           \
    /home/ob                                                    \
    --exclude-from $excludelist                                 \
    --exclude-caches

notify-send "Backup complete"

borg prune --info $repository --keep-within=1d --keep-daily=7 --keep-weekly=4 --keep-monthly=6

borg list $repository

(3) sync.backup

A simple shell script:

#!/bin/bash

# http://everythinglinux.org/rsync/
# http://troy.jdmz.net/rsync/index.html

ionice -c3 -p$$

RHOST=nas4711

BUSPATH=/bam/backup/attic
BUDPATH=/home/users/brandt/backup

nice -n +10 rsync -az -e 'ssh -l brandt' --stats --delete $BUSPATH $RHOST:$BUDPATH

The two other scripts in the crontab listing are entirely analogous to this one.

That's all. It's just these scripts and the associated crontab entries above, nothing more. And since (2) and (3) are managed by cron, I'm informed about the status of my backup every time one is performed. The list of entries you see in the mail below are the individual backups I could roll back to, or just copy individual files from after mounting the whole caboodle with 'borg mount -v /bam/backup/attic /bam/attic_mnt/' (see the screenshot below). You can see how these backups are organized: hourly for the last 24 hours, daily for the last week, weekly for the past month, and monthly for up to six months before that.
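Restoring a single file thus takes three commands. A sketch using the paths from above (the file name is made up):

borg mount -v /bam/backup/attic /bam/attic_mnt/                          # expose all archives as a FUSE file system
cp /bam/attic_mnt/pdi282-2018-12-21--14:15:01/home/ob/important.tex ~/   # grab the file from the desired archive
borg umount /bam/attic_mnt/                                              # detach the mount again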

From: "(Cron Daemon)" <ob@pdi282>
Subject: Cron <ob@pdi282> $HOME/bin/backup.borg

------------------------------------------------------------------------------
Archive name: pdi282-2018-12-21--14:15:01
Archive fingerprint: d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8
Time (start): Fri, 2018-12-21 14:15:01
Time (end):   Fri, 2018-12-21 14:15:21
Duration: 19.52 seconds
Number of files: 173314
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               30.14 GB             22.17 GB              1.40 MB
All archives:              975.54 GB            715.02 GB             43.41 GB

                       Unique chunks         Total chunks
Chunk index:                  216890              6892358
------------------------------------------------------------------------------
pdi282-2018-06-30--21:15:01          Sat, 2018-06-30 21:15:01 [113186c574898837d0fb11e6fb7b71f62b0a5422d71b627662aec0d2d6a0e0bf]
pdi282-2018-07-31--21:15:01          Tue, 2018-07-31 21:15:01 [8af0cccbab5645490fcec5e88576dad1a3fbbfd3d726a35e17851d7bec545958]
pdi282-2018-08-31--21:15:01          Fri, 2018-08-31 21:15:01 [2d763ea253d18222015d124c48826425e75b83efedeedcc11b24cf8f0d7e8899]
pdi282-2018-09-30--21:15:01          Sun, 2018-09-30 21:15:01 [39932a0d8c081bc05f9cdff54637e3962fd9e622edce8ef64160e79ae767541f]
pdi282-2018-10-31--21:15:01          Wed, 2018-10-31 21:15:02 [49386980b5270554c6c92b8397809736dea5d07c7ccb3861187a6ed5065ba7a6]
pdi282-2018-11-18--21:15:01          Sun, 2018-11-18 21:15:02 [c2eb215ce883fa5a0800a9d4b9a6c53ac82ace48151180e6a15e944dbf65e009]
pdi282-2018-11-25--21:15:01          Sun, 2018-11-25 21:15:01 [e99c2f3baed4a863b08551605eb8ebeaa5ed6a02decccdb88268c89a9b9b9cc0]
pdi282-2018-11-30--21:15:01          Fri, 2018-11-30 21:15:01 [882f6466adcbc43d7e1a12df5f38ecc9b257a436143b00711aa37e16a4dbf54d]
pdi282-2018-12-02--21:15:02          Sun, 2018-12-02 21:15:02 [7436da61af62faf21ca3f6aeb38f536ec5f1a4241e2d17c9f67271c3ba76c188]
pdi282-2018-12-09--21:15:01          Sun, 2018-12-09 21:15:02 [82e6c845601c1a12266b0b675dfeaee44cd4ab6f33dafa981f901a3e84567bbb]
pdi282-2018-12-13--21:15:01          Thu, 2018-12-13 21:15:01 [9ac3dfd4aca2e56df8927c7bc676cd476ea249f4dd2c1c39fc2a4997e0ada896]
pdi282-2018-12-14--21:15:02          Fri, 2018-12-14 21:15:02 [c8c1358f58dae6eb28bd66e9b49c7cfe237720de21214ebd99cc4b4964ec9249]
pdi282-2018-12-15--21:15:01          Sat, 2018-12-15 21:15:01 [e24d3b26dcdf81d0b0899085fb992c7a7d33d16671fba7a2c9ef1215bd3ae8fb]
pdi282-2018-12-16--21:15:01          Sun, 2018-12-16 21:15:01 [27a8a6943f1053d106ced8d40848eccbfb6c145d80d5e2a9e92f891ed98778ce]
pdi282-2018-12-17--21:15:02          Mon, 2018-12-17 21:15:02 [14118ea958387e0a606e9f627182e521b92b4e2c2dd9fb5387660b84a08971a6]
pdi282-2018-12-18--21:15:01          Tue, 2018-12-18 21:15:01 [842c3f7e301de89944d8edf7483956aff2b7cf9e15b64b327f476464825bd250]
pdi282-2018-12-19--21:15:01          Wed, 2018-12-19 21:15:01 [b7f99c56a8e6ee14559b3eddec04646c8a756515765db562c35b8fbefcd4e58e]
pdi282-2018-12-20--15:15:01          Thu, 2018-12-20 15:15:01 [e832afd41762a69cb8c5fe1c14395dde313dc4368871fd27073fdc50e9f7c6c9]
pdi282-2018-12-20--16:15:01          Thu, 2018-12-20 16:15:01 [8471ccb87d513604d31320ff91c2e0aaf0d31e5ff908ff41b8653c55ee11c1e5]
pdi282-2018-12-20--17:15:01          Thu, 2018-12-20 17:15:01 [73a3ae72815a10732fc495317a7e0f8cd9d05eb2ea862f8c01b437138ac82103]
pdi282-2018-12-20--18:15:01          Thu, 2018-12-20 18:15:01 [7eced8e18b52d00300c8f1b17e188fbfc1124dc60adf68ef2924425677615a96]
pdi282-2018-12-20--19:15:01          Thu, 2018-12-20 19:15:01 [6b7dbc4095b704209921424a52ed37d854b3a61c49cc65ac6889d215aad95a6f]
pdi282-2018-12-20--20:15:01          Thu, 2018-12-20 20:15:01 [66da0f57d6c93b149a9fdf679acf5e43fc22ce6b582db4da3ab606df741bdf82]
pdi282-2018-12-20--21:15:01          Thu, 2018-12-20 21:15:01 [1fce9aa4751be905a45ccce7fca3d44be3cf580d5e4b7c4f5091167099df57ad]
pdi282-2018-12-21--07:15:01          Fri, 2018-12-21 07:15:02 [ee551653a18d400719f9ffe1a67787326f5d5dad41be7d7b5482d5610ed86d43]
pdi282-2018-12-21--08:15:01          Fri, 2018-12-21 08:15:01 [264d7ce1dab3bc1578b521a170ee944598fa99f894d6ca273793ad14824b1689]
pdi282-2018-12-21--09:15:01          Fri, 2018-12-21 09:15:01 [b37de3616438e83c7184af57080690db3a76de521e77fd1ae6e90262f6beb1cc]
pdi282-2018-12-21--10:15:01          Fri, 2018-12-21 10:15:01 [6862d0136b2e4ac7fc0544eb74c0085e7baceca7147bd59b13cd68cbf00cb089]
pdi282-2018-12-21--11:15:01          Fri, 2018-12-21 11:15:01 [e5c6ee4ea65d6dacb34badb850353da87f9d5c19bb42e4fb3b951efecd58e64f]
pdi282-2018-12-21--12:15:01          Fri, 2018-12-21 12:15:01 [5b93f864b9422ed953c1aabb5b1b98ce9ae04fe2f584c05e91b87213082e2ff0]
pdi282-2018-12-21--13:15:01          Fri, 2018-12-21 13:15:01 [461f976422c45a7d10d38d1db097abd30a4885181ec7ea2086d05f0afd9169eb]
pdi282-2018-12-21--14:15:01          Fri, 2018-12-21 14:15:01 [d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8]

Here's a screenshot of nemo running on my notebook with an sftp connection to my office desktop, after having mounted the available backups with the command given above.

../images/backup_list.png

The age of (digital) decline

About a decade ago, AppleInsider presented an enthusiastic report on the latest innovation from the iPhone inventor:

Apple is dramatically rethinking how applications organize their documents on iPad, leaving behind the jumbled file system [...].

Outside of savvy computer users, the idea of opening a file by searching through hierarchical paths in the file system is a bit of a mystery.

Apple has already taken some steps to hide complexity in the file system in Mac OS X, [...] the iPhone similarly abstracts away the file system entirely; there is no concept of opening or saving files.

I remember reading that and being very sceptical, perhaps as one of a few, but not the only one:

Heck, my 6 year old daughter can understand the idea of saving some files to a folder with her name on it, and others to different locations.

Another lone voice in the digital wilderness:

While this might sound like some kind of user experience utopia, I have a grave concern that eliminating a file system in this manner misses a huge audience. Us.

Now, almost 10 years later, we begin to pay the price for this development. How's that? Well, my experience shows that users who grew up with iOS or Android as their primary computing environment have difficulties grasping the basic paradigms that still dominate professional work with computers. In particular, only a few young users seem to understand the concept of a file (yes, a file, but see above), file types, and file systems. Even less understood is the client-server model, a concept that is indispensable in a modern IT infrastructure.

Consequences of this erosion of knowledge range from the comical to the disastrous. As an example of the former: when I ask for original data, I do not mean ASCII data embedded in an MS Word document or a photo- or micrograph embedded in a Powerpoint presentation. However, many young users do not know that a 'data.dat' or a 'photograph.tiff' are valid files that can be viewed and edited with suitable applications. A secretary at an associated university had the opposite tendency: she wrote invitations for seminars in Word, printed them, scanned the printout at 1200 dpi, and attached the resulting 100 MB bitmap to electronic invitations sent by e-mail.

That's funny if your e-mail account has no size limit. But even so, you may see that this development also has much less amusing consequences. On a very general level, these users are incapable of interacting appropriately with professional IT infrastructures, including common desktop environments (regardless of their provenance). More specifically, users with these deficiencies should not be trusted with handling and managing important data at all. Because ... they will lose them.

At least that's what's happening here: the number of users who experience a total loss of their data has increased rapidly over the past few years. In most cases, the cause was not negligence or carelessness, but an alarming level of ignorance. Often, the root cause was simply bypassing the infrastructure we provide and using a private notebook for data analysis and presentation instead of the dedicated office desktop. Now, our employees can bring their own devices if they like, but if they don't register them with our IT staff, they will be classified as guest devices that have no access to our intranet – with the rather obvious result that the data on these devices cannot be synced to the home directory of the respective user on our file server (which is part of a daily incremental backup on tape, covering every day of the last 10 years as legally required).

The users, naturally, don't find that obvious at all (although they have been informed at length about these facts). They claim to have acted in the firm belief that the data on their private notebook would be automatically backed up to “the cloud” as soon as they entered a certain “zone” around their workplace. When I asked how they imagined this miracle backup would work, one of them referred to Apple commercials in which photographs were transferred from an iPhone to an iPad “magically”. “That's the state of the art, right? I expected that it would be implemented here!” She also said that she imagined the mechanism to work wirelessly, but that she wouldn't care how it worked, as long as it did.

Now, when people approach me with these stories, they want (i) forgiveness and understanding and (ii) an immediate solution. Well...

../images/mccoy-doctor-not-magician.svg

Fortunately, we now have first-level support consisting of an invariably cheery youth who finds these problems most entertaining. Let's see what he says a few years from now, when pampering the “digital natives” has become the next big thing.

And let's see where we are then, with our big hopes and high-flying dreams of, for example, artificial intelligence and quantum computing, autonomous electric mobility, populating Mars, establishing controlled fusion on Earth, and controlling the world's climate. Personally, when I see that the majority of the present generation has difficulties counting to three, well, you know, I'm not all that optimistic. ;)

Living in your own private Idaho

Golem published an interview with Enigmail developer Patrick Brunschwig (the following quotes were translated by DeepL):

“Then I took a look at the different tools that existed. I didn't like Evolution that much and Mozilla was still in its infancy. Nevertheless, we used it.”

All mail clients suck. This one just sucks less. ;)

“What's really important is that PGP is integrated directly into the tools. At K-9 I have to install Openkeychain first, at Thunderbird I have to install Enigmail and then GnuPG. Just this step, to have PGP directly, without addon and without additional software installation in the tool, would simplify a lot of things.”

Well, I've had PGP “directly” for the past 15 years. It's a matter of choosing the right tool for the job. ;) I never had to install GnuPG manually (in fact, it's a core package in many distributions: the package managers depend on it, since the packages are PGP-signed!), and both KMail and Evolution support PGP right out of the box. Mutt as well, by the way. ;) The lack of native support is only one reason to avoid Thunderbird and its shaky PGP integration via an addon. There are others that are more basic in nature.

"Which more modern mail clients are there? There aren't that many that are still widespread. It's true that Thunderbird isn't very modern, but there are very, very few really modern mail clients. I don't know of any real alternative."

That's either ill-informed or deliberately misleading. It's OK to dislike programs on a subjective basis, but one shouldn't entirely lose an objective stance. Of course there are alternatives to Thunderbird (and with native PGP support) for all operating systems. Windows: eM Client, EverDesk, and of course The Bat! (the best e-mail program ever, period). MacOS: Canary Mail. And Linux: KMail, Evolution.

Admittedly, the latter two are not exactly “modern”, although I'm not sure what that is supposed to mean. If “modern” refers to the interface, there's Geary, which was initially released in 2012 and has recently become quite popular, having been adopted by the Gnome project and being the default mailer in ElementaryOS. Geary features a modern interface and a modern feature set: it supports neither POP3 nor PGP. Even more modern is Mailspring, the successor of Nylas Mail: it's based on Electron and thus weighs 280 MB, its interface is très chic and themable, and it requires a Customer ID to synchronize mails over the different devices modern people have. It doesn't need to bother with PGP, as there's no need for that in the modern, post-privacy world.

But I'm being sarcastic, and that's not entirely fair, since there are also developments on the other end of the spectrum. The Icelandic Mailpile, for example, has privacy as its top priority, and invented an entirely new concept for that: a local webmailer. I would actually consider using it on my notebook if the interface weren't the most dilettantishly designed one I have seen since Geocities.

Yay!

Veteran users of Archlinux scoff at AUR helpers, tools that automate certain tasks when using the Arch User Repository (AUR). If you listen to them, they all install packages from the AUR in the canonical way, i.e., via 'makepkg -si'. In reality, of course, they write their own scripts for interacting with the AUR. ;)
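For reference, the canonical way amounts to nothing more than this (yay itself serves as the example package here):

git clone https://aur.archlinux.org/yay.git   # fetch the PKGBUILD from the AUR
cd yay
makepkg -si                                   # build the package and install it (plus dependencies) via pacman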

I used yaourt from the start, until it became clear that it has serious security issues. At that point I switched to pacaur, which was declared orphaned in December 2017.

Well, when looking at the table displayed on the AUR helpers page, it's clear that there are a few promising candidates for a future AUR helper on my desktop. I finally opted for yay, since I like the name and appreciate the support for all three shells I'm using. But what I like best is the system summary:

➜  ~ yay -Ps
Yay version v8.1115
===========================================
Total installed packages: 1821
Total foreign installed packages: 81
Explicitly installed packages: 510
Total Size occupied by packages: 12.1 GiB
===========================================
Ten biggest packages:
texlive-fontsextra: 1.0 GiB
texlive-core: 384.7 MiB
libreoffice-fresh: 370.1 MiB
linux-firmware: 297.6 MiB
atom: 244.3 MiB
languagetool: 229.6 MiB
nvidia-utils: 183.1 MiB
jre10-openjdk-headless: 169.6 MiB
chromium: 164.6 MiB
gravit-designer-bin: 159.8 MiB
===========================================
:: Querying AUR...
 -> Orphaned AUR Packages:  ipython-ipyparallel
 -> Out Of Date AUR Packages:  hyperspy  netselect  pacserve  scidavis  ttf-vista-fonts

The fifth biggest package in this list is atom, an editor. 244 MB for an editor? Well, there are very few applications that I'm using as much as an editor, and I'm thus constantly evaluating potentially promising newcomers in the editor/IDE category. Is atom worth the ¼ GB of disk space? You'll get the answer in the comprehensive editor shootout that I will post later this year. ;)

A one-liner to upgrade your virtualenvs

This blog is powered by Nikola, a Python-based static blog compiler, which I've installed in a Python virtualenv to separate it from the system-wide Python installation on my notebook. Updates of the major version of Python (like from 3.6 to 3.7) inevitably break these virtualenvs, and I have so far accepted that there's no other way to get them back than to rebuild them from scratch. In fact, that's what you get to hear even from experienced Linux developers.
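For context, setting up such a virtualenv is nothing fancy. A minimal sketch (the path is just an example):

virtualenv ~/.virtualenvs/nikola          # create the isolated environment
source ~/.virtualenvs/nikola/bin/activate
pip install Nikola                        # install the blog compiler into it, not system-wide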

The recent update to Python 3.7 brought that topic back to my attention, and I kind of lost my patience. I just couldn't accept that there shouldn't be a better way, and indeed found a solution for those using the venv module of Python 3.3+:

python -m venv --upgrade <path_of_existing_venv>

Despite the fact that I'm using virtualenv instead of venv, this command worked exactly as I had hoped. ☺

The virtualenv can now be updated as usual. Well, almost – both pip and pip-tools have become a lot more conservative and actually have to be told explicitly that they really should upgrade to the latest version. For a particular package, that looks like this:

pip install --upgrade Nikola --upgrade-strategy=eager

A rather weird behavior, if you ask me, but what do I know. ☺

Back to our virtualenv. To really, genuinely and truly update all requirements, the following sequence of commands is necessary:

pip install --upgrade setuptools
pip install --upgrade pip
pip install --upgrade pip-tools
pip-compile --rebuild --upgrade --output-file requirements.txt requirements.in
pip-sync requirements.txt

The update to Nikola 8.0.0 broke my old theme (based on bootstrap2), and it was about time: too many things were not working as desired. I'm now using an essentially unmodified bootblog4, as before with Kreon as the main font, and Muli for the headlines (from 15.12.2018: Oswald for the latter).

Security headers

A few simple provisions were sufficient to make this blog GDPR compliant. Even less was required to get a high rating regarding security. As a first step, I've obtained certificates from the Let's Encrypt initiative and configured Hiawatha (our web server) accordingly:

VirtualHost {
        ...
        TLScertFile = /etc/hiawatha/tls/pdes-net.org.pem
        RequireTLS = yes, 31536000; includeSubDomains; preload
        ...
}

This configuration got an A+ rating from Qualys SSL labs.

However, there's more to the security of a website than transport encryption. For example, the Content Security Policy “provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website”. There are actually a number of these security headers, and after some research I came up with the following settings:

VirtualHost {
        ...
        CustomHeader = Vary: Accept-Encoding
        CustomHeaderClient = X-Frame-Options: sameorigin
        CustomHeaderClient = X-XSS-Protection: 1; mode=block
        CustomHeaderClient = X-Content-Type-Options: nosniff
        CustomHeaderClient = X-Robots-Tag: none
        CustomHeaderClient = X-Permitted-Cross-Domain-Policies: none
        CustomHeaderClient = Referrer-Policy: same-origin
        CustomHeaderClient = Expect-CT: enforce; max-age=3600
        CustomHeaderClient = Content-Security-Policy: child-src 'self'; connect-src *; default-src 'self'; img-src 'self' data: chrome-extension-resource:; font-src 'self' data:; object-src 'self'; media-src 'self' data:; manifest-src 'self'; script-src 'self' 'nonce-...' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; base-uri 'none'
        CustomHeaderClient = Feature-Policy: geolocation 'none'; midi 'none'; notifications 'none'; push 'none'; sync-xhr 'none'; microphone 'none'; camera 'none'; magnetometer 'none'; gyroscope 'none'; speaker 'none'; vibrate 'none'; fullscreen 'self'; payment 'none';
        ...
}

The CSP proved to be tricky, since Chromium did not display images in SVG format with stricter settings. A post by April King finally provided the missing piece of the puzzle.

Mike Kuketz listed a number of online scanners that evaluate the implementation of security policies on arbitrary web sites. One of the most informative of these tools is Mozilla's Observatory, developed by the same April whose post had helped me with the CSP. ☺

../images/observatory.png

The office suites disaster

I was recently involved in the preparation of a project proposal, i.e., the attempt to obtain external funding for a certain research project. This particular project is a joint initiative of two academic institutions, and after we had agreed that we would like to collaborate, a practical question arose: how should we prepare the documents required for the proposal?

The youngest of us proposed to use Google Docs (“it's so convenient”), but this proposal received only hems and haws from all others. My colleague then remarked, with an apologetic smile, that I work exclusively with LaTeX. Our benjamin cheerfully chimed in and suggested Authorea. Once again, his proposal was met with limited enthusiasm. I gleefully added CoCalc as another LaTeX-based collaborative cloud service and briefly enjoyed the embarrassed silence that followed. After a few seconds I hastened to add that I, being foreseeably responsible for only a minor part of the proposal, would agree to anything the majority would feel comfortable with (which, in hindsight, was entirely beside the point).

In the end, we decided to use a Microsoft Sharepoint server run by our partner institution. I didn't connect to it myself, but was told by my colleague that the comments and tracking features didn't work correctly. And so we ended up with good ol' (cough) Microsoft Word as the least common denominator.

Well, I have LibreOffice (LO) on all of my systems, which should do for a simple document like the one we were about to create. To be on the safe side, I ordered a license for Softmaker Office (SMO), which is praised for being largely compatible with MS Office (MSO), and which I planned to test anyway in my function as IT strategy officer (muhaha). Besides, in the office I have Microsoft Word 2007 available in a virtual machine running Windows 7. So what could go wrong?

At the very beginning, our document contained only text with minimal formatting, and was displayed with only insignificant differences in MSO and LO. After a few iterations, the comments and tracked changes became longer than the actual text. That's when the problems started.

We were in the middle of an intense discussion when I noticed that, from my point of view, it just didn't seem to make any sense. My coworkers were discussing item (1) of an enumerated list, but their arguments revolved around an entirely different subject. When the discussion moved to item (2), I suddenly understood the origin of this confusion. LO, apparently triggered by a deletion just prior to the list, had removed the first entry; my list thus only consisted of items (2)–(7), but LO enumerated them as (1)–(6). Gnarf.

Since my SMO order had not yet been approved (it's no problem to order MSO licences for €50,000, but an unknown product for €49.99 stirs up the entire administration), I decided to open and edit the proposal with MSO 2007. Only to discover that my colleagues, using MSO 2013 and 2016, talked about a paragraph that simply didn't exist in my version of the document. The excessive tracking had also broken compatibility with MSO 2007. I wasn't too surprised, but at this point I really looked forward to SMO.

When the license finally arrived, we were busy creating figures. A simple sketch created by our youngster in Powerpoint 2016 looked more like surreal modern art in LO, and not much better in MSO 2007. But in SMO, it was missing altogether. What the hell?

It turned out that the much-acclaimed SMO has serious deficiencies with vector graphics, particularly under Linux. This flaw, together with the missing formula editor, makes SMO basically unusable in a scientific environment. That's a pity, since SMO is responsive and has a flexible and intuitive user interface.

Anyway, with the deadline approaching rapidly, my colleagues started to work on the proposal over the weekend at home. It turned out that none of them had MSO at home, so they all used LO. Now everyone's printed version deviated in one way or another from the master copy on ownCloud, and we had to carefully compare the content, line by line, in the subsequent discussion.

Miserable pathetic fools! Why on earth didn't we agree at the outset of this endeavour to use LO? We could even have agreed on the exact same version to use. Our work on the proposal would have been easier and much more efficient. And instead of creating figures the hard way with Powerpoint, we could have used a full-scale vector graphics suite such as Inkscape, since recent versions of LO support the SVG format. Equations could have been handled with TeXMaths, which allows users to insert any kind of math losslessly into an LO document. Here's a slide composed exclusively of vector graphic elements:

../images/vector_graphics_in_libreoffice.svg

No, I did not suddenly become an enthusiastic user and advocate of LO. On the contrary, I still firmly believe that for the task at hand, LaTeX would have been the most appropriate and convenient solution. However, LO would obviously have been a much more rational choice than MSO.

When I said that aloud, my colleagues gave me this 'oh-my-god-now-he's-going-mad' look. Normal people accept LO for personal use, but not for professional use. Why? Well, they have this deeply internalized belief that you get what you pay for. Free software is for amateurs and nerds, but as a professional, one uses professional software! Oh wait ...

Pagespeed

After having removed all external resources from this blog, I was curious to see how well it fares in terms of performance. See for yourself:

I was pleasantly surprised, since all I did otherwise to improve performance was to add a few file types to those that are subject to gzip compression, and a few more to those that the client should cache for some time. Getting a well-performing website is actually not as terribly difficult as I previously believed (I hacked together a blog software in C, with dietlibc, libowfat, running under gatling and with a tinyldap backend.)