Quality journalism, the second

Most of the links in this post lead to German pages – use DeepL if needed. ;) The quotes below have been translated.

The decline of newspapers and magazines does not spare computer magazines – quite the contrary. The c't has so far been comparatively little affected, but that is solely thanks to its loyal subscribers. Yet here, too, things have been crumbling slowly but steadily for years. I have subscribed to the c't for more than 20 years and would hate to miss it as bathroom reading. Over the past few years, however, errors of a kind that thoroughly spoil the fun have been piling up. If I have to question every statement, it's easier to do the research myself. And for pure entertainment, I can just as well read Fix und Foxi.

With the latest issue I finally lost my patience and let myself be carried away into sending a letter to the editor:

...

It is certainly a laudable goal to introduce people to Lua or Python programming, but I expect at least a mention that there is a much simpler way (if there is one).

c't 7/2019, p. 158: “Despite this flexibility, you eventually hit limits: out of the box, the tool cannot query and display the current weather from wttr.in. This and other functions can, however, easily be retrofitted with hand-knitted Lua scripts.”

Whatever “out of the box” is supposed to mean here, you don't need Lua scripts for that.

Hourly weather query:

${execpi 3600 curl -s "wttr.in/Berlin?nT&lang=de" | head -n -2}

Hourly weather query, in color. ;)

${execpi 3600 curl -s "wttr.in/Berlin?n&lang=de" | ~/.config/conky/ansito | head -n -2}

Ansito: https://github.com/pawamoy/ansito

And quite similarly in

c't 5/2019, p. 42: “On a machine with a Core i5 and an SSD, the grep command searches the file in a little over a minute. However, it does not exploit the fact that the file is already sorted by hash. A sorted list can be searched much faster with a binary search. A self-programmed binary search in Python takes only a few lines of code. We therefore quickly developed a script that combs through the masses of data in record time.”

Also very nice, but there is not a single word about the fact that under Linux it can be done much more simply and about four times faster:

$ look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt

For heaven's sake, that's not so hard. One sentence pointing that out is not too much to ask. Or is it?

...

These impressions are, of course, accompanied by the development of heise online (a medium that is, in principle, editorially independent of the c't), which I, like so many long-time c't subscribers, regard as my online home port. Recently, an article by an author from the gender-feminist SJW corner was published there, distinguished above all by the complete absence of any competence whatsoever. One quote:

While developers are always striving to enter the program code as precisely as possible and not make any typos, social scientists are trained to recognize the "big picture", to survey the systemic interrelations in the world.

The more than 5000 comments left no doubt that Heise has thereby managed to thoroughly annoy its core clientele. :)

A few days later, this e-mail from the “new online service heise+” reached me:

Sehr geehrter Herr Brandt, es freut uns sehr, dass Sie als 7 Leser unseren Qualitäts-Journalismus unterstützen. (“Dear Mr. Brandt, we are delighted that you, as a 7 reader, support our quality journalism.”)

I did not ask what “7 Leser” (“7 reader”) is supposed to mean. Placing such a blatant error in a mail that goes out to an estimated half a million people, in the same breath as the self-proclaimed hallmark of quality journalism (“Qualitäts-Journalismus”, only genuine with the superfluous hyphen), is quite audacious. Will they ever notice why people no longer buy their publications?

The fastest search, or: find your password

A couple of weeks ago, a monstrous set of login data was passed around on the dark web. Now known as collections #1 – #5, the files contain an estimated 2.2 billion unique email address/password combinations. And that means cracked passwords, mind you, not just hashes.

Some of my account data were leaked ages ago by Adobe and Dropbox (at that time, I still believed in simple – 8-character – passwords for test accounts). Incidentally, I had already deleted these accounts when the breaches became public.

Currently, all of my accounts are secured by random passwords of at least 25 characters with an actual entropy of no less than 90 bits, which is essentially impossible to brute-force even from unsalted SHA1 hashes. Hence, none of my accounts should show up in the collections mentioned above. If one did, it would show that the password had been stored in plain text, and I would react accordingly by deleting the account.
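
A quick back-of-the-envelope check of that figure (a sketch, assuming the characters are drawn uniformly from the 94 printable ASCII characters): a 25-character random password then carries 25 × log2(94) ≈ 164 bits, comfortably above the 90-bit mark.

$ awk 'BEGIN { printf "%.1f bits\n", 25 * log(94) / log(2) }'
163.9 bits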

The most comprehensive list of password hashes (including collection #1) has been assembled by Troy Hunt. We can download and search this list very easily as shown in the following.

$ cd /hdd/Downloads/pwned
$ wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v4.7z
$ cp pwned-passwords-sha1-ordered-by-hash-v4.7z /ssd/temp/pwned
$ cd /ssd/temp/pwned/
$ unp pwned-passwords-sha1-ordered-by-hash-v4.7z
$ wc -l <pwned-passwords-sha1-ordered-by-hash-v4.txt
551509767
$ awk -F: '{ SUM += $2 } END { print SUM }' pwned-passwords-sha1-ordered-by-hash-v4.txt
3344070078

551 million passwords and 3.344 billion accounts: holy cow. What's the fastest way to search for a single string in such a huge file? c't 5/2019 actually has an entire article on this issue, complete with a GitHub repository.

Pina Merkert points out that grep, the standard tool for searching text files under Linux, does not exploit the fact that Troy Hunt also offers the download in a version where the passwords are ordered by hash (that's the version I've downloaded above). The linear search performed by grep is indeed rather slow:

$ time echo -n "111111" | sha1sum | awk '{print toupper($1)}' | grep -f - pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m48.227s
user    0m30.939s
sys     0m5.752s

Ordering the data allows a binary search, which is potentially orders of magnitude faster – O(log n) vs. O(n) – than a linear search. Pina's Python script is indeed dramatically faster:

$ time ./binary_search.py '111111'
Searching for hash 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D of password "111111".
Password found at byte  5824044773: "3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220"
Your password "111111" was in 3093220 leaks or hacked databases! Please change it immediately.

real    0m0.042s
user    0m0.035s
sys     0m0.007s

But my one-liner is four times faster than her 57-line script: ;)

$ time look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m0.011s
user    0m0.009s
sys     0m0.005s

'look', by the way, is a binary search tool that is part of the util-linux package and thus present on any Linux installation (and, I guess, also on the Windows Subsystem for Linux).

With the help of 'look', one can also very easily go through lists of passwords – let's take these completely arbitrary examples:

$ less topten.dat

123456
123456789
qwerty
password
111111
12345678
abc123
1234567
password1
12345

$ time while read -r password; do look $(echo -n "$password" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt; done <topten.dat

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079
6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
8CB2237D0679CA88DB6464EAC60DA96345513964:2333232

real    0m0.053s
user    0m0.048s
sys     0m0.023s

If you ever had a shred of doubt that humanity has a glorious future, look at these 10 ingenious passwords, which secure the 54 million accounts of our most brilliant minds!

Modern file compression

Unknown to most users, file compression silently works behind the scenes. Updates for any operating system, for example, are compressed; that happens automatically, and the user doesn't even need to know about it.

But sometimes, we have a choice. In Archlinux, for example, we can set the compression we'd like to use for packages created by makepkg (such as those installed over the AUR) – but how to choose between gz, bz2, xz, lrz, lzo, and z? And some backup software adds further options: Borg, for example, offers zlib, lzma, lz4, and zstd.
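
For illustration, here's roughly where that knob sits for makepkg (a sketch of an /etc/makepkg.conf excerpt; the exact defaults vary between pacman versions). The extension in PKGEXT selects the compressor, and the COMPRESS* arrays hold the flags passed to it; borg, in turn, takes the algorithm and level on the command line, e.g. '--compression zstd,10' in recent versions.

# excerpt from /etc/makepkg.conf (values are examples, not recommendations)
PKGEXT='.pkg.tar.xz'          # or '.pkg.tar.gz', '.pkg.tar.lzo', ...
COMPRESSGZ=(gzip -c -f -n)    # flags handed to the respective compressor
COMPRESSXZ=(xz -c -z -)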

Most surprisingly, some of these algorithms have been developed only very recently: zstd comes from Facebook (2016), and there's brotli from Google (2015) and lzfse from Apple (2015). Why do these multi-billion-dollar companies develop compression algorithms? Because of the multi-billion dollars.

Instead of testing each of these algorithms yourself, you can use lzbench. It tests the open-source algorithms of the LZ family against the de facto standard test set in the compression business, the Silesia corpus.
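
In case you want to reproduce the numbers below: lzbench is built from source with a plain make, and silesia.tar is just a tarball of the Silesia corpus files (a sketch; the corpus download location is an assumption and may have moved):

$ git clone https://github.com/inikep/lzbench && cd lzbench && make
$ wget http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip     # Silesia corpus (URL may have changed)
$ mkdir silesia && unzip silesia.zip -d silesia
$ tar -cf silesia.tar -C silesia .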

Here are three examples geared toward high compression ratio, high speed compression, and high speed decompression:

High compression ratio (<25%)

➜  lzbench -c -ebrotli,11/xz,6,9/zstd,22 silesia.tar
lzbench 1.7.3 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9814 MB/s  9852 MB/s   211947520 100.00
brotli 2017-12-12 -11    0.48 MB/s   385 MB/s    51136654  24.13
xz 5.2.3 -6              2.30 MB/s    74 MB/s    48745306  23.00
zstd 1.3.3 -22           2.30 MB/s   600 MB/s    52845025  24.93

These are single-core values. xz compression (but not decompression) profits from multithreading, while brotli and zstd do not.
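
If xz compression time matters, recent xz versions (5.2 and later) can spread the work over all cores – a minimal sketch, typically at the cost of a marginally larger archive:

$ xz -T0 -6 -k silesia.tar     # -T0: use all available cores, -k: keep the input file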

High speed compression (for compression ratios <50%)

➜  lzbench -c -elz4/lzo1x silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9861 MB/s  9768 MB/s   211947520 100.00
lz4 1.8.0                 524 MB/s  2403 MB/s   100880800  47.60
lzo1x 2.09 -12            521 MB/s   738 MB/s   103238859  48.71

High speed decompression (> 2000 MB/s)

↪ lzbench -c -elz4/lizard,10/lzsse8,6 silesia.tar
Compressor name         Compress. Decompress. Compr. size  Ratio
memcpy                   9579 MB/s 10185 MB/s   211947520 100.00
lz4 1.8.0                 525 MB/s  2421 MB/s   100880800  47.60
lizard 1.0 -10            421 MB/s  2115 MB/s   103402971  48.79
lzsse8 2016-05-14 -6     8.25 MB/s  3359 MB/s    75469717  35.61

What do we learn from these benchmarks?

  1. If we want high compression reasonably fast, nothing beats xz. It's just perfect for what it's actually used for by some (all?) Linux distributions: to distribute updates with acceptable computational resources over a channel with very limited bandwidth.
  2. If the distributor commands virtually unlimited resources, and compression speed is thus not an issue, brotli and zstd are clearly superior to all other choices. That's how we would like to have our updates: small and fast to decompress.
  3. If size is not of primary importance, but compression speed is, lz4 and lzo are the champions.
  4. If decompression speed is essential, lzsse8 wins. It is a lesser-known member of the LZ family and not widely available, in contrast to lz4, which thus scores again.

Weather widget

To my great dismay, Yahoo announced earlier this year that their weather API would be retired on January 3rd. And indeed, my little weather conky ceased to work on that day and thus reached its EOL. :(

That's really too bad. Common forecast sites are peppered with dozens of JavaScript snippets and take ages to open on my Mini or my Lifebook. Here's an example that I find disagreeably slow even on my desktop. Besides, I also liked my weather conky for its aesthetic merit – at least _I_ found it pleasing to look at.

Well well well. There’s no point in crying over spilled milk. What are the options? There's conkywx, of course, the script collection for true weather aficionados. I look at it every second year and always feel that it's a tad overwhelming, at least for me, both with regard to features and visuals (one, two, three).

Alternatively, I could rebuild my weather conky around a new API – perhaps even the new Yahoo API. But I don't like to register anywhere for a simple weather forecast (and certainly not at Yahoo), and I also don't want to spend more than, say, 15 minutes on getting a new forecast.

And then I found wttr.in – the fastest weather forecast ever. It's accessible in the browser or in the terminal as a curl-based service, can be configured on the fly, and its ASCII graphics breathe nerdy charm.

It's a one-liner in conky:

${execpi 3600 curl -s "wttr.in/Berlin?nT&lang=de" | head -n -2}

for a narrow (?n) and black-and-white (?T) view and

${execpi 3600 curl -s "wttr.in/Berlin?n&lang=de" | ~/.config/conky/ansito | head -n -2}

in full color thanks to ansito, which translates the ANSI color codes to the corresponding conky ones.

But do we actually need a desktop widget when the forecast is so readily accessible? One could simply have an extra tab open in a browser, for example. Or one could have a tab reserved for wttr.in in a Guake-style terminal. For convenience, one could define an alias (this works in all major shells) that updates the weather forecast every hour:

alias wttr='watch -ct -n 3600 "curl -s wttr.in/Berlin?lang=de | head -n -2"'

Hey, I like that. :)

A simple automated backup scheme

I haven't talked about backups for more than two years, but as detailed in my previous post, I was recently most emphatically reminded that it isn't clear to most people that their data can vanish anytime. Anytime? Anytime. Poof.

I've detailed what I expect from a backup solution and what I've used over the years in several previous posts (see one, two, and three) and do not need to repeat that here. My current backup scheme is based on borg with support from rsync and ownCloud. The following chart (which looks way more complicated than it is) visualizes this scheme and is explained below.

../images/backup.svg.png

My home installation is on the left, separated from my office setup on the right by the dashed line in the center. The desktops (top) are equipped with an SSD containing the system and home partitions as well as an HDD for archiving, backup, and caching purposes. Active projects are stored in a dedicated ownCloud folder and synchronized between all clients, including my notebooks. The synchronization is done via the ownCloud server (the light gray box at the lower right) and is denoted by (1) in the chart. The server keeps a number of previous versions of each file, and in case a file is deleted, the latest version is kept, providing a kind of poor man's backup. But I'm using ownCloud to sync my active projects between my machines, not for its backup features.

The actual backup from the SSD to the HDD is denoted by (2) or by (2') for my notebooks (to the same HDD, but over WiFi), and is done by borg. The backup is then transferred to a NAS (the gray boxes at the bottom) by simple rsync scripts indicated by (3). All of that is orchestrated with the help of a few crontab entries – here's the one for my office system as an example:

$ crontab -l
15  7-21        *   *   *       $HOME/bin/backup.borg
30  21          *   *   *       source $HOME/.keychain/${HOSTNAME}-sh;  $HOME/bin/sync.backup
30  23          *   *   *       $HOME/bin/sync.vms
30  02          *   *   *       $HOME/bin/sync.archive

The first entry invokes my borg backup script (see below) every hour between 7:15 and 21:15 (I'm highly unlikely to work outside of this time interval). The second entry (see also below) transfers the entire backup folder on the HDD to the NAS 15 minutes after the last backup. Since my rsync script invokes ssh for transport, I use keychain to inform cron about the current values of the environment variables SSH_AUTH_SOCK and SSH_AGENT_PID. The third entry triggers the transfer of any changed virtual machine to the internal HDD. And finally, the fourth entry syncs the archive on the internal HDD to an external one (3'). I do that because once a project is finished, the corresponding folder is moved out of the ownCloud folder to the archive, effectively taking it out of the daily backup. This way, the size of my ownCloud folder has never grown beyond 3.5 GB over the past five years. Since the projects typically don't change anymore once they are in the archive, this step effectively just creates a copy of the archive folder.
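
For reference, the keychain side of this is a single line in the shell startup file (a sketch, assuming the key is called id_rsa): keychain starts or reuses an ssh-agent and writes its environment to ~/.keychain/${HOSTNAME}-sh, which is exactly the file sourced in the second crontab entry above.

# in ~/.bashrc or ~/.zshrc
eval $(keychain --eval --quiet id_rsa)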

What's not shown in the chart above: there's a final backup level involving the NAS. At home I do that manually (and thus much too infrequently) by rsyncing the NAS content to the HDDs I've collected over the years. ;) At the office, the NAS is backed up to tape every night automatically. The tapes are part of a tape library located in a different building (unfortunately, too close to survive a nuclear strike ;) ), and are kept for ten years as legally required.

What does all of that mean when I prepare an important document such as a publication or a patent? Well, let's count. It doesn't matter where I start working: since the document is in my ownCloud folder, it is soon present on five different disks (two desktops, two notebooks, and the ownCloud server). The backup and sync add four more disks, and the final backup of the NAS results in two more copies (one on disk, one on tape). Altogether, within one day, my important document is automatically duplicated to ten different storage media (disks or tape) in three different locations. And when I continue working on this document over the next days, my borg configuration (see below) keeps previous copies reaching up to six months into the past (see the cron mail below).

You're probably thinking that I'm completely paranoid. Ten different storage media in three different locations! Crazy! Well, the way I do it, I get this kind of redundancy and the associated peace of mind for free. See for yourself:

(1) ownCloud

My employer runs an ownCloud server. I just need to install the client on all of my desktops and notebooks. If you are not as lucky: there are very affordable ownCloud or Nextcloud (recommended) providers available on the interwebs.
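
Getting a machine into this scheme amounts to installing the sync client (a sketch, assuming Archlinux; package names differ on other distributions):

$ sudo pacman -S owncloud-client     # or nextcloud-client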

(2) backup.borg

A simple shell script:

#!/bin/bash

# https://github.com/borgbackup/borg
# https://borgbackup.readthedocs.io/en/stable/index.html

ionice -c3 -p$$

repository="/bam/backup/attic" # directory backing up to
excludelist="/home/ob/bin/exclude_from_attic.txt"
hostname=$(echo $HOSTNAME)

notify-send "Starting backup"

borg create --info --stats --compression lz4              \
     $repository::$hostname-$(date +%Y-%m-%d--%H:%M:%S)   \
     /home/ob                                             \
     --exclude-from $excludelist                          \
     --exclude-caches

notify-send "Backup complete"

borg prune --info $repository --keep-within=1d --keep-daily=7 --keep-weekly=4 --keep-monthly=6

borg list $repository

(3) sync.backup

A simple shell script:

#!/bin/bash

# http://everythinglinux.org/rsync/
# http://troy.jdmz.net/rsync/index.html

ionice -c3 -p$$

RHOST=nas4711

BUSPATH=/bam/backup/attic
BUDPATH=/home/users/brandt/backup

nice -n +10 rsync -az -e 'ssh -l brandt' --stats --delete $BUSPATH $RHOST:$BUDPATH

The two other scripts in the crontab listing above are entirely analogous to this one.
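
For completeness, here's a sketch of what such an analogous script might look like for the archive sync (3') – the paths are hypothetical placeholders, not my actual ones:

#!/bin/bash

# sync.archive -- mirror the archive on the internal HDD to an external disk
# (hypothetical paths, analogous to sync.backup above)

ionice -c3 -p$$

SRCPATH=/bam/archive
DSTPATH=/run/media/ob/external/archive

nice -n +10 rsync -a --stats --delete $SRCPATH/ $DSTPATH/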

That's all. It's just these scripts and the associated crontab entries above, nothing more. And since (2) and (3) are managed by cron, I'm informed about the status of my backup every time one is performed. The entries you see in the mail below are the individual backups I can roll back to, or just copy individual files from after mounting the whole caboodle with 'borg mount -v /bam/backup/attic /bam/attic_mnt/' (see the screenshot below). You can see how these backups are organized: hourly for the last 24 h, daily for the last week, weekly for the past month, and monthly for the months before that.

From: "(Cron Daemon)" <ob@pdi282>
Subject: Cron <ob@pdi282> $HOME/bin/backup.borg

------------------------------------------------------------------------------
Archive name: pdi282-2018-12-21--14:15:01
Archive fingerprint: d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8
Time (start): Fri, 2018-12-21 14:15:01
Time (end):   Fri, 2018-12-21 14:15:21
Duration: 19.52 seconds
Number of files: 173314
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                                           Original size      Compressed size    Deduplicated size
This archive:               30.14 GB             22.17 GB              1.40 MB
All archives:              975.54 GB            715.02 GB             43.41 GB

                                           Unique chunks         Total chunks
Chunk index:                  216890              6892358
------------------------------------------------------------------------------
pdi282-2018-06-30--21:15:01          Sat, 2018-06-30 21:15:01 [113186c574898837d0fb11e6fb7b71f62b0a5422d71b627662aec0d2d6a0e0bf]
pdi282-2018-07-31--21:15:01          Tue, 2018-07-31 21:15:01 [8af0cccbab5645490fcec5e88576dad1a3fbbfd3d726a35e17851d7bec545958]
pdi282-2018-08-31--21:15:01          Fri, 2018-08-31 21:15:01 [2d763ea253d18222015d124c48826425e75b83efedeedcc11b24cf8f0d7e8899]
pdi282-2018-09-30--21:15:01          Sun, 2018-09-30 21:15:01 [39932a0d8c081bc05f9cdff54637e3962fd9e622edce8ef64160e79ae767541f]
pdi282-2018-10-31--21:15:01          Wed, 2018-10-31 21:15:02 [49386980b5270554c6c92b8397809736dea5d07c7ccb3861187a6ed5065ba7a6]
pdi282-2018-11-18--21:15:01          Sun, 2018-11-18 21:15:02 [c2eb215ce883fa5a0800a9d4b9a6c53ac82ace48151180e6a15e944dbf65e009]
pdi282-2018-11-25--21:15:01          Sun, 2018-11-25 21:15:01 [e99c2f3baed4a863b08551605eb8ebeaa5ed6a02decccdb88268c89a9b9b9cc0]
pdi282-2018-11-30--21:15:01          Fri, 2018-11-30 21:15:01 [882f6466adcbc43d7e1a12df5f38ecc9b257a436143b00711aa37e16a4dbf54d]
pdi282-2018-12-02--21:15:02          Sun, 2018-12-02 21:15:02 [7436da61af62faf21ca3f6aeb38f536ec5f1a4241e2d17c9f67271c3ba76c188]
pdi282-2018-12-09--21:15:01          Sun, 2018-12-09 21:15:02 [82e6c845601c1a12266b0b675dfeaee44cd4ab6f33dafa981f901a3e84567bbb]
pdi282-2018-12-13--21:15:01          Thu, 2018-12-13 21:15:01 [9ac3dfd4aca2e56df8927c7bc676cd476ea249f4dd2c1c39fc2a4997e0ada896]
pdi282-2018-12-14--21:15:02          Fri, 2018-12-14 21:15:02 [c8c1358f58dae6eb28bd66e9b49c7cfe237720de21214ebd99cc4b4964ec9249]
pdi282-2018-12-15--21:15:01          Sat, 2018-12-15 21:15:01 [e24d3b26dcdf81d0b0899085fb992c7a7d33d16671fba7a2c9ef1215bd3ae8fb]
pdi282-2018-12-16--21:15:01          Sun, 2018-12-16 21:15:01 [27a8a6943f1053d106ced8d40848eccbfb6c145d80d5e2a9e92f891ed98778ce]
pdi282-2018-12-17--21:15:02          Mon, 2018-12-17 21:15:02 [14118ea958387e0a606e9f627182e521b92b4e2c2dd9fb5387660b84a08971a6]
pdi282-2018-12-18--21:15:01          Tue, 2018-12-18 21:15:01 [842c3f7e301de89944d8edf7483956aff2b7cf9e15b64b327f476464825bd250]
pdi282-2018-12-19--21:15:01          Wed, 2018-12-19 21:15:01 [b7f99c56a8e6ee14559b3eddec04646c8a756515765db562c35b8fbefcd4e58e]
pdi282-2018-12-20--15:15:01          Thu, 2018-12-20 15:15:01 [e832afd41762a69cb8c5fe1c14395dde313dc4368871fd27073fdc50e9f7c6c9]
pdi282-2018-12-20--16:15:01          Thu, 2018-12-20 16:15:01 [8471ccb87d513604d31320ff91c2e0aaf0d31e5ff908ff41b8653c55ee11c1e5]
pdi282-2018-12-20--17:15:01          Thu, 2018-12-20 17:15:01 [73a3ae72815a10732fc495317a7e0f8cd9d05eb2ea862f8c01b437138ac82103]
pdi282-2018-12-20--18:15:01          Thu, 2018-12-20 18:15:01 [7eced8e18b52d00300c8f1b17e188fbfc1124dc60adf68ef2924425677615a96]
pdi282-2018-12-20--19:15:01          Thu, 2018-12-20 19:15:01 [6b7dbc4095b704209921424a52ed37d854b3a61c49cc65ac6889d215aad95a6f]
pdi282-2018-12-20--20:15:01          Thu, 2018-12-20 20:15:01 [66da0f57d6c93b149a9fdf679acf5e43fc22ce6b582db4da3ab606df741bdf82]
pdi282-2018-12-20--21:15:01          Thu, 2018-12-20 21:15:01 [1fce9aa4751be905a45ccce7fca3d44be3cf580d5e4b7c4f5091167099df57ad]
pdi282-2018-12-21--07:15:01          Fri, 2018-12-21 07:15:02 [ee551653a18d400719f9ffe1a67787326f5d5dad41be7d7b5482d5610ed86d43]
pdi282-2018-12-21--08:15:01          Fri, 2018-12-21 08:15:01 [264d7ce1dab3bc1578b521a170ee944598fa99f894d6ca273793ad14824b1689]
pdi282-2018-12-21--09:15:01          Fri, 2018-12-21 09:15:01 [b37de3616438e83c7184af57080690db3a76de521e77fd1ae6e90262f6beb1cc]
pdi282-2018-12-21--10:15:01          Fri, 2018-12-21 10:15:01 [6862d0136b2e4ac7fc0544eb74c0085e7baceca7147bd59b13cd68cbf00cb089]
pdi282-2018-12-21--11:15:01          Fri, 2018-12-21 11:15:01 [e5c6ee4ea65d6dacb34badb850353da87f9d5c19bb42e4fb3b951efecd58e64f]
pdi282-2018-12-21--12:15:01          Fri, 2018-12-21 12:15:01 [5b93f864b9422ed953c1aabb5b1b98ce9ae04fe2f584c05e91b87213082e2ff0]
pdi282-2018-12-21--13:15:01          Fri, 2018-12-21 13:15:01 [461f976422c45a7d10d38d1db097abd30a4885181ec7ea2086d05f0afd9169eb]
pdi282-2018-12-21--14:15:01          Fri, 2018-12-21 14:15:01 [d31db1cd8223ca084cc367deb62e440bfe2dfe4fd163aefc6b6294935f1877b8]

Here's a screenshot of nemo running on my notebook with an sftp connection to my office desktop, after having mounted the available backups with the command given above.

../images/backup_list.png
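
Restoring a single file from one of these archives then boils down to a plain copy from the FUSE mount (a sketch; the file path is a placeholder):

$ borg mount -v /bam/backup/attic /bam/attic_mnt/
$ cp /bam/attic_mnt/pdi282-2018-12-21--14:15:01/home/ob/some/file ~/some/file
$ borg umount /bam/attic_mnt/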

The age of (digital) decline

About a decade ago, AppleInsider presented an enthusiastic report on the latest innovation from the iPhone inventor:

Apple is dramatically rethinking how applications organize their documents on iPad, leaving behind the jumbled file system [...].

Outside of savvy computer users, the idea of opening a file by searching through hierarchical paths in the file system is a bit of a mystery.

Apple has already taken some steps to hide complexity in the file system in Mac OS X, [...] the iPhone similarly abstracts away the file system entirely; there is no concept of opening or saving files.

I remember reading that and being very sceptical, perhaps as one of only a few, but not the only one:

Heck, my 6 year old daughter can understand the idea of saving some files to a folder with her name on it, and others to different locations.

Another lone voice in the digital wilderness:

While this might sound like some kind of user experience utopia, I have a grave concern that eliminating a file system in this manner misses a huge audience. Us.

Now, almost 10 years later, we begin to pay the price for this development. How's that? Well, my experience shows that users who grew up with iOS or Android as their primary computing environment have difficulty grasping the basic paradigms that still dominate professional work with computers. In particular, only a few young users seem to understand the concept of a file (yes, a file, but see above), file types, and file systems. Even less understood is the client-server model, a concept that is indispensable in a modern IT infrastructure.

Consequences of this erosion of knowledge range from the comical to the disastrous. As an example of the former: when I ask for original data, I do not mean ASCII data embedded in an MS Word document or a photo or micrograph embedded in a PowerPoint presentation. However, many young users do not know that a 'data.dat' or a 'photograph.tiff' are valid files that can be viewed and edited with suitable applications. A secretary at an associated university had the opposite tendency: she wrote invitations for seminars in Word, printed them, scanned the printout at 1200 dpi, and attached the resulting 100 MB bitmap to electronic invitations sent by e-mail.

That's funny if your e-mail account has no size limit. But even if it does, you may see that this development also has much less amusing consequences. On a very general level, these users are incapable of interacting appropriately with professional IT infrastructures, including common desktop environments (regardless of their provenance). More specifically, users with these deficiencies should not be trusted with handling and managing important data at all. Because ... they will lose them.

At least that's what's happening here: the number of users who experience a total loss of their data has increased rapidly over the past few years. In most cases, the cause was not negligence or carelessness, but an alarming level of ignorance. Often, the root cause was simply bypassing the infrastructure we provide and using a private notebook for data analysis and presentation instead of the dedicated office desktop. Now, our employees can bring their own devices if they like, but if they don't register them with our IT staff, they will be classified as guest devices that have no access to our intranet – with the rather obvious result that the data on these devices cannot be synced to the home directory of the respective user on our file server (which is part of a daily incremental backup on tape, covering every day of the last 10 years as legally required).

The users, naturally, don't find that obvious at all (although they have been informed at length about these facts). They claim to have acted in the firm belief that the data on their private notebook would automatically be backed up to “the cloud” as soon as they entered a certain “zone” around their workplace. When I asked how they imagined this miracle backup would work, one of them referred to Apple commercials in which photographs were transferred from an iPhone to an iPad “magically”. “That's the state of the art, right? I expected that it would be implemented here!” She also said that she imagined the mechanism to work wirelessly, but that she wouldn't care how it worked, as long as it did.

Now, when people approach me with these stories, they want (i) forgiveness and understanding and (ii) an immediate solution. Well...

../images/mccoy-doctor-not-magician.svg

Fortunately, we now have first-level support consisting of an invariably cheery youth who finds these problems most entertaining. Let's see what he says a few years from now, when pampering the “digital natives” has become the next big thing.

And let's see where we are then, with our big hopes and high-flying dreams of, for example, artificial intelligence and quantum computing, autonomous electric mobility, populating Mars, establishing controlled fusion on Earth, and controlling the world's climate. Personally, when I see that the majority of the present generation has difficulties counting to three, well, you know, I'm not all that optimistic. ;)

Living in your own private Idaho

Golem published an interview with Enigmail developer Patrick Brunschwig (the following quotes were translated by DeepL):

“Then I took a look at the different tools that existed. I didn't like Evolution that much and Mozilla was still in its infancy. Nevertheless, we used it.”

All mail clients suck. This one just sucks less. ;)

“What's really important is that PGP is integrated directly into the tools. At K-9 I have to install Openkeychain first, at Thunderbird I have to install Enigmail and then GnuPG. Just this step, to have PGP directly, without addon and without additional software installation in the tool, would simplify a lot of things.”

Well, I've had PGP “directly” for the past 15 years. It's a matter of choosing the right tool for the job. ;) I never had to install GnuPG manually (in fact, it's a core package in many distributions: the package managers depend on it since the packages are PGP-signed!), and both KMail and Evolution support PGP right out of the box. Mutt as well, by the way. ;) The lack of native support is only one reason to avoid Thunderbird and its shaky PGP integration via an add-on. There are others that are more basic in nature.

"Which more modern mail clients are there? There aren't that many that are still widespread. It's true that Thunderbird isn't very modern, but there are very, very few really modern mail clients. I don't know of any real alternative."

That's either ill-informed or deliberately misleading. It's OK to dislike programs on a subjective basis, but one shouldn't entirely lose an objective stance. Of course there are alternatives to Thunderbird (and with native PGP support) for all operating systems. Windows: eM Client, EverDesk, and of course The Bat! (the best e-mail program ever, period). macOS: Canary Mail. And Linux: KMail, Evolution.

Admittedly, the latter two are not exactly “modern”, although I'm not sure what that is supposed to mean. If “modern” refers to the interface, there's Geary, which was initially released in 2012 and has recently become quite popular, having been adopted by the Gnome project and being the default mailer of elementary OS. Geary features a modern interface and a modern feature set: it supports neither POP3 nor PGP. Even more modern is Mailspring, the successor of Nylas Mail: it's based on Electron and thus weighs 280 MB, its interface is très chic and themable, and it requires a Customer ID to synchronize mails across the different devices modern people have. It doesn't need to bother with PGP, as there's no need for that in the modern, post-privacy world.

But I'm being sarcastic, and that's not entirely fair, since there are also developments at the other end of the spectrum. The Icelandic Mailpile, for example, has privacy as its top priority and invented an entirely new concept for it: a local webmailer. I would actually consider using it on my notebook if the interface weren't the most dilettantishly designed one I've seen since GeoCities.

Yay!

Veteran users of Archlinux scoff at AUR helpers, tools that automate certain tasks for using the Arch User Repository (AUR). If you listen to them, they all install packages from the AUR in the canonical way, i.e., via 'makepkg -si'. In reality, of course, they write their own scripts for interacting with the AUR. ;)
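
For reference, the canonical way amounts to three commands (using one of the AUR packages from the listing below as an example):

$ git clone https://aur.archlinux.org/pacserve.git
$ cd pacserve
$ makepkg -si     # -s: install build dependencies via pacman, -i: install the built package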

I used yaourt from the start until it became clear that it had serious security issues. At that point I switched to pacaur, which was declared orphaned in December 2017.

Well, looking at the table on the AUR helpers page, it's clear that there are a few promising candidates for a future AUR helper on my desktop. I finally opted for yay since I like the name and appreciate the support for all three shells I'm using. But what I like best is the system summary:

➜  ~ yay -Ps
Yay version v8.1115
===========================================
Total installed packages: 1821
Total foreign installed packages: 81
Explicitly installed packages: 510
Total Size occupied by packages: 12.1 GiB
===========================================
Ten biggest packages:
texlive-fontsextra: 1.0 GiB
texlive-core: 384.7 MiB
libreoffice-fresh: 370.1 MiB
linux-firmware: 297.6 MiB
atom: 244.3 MiB
languagetool: 229.6 MiB
nvidia-utils: 183.1 MiB
jre10-openjdk-headless: 169.6 MiB
chromium: 164.6 MiB
gravit-designer-bin: 159.8 MiB
===========================================
:: Querying AUR...
 -> Orphaned AUR Packages:  ipython-ipyparallel
 -> Out Of Date AUR Packages:  hyperspy  netselect  pacserve  scidavis  ttf-vista-fonts

The fifth biggest package in this list is atom, an editor. 244 MB for an editor? Well, there are very few applications that I use as much as an editor, and I'm thus constantly evaluating potentially promising newcomers in the editor/IDE category. Is atom worth the ¼ GB of disk space? You'll get the answer in the comprehensive editor shootout that I will post later this year. ;)

A one-liner to upgrade your virtualenvs

This blog is powered by Nikola, a Python-based static blog compiler, which I've installed in a Python virtualenv to separate it from the system-wide Python installation of my notebook. Updates of the major version of Python (like from 3.6 to 3.7) inevitably break these virtualenvs, and I had so far accepted that there's no way to get them back other than rebuilding them from scratch. In fact, that's what you get to hear even from experienced Linux developers.

The recent update to Python 3.7 brought that topic back to my attention, and I kind of lost my patience. I just couldn't accept that there shouldn't be a better way, and indeed found a solution for those using the venv module of Python 3.3+:

python -m venv --upgrade <path_of_existing_venv>

Despite the fact that I'm using virtualenv instead of venv, this command worked exactly as I had hoped. ☺

The virtualenv can now be updated as usual. Well, almost – both pip and pip-tools got a lot more conservative and actually have to be told explicitly that they really should upgrade to the latest version. For a particular package, that looks like this:

pip install --upgrade Nikola --upgrade-strategy=eager

A rather weird behavior, if you ask me, but what do I know. ☺

Back to our virtualenv. To really, genuinely and truly update all requirements, the following sequence of commands is necessary:

pip install --upgrade setuptools
pip install --upgrade pip
pip install --upgrade pip-tools
pip-compile --rebuild --upgrade --output-file requirements.txt requirements.in
pip-sync requirements.txt

The update to Nikola 8.0.0 broke my old theme (based on bootstrap2), and it was about time: too many things were not working as desired. I'm now using an essentially unmodified bootblog4, as before with Kreon as the main font, and Muli for the headlines (from 15.12.2018: Oswald for the latter).

Security headers

A few simple provisions were sufficient to make this blog GDPR-compliant. Even less was required to get a high security rating. As a first step, I obtained certificates from the Let's Encrypt initiative and configured Hiawatha (our web server) accordingly:

VirtualHost {
        ...
        TLScertFile = /etc/hiawatha/tls/pdes-net.org.pem
        RequireTLS = yes, 31536000; includeSubDomains; preload
        ...
}

This configuration got an A+ rating from Qualys SSL labs.
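
Obtaining and renewing the certificate itself is equally undramatic (a sketch, assuming certbot in webroot mode and that Hiawatha expects key and certificate chain concatenated in a single PEM file; the webroot path is a placeholder):

$ certbot certonly --webroot -w /var/www/pdes-net.org -d pdes-net.org
$ cat /etc/letsencrypt/live/pdes-net.org/privkey.pem \
      /etc/letsencrypt/live/pdes-net.org/fullchain.pem > /etc/hiawatha/tls/pdes-net.org.pem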

However, there's more to the security of a website than transport encryption. For example, the Content Security Policy “provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website”. There are actually a number of these security headers, and after some research I came up with the following settings:

VirtualHost {
        ...
        CustomHeader = Vary: Accept-Encoding
        CustomHeaderClient = X-Frame-Options: sameorigin
        CustomHeaderClient = X-XSS-Protection: 1; mode=block
        CustomHeaderClient = X-Content-Type-Options: nosniff
        CustomHeaderClient = X-Robots-Tag: none
        CustomHeaderClient = X-Permitted-Cross-Domain-Policies: none
        CustomHeaderClient = Referrer-Policy: same-origin
        CustomHeaderClient = Expect-CT: enforce; max-age=3600
        CustomHeaderClient = Content-Security-Policy: child-src 'self'; connect-src *; default-src 'self'; img-src 'self' data: chrome-extension-resource:; font-src 'self' data:; object-src 'self'; media-src 'self' data:; manifest-src 'self'; script-src 'self' 'nonce-...' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; base-uri 'none'
        CustomHeaderClient = Feature-Policy: geolocation 'none'; midi 'none'; notifications 'none'; push 'none'; sync-xhr 'none'; microphone 'none'; camera 'none'; magnetometer 'none'; gyroscope 'none'; speaker 'none'; vibrate 'none'; fullscreen 'self'; payment 'none';
        ...
}

The CSP proved to be tricky since Chromium did not display images in SVG format with stricter settings. A post by April King finally provided the missing piece of the puzzle.

Mike Kuketz listed a number of online scanners that evaluate the implementation of security policies on arbitrary websites. One of the most informative of these tools is Mozilla's Observatory, developed by the same April King whose post had helped me with the CSP. ☺

../images/observatory.png