Contrary to common belief, scientists do not communicate via secret encrypted channels, but usually by exchanging PowerPoint presentations via ordinary e-mail. My Fedora 17 at work surprised me a few weeks ago by opening one of these (pptx) files not with LibreOffice Impress, but with Okular. Apparently, the default file associations had been altered. I usually despise operating systems that take the initiative without proper authority, but this time I actually have to be grateful: the file was displayed just as intended, in marked contrast to the faulty rendering by LibreOffice, which I tried immediately afterwards for comparison.

The actual backend for handling these files is the Calligra suite, the successor of KOffice. This office suite has evolved quite a bit since the times of KDE 3, and I would prefer it over LibreOffice anytime (if I used any office software at all). It starts up fast, has a modern UI, and much better font rendering.

All right then. Let's say you get a docx from a collaborator and decide to open and edit it with Calligra Words. Everything's fine until you want to save: odt (open document text) is the only option. Well, it's an open standard, what could be wrong with it? Let's save it as that.

Now open it in LibreOffice and save it from there as docx. I guarantee that you'll find the result interesting.

These office suites are just so entirely pathetic. 😞

If you now hope to save space by installing the slim Calligra instead of the monolithic LibreOffice suite, I have to disappoint you: including all dependencies, the sizes of the two suites do not differ much.

libreoffice:        844.14 MiB
calligra:           858.21 MiB

If you already run KDE, of course, Calligra only adds a fraction of that space, since most of its dependencies are installed anyway. In that case, I'd certainly recommend installing it, if only for displaying PowerPoint presentations.

On the Mini, I really need that space for TeX:

texlive-most:           980.29 MiB

On the other hand: the new SSD for the Mini arrived yesterday. I'll report on its installation and all consequences in one of the next posts. 😉


Video transcoding has suddenly become an important criterion for me when judging CPU performance. The reasons are my latest hardware additions: a Google Nexus 7 and a Western Digital TV Live. The former is intended to allow us to access the multimedia content on our NAS from everywhere in our home, and the latter should deliver this content to our TV. Both of these gadgets are connected to the network wirelessly. Efficiently encoded videos stream smoothly, but DVDs tend to stutter.

Now, instead of breaking through walls to run gigabit ethernet everywhere, or planting dLAN plugs in all sockets, we've decided to transcode all DVDs to a format which facilitates smooth streaming. As a transcoder, we use HandBrake with the default settings for a Matroska (mkv) container (video encoded by x264, audio as 160 kbit/s AAC). This encoding typically results in file sizes of a quarter to a half of the original while retaining the visual and aural quality.
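For the command-line version of HandBrake, a loop along the following lines does the job. This is a sketch, not our exact setup: the rip directory is an example, and the flags (`-e x264`, `-E faac`, `-B 160`) may differ between HandBrake versions, so check `HandBrakeCLI --help`.

```shell
# Transcode every ripped ISO in a directory to a Matroska container:
# x264 video at constant quality 20, 160 kbit/s AAC audio.
transcode_rips() {
    for iso in "$1"/*.iso; do
        [ -e "$iso" ] || continue              # glob matched nothing
        echo "transcoding $iso -> ${iso%.iso}.mkv"
        HandBrakeCLI -i "$iso" -o "${iso%.iso}.mkv" \
            -f mkv -e x264 -q 20 -E faac -B 160
    done
}

transcode_rips "$HOME/rips"
```

On a dual-core machine like the E6600, expect x264 to peg both cores for the duration.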

My E6600 is crunching through the DVDs as fast as it can, but my wife's i750 beats the poor thing by a factor of three! Just wait, though, just wait ... 😄

Some movies, which presumably have been ripped poorly, have to be processed file by file. For these cases, mkvmerge, part of the mkvtoolnix suite, is useful. The GUI version is truly helpful since it automatically determines sensible parameters for the merge.
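On the command line, the merge itself is a one-liner (file names are made up for the example):

```shell
# Join the pieces of a badly ripped movie into a single mkv.
# A '+' between input files tells mkvmerge to append the next file to the
# previous one instead of multiplexing it as an additional track.
mkvmerge -o movie.mkv part1.vob + part2.vob + part3.vob
```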

Batch job

What I like about Mathematica is the combination of stringent and economical code, decent numerical performance, and the possibility to submit a job remotely via the command line.

Consider the following example:


The 2nd line reads in all files having the extension "dat" contained in the directory defined in the 1st line. The array 'data' then contains all data sets as individual elements.

The 3rd line defines a fit function composed of three symmetric Gaussians and one asymmetric one, conveniently represented by the probability densities of appropriate statistical distributions. Initial values for all parameters are defined in the 4th line.

The 5th line starts a loop (concluded in the 11th line) over all data sets. The 6th line defines a least-squares fit of the data. Note that the initial values are not given as constants, but in the form of a declaration that automatically takes into account their change from data set to data set. Hence the 7th line.

In the 8th and 9th lines we integrate over individual components of the spectrum. Line 10 stores the result as a list element. The complete list is assembled in line 12, and exported in line 13.

My present spectra have 250 points, and the fit and integration for a single spectrum take about 70 ms. Since I usually have just a few thousand spectra, the whole business is done in minutes. Plotting the spectra and the fits, by the way, takes significantly longer. The following graph shows an example of such a fit.


One can, by the way, generate such graphs even when submitting the calculation from the command line via math -noprompt -run "<<batchfit.m" > output &. Just remember to have Exit[] as the last line of your Mathematica notebook. Then, convert the notebook to InputForm, copy and paste it into your favorite editor, and save it as (for example) batchfit.m.
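Put together, such a batch job can even be started on a remote machine and left to run in the background. A sketch, with the host name and working directory as examples:

```shell
# Run the Mathematica batch job remotely and detach: nohup keeps it alive
# after logout, stdout and stderr go to the file 'output'.
ssh workhorse 'cd ~/spectra && nohup math -noprompt -run "<<batchfit.m" > output 2>&1 &'
```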

pacman and friends

Let's have a look at what occupies these 21 GB in my /dev/sda3.

In Arch Linux, one may use pacman directly to get this information, but there are at least two more convenient ways which lead to essentially the same results:

expac -H M '%m\t%n' | sort -h

or even shorter

pacgraph -c | sort -h

The result of the latter on my system is:

1198MB urbanterror
920MB xonotic
856MB nexuiz
514MB sauerbraten
496MB warsow
486MB texlive-fontsextra
399MB openarena
362MB freecad
280MB quake4
214MB fpc

The extra fonts for TeX are the only package from this top ten of storage hogs that I really need.

For more visually oriented users, pacgraph offers an interesting view:

pacgraph -b "#333333" -t "#FFFFFF" -d "#FFFFBB"


I looked closely and discovered mono. WTF do I need mono for?

[cobra@blackvelvet ~]$ whoneeds mono
Packages that depend on [mono]

[cobra@blackvelvet ~]$ pactree -r -d 1 mono

Or graphically:

pacgraph -b "#333333" -t "#ffffff" -d "#ffffbb" -i "#ff0000" "#00ff00" "#0000ff" mono


Ah, sparkleshare! By the way: did I actually install all optional dependencies for sparkleshare?

[cobra@blackvelvet Documents]$ pacman -T $(expac -Q '%o' sparkleshare)

Naturally not, as I don't use nautilus.

Sparkleshare lives in the AUR and needs to be compiled whenever a new version is released. That's not a problem for my future desktop, which will be distinguished by a very potent CPU. 😄

For a netbook like my Mini, however, the compilation of larger programs takes time and drains the battery. Let's see how many packages on my current desktop are from the AUR:

[cobra@blackvelvet Documents]$ pacman -Qqm | wc -l

Too many, if I planned to install my system on a netbook! I'm thus still hesitating to install ArchBang on my Mini...

And what about the rest, i.e., packages explicitly installed via the official repositories?

[cobra@blackvelvet Documents]$ pacman -Qqe | grep -vx "$(pacman -Qqg base)" | grep -vx "$(pacman -Qqm)" | wc -l
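These package lists are worth saving for the reinstall. A minimal sketch using pacman's standard query flags (the file names are, of course, up to you):

```shell
# Save the package lists for a later reinstall:
# -Qqe: explicitly installed; -n: native (repo); -m: foreign (AUR).
pacman -Qqen > pkglist-repo.txt
pacman -Qqem > pkglist-aur.txt

# On the new system, feed the repo list back to pacman:
# pacman -S --needed - < pkglist-repo.txt
```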

The lists of these 336 packages, which I have installed in addition to the base system, will be very useful for a painless installation of my new system sometime at the end of this year.

Gibibytes and Tebibytes

After almost 6 years with my Core 2 Duo E6600, I'm seriously thinking about a replacement. I already have a pretty clear idea of most components, but I'm still ambivalent about a few of them, particularly the mass storage.

In terms of performance, the magnetic hard drives (HDs) commonly used for mass storage are the bottleneck in today's computing. Booting the system, starting applications, reading or writing large files, running a sync or a backup: all of these everyday situations are usually limited by disk speed, not by the processor or memory.

You can monitor disk access with iotop, and limit it with ionice. Or you can buy a solid-state drive (SSD). These little miracles are usually much more responsive, but also more expensive than their mechanical counterparts. People thus commonly use a small SSD for the system and a large HD for their data. But how large and fast an SSD and how large an HD do I need for this strategy? And do I need the latter at all?
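If a long-running job hogs the disk, ionice can demote it to the idle I/O scheduling class; the backup path and PID below are examples.

```shell
# Run a backup with idle I/O priority (class 3): it only gets disk time
# when no other process wants it.
ionice -c 3 rsync -a ~/ /mnt/backup/

# Demote an already running process to the idle class:
ionice -c 3 -p 4242
```

Note that the idle class only has an effect with an I/O scheduler that honors it, such as CFQ.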

To answer these questions, I need an idea of my current storage use and my potential future requirements. I have partitioned my HD so as to simulate a system residing on a 60 GB SSD and my home on a 230 GB HD. I have also generously installed all packages which could be of even the slightest interest to me, including all first-person shooters available in the Arch Linux repositories. 😉

[cobra@blackvelvet ~]$ df -h
Filesystem          Size    Used    Avail   Use%    Mounted on
/dev/sda1           130M    21M     103M    17%     /boot
/dev/sda3           63G     21G     39G     35%     /
/dev/sda4           227G    106G    110G    49%     /home

Despite having installed essentially everything I can think of, the system only occupies 21 GB. Apparently, I will have no difficulty finding an SSD on which my system fits very comfortably. (My Windows 7 installation at work, in contrast, requires 27 GB after installing Origin, MS Word and PowerPoint. 27 GB, and rising ... 😲)

But what about the ridiculous 106 GB in my /home? Is that all? Well, it could actually be much less, since more than 60 GB are virtual machines. And what else would I store there anyway? All multimedia content resides on the NAS. Current projects are synchronized between all of my machines using Wuala, but that rarely occupies more than 1 GB. Completed projects are archived and always accessible via an ssh connection to my office machine, and thus don't need to be stored in my /home as well.
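To check such numbers, du together with a human-readable sort suffices:

```shell
# The ten largest entries in /home, human-readable and sorted by size.
du -sh ~/* 2>/dev/null | sort -h | tail -n 10
```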

In principle, a single 240 GB SSD would be more than sufficient to hold both the system and my data. But that feels like a rather radical solution to me. Right now, I tend toward a dual-SSD configuration, but that, too, is still subject to change. 😉

Not anyone's fault but yours

We now know that the tragic story of Mat Honan was made possible by a major blunder committed by Amazon and, foremost, Apple. Does that mean that Mat is the innocent victim, and Apple is to blame? Is it all Apple's 'fault'?

Technically? Come on. Morally? Yes, of course. But that's really nothing new. People with an IT background are certainly not surprised by any security shortfalls of Apple. On the contrary: Mac OS X, for example, is generally agreed to be the most vulnerable consumer operating system.

So why do people buy Apple's products? What's the reason for Apple's phenomenal success, and their almost hypnotic influence on the general public and the mass media?

Meet Clarke's third law: "Any sufficiently advanced technology is indistinguishable from magic." Nobody has lived this statement more than Steve Jobs, Apple's chief magician, who single-handedly succeeded in turning Apple from a computer manufacturer into a factory of lifestyle gadgets with a must-have factor.

The key to this transformation is the element of magic, or rather its celebration, in Apple's presentation of their products, even if the technical background is as mundane as it can be. Take Apple's recent television commercial as an illustration. A photograph of an ugly boy, taken by a girl with an iPhone, is seen a moment later on the girl's iPad. Most people are simply entranced by this little sketch, many are delighted, and only very few complicate things by asking how it's actually done. I've interviewed a couple of hard-core Apple users, and none of them had even a clue. Nor did they care, as if it didn't matter.

But it really does matter, and that's the point where we come back to Mat's part in his own little tragedy. Consider, for illustration, the typical situation at home where several devices and appliances are connected to each other by a network. Needless to say, a wireless connection is generally much more convenient, and for some devices essential to function as intended (think of a tablet with a network cable). But this convenience has to be earned: wireless traffic must be encrypted, preferably with algorithms judged to be secure. The consequences of ignorance can be severe.

This simple example demonstrates an immutable law of system administration: the easier for the users, the harder for the administrator. Making qualified decisions about a system's security while potentially opening it to access from the outside requires an understanding of the available countermeasures and their deployment.

Apple tries to reduce the impact of this fact by cutting configuration options. They simply hope that the user, given only options A or B, cannot do anything to endanger the integrity of the system. To the user, they present the system as infallible. Everything is "direct", "simple", "magic"; nothing requires thought or consideration. Complexity and complications are unknown and unheard of.

This tendency toward oversimplification and toyification is a general one and goes far beyond Apple. It represents a genuine paradigm shift. "Computer" in 2012 means consumption, not creation. Whether you have an Android or an iOS smartphone, a Windows 8 tablet or an iPad, or a MacBook Air: you are basically positioned to consume media created by others. That's what these devices are made for, and what the software is optimized for. Which, of course, does not change physical limitations, and the fact that a full-HD movie looks better on a 50" plasma screen than on the retina display of the iPad 3. Basically, you are in the position of a kid getting a sneak peek at a book from the public library. From the position of the kid, that's quite exciting.

Of course, the same paradigm shift is evident in the interwebs. Everything is social and cloudy. Sascha Lobo, Germany's most hated blogger, has recently provided a surprisingly intelligent (really) assessment of the current situation: "Your internet is only borrowed". Quoting (translated from the German):

"Data on social networks must, under all circumstances, be treated as if it could be lost at any moment. Because it can be lost at any moment. Nevertheless, the like-happy world seems to act otherwise: all of its digital creation takes place on the borrowed internet. [...] On a blog, by contrast, you can do whatever you want. Annoyingly, that also means that you have to do what you want. And constantly having to want something is quite energy-consuming."

The experience of Mat Honan is more than a funny episode: it's a symptom. This blog entry will not change that, but it marks the end of an era, and the start of a new one. I don't care much for the taste of the masses, and will just continue as before: with Arch Linux on my desktop, and a Debian-powered server for the social tendencies of me and my friends.

Berlin, 35°C, signing off. 😉


Carsten read my hot sauce entry and decided to help: he went to the next supermarket and bought a popular hot sauce. Popular in Colombia, I should add. 😉

Here you see the 'Salsa de Ají Piquetasco' in company with its somewhat more vicious friends.

hot sauces

The sauce is not very hot (I'd guess about 5,000 Scoville¹), which makes it ideal for pizza. I had one last night and used half of the bottle. Absolutely delicious! I've never had a sauce with such a full and fruity taste. So I got curious and started to search.

Ají, I learned, is the Caribbean word for chili in general, but here it likely refers to a specific family of chilis, Capsicum baccatum, whose most popular variety is the ají amarillo. In Europe, this family of chilis is much less known than annuum (which includes bell pepper, peperoncini, jalapeño, and cayenne), chinense (bird's eye, habanero, bhut jolokia, ...), or frutescens (tabasco).

One finds a lot of praise for the ají amarillo on the web, but the article on Serious Eats describes this chili most vividly:

"Besides its phylogeny, ají amarillo is worth seeking out for its unique flavor, which offers a lot of fruitiness for its heat. It's a different kind of fruitiness from other chiles like poblanos: less sharp and harsh, more full-bodied, and a lot more subtle. If there were a chile to taste like sunshine, this would be it. It may sound odd to use the word "comforting" to describe a hot chile, but for ají amarillo, it seems fitting."

Exactly my impression.

Thanks so much for the great sauce and for widening my culinary horizon, Carsten. But ... couldn't you have stayed in Bogotá a little longer?

¹ For comparison, the 'After Death' and 'Sudden Death' sauces are rated at 50,000 and 100,000 Scoville, respectively. Particularly the latter is too hot to be sprinkled generously across a pizza.


If my Mini were stolen or lost, I wouldn't worry a bit about a stranger prying into my data, since its home partition is encrypted. Now, this feature isn't reserved for Linux users. Apple, for example, gave it the catchy name FileVault and integrated it into Mac OS X in 2003.

Despite the existence of this reportedly easy-to-use disk encryption, the celebrated iCloud offers the much-welcomed feature of remotely wiping the storage of any device with a WiFi connection, including iPods, iPhones, iPads, and, well, most Macs in general. The remote wipe removes all personal data from the device and subsequently locks it down, rendering it useless for anyone not in possession of the secret code to reactivate it.

Now, listen to the story of Mat Honan, a former writer for the gadget blog Gizmodo. Apparently, somebody took over his iCloud account, and soon after, Mat watched helplessly as his iPhone, iPad, and MacBook Air were all remotely wiped within minutes of each other. But this nightmarish experience was only the beginning.

Mat, having the trust of a puppy, had connected his iCloud to his Google account, and the latter in turn to Twitter and God only knows what else. The 'hacker', as Mat repeatedly calls the intruder, took the opportunity, deleted the Google account, and posted profanities on the Twitter channel of Mat's former employer Gizmodo.

Mat initially speculated that the 'hacker' brute-forced his account, as it was secured by an 'only' seven-character alphanumeric password. Contrary to this naive conception, services such as iCloud do not facilitate direct brute-force attacks, since they lock down after a few unsuccessful attempts. Hacking iCloud itself would be, well, much bigger news than taking over the account of a rather insignificant individual.

In fact, Mat now claims to know what happened: "They got in via Apple tech support and some clever social engineering that let them bypass security questions." I seriously doubt that. A keylogger seems a much simpler and more likely supposition. See update below.

Can we learn something from this incident? Well, Mat's Australian colleagues know the answer: "… use super-secure passwords … use insanely secure (and unique) master passwords …". Wow! I'm deeply impressed. And the illustration they have chosen for this article further underlines the impression I got from this assessment.

Has Mat, poor dumb fuck incarnate, learned anything from this incident? After all, it must have hurt a lot: "Because I'm a jerk who doesn't back up data, I've lost more than a year's worth of photos, emails, documents, and more." My guess is that his future iCloud password will be 8 characters long. Or even 9.

Update: Unbelievable as it may sound, the access code for the iCloud account really was the last 4 digits of Mat's credit card, which the hacker got from Amazon. So I was wrong, and the hacker followed good old traditions by employing social engineering instead of a keylogger.

And what can we learn from that? I'll comment on this point in a couple of days, in one of my next entries.

À la recherche du temps perdu

Time is the most valuable resource we have in our lives. I thus try to minimize the amount of time I need for routine tasks, such as data processing, analysis, and presentation. I'm not at all proud of the result, and I think that I can still improve a lot. Compared to many others, however, I'm a champion of efficiency.

For example, there's a very valued colleague who has Linux (well, Ubuntu) as a host system (no idea why) and Windows XP as a guest. He's using both in the same grandfatherly style: the left arm dangling down, the right one slowly shifting the mouse cursor around with his eyebrows drawn up and his mouth wide open.

He was shocked when an update brought Unity. Shocked! He messed around in the gconf settings, installed KDE, reverted back, and ended up with file associations that are now all wrong: when he double(!)-clicks a *.txt, two Writer windows open instead of gedit, a *.pdf is opened by Karbon, a *.jpg by Krita, etc.

He complains loudly about this, day after day. I told him to repair his installation, or at least the file associations, which is a matter of at most two minutes. He found an alternative solution: he simply opens all these files in the virtual machine. Thus, to open a pdf, he boots his virtual Windows XP, starts Total Commander (which involves explicitly agreeing to a license agreement, since he, of course, did not copy the license file to the program folder), navigates (click-click, click-click) to the pdf, and double-clicks it. Acrobat Reader then loads all of its supplements and subsequently tries to load the file.

When the file is finally opened, I'm typically already 5000 km away.