Automated LaTeX

We've got several tools which intend to automate the generation of complex documents by LaTeX. Without these tools, the generation of a publication ready for submission would require several compiler passes to resolve all references for building, for example, the index and the bibliography. This thread illustrates the problem and highlights some of the most popular of these tools.
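To make the multi-pass problem concrete, here is a minimal Python sketch of what such tools do at their core: run the compiler, look at the log, and repeat as long as LaTeX asks for another pass. The names needs_rerun and build, the list of log phrases and the pass limit are my own assumptions. The sketch ignores bibliography and index tools entirely and expects the log file next to the source, so it is an illustration rather than a replacement for any of the real tools.

```python
import subprocess
from pathlib import Path

# Phrases LaTeX and some packages write to the log when another pass is needed.
RERUN_HINTS = ("Rerun to get", "Rerun LaTeX")


def needs_rerun(log_text: str) -> bool:
    """Return True if the log asks for another compiler pass."""
    return any(hint in log_text for hint in RERUN_HINTS)


def build(texfile: str, max_passes: int = 5) -> int:
    """Run pdflatex until the log is quiet; return the number of passes used."""
    log = Path(texfile).with_suffix(".log")  # assumes the log sits next to the source
    for n in range(1, max_passes + 1):
        subprocess.run(["pdflatex", "-interaction=nonstopmode", texfile], check=True)
        if not needs_rerun(log.read_text(errors="replace")):
            return n
    return max_passes
```

Calling build("paper.tex") on a document with cross-references would typically need two or three passes before the log stops asking for more.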

For my modest requirements, I've found that rubber is perhaps the most convenient choice. Rubber seems not to be in active development any more, but does what it should in the most unobtrusive way (and support for XeLaTeX is easy to add).

For the command line, I use a simple makefile:

TEXFILE = $(wildcard *.tex)
PDFFILE = $(TEXFILE:.tex=.pdf)
VIEWER = mupdf

all: pdf

pdf: $(PDFFILE)

%.pdf: %.tex
    @rubber --pdf $<
clean:
    @rubber --clean $(TEXFILE:.tex=)
tidy:
    @rubber --clean --pdf $(TEXFILE:.tex=)
view: pdf
    $(VIEWER) $(PDFFILE)

.PHONY: pdf clean tidy all view

For vim, I use TeX-PDF (and LaTeX-Box for command completion). TeX-PDF respects my makefile, if one is present, or uses rubber or even plain pdflatex as a fallback.

On my notebook with ArchBang, however, TeX-PDF opened Gimp for viewing the pdf, which is a wonderfully absurd choice.

To correct that, I defined mupdf as the default application for opening pdf files:

xdg-mime default mupdf.desktop application/pdf

But that didn't change a thing.

I finally found out that this behavior was caused by a glitch in the default configuration of ArchBang:

$ grep DE ~/.xinitrc
export DE=openbox

'openbox' is not one of the values of the environment variable DE that xdg-open recognizes. One should either replace 'openbox' with 'xfce', or comment out the entire command.

Having fun with classical ciphers

19.12.2013
I wrote most of what follows three years ago, but never found the time to finish and post it. As usual this close to Christmas, I have a few days off, and took the opportunity to delve once again into the fascinating world of classical ciphers. 😁

27.12.2010
The millennia-old war between code-makers and code-breakers essentially ended in the last century: modern ciphers are so strong that they just can't be broken in any humanly accessible time-span. Diffie, Hellman, and Merkle solved the key exchange problem, Phil Zimmermann gave encryption to the masses, and the GnuPG project transferred Zimmermann's 'Pretty Good Privacy' to the open source world.

End of story. Or is it?

Well, for most practical applications, that's basically it. We can trust that either of the following commands

gpg -ca --passphrase-repeat 0 --cipher-algo aes256 <file>
gpg -sea -r cobra <file>

results in an encrypted file whose content will be inaccessible to even the most nosy of our sniffing and snooping friends. Unless, of course, they revert to alternative ways of collecting information. :P

People have had this particular problem since the beginning of time and invented steganography. Nowadays, one can easily hide and encrypt information in one and the same command:

steghide embed -e rijndael-256 -z 9 -cf <coverfile> -ef <embeddedfile> -sf <combinedfile>

All right, so where's the catch?

My itch with modern ciphers is not a practical one. I rather find their mathematical foundation abstract and dry, although I can digest most of it when I try really hard. However, you can't possibly utilize modern encryption without a computer.

Ancient ciphers, in contrast, are often intuitive and easy to master by paper and pencil. And what's more, ancient ciphers are surrounded by a mystical and glorious aura. Just think of the Enigma!

When I was a kid, Diffie and Hellman as well as Rivest, Shamir and Adleman still chased attractive female students across the campus. There was no modern cryptography, just an afterglow from the war. There were only a handful of computers, and hardly any of them in public possession. Being a nerdy kid, I became a member of the local chess club at age eight. People there talked a lot about chess, but also about encryption, and the atmosphere turned out to be infectious. I adored Edgar Allan Poe's Gold Bug as well as Arthur Conan Doyle's Adventure of the Dancing Men. I was thrilled to death by Vigenère's Le chiffre indéchiffrable (French for 'the indecipherable cipher'). And I would have died for book ciphers. The unbreakable type, you know.

And now over Christmas, I thought, aw what the hell ... let's just play with all that's fun! Being fascinated by the Vigenère cipher, I sat down and implemented it using the only programming language I've ever learned: Pascal. And since we don't have to fill in the tabula recta ourselves anymore, I've generalized the classical scheme to the entire 95-character alphabet of printable ASCII. That took me an entire day, but my Pascal courses took place almost 30 years ago. 😞
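For readers who don't speak Pascal, the generalized scheme can be sketched in a few lines of Python. This is not my Pascal routine, just a minimal illustration of the same idea: every character is mapped to a number between 0 and 94, shifted by the corresponding key character, and wrapped around modulo 95. The function name vigenere and the decision to reject anything outside printable ASCII (including line breaks) are assumptions made for this sketch.

```python
FIRST, SIZE = 32, 95  # ' ' (code 32) up to '~' (code 126)


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift every character of text by the matching key character, modulo 95."""
    if not key:
        raise ValueError("key must not be empty")
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        t = ord(ch) - FIRST                      # position of the text character
        k = ord(key[i % len(key)]) - FIRST       # position of the key character
        if not (0 <= t < SIZE and 0 <= k < SIZE):
            raise ValueError("only printable ASCII is supported")
        out.append(chr(FIRST + (t + sign * k) % SIZE))
    return "".join(out)
```

Encrypting with vigenere(message, key) and decrypting with vigenere(secret, key, decrypt=True) are exact inverses; a key shorter than the message is simply repeated, as in the classical cipher.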

A Vigenère cipher can be as strong as desired. If the key is as long as the message, and truly random, it constitutes an example of a one-time pad (OTP), the only cipher which is information-theoretically proven to be unbreakable. Despite this fact, OTPs have rarely been used in the past since key distribution proved to be an insurmountable problem.

Quantum key distribution has solved this problem in principle, but is not (yet) for everyone. Still, if the number of people on a communication channel is limited, the distribution of an OTP is easier than ever. Remember that one A4 page of text roughly corresponds to 3 kB. The 2 GB of an average USB stick would thus suffice for about 700,000 such pages, which should be enough for even the most communicative individuals. 😉

True randomness, however, is a requirement not easily met, since only natural phenomena such as atmospheric noise or radioactive decay are truly random. A computer, instead, is a deterministic beast and can only generate pseudo-random numbers. The "best" algorithms for doing that are called 'cryptographically secure pseudo-random number generators' (CSPRNG), but even these are predictable to a certain degree. I consider the one I'll use in the following to be a very solid contender, but not necessarily the best (for which Blum-Blum-Shub generators may qualify, though they are slow, and there's no readily available open-source implementation anyway).

How do we judge the quality of a CSPRNG? One of the most popular batteries of tests is compiled in the dieharder suite, a successor of George Marsaglia's famous diehard tests. This suite can be fed directly with the continuous stream of random numbers from the generator, here csprng, like this:

csprng-generate | dieharder -g 200 -a

Two tests were indicated as "weak", but that turned out to be a result of the limited statistics with the default settings of dieharder.

We can also examine a finite list of random numbers created by such a generator. An infinite, perfectly random sequence would exhibit an entropy per character equal to S=log2(k), where k is the number of characters in the alphabet. We will come back to that later.

A Vigenère cipher used in conjunction with a pad generated by a CSPRNG is an example of a running key cipher, but one could equally well call it a pseudo one-time pad (POTP). Depending on the cryptographic quality of the CSPRNG used, it may be trivial or essentially impossible to break. It will never be an information-theoretically secure system such as the OTP, but it may approach a level of security similar to that offered by modern cryptographic techniques.

My little Pascal routine in conjunction with the POTP created by csprng is perfectly capable of encrypting pure ASCII data, but can't handle anything beyond that, such as extended ASCII, UTF-8 or binary data. To improve on that, I preprocess the 'plaintext' (which may also be an image or an executable) with base91 to obtain a well-defined character set regardless of the file type.

Finally, I need to cut a section of the POTP serving as the key for each respective message, and to subsequently shred this section so it cannot be used again. All of that is easily done by a short shell script. Let's examine the essential ingredients of this script (in pseudo script code, for the actual one, see below):

csprng-generate -n 4G | tr -cd '[:graph:]' | fold -w 65 > POTP

Creates a stream of random numbers with a total volume of 4 GB. The binary stream is filtered through 'tr', letting pass only printable ASCII characters. These characters are then arranged in lines of 65 characters length and saved as our POTP. On my desktop, this command takes about 35 s to complete.
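The same pipeline can be sketched in Python to show what happens at each stage. Note the assumptions: os.urandom (the operating system's generator) merely stands in for csprng-generate and is a different generator, and make_pad is a name invented for this sketch. Discarding unwanted bytes instead of mapping them onto the alphabet is what keeps all characters equally likely.

```python
import os


def make_pad(n_lines: int, width: int = 65) -> str:
    """Return n_lines lines of random printable, non-blank ASCII characters."""
    need = n_lines * width
    chars = bytearray()
    while len(chars) < need:
        chunk = os.urandom(4096)
        # Keep only the 94 characters '!'..'~', as tr -cd '[:graph:]' does.
        chars.extend(b for b in chunk if 33 <= b <= 126)
    text = chars[:need].decode("ascii")
    # Arrange the characters in lines of equal width, as fold -w 65 does.
    return "".join(text[i:i + width] + "\n" for i in range(0, need, width))
```

Only 94 of the 256 possible byte values survive the filter, so roughly 63% of the raw random stream is thrown away.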

Let's now look at the encryption part (decryption is largely the same, just the order is different):

b91enc -w 65 PLAINTEXT -o ENCODEDTEXT

Encodes the PLAINTEXT with base91, and ensures that the ENCODEDTEXT has the same format as the POTP or the key.

tail -n NUMBER POTP > KEY

Generates the key (NUMBER is determined by the length of the ENCODEDTEXT, see the shell script for details).

vig -e ENCODEDTEXT KEY > ENCRYPTEDTEXT

Encrypts.

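truncate -s -KEYSIZE POTP

Removes the KEY from the POTP (KEYSIZE stands for the size of the KEY in bytes; the leading minus sign tells truncate to shrink the file by that amount).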

All of that is pretty straightforward except for the strange fact that I take the KEY from the bottom of the file rather than from the top, which would be the intuitive choice for a 'pad'. The reason for this choice is very simple: removing the key from the bottom takes 1 ms, but removing it from the top takes several seconds, depending on the speed of the hard disk. This asymmetry is an inevitable consequence of the way the file system stores data. With the present 'backward' scheme, the total time for the four steps above amounts to 30 ms for an A4 text page and 450 ms for a 5 MB pdf.
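The 'backward' scheme is easy to illustrate in Python. The following is a sketch only, with cut_key being a name I made up for it; it works on raw bytes rather than on lines, unlike the tail call above. Cutting from the end merely changes the recorded length of the file, whereas cutting from the front would mean rewriting everything that remains.

```python
import os


def cut_key(pad_path: str, n_bytes: int) -> bytes:
    """Read the last n_bytes of the pad, then shrink the pad by that amount."""
    size = os.path.getsize(pad_path)
    if n_bytes > size:
        raise ValueError("pad is exhausted")
    with open(pad_path, "r+b") as pad:
        pad.seek(size - n_bytes)          # jump straight to the key material
        key = pad.read(n_bytes)
        pad.truncate(size - n_bytes)      # only the file length changes
    return key
```

Each call returns fresh key material and leaves a shorter pad behind, so the same section can't be handed out twice.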

Let's examine the files created by these steps using John Walker's entropy estimator ent. For the 95 characters used here, a perfectly random and infinite sequence of numbers is expected to have an entropy of 6.570 bit per character. For our POTP as well as for the plaintext encrypted with it, ent arrives at an entropy of 6.569 bits per character.
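The quantity in question is the Shannon entropy, and for a quick cross-check it can be computed in a few lines of Python. This is a sketch of the formula only, not of ent itself, and entropy_per_char and upper_bound are names chosen for illustration.

```python
import math
from collections import Counter


def entropy_per_char(data: bytes) -> float:
    """Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


upper_bound = math.log2(95)  # about 6.570 bits for 95 equally likely characters
```

A file in which all 95 characters occur equally often reaches the upper bound exactly, while any imbalance in the character frequencies lowers the value.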

That's pretty satisfactory, but let's have a closer look. Deviations from randomness are often plainly evident in a visual inspection of the data in question. For example, we can examine the distribution of characters in our POTP using

ent -c POTP | head -97 | tail -94 | awk '{print $3}'> hist.dat

Plotting these values yields a histogram of the character distribution:

Looks almost entirely equally distributed, as we had hoped. However, in a finite random sequence, fluctuations do occur which can be made visible by plotting the deviation of each value from the mean of the distribution. Besides the fact that the values are small (the standard deviation is indeed only 0.02%), these deviations do not form any obvious pattern, but look indeed random.

We can perform the same analysis for the data after being encrypted by the Vigenère routine:

The fluctuations are larger (standard deviation 0.4%) since the sample size is significantly smaller, but once again the data do not exhibit any obvious deviations from a random distribution.

To analyze these data, by the way, I've used only the ipython console (ipython --pylab) and a few pylab commands:

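a = loadtxt('hist.dat')    # occurrence counts, one per character
n = arange(94)             # index of each of the 94 characters
bar(n, a)                  # histogram of the character distribution
bar(n, a - mean(a))        # deviation of each count from the mean
std(a) / mean(a)           # relative standard deviation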

For the sake of clarity and documentation, I've attached an archive containing the shell script hfwcc and the Pascal routine vig.pas. For the former, I've disabled truncation as this feature gets in the way of local testing (it should only be used with two independent POTPs). For convenience, I've included a 64 bit binary of the latter compiled with fpc 2.6.2. The encoding tool base91, the random number generator csprng and the analysis program ent are all available in the AUR (Arch User Repository). I'd expect you'll also get them for Debian, but users of other distributions will probably have to compile them themselves.

Not compatible with Firefox

The above message has indeed become less frequent compared to early versions of Firefox, but certain extensions are prone to break again and again. Particularly the most important one: Pentadactyl.

The Dactyls are strange people since they usually fix the problem within a couple of days, but then forget to change the maxVersion entry in the install.rdf. My usual routine then looks like this: I open a terminal, navigate to the folder containing the pentadactyl-nightly.xpi, and open it using vim (yes, vim can open zip archives directly).

vim pen〈TAB〉〈ENTER〉
/install〈ENTER〉〈ENTER〉
/maxV〈ENTER〉
wwlR
8           /* at least as high as the current version */
〈ESC〉
ZZ
ZZ

Dragging the repaired xpi into Firefox's add-on manager concludes this procedure, which never takes me more than 30 s. There are other ways to deal with that issue, but I prefer this one.
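One of those other ways is a small script. The Python sketch below rewrites the archive and replaces the maxVersion entry on the way. It assumes that install.rdf stores the value as an em:maxVersion element rather than as an attribute, and the names bump_max_version and patch_xpi are made up for this example.

```python
import re
import zipfile


def bump_max_version(rdf_text: str, new_max: str) -> str:
    """Replace the content of every <em:maxVersion> element with new_max."""
    return re.sub(r"(<em:maxVersion>)[^<]*(</em:maxVersion>)",
                  lambda m: m.group(1) + new_max + m.group(2), rdf_text)


def patch_xpi(src, dst, new_max: str) -> None:
    """Copy the archive src to dst, rewriting install.rdf on the way."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "install.rdf":
                text = data.decode("utf-8")
                data = bump_max_version(text, new_max).encode("utf-8")
            zout.writestr(item, data)  # all other members are copied unchanged
```

A call such as patch_xpi("pentadactyl-nightly.xpi", "pentadactyl-fixed.xpi", "28.*") would then produce the repaired file, where the output name and the version string are of course only placeholders.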

PS: Right, I don't use Opera anymore.

Keeping Iceweasel up-to-date

I found it increasingly irritating that Crunchbang, being based on the current stable version of Debian (Wheezy), is afflicted with a stone-age Firefox, or Iceweasel as the Debilians insist on calling it. I decided to follow the recommended procedure from the Debian Mozilla page once again, but to no avail: apt would insist that I had the up-to-date version. Searching the web for fellow sufferers, I came across this thread on the Crunchbang forum. Jesus, I've never even looked at the preferences! And sure enough, the Waldorf repository had a pinning priority of 1001, meaning that it took absolute precedence no matter what. After decreasing this value to 990, I finally got what I had wanted all along: an up-to-date version of Firefox (okok, Iceweasel) with automated updates even on Debian stable.

It's really almost embarrassing, but I never even thought about these damned preferences. Well, that's what holidays are for. 😊

Flashcrash

I recently used Chromium with its up-to-date flash plugin (pepper) to play videos since the standard flashplayer (11.2) in Firefox kept crashing. I was tired of that, searched, and found that the settings of the flash plugin are defined in /etc/adobe/mms.cfg. I tested all permutations for these two critical options, and found that

OverrideGPUValidation=1
EnableLinuxHWVideoDecode=1

works just as well as the opposite, i.e., both options set to zero. In contrast, 0/1 and 1/0 both crash the plugin. Do I need to understand that? No. It only demonstrates that it's high time to get rid of flash. In the office, where I rarely watch videos 😉 , flash is already adequately substituted by shumway.

Chitty Chitty Bang Bang

New hardware always has the effect that everything else suddenly feels slow and outdated. After experiencing the ease and effortlessness with which my desktop handles everything I throw at it, my notebook (a Fujitsu Lifebook AH530) running OpenSuse 12.3 felt unresponsive and sluggish. Instead of updating to OpenSuse 13.1, I decided to install Archbang, a distribution I always wanted to run on one of my systems. Similar to Crunchbang which runs on my Mini, Archbang offers an Openbox powered desktop which is about as lightweight and snappy as it gets.

Archbang, however, is even more frugal and spartan than Crunchbang. For example, there's no out-of-the-box support for Bluetooth devices.

The Archlinux Wiki has detailed instructions as to the configuration of a Bluetooth mouse:

systemctl start bluetooth
systemctl enable bluetooth
bluetoothctl
 [bluetooth]# list
  Controller <cmac> BlueZ 5.5 [default]
 [bluetooth]# select <cmac>
 [bluetooth]# power on
 [bluetooth]# scan on
 [bluetooth]# devices
   Device <mmac> Name: Bluetooth Mouse
 [bluetooth]# trust <mmac>
 [bluetooth]# pairable on
 [bluetooth]# pair <mmac>
 [bluetooth]# connect <mmac>

After that, you only need to create a new udev rule to activate the mouse upon a reboot:

vim /etc/udev/rules.d/10-local.rules
# Set bluetooth power up
ACTION=="add", KERNEL=="hci0", RUN+="/usr/bin/hciconfig hci0 up"

All that works well until I suspend the session by closing the lid of the laptop. After that, the bluetooth mouse is inactive until I reboot the system.

Turns out that other people using Arch have the same problem. They even file bug reports. Well, what's more, people using other distributions have this problem too, and they fill out reports on the respective (Ubuntu, Fedora) platform as well.

There's even a fix, but for reasons I don't understand it didn't even make it into the recently released 3.12. What a bummer. Update: The fix has been included in 3.12.4. 😊

Apart from this, everything's just perfect. In the spirit of the distro, I try to use lightweight applications. For example, I use aarchup, guake and geany instead of yapan, kuake and texmaker or kile. Other than that, I just moved tint2 to the top and changed the conkyrc, but I even kept the wallpaper.

Desktop screenie

Image size

The image in my last entry was taken with Awesome Screenshot, a useful extension for Firefox. It offers some basic annotation tools and allows the user to save the edited image as portable network graphics (png). The resulting file had a size of 742 kB, too large for the web: it would noticeably delay the initial page load even on fast connections.

Minimizing the size of an image file without compromising its quality can be a tricky business. In this particular case, for example, there's not much room for further loss-less compression. Whatever the format (png, jbig, plasma,...), the minimum size of the resulting file is about 540 kB, still too large for my purpose.

What about saving it in a lossy format, such as jpeg?

Issuing

convert eilsbrunn.png -quality 100% eilsbrunn.jpeg

results in a file of 325 kB size. Not bad, but we can do better.

Both pngquant and pngnq reduce the file size drastically (to between 210 and 220 kB) by mapping the 24 bit colors to an 8 bit palette. There are no visual differences to the original, and this approach is thus usually my default way to reduce the file size of images for the web.

However, depending on the nature of the image, we may do better by converting it to the lossy jpeg format and accepting a slight (essentially invisible) loss of quality. Repeating the above call to convert without the explicit quality statement, for example, results in a 124 kB file (the defaults are usually sensible). Even jpgcrush can reduce this size further only by a few kB.

I thus usually follow this simple workflow for producing images for the web:

(a) If the image is a line graphics, I save it as svgz, which guarantees minimal file sizes while retaining the maximum possible quality. The workflow ends here when I want to use the image for publications.
(b) If the image contains both line and photographic elements, I try both palette quantization and lossy jpeg compression:

pngquant original.png quantized.png
convert original.png converted.jpeg

I then simply look at the images and decide on the basis of visual impression and file size.

Update 02.07.20: Using the webp format, we can get even smaller files than with jpeg without introducing visible compression artefacts.

convert eilsbrunn.png -quality 90% eilsbrunn_90.webp

results in a file of 100 kB size that is virtually indistinguishable from the original.

Wurstsalat

I studied physics in Regensburg and Munich, which are cities in Bavaria, the land of beer and beer gardens. I soon learned to love the live-and-let-live philosophy characterizing the traditional beer gardens in these cities.

During the first years of my studies, I couldn't afford to live downtown. However, I was fortunate to find a place for me, the girl who'd chosen me and our two cats in Eilsbrunn, a scenic village situated in the Black Laber valley 15 km west of Regensburg. And as it happened, our home (green arrow) was 200 m from one of the most picturesque beer gardens I have ever seen, the Röhrl Bräu garden:

map of Eilsbrunn

We often went there already in the morning, sans cat, but armed with the quantum mechanics script and enough determination for a lifetime. After a couple of beers (even the smallest one, called Halbe, is essentially a pint) the edges would soften somewhat, and we would have grand insights and great visions. With the chestnut trees protecting us from sunshine, rain and even thunderstorms, we would sit there for hours, drink our beer, eat some snacks, and watch with mild amusement as tourists arrived in Bentleys with HH plates and ordered “small beers”.

We were superior. We were Bavarian. We just didn't speak the lingo.

But we ate what the natives ate, and liked it a lot. In fact, Obazda and Wurstsalat, being comparatively low-priced, were a major component of my dietary plan for many years of my student life. Particularly the latter, of which a thousand variations exist. I always liked the "Swiss" or "Elsass" variant with Emmental cheese best. The following recipe is a distillation of the best Wurstsalat recipes I've experienced in Bavarian beer gardens.

500 g Lyoner or Regensburger sausage
200 g Swiss Emmental
5 pickled gherkins
5 radishes

Cut the sausage into sticks, dice the cheese and gherkins, and slice the radishes.

3 shallots, cut into rings
5 tbsp finely chopped chives
10 tbsp white wine vinegar
5 tbsp rapeseed oil
2 tsp salt
2 tsp black pepper
2 tsp mustard
1 tsp sugar

Mix everything well. Chill for 24 h.

Serve with fresh bread and a cold beer. 😉

Word compatible

Working in a publicly funded research institute instead of a university has many advantages. The most obvious ones are decent funding and the lack of any teaching obligations. Disadvantages, however, also exist. In particular, we have to prepare an annual report in which we present the main activities to our advisory board, guests and third-party funding agencies.

For the past 20 years, we have prepared and produced our annual report ourselves using LaTeX. The report grew in volume until reading and correcting all contributions finally became unmanageable. We thus decided last year to focus on our most important results. Furthermore, we outsourced the layout of the report to an external media agency to be able to concentrate on the content. We then supplied the content in the form of text files enriched with LaTeX directives and graphics files in postscript format.

The first report produced in this way was indeed quite presentable and left a favorable impression on most people. This year, however, we were told that the media agency "had severe difficulties with converting LaTeX", and were "urged to produce conventional word-compatible files."

"Conventional word-compatible files". There's a whole world of ignorance in this short statement. Evidently, the agency employs people with a good sense for color and arrangement, but no idea about technical issues.

Because "converting LaTeX" is actually straightforward:

pandoc -s source.tex -o result.docx

transforms a standard LaTeX file to OOXML.

Complicated equations may not be converted successfully in this way. In this case, it is better to export to OpenDocument:

pandoc -s source.tex -o result.odt

Now just mark those equations not converted and click on the π-icon of the LibreOffice extension TexMaths. Select svg as format. Then save as docx.

Existing vector graphics are best converted to bitmaps

pdftocairo -png -singlefile -r 1200 image.pdf image

and then imported into the "word-compatible file".

There are no "severe difficulties". That's hipster nonsense.

24dd

All my online banking is done in a virtual machine in which I run a spartan and minimal Archlinux. No AUR there, no experiments, just a vanilla up-to-date Linux with a vanilla up-to-date browser. I've never had any problems, but today I was greeted by a bizarre message upon the required reboot after a kernel update. I first checked if the problem is caused by today's update:

Obviously, it is. Since 'startx' is just a shell script, I opened it in vim, and discovered that the shebang is preceded by 24 empty lines. A search for that revealed bug reports for Red Hat and Archlinux. The comment of Allan McRae is noteworthy: "Using a C preprocessor as a fancy sed on a shell script falls in the category of not my problem." 😄

Anyway, a '24dd ZZ' solved the problem for me.
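The reason the empty lines matter is that the kernel only honours a shebang if '#!' are the very first two characters of the file. For those who'd rather script the repair than count lines, here is a minimal Python sketch; strip_leading_blank_lines is a name made up for this example, and it works on the text of the script rather than on the file itself.

```python
def strip_leading_blank_lines(script: str) -> str:
    """Drop empty lines at the top so that the shebang is on line one again."""
    lines = script.splitlines(keepends=True)
    while lines and not lines[0].strip():
        lines.pop(0)
    return "".join(lines)
```

Blank lines further down in the script are left alone, so the result is the same as deleting the first 24 lines by hand.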

PS: 'pacman -Qo' only works for locally installed packages, and is thus analogous to 'dpkg-query' for Debian-based systems. 'pkgfile' is needed for nonlocal packages.