Representative surveys

Statistical surveys are a standard tool of sociology, and have been the subject of extensive research. In the hands of professionals, the results of these surveys can be surprisingly accurate. As a result, surveys have attained the status of the oracle of Delphi, and people fervently believe in them. Naturally, this development has made surveys an attractive tool for manipulating public opinion. The standard way to do this is to load the questions with a moral obligation. Don't you agree that the internet should be regulated to deprive extremists of their safe spaces online? No? Really, what a kind of person are you? Don't you ever think of the children?

However, unexpected results of surveys do not always have sinister reasons, but may instead simply reflect the incompetence of the inquirer. In particular, the most elementary rule for designing a survey is frequently forgotten: namely, that those taking part in the survey have to understand the questions. Sounds obvious, doesn't it? But is it?

An example: the recent news of heise online that only(!) 16% percent of all Germans encrypt their emails (Umfrage: Nur 16 Prozent der Deutschen verschlüsseln ihre E-Mails). This survey was conducted by Convios Consulting on behalf of United Internet (UI), one of the largest internet and mail providers in Germany.

UI claims that about 750,000 of their users have generated PGP key pairs. That's a very impressive number, particularly since according to UI, only 4.7 million keys “exist” worldwide. The UI users would thus account for 16% of all PGP keys. Doesn't that demonstrate very nicely that UI's encryption initiative introduced in August 2016 is highly successful?

Well, the whole reason for the survey was to create exactly this impression. I have no doubts that the numbers quoted above are correct, but what do they mean?

First of all, the number reported for the existing keys worldwide only accounts for the keys that have been deposited on key servers. Nobody can estimate how many keys have actually been generated or are in use. That's quite different in the case of the UI encryption scheme, which is based on mailvelope, and the storage of the public key of the user in a database located on a UI server. The number given above is thus the total number of UI customers with a PGP key, unless they use a separate key in a stand-alone MUA (of which I know two 😉).

There are about 40 million email users in Germany. According to UI, about half of them use GMX or WEB.DE, which seems reasonable as UI is reported to have close to 20 million customers. Now, let's suppose that all UI customers who have generated a key are also actually using it to encrypt their mail. In this unlikely case, 3.75% of all UI users would encrypt their mail, much less than the “only 16%” of their survey. Obviously, that must mean that 28.25% of the other 20 million email users in Germany, who are mostly customers of the German Telekom, Google, and Microsoft, encrypt their mail. Right?

Of course not. Try to ask arbitrary Gmail users if they encrypt their mail. 84% will look at you with with blank eyes, but 16% will recognize the word and confirm that they do, YES! Ask them afterwards if they know the difference between transport and end-to-end encryption. I guarantee that you will soon get tired of asking people because even after several hundreds you won't find a single one who can answer your second question...

How many people do encrypt their mails? I don't think there exist any bona fide surveys on that topic. I can only provide anecdotal evidence with very limited statistical significance. On the other hand, I've been a serious advocate of end-to-end-encryption since about 15 years. I've written tutorials and motivated many of my personal contacts to use end-to-end encryption in email and messenging. Well, some would say I forced them at gunpoint. But that would be exaggerated...

I have currently 49 personal contacts with public PGP keys, and 16 business contacts. That doesn't sound too bad, does it? However, 17 and 4 of these keys are expired, leaving 32 and 12, respectively. Subtracting keys whose pass phrases have been forgotten by their users or were otherwise disposed of leaves 19 and 7. Some of my contacts have passed away, are retired, or I've simply lost touch, leaving in the end 4 and 5 with which I can, in principle, exchange end-to-end encrypted mails. Actually, however, there are only three persons with whom I regularly exchange encrypted mails: my patent attorney at work and my fellow PdeS (that's why they have that label) in actual life.

Three out of 65 with an actively used key, but of how many without any clue what that even means? I don't want to spent time on the question of how I could count the number of unique addresses in my mail folders over the past few years. But obviously, this number would be in the several hundreds. In other words, the total percentage of people employing end-to end-encryption in my emails is way lower than 1%. And if I wouldn't be interested in these kind of things, and if I wouldn't be a scientist, this percentage would be exactly zero. Not 16%. Not 0.16%. Zero.

Can we find out how many people really encrypt their mails by a survey? Not really. If less than 100 ppm of all people encrypt (which is the number I find most plausible), we would need a mega-survey over 50000 people to include at least 5 people who actually do encrypt, and that's never going to happen. And don't let them tell you that the rules of statistics somehow don't apply there and representative surveys can answer all of these questions as if by magic. That's bullshit. All of it.