The fastest search, or: find your password

A couple of weeks ago, a monstrous set of login data was passed around in the dark web. Now known as collections #1 – #5, the files contain an estimated 2.2 billion of unique email address/password combinations. And that means cracked passwords, mind you, not only hashes.

Some of my account data have been leaked ages ago by Adobe and Dropbox (at that time, I believed in simple – 8-digit – passwords for test accounts). Incidentally, I had already deleted these accounts when the breach became public.

Currently, all of my accounts are secured by an at least 25-digit random password with an actual entropy not lower than 90 bytes, which is essentially impossible to brute force even from unsalted SHA1 hashes. Hence, none of my accounts should show up in the collections mentioned above. If one did, it would show that the password had been saved in plain text, and I would react correspondingly by deleting this account.

The most comprehensive list of password hashes (including collection #1) has been assembled by Troy Hunt. We can download and search this list very easily as shown in the following.

$ cd /hdd/Downloads/pwned
$ wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v4.7z
$ cp pwned-passwords-sha1-ordered-by-hash-v4.7z /ssd/temp/pwned
$ cd /ssd/temp/pwned/
$ unp ppwned-passwords-sha1-ordered-by-hash-v4.7z
$ wc -l <pwned-passwords-sha1-ordered-by-hash-v4.txt
551509767
$ awk -F: '{ SUM += $2 } END { print SUM }' pwned-passwords-sha1-ordered-by-hash-v4.txt
3344070078

551 million passwords, and 3.344 billion accounts: holy cow. What's the fastest way to search for a single string in such a huge file? C't 5/2019 has actually an entire article on this issue, complete with a github repository.

Pina Merkert points out that grep, the standard tool for searching text files under Linux, does not exploit the fact that Troy Hunt offers the download also in a version where the passwords are ordered by hash (that's the version I've downloaded above). The linear search performed by grep is indeed rather slow:

$ time echo -n "111111" | sha1sum | awk '{print toupper($1)}' | grep -f - pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m48.227s
user    0m30.939s
sys     0m5.752s

The possibility to order the data allows a binary search to be performed, which is potentially orders of magnitude faster – O(log[n]) vs. O(n) – than a linear search. Pina's python script is indeed dramatically faster:

$ time ./binary_search.py '111111'
Searching for hash 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D of password "111111".
Password found at byte  5824044773: "3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220"
Your password "111111" was in 3093220 leaks or hacked databases! Please change it immediately.

real    0m0.042s
user    0m0.035s
sys     0m0.007s

But my one-liner is four times faster than her 57-line script: ;)

$ time look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m0.011s
user    0m0.009s
sys     0m0.005s

'Look', by the way, is a binary search tool, part of the util-linux package and thus present on any Linux installation (and also on the Windows subsystem for Linux, I guess).

With the help of 'look', one can also very easily go through lists of passwords – let's take these completely arbitrary examples:

less topten.dat

123456
123456789
qwerty
password
111111
12345678
abc123
1234567
password1
12345

$ time while read -r password; do look $(echo -n "$password" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt; done <topten.dat

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079
6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
8CB2237D0679CA88DB6464EAC60DA96345513964:2333232

real    0m0.053s
user    0m0.048s
sys     0m0.023s

If you ever had a shred of doubt that humanity has a glorious future, look at these 10 ingenious passwords, which secure the 54 million accounts of our most brilliant minds!