A couple of weeks ago, a monstrous set of login data was passed around in the dark web. Now known as collections #1 – #5, the files contain an estimated 2.2 billion of unique email address/password combinations. And that means cracked passwords, mind you, not only hashes.

Some of my account data have been leaked ages ago by Adobe and Dropbox (at that time, I believed in simple – 8-digit – passwords for test accounts). Incidentally, I had already deleted these accounts when the breach became public.

Currently, all of my accounts are secured by an at least 25-digit random password with an actual entropy not lower than 100 bits, which is essentially impossible to brute force even from unsalted SHA1 hashes. Hence, none of my accounts should show up in the collections mentioned above. If one did, it would show that the password had been saved in plain text, and I would react correspondingly by deleting this account.

The most comprehensive list of password hashes (including collection #1) has been assembled by Troy Hunt. We can download and search this list very easily as shown in the following.

$cd /hdd/Downloads/pwned$ wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v4.7z
$cp pwned-passwords-sha1-ordered-by-hash-v4.7z /ssd/temp/pwned$ cd /ssd/temp/pwned/
$unp ppwned-passwords-sha1-ordered-by-hash-v4.7z$ wc -l <pwned-passwords-sha1-ordered-by-hash-v4.txt
551509767
$awk -F: '{ SUM +=$2 } END { print SUM }' pwned-passwords-sha1-ordered-by-hash-v4.txt
3344070078


551 million passwords, and 3.344 billion accounts: holy cow. What's the fastest way to search for a single string in such a huge file? C't 5/2019 has actually an entire article on this issue, complete with a github repository.

Pina Merkert points out that grep, the standard tool for searching text files under Linux, does not exploit the fact that Troy Hunt offers the download also in a version where the passwords are ordered by hash (that's the version I've downloaded above). The linear search performed by grep is indeed rather slow:

$time echo -n "111111" | sha1sum | awk '{print toupper($1)}' | grep -f - pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m48.227s
user    0m30.939s
sys     0m5.752s


The possibility to order the data allows a binary search to be performed, which is potentially orders of magnitude faster – O(log[n]) vs. O(n) – than a linear search. Pina's python script is indeed dramatically faster:

$time ./binary_search.py '111111' Searching for hash 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D of password "111111". Password found at byte 5824044773: "3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220" Your password "111111" was in 3093220 leaks or hacked databases! Please change it immediately. real 0m0.042s user 0m0.035s sys 0m0.007s  But my one-liner is four times faster than her 57-line script: ;) $ time look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

real    0m0.011s
user    0m0.009s
sys     0m0.005s


'Look', by the way, is a binary search tool, part of the util-linux package and thus present on any Linux installation (and also on the Windows subsystem for Linux, I guess). Unfortunately, several distributions (such as Debian and Ubuntu) manage to ship a broken version of look since about 10 years. There's an unofficial patched version available on GitHub if you'd like to try.

With the help of 'look', one can also very easily go through lists of passwords – let's take these completely arbitrary examples:

less topten.dat

123456
123456789
qwerty
111111
12345678
abc123
1234567
12345

$time while read -r password; do look$(echo -n "$password" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt; done <topten.dat

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079