My take on the MySpace dump

About a year ago, a full MySpace data breach dump surfaced on the average-Joe Internet. This huge dump (15 GiB compressed) is very interesting because many user accounts have two different password hashes. The first hash is non-salted, and represents a lower-cased, striped to 10 characters, version of the user original password. The second hash, not always present, is salted, and represents the full original user password.
Hence, the dump content can be summarized by this :

id : email : id/username : sha1(strtolower(substr($pass, 0, 9))) : sha1($id . $pass)

It contains about 116.8 million unique unsalted sha1 hashes, and about 68.5 million salted sha1 hashes.

Of course, people who crack passwords will tell you that the unsalted hashes have no value, because then don't represent real user passwords. They are right. But when you crack those hashes you have a very interesting password candidate to crack the salted hashes. And this is very interesting!

After you cracked most of unsalted hashes, the question is: how do you proceed to crack their salted counterpart? Spoiler alert: hashcat on an Nvidia GTX 1080 is more than 200 times slower than John the Ripper on a single CPU core on this very particular job.

I'm a long time John the Ripper user (on CPU), and I'm pretty fan of it's intelligent design. Working on CPU requires wits and planing. And the more versatile your software is, the more efficient you can be. Hashcat sits on the other end of the spectrum: huge raw power thanks to GPU optimization. But it lacks the most sensible attack mode: "single".

Single mode works by computing password candidates from GECOS data like login, user name, email address, etc. So it makes sense to provide a full password file to JtR, instead of just naked hashes. These passwords metadata are very efficient when you want to create contextual password candidates.
The password retrieved from unsalted hash is more than a clue to retrieve its salted counterpart, in many case it's also the real user password. And when it's not, simple variations handled by mangling rules will do the trick.
You've probably guessed by now: I've created a file where password cracked from non-salted hashes are paired with the corresponding salted hash. The known password impersonate the user login, so that with proper tuning John the Ripper will try only this particular candidate against the corresponding salted hash.
Because of a bug in JtR, I was not able to use this attack on a huge file, I had to split it into small chucks. Nevertheless, I was able to retrieve 36708130 passwords in just 87 minutes. On a single CPU core.
In order to find those passwords with hashcat, I had to rely on a wordlist attack with on a GTX 1080. It took about 14 days to complete. No matter how fast your GPU is (about 1000 MH/s in that particular case), it will brainlessly try every single candidate on every single hash. Remember hashes are salted, so each one requires its own computation. If your file is 60M hashes long, then your GPU will only try 16.6 candidates per second (1000/60). It's very slow and inefficient.

Hashcat performance on a file containing 50% of total hashes.

Sometime, brain is better than raw power. Thank you John ;)

More on this topic:
https://hashes.org/forum/viewtopic.php?t=1715
http://cynosureprime.blogspot.fr/2016/07/myspace-hashes-length-10-and-beyond.html

Cognitive Overhead

My take on the MySpace dump

Laisser un commentaire