My take on the MySpace dump

About a year ago, a full MySpace data breach dump surfaced on the average-Joe Internet. This huge dump (15 GiB compressed) is very interesting because many user accounts have two different password hashes. The first hash is non-salted, and represents a lower-cased, striped to 10 characters, version of the user original password. The second hash, not always present, is salted, and represents the full original user password.
Hence, the dump content can be summarized by this :

id : email : id/username : sha1(strtolower(substr($pass, 0, 9))) : sha1($id . $pass) 

It contains about 116.8 million unique unsalted sha1 hashes, and about 68.5 million salted sha1 hashes.

Of course, people who crack passwords will tell you that the unsalted hashes have no value, because then don't represent real user passwords. They are right. But when you crack those hashes you have a very interesting password candidate to crack the salted hashes. And this is very interesting!

After you cracked most of unsalted hashes, the question is: how do you proceed to crack their salted counterpart? Spoiler alert: hashcat on an Nvidia GTX 1080 is more than 200 times slower than John the Ripper on a single CPU core on this very particular job.

I'm a long time John the Ripper user (on CPU), and I'm pretty fan of it's intelligent design. Working on CPU requires wits and planing. And the more versatile your software is, the more efficient you can be. Hashcat sits on the other end of the spectrum: huge raw power thanks to GPU optimization. But it lacks the most sensible attack mode: "single".

Single mode works by computing password candidates from GECOS data like login, user name, email address, etc. So it makes sense to provide a full password file to JtR, instead of just naked hashes. These passwords metadata are very efficient when you want to create contextual password candidates.
The password retrieved from unsalted hash is more than a clue to retrieve its salted counterpart, in many case it's also the real user password. And when it's not, simple variations handled by mangling rules will do the trick.
You've probably guessed by now: I've created a file where password cracked from non-salted hashes are paired with the corresponding salted hash. The known password impersonate the user login, so that with proper tuning John the Ripper will try only this particular candidate against the corresponding salted hash.
Because of a bug in JtR, I was not able to use this attack on a huge file, I had to split it into small chucks. Nevertheless, I was able to retrieve 36708130 passwords in just 87 minutes. On a single CPU core.
In order to find those passwords with hashcat, I had to rely on a wordlist attack with on a GTX 1080. It took about 14 days to complete. No matter how fast your GPU is (about 1000 MH/s in that particular case), it will brainlessly try every single candidate on every single hash. Remember hashes are salted, so each one requires its own computation. If your file is 60M hashes long, then your GPU will only try 16.6 candidates per second (1000/60). It's very slow and inefficient.

Hashcat performance on a file containing 50% of total hashes.

Sometime, brain is better than raw power. Thank you John ;)

More on this topic:

Self-hosted password manager: installing Passbolt on FreeBSD

Arthur Duarte CC-BY-SA-4.0

Arthur Duarte CC-BY-SA-4.0

Password managers, or password safes, are an important thing these days. With the constant pressure we (IT people) put our users under to setup a different password for every single registration/application/web site, it's the best, if not only, way to keep track of these secrets. On one hand, the isolated client-side software can be really powerful and/or well integrated with the OS or the software ecosystem of the user, but it lacks the modern touch of "cloud" that makes your data available anywhere and anytime. On the other hand, a full commercial package will come with client for every device you own, and a monthly fee for cloud synchronization, but you have absolutely no control over your data (just imagine that tomorrow the company you rely on goes bankrupt).
Better safe than sorry: I don't rely on cloud services. It comes at a cost, but it's quite rewarding to show the world another way exists.
Disclaimer: I don't give a sh*t about smartphones, so my needs are computer-centric.

In order to store passwords, and more generally speaking "secrets", in such a way that I can access them anywhere/anytime, I've tried Passbolt. Passbolt is an OpenSource self-hosted password manager, written in PHP/Javascript with a database back end. Hence, install and config are not for the average Joe. On the user side it's quite clean and surprisingly stable for alpha software. So once a LAMP admin has finished installing the server part, any non-skilled user can register and start storing passwords.

Enough chit-chat, let's install.

My initial setup was a vanilla FreeBSD 10.3 install, so I've had to make everything. I won't replay every single step here, especially on the configuration side.


pkg install apache24
pkg install mod_php56
pkg install php56-gd
pkg install pecl-memcached
pkg install mysql57-server
pkg install pecl-gnupg
pkg install git
pkg install php56-pdo_mysql
pkg install sudo
pkg install php56-openssl
pkg install php56-ctype
pkg install php56-filter

Everything else should come as a dependency.


Apache must allow .htaccess, so you'll have to put an AllowOverride All somewhere in your configuration. You must also load the Rewrite module. Also, go now for SSL (letsencrypt is free and supported). Non-SSL install of Passbolt are for demo purpose only.
Apache will also need to execute gnupg commands, meaning the www user needs an extended $PATH. The Apache startup script provided on FreeBSD sources Apache environment variables from /usr/local/sbin/envvars and this very file sources every /usr/local/etc/apache24/envvars.d/*.env, so I've created mine:

$ cat /usr/local/etc/apache24/envvars.d/path.env

You also need to tune your MySQL server. If you choose the 5.7, you must edit it's configuration. Just add the following line into [mysqld] section of /usr/local/etc/mysql/my.cnf:


This is due to a bug in Passbolt and could be useless in a not to distant future.

Install recipe:

You can now follow the install recipe at
Generating the GPG key is quite straightforward but you have to keep in mind that Apache's user (www) will need access to the keyring. So if you create this key and keyring with a different user, you'll have to mv and chown -R www the full .gnupg directory somewhere www can read it (outside DocumentRoot is perfectly fine).

Use git to retrieve the application code into appropriate path (according to your Apache config):

cd /usr/local/www
git clone

Edit php files as per the documentation.

Beware the install script: make sure you chown -R www the whole passbolt directory before using cake install.
On FreeBSD you won't be able to use su to run the install script, because www's account is locked. You can use sudo instead:

sudo -u www app/Console/cake install --no-admin

Same for the admin account creation:

sudo -u www app/Console/cake passbolt register_user -u -f Pat -l Pro -r admin

Follow the end of the install doc, and you should be ok. Install the Firefox passbolt extension into your browser, and point to your server.

I'm pretty happy with passbolt so far. I'll have to install a proper production server, with SSL and all, but features are very appealing, the passbolt team is nice and responsive, and the roadmap is loaded with killing features. Yeah BRING ME 2FA \o/.

Cracking passwords: testing PCFG password guess generator

Cracking passwords is a kind of e-sport, really. There's competition among amateurs and professionals "players", tools, gear. There are secrets, home-made recipes, software helpers, etc.
One of this software is PCFG password guess generator, for "Probabilistic Context-Free Grammar". I won't explain the concept of PCFG, some scientific literature exists you can read to discover all the math inside.
PCFG password guess generator comes as two main python programs: and Basic mechanism is the following:
- you feed with enough known passwords to generate comprehensive rules describing the grammar of known passwords, and supposedly unknown passwords too.
- you run, using previously created grammar, to create millions of password candidates to feed into your favorite password cracker (John the Ripper, Hashcat…).

In order to measure PCFG password guess generator's efficiency I've made few tests. Here is my setup:

  • Huge password dump, 117205873 accounts with 61829207 unique Raw-SHA1 hashes;
  • John the Ripper, Bleeding Jumbo, downloaded 20160728, compiled on FreeBSD 10.x;
  • PCFG password guess generator, downloaded 20160801, launched with Python 3.x;

Here's my methodology:

Of these 61829207 hashes, about 35 millions are already cracked. I've extracted a random sample of 2 millions known passwords to feed the trainer. Then I've used to create a 10 millions lines word list. I've also trimmed the famous Rockyou list to it's 10 millions first lines, to provide a known reference.

Finally, I've launched this shell script:

for i in none wordlist jumbo; do
  ./john --wordlist=pcfg_crckr --rules=$i --session=pcfg_cracker-$i --pot=pcfg_cracker-$i.pot HugeDump
  ./john --wordlist=ry10m --rules=$i --session=ry10m-$i --pot=ry10m-$i.pot HugeDump

No forking, I'm running on one CPU core here. Each word list is tested three times, with no word mangling rules, with defaults JtR rules, and finally with Jumbo mangling rules.

Some results (number of cracked passwords):

Rules PCFG Rockyou
none 4409362 2774971
wordlist 5705502 5005889
Jumbo 21146209 22781889

That I can translate into efficiency, where efficiency is Cracked/WordlistLength as percentage:

Rules PCFG Rockyou
none 44.1% 27.7%
wordlist 57.1% 50.1%
Jumbo 211.5% 227.8%

It's quite interesting to see that the PCFG generated word list has a very good efficiency, compared to Rockyou list, when no rules are involved. That's to be expected, as PCFG password guess generator has been trained with a quite large sample of known passwords from the same dump I am attacking.
Also, the PCFG password guess generator creates candidates that are not very well suited for mangling, and only the jumbo set of rules achieves good results with this source. Rockyou on the other hand starts quite low with only 27.7% but jumps to 50.1% with common rules, and finally defeats PCFG when used with jumbo rules.

On the word list side, Rockyou is known and limited: it will never grow. But PCFG password guess generator looks like it can create an infinite list of candidates. Let see what happens when I create a list of +110 M candidates and feed them to JtR.

Rules PCFG Efficiency
none 9703571 8.8%
wordlist 10815243 9.8%

Efficiency plummets: only 9.7 M hashes cracked with a list of 110398024 candidates, and only 1.1 M more when the set of rules "wordlist" is applied. It's even less beneficial than with a list of 10 M candidates (+1.3 M with "wordlist" rules, compared to "none").

On the result side, both word list with jumbo rules yields to +21 M cracked passwords. But are those passwords identical, or different?

Rules Total unique cracked Yield
none 6013896 83.7%
wordlist 8184166 76.4%
Jumbo 26841735 61.1%
Yield = UniqueCracked / (PcfgCracked + RockyouCracked)

A high yield basically says that you should run both word lists into John. A yield of 50% means that all pwd cracked thanks to PCFG are identical to those cracked with the Rockyou list.

As a conclusion, I would say that the PCFG password guess generator is a very interesting tool, as it provides a way to generate valid candidates pretty easily. You probably still need a proper known passwords corpus to train it.
It's also very efficient with no rules at all, compared to the Rockyou list. That might make it a good tool for very slow hashes when you can't afford to try thousands of mangling rules on each candidate.

Some graphs to illustrate this post:

every john session on the same graph

every john session on the same graph

every session, zoomed on the first 2 minutes

every session, zoomed on the first 2 minutes

Rules "wordlist" on both lists of candidates

Rules "wordlist" on both lists of candidates

Rules "none", both lists of candidates

Rules "none", both lists of candidates