Moving to Borgbackup

I used to have a fairly complicated backup setup, involving macOS Time Machine, rsync, shell scripts, ZFS snapshots, pefs, local disks, a server on the LAN, and a server 450 km away. It worked great, but I felt I could use a unified system that I could share across all my systems and that would let me encrypt data at rest.
Pure ZFS was a no-go: snapshot send/receive is very nice but it lacks encryption for data at rest (the transfer itself is protected by SSH encryption) and macOS doesn't support ZFS. Rsync is portable but does not offer encryption either. Storing data in a pefs vault is complicated and works only on FreeBSD.
After a while, I decided that I wanted to be able to store my encrypted data on any LAN/WAN device I own and also in the cloud, at a service provider. I read about BorgBackup, checked its documentation, found a Borg repository hosting provider with a nice offer, and decided to give it a try.

This is how I started using Borg with the hosting provider BorgBase.

Borg is quite simple, even though it does look complicated when you begin. BorgBase helps a lot, because you are guided all the way from SSH key management to the creation of your first backup. They will also help you automate backups with an almost-ready-to-use borgmatic config file.

Borg is secure: it encrypts data before sending it over the wire, and everything travels inside an SSH tunnel. So it's perfectly safe to use Borg to send your backups away to the cloud. The remote end of the SSH tunnel must have Borg installed too.

Borg is (quite) fast: it compresses and deduplicates data before sending. Only the first backup is a full one; every later backup sends and stores only changed files or parts of files.
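
To make the deduplication claim concrete, here is a toy sketch in Python. It is not how Borg is implemented: Borg uses content-defined chunking and an encrypted repository, while this example uses tiny fixed-size chunks and an in-memory dict. The names `ChunkStore`, `CHUNK_SIZE`, `backup` and `restore` are made up for the illustration. It only shows why a second backup of mostly unchanged data stores very little.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


class ChunkStore:
    """Toy repository: maps a chunk's SHA-256 digest to the chunk bytes."""

    def __init__(self):
        self.chunks = {}

    def backup(self, data: bytes):
        """Store data chunk by chunk; return (archive, number_of_new_chunks)."""
        archive = []
        new_chunks = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks never seen before cost storage (and transfer).
                self.chunks[digest] = chunk
                new_chunks += 1
            archive.append(digest)
        return archive, new_chunks

    def restore(self, archive):
        """Rebuild the original bytes from an archive (a list of digests)."""
        return b"".join(self.chunks[digest] for digest in archive)


store = ChunkStore()
first, stored_first = store.backup(b"AAAABBBBCCCCDDDD")
second, stored_second = store.backup(b"AAAABBBBXXXXDDDD")  # one chunk changed
print(stored_first, stored_second)
```

The first backup stores every chunk, the second one stores only the chunk that changed, and both archives can still be restored in full.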

Borg is cross-platform enough: it works on any recent/supported macOS/BSD/Linux.

Borg is not for the faint of heart: it's still command line, there are SSH keys to manage, it's really not the average Joe's backup tool. As rsync.net puts it: "You're here because you're an expert".

In the end, the only thing I'm going to miss about my former home-made backup system is that I could browse/access/read/retrieve the content of any file in a backup with just ssh, which was very handy. With Borg this ease of use is gone: I'll have to restore a file if I want to access it.

I won't detail all the nuts and bolts of Borg; lots of documentation exists for that. I would like to address a more organizational problem: doing backups is a must, but being able to leverage those backups is often overlooked.
I back up 3 machines with Borg: A (workstation), B (home server), C (distant server). I've set up borgmatic jobs to back up A, B and C once a day to the BorgBase cloud. Each job uses a dedicated SSH key and user account, a dedicated repository key, and a dedicated passphrase. I've also created similar jobs to back up A on B, A on C, B on C (but not Beyoncé).
Once you are confident that every important piece of data is properly backed up (borg/borgmatic job definition), you must make sure you are capable of retrieving it. That means that even if a disaster occurs, you still have in a safe place:

  • every repository URI
  • every user account
  • every SSH key
  • every repository key
  • every passphrase

Any good password manager can store this. It's even better if it's hosted (1password, dashlane, lastpass, etc.) so that it doesn't disappear in the same disaster that swallowed your data. Printing can be an option, but I would not recommend it for keys, unless you can encode them as QR codes for fast conversion back to digital format.

You must check from time to time that your backups are OK, for example by restoring a random file into /tmp and comparing it to the current file on disk. You must also attempt a restoration on a different system, to make sure you can properly access the repository and retrieve files on a fresh/blank system. You can for example create a bootable USB drive with BSD/Linux and Borg installed, to have a handy recovery setup ready to use in case of emergency.
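
The comparison step can be automated. Here is a minimal Python sketch, assuming the file has already been restored somewhere (for instance under /tmp) by whatever restore command you use. The function names and the temporary files in the demo are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def restored_matches(restored, current):
    """True when the restored copy and the file on disk have identical content."""
    return sha256_of(restored) == sha256_of(current)


# Demo on throwaway files standing in for "restored" and "current" copies.
with tempfile.TemporaryDirectory() as tmp:
    current = Path(tmp) / "current.txt"
    restored = Path(tmp) / "restored.txt"
    current.write_bytes(b"important data\n")
    restored.write_bytes(b"important data\n")
    same = restored_matches(restored, current)
    restored.write_bytes(b"corrupted\n")
    different = restored_matches(restored, current)

print(same, different)
```

A mismatch is not always bad news: a file modified since the last backup will legitimately differ, so pick files you know have not changed.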

Consider your threat model, YMMV, happy Borg-ing.

Short review of the KeyGrabber USB keylogger

KeyGrabber © keelog.com

A few days ago I bought a USB keylogger to use on my own computers (explanation in French here). Since then, as I sit in front of a computer more than 12 hours a day, I've had plenty of time to test it.
The exact model I've tested is the KeyGrabber USB MPC 8GB. I've had to choose the MPC model because both my current keyboards are Apple's Aluminum keyboards. They act as USB hubs, hence requiring some sort of filtering so that the keylogger won't try and log everything passing through the hub (mouse, usb headset, whatever…) and will get only what you type.
My setup is close to factory settings: I've just set LogSpecialKeys to "full" instead of "medium" and added a French layout to the keylogger, so that typing "a" will record "a", and not "q".

First of all, using the device on a Mac with a French Apple keyboard is a little bit frustrating: the French layout is for a PC keyboard, so typing alt-shift-( to get a [ will log [Alt][Sh][Up]. "[Up]"? Seriously? The only Macintosh layout available is for a US keyboard, so it's unusable here.

The KeyGrabber has a nice feature, especially on its MPC version, that allows the user to turn the device into a USB thumbdrive with a key combination. By default, if you press k-b-s, the USB key is activated and mounts on your system desktop. The MPC version allows you to keep using your keyboard without having to plug it into another USB port after activating the thumbdrive mode, which is great. You can then retrieve the log file, edit the config file, etc.
Going back to regular mode requires that you unplug and plug back the KeyGrabber.
Applying the "kbs" key combo needs some patience: press all three keys for about 5 seconds, wait about 15-20 seconds more, and the thumbdrive may show up. If it does not, try again. I've not tested it on Windows, but I'm not very optimistic, see below.

I'm using a rather special physical and logical setup on my home workstation. Basically, it's an ESXi hypervisor hosting a bunch of virtual machines. Two of these VMs make extensive use of PCI passthrough: dedicated GPU, audio controller, USB host controller. The different USB controllers are plugged into a USB switch, so I can share my mouse, keyboard, yubikey, USB headset, etc. between different VMs. Since the KeyGrabber is plugged in between the keyboard and the USB switch, it's shared between VMs too.
Unfortunately, for an unidentified reason, the Windows 10 VM completely loses its USB controller a few seconds after I've switched the USB devices from OSX to Windows. So from now on, I have to unplug the keylogger when I want to use the Windows VM, and that's a bummer. Being able to use a single device on my many systems was one of the reasons I opted for a physical keylogger instead of a piece of software.
Worse: rebooting the VM will not restore access to the USB controller, I have to reboot the ESXi. A real pain.

But in the end, it's the log file that matters, right? Well, it's a bit difficult here too, I'm afraid. I've contacted the support at Keelog, because way too often what I see in the log file does not match what I type. I'm not a fast typist, say about 50 to 55 words per minute. But it looks like that's too fast for the KeyGrabber, which will happily drop letters, up to 4 letters in a 6-letter word (typed "jambon", logged "jb").
Here is a made-up phrase I've typed as a test:

c'est assez marrant parce qu'il ne me faut pas de modèle pour taper

And here is the result as logged by the device:

cesae assez ma[Alt][Ent][Alt]
rrant parc qu'l ne e faut pa de modèl pour taper

This can't be good. Maybe it's a matter of settings, as some of them are not clearly described in the documentation, so I'm waiting for the vendor support to reply.

Overall, I'm not thrilled by this device. It's a 75€ gadget that won't work properly out of the box, and will crash my Win 10 system (and probably a part of the underlying ESXi). I'll update this post if the support helps me to achieve proper key logging.

My take on the MySpace dump

About a year ago, a full MySpace data breach dump surfaced on the average-Joe Internet. This huge dump (15 GiB compressed) is very interesting because many user accounts have two different password hashes. The first hash is unsalted, and represents a lower-cased version of the user's original password, truncated to 10 characters. The second hash, not always present, is salted, and represents the user's full original password.
Hence, the dump content can be summarized like this:

id : email : id/username : sha1(strtolower(substr($pass, 0, 10))) : sha1($id . $pass)

It contains about 116.8 million unique unsalted sha1 hashes, and about 68.5 million salted sha1 hashes.
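
As an illustration, here is how the two hash fields described above could be computed in Python. This is a sketch based only on the description in this post (lower-cased, first 10 characters, id used as salt prefix); the UTF-8 encoding and the function names are my assumptions, and the sample values are invented.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """SHA-1 of the password truncated to 10 characters and lower-cased."""
    return hashlib.sha1(password[:10].lower().encode("utf-8")).hexdigest()


def salted_hash(user_id: str, password: str) -> str:
    """SHA-1 of the user id immediately followed by the full password."""
    return hashlib.sha1((user_id + password).encode("utf-8")).hexdigest()


# Two different passwords sharing the same first 10 characters (case aside)
# collapse to the same unsalted hash...
a = unsalted_hash("CorrectHorse99")
b = unsalted_hash("correcthorXYZ")
print(a == b)

# ...while the salted hash depends on the full password and on the user id.
print(salted_hash("1001", "CorrectHorse99") == salted_hash("1002", "CorrectHorse99"))
```

This is why a cracked unsalted hash is only a candidate: it tells you the first 10 characters without their case, not necessarily the real password.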

Of course, people who crack passwords will tell you that the unsalted hashes have no value, because they don't represent real user passwords. They are right. But when you crack those hashes you get a very good password candidate for cracking the salted hashes. And this is very interesting!

After you've cracked most of the unsalted hashes, the question is: how do you proceed to crack their salted counterparts? Spoiler alert: hashcat on an Nvidia GTX 1080 is more than 200 times slower than John the Ripper on a single CPU core on this very particular job.

I'm a long-time John the Ripper user (on CPU), and I'm quite a fan of its intelligent design. Working on CPU requires wits and planning. And the more versatile your software is, the more efficient you can be. Hashcat sits on the other end of the spectrum: huge raw power thanks to GPU optimization. But it lacks the most sensible attack mode: "single".

Single mode works by computing password candidates from GECOS data like login, user name, email address, etc. So it makes sense to provide a full password file to JtR, instead of just naked hashes. This password metadata is very effective when you want to create contextual password candidates.
The password retrieved from the unsalted hash is more than a clue to retrieve its salted counterpart: in many cases it's also the real user password. And when it's not, simple variations handled by mangling rules will do the trick.
You've probably guessed by now: I've created a file where passwords cracked from unsalted hashes are paired with the corresponding salted hashes. The known password takes the place of the user login, so that with proper tuning John the Ripper will try only this particular candidate against the corresponding salted hash.
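
Here is a rough Python sketch of how such a pairing file could be built, and cut into smaller pieces. The "login:hash" line layout is the classic password-file shape that John reads; everything else is an assumption for the example: the dict inputs, the function names, the chunk size, and the placeholder strings standing in for the salted hash field (the exact encoding John expects for that field is not shown in this post). Passwords containing ":" are simply skipped here to keep the sketch short.

```python
def pair_candidates(cracked, salted):
    """Yield 'candidate:hash' lines for user ids present in both mappings.

    cracked: user id -> password recovered from the unsalted hash
    salted:  user id -> salted hash field, already in the cracker's format
    """
    for user_id, password in cracked.items():
        hash_field = salted.get(user_id)
        if hash_field is None or not password or ":" in password:
            continue  # no salted hash, or a password that would break the line
        yield f"{password}:{hash_field}"


def split_into_chunks(lines, chunk_size):
    """Group lines into lists of at most chunk_size items."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Invented sample data; the hash fields are placeholders, not real hashes.
cracked = {"1001": "monkey12", "1002": "pass:word", "1003": "letmein"}
salted = {
    "1001": "<salted hash of 1001>",
    "1003": "<salted hash of 1003>",
    "1004": "<salted hash of 1004>",
}
lines = list(pair_candidates(cracked, salted))
chunks = list(split_into_chunks(lines, 1))
print(lines)
```
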
Because of a bug in JtR, I was not able to use this attack on a huge file, so I had to split it into small chunks. Nevertheless, I was able to retrieve 36708130 passwords in just 87 minutes. On a single CPU core.
In order to find those passwords with hashcat, I had to rely on a wordlist attack on a GTX 1080. It took about 14 days to complete. No matter how fast your GPU is (about 1000 MH/s in that particular case), it will brainlessly try every single candidate on every single hash. Remember that the hashes are salted, so each one requires its own computation. If your file is 60M hashes long, then your GPU will only try 16.6 candidates per second (1000/60). It's very slow and inefficient.
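
The arithmetic behind these figures, as a small Python check. All the numbers come from the paragraphs above; only the variable names are mine.

```python
# Hashcat side: every candidate is hashed once per salted hash in the file.
gpu_rate = 1000e6        # hashes per second (about 1000 MH/s)
salted_hashes = 60e6     # example file length used above
candidates_per_second = gpu_rate / salted_hashes

# Wall-clock comparison of the two runs.
hashcat_minutes = 14 * 24 * 60   # about 14 days
john_minutes = 87
slowdown = hashcat_minutes / john_minutes

# John side: passwords recovered per second on one CPU core.
john_passwords_per_second = 36708130 / (john_minutes * 60)

print(round(candidates_per_second, 1))   # 16.7
print(round(slowdown))                   # 232
print(round(john_passwords_per_second))
```

So 14 days against 87 minutes is a factor of roughly 230, which is where the "more than 200 times slower" comes from.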

Hashcat performance on a file containing 50% of total hashes.

Sometimes, brains are better than raw power. Thank you John ;)

More on this topic:
https://hashes.org/forum/viewtopic.php?t=1715
http://cynosureprime.blogspot.fr/2016/07/myspace-hashes-length-10-and-beyond.html

Self-hosted password manager: installing Passbolt on FreeBSD

Arthur Duarte CC-BY-SA-4.0

Password managers, or password safes, are an important thing these days. With the constant pressure we (IT people) put our users under to set up a different password for every single registration/application/web site, it's the best, if not only, way to keep track of these secrets. On one hand, the isolated client-side software can be really powerful and/or well integrated with the OS or the software ecosystem of the user, but it lacks the modern touch of "cloud" that makes your data available anywhere and anytime. On the other hand, a full commercial package will come with a client for every device you own, and a monthly fee for cloud synchronization, but you have absolutely no control over your data (just imagine that tomorrow the company you rely on goes bankrupt).
Better safe than sorry: I don't rely on cloud services. It comes at a cost, but it's quite rewarding to show the world another way exists.
Disclaimer: I don't give a sh*t about smartphones, so my needs are computer-centric.

In order to store passwords, and more generally speaking "secrets", in such a way that I can access them anywhere/anytime, I've tried Passbolt. Passbolt is an open source self-hosted password manager, written in PHP/JavaScript with a database back end. Hence, install and config are not for the average Joe. On the user side it's quite clean and surprisingly stable for alpha software. So once a LAMP admin has finished installing the server part, any non-skilled user can register and start storing passwords.

Enough chit-chat, let's install.

My starting point was a vanilla FreeBSD 10.3 install, so I had to set up everything from scratch. I won't replay every single step here, especially on the configuration side.

Prerequisites:

pkg install apache24
pkg install mod_php56
pkg install php56-gd
pkg install pecl-memcached
pkg install mysql57-server
pkg install pecl-gnupg
pkg install git
pkg install php56-pdo_mysql
pkg install sudo
pkg install php56-openssl
pkg install php56-ctype
pkg install php56-filter

Everything else should come as a dependency.

Tuning:

Apache must allow .htaccess, so you'll have to put an AllowOverride All somewhere in your configuration. You must also load the Rewrite module. Also, go for SSL right now (letsencrypt is free and supported). Non-SSL installs of Passbolt are for demo purposes only.
Apache will also need to execute gnupg commands, meaning the www user needs an extended $PATH. The Apache startup script provided on FreeBSD sources Apache environment variables from /usr/local/sbin/envvars and this very file sources every /usr/local/etc/apache24/envvars.d/*.env, so I've created mine:

$ cat /usr/local/etc/apache24/envvars.d/path.env
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin

You also need to tune your MySQL server. If you choose 5.7, you must edit its configuration. Just add the following line to the [mysqld] section of /usr/local/etc/mysql/my.cnf:

sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'

This is due to a bug in Passbolt and could become unnecessary in the not too distant future.

Install recipe:

You can now follow the install recipe at https://www.passbolt.com/help/tech/install.
Generating the GPG key is quite straightforward but you have to keep in mind that Apache's user (www) will need access to the keyring. So if you create this key and keyring with a different user, you'll have to mv and chown -R www the full .gnupg directory somewhere www can read it (outside DocumentRoot is perfectly fine).

Use git to retrieve the application code into the appropriate path (according to your Apache config):

cd /usr/local/www
git clone https://github.com/passbolt/passbolt.git

Edit the PHP files as per the documentation.

Beware the install script: make sure you chown -R www the whole passbolt directory before using cake install.
On FreeBSD you won't be able to use su to run the install script, because www's account is locked. You can use sudo instead:

sudo -u www app/Console/cake install --no-admin

Same for the admin account creation:

sudo -u www app/Console/cake passbolt register_user -u patpro@example.com -f Pat -l Pro -r admin

Follow the end of the install doc, and you should be ok. Install the Firefox passbolt extension into your browser, and point to your server.

I'm pretty happy with Passbolt so far. I'll have to install a proper production server, with SSL and all, but the features are very appealing, the Passbolt team is nice and responsive, and the roadmap is loaded with killer features. Yeah BRING ME 2FA \o/.

Escaping the Apple ecosystem: a view of the setup

Here is a quick & dirty view of the physical and logical setup of my new workstation. The Linux part is not finished yet (no drivers for Radeon GPU, thank you Ubuntu), it's a work in progress.

esx
Not depicted: each USB controller sports 4 USB ports (yellow) or 2 USB ports (pink and blue). This allows me to plug in a few devices that won't be "managed" by the USB switch.
USB devices plugged-in on the switch are made available to only one VM at a time. When I press the switch button, they disappear for the current VM and are presented to the next one.

Escaping the Apple ecosystem: part 3

In part 2, I was able to create and use a Windows 7 VM with the Radeon R9 270x in passthrough. It works really great. But OSX and Linux were more difficult to play with.

List of virtual machines

Since then, I've made tremendous progress: I've managed to run an OSX 10.11.6 VM properly, but more importantly, I've managed to run my native Mac OS X 10.6.8 system as a VM, with the Mac's Radeon in passthrough.
I've removed my Mac OS X SSD and the Mac's graphics card from the Mac Pro tower, and installed them into the PC tower. Then I've created the VM for the 10.6.8 system, configured ESXi to use Mac's Radeon with VT-d, etc.
The only real problem here is that adding a PCI card into the PC tower makes PCI device numbers change: it breaks almost every passthrough already configured. I had to remake VT-d config for the Windows VM. Apart from that, it went smoothly.
Currently, I'm working on my native 10.6.8 system, that runs as a VM, and the Windows VM is playing my music (because the Realtek HD audio controller is dedicated to the Windows VM).
Moving from a Mac Pro with 4-core 2.8 GHz Xeon to a 6-core 3.5 GHz Core i7 really gives a boost to my old 10.6.8 system.

Running both OSes, the box is almost as silent as the Mac Pro while packing almost twice as much raw CPU power and 2.7x more GPU power.

The Mac Pro is now empty: no disks, no graphics card, and will probably go on sale soon.

to-do list:

  • secure the whole infrastructure;
  • install 2nd-hand MSI R9 270x when it's delivered;
  • properly set up Linux to use AMD graphics card.

I might also add a few SSDs and a DVD burner before year's end.

Escaping the Apple ecosystem: part 2

In part 1, I've written about the BoM of my project and the associated to-do list.
The first item on this list was: build the box. That did not go as smoothly as expected. The motherboard was not fully operational: after a few minutes of run time (between 5 and 30), it would trigger a CPU overheat alarm, even when the CPU was idle and cool. Supermicro's support had me try a new BIOS, with no effect, so I finally sent the board back for exchange. The new board arrived, but I had to delay the rebuild for a few weeks.
Now the PC is up and running. The new motherboard seems to work great, but I've not tested IPMI yet. IPMI was the very first feature I've used on the first board, and there is a slight probability that the CPU overheat problem comes from a probe malfunction related to the BMC. Let's keep that for later.

I've chosen to run this box on VMware ESXi 5.5, because it's quite common (more than the latest 6.x), because it sports features I need like passthrough, and because most VMware-based multi-seat PC projects like mine are using ESXi 5.x.

ESXi is quite easy to install, so I won't give details. The main HDD in the box is installed with ESXi, which is configured using a USB keyboard and a display plugged into the VGA port of the motherboard. After basic configuration (network, user password...), I switched to remote configuration through vSphere Client, installed on a Windows PC (really a VM running on the Mac).

General view of ESX's configuration


Configuring passthrough for GPUs is pretty straightforward, because I've started with only one GPU, and because these are discrete PCI cards. On the other hand, passthrough of a USB controller can be tricky: there are many controllers, and nothing to identify them except trial & error (unless you have the blueprint for your motherboard telling which physical USB ports belong to which controller).
Go to configuration tab, then "Advanced", and finally click "edit" on the right

When you click "Edit…" a window opens that lists interesting devices you can try to passthrough.
Choose some. Then you have to reboot the ESXi and add some of these devices to a VM.
passthrough
I've created a Windows 7 Pro 64-bit VM, with raw device mapping pointing to an SSD. I've added every available PCI device to this VM (USB, sound, GPU) and installed Windows plus updates.
It's important to remember that initially, most PCI devices might not work at all because of missing drivers on the guest OS (here it's Windows). Hence, after installing Windows, the Radeon was detected but not used, and only the Intel USB controllers were working. I've installed AMD drivers, and ASMedia drivers (courtesy of Supermicro). I've also installed VMware Tools.
After all this, the Windows VM properly uses my Dell display hooked up to the MSI R9 270x Radeon, and I can interact with the system thanks to a real keyboard and mouse. Passing through the whole USB controller allows me to use any USB device I want. I've successfully plugged in and used a thumbdrive, a USB gaming headset and a USB hub.
I've made some GPU/CPU benchmarks and everything looks perfect. I've tested Left 4 Dead 2 game play, and it looked great too (I'll probably have to tweak anti aliasing settings to make it perfect).

The Windows part was quite fast to set up, and is almost done now. I've started to fight with OSX and Ubuntu, but things are not easy with either of them. It looks like my 3-year-old graphics card is so new that OSX does not support it until 10.11.x, and Ubuntu won't allow me to install Radeon drivers on 16.x LTS because they are waiting for some software to stabilize before packaging it…

To-do list:

  • fix problems with OSX and Ubuntu virtualization
  • find another MSI Radeon R9 270X GAMING 2G (of course it's no longer in stock…)
  • fully test Mac OS X 10.6.8 with Mac's graphics card instead of MSI Radeon

Cracking passwords: testing PCFG password guess generator

Cracking passwords is a kind of e-sport, really. There's competition among amateur and professional "players", tools, gear. There are secrets, home-made recipes, software helpers, etc.
One of these software helpers is the PCFG password guess generator, where PCFG stands for "Probabilistic Context-Free Grammar". I won't explain the concept of PCFG: scientific literature exists that you can read to discover all the math inside.
PCFG password guess generator comes as two main python programs: pcfg_trainer.py and pcfg_manager.py. Basic mechanism is the following:
- you feed pcfg_trainer.py with enough known passwords to generate comprehensive rules describing the grammar of known passwords, and supposedly unknown passwords too.
- you run pcfg_manager.py, using previously created grammar, to create millions of password candidates to feed into your favorite password cracker (John the Ripper, Hashcat…).

In order to measure the efficiency of the PCFG password guess generator, I ran a few tests. Here is my setup:

  • Huge password dump, 117205873 accounts with 61829207 unique Raw-SHA1 hashes;
  • John the Ripper, Bleeding Jumbo, downloaded 20160728, compiled on FreeBSD 10.x;
  • PCFG password guess generator, downloaded 20160801, launched with Python 3.x;

Here's my methodology:

Of these 61829207 hashes, about 35 million are already cracked. I've extracted a random sample of 2 million known passwords to feed the trainer. Then I've used pcfg_manager.py to create a 10-million-line word list. I've also trimmed the famous Rockyou list to its first 10 million lines, to provide a known reference.
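
For the record, drawing a random sample from a huge list and trimming another one does not require loading everything into memory. The Python sketch below uses reservoir sampling (Algorithm R) and `itertools.islice`; it is one possible way to do it, not necessarily the way I did, and the function names and toy data are made up for the example.

```python
import random
from itertools import islice


def reservoir_sample(lines, k, rng):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    sample = []
    for index, line in enumerate(lines):
        if index < k:
            sample.append(line)          # fill the reservoir first
        else:
            j = rng.randint(0, index)    # inclusive bounds
            if j < k:
                sample[j] = line         # replace with probability k / (index + 1)
    return sample


def head(lines, n):
    """Return the first n items of an iterable (like `head -n`)."""
    return list(islice(lines, n))


# Toy data standing in for the cracked passwords and for the Rockyou list.
rng = random.Random(42)
known_passwords = (f"pw{i}" for i in range(1000))
sample = reservoir_sample(known_passwords, 20, rng)
reference = head((f"word{i}" for i in range(1000)), 10)
print(len(sample), len(reference))
```

With real files, you would pass the open file object instead of the generators used here.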

Finally, I've launched this shell script:

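#!/bin/sh
# Run both word lists through John with three rule sets, one session and pot file per run.
for i in none wordlist jumbo; do
  # PCFG-generated candidates
  ./john --wordlist=pcfg_crckr --rules="$i" --session="pcfg_cracker-$i" --pot="pcfg_cracker-$i.pot" HugeDump
  # first 10 million lines of Rockyou
  ./john --wordlist=ry10m --rules="$i" --session="ry10m-$i" --pot="ry10m-$i.pot" HugeDump
done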

No forking, I'm running on one CPU core here. Each word list is tested three times: with no word mangling rules, with the default JtR rules, and finally with the Jumbo mangling rules.

Some results (number of cracked passwords):

Rules      PCFG       Rockyou
none       4409362    2774971
wordlist   5705502    5005889
jumbo      21146209   22781889

That I can translate into efficiency, where efficiency is Cracked/WordlistLength as percentage:

Rules      PCFG     Rockyou
none       44.1%    27.7%
wordlist   57.1%    50.1%
jumbo      211.5%   227.8%
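
These percentages can be recomputed from the raw counts with a few lines of Python. The numbers are the ones from the tables above; the variable and function names are mine.

```python
WORDLIST_LENGTH = 10_000_000  # both lists hold 10 million candidates

# rule set -> (cracked with the PCFG list, cracked with the Rockyou list)
cracked = {
    "none": (4409362, 2774971),
    "wordlist": (5705502, 5005889),
    "jumbo": (21146209, 22781889),
}


def efficiency(cracked_count, wordlist_length):
    """Cracked passwords per word list entry, as a percentage."""
    return 100 * cracked_count / wordlist_length


table = {
    rules: (f"{efficiency(pcfg, WORDLIST_LENGTH):.1f}%",
            f"{efficiency(rockyou, WORDLIST_LENGTH):.1f}%")
    for rules, (pcfg, rockyou) in cracked.items()
}
for rules, (pcfg_eff, rockyou_eff) in table.items():
    print(rules, pcfg_eff, rockyou_eff)
```

Values above 100% are possible because mangling rules turn each line of the word list into many candidates, so one line can crack several hashes.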

It's quite interesting to see that the PCFG generated word list has a very good efficiency, compared to Rockyou list, when no rules are involved. That's to be expected, as PCFG password guess generator has been trained with a quite large sample of known passwords from the same dump I am attacking.
Also, the PCFG password guess generator creates candidates that are not very well suited for mangling, and only the jumbo set of rules achieves good results with this source. Rockyou on the other hand starts quite low with only 27.7% but jumps to 50.1% with common rules, and finally defeats PCFG when used with jumbo rules.

On the word list side, Rockyou is known and limited: it will never grow. But the PCFG password guess generator looks like it can create an endless list of candidates. Let's see what happens when I create a list of more than 110 M candidates and feed them to JtR.

Rules      PCFG (cracked)   Efficiency
none       9703571          8.8%
wordlist   10815243         9.8%

Efficiency plummets: only 9.7 M hashes cracked with a list of 110398024 candidates, and only 1.1 M more when the set of rules "wordlist" is applied. It's even less beneficial than with a list of 10 M candidates (+1.3 M with "wordlist" rules, compared to "none").

On the result side, both word lists with jumbo rules yield more than 21 M cracked passwords. But are those passwords identical, or different?

Rules      Total unique cracked   Yield
none       6013896                83.7%
wordlist   8184166                76.4%
jumbo      26841735               61.1%

Yield = UniqueCracked / (PcfgCracked + RockyouCracked)

A high yield basically says that you should run both word lists through John. A yield of 50% means that all passwords cracked thanks to PCFG are identical to those cracked with the Rockyou list.
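
Here is the yield computation in Python, together with the overlap it implies. The counts come from the tables in this post; the overlap is derived by inclusion-exclusion, assuming that "total unique cracked" is the size of the union of both result sets. Names are mine.

```python
# rule set -> (PCFG cracked, Rockyou cracked, total unique cracked)
results = {
    "none": (4409362, 2774971, 6013896),
    "wordlist": (5705502, 5005889, 8184166),
    "jumbo": (21146209, 22781889, 26841735),
}


def yield_pct(pcfg, rockyou, unique):
    """Unique cracked passwords over the sum of both runs, as a percentage."""
    return 100 * unique / (pcfg + rockyou)


def overlap(pcfg, rockyou, unique):
    """Passwords cracked by both lists: |A| + |B| - |A or B|."""
    return pcfg + rockyou - unique


yields = {rules: round(yield_pct(*counts), 1) for rules, counts in results.items()}
overlaps = {rules: overlap(*counts) for rules, counts in results.items()}
print(yields)
print(overlaps)
```

A yield of 100% would mean the two lists crack completely different passwords, and exactly 50% is reached when both lists crack the very same set.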

As a conclusion, I would say that the PCFG password guess generator is a very interesting tool, as it provides a way to generate valid candidates pretty easily. You probably still need a proper known passwords corpus to train it.
It's also very efficient with no rules at all, compared to the Rockyou list. That might make it a good tool for very slow hashes when you can't afford to try thousands of mangling rules on each candidate.

Some graphs to illustrate this post:

every john session on the same graph

every session, zoomed on the first 2 minutes

Rules "wordlist" on both lists of candidates

Rules "none", both lists of candidates
