Borg, Kopia, Restic: from configuration to first backup

All backup solutions require a minimum of configuration to do the job. These three solutions are no exception.
What Borg, Restic and Kopia have in common is the use of a repository into which they push backups. This repository must be initialized before the first backup can be made. In practice, here's how I proceeded with each of these programs.

Borg

Borg relies on an SSH key to open the communication tunnel between Borg on the server and Borg on the client. For this purpose, I use a dedicated SSH key and configuration. If you've mastered SSH configuration on the client side, this opens the door to fairly fine-grained tuning of the tunnel between client and server. We're back in familiar territory, and it's quite comfortable.

$ borg init --encryption=repokey-blake2 --rsh "ssh -F .ssh/config_bkp" bkppat@192.168.0.22:/backupmaison/bench-borg
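
For reference, the dedicated SSH configuration referenced by -F can be as simple as this (key path and options are illustrative, to be adapted to your own setup):

Host 192.168.0.22
    User bkppat
    IdentityFile ~/.ssh/id_ed25519_bkp
    IdentitiesOnly yes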

Creating the first backup, on the other hand, is a completely different story. The number of parameters is so great that, in my scripts, I define them as variables which I then use in the various commands:

$ SSH="ssh -F .ssh/config_bkp"
$ HOMEREPO="bkppat@192.168.0.22:/backupmaison/bench-borg"
$ HOMEOPTIONS="--stats --exclude-caches --exclude-if-present .nobench-bkup"
$ SOURCEROOT="/Users/patpro"
$ ARCHIVENAME=$(date "+%Y%m%d.%s")
$ export BORG_PASSPHRASE="foobar"
$ borg create ${HOMEOPTIONS} --rsh "${SSH}" ${HOMEREPO}::${ARCHIVENAME} ${SOURCEROOT}

With Borg, all parameterization is done this way. There is no configuration file. Fortunately, a backup task is supposed to be automated, i.e. to live in one form or another in a script, so nothing has to be retyped manually at the command line. The disadvantage is therefore rather minor.
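
For the record, the purge task follows the same pattern and reuses the same variables; here is a sketch, with retention values chosen purely for illustration:

$ borg prune --rsh "${SSH}" --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3 ${HOMEREPO}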

Kopia

If you're using SFTP transport, Kopia works in much the same way as Borg, except that Kopia uses its own SSH implementation by default. Among other things, this means you have to specify the location of the known_hosts file, and the configuration file of the machine's native SSH client is ignored. Unfortunately, Kopia's native implementation of the SSH protocol does not support all key types. This bug has been identified but not yet fixed. Elliptic-curve keys can still be used if the Kopia client is told to use an external SSH client. For my part, I chose to create an RSA key to work around the bug.

$ kopia repository create sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Unlike Borg, Kopia's configuration is largely offloaded to the repository. Policy-management commands are used to set a number of parameters that the Kopia client then relies on.
Here's how it works in my case:

$ kopia policy list
85bf7bc7688d5819433aad9ba1c84f6b (global)

$ kopia policy set --add-dot-ignore .nobench-bkup 08c59f69009c88ee463de6acbbc59a3a
Setting policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a
 - adding ".nobench-bkup" to "dot-ignore filenames"

$ kopia policy set --compression=zstd --global
Setting policy for (global)
 - setting compression algorithm to zstd

These policies are inherited, from the most general to the most specific:

$ kopia policy show --global| head
Policy for (global):

Retention:
  Annual snapshots:                     3   (defined for this target)
  Monthly snapshots:                   24   (defined for this target)
  Weekly snapshots:                     4   (defined for this target)
  Daily snapshots:                      7   (defined for this target)
  Hourly snapshots:                    48   (defined for this target)
  Latest snapshots:                    10   (defined for this target)
  Ignore identical snapshots:       false   (defined for this target)

$ kopia policy show 08c59f69009c88ee463de6acbbc59a3a| head
Policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a:

Retention:
  Annual snapshots:                     3   inherited from (global)
  Monthly snapshots:                   24   inherited from (global)
  Weekly snapshots:                     4   inherited from (global)
  Daily snapshots:                      7   inherited from (global)
  Hourly snapshots:                    48   inherited from (global)
  Latest snapshots:                    10   inherited from (global)
  Ignore identical snapshots:       false   inherited from (global)

Since the client parameters defining backup and purge tasks are stored in the repository, the command line for each action is greatly simplified. You still need to connect the client to the backup server, which is done with a command like:

$ kopia -p foobar repository connect sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Disconnection is then carried out as follows:

$ kopia repository disconnect

It's important to understand that connecting and disconnecting doesn't have to be done for every task. That's a mistake I made myself: in the early days of running my backup scripts, for each Kopia task I would connect, perform the task, and then disconnect. The side effects are minimal, but I'll come back to that later. In any case, a connection persists over time, because it isn't a network connection. It's best thought of as a pre-configuration of the client, giving it the repository's address, login and password so it can carry out its backup tasks. For the rest of my tests, I simply connected once and never disconnected my client again. As a result, creating a backup boils down to this trivial command:

$ kopia snapshot create /Users/patpro

The difference with Borg is obvious.

Restic

To test Restic, I decided to use rest-server. Transport is over HTTP and the commands are relatively trivial. On the server side, configuration is fairly basic: I simply specified the storage path, the listening interface and TCP port, and the --no-auth parameter, since I don't need authentication.
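
For reference, the server-side launch amounts to something like this (the storage path is illustrative):

$ rest-server --path /backupmaison/bench-restic --listen 192.168.0.22:8000 --no-auth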

$ restic -r rest:http://192.168.0.22:8000/ init

Like Borg, backup and purge tasks require all parameters to be passed on the command line:

$ export RESTIC_PASSWORD=foobar
$ restic -r rest:http://192.168.0.22:8000/ backup --verbose --exclude-if-present .nobench-bkup --one-file-system --exclude-caches --no-scan --read-concurrency 4 /Users/patpro
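
The purge follows the same pattern; here is a sketch, with retention values chosen purely for illustration:

$ restic -r rest:http://192.168.0.22:8000/ forget --prune --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3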

There's not much more to say about Restic, which behaves much like Borg in terms of settings and command line. Even if the number of parameters were to multiply, it would all take place in a perfectly readable shell script.

To conclude

I really like the readability of a command line, with or without variables: it allows me to analyze a backup script in any editor and in any context. Nevertheless, I have to admit that I'm quite taken with the elegance of Kopia's policy system, even if reading the policy content requires access to the Kopia command in a terminal as well as to the backup repository (which is a drawback). In the end, this is a selection criterion that will depend solely on the end user's preference.

Borg, Kopia, Restic: functional scope and general information

The functional scope of the three backup solutions is relatively equivalent. Each operates in "push" mode, i.e. the client initiates the backup to the server. This mode of operation is undoubtedly best suited to people who manage a small handful of machines. The "pull" mode is available for Borg, but requires a few adjustments.
All three programs use deduplication and compression mechanisms to limit the volume of data transmitted and stored. With Kopia, compression is not enabled by default. All three also implement backup retention and expiration management, repository maintenance mechanisms, and so on.

Borgbackup

Borgbackup, or Borg, is a solution developed in Python. It is relatively easy to install, but does not allow Windows workstations to be backed up. Its documentation is relatively exhaustive, and its ecosystem includes automation and GUI solutions. It evolves regularly. Tests were carried out with version 1.2.7. Branch 1.4 (beta) is available and branch 2 is under active development.

To use Borg, the software must also be installed on the destination server. This is a constraint that may prevent some people from using Borg. Remote backup requires an SSH connection. In addition, access to the backup repository is exclusive (lock), i.e. it is not possible to back up several machines to the same repository at the same time. Furthermore, backing up multiple clients to the same repository is discouraged for various reasons.

If the client machine were compromised, one might worry that the backup software present on the machine could be used to destroy the remote archives. By default, this is indeed possible. Nonetheless, Borg's client-server operation allows you to add restrictions: it is possible, for example, to limit a client's access so that it can only add and restore files, but not delete archives.
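
A common way to do this is server-side, in the authorized_keys entry used by the backup client; here is a sketch (repository path and key are placeholders):

command="borg serve --append-only --restrict-to-repository /backupmaison/bench-borg",restrict ssh-ed25519 AAAA... backup-client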

All Borg settings are made using command-line arguments. There are no parameter files, although dedicated files can be used to hold lists of exclusions and inclusions, for example.

Official website: https://www.borgbackup.org/
Official documentation: https://borgbackup.readthedocs.io/en/stable/
Cooperative development platform: https://github.com/borgbackup/borg

Kopia

Kopia is a solution coded in Go. The version used here is 0.15.0 and the project is active. It is a monolithic binary available for different architectures and operating systems. Kopia is Windows-compatible and offers a GUI version. It also works with many different types of remote storage, such as S3, Azure Blob Storage, Backblaze B2, WebDAV, SFTP and others. You don't need to install any server-side components to use Kopia.
It is also possible to back up several clients in the same repository. Deduplication can then be achieved across backups of all clients.

Kopia also features an optional "server" mode, which can be configured to add user management when users share the same repository, for example. This feature can also be used to limit the client's rights over stored archives, for example, by prohibiting their deletion. If you don't use the server part of Kopia, you won't be able to prevent a compromised client from deleting its backups.

Kopia lets you manage backup policies that are saved and then applied to backup jobs (or cleanup jobs, since these policies also include the notion of retention). This backup policy management is rather elegant, and allows you to limit the number of command line arguments each time a job is launched.

Official website: https://kopia.io/
Official documentation: https://kopia.io/docs/
Cooperative development platform: https://github.com/kopia/kopia/

Restic

Restic is also coded in Go. The version used here is 0.16.4 and the project is active. It is available for various architectures and operating systems, including Windows. The range of remote storage solutions supported by Restic is broadly similar to that of Kopia. Restic also offers its own high-performance HTTP server, which implements Restic's REST API. As with Kopia and Borg, this can be used to secure backups by limiting client rights.

This secure mode of operation, commonly referred to as "append-only", imposes constraints that some users may find off-putting. Backing up machines to repositories from which old backups cannot be deleted creates a volume-management problem. For my part, I much prefer the client to have full rights over the backups, with the repository itself stored on a file system that supports regular snapshots. In my case, all backups are written to a ZFS file system and automatic snapshots are taken every day, with relatively long retention times.
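
On the ZFS side, such a daily snapshot amounts to something like this (pool and dataset names are illustrative):

$ zfs snapshot backuppool/backupmaison@$(date "+%Y%m%d")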

Like Borg, Restic relies exclusively on command-line parameters to configure the various tasks.

Official website: https://restic.net/
Official documentation: https://restic.readthedocs.io/en/stable/index.html
Cooperative development platform: https://github.com/restic/restic

In a nutshell

Everyone has to make up their own mind according to their own needs, but if you don't rule out any scenario (Windows client backup, shared repository, S3 storage...), Borg falls behind. If, like me, you back up a handful of macOS/FreeBSD/Linux machines to servers on which you can install software and to which you can connect via SSH, then Borg is on a par with Kopia and Restic.

                          Borg   Kopia   Restic
Portability                 -      +       +
Storage options             -      +       +
Transport options           -      +       +
Multi-client repository     -      +       +

Borg, Kopia, Restic: a comparison

I've been using Borg Backup for several years now. However, at the end of 2023, I encountered major problems with the backup of one of my machines. So I decided to explore alternative solutions. A quick search revealed two solutions that were relatively equivalent to Borg Backup: Restic and Kopia.

In the absence of a comparison of these three solutions (there is an "old" comparison between Borg and Restic, but it doesn't go as far and doesn't mention Kopia), I decided to carry out some tests myself, to be able to compare simple metrics common to all three software packages.

Ultimately, once a regular backup is automated, the metrics that matter to me are backup duration, the volume of data exchanged over the network, and the storage volume on the backup server. Similar metrics are worth exploring for restore operations and for backup-server maintenance operations.

My test environment consists of the following components:

  • client: Mac mini M2 Pro running macOS 13.6.3, 32 GB RAM
  • backup server: Intel(R) Core(TM) i3-3220T-based PC running FreeBSD 14, 16 GB RAM, ZFS storage on SSD
  • switched 1 Gbps Ethernet network
  • Splunk instance to ingest logs and produce statistics and graphs

To make sure I was working with representative data, I decided to back up part of my user directory. I excluded a huge directory of digital photos that rarely changes, as well as numerous cache directories that aren't relevant to back up. The final size of my backup target is around 140 GB.

I based my backup protocol on my experience with Borg. However, as each program has its own specificities, it took some time to read up and adapt. In spite of these preparations, the backup process still had a few surprises in store for me.

Backups were automated using a single bash script, which launched the Borg, Restic and Kopia backups in random order. I suspected that the first backup task might be penalized by having to read files on the client workstation that were not yet in the system cache, but after 15 days of testing I found no visible difference in backup time attributable to the order of the tasks. I therefore decided to freeze the order of the backup jobs, which also makes it much easier to track metrics.
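
The random ordering itself is trivial to do in bash; here is a minimal sketch, with hypothetical per-tool functions:

backup_jobs=(borg kopia restic)
# Fisher-Yates shuffle using $RANDOM
for ((i=${#backup_jobs[@]}-1; i>0; i--)); do
    j=$((RANDOM % (i+1)))
    tmp=${backup_jobs[i]}; backup_jobs[i]=${backup_jobs[j]}; backup_jobs[j]=$tmp
done
for job in "${backup_jobs[@]}"; do
    run_${job}_backup    # hypothetical function wrapping each tool's backup command
done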

The client machine has scheduled sleep periods at night, during which no backup takes place. In addition, the script is scheduled in launchd by setting the StartInterval argument to 3600 seconds. Unlike a crontab schedule, for example, this guarantees a minimum delay of one hour between two executions of the script: one hour must elapse between the end of the previous execution and the start of the new one. This means that the script actually runs slightly less often than once an hour. For example, if each execution lasts 5 minutes, then it takes 26 hours, not 24, to run the script 24 times.

Important note: I'm not going to go into detail in these articles about the differences in the software's ergonomics, the availability or otherwise of a graphical management interface, etc. Some of the solutions presented here have several different graphical interfaces, the functionalities of the three products vary, and I don't wish to make a catalog comparing each software point by point.

Here's the table of contents of my comparison. The link to each article will become active upon publication.

Borgbackup: it’s complicated

As I wrote in a previous article, I'm using Borgbackup to back up some servers and workstations. And most of the time it's going OK. Unfortunately, when it's not OK, it can be very (very) hard to recover. Spoiler alert: one of my backup repositories had a problem, it took me 16 days and 1131 connections to the repository to fix it.

A few weeks ago, I noticed that the Borg repository for the backups of my MacBook Pro was holding way too many backups. The pruning process was not doing its job properly. So I started to investigate, and discovered that every attempt to prune older backups failed.
Trying to check the repository failed too: every single time the connection was reset. Apparently the remote server was dropping the SSH connection after a while (between 20 and 90 minutes). Client-side the error message was Connection closed by remote host.
I started to wonder if the problem was the very large number of backups, so I tried to delete backups manually. Some of them were easy to delete but most of the older backups needed delete --force and even delete --force --force, which is pretty bad.
Basically, I destroyed two years worth of backups, but it was still impossible to check and repair the repository. Also, the use of --force and --force --force makes it impossible to reclaim the space until you repair the repository.

Digging in the (mis)direction of a connection problem I’ve tried everything I could. It failed every single time:

  • tune ClientAliveInterval and ClientAliveCountMax server-side
  • tune ServerAliveInterval and ServerAliveCountMax client side
  • using and not using multiplexed ssh connections
  • moving the client to another physical network
  • using caffeinate to launch the borg command
  • using Amphetamine
  • upgrading ssh on the client
  • upgrading borg
  • giving mosh a try (nope)
  • running ssh with -vvv
  • using ktrace server-side to find the cause

None of this helped at all. Not to find a cause, not to make it work properly.
Eventually I stumbled upon this blog post: Repair Badly Damaged Borg Repository.

The repair method described there failed too:
- the check --repair --repository-only step worked
- but check --repair --archives-only died with the same Connection closed by remote host.
As a workaround attempt I scripted a loop that would run check --repair --archives-only on every archive individually (~630).
After 3 full days and more than 600 GB transferred from the repo to the client, every archive had been checked and none was faulty.
Next step was to attempt check --repair again on the whole repository. About 15 hours later the output was

2851697 orphaned objects found!
Archive consistency check complete, problems found.

Full working solution for me:

# skip the interactive warning prompt of check --repair
export BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=YES
# repair the repository structure first
borg check -p --repair --repository-only LOGIN@HOST:/path/to/repo
# list every archive name, then repair archives one by one
borg -p list LOGIN@HOST:/path/to/repo | awk '{print $1}' > listborg
cat listborg | while read i; do
	borg -vp check --repair --archives-only LOGIN@HOST:/path/to/repo::$i
done
# final check/repair pass on the whole repository
borg -p check --repair LOGIN@HOST:/path/to/repo

Final step was to reclaim space on the server with compact:

$ borg compact -p --verbose path/to/repo/
compaction freed about 376.22 GB repository space.

Boy that was painful!

I’m pretty sure it’s not really an ssh issue. Something is causing the connection to drop, but I have ssh connections open for days between that client and that server, so I guess ssh is not the problem here. Also, borgbackup’s github is loaded with issues about failing connections. In the end it boils down to:
- borg repositories can break badly without warning
- repairing a repo might not be worth the bandwidth/CPU/IOs

Multi-head virtualized workstation: the end.

Four years ago I started a journey in workstation virtualization. My goal at the time was to try and escape Apple's ecosystem as it was moving steadily toward closedness (and iOS-ness). I also thought back then that it would allow me to pause planned obsolescence by isolating the hardware from the software.
I've been very wrong.

My ESXi workstation was built with power, scalability and silence in mind. And it had all this for a long time. But about 1.5 years ago I started to notice the hum of one of its graphics cards. Recently this hum turned into an unpleasant high-pitched sound under load. The fans were aging and I needed a solution. Problem is, one just can't choose any graphics card off the shelf and put it into an ESXi server. It requires a compatibility study: card vs motherboard, vs PSU, vs ESXi, vs VM operating system. If you happen to need an ESXi upgrade (from 5.x to 6.x for example) in order to use a new graphics card, then you need to study the compatibility of this new ESXi with your other graphics cards, your other VM OSes, etc.
And this is where I was stuck. My main workstation was a macOS VM using an old Mac Pro Radeon that would not work on ESXi 6.x. All things considered, every single upgrade path was doomed to failure unless I could find a current graphics card, silent, that would work on ESXi 5.x and get accepted by the Windows 10 guest via PCI passthrough. I found one: the Sapphire Radeon RX 590 Nitro+. Worked great at first. Very nice benchmark and remarkable silence. But after less than an hour I noticed that HDDs inside the ESXi were missing, gone. In fact, under GPU load the motherboard would lose its HDDs. I don't know for sure but it could have been a power problem, even though the high quality PSU was rated for 1000W. Anyway, guess what: ESXi does not like losing its boot HDD or a datastore. So I sent the graphics card back and got a refund.
Second problem: I was stuck with a decent but old macOS release (10.11, aka El Capitan). No more updates, no more security patches. Upgrading the OS was also a complex operation with compatibility problems with the old ESXi release and with the older Mac Pro Radeon. I've tried a few things but it always ended with a no-go.
Later this year, I gave another Radeon GPU a try, less power-hungry, but it led to other passthrough and VM malfunctions. This time I chose to keep the new GPU as an incentive to deal with the whole ESXi mess.

So basically, the situation was: very nice multi-head setup, powerful, scalable (room for more storage, more RAM, more PCI), but stuck in the past with a 5-year-old macOS using a 10-year-old Mac Pro graphics card in passthrough on top of a 5-year-old ESXi release, the Windows 10 GPU becoming noisy, and nowhere to go from there.

I went through the 5 stages of grief and accepted that this path was a dead end. No more workstation virtualization, no more complex PCI passthrough, I've had enough. A few weeks ago I started to plan my escape: I need a silent Mac with decent power and storage (photo editing), I need a silent and relatively powerful Windows 10 gaming PC, and I need an always-on, tiny virtualization box for everything else (splunk server, linux and FreeBSD experiments, etc.). It was supposed to be a slow migration process, maintaining both infrastructures in parallel for some weeks and allowing perfect testing and switching.
Full disclosure: it was not.

I created the Mac first, mostly because the PC case I had ordered was not delivered yet. Using a NUC10i7 I followed online instructions and installed my very first Hackintosh. It worked almost immediately. Quite happy with the result, I launched the migration assistant on my macOS VM and on my Hackintosh and injected about 430 GB of digital life into the little black box. Good enough for a test; I was quite sure I would wipe everything and rebuild a clean system later.

A few days later I started to build the PC. I was supposed to reclaim a not-so-useful SSD from the ESXi workstation to use as the main bare metal PC storage. I made sure nothing was on the SSD, I shut down ESXi and removed the SSD and its SATA cable. I also removed another SSD+cable that was not used (failed migration attempt to ESXi 6.x and test for Proxmox). I restarted ESXi just to find out a third SSD had disappeared: a very useful datastore is missing, 7 or 8 VMs are impacted, partially or totally. The macOS VM is dead, its main VMDK is missing (everything else is present, even its Time Machine VMDK), the Splunk VM is gone with 60+ GB of logs, the Ubuntu server is gone, some FreeBSD VMs are gone too, etc.
A few reboots later, I extracted the faulty SSD and started testing: different cable, different port, different PC. Nothing worked and the SSD was not even detected by the BIOS (on either PC).
This is a good incentive for a fast migration to bare metal PCs.
Fortunately:
- a spare macOS 10.11 VM, blank but fully functional, is waiting for me on an NFS datastore (backed by FreeBSD and ZFS).
- the Time machine VMDK of my macOS VM workstation is OK
- my Hackintosh is ready even though its data is about a week old
- the Windows 10 VM workstation is fully functional

So I plugged the Time Machine disk into the spare macOS VM, booted it, and launched Disk Utility to create a compressed image of the Time Machine disk. Then I copied this 350 GB dmg file onto the Hackintosh SSD, after which I mounted the image and copied the week's worth of out-of-sync data to my new macOS bare metal workstation (mostly Lightroom-related files and pictures).
I've plugged the reclaimed SSD into the new PC and installed Windows 10, configured everything I need, started Steam and downloaded my usual games.
Last but not least, I shut down the ESXi workstation, for good this time, unplugged everything (a real mess), cleaned up a bit, installed the new, way smaller, gaming PC, and plugged everything back in.

Unfortunately, the Hackintosh uses macOS Catalina. This version won't run many of the paid and free applications I'm using. Say goodbye to my Adobe CS5 suite, bought years ago, goodbye to BBEdit (I'll buy the latest release ASAP), etc. My Dock is a graveyard of incompatible applications. Only spark of luck here: Lightroom 3, which seems to be pretty happy on macOS 10.15.6.

In less than a day and a half I moved from a broken multi-head virtualized workstation to bare metal PCs running up-to-date OSes on top of up-to-date hardware. Still MIA: the virtualization hardware to re-create my lab.

What saved me:
- backups
- preparedness and contingency plan
- backups again

Things to do:
- put the Hackintosh into a fanless case
- add an SSD for Time machine
- add second drive in Windows 10 PC for backups
- buy another NUC for virtualization lab
- buy missing software or find alternatives

Moving to Borgbackup

I used to have a quite complicated backup setup, involving macOS Time Machine, rsync, shell scripts, ZFS snapshots, pefs, local disks, a server on the LAN, and a server 450 km away. It was working great, but I felt like I could use a unified system that I could share across all my systems and that would allow me to encrypt data at rest.
Pure ZFS was a no-go: snapshot send/receive is very nice but it lacks encryption for data at rest (transfer is protected by SSH encryption) and macOS doesn't support ZFS. Rsync is portable but does not offer encryption either. Storing data in a pefs vault is complicated and works only on FreeBSD.
After a while, I've decided that I want to be able to store my encrypted data on any LAN/WAN device I own and somewhere on the cloud of a service provider. I've read about BorgBackup, checked its documentation, found a Borg repository hosting provider with a nice offer, and decided to give it a try.

This is how I've started to use Borg with hosting provider BorgBase.

Borg is quite simple, even though it does look complicated when you begin. BorgBase helps a lot, because you are guided all along, from ssh key management to the creation of your first backup. They will also help you automate backups with an almost-ready-to-use borgmatic config file.

Borg is secure: it encrypts data before sending them over the wire. Everything travels inside an SSH tunnel. So it's perfectly safe to use Borg in order to send your backups away in the cloud. The remote end of the SSH tunnel must have Borg installed too.

Borg is (quite) fast: it compresses and dedups data before sending. Only the first backup is a full one; every other backup will send and store only changed files or parts of files.

Borg is cross-platform enough: it works on any recent/supported macOS/BSD/Linux.

Borg is not for the faint of heart: it's still command line, there are ssh keys to manage, it's really not the average Joe's backup tool. As rsync.net puts it: "You're here because you're an expert".

In the end, the only thing I'm going to regret about my former home-made backup system is that I could just browse/access/read/retrieve the content of any file in a backup with just ssh, which was very handy. With Borg this ease of use is gone: I'll have to restore a file if I want to access it.

I won't detail all the nuts and bolts of Borg, lots of documentation exists for that. I would like to address a more organizational problem: doing backups is a must, but being able to leverage those backups is often overlooked.
I back up 3 machines with borg: A (workstation), B (home server), C (distant server). I've set up borgmatic jobs to back up A, B and C once a day to the BorgBase cloud. Each job uses a dedicated SSH key and user account, a dedicated repository key, and a dedicated passphrase. I've also created similar jobs to back up A on B, A on C, B on C (but not Beyoncé).
Once you are confident that every important piece of data is properly backed up (borg/borgmatic job definition), you must make sure you are capable of retrieving it. It means even if a disaster occurs, you have in a safe place:

  • every repository URIs
  • every user accounts
  • every SSH keys
  • every repository keys
  • every passphrases

Any good password manager can store this. It's even better if it's hosted (1Password, Dashlane, LastPass, etc.) so that it doesn't disappear in the same disaster that swallowed your data. Printing can be an option, but I would not recommend it for keys, unless you can encode them as QR codes for fast conversion back to digital format.
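
For instance, with the qrencode tool (file names are illustrative):

$ qrencode -o repokey.png < my-borg-repo-key.txt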

You must check from time to time that your backups are OK, for example by restoring a random file into /tmp and comparing it to the current file on disk. You must also attempt a restoration on a different system, to make sure you can properly access the repository and retrieve files on a fresh/blank system. You can for example create a bootable USB drive with BSD/Linux and borg installed, to have a handy recovery setup ready to use in case of emergency.
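
A quick spot check could look like this (borg extract writes into the current directory; archive name and paths are placeholders):

$ cd /tmp
$ borg extract LOGIN@HOST:/path/to/repo::ARCHIVE-NAME path/to/some/file
$ diff /tmp/path/to/some/file /path/to/some/file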

Consider your threat model, YMMV, happy Borg-ing.

Short review of the KeyGrabber USB keylogger

A few days ago I bought a USB keylogger to use on my own computers (explanation in French here). Since then, and as I'm sitting in front of a computer more than 12 hours a day, I've had plenty of time to test it.
The exact model I've tested is the KeyGrabber USB MPC 8GB. I've had to choose the MPC model because both my current keyboards are Apple's Aluminum keyboards. They act as USB hubs, hence requiring some sort of filtering so that the keylogger won't try and log everything passing through the hub (mouse, usb headset, whatever…) and will get only what you type.
My setup is close to factory settings: I've just set LogSpecialKeys to "full" instead of "medium" and added a French layout to the keylogger, so that typing "a" will record "a", and not "q".

First of all, using the device on a Mac with a French Apple keyboard is a little bit frustrating: the French layout is for a PC keyboard, so typing alt-shift-( to get a [ will log [Alt][Sh][Up]. "[Up]"? Seriously? The only Macintosh layout available is for a US keyboard, so it's unusable here.

The KeyGrabber has a nice feature, especially in its MPC version, that allows the user to turn the device into a USB thumbdrive with a key combination. By default, if you press k-b-s the USB key is activated and mounts on your system desktop. The MPC version allows you to continue using your keyboard without having to plug it into another USB port after activation of the thumbdrive mode, which is great. You can then retrieve the log file, edit the config file, etc.
Going back to regular mode requires that you unplug and plug back in the KeyGrabber.
Applying the "kbs" key combo needs some patience: press all three keys for about 5 seconds, wait about 15-20 seconds more, and the thumbdrive should show up. If it does not, try again. I've not tested it on Windows, but I'm not very optimistic, see below.

I'm using a quite special physical and logical setup on my home workstation. Basically, it's an ESXi hypervisor hosting a bunch of virtual machines. Two of these VM are extensively using PCI passthrough: dedicated GPU, audio controller, USB host controller. Different USB controllers are plugged to a USB switch, so I can share my mouse, keyboard, yubikey, USB headset, etc. between different VMs. Then, the KeyGrabber being plugged between the keyboard and the USB switch, it's shared between VMs too.
Unfortunately, for an unidentified reason, the Windows 10 VM will completely lose its USB controller a few seconds after I've switched USB devices from OSX to Windows. So from now on, I have to unplug the keylogger when I want to use the Windows VM, and that's a bummer. Being able to use a single device on my many systems was one of the reasons I opted for a physical keylogger instead of a piece of software.
Worse: rebooting the VM will not restore access to the USB controller, I have to reboot the ESXi. A real pain.

But in the end, it's the log file that matters, right? Well, it's a bit difficult here too, I'm afraid. I've contacted the support at Keelog, because way too often what I see in the log file does not match what I typed. I'm not a fast typist, say about 50 to 55 words per minute. But it looks like it's too fast for the KeyGrabber, which will happily drop letters, up to 4 letters in a 6-letter word (typed "jambon", logged "jb").
Here is a made-up phrase I've typed as a test:

c'est assez marrant parce qu'il ne me faut pas de modèle pour taper

And here is the result as logged by the device:

cesae assez ma[Alt][Ent][Alt]
rrant parc qu'l ne e faut pa de modèl pour taper

This can't be good. Maybe it's a matter of settings, some of which are not clearly described in the documentation, so I'm waiting for the vendor's support to reply.

Overall, I'm not thrilled by this device. It's a 75€ gadget that won't work properly out of the box, and will crash my Win 10 system (and probably a part of the underlying ESXi). I'll update this post if the support helps me to achieve proper key logging.

My take on the MySpace dump

About a year ago, a full MySpace data breach dump surfaced on the average-Joe Internet. This huge dump (15 GiB compressed) is very interesting because many user accounts have two different password hashes. The first hash is unsalted and represents a lower-cased version of the user's original password, truncated to 10 characters. The second hash, not always present, is salted and represents the full original user password.
Hence, the dump content can be summarized by this:

id : email : id/username : sha1(strtolower(substr($pass, 0, 10))) : sha1($id . $pass)

It contains about 116.8 million unique unsalted sha1 hashes, and about 68.5 million salted sha1 hashes.

Of course, people who crack passwords will tell you that the unsalted hashes have no value, because they don't represent real user passwords. They are right. But when you crack those hashes, you get very interesting password candidates for cracking the salted hashes. And this is where it gets interesting!

After you've cracked most of the unsalted hashes, the question is: how do you proceed to crack their salted counterparts? Spoiler alert: hashcat on an Nvidia GTX 1080 is more than 200 times slower than John the Ripper on a single CPU core for this very particular job.

I'm a long-time John the Ripper user (on CPU), and I'm a big fan of its intelligent design. Working on CPU requires wits and planning. And the more versatile your software is, the more efficient you can be. Hashcat sits on the other end of the spectrum: huge raw power thanks to GPU optimization. But it lacks the most sensible attack mode: "single".

Single mode works by computing password candidates from GECOS data like login, user name, email address, etc. So it makes sense to provide a full password file to JtR, instead of just naked hashes. This metadata is very efficient when you want to create contextual password candidates.
The password retrieved from the unsalted hash is more than a clue for retrieving its salted counterpart: in many cases it's also the real user password. And when it's not, simple variations handled by mangling rules will do the trick.
You've probably guessed by now: I've created a file where each password cracked from an unsalted hash is paired with the corresponding salted hash. The known password impersonates the user login, so that with proper tuning John the Ripper will try only this particular candidate against the corresponding salted hash.
Because of a bug in JtR, I was not able to use this attack on a huge file, so I had to split it into small chunks. Nevertheless, I was able to retrieve 36,708,130 passwords in just 87 minutes. On a single CPU core.
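
In practice, the paired file feeds JtR's single mode; here is a sketch of what it could look like, assuming the salted scheme maps to one of JtR's dynamic formats (dynamic_25, i.e. sha1($salt.$pass), if I recall correctly; the hash and id below are placeholders):

cracked_password_here:$dynamic_25$<salted-sha1-hex>$<myspace-id>

$ ./john --single --format=dynamic_25 paired-hashes.txt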
In order to find those passwords with hashcat, I had to rely on a wordlist attack on a GTX 1080. It took about 14 days to complete. No matter how fast your GPU is (about 1000 MH/s in that particular case), it will brainlessly try every single candidate on every single hash. Remember the hashes are salted, so each one requires its own computation. If your file is 60M hashes long, then your GPU will only try 16.6 candidates per second (1000/60). It's very slow and inefficient.

Hashcat performance on a file containing 50% of total hashes.

Sometimes, brains beat raw power. Thank you John ;)

More on this topic:
https://hashes.org/forum/viewtopic.php?t=1715
http://cynosureprime.blogspot.fr/2016/07/myspace-hashes-length-10-and-beyond.html
