Unix | Cognitive Overhead

14 March 2024

Borg, Kopia, Restic: from configuration to the first backup

Any backup solution needs a minimum of configuration to do the job expected of it, and these three are no exception.
Borg, Restic and Kopia all rely on a repository into which they push backups. This repository must be initialized before a first backup can be made. In practice, here is how I proceeded for each of these programs.

Borg

Borg relies on an SSH key to open the communication tunnel between Borg on the server and Borg on the client. To that end, I use a dedicated key and a dedicated SSH configuration. If you are comfortable with client-side SSH configuration, this opens the door to fairly fine-grained tuning of the tunnel between client and server. It is familiar ground, which is rather comfortable.

$ borg init --encryption=repokey-blake2 --rsh "ssh -F .ssh/config_bkp" bkppat@192.168.0.22:/backupmaison/bench-borg

Creating the first backup, on the other hand, is a different story. There are more parameters, to the point that in my scripts I define them as variables that I then use in the various commands:

$ SSH="ssh -F .ssh/config_bkp"
$ HOMEREPO="bkppat@192.168.0.22:/backupmaison/bench-borg"
$ HOMEOPTIONS="--stats --exclude-caches --exclude-if-present .nobench-bkup"
$ SOURCEROOT="/Users/patpro"
$ ARCHIVENAME=$(date "+%Y%m%d.%s")
$ export BORG_PASSPHRASE="foobar"
$ borg create ${HOMEOPTIONS} --rsh "${SSH}" ${HOMEREPO}::${ARCHIVENAME} ${SOURCEROOT}

With Borg, all configuration is done this way. There is no configuration file. Fortunately, a backup task is meant to be automated, and therefore to live in one form or another in a script that never needs to be retyped by hand at the command line. The drawback is therefore rather minor.
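As a rough sketch of what such a script could look like, the commands above can be wrapped in a small function. The `DRY_RUN` switch and the `run` and `borg_backup` helpers are hypothetical names introduced for this example; by default the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: wraps the "borg create" command shown above in a function.
# DRY_RUN is a hypothetical switch for this example; it defaults to 1 so
# that the command is printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

SSH="ssh -F .ssh/config_bkp"
HOMEREPO="bkppat@192.168.0.22:/backupmaison/bench-borg"
SOURCEROOT="/Users/patpro"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create one archive; the archive name is passed as the first argument.
borg_backup() {
  run borg create --stats --exclude-caches --exclude-if-present .nobench-bkup \
    --rsh "$SSH" "${HOMEREPO}::$1" "$SOURCEROOT"
}

# BORG_PASSPHRASE is expected to be exported by the caller for a real run.
borg_backup "$(date "+%Y%m%d.%s")"
```

Running it with `DRY_RUN=0` would execute the real command, which keeps the whole configuration readable in one place.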

Kopia

If you use the SFTP transport, Kopia works much like Borg, except that Kopia uses its own SSH implementation by default. This means, for example, that you have to tell it where the known_hosts file is. It will not take the configuration file of the machine's native SSH client into account either. Unfortunately, Kopia's native implementation of the SSH protocol does not support all key types. This is an identified bug that has not been fixed yet. Elliptic curve keys can still be used if you tell the Kopia client to use an external SSH client. For my part, I chose to create an RSA key to work around the bug.

$ kopia repository create sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Unlike Borg, Kopia's configuration is largely moved into the repository. Policy management commands are used to set a number of parameters that the Kopia client will then use.
Here is what that looks like in my case:

$ kopia policy list
85bf7bc7688d5819433aad9ba1c84f6b (global)

$ kopia policy set --add-dot-ignore .nobench-bkup 08c59f69009c88ee463de6acbbc59a3a
Setting policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a
 - adding ".nobench-bkup" to "dot-ignore filenames"

$ kopia policy set --compression=zstd --global
Setting policy for (global)
 - setting compression algorithm to zstd

These policies work with a notion of inheritance, from the most general policy to the most specific:

$ kopia policy show --global| head
Policy for (global):

Retention:
  Annual snapshots:                     3   (defined for this target)
  Monthly snapshots:                   24   (defined for this target)
  Weekly snapshots:                     4   (defined for this target)
  Daily snapshots:                      7   (defined for this target)
  Hourly snapshots:                    48   (defined for this target)
  Latest snapshots:                    10   (defined for this target)
  Ignore identical snapshots:       false   (defined for this target)

$ kopia policy show 08c59f69009c88ee463de6acbbc59a3a| head
Policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a:

Retention:
  Annual snapshots:                     3   inherited from (global)
  Monthly snapshots:                   24   inherited from (global)
  Weekly snapshots:                     4   inherited from (global)
  Daily snapshots:                      7   inherited from (global)
  Hourly snapshots:                    48   inherited from (global)
  Latest snapshots:                    10   inherited from (global)
  Ignore identical snapshots:       false   inherited from (global)

Because the client parameters that define backup and purge tasks are stored in the repository, the command line for the various actions is greatly simplified. The client still has to be connected to the backup server. This is done with a command of this kind:

$ kopia -p foobar repository connect sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Disconnecting is done this way:

$ kopia repository disconnect

It is important to understand that this connection and disconnection do not have to be done systematically. This is a mistake I made myself: during the first days my backup scripts ran, for each Kopia task I connected, ran the task, then disconnected. The side effects are minimal, but I will come back to them. In any case, a connection made at a given moment persists, because it is not a network connection. It is better to think of it as a pre-configuration of the client, which lets it use the login, the password and the server coordinates to run backup tasks. For the rest of my tests, I simply connected once and never disconnected my client again. As a result, creating a backup boils down to this trivial command:

$ kopia snapshot create /Users/patpro

The difference with Borg is obvious.
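The "connect once" idea can be sketched as a small guard in a backup script. This is only an illustration: the `KOPIA` variable and the `ensure_connected` function are hypothetical names, and the sketch assumes that `kopia repository status` exits with a non-zero status when no repository is connected.

```shell
#!/bin/sh
# Sketch only: connect to the repository once, then reuse that connection.
# KOPIA is a hypothetical variable that lets the binary be overridden.
KOPIA="${KOPIA:-kopia}"

# Assumption: "kopia repository status" exits non-zero when the client
# has no repository connected.
ensure_connected() {
  if "$KOPIA" repository status >/dev/null 2>&1; then
    echo "already connected"
  else
    echo "connecting"
    "$KOPIA" repository connect sftp \
      --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro \
      --keyfile=/Users/patpro/.ssh/kopia_rsa \
      --known-hosts=/Users/patpro/.ssh/known_hosts-kopia
  fi
}

# Typical use in a backup script (repository password supplied separately):
#   ensure_connected && "$KOPIA" snapshot create /Users/patpro
```

With such a guard, the script never disconnects, and the connect command only runs when the client-side pre-configuration is missing.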

Restic

To test Restic, I wanted to use the REST server. Transport is over HTTP and the command is relatively trivial. On the server side, the configuration is fairly basic: I simply specified the storage path, the interface and TCP port to listen on, and the --no-auth parameter, since I do not need authentication.

$ restic -r rest:http://192.168.0.22:8000/ init
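For illustration, the server-side launch described above might look like the sketch below. The storage path is an assumption modelled on the other repositories in this article, and `DRY_RUN` and `start_rest_server` are hypothetical names; the `--path`, `--listen` and `--no-auth` options are the ones rest-server provides for these three settings.

```shell
#!/bin/sh
# Sketch only: how the REST server might be started.
# REST_PATH is an assumed storage path; REST_LISTEN matches the URL
# used by the client. DRY_RUN defaults to 1: print instead of execute.
DRY_RUN="${DRY_RUN:-1}"
REST_PATH="/backupmaison/bench-restic"
REST_LISTEN="192.168.0.22:8000"

start_rest_server() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "rest-server --path $REST_PATH --listen $REST_LISTEN --no-auth"
  else
    rest-server --path "$REST_PATH" --listen "$REST_LISTEN" --no-auth
  fi
}

start_rest_server
```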

As with Borg, backup and purge tasks require all parameters to be passed on the command line:

$ export RESTIC_PASSWORD=foobar
$ restic -r rest:http://192.168.0.22:8000/ backup --verbose --exclude-if-present .nobench-bkup --one-file-system --exclude-caches --no-scan --read-concurrency 4 /Users/patpro

There is not much to add on the Restic side, which behaves more or less like Borg in terms of configuration and command line. Even if the parameters were to multiply, everything would fit in a perfectly readable shell script.

To conclude

I really like the readability of a command line, with or without variables: it lets me review a backup script in any editor and in any context. Nevertheless, I have to admit that I am rather taken with the elegance of Kopia's policy system, even though reading the contents of a policy requires access to the Kopia command in a terminal as well as to the backup repository (which is a drawback). I believe this is a selection criterion that will depend solely on the end user's preference.

9 March 2024

Borg, Kopia, Restic: functional scope and general information

[This is the English translation by DeepL]
[Version originale en français]

The functional scope of the three backup solutions is relatively equivalent. Each operates in "push" mode, i.e. the client initiates the backup to the server. This mode of operation is undoubtedly best suited to people who manage a small handful of machines. The "pull" mode is available for Borg, but requires a few adjustments.
All three programs use deduplication and compression mechanisms to limit the volume of data transmitted and stored. With Kopia, however, compression is not enabled by default. All three also implement backup retention and expiration management, repository maintenance mechanisms, etc.

Borgbackup

Borgbackup, or Borg, is a solution developed in Python. It is relatively easy to install, but does not allow Windows workstations to be backed up. Its documentation is relatively exhaustive, and its ecosystem includes automation and GUI solutions. It evolves regularly. Tests were carried out with version 1.2.7. Branch 1.4 (beta) is available and branch 2 is under active development.

To use Borg, the software must also be installed on the destination server. This is a constraint that may prevent some people from using Borg. Remote backup requires an SSH connection. In addition, access to the backup repository is exclusive (lock), i.e. it is not possible to back up several machines to the same repository at the same time. Furthermore, backing up multiple clients to the same repository is discouraged for various reasons.

Should the client machine be compromised, there is cause for concern that the backup software present on the machine will allow remote archives to be destroyed. By default, this is the case. Nonetheless, Borg's client-server operation allows you to add controls. It is possible, for example, to configure the client so that it can only add and restore files, but not delete archives.
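In practice, with Borg 1.x this kind of restriction is usually enforced on the server side, in the SSH authorized_keys entry for the client's key. The sketch below only prints such an entry; the repository path and the key are placeholders, and `authorized_keys_line` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print an authorized_keys entry that forces "borg serve"
# in append-only mode, restricted to one repository path.
# Both arguments are placeholders to be replaced with real values.
authorized_keys_line() {
  repo_path="$1"
  public_key="$2"
  printf 'command="borg serve --append-only --restrict-to-path %s",restrict %s\n' \
    "$repo_path" "$public_key"
}

authorized_keys_line /path/to/repo "ssh-ed25519 AAAA... client@example"
```

The printed line would go into the backup user's `~/.ssh/authorized_keys` on the server, so that this key can only ever start `borg serve` with those restrictions.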

All Borg settings are made using command-line arguments. There is no configuration file, although some things, such as lists of exclusions and inclusions, can be moved into dedicated files.

Official website: https://www.borgbackup.org/
Official documentation: https://borgbackup.readthedocs.io/en/stable/
Cooperative development platform: https://github.com/borgbackup/borg

Kopia

Kopia is a solution written in Go. The version used here is 0.15.0 and the project is active. It is a monolithic binary available for different architectures and operating systems. Kopia is Windows-compatible and offers a GUI version. It also works with many different types of remote storage, such as S3, Azure Blob Storage, Backblaze B2, WebDAV, SFTP and others. You don't need to install any server-side components to use Kopia.
It is also possible to back up several clients in the same repository. Deduplication can then be achieved across backups of all clients.

Kopia also features an optional "server" mode, which can be configured to add user management when users share the same repository, for example. This feature can also be used to limit the client's rights over stored archives, for example, by prohibiting their deletion. If you don't use the server part of Kopia, you won't be able to prevent a compromised client from deleting its backups.

Kopia lets you manage backup policies that are saved and then applied to backup jobs (or cleanup jobs, since these policies also include the notion of retention). This backup policy management is rather elegant, and allows you to limit the number of command line arguments each time a job is launched.

Official website: https://kopia.io/
Official documentation: https://kopia.io/docs/
Cooperative development platform: https://github.com/kopia/kopia/

Restic

Restic is also written in Go. The version used here is 0.16.4 and the project is active. It is available for various architectures and operating systems, including Windows. The range of remote storage solutions supported by Restic is broadly similar to that of Kopia. Restic also offers its own high-performance HTTP server, which implements Restic's REST API. As with Kopia and Borg, this makes it possible to secure backups by limiting client rights.

This secure backup protocol, commonly referred to as "append-only", poses constraints on use that can be discouraging for users. Indeed, backing up machines to repositories on which old backups cannot be deleted poses a problem of volume management. For my part, I'd much prefer the client to have full rights over the backups, but for the repository of these backups to be stored on a file system that allows regular snapshots to be generated. In my case, all backups are written to a ZFS file system and automatic snapshots are taken every day with relatively long retention times.
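A daily snapshot of that kind can be sketched in a few lines of shell. The dataset name `tank/backupmaison` is hypothetical, as are the `DRY_RUN`, `snapshot_name` and `take_snapshot` names; retention (destroying old snapshots) is deliberately left out.

```shell
#!/bin/sh
# Sketch only: daily ZFS snapshot of the dataset holding the repositories.
# "tank/backupmaison" is a hypothetical dataset name.
# DRY_RUN defaults to 1: the command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
DATASET="tank/backupmaison"

# Build the snapshot name from a date stamp such as 2024-03-09.
snapshot_name() {
  echo "${DATASET}@daily-$1"
}

take_snapshot() {
  name="$(snapshot_name "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "zfs snapshot $name"
  else
    zfs snapshot "$name"
  fi
}

take_snapshot "$(date +%Y-%m-%d)"
```

Because the snapshots are taken on the server, a compromised client that deletes its archives cannot touch them.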

Like Borg, Restic relies exclusively on command-line parameters to configure the various tasks.

Official website: https://restic.net/
Official documentation: https://restic.readthedocs.io/en/stable/index.html
Cooperative development platform: https://github.com/restic/restic

In a nutshell

Everyone has to make up their own mind according to their own needs, but if you don't rule out any scenario (Windows client backup, shared repository or S3 storage...), Borg doesn't have the edge. If, like me, you back up a handful of macOS/FreeBSD/Linux machines to servers on which you can install software and to which you can connect via SSH, then Borg is on a par with Kopia and Restic.

	Borg	Kopia	Restic
Portability	-	+	+
Storage options	-	+	+
Transport options	-	+	+
Multi-client repository	-	+	+

9 mars 2024

Borg, Kopia, Restic : périmètre fonctionnel et généralités

Le périmètre fonctionnel des trois solutions de sauvegarde est relativement équivalent. Chacune fonctionne sur le mode «push», c'est-à-dire que c'est le client qui initie la sauvegarde vers le serveur. Ce mode de fonctionnement est sans doute le plus adapté aux personnes qui gèrent une poignée de machines. Le mode «pull» est disponible pour Borg, mais requiert quelques aménagements.
Les trois programmes utilisent les mécanismes de duplication et de compression pour limiter les volumétries transmises et stockées. Attention cependant, avec Kopia, la compression n'est pas activée par défaut. Ils implémentent aussi tous trois une gestion de la rétention et de l'expiration des sauvegardes, des mécanismes d'entretien du dépôt, etc.

Borgbackup

Borgbackup, ou Borg pour les intimes, est une solution développée en langage python. Elle s'installe plutôt simplement mais ne permet pas la sauvegarde des postes Windows. Sa documentation est relativement exhaustive et son écosystème inclut des solutions d'automatisation et d'interfaces graphiques. Elle évolue régulièrement. Les tests ont été faits avec la version 1.2.7. La branche 1.4 (beta) est disponible et la branche 2 est en cours de développement actif.

Pour utiliser Borg, il est nécessaire que le logiciel soit aussi installé sur le serveur de destination. C'est une contrainte qui pourra bloquer certaines personnes. La sauvegarde distante passe par une connexion SSH. De plus, l'accès au dépôt de sauvegarde est exclusif (verrou), c'est-à-dire qu'il n'est pas possible de sauvegarder plusieurs machines dans le même dépôt en même temps. Par ailleurs, la sauvegarde de plusieurs clients sur le même dépôt est découragée pour différentes raisons.

En cas de compromission de la machine cliente, on peut s'inquiéter du fait que le logiciel de sauvegarde présent sur la machine permette la destruction des archives distantes. Par défaut c'est bien le cas. Néanmoins, le fonctionnement client-serveur de Borg permet d'ajouter des contrôles. Il est possible par exemple de configurer le client pour qu'il ne puisse faire que des ajouts et des restaurations de fichiers, mais pas de suppression d'archives.

Tout le paramétrage de Borg se fait avec des arguments en ligne de commande. Il n'y a pas de fichier de paramètres, même si on peut déporter certaines choses dans des fichiers dédiés comme les listes d'exclusions et d’inclusions par exemple.

Site officiel : https://www.borgbackup.org/
Documentation officielle : https://borgbackup.readthedocs.io/en/stable/
Plate-forme de développement coopératif : https://github.com/borgbackup/borg

Kopia

Kopia est une solution codée en GO. La version utilisée ici est la 0.15.0 et le projet est actif. Il s'agit d'un binaire monolithique disponible pour différentes architectures et systèmes d'exploitation. Kopia est compatible Windows et propose une version graphique. Il fonctionne aussi avec de nombreux stockages distants différents comme par exemple les stockages de type S3, Azure blob Storage, Backblaze B2, WebDAV, SFTP, etc. Il n'est pas nécessaire d'installer des composants côté serveur pour utiliser Kopia.
Il est aussi possible de sauvegarder plusieurs machines dans le même dépôt. La déduplication pourra alors être mise en commun.

Kopia dispose tout de même d'un mode «serveur» facultatif qu'il est possible de configurer pour ajouter une gestion d'utilisateurs, lorsque ces derniers partagent un même dépôt par exemple. C'est aussi grâce à cette fonctionnalité qu'il est possible de limiter les droits du client sur les archives stockées, par exemple en interdisant leur suppression. Si vous n'utilisez pas la partie serveur de Kopia, vous ne pourrez pas empêcher qu’un client compromis supprime ses sauvegardes.

Kopia permet de gérer des politiques de sauvegarde qui sont enregistrés et ensuite applicables à des tâches de sauvegarde (ou de nettoyage, puisque ces politiques incluent aussi la notion de rétention). Cette gestion de politiques de sauvegarde est plutôt élégante et permet de limiter le nombre d'arguments de la ligne de commande à chaque lancement d'une tâche.

Site officiel : https://kopia.io/
Documentation officielle : https://kopia.io/docs/
Plate-forme de développement coopératif : https://github.com/kopia/kopia/

Restic

Restic est aussi codé en GO. La version utilisée ici est la 0.16.4 et le projet est actif. il est disponible pour différentes architectures et systèmes d'exploitation, y compris Windows. L'éventail de solutions de stockage distant supporté par Restic est à peu près similaire à celui de Kopia. Restic propose en plus son propre serveur HTTP haute performance qui implémente l’API REST de Restic. Cela permet, comme dans le cas de Kopia et de Borg, d'assurer une sécurisation des sauvegardes en limitant les droits du client.

Ce protocole de sécurisation des sauvegardes, communément appelé «append-only», pose des contraintes d'utilisation qui peuvent être décourageantes pour l'utilisateur. En effet, sauvegarder des machines vers des dépôts sur lesquels on ne peut pas supprimer les anciennes sauvegardes pose un problème de gestion de volumétrie. Pour ma part, je préfère de loin que le client ait tous les droits sur les sauvegardes, mais que le dépôt de ces sauvegardes soit stockés sur un système de fichiers qui permet de générer des snapshots réguliers. Dans mon cas, toutes les sauvegardes sont écrites sur un système de fichiers ZFS et des snapshots automatiques sont pris tous les jours avec des rétentions relativement longues.

Tout comme Borg, Restic s'appuie exclusivement sur des paramètres de ligne de commande pour assurer la configuration des différentes tâches.

Site officiel : https://restic.net/
Documentation officielle : https://restic.readthedocs.io/en/stable/index.html
Plate-forme de développement coopératif : https://github.com/restic/restic

Pour résumer

Chacun doit se faire sa propre idée à l’aune de ses propres besoins, mais si l’on n’exclue aucun scénario (sauvegarde de client Windows, dépôt partagé ou encore stockage S3…), Borg n’a pas l’avantage. Si comme moi vous sauvegardez une poignée de machines macOS/FreeBSD/Linux à destinations de serveurs sur lesquels vous pouvez installer des logiciels et auxquels vous pouvez vous connecter en SSH alors Borg fait jeu égal avec Kopia et Restic.

	Borg	Kopia	Restic
Portabilité	-	+	+
Options de stockage	-	+	+
Options de transport	-	+	+
Dépôt multi-client	-	+	+

7 mars 2024

Borg, Kopia, Restic : un comparatif

[English translation by DeepL]
Depuis plusieurs années, j'utilise Borg Backup. Cependant, fin 2023, j'ai rencontré des problèmes importants avec la sauvegarde d'une de mes machines. J'ai donc décidé d'explorer des pistes alternatives. Une recherche rapide m'a permis de trouver des solutions relativement équivalentes à Borg Backup : Restic et Kopia.

En absence d'un comparatif de ces trois solutions (il existe un comparatif «ancien» entre Borg et Restic mais il ne va pas aussi loin et ne mentionne pas Kopia), j'ai décidé de me lancer dans des tests moi-même, pour pouvoir comparer des métriques simples et communes aux trois logiciels.

Finalement, une fois qu'une sauvegarde régulière est automatisée, les métriques importantes sont pour moi : la durée d'une sauvegarde, la volumétrie des échanges réseau et la volumétrie de stockage sur le serveur de sauvegarde. Des métriques similaires sont intéressantes à explorer sur les opérations de restauration d’une sauvegarde et sur les opérations de maintenance du serveur de sauvegardes.

Mon environnement de test est constitué de ces différentes briques :

machine cliente : macmini M2 Pro sous macOS 13.6.3, 32 Go RAM
serveur de sauvegarde : PC à base de Intel(R) Core(TM) i3-3220T, sous FreeBSD 14, 16 Go RAM, stockage ZFS sur SSD
réseau ethernet 1Gbps «switché»
une instance Splunk pour ingérer les logs et produire des statistiques et des graphiques

Pour m'assurer de travailler sur des données représentatives, j'ai décidé de sauvegarder une partie de mon répertoire utilisateur. J'ai exclu un immense répertoire de photos numériques qui ne bouge que très rarement, ainsi que de nombreux répertoires de cache qu’il n’est pas pertinent de sauvegarder. La volumétrie finale de ma cible de sauvegarde avoisine les 140Go.

Je me suis basé pour mon protocole de sauvegarde sur mon expérience acquise avec Borg. Néanmoins, chaque logiciel ayant ses spécificités, un temps de documentation et d'adaptation a été nécessaire. Malgré ces préparatifs, le déroulement des sauvegardes m'a réservé quelques surprises.

L'automatisation des sauvegardes s'est faite par un script bash unique permettant de lancer dans un ordre aléatoire la création des sauvegardes via Borg, Restic et Kopia. J’avais dans l'idée que la première tâche de sauvegarde serait potentiellement pénalisée par l'accès à des fichiers sur le poste client qui ne seraient pas dans le cache du système, mais après 15 jours de test, j'ai constaté qu'il n'y a aucune différence visible de temps de sauvegarde qui soit imputable à l'ordre des tâches de sauvegarde. J'ai donc décidé ultérieurement de figer l’ordre des tâches de sauvegarde, ce qui me permet aussi de suivre les métriques bien plus facilement.
La machine cliente a des périodes de sommeil planifié la nuit. Pendant ces périodes de sommeil, aucune sauvegarde n'a lieu. De plus, la planification du script est faite dans Launchd via l’argument StartInterval à 3600 secondes. Contrairement à une planification de type crontab, par exemple, celle-ci assure un délai incompressible d'une heure entre deux exécutions du script. C'est-à-dire qu'il doit s'écouler une heure entre la fin de l'exécution précédente et le début de la nouvelle exécution. Ainsi, le script est réellement lancé un peu moins d'une fois par heure. Par exemple si chaque exécution dure 5 minutes, alors il faut 26h et non 24h pour qu’il soit lancé 24 fois.

Note importante : je ne vais pas aborder en détails dans ces articles les différences d'ergonomie de ces logiciels, la disponibilité ou non d'une interface graphique de gestion, etc. Certaines des solutions présentées ici disposent de plusieurs interfaces graphiques différentes, les fonctionnalités des trois produits varient, et je ne souhaite pas faire un catalogue qui comparerait chaque logiciel point par point.

Voici le sommaire de mon comparatif. Le lien vers chaque article deviendra actif à sa publication.

7 mars 2024

Borg, Kopia, Restic: a comparison

[This is the English translation by DeepL]
[Version originale en français]

I've been using Borg Backup for several years now. However, at the end of 2023, I encountered major problems with the backup of one of my machines. So I decided to explore alternative solutions. A quick search revealed two solutions that were relatively equivalent to Borg Backup: Restic and Kopia.

In the absence of a comparison of these three solutions (there is an "old" comparison between Borg and Restic, but it doesn't go as far and doesn't mention Kopia), I decided to carry out some tests myself, to be able to compare simple metrics common to all three software packages.

Ultimately, once a regular backup is automated, the important metrics for me are: backup duration, network exchange volume and storage volume on the backup server. Similar metrics are worth exploring for backup restore operations and backup server maintenance operations.

My test environment consists of the following components:

client: Mac mini M2 Pro running macOS 13.6.3, 32 GB RAM
backup server: Intel(R) Core(TM) i3-3220T-based PC, running FreeBSD 14, 16 GB RAM, ZFS storage on SSD
switched 1 Gbps Ethernet network
Splunk instance to ingest logs and produce statistics and graphs

To make sure I was working with representative data, I decided to back up part of my user directory. I've excluded a huge directory of digital photos that rarely changes, as well as numerous cache directories that it's not relevant to back up. The final size of my backup target is around 140GB.

I based my backup protocol on my experience with Borg. However, as each program has its own specifics, it took some time to read the documentation and adapt. Despite these preparations, the backup runs still had a few surprises in store for me.

Backups were automated using a single bash script, which launched the creation of backups via Borg, Restic and Kopia in random order. I had in mind that the first backup task would be potentially penalized by access to files on the client workstation that were not in the system cache, but after 15 days of testing, I found that there was no visible difference in backup time attributable to the order of the backup tasks. I therefore subsequently decided to freeze the order of backup jobs, which also makes it much easier to track metrics.
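The random ordering can be sketched as follows. This is not the actual script: the three `backup_*` functions are placeholders that only print their name, and `shuffled_tasks` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: run the three backup tasks in a random order.
# The task functions are placeholders that just print their name.
backup_borg()   { echo "borg"; }
backup_restic() { echo "restic"; }
backup_kopia()  { echo "kopia"; }

# Prefix each name with a random key, sort on the key, then drop it.
# srand() is seeded from the clock, which is enough for an hourly job.
shuffled_tasks() {
  printf '%s\n' borg restic kopia |
    awk 'BEGIN { srand() } { printf "%f\t%s\n", rand(), $0 }' |
    sort -n | cut -f2
}

# Call backup_borg, backup_restic and backup_kopia in the shuffled order.
for task in $(shuffled_tasks); do
  "backup_$task"
done
```

Freezing the order later simply means replacing `$(shuffled_tasks)` with a fixed list.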

The client machine has scheduled sleep periods at night. During these sleep periods, no backup takes place. In addition, the script is scheduled in launchd with the StartInterval key set to 3600 seconds. Unlike a crontab schedule, for example, this guarantees a minimum delay of one hour between two executions of the script. In other words, one hour must elapse between the end of the previous execution and the start of the next one. This means that the script actually runs slightly less than once an hour. For example, if each execution lasts 5 minutes, it takes 26 hours, not 24, to run the script 24 times.
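The arithmetic behind that example is simple enough to check in the shell: each cycle costs the interval plus the run time, so 24 runs take 24 × (3600 + 300) seconds. The `elapsed_hours` function is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
# Sketch only: wall-clock time needed for N runs when a full interval
# must elapse after each run finishes (the behaviour described above).
# Arguments: number of runs, interval in seconds, run duration in seconds.
# The result is in whole hours (integer division).
elapsed_hours() {
  runs="$1"
  interval_s="$2"
  duration_s="$3"
  echo $(( runs * (interval_s + duration_s) / 3600 ))
}

elapsed_hours 24 3600 300   # prints 26
```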

Important note: I'm not going to go into detail in these articles about the differences in the software's ergonomics, the availability or otherwise of a graphical management interface, etc. Some of the solutions presented here have several different graphical interfaces, the functionalities of the three products vary, and I don't wish to make a catalog comparing each software point by point.

Here's the table of contents of my comparison. The link to each article will become active upon publication.

11 February 2024

Borgbackup: it’s complicated

As I wrote in a previous article, I'm using Borgbackup to back up some servers and workstations. And most of the time it's going OK. Unfortunately, when it's not OK, it can be very (very) hard to recover. Spoiler alert: one of my backup repositories had a problem, it took me 16 days and 1131 connections to the repository to fix it.

A few weeks ago, I noticed that the Borg repository for the backups of my MacBook Pro was holding way too many backups. The pruning process was not doing its job properly. So I started to investigate, and discovered that every attempt to prune older backups failed.
Trying to check the repository failed too: every single time the connection was reset. Apparently the remote server was dropping the SSH connection after a while (between 20 and 90 minutes). Client-side the error message was Connection closed by remote host.
I started to wonder if the problem was the very large number of backups, so I tried to delete backups manually. Some of them were easy to delete but most of the older backups needed delete --force and even delete --force --force, which is pretty bad.
Basically, I destroyed two years' worth of backups, but it was still impossible to check and repair the repository. Also, the use of --force and --force --force makes it impossible to reclaim the space until you repair the repository.

Digging in the (mis)direction of a connection problem, I tried everything I could. It failed every single time:

tuning ClientAliveInterval and ClientAliveCountMax server-side
tuning ServerAliveInterval and ServerAliveCountMax client-side
using and not using multiplexed ssh connections
moving the client to another physical network
using caffeinate to launch the borg command
using Amphetamine
upgrading ssh on the client
upgrading borg
giving mosh a try (nope)
running ssh with -vvv
using ktrace server-side to find the cause

None of this helped at all. Not to find a cause, not to make it work properly.
Eventually I stumbled upon this blog post: Repair Badly Damaged Borg Repository.

The repair method described failed too:
- step check --repair --repository-only worked
- but check --repair --archives-only died with the same Connection closed by remote host.
As a workaround attempt I scripted a loop that would do check --repair --archives-only on every archive individually (~630).
After 3 full days and more than 600 GB transferred from the repo to the client, every archive had been checked and none was faulty.
Next step was to attempt check --repair again on the whole repository. About 15 hours later the output was

2851697 orphaned objects found!
Archive consistency check complete, problems found.

Full working solution for me:

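export BORG_CHECK_I_KNOW_WHAT_I_AM_DOING=YES
# 1. Repair the repository structure only
borg check -p --repair --repository-only LOGIN@HOST:/path/to/repo
# 2. Save the archive names (first column of the listing)
borg -p list LOGIN@HOST:/path/to/repo | awk '{print $1}' > listborg
# 3. Check and repair each archive individually
while IFS= read -r archive; do
	borg -vp check --repair --archives-only "LOGIN@HOST:/path/to/repo::${archive}"
done < listborg
# 4. Final check and repair of the whole repository
borg -p check --repair LOGIN@HOST:/path/to/repo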

Final step was to reclaim space on the server with compact:

$ borg compact -p --verbose path/to/repo/
compaction freed about 376.22 GB repository space.

Boy that was painful!

I’m pretty sure it’s not really an ssh issue. Something is causing the connection to drop but I have ssh connections opened for days between that client and that server, so I guess ssh is not the problem here. Also, borgbackup’s github is loaded with issues about failing connections. In the end it boils down to:
- borg repositories can break badly without warning
- repairing a repo might not be worth the bandwidth/CPU/IOs

2 September 2022

Looking for a Systems Administrator

Within the Operations Department of the IT Division (DSI) of Université Lyon 2, we are looking for a Systems Administrator to strengthen our team and help us take on day-to-day challenges.
Workplace: Bron campus (tram stop T2 Europe Université).

You live in the Lyon area or are able to relocate;
You are motivated by the challenges of managing a fleet of more than 600 servers: Linux RedHat/CentOS (70%), Windows, FreeBSD;
A multi-site virtualization farm with load balancing, a disaster recovery plan and cross-site backups does not scare you;
High-availability infrastructures, automation and monitoring tools, and SIEMs interest you;
You are passionate about systems issues and want to grow in a rich and varied environment;
You are curious, very rigorous, and service-minded.

If you recognize yourself in this profile, contact me!

Editing a shell script in the terminal

8 August 2020

Multi-head virtualized workstation: the end.

Four years ago I started a journey in workstation virtualization. My goal at the time was to try and escape Apple's ecosystem as it was moving steadily toward closedness (and iOS-ness). I also thought back then that it would allow me to pause planned obsolescence by isolating the hardware from the software.
I was very wrong.

My ESXi workstation was built with power, scalability and silence in mind. And it had all this for a long time. But about a year and a half ago I started to notice the hum of one of its graphics cards. Recently this hum turned into an unpleasant high-pitched sound under load. The fans were aging and I needed a solution. Problem is, one just can't choose any graphics card off the shelf and put it into an ESXi server. It requires a compatibility study: card vs motherboard, vs PSU, vs ESXi, vs VM operating system. If you happen to need an ESXi upgrade (from 5.x to 6.x for example) in order to use a new graphics card, then you need to study the compatibility of this new ESXi with your other graphics cards, your other VM OSes, etc.
And this is where I was stuck. My main workstation was a macOS VM using an old Mac Pro Radeon that would not work on ESXi 6.x. All things considered, every single upgrade path was doomed to failure unless I could find a current, silent graphics card that would work on ESXi 5.x and get accepted by the Windows 10 guest via PCI passthrough. I found one: the Sapphire Radeon RX 590 Nitro+. It worked great at first: very nice benchmarks and remarkable silence. But after less than an hour I noticed that HDDs inside the ESXi were missing, gone. In fact, under GPU load the motherboard would lose its HDDs. I don't know for sure, but it could have been a power problem, even though the high-quality PSU was rated for 1000W. Anyway, guess what: ESXi does not like losing its boot HDD or a datastore. So I sent the graphics card back and got a refund.
Second problem: I was stuck with a decent but old macOS release (10.11, aka El Capitan). No more updates, no more security patches. Upgrading the OS was also a complex operation, with compatibility problems with the old ESXi release and with the older Mac Pro Radeon. I tried a few things but it always ended with a no-go.
Later this year, I tried another Radeon GPU, less power-hungry, but it led to other passthrough and VM malfunctions. This time I chose to keep the new GPU as an incentive to deal with the whole ESXi mess.

So basically, the situation was: a very nice multi-head setup, powerful, scalable (room for more storage, more RAM, more PCI) but stuck in the past with a 5-year-old macOS using a 10-year-old Mac Pro graphics card in passthrough on top of a 5-year-old ESXi release, the Windows 10 GPU becoming noisy, and nowhere to go from there.

I went through the 5 stages of grief and accepted that this path was a dead end. No more workstation virtualization, no more complex PCI passthrough, I'd had enough. A few weeks ago I started to plan my escape: I need a silent Mac with decent power and storage (photo editing), I need a silent and relatively powerful Windows 10 gaming PC, I need an always-on, tiny virtualization box for everything else (Splunk server, Linux and FreeBSD experiments, etc.). It was supposed to be a slow migration process, maintaining both infrastructures in parallel for some weeks and allowing perfect testing and switching.
Full disclosure: it was not.

I built the Mac first, mostly because the PC case I had ordered was not delivered yet. Using a NUC10i7, I followed online instructions and installed my very first Hackintosh. It worked almost immediately. Quite happy with the result, I launched the migration assistant on my macOS VM and on my Hackintosh and injected about 430 GB of digital life into the little black box. Good enough for a test; I was quite sure I would wipe everything and rebuild a clean system later.

A few days later I started to build the PC. I was supposed to reclaim a not-so-useful SSD from the ESXi workstation to use as the main bare metal PC storage. I made sure nothing was on the SSD, shut down ESXi, and removed the SSD and its SATA cable. I also removed another SSD+cable that was not used (failed migration attempt to ESXi 6.x and test for Proxmox). I restarted ESXi only to find out that a third SSD has disappeared: a very useful datastore is missing, 7 or 8 VMs are impacted, partially or totally. The macOS VM is dead, its main VMDK is missing (everything else is present, even its Time Machine VMDK), the Splunk VM is gone with 60+ GB of logs, the Ubuntu server is gone, some FreeBSD VMs are gone too, etc.
A few reboots later, I extract the faulty SSD and start testing: different cable, different port, different PC. Nothing works and the SSD is not even detected by the BIOS (on both PCs).
This is a good incentive for a fast migration to bare metal PCs.
Fortunately:
- a spare macOS 10.11 VM, blank but fully functional, is waiting for me on an NFS datastore (backed by FreeBSD and ZFS).
- the Time Machine VMDK of my macOS VM workstation is OK
- my Hackintosh is ready even though its data is about a week old
- the Windows 10 VM workstation is fully functional

So I plugged the Time Machine disk into the spare macOS VM, booted it, and launched Disk Utility to create a compressed image of the Time Machine disk. Then I copied this 350 GB dmg file onto the Hackintosh SSD, after which I mounted the image and copied the week's worth of out-of-sync data to my new macOS bare metal workstation (mostly Lightroom-related files and pictures).
I plugged the reclaimed SSD into the new PC and installed Windows 10, configured everything I need, started Steam and downloaded my usual games.
Last but not least, I shut down the ESXi workstation, for good this time, unplugged everything (a real mess), cleaned up a bit, installed the new, way smaller, gaming PC, and plugged everything back in.

Unfortunately, the Hackintosh uses macOS Catalina. This version won't run much of the paid and free software I'm using. Say goodbye to my Adobe CS 5 suite, bought years ago, goodbye to BBEdit (I'll buy the latest release ASAP), etc. My Dock is a graveyard of incompatible applications. The only spark of luck here: Lightroom 3, which seems to be pretty happy on macOS 10.15.6.

In less than a day and a half I moved from a broken multi-head virtualized workstation to bare metal PCs running up-to-date OSes on top of up-to-date hardware. Still MIA: the virtualization hardware to re-create my lab.

What saved me:
- backups
- preparedness and contingency plan
- backups again

Things to do:
- put the Hackintosh into a fanless case
- add an SSD for Time Machine
- add a second drive to the Windows 10 PC for backups
- buy another NUC for virtualization lab
- buy missing software or find alternatives