Cognitive Overhead

19 April 2024

Borg, Kopia, Restic: going further

In this final article, I'm going to talk about the little extras that couldn't be included in my comparison as measurable criteria.

I'm going to talk briefly about graphical interfaces, because for some users, this can make a big difference. Application sizes are given for the macOS version.
Of the three programs, Restic has the richest but least mature ecosystem of graphical user interfaces. There are several of them, but they're all independent projects more or less in the alpha or beta phase, and they don't all offer the same level of functionality.
I have only tested Restic Browser. This lightweight graphical interface (13 MB) is designed solely for browsing snapshots and restoring files. The application works perfectly and does exactly what's expected of it. Nothing to say, except that it's a very pleasant complement to a stand-alone backup script running on the user's machine.

Borg's main GUI project is called Vorta. It's a full-featured application, weighing in at around 127 MB, that covers configuration, scheduling, exclusion lists, compression parameters, cleanup, restore, repository verification and more. On macOS, you'll just need to install the FUSE drivers to mount a remote archive locally. As Borg works without a configuration file and doesn't store its configuration in the repository, Vorta unfortunately cannot retrieve the settings of backups made from the command line. It can, however, connect to an existing repository, list its archives, and perform backups, restores and all other tasks. Just bear in mind that Vorta has no knowledge of the parameters you pass to Borg in a backup script of your own. It's therefore best suited to stand-alone use: you set your backup and cleanup parameters directly in Vorta, and Vorta takes care of running them.

In contrast, Kopia offers a totally unified backup experience between its command-line client and its KopiaUI interface. All backup, cleanup and other parameters are configured in a policy accessible to both environments. This means you can switch from the command line to the GUI without a second thought. Still, KopiaUI weighs in at 313 MB and offers far less advanced functionality than Vorta. It is impossible, for example, to launch a verification task on a snapshot, or to compare two snapshots. Note that it is possible to display this user interface without even installing KopiaUI: the kopia command-line client includes a server mode that serves exactly the same graphical interface in a web browser.

In addition to graphical interfaces, there are "wrappers" that make Restic and Borg easier to use. For Kopia, this type of tool probably doesn't exist, or exists only marginally, because backup policy management is built in. Restic and Borg, on the other hand, lack complete configuration management, so these command-line "wrappers" can be very useful. I haven't tested one for Restic, but I've been using Borgmatic on some of my machines for several years. Using YAML files, this tool lets you configure all backup, retention and other parameters.

Example of use:

@daily borgmatic -v 1 create --files prune ; borgmatic list
0 2 16-23 * 2 borgmatic -v 1 check
0 2 16-23 * 3 borgmatic -v 1 compact

Kopia's server mode and Restic's REST-server make it fairly easy to manage a multi-user backup repository. This type of operation is not trivial to reproduce with Borg, as the latter requires giving users SSH access. Of course, this SSH access can (and must!) be restricted, but this entails management complications that go far beyond the simple login and password used by Restic and Kopia.
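
To give an idea of what restricting SSH access for Borg involves, here is a minimal sketch that prints one `authorized_keys` entry per user, confining the key to `borg serve` with `--restrict-to-path`. The function name, the `/srv/borg/<user>` layout and the sample key are assumptions made for illustration, not the setup used in this series.

```shell
#!/bin/sh
# Hypothetical helper: print an authorized_keys entry that confines one
# user's SSH key to "borg serve" on that user's own repository directory.
# The /srv/borg/<user> layout is an assumption, not something Borg imposes.
borg_authorized_key() {
    user="$1"      # logical user name, used only to build the path
    pubkey="$2"    # the user's public key, e.g. "ssh-ed25519 <key> alice@laptop"
    printf 'command="borg serve --restrict-to-path /srv/borg/%s",restrict %s\n' \
        "$user" "$pubkey"
}

# Example: the output would be appended to ~/.ssh/authorized_keys of the
# account that hosts the repositories (placeholder key shown here).
borg_authorized_key alice "ssh-ed25519 PLACEHOLDER_KEY alice@laptop"
```

Every new user means a new key, a new entry and a new directory to manage, which is the overhead compared with a login and password.
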
In multi-user mode, Kopia is superior to the REST-server. The latter only allows users to be managed via an .htpasswd file, whereas Kopia provides fairly elaborate management of users and their access permissions. I couldn't find any mention in the Restic documentation of deduplication being shared when several users back up their data to the same server. This is, however, an official feature of Kopia in server mode. It's important to note that this raises security questions: if users share a common repository, they can theoretically retrieve other users' data. Fortunately, this is only possible if the user knows the 256-bit hash corresponding to the file they wish to retrieve. On the other hand, Alice could very well use this intentionally to share with Bob a file she has backed up via Kopia. All she has to do is send the 256-bit hash to Bob, who can then restore the file with his own Kopia client.
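
The principle behind this sharing-by-hash can be sketched with a toy content-addressed store. This is only an illustration using plain SHA-256 and a flat directory. It is not Kopia's repository format, and the function names are hypothetical.

```shell
#!/bin/sh
# Toy content-addressed store: a file is saved under the SHA-256 of its
# content, so whoever knows the hash can fetch it. This only illustrates the
# principle; it is NOT Kopia's repository format.

# Print the SHA-256 hex digest of a file (GNU sha256sum, or shasum on macOS).
file_hash() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1
    fi
}

# store_put STORE FILE: copy FILE into STORE under its hash, print the hash.
store_put() {
    h=$(file_hash "$2")
    mkdir -p "$1"
    cp "$2" "$1/$h"
    printf '%s\n' "$h"
}

# store_get STORE HASH DEST: restore the object named HASH to DEST.
store_get() {
    cp "$1/$2" "$3"
}

# Alice saves a file and sends the hash to Bob; Bob restores it by hash.
work=$(mktemp -d)
printf 'hello Bob\n' > "$work/alice.txt"
hash=$(store_put "$work/store" "$work/alice.txt")
store_get "$work/store" "$hash" "$work/bob.txt"
cmp -s "$work/alice.txt" "$work/bob.txt" && echo "Bob restored Alice's file"
```

Without the hash, Bob has no practical way to name the object, which is why guessing another user's data is considered infeasible.
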

Of these three backup solutions, only Borg doesn't offer a wide range of storage backends, several of which are cloud services that don't require the application to be deployed on the server. It is no doubt because of this difference that the market offers little or no dedicated storage for Restic- or Kopia-based backups. There are a few commercial services offering storage for Borg repositories, such as rsync.net or borgbase.com. The latter is the only one, as far as I know, that also hosts Restic backups. I haven't found anything for Kopia, so don't hesitate to correct me in the comments. There are, however, a few turnkey solutions based on consumer NAS boxes, such as QNAP, for which all three applications are available.

To conclude this series of articles, I'll bring together here the scoring tables for all three products.

	Borg	Kopia	Restic
Portability	-	+	+
Storage options	-	+	+
Transport options	-	+	+
Multi-client storage	-	+	+
Backup duration	-	+	-
Backup network volume	+	+	-
Retention management	+	-	+
Restore duration	-	-	+
Restore network volume	-	-	+
Cleanup duration	+	+	-
Cleanup network volume	-	+	+
Check duration	+	-	+
Local cache size	+	-	-
Final score	5	8	9

Please note that functional coverage (portability, storage and transport options, etc.) weighs heavily in the final score. If these aspects are of no importance to you, the final score will be significantly altered.
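
To see how much the functional criteria weigh, the scores can be recomputed with and without the first four rows of the table. The sketch below simply counts the "+" marks, and the function names are hypothetical. With the functional rows removed, the counts become 5, 4 and 5 for Borg, Kopia and Restic.

```shell
#!/bin/sh
# Summary table from this article: criterion, then Borg, Kopia, Restic.
# The first four rows are the "functional coverage" criteria.
summary_rows() {
    cat <<'EOF'
Portability - + +
Storage options - + +
Transport options - + +
Multi-client storage - + +
Backup duration - + -
Backup network volume + + -
Retention management + - +
Restore duration - - +
Restore network volume - - +
Cleanup duration + + -
Cleanup network volume - + +
Check duration + - +
Local cache size + - -
EOF
}

# Count the "+" marks per tool, read from stdin; prints "Borg Kopia Restic".
# The last three fields of each row are the marks, whatever the label length.
count_plus() {
    awk '{ b += ($(NF-2) == "+"); k += ($(NF-1) == "+"); r += ($NF == "+") }
         END { print b + 0, k + 0, r + 0 }'
}

echo "all criteria (Borg Kopia Restic):     $(summary_rows | count_plus)"
echo "performance only (Borg Kopia Restic): $(summary_rows | tail -n +5 | count_plus)"
```
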

My analysis of the Borg, Kopia and Restic solutions is now complete. Unfortunately, I'm no further ahead than before, as these solutions are roughly equivalent and all have attractive aspects. I still haven't decided whether to replace Borg with something else and, if so, with what.

9 April 2024

Borg, Kopia, Restic: restoration and maintenance

[This is the English translation by DeepL]
[Original version in French]

Restoration

From the previous article, we know how the backup process works in Borg, Kopia and Restic. Now we'll take a look at restoration and maintenance.
Restoration is a stage in the life of a backup that, when all goes well, is never needed. So it's understandable that the benchmark for this stage is neither very interesting nor very representative. Tracing was done in the original backup script, but the restoration was not run every day.
The test consisted of restoring a 39 MB zip archive from the oldest snapshot available in the repository, as well as restoring a Library/Preferences directory from the oldest of the last 10 snapshots. The size of the directory varies from one backup to the next, but is generally around 290 MB for 951 directories and ~15K files.

During this restoration test, the difference between the three programs is once again very marked. Borg receives around 575 MB of data from the server and sends around 6 MB. Kopia does somewhat better, performing the same restoration with only 450 MB of data received and around 20 MB sent. But it's Restic that has the smallest network footprint, managing to perform the expected full restore with just 120 MB of data received and 0.5 MB sent.

	Borg	Kopia	Restic
avg in	-575.00 MB	-447.76 MB	-120.82 MB
avg out	5.92 MB	20.59 MB	0.52 MB

In terms of restoration time, Restic is also in first place, with a task completed in 8 seconds, while Kopia takes between 12 and 14 seconds and Borg between 18 and 19 seconds. These are still relatively short times, and although the differences may be significant in proportion, they remain fairly small in absolute terms.

As far as restoration is concerned, Restic is the clear winner, well ahead of Kopia. Borg comes last, not least because of its network consumption.

	Borg	Kopia	Restic
restoration time	-	-	+
network volume	-	-	+

Maintenance

The first maintenance task, and also the easiest to measure, is the "prune" phase. This task removes obsolete archives, snapshots and restore points. It is essential to keep the size of the backup repository from growing out of control. It is this step that enforces the retention policy the user wants to apply.
I detailed in my previous article how retention management can be full of surprises. I won't return to that particular point. Here, I'll focus on the resource consumption involved in these operations.
Note that Kopia spontaneously performs the cleanup operation when a new backup is launched, without needing to be asked to do so. To be able to measure this operation independently, I had to schedule it just before the backup operation (and not after, as initially planned).

The bandwidth consumption of the cleanup operation is surprising, to say the least. Borg stands out here for its great regularity, with around 180 MB of incoming data and 1.3 MB of outgoing data. For the other two, the situation is much more chaotic. Kopia seems to behave in a fairly stable way over time, with incoming data limited to a tiny ~92 KB and outgoing data to around 70 KB, but there are some huge anomalies, with certain cleanup operations showing incoming transfers approaching 1.4 GB and outgoing transfers close to 1 GB. It seems that these peaks can be attributed to other maintenance tasks (see below).
Restic also shows great irregularity in the volumes of data exchanged with the server during cleaning operations. Nevertheless, volumes remain under control, well below the 40 MB mark for outgoing data and the 70 MB mark for incoming data.

4-day figures, mid-March

	Borg	Kopia	Restic
avg in	-175.65 MB	-62.02 MB	-24.74 MB
avg out	1.34 MB	42.23 MB	27.55 MB
median in	-181.02 MB	-95 KB	-16.97 MB
median out	1.31 MB	70 KB	28.01 MB
perc98 in	-190.17 MB	-879.78 MB	-66.49 MB
perc98 out	2.27 MB	599.28 MB	34.52 MB

In terms of cleanup duration, Borg and Kopia are both well placed, with Kopia showing a few significant peaks and Borg a certain regularity. Restic, on the other hand, comes last, far behind its two competitors.

4-day figures, mid-March

	avg	median	perc98
Borg	9.30 s	9.00 s	14.90 s
Kopia	4.41 s	2.00 s	33.00 s
Restic	113.50 s	126.00 s	140.20 s
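
For reference, here is a minimal sketch of how such avg/median/perc98 figures can be derived from a list of measured durations. The sample values are hypothetical, and the percentile uses the nearest-rank method, which is an assumption since several definitions exist.

```shell
#!/bin/sh
# duration_stats: read one duration (seconds) per line on stdin and print
# "avg median p98". The percentile uses the nearest-rank method, which is an
# assumption about how such a figure can be computed.
duration_stats() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            # Median: middle value, or mean of the two middle values.
            if (NR % 2) med = v[(NR + 1) / 2]
            else        med = (v[NR / 2] + v[NR / 2 + 1]) / 2
            # Nearest rank: r = ceil(0.98 * NR), computed with integers.
            r = int((98 * NR + 99) / 100)
            printf "%.2f %.2f %.2f\n", sum / NR, med, v[r]
        }'
}

# Hypothetical sample: mostly short runs with one long outlier.
printf '%s\n' 2 2 1 3 2 33 2 2 4 | duration_stats
```

A single long run is enough to pull the average and the 98th percentile far away from the median, which is the pattern visible in Kopia's row.
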

To conclude on the subject of cleanup, I'll give first place to Kopia, as the anomalies observed are most certainly due to the snapshot/repository verification operations that this software runs automatically.

	Borg	Kopia	Restic
cleaning time	+	+	-
network volume	-	+	+

Other maintenance tasks relate to checking backups and the backup repository. Like cleanup, checking snapshots and the health of the repository is handled automatically by Kopia (at least in theory). For Borg and Restic, these operations have to be scheduled in addition to the rest.
These operations are considered rare, some even exceptional (such as repository repair operations). I have therefore decided not to measure them over the long term, nor to draw any statistics from them. I've simply taken a few measurements on specific tasks.

Borg

$ time borg check --rsh "${SSH}" ${HOMEREPO}

real 8m36.258s
user 4m49.125s
sys 0m25.353s

This operation consumes roughly 100% CPU on the client side (python) and 0% CPU on the server side.

$ time borg check --verify-data --rsh "${SSH}" ${HOMEREPO}

real 26m0.151s
user 15m9.882s
sys 2m37.629s

This operation has phases with 100% CPU on the client side (python) and 1% CPU on the server side (python) and other phases with 50% python + 35% ssh on the client side and 50-60% sshd + 30-40% python on the server side.

$ time borg check --repair --repository-only --rsh "${SSH}" ${HOMEREPO}
This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES

real 3m29.619s
user 0m0.252s
sys 0m0.088s

This last operation loads the server almost exclusively: 0% CPU on the client side and 85% CPU on the server side.
Note that this repair had nothing to fix, as the repository is in perfect health.

Kopia

$ time kopia maintenance run
Running full maintenance...
Looking for active contents...
Looking for unreferenced contents...
GC found 16436 unused contents (1.3 GB)
GC found 16550 unused contents that are too recent to delete (1.2 GB)
GC found 409930 in-use contents (73.3 GB)
GC found 28 in-use system-contents (71.6 KB)
Previous content rewrite has not been finalized yet, waiting until the next blob deletion.
Found safe time to drop indexes: 2024-04-06 15:51:09.649426 +0200 CEST
Dropping contents deleted before 2024-04-06 15:51:09.649426 +0200 CEST
Looking for unreferenced blobs...
deleted 100 unreferenced blobs (2 GB)
Deleted total 138 unreferenced blobs (2.5 GB)
Cleaning up old index blobs which have already been compacted...
Cleaned up 8 logs.
Finished full maintenance.

real 0m9.883s
user 0m15.166s
sys 0m6.066s

$ time kopia maintenance run
Running full maintenance...
Looking for active contents...
Looking for unreferenced contents...
GC found 0 unused contents (0 B)
GC found 18882 unused contents that are too recent to delete (1.6 GB)
GC found 409930 in-use contents (73.3 GB)
GC found 23 in-use system-contents (56 KB)
Rewriting contents from short packs...
Total bytes rewritten 649.4 MB
Found safe time to drop indexes: 2024-04-06 15:51:09.649426 +0200 CEST
Dropping contents deleted before 2024-04-06 15:51:09.649426 +0200 CEST
Skipping blob deletion because not enough time has passed yet (59m59s left).
Cleaning up old index blobs which have already been compacted...
Cleaned up 0 logs.
Finished full maintenance.

real 0m27.121s
user 0m29.134s
sys 0m11.207s

Note the difference between the two runs, which are only a few seconds apart.

$ time kopia snapshot verify --verify-files-percent=10 --file-parallelism=10 --parallel=10
Listing blobs...
Listed 4141 blobs.
Processed 0 objects.
Processed 12137 objects.
Processed 17883 objects.
Processed 20581 objects.
../..
Processed 365793 objects.
Processed 368714 objects.
Finished processing 389361 objects.

real	3m26.325s
user	0m52.439s
sys	0m48.059s

Kopia has the interesting feature of being able to check only a percentage of the files in a backup (in this case 10%).
CPU consumption is distributed as follows: 30-60% CPU on the client and, on the server, 80-90% CPU for sshd + 15-20% CPU for sftpd.

Checking 100% of files simply lengthens the operation:

$ time kopia snapshot verify --verify-files-percent=100 --file-parallelism=10 --parallel=10
Listing blobs...
Listed 4034 blobs.
Processed 0 objects.
Processed 4566 objects.
Processed 4577 objects.
Processed 4594 objects.
../..
Processed 372956 objects.
Processed 373370 objects.
Finished processing 393794 objects.

real	15m54.926s
user	5m37.337s
sys	4m3.843s

Restic

$ time restic -r rest:http://192.168.0.22:8000/ check
using temporary cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-2552920101
repository 9c062709 opened (version 2, compression level auto)
created new cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-2552920101
create exclusive lock for repository
load indexes
[0:00] 100.00%  6 / 6 index files loaded
check all packs
check snapshots, trees and blobs
[3:46] 100.00%  55 / 55 snapshots
no errors were found

real	3m48.649s
user	2m24.891s
sys	0m31.295s

$ time restic -r rest:http://192.168.0.22:8000/ check --read-data
using temporary cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-448461585
repository 9c062709 opened (version 2, compression level auto)
created new cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-448461585
create exclusive lock for repository
load indexes
[0:00] 100.00%  6 / 6 index files loaded
check all packs
check snapshots, trees and blobs
[3:46] 100.00%  55 / 55 snapshots
read all data
[10:36] 100.00%  4588 / 4588 packs
no errors were found

real	14m23.868s
user	6m55.821s
sys	1m45.116s

Data verification consumes 45-55% CPU on the client side and around 67% CPU on the server side.

$ time restic -r rest:http://192.168.0.22:8000/ repair snapshots
repository 9c062709 opened (version 2, compression level auto)
found 1 old cache directories in /Users/patpro/Library/Caches/restic, run `restic cache --cleanup` to remove them
[0:00] 100.00%  6 / 6 index files loaded

snapshot 09e89c9e of [/Users/patpro] at 2024-04-04 11:38:02.747322 +0200 CEST)

snapshot 0cf2fd62 of [/Users/patpro] at 2024-04-01 19:20:10.583797 +0200 CEST)

../..

snapshot fbeb283e of [/Users/patpro] at 2024-03-17 19:17:04.390071 +0100 CET)

snapshot fec2f8b1 of [/Users/patpro] at 2024-04-07 17:15:45.588862 +0200 CEST)

no snapshots were modified

real	3m22.149s
user	3m15.453s
sys	0m9.031s

Here, CPU consumption is around 100% on the client side and 0% on the server side. In principle, it works entirely on the client from the local cache. If the local cache is unavailable, Restic rebuilds it and server CPU consumption peaks at over 80%.

Overview
Relative to the number of snapshots/archives and the size of the repositories, the exhaustive verification times are as follows:

	Borg	Kopia	Restic
Number of snapshots	72	26	55
Repository size	78.4 GB	74.5 GB	80.5 GB
Per snapshot	22s	37s	18s
Per GB	19.9s	12.9s	10.7s

Borg and Restic are fairly close in terms of performance when it comes to verifying backed-up data, but Borg is a hair slower while consuming more CPU on the client. Per snapshot, Kopia is much slower than the other two, although its per-GB figure sits between theirs.
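
The normalization can be reproduced from the `real` times of the exhaustive checks shown earlier (`borg check --verify-data`, `kopia snapshot verify --verify-files-percent=100`, `restic check --read-data`). The sketch below is only a recomputation under that assumption, and the helper names are hypothetical. It gives values close to the table, except for Restic per snapshot, where it yields about 15.7 s rather than 18 s, so that table figure may come from a different run.

```shell
#!/bin/sh
# to_seconds: convert the "real" field printed by `time` (e.g. 26m0.151s)
# into seconds.
to_seconds() {
    printf '%s\n' "$1" |
        awk -F 'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# per_unit REAL COUNT: wall-clock seconds divided by COUNT (snapshots or GB).
per_unit() {
    awk -v t="$(to_seconds "$1")" -v n="$2" 'BEGIN { printf "%.1f\n", t / n }'
}

# Exhaustive checks measured above: tool, real time, snapshots, size in GB.
while read -r tool real snaps gb; do
    echo "$tool: $(per_unit "$real" "$snaps") s/snapshot, $(per_unit "$real" "$gb") s/GB"
done <<'EOF'
Borg 26m0.151s 72 78.4
Kopia 15m54.926s 26 74.5
Restic 14m23.868s 55 80.5
EOF
```
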

I briefly mentioned this for Restic above, and the same applies to the other two: these programs use a local cache to speed up some operations. The size of this cache can matter a great deal in some contexts, and seems to be proportional to the number of unique data blocks stored in the repository (at least this is the case for Borg). I'm unable to present reliable figures for Borg due to an oversight on my part: as I had not relocated Borg's cache for my experiment, it got mixed up with the cache of my other Borg repositories (those of my real backups), so its volume is not comparable with that of the other programs.
That said, here's some information:

Borg cache size, all mixed up	7.1 GB
Borg cache size, test, isolated late	828 MB
Kopia cache size	6.2 GB
Restic cache size	11 GB

The "Borg cache size, test, isolated late" corresponds to a cache for Borg, dedicated to my test script, created 2 days earlier. It immediately reached 828 MB and hasn't grown since.
In any case, Restic is the big loser here, and if we relate the size of the cache to the number of snapshots/archives on the repository, then Borg is the winner.
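
The mix-up described above can be avoided by giving the test script its own cache directory through Borg's `BORG_CACHE_DIR` environment variable. A minimal sketch follows. The helper names and the directory passed in are hypothetical choices.

```shell
#!/bin/sh
# dir_size_kb DIR: size of DIR in kilobytes, as reported by du.
dir_size_kb() {
    du -sk "$1" | cut -f 1
}

# use_isolated_borg_cache DIR: point Borg at a dedicated cache directory so
# a test repository does not share its cache with other repositories.
# BORG_CACHE_DIR is Borg's environment variable for the cache location.
use_isolated_borg_cache() {
    mkdir -p "$1" || return 1
    export BORG_CACHE_DIR="$1"
}

# Example use in a benchmark script (the path is a hypothetical choice):
#   use_isolated_borg_cache "$HOME/.cache/borg-benchmark"
#   ... run borg create / prune / check ...
#   echo "Borg test cache: $(dir_size_kb "$BORG_CACHE_DIR") KB"
```
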

	Borg	Kopia	Restic
Check duration	+	-	+
Local cache size	+	-	-

During this experiment, I was pleased to discover that Kopia manages maintenance operations for the user, whereas with the other two programs they have to be launched explicitly. I also found Kopia's option to check only a fraction of the files interesting. Finally, I like the fact that Restic offers an option for the cleanup operation that lets the user trade off performance against storage optimization.

9 avril 2024

Borg, Kopia, Restic : restauration et maintenance

Restauration

Depuis l’article précédent nous savons comment se déroule la sauvegarde dans Borg, Kopia et Restic. Nous allons maintenant examiner la restauration et la maintenance.
La restauration est une étape de la vie d'une sauvegarde, qui — quand tout se passe bien — n’est jamais utilisée. Aussi, il est tout à fait compréhensible que le benchmark de cette étape ne soit pas excessivement intéressant et représentatif. La prise de trace est faite dans le script de sauvegarde original mais la restauration n'a pas été exécutée tous les jours.
Le test a consisté en la restauration d'une archive zip de 39 Mo à partir du plus ancien snapshot disponible dans les archives, ainsi que la restauration d'un répertoire Library/Preferences à partir du plus ancien des 10 derniers snapshots. La taille du répertoire est variable d'une sauvegarde à l'autre, mais tourne globalement autour des 290 Mo pour 951 répertoires et ~15K fichiers.

Pendant ce test de restauration, la différence entre les trois logiciels est à nouveau très marquée. Borg reçoit de la part du serveur autour de 575 Mo de données. Il envoie autour de 6 Mo de données. Kopia est un peu meilleur élève puisqu'il va exécuter la même restauration avec seulement 450 Mo de données en réception et autour de 20 Mo en émission. Mais c'est ici Restic qui a l'empreinte réseau la plus réduite, puisqu'il parvient à faire la restauration complète attendue avec seulement 120 Mo de données en réception et 0,5 Mo de données en émission.

	Borg	Kopia	Restic
avg in	-575,00 Mo	-447,76 Mo	-120,82 Mo
avg out	5,92 Mo	20,59 Mo	0,52 Mo

Sur la durée de la restauration, Restic est aussi premier, avec une tâche qui se conclut en 8 secondes là où Kopia tourne entre 12 et 14 secondes et Borg, entre 18 et 19 secondes. On reste sur des durées relativement courtes et même si ces différences peuvent être importantes en proportion cela reste assez faible en valeur absolue.

Pour la restauration, Restic est clairement le vainqueur, assez loin devant Kopia. Borg est bon dernier, en particulier à cause de sa consommation réseau.

	Borg	Kopia	Restic
Durée des restaurations	-	-	+
Volume réseau	-	-	+

Maintenance

La première tâche de maintenance, aussi la plus simple à mesurer, est la phase de «prune». Cette tâche vise à supprimer les archives, snapshots et points de restauration obsolètes. Elle est essentielle pour que le dépôt des sauvegardes ne voit pas sa volumétrie exploser hors de contrôle. C'est cette étape qui est responsable du respect de la politique de rétention que l'utilisateur souhaite appliquer.
J'ai détaillé dans mon article précédent comment la gestion de la rétention pouvait être pleine de surprises. Je ne reviens pas sur ce point en particulier. Je vais m'intéresser ici à la consommation de ressources lors de ces opérations.
À noter que Kopia fait spontanément l'opération de nettoyage au lancement d'une nouvelle sauvegarde sans qu'il soit nécessaire de lui demander. Pour pouvoir mesurer cette opération de manière indépendante j'ai dû la planifier juste avant l'opération de sauvegarde (et pas après, comme initialement prévu).

La consommation de bande passante par l'opération de nettoyage est pour le moins surprenante. Borg est ici le logiciel qui se distingue par sa grande régularité avec environ 180 Mo de données entrantes et 1,3 Mo de données sortantes. Pour les deux autres, la situation est beaucoup plus chaotique. Kopia semble se comporter de manière plutôt stable dans le temps avec des données entrantes extrêmement limitées à ~92 Ko et des données sortantes pour un volume d'environ 70 ko, mais on constate la présence d'anomalies énormes, avec certaines opérations de nettoyage affichant des transferts entrants se rapprochant de 1,4 Go et des transferts sortants proches de 1 Go. Il semble que ces pics soient attribuables à d’autres tâches de maintenance (voir plus loin).
Restic aussi montre une grande irrégularité dans les volumétries de données échangées avec le serveur lors des opérations de nettoyage. Néanmoins, les volumes restent maîtrisés, largement sous la barre des 40 Mo en émission et sous la barre des 70 Mo en réception.

Chiffres sur 4 jours, mi-mars

	Borg	Kopia	Restic
avg in	-175.65 MB	-62.02 MB	-24.74 MB
avg out	1.34 MB	42.23 MB	27.55 MB
median in	-181.02 MB	-95 KB	-16.97 MB
median out	1.31 MB	70 KB	28.01 MB
perc98 in	-190.17 MB	-879.78 MB	-66.49 MB
perc98 out	2.27 MB	599.28 MB	34.52 MB

In terms of cleanup duration, Borg and Kopia are relatively well placed, with of course a few significant peaks for Kopia and a certain regularity for Borg. Restic, for its part, comes dead last, well behind its two competitors.

Figures over 4 days, mid-March

	avg	median	perc98
Borg	9.30 s	9.00 s	14.90 s
Kopia	4.41 s	2.00 s	33.00 s
Restic	113.50 s	126.00 s	140.20 s

To conclude on cleanup, I give first place to Kopia, because the anomalies observed are most certainly due to automatic snapshot/repository verification operations that this software manages on its own.

	Borg	Kopia	Restic
Cleanup duration	+	+	-
Network volume	-	+	+

The other maintenance tasks relate to verifying the backups and the backup repository. Like cleanup, verification of the snapshots and of the repository's health is managed automatically by Kopia (at least in theory). For Borg and Restic, these are operations to schedule on top of everything else.
These operations are supposed to be rare, some even exceptional (such as repository repair operations). So I decided not to measure them over the long term, nor to derive statistics from them. I simply took a few measurements on specific tasks.

Borg

$ time borg check --rsh "${SSH}" ${HOMEREPO}

real	8m36.258s
user	4m49.125s
sys	0m25.353s

This operation uses roughly 100% CPU on the client side (python) and 0% CPU on the server side.

$ time borg check --verify-data --rsh "${SSH}" ${HOMEREPO}

real	26m0.151s
user	15m9.882s
sys	2m37.629s

This operation has phases at 100% CPU on the client side (python) and 1% CPU on the server side (python), and other phases at 50% python + 35% ssh on the client side and 50-60% sshd + 30-40% python on the server side.

$ time borg check --repair --repository-only --rsh "${SSH}" ${HOMEREPO}
This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES

real	3m29.619s
user	0m0.252s
sys	0m0.088s

This last one loads almost only the server: 0% CPU on the client side and 85% CPU on the server side.
Note that this repair ran for nothing, as the repository is in perfect health.

Kopia

real 0m9.883s
user 0m15.166s
sys 0m6.066s

real 0m27.121s
user 0m29.134s
sys 0m11.207s

Note the difference between the two runs, which were only a few seconds apart.

$ time kopia snapshot verify --verify-files-percent=10 --file-parallelism=10 --parallel=10
Listing blobs...
Listed 4141 blobs.
Processed 0 objects.
Processed 12137 objects.
Processed 17883 objects.
Processed 20581 objects.
../..
Processed 365793 objects.
Processed 368714 objects.
Finished processing 389361 objects.

real	3m26.325s
user	0m52.439s
sys	0m48.059s

Kopia has the interesting ability to verify only a percentage of the files in a backup (here 10%).
CPU consumption breaks down as follows: 30 to 60% CPU on the client and, on the server, 80-90% CPU for sshd + 15-20% CPU for sftpd.

Verifying 100% of the files simply lengthens the operation:

$ time kopia snapshot verify --verify-files-percent=100 --file-parallelism=10 --parallel=10
Listing blobs...
Listed 4034 blobs.
Processed 0 objects.
Processed 4566 objects.
Processed 4577 objects.
Processed 4594 objects.
../..
Processed 372956 objects.
Processed 373370 objects.
Finished processing 393794 objects.

real	15m54.926s
user	5m37.337s
sys	4m3.843s

Restic

$ time restic -r rest:http://192.168.0.22:8000/ check
using temporary cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-2552920101
repository 9c062709 opened (version 2, compression level auto)
created new cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-2552920101
create exclusive lock for repository
load indexes
[0:00] 100.00%  6 / 6 index files loaded
check all packs
check snapshots, trees and blobs
[3:46] 100.00%  55 / 55 snapshots
no errors were found

real	3m48.649s
user	2m24.891s
sys	0m31.295s

$ time restic -r rest:http://192.168.0.22:8000/ check --read-data
using temporary cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-448461585
repository 9c062709 opened (version 2, compression level auto)
created new cache in /var/folders/1t/fbr0pvvc8xjg8f006s7cd4280000gp/T/restic-check-cache-448461585
create exclusive lock for repository
load indexes
[0:00] 100.00%  6 / 6 index files loaded
check all packs
check snapshots, trees and blobs
[3:46] 100.00%  55 / 55 snapshots
read all data
[10:36] 100.00%  4588 / 4588 packs
no errors were found

real	14m23.868s
user	6m55.821s
sys	1m45.116s

Data verification uses 45 to 55% CPU on the client side and around 67% CPU on the server side.

$ time restic -r rest:http://192.168.0.22:8000/ repair snapshots
repository 9c062709 opened (version 2, compression level auto)
found 1 old cache directories in /Users/patpro/Library/Caches/restic, run `restic cache --cleanup` to remove them
[0:00] 100.00%  6 / 6 index files loaded

snapshot 09e89c9e of [/Users/patpro] at 2024-04-04 11:38:02.747322 +0200 CEST)

snapshot 0cf2fd62 of [/Users/patpro] at 2024-04-01 19:20:10.583797 +0200 CEST)

../..

snapshot fbeb283e of [/Users/patpro] at 2024-03-17 19:17:04.390071 +0100 CET)

snapshot fec2f8b1 of [/Users/patpro] at 2024-04-07 17:15:45.588862 +0200 CEST)

no snapshots were modified

real	3m22.149s
user	3m15.453s
sys	0m9.031s

Here CPU consumption is about 100% on the client side and 0% on the server side. Presumably it works entirely on the client from the local cache. If the local cache is not available, Restic rebuilds it and server CPU consumption then peaks at over 80%.

Summary
Relative to the number of snapshots / archives and to repository size, the durations of the exhaustive verifications are as follows:

	Borg	Kopia	Restic
Number of snapshots	72	26	55
Repository size	78.4 GB	74.5 GB	80.5 GB
Per snapshot	22 s	37 s	18 s
Per GB	19.9 s	12.9 s	10.7 s

Borg and Restic are fairly close in performance when it comes to verifying backed-up data, but Borg is a touch slower while using more CPU on the client. Kopia ends up much slower than the other two.
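
The normalization can be redone from the `real` times of the exhaustive runs shown earlier (`borg check --verify-data`, `kopia snapshot verify --verify-files-percent=100`, `restic check --read-data`). This is a minimal Python sketch; the helper name and the data layout are mine, not part of any of the three tools.

```python
import re

def real_seconds(text):
    """Convert a `time` duration such as '26m0.151s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", text).groups()
    return int(minutes) * 60 + float(seconds)

# name: (real time of the exhaustive check, snapshots/archives, repository size in GB)
runs = {
    "Borg":   ("26m0.151s", 72, 78.4),
    "Kopia":  ("15m54.926s", 26, 74.5),
    "Restic": ("14m23.868s", 55, 80.5),
}

for name, (real, snapshots, size_gb) in runs.items():
    total = real_seconds(real)
    print(f"{name}: {total / snapshots:.1f} s per snapshot, "
          f"{total / size_gb:.1f} s per GB")
```

With these inputs the output agrees with the table to within rounding, except for Restic's per-snapshot figure, which comes out closer to 16 s; the table may have been built from slightly different values.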

I mentioned it briefly above for Restic, and it holds for the other two: these programs use a local cache to speed up certain operations. The size of this cache can matter a great deal in some contexts and seems proportional to the number of unique data blocks backed up to the repository (this is at least the case for Borg). I'm unable to present reliable figures for Borg because of an oversight on my part: as I didn't relocate Borg's cache for my experiment, it got mixed up with the cache of my other Borg repositories (those of my real backups), so its size is not comparable with that of the other programs.
That said, here is some information all the same:

Borg cache size, everything combined	7.1 GB
Borg cache size, test, isolated late	828 MB
Kopia cache size	6.2 GB
Restic cache size	11 GB

The "Borg cache size, test, isolated late" entry corresponds to a Borg cache dedicated to my test script, created 2 days earlier. It immediately reached 828 MB and hasn't grown since.
In any case, Restic is the big loser here, and if we relate cache size to the number of snapshots / archives in the repository, then Borg is the winner.

	Borg	Kopia	Restic
Verification duration	+	-	+
Local cache size	+	-	-

During this experiment I appreciated discovering that Kopia handles, on the user's behalf, maintenance operations that have to be launched explicitly with the other two programs. I also find Kopia's option to verify only a fraction of the files interesting. Finally, I appreciate that Restic offers an option for the cleanup operation that lets you move the cursor between performance and storage optimization.

28 March 2024

Running Splunk forwarder on a FreeBSD 14 host

A few months ago I discovered that Splunk had not bothered to update its forwarder to support FreeBSD 14. It’s a real PITA for many users, including myself. After asking around for support about that problem and seeing Splunk quietly ignore the voice of its users, I decided to try running the Linux version on FreeBSD.

Executive summary: it works great on both FreeBSD 14 and 13, but with some limitations.

A user like me has a few options:

(re)check whether you really need a local log forwarder (for everything that is not handled by syslog); if you don’t, just ditch the Splunk forwarder and tune syslogd to send logs to a Splunk indexer directly
find an alternative solution that suits you: very hard if you have a full Splunk ecosystem or if, like me, you really are addicted to Splunk
run the Linux version on FreeBSD: needs some skills but works great so far

Obviously, I went with the last one.

Limitations

You will run a proprietary Linux binary on a totally unsupported environment: you are on your own & it can break anytime, either because of FreeBSD, or because of Splunk.

You will run the Splunk forwarder inside a chroot environment: your log files will have to be available inside the chroot, or Splunk won’t be able to read them. ~~Also, no ACL residing on your FreeBSD filesystem will be available to the Linux chroot, so you must not rely on ACLs to grant Splunk access to your log files~~. This last statement is partially wrong: you can rely on FreeBSD ACLs, but it might require some tweaks on the user/group side.

How to

Below you’ll find a quick-and-dirty step-by-step guide that worked for me. Not everything will be detailed or explained, and YMMV.

First step is to install a Linux environment. You must activate the Linux compatibility feature. I’ve used both Debian and Devuan successfully. Here is what I’ve done for Devuan:

zfs create -o mountpoint=/compat/devuan01 sas/compat_devuan01 
curl -OL https://git.devuan.org/devuan/debootstrap/raw/branch/suites/unstable/scripts/ceres
mv ceres /usr/local/share/debootstrap/scripts/daedalus
curl -OL https://files.devuan.org/devuan-archive-keyring.gpg
mv devuan-archive-keyring.gpg /usr/local/share/keyrings/
ln -s /usr/local/share/keyrings /usr/share/keyrings
debootstrap daedalus /compat/devuan01

This last step should fail; apparently that is to be expected. Following that same guide:

chroot /compat/devuan01 /bin/bash
dpkg --force-depends -i /var/cache/apt/archives/*.deb
echo "APT::Cache-Start 251658240;" > /etc/apt/apt.conf.d/00chroot
exit

Back on the host, add what you need to /etc/fstab:

# Device        Mountpoint              FStype          Options                      Dump    Pass#
devfs           /compat/devuan01/dev      devfs           rw,late                      0       0
tmpfs           /compat/devuan01/dev/shm  tmpfs           rw,late,size=1g,mode=1777    0       0
fdescfs         /compat/devuan01/dev/fd   fdescfs         rw,late,linrdlnk             0       0
linprocfs       /compat/devuan01/proc     linprocfs       rw,late                      0       0
linsysfs        /compat/devuan01/sys      linsysfs        rw,late                      0       0

and mount all, then finish install:

mount -al
chroot /compat/devuan01 /bin/bash
apt update
apt install openrc
exit

Make your log files available inside the chroot:

mkdir -p /compat/devuan01/var/hostnamedlog
mount_nullfs /var/named/var/log /compat/devuan01/var/hostnamedlog
mkdir -p /compat/devuan01/var/hostlog
mount_nullfs /var/log /compat/devuan01/var/hostlog

Note: /var/named/var/log and /var/log are ZFS filesystems. You’ll have to make the nullfs mounts permanent by adding them in /etc/fstab.

Now you can install the Splunk forwarder:

chroot /compat/devuan01 /bin/bash
ln -sf /usr/share/zoneinfo/Europe/Paris /etc/localtime
useradd -m splunkfwd
export SPLUNK_HOME="/opt/splunkforwarder"
mkdir $SPLUNK_HOME
echo /opt/splunkforwarder/lib >/etc/ld.so.conf.d/splunk.conf 
ldconfig
apt install curl
dpkg -i splunkforwarder_package_name.deb
/opt/splunkforwarder/bin/splunk enable boot-start -systemd-managed 0 -user splunkfwd
exit

Note: splunk enable boot-start -systemd-managed 0 activates the Splunk service as an old-school init.d service. systemd is not available in the context of a Linux chroot on FreeBSD.

Now from the host, grab your config files and copy them in your Linux chroot:

cp /opt/splunkforwarder/etc/system/local/{inputs,limits,outputs,props,transforms}.conf /compat/devuan01/opt/splunkforwarder/etc/system/local/

Then edit /compat/devuan01/opt/splunkforwarder/etc/system/local/inputs.conf accordingly: in my case it means I must replace /var/log with /var/hostlog and /var/named/var/log with /var/hostnamedlog.
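
If there are many inputs, the substitution can be scripted. Here is a minimal Python sketch, assuming the inputs are declared as `[monitor://<path>]` stanzas; the function name, the sample stanzas and the sourcetype are hypothetical. Matching on the start of the path, rather than doing a blind search-and-replace, keeps /var/named/var/log from being mangled by the /var/log rule.

```python
# Host path -> path of the nullfs mount as seen from inside the chroot
CHROOT_PATHS = {
    "/var/named/var/log": "/var/hostnamedlog",
    "/var/log": "/var/hostlog",
}

PREFIX = "[monitor://"

def rewrite_monitor_stanzas(conf_text, mapping=CHROOT_PATHS):
    """Rewrite the path in each [monitor://...] stanza header."""
    lines = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX) and stripped.endswith("]"):
            path = stripped[len(PREFIX):-1]
            for host_path, chroot_path in mapping.items():
                # Only rewrite when the stanza path is the host path
                # itself or something below it.
                if path == host_path or path.startswith(host_path + "/"):
                    path = chroot_path + path[len(host_path):]
                    break
            line = f"{PREFIX}{path}]"
        lines.append(line)
    return "\n".join(lines)

# Hypothetical excerpt of an inputs.conf
sample = (
    "[monitor:///var/log/messages]\n"
    "sourcetype = syslog\n"
    "[monitor:///var/named/var/log/named.log]"
)
print(rewrite_monitor_stanzas(sample))
```

Lines that are not monitor stanza headers are passed through unchanged, so the rest of the file is left as it was.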

Go back into your Devuan chroot and start Splunk:

chroot /compat/devuan01 /bin/bash
service splunk start
exit

To do

I still need to figure out how to properly start the service from outside the chroot (when FreeBSD boots). No big deal.

23 March 2024

Borg, Kopia, Restic: backup and resource utilization

[This is the English translation by DeepL]
[Original version in French]

In this article, I'm going to take a closer look at backup-related metrics. In particular, those that are easy to measure: backup execution time and network transfer volumes. CPU consumption is not easy to measure on the test platform and, in my context, is of little importance. Measuring I/O on storage could have been interesting, but as the backup destination disk is shared with other uses, it wasn't a metric that could be recovered during my tests.

Without going into too much detail on the methodology, here's how I proceeded. On the client side, the backup script writes time-stamped lines to a log file (start of backup for Borg, end of backup for Borg, start of purge for Borg, etc.). On the server side, I used tcpdump to record network traffic. The PCap files were processed with tshark to obtain the list of network conversations. The time-stamped list and the client log file were then injected into Splunk. I thus obtain the time-stamped traces corresponding to the start and end of each backup task, with the number of packets and the data volumes exchanged between client and server sandwiched in between.
At each iteration of the backup script, I also collect the volume of the backup repositories.
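
The correlation step can be pictured as follows. This is only a Python sketch of the idea, not the Splunk searches actually used: the record layout, the field names and the sample values are all hypothetical.

```python
from datetime import datetime

def traffic_for_task(start, end, conversations):
    """Sum packets and bytes of the conversations that start within [start, end]."""
    totals = {"packets": 0, "bytes_in": 0, "bytes_out": 0}
    for conv in conversations:
        if start <= conv["time"] <= end:
            for key in totals:
                totals[key] += conv[key]
    return totals

# Hypothetical data: the two log timestamps framing one task,
# and three network conversations seen on the server.
start = datetime(2024, 3, 14, 10, 0, 0)
end = datetime(2024, 3, 14, 10, 0, 30)
conversations = [
    {"time": datetime(2024, 3, 14, 9, 59, 50), "packets": 10,
     "bytes_in": 1_000, "bytes_out": 500},
    {"time": datetime(2024, 3, 14, 10, 0, 5), "packets": 200,
     "bytes_in": 40_000, "bytes_out": 900_000},
    {"time": datetime(2024, 3, 14, 10, 0, 20), "packets": 50,
     "bytes_in": 8_000, "bytes_out": 120_000},
]
print(traffic_for_task(start, end, conversations))
```

Only the two conversations that fall between the start and end markers are counted; the first one belongs to whatever ran before.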

The first statistics I'd like to detail here relate to the creation of the first backup. This already reveals a major difference between the three solutions:

For an initial volume of 145 GB

	duration	vol in	vol out	storage
Borg	1118 s	1.2 GB	52 GB	62 GB
Kopia	716 s	1 GB	39 GB	75 GB
Restic	583 s	0.7 GB	32.2 GB	58 GB

Creation time varies by a factor of two between Restic and Borg. This is quite significant. Restic is configured to open four concurrent read streams, which will obviously maximize the use of all resources (network, IO). Restic also uses its HTTP server for transport. This allows a larger packet size, so it sends fewer packets than Borg or Kopia (33.1 million for Restic versus 54.6 million for Kopia). The result is less data in transit than with the other two (fewer packets = less overhead).
Note that at this stage of my experiment I hadn't activated Kopia's compression (inactive by default), but the built-in ssh client presumably compresses the streams. Compression is also disabled on the destination ZFS storage.
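
To put an order of magnitude on "fewer packets = less overhead", here is a back-of-the-envelope calculation in Python. The 40 bytes per packet is my assumption (minimal IPv4 and TCP headers of 20 bytes each, no options, link-layer framing ignored); only the packet counts come from the measurements.

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header per packet
HEADER_BYTES = 40

# Packet counts measured during the first backup
packets = {"Restic": 33.1e6, "Kopia": 54.6e6}

for name, count in packets.items():
    overhead_gb = count * HEADER_BYTES / 1e9
    print(f"{name}: about {overhead_gb:.2f} GB of headers")
```

Under that assumption the header gap between the two is a little under 1 GB, so header overhead alone would explain only part of the difference in transferred volume.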

I have a few theories but no real explanation for the differences in stored volume, both between the programs and relative to the volume of data in transit.

Subsequent backups are much shorter, since only data modified since the previous backup is sent to the server. In the long run, Restic loses its advantage over Kopia and comes close to Borg's times.

	Avg	Median	perc98
Borg	30.54 s	30.00 s	38.86 s
Kopia	8.53 s	8.00 s	13.86 s
Restic	24.96 s	25.00 s	30.00 s

Kopia's advantage is truly impressive.
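
For reference, the three statistics in this table can be computed as below. This is a Python sketch on made-up durations; Splunk's perc98 may use a different interpolation method, so the 98th percentile here is only indicative.

```python
import statistics

# Made-up backup durations in seconds, for illustration only
durations = [8, 8, 9, 7, 8, 10, 8, 14, 8, 9]

avg = statistics.mean(durations)
median = statistics.median(durations)
# quantiles(n=100) returns the 99 percentile cut points;
# index 97 is the 98th percentile.
perc98 = statistics.quantiles(durations, n=100, method="inclusive")[97]

print(f"avg {avg:.2f} s, median {median:.2f} s, perc98 {perc98:.2f} s")
```

The median is insensitive to the occasional slow run, while the average and above all the 98th percentile reflect it, which is why the tables report all three.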

Over a month or so of measurements, the average duration of Borg and Restic backups has even shown an upward trend, while that of Kopia has shown a downward trend (4 periods of 7 days, 1 of 4 days, 295 backups for each program):

	Borg	Kopia	Restic
2024-02-19	29.07	8.76	23.72
2024-02-26	31.28	9.08	25.57
2024-03-04	31.59	8.26	25.29
2024-03-11	31.36	7.77	26.34
2024-03-18	32.38	8.38	26.09

In terms of backup times, Kopia is far superior to its competitors. In terms of network exchange volumes, Kopia is also the best, but by a much smaller margin. It ranks just ahead of Borg and far ahead of Restic. I haven't aggregated any statistics because my tcpdump collection wasn't "manageable" enough over a long period. I do, however, have a few graphs over a 4-day period, which I have checked manually.

The evolution of backup storage volumes is interesting for several reasons. Of course, it allows us to do some capacity planning. It also enables us to determine which software is the most efficient at stacking a maximum number of backups in a minimum amount of space. Finally, it highlights an absolutely fundamental difference in retention management between Kopia and its 2 competitors. A difference that really took me by surprise.

The following graph shows the evolution of backup volumes from February 17 to March 21. The y-axis is in bytes, and the abbreviation B stands for "billion": 60B is therefore equivalent to 60 GB.

From the outset, Borg is top of the class, with a very efficient approach. Kopia and Restic, on the other hand, see their volumes soar. Restic eventually bends its curve with the first purge tasks aimed at complying with the retention policy.
Marker 1 marks the moment when I activated compression for Kopia; without it, I feared my test would be cut short by saturation of the destination storage.
Marker 2 marks the first purge of obsolete backups for Kopia. The descending sawtooth pattern is probably due to the fact that Kopia purges uncompressed backups and replaces them with compressed ones.
Marker 3 marks the evening of March 5. At this point, Borg has 68 archives, Restic has 52 snapshots, and Kopia 24. Yet the retention policies were aligned:

Borg purge options:

--keep-last 10 --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3

Restic purge options:

--keep-last 10 --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3

Kopia retention options:

	 Annual snapshots: 3 
	 Monthly snapshots: 24 
	 Weekly snapshots: 4 
	 Daily snapshots: 7
	 Hourly snapshots: 48
	 Latest snapshots: 10

The difference between Borg and Restic is very simple. Borg keeps an archive based on a single retention criterion: an archive may be kept because it corresponds to an hourly retention or a daily retention, but never both. Restic, on the other hand, labels snapshots with different retentions: a snapshot can be hourly & daily & weekly and so on. It will also label long retention periods (annual, monthly…) in advance. As of March 5, Restic was storing the equivalent of 74 snapshots, even if in reality it only had 52 restore points.
All in all, this is coherent.

Borg:

(rule: daily #5):        20240222.1708642488
(rule: daily #6):        20240221.1708553824
(rule: daily #7):        20240220.1708469744
(rule: weekly #1):       20240218.1708295990
(rule: weekly[oldest] #2): 20240217.1708164925

Restic:

2024-02-17 11:35:05                monthly snapshot  /Users/patpro
                                   yearly snapshot
2024-02-18 23:41:20                weekly snapshot   /Users/patpro
2024-02-25 23:40:46                weekly snapshot   /Users/patpro
2024-02-28 19:45:29                daily snapshot    /Users/patpro
2024-02-29 12:46:31                hourly snapshot   /Users/patpro
../..
2024-02-29 19:23:42                hourly snapshot   /Users/patpro
                                   daily snapshot
                                   monthly snapshot

Kopia is a very different story. It simply doesn't calculate retention like the other two. For Borg and Restic, --keep-hourly 48 means "keep the last 48 hourly archives/snapshots". For Kopia, Hourly snapshots: 48 means "keep hourly snapshots for a maximum of 48 hours". This is extremely different. Remember, in the first article of the series I pointed out that due to the use of Launchd with StartInterval "the script is actually launched a little less than once per hour. For example, if each execution lasts 5 minutes, then it takes 26h and not 24h for it to be launched 24 times". I also pointed out that the client sleeps at night, causing an interruption in backups. Furthermore, from February 27 onwards, I further reduced the backup window, going from 17 or 18 backups per day to 8 or 9.
Within this reduced window, Kopia can now only make 8 to 9 backups per day. After 48 hours, it will have taken the equivalent of 16 to 18 hourly snapshots; an hour later, the oldest will have exceeded its lifetime and will be purged. In the same situation, Borg will keep 48 "hourly" archives spanning some 6 days.
Add to this the fact that Kopia manages its snapshots in the same way as Restic: each snapshot can have several retention labels.

Kopia:

  2024-02-18 23:38:41 CET ../.. (weekly-4)
  2024-02-25 23:43:54 CET ../.. (weekly-3)
  2024-02-28 19:46:56 CET ../.. (daily-7)
  2024-02-29 19:20:29 CET ../.. (daily-6,monthly-2)
  2024-03-01 19:02:51 CET ../.. (daily-5)
  2024-03-02 19:55:36 CET ../.. (daily-4)
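
The difference between the two readings of "hourly: 48" described above can be sketched with a toy schedule. This Python model is deliberately simplified and is not how either program implements retention: it assumes one backup per hour inside a fixed daily window (the dates and the window are hypothetical) and ignores every other retention rule.

```python
from datetime import datetime, timedelta

def schedule(days, per_day, first_hour=10):
    """One backup per hour, `per_day` times a day, for `days` days."""
    origin = datetime(2024, 3, 1)
    return [origin + timedelta(days=d, hours=first_hour + h)
            for d in range(days) for h in range(per_day)]

def keep_last_n(times, n):
    """Count-based reading: keep the n most recent hourly backups."""
    return sorted(times)[-n:]

def keep_younger_than(times, hours):
    """Age-based reading: keep backups less than `hours` old."""
    newest = max(times)
    return [t for t in times if newest - t < timedelta(hours=hours)]

for per_day in (8, 9):
    times = schedule(days=10, per_day=per_day)
    by_count = keep_last_n(times, 48)
    by_age = keep_younger_than(times, 48)
    days_covered = len({t.date() for t in by_count})
    print(f"{per_day}/day: count-based keeps {len(by_count)} "
          f"over {days_covered} days, age-based keeps {len(by_age)}")
```

With 8 or 9 backups a day, the age-based reading retains 16 to 18 backups where the count-based one retains 48, spread over 6 calendar days.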

Finally, Kopia doesn't hesitate to delete the oldest snapshots, whereas Borg and Restic treasure them. By March 5, Kopia had already deleted the very first snapshot. As of March 20, the oldest snapshot available is from February 29.

Here's the situation on March 20:

number of restoration points by date and program

	Borg	Kopia	Restic
2024-02-17	1		1
2024-02-18	1
2024-02-25	1
2024-02-29	1	1	1
2024-03-03	1	1	1
2024-03-06	1
2024-03-07	1
2024-03-08	1
2024-03-09	1
2024-03-10	1	1	1
2024-03-11	1
2024-03-12	1
2024-03-13	4
2024-03-14	8	1	1
2024-03-15	8	1	8
2024-03-16	8	1	8
2024-03-17	8	1	8
2024-03-18	8	1	8
2024-03-19	8	8	8
2024-03-20	7	8	7
total	71	24	52

In conclusion, if you're looking for very fast, bandwidth-saving backup tasks at the expense of guaranteed, readable retention, Kopia is the best candidate. If, on the other hand, you want to maximize the number of restore points at the expense of backup time and bandwidth usage, Borg is the clear winner. Kopia users will nevertheless be able to use Latest snapshots to ensure minimal retention for a client that is not always switched on for its periodic backup.

	Borg	Kopia	Restic
backup duration	-	+	-
network volume	+	+	-
retention management	+	-	+

23 March 2024

Borg, Kopia, Restic: backup and resource utilization

In this article, I'm going to take a closer look at backup-related metrics, in particular those that are easy to measure: backup execution time and network transfer volumes. CPU consumption is not easy to measure on the test platform and, in my context, is of little importance. Measuring I/O on storage could have been interesting, but as the backup destination disk is shared with other uses, it wasn't a metric that could be recovered during my tests.

Without dwelling on the methodology, here's briefly how I took these measurements. On the client side, the backup script writes time-stamped lines to a log file (start of backup for Borg, end of backup for Borg, start of purge for Borg, etc.). On the server side, I used tcpdump to record network traffic. The PCap files were processed with tshark to obtain the list of network conversations. The time-stamped list and the client log file were then injected into Splunk. I thus obtain the time-stamped traces corresponding to the start and end of each backup task, with the number of packets and the data volumes exchanged between client and server sandwiched in between.
At each iteration of the backup script, I also collect the size of the backup repositories.

The first statistics I'd like to detail here relate to the creation of the first backup. This already reveals a major difference between the three solutions:

For an initial volume of 145 GB

	duration	vol in	vol out	storage
Borg	1118 s	1.2 GB	52 GB	62 GB
Kopia	716 s	1 GB	39 GB	75 GB
Restic	583 s	0.7 GB	32.2 GB	58 GB

Creation time varies by a factor of two between Restic and Borg. This is quite significant. Restic is configured to open four concurrent read streams, which will obviously maximize the use of all resources (network, IO). Restic also uses its HTTP server for transport. This allows a larger packet size, so it sends fewer packets than Borg or Kopia (33.1 million for Restic versus 54.6 million for Kopia). The result is less data in transit than with the other two (fewer packets = less overhead).
Note that at this stage of my experiment I hadn't activated Kopia's compression (inactive by default), but the built-in ssh client presumably compresses the streams. Compression is also disabled on the destination ZFS storage.

I have a few theories but no real explanation for the differences in stored volume, both between the programs and relative to the volume of data in transit.

Subsequent backups are much shorter, since only data modified since the previous backup is sent to the server. In the long run, Restic loses its advantage over Kopia and comes close to the times observed for Borg.

	Avg	Median	perc98
Borg	30.54 s	30.00 s	38.86 s
Kopia	8.53 s	8.00 s	13.86 s
Restic	24.96 s	25.00 s	30.00 s

Kopia's advantage is truly impressive.

Over a month or so of measurements, the average duration of Borg and Restic backups even shows an upward trend, while that of Kopia shows a downward trend (4 periods of 7 days, 1 of 4 days, 295 backups for each program):

	Borg	Kopia	Restic
2024-02-19	29.07	8.76	23.72
2024-02-26	31.28	9.08	25.57
2024-03-04	31.59	8.26	25.29
2024-03-11	31.36	7.77	26.34
2024-03-18	32.38	8.38	26.09

In terms of backup duration, Kopia is therefore far better than its competitors. In terms of network exchange volumes it is also the best, but by a much smaller margin. It ranks just ahead of Borg and far ahead of Restic. I haven't aggregated any statistics because my tcpdump collection wasn't "manageable" enough over a long period. I do, however, have a few graphs over a 4-day period, whose data I have checked manually.

The evolution of backup storage volumes is interesting for several reasons. Of course, it allows us to do some capacity planning. It also enables us to determine which software is the most efficient at stacking a maximum number of backups in a minimum amount of space. Finally, it highlights an absolutely fundamental difference in retention management between Kopia and its 2 competitors. A difference that really took me by surprise.

The following graph shows the evolution of backup volumes from February 17 to March 21. The y-axis is in bytes, and the abbreviation B stands for "billion": 60B is therefore equivalent to 60 GB.

From the very first days, Borg is top of the class, with a very efficient approach. Kopia and Restic, on the other hand, see their volumes soar. Restic eventually bends its curve with the first purge tasks aimed at complying with the retention policy.
Marker 1 marks the moment when I activated compression for Kopia; without it, I feared my test would be cut short by saturation of the destination storage.
Marker 2 marks the first purge of obsolete backups for Kopia. The descending sawtooth pattern is probably due to the fact that Kopia purges uncompressed backups and replaces them with compressed ones.
Marker 3 marks the evening of March 5. At this point, Borg has 68 archives, Restic has 52 snapshots, and Kopia 24. Yet the retention policies were aligned:

Borg purge options:

--keep-last 10 --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3

Restic purge options:

--keep-last 10 --keep-hourly 48 --keep-daily 7 --keep-weekly 4 --keep-monthly 24 --keep-yearly 3

Kopia retention options:

	  Annual snapshots:                     3  
	  Monthly snapshots:                   24 
	  Weekly snapshots:                     4  
	  Daily snapshots:                      7
	  Hourly snapshots:                    48
	  Latest snapshots:                    10

The difference between Borg and Restic is easily explained. Borg keeps an archive based on a single retention criterion: an archive may be kept because it corresponds to an hourly retention or to a daily retention, but never both. Restic, on the other hand, labels snapshots with different retentions: a snapshot can be hourly and daily and weekly, and so on. It also labels long retentions (yearly, monthly…) in advance. As of March 5, Restic stores the equivalent of 74 snapshots, even though in reality it only has 52 restore points.
All in all, it adds up.

Borg:

(rule: daily #5):        20240222.1708642488
(rule: daily #6):        20240221.1708553824
(rule: daily #7):        20240220.1708469744
(rule: weekly #1):       20240218.1708295990
(rule: weekly[oldest] #2): 20240217.1708164925

Restic:

2024-02-17 11:35:05                monthly snapshot  /Users/patpro
                                   yearly snapshot
2024-02-18 23:41:20                weekly snapshot   /Users/patpro
2024-02-25 23:40:46                weekly snapshot   /Users/patpro
2024-02-28 19:45:29                daily snapshot    /Users/patpro
2024-02-29 12:46:31                hourly snapshot   /Users/patpro
../..
2024-02-29 19:23:42                hourly snapshot   /Users/patpro
                                   daily snapshot
                                   monthly snapshot

Kopia is a very different story. It simply doesn't calculate retention like the other two. For Borg and Restic, --keep-hourly 48 means "keep the last 48 hourly archives/snapshots". For Kopia, Hourly snapshots: 48 means "keep hourly snapshots for a maximum of 48 hours". This is extremely different. Remember, in the first article of the series I pointed out that due to the use of Launchd with StartInterval "the script is actually launched a little less than once per hour. For example, if each execution lasts 5 minutes, then it takes 26h and not 24h for it to be launched 24 times". I also pointed out that the client sleeps at night, causing an interruption in backups. Furthermore, from February 27 onwards, I further reduced the backup window, going from 17 or 18 backups per day to 8 or 9.
Within this reduced window, Kopia can only make 8 to 9 backups per day. After 48 hours, it will therefore have taken the equivalent of 16 to 18 hourly snapshots; an hour later, the oldest will exceed its lifetime and be purged. In the same situation, Borg will keep 48 "hourly" archives spanning roughly 6 days.
Add to this the fact that Kopia manages its snapshots like Restic: each one can have several retention labels.

Kopia:

  2024-02-18 23:38:41 CET ../.. (weekly-4)
  2024-02-25 23:43:54 CET ../.. (weekly-3)
  2024-02-28 19:46:56 CET ../.. (daily-7)
  2024-02-29 19:20:29 CET ../.. (daily-6,monthly-2)
  2024-03-01 19:02:51 CET ../.. (daily-5)
  2024-03-02 19:55:36 CET ../.. (daily-4)

Finally, Kopia doesn't hesitate to delete the oldest snapshots, whereas Borg and Restic carefully keep them. By March 5, Kopia has already deleted the very first snapshot. As of March 20, the oldest one available is from February 29.

Here's the situation on March 20:

number of restore points by date and program

	Borg	Kopia	Restic
2024-02-17	1		1
2024-02-18	1
2024-02-25	1
2024-02-29	1	1	1
2024-03-03	1	1	1
2024-03-06	1
2024-03-07	1
2024-03-08	1
2024-03-09	1
2024-03-10	1	1	1
2024-03-11	1
2024-03-12	1
2024-03-13	4
2024-03-14	8	1	1
2024-03-15	8	1	8
2024-03-16	8	1	8
2024-03-17	8	1	8
2024-03-18	8	1	8
2024-03-19	8	8	8
2024-03-20	7	8	7
total	71	24	52

To conclude, if you favor very fast, bandwidth-efficient backup tasks at the expense of guaranteed, readable retention, Kopia is the best candidate. If, on the other hand, you want to maximize the number of restore points at the expense of backup time and bandwidth usage, Borg wins hands down. Kopia users can nevertheless use Latest snapshots to ensure a minimum retention for a client that isn't always powered on for its periodic backup.

	Borg	Kopia	Restic
backup duration	-	+	-
network volume	+	+	-
retention management	+	-	+

14 mars 2024

Borg, Kopia, Restic: from configuration to first backup

[This is the English translation by DeepL]
[Version originale en français]

All backup solutions require a minimum of configuration to do the job. These three solutions are no exception.
What Borg, Restic and Kopia have in common is the use of a repository into which they push backups. This repository must be initialized before the first backup can be made. In practice, here's how I proceeded with each of these programs.

Borg

Borg relies on an SSH key to open the communication tunnel between Borg on the server and Borg on the client. For this purpose, I use a dedicated SSH key and configuration. If you've mastered SSH configuration on the client side, this opens the door to fairly fine tuning of the tunnel between client and server. We're back in familiar territory, and it's quite comfortable.

$ borg init --encryption=repokey-blake2 --rsh "ssh -F .ssh/config_bkp" bkppat@192.168.0.22:/backupmaison/bench-borg

Creating the first backup, on the other hand, is a completely different story. The number of parameters is so great that, in my scripts, I define them as variables which I then use in the various commands:

$ SSH="ssh -F .ssh/config_bkp"
$ HOMEREPO="bkppat@192.168.0.22:/backupmaison/bench-borg"
$ HOMEOPTIONS="--stats --exclude-caches --exclude-if-present .nobench-bkup"
$ SOURCEROOT="/Users/patpro"
$ ARCHIVENAME=$(date "+%Y%m%d.%s")
$ export BORG_PASSPHRASE="foobar"
$ borg create ${HOMEOPTIONS} --rsh "${SSH}" ${HOMEREPO}::${ARCHIVENAME} ${SOURCEROOT}

With Borg, all parameterization is done in this way. There is no configuration file. Fortunately, a backup task is supposed to be automated, i.e. to reside in one form or another in a script that doesn't need to be retyped manually at the command line. The disadvantage is therefore rather minor.
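As an illustration of what "residing in a script" can look like, here is a minimal sketch that wraps the same command in a function. It reuses the values shown above; the DRY_RUN switch and the function name are my own additions for the example, not Borg features. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: same values as above, wrapped in a function.
# DRY_RUN is a hypothetical switch for this example: 1 prints the command,
# anything else actually runs it (BORG_PASSPHRASE must then be exported).
SSH="ssh -F .ssh/config_bkp"
HOMEREPO="bkppat@192.168.0.22:/backupmaison/bench-borg"
SOURCEROOT="/Users/patpro"
DRY_RUN="${DRY_RUN:-1}"

borg_backup() {
    archive="$(date "+%Y%m%d.%s")"
    # Build the argument list once so each option stays a separate word.
    set -- borg create --stats --exclude-caches \
        --exclude-if-present .nobench-bkup \
        --rsh "$SSH" "${HOMEREPO}::${archive}" "$SOURCEROOT"
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

borg_backup
```

Keeping the options in the positional parameters rather than in a single string avoids relying on word splitting of an unquoted variable.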

Kopia

If you're using SFTP transport, then Kopia works in much the same way as Borg, except that Kopia uses its own SSH implementation by default. This requires you to specify the location of the known_hosts file, for example. Nor will it take into account the configuration file of the machine's native SSH client. Unfortunately, Kopia's native implementation of the SSH protocol does not support all key types. This bug has been identified but not yet corrected. The use of Elliptic Curve keys remains possible if the Kopia client is told to use an external SSH client. For my part, I chose to create an RSA key to get around the bug.

$ kopia repository create sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Unlike Borg, Kopia's configuration is largely stored in the repository itself. This involves using policy management commands to set a certain number of parameters for use by the Kopia client.
Here's how it works in my case:

$ kopia policy list
85bf7bc7688d5819433aad9ba1c84f6b (global)

$ kopia policy set --add-dot-ignore .nobench-bkup 08c59f69009c88ee463de6acbbc59a3a
Setting policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a
 - adding ".nobench-bkup" to "dot-ignore filenames"

$ kopia policy set --compression=zstd --global
Setting policy for (global)
 - setting compression algorithm to zstd

These policies are inherited, from the most general to the most specific:

$ kopia policy show --global| head
Policy for (global):

Retention:
  Annual snapshots:                     3   (defined for this target)
  Monthly snapshots:                   24   (defined for this target)
  Weekly snapshots:                     4   (defined for this target)
  Daily snapshots:                      7   (defined for this target)
  Hourly snapshots:                    48   (defined for this target)
  Latest snapshots:                    10   (defined for this target)
  Ignore identical snapshots:       false   (defined for this target)

$ kopia policy show 08c59f69009c88ee463de6acbbc59a3a| head
Policy for patpro@cassandre:/Users/patpro/08c59f69009c88ee463de6acbbc59a3a:

Retention:
  Annual snapshots:                     3   inherited from (global)
  Monthly snapshots:                   24   inherited from (global)
  Weekly snapshots:                     4   inherited from (global)
  Daily snapshots:                      7   inherited from (global)
  Hourly snapshots:                    48   inherited from (global)
  Latest snapshots:                    10   inherited from (global)
  Ignore identical snapshots:       false   inherited from (global)
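The inheritance rule visible in these two outputs can be modeled in a few lines. This is a toy model written for illustration, not Kopia's implementation: the variables and the resolve function are hypothetical, and only two settings are shown.

```shell
#!/bin/sh
# Toy model of policy inheritance (not Kopia's actual code).
# A value defined on the specific target wins; otherwise the global one applies.
global_hourly=48
global_daily=7
target_hourly=""   # empty means "not defined for this target"
target_daily=""

resolve() {
    # $1 = value defined on the target (may be empty), $2 = global value
    if [ -n "$1" ]; then
        echo "$1   (defined for this target)"
    else
        echo "$2   inherited from (global)"
    fi
}

echo "Hourly snapshots: $(resolve "$target_hourly" "$global_hourly")"
echo "Daily snapshots:  $(resolve "$target_daily" "$global_daily")"
```

Setting target_hourly to a value such as 24 in this model would override the global 48 for that target only.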

As the various client parameters used to define backup and purge tasks are stored in the repository, the command line for the various actions is greatly simplified. You still need to connect the client to the backup server. This is done with a command like:

$ kopia -p foobar repository connect sftp --path=/backupmaison/bench-kopia --host=192.168.0.22 --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa --known-hosts=/Users/patpro/.ssh/known_hosts-kopia

Disconnection is then carried out as follows:

$ kopia repository disconnect

It's important to understand that you don't have to connect and disconnect for every task. It's a mistake I made myself: in the early days of running my backup scripts, for each Kopia task I would connect, perform the task and then disconnect. The side effects are minimal, but I'll come back to that later. In any case, a connection made at time T persists, because it's not a network connection. It's best to think of it as a pre-configuration of the client, giving it the login, password and address of the server so that it can carry out backup tasks. For the rest of my tests, I simply connected once and never disconnected my client again. As a result, creating a backup is as simple as this trivial command:

$ kopia snapshot create /Users/patpro

The difference with Borg is obvious.
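A script can take advantage of this persistence by connecting only when needed. The sketch below is one possible way to do it, not the script used in these tests. It assumes that kopia repository status exits with a non-zero status when the client is not connected, and that the repository password is supplied through the KOPIA_PASSWORD environment variable; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: connect once if needed, then take the snapshot.
# Assumption: KOPIA_PASSWORD is exported by the caller.
kopia_backup() {
    # Assumption: "kopia repository status" fails when not connected.
    if ! kopia repository status >/dev/null 2>&1; then
        kopia repository connect sftp \
            --path=/backupmaison/bench-kopia --host=192.168.0.22 \
            --username=patpro --keyfile=/Users/patpro/.ssh/kopia_rsa \
            --known-hosts=/Users/patpro/.ssh/known_hosts-kopia || return 1
    fi
    kopia snapshot create "$1"
}
```

Calling kopia_backup /Users/patpro from a scheduled script then behaves like the one-line command above on every run after the first.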

Restic

To test Restic, I decided to use the REST-server. Transport is over HTTP and the command is relatively trivial. On the server side, configuration is fairly basic, since I've simply specified the storage path, interface and TCP port for listening and the --no-auth parameter, as I don't need authentication.

$ restic -r rest:http://192.168.0.22:8000/ init

As with Borg, backup and purge tasks require all parameters to be passed on the command line:

$ export RESTIC_PASSWORD=foobar
$ restic -r rest:http://192.168.0.22:8000/ backup --verbose --exclude-if-present .nobench-bkup --one-file-system --exclude-caches --no-scan --read-concurrency 4 /Users/patpro

There's not much more to say about Restic, which behaves much like Borg in terms of settings and command line. Even if the number of parameters were to multiply, it would all take place in a perfectly readable shell script.
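For completeness, a purge task could be scripted along the same lines. In this sketch the retention values simply mirror the Kopia policy shown earlier; they are an assumption for the example, not necessarily the values used in these tests, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: retention values are assumed, copied from the Kopia policy above.
REPO="rest:http://192.168.0.22:8000/"

restic_purge() {
    # RESTIC_PASSWORD is expected to be exported by the caller.
    restic -r "$REPO" forget --prune \
        --keep-hourly 48 --keep-daily 7 --keep-weekly 4 \
        --keep-monthly 24 --keep-yearly 3
}
```

Adding --dry-run to the forget command is a convenient way to see which snapshots would be removed before committing to a retention policy.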

To conclude

I really like the readability of a command line with or without variables: it allows me to analyze a backup script in any editor and in any context. Nevertheless, I have to admit that I'm quite taken with the elegance of Kopia's policy system, even if accessing the policy content requires access to the Kopia command in a terminal as well as to the backup repository (which is a drawback). I think we're faced with a choice criterion that will depend solely on the end-user's preference.