We live in a time when backups are no longer reserved for the biggest companies, and backing up data regularly has become as important as the data itself. Have you ever wondered how the repositories in your company are secured?

Backup has now become the norm and a desirable practice. I clearly remember that just a few years ago, persuading your employer to spend money on an archiving system was a task comparable to understanding the entire Docker codebase. Sometimes I get the impression that computing power and RAM have become so cheap that no one thinks about code optimization anymore, but that is a story for another post. Over the last few years, developers have learned to work with Git and come to like it, whether the repositories live on a local Git server or on a code hosting platform such as GitHub or Bitbucket. One thing is certain: distributed version control is now used everywhere.

Different Git backup strategies

Implementing Git backups

Today we’re going to tackle Git backup, a topic that is still widely overlooked. I must admit that I still meet people who openly say that a local repo plus commits pushed to the server keep them safe. Over the last few entries on this blog, I tried to show that this isn’t entirely true. For now, however, let’s focus on the backup strategies in use today.

Repositories copy

The first method, used very often, is to copy the repositories at the operating-system level. For example, admins are instructed to secure the company’s work, so they do it in a well-known and, in most cases, working way. Depending on the operating system, they schedule the task either with cron, which ships with Linux distributions, or with the Task Scheduler that is part of Microsoft’s systems. Now let’s take a look at some examples of commands that are likely to be used:

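#Linux
# Every hour: mirror the repositories directory
0 * * * * rsync -av /repos/ /backup/gitMirror/
# Every day at 01:00: archive /repos into a timestamped tarball under /backup/gitBackup/
0 1 * * * tar -cvpzf /backup/gitBackup/$(/bin/date +\%Y_\%m_\%d_\%H_\%M_\%S).co.tar.gz /repos/ > /backup/logfile.log 2>&1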

#Windows
robocopy M:\repos\ \\10.0.0.2\m$\backup\ /MIR
7z.exe a -ttar "\\10.0.0.2\m$\backup\Archives%DATE:~10,4%%DATE:~4,2%%DATE:~7,2%__%TIME:~-11,2%hour.tar" "\\10.0.0.2\m$\backup"
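The archive step can also live in a small script that cron calls, which makes it easier to check that the archive is readable right after it is written. The following is only a sketch under stated assumptions: the function name archive_repos and the paths in the usage note are illustrative and do not come from any tool.

```shell
# Sketch: archive a directory of repositories into a timestamped tarball.
# archive_repos is a hypothetical helper name used for illustration only.
archive_repos() {
    local src_dir="$1"    # directory that holds the repositories
    local dest_dir="$2"   # directory that receives the archives
    local stamp archive

    stamp="$(date +%Y_%m_%d_%H_%M_%S)"
    archive="${dest_dir}/${stamp}.co.tar.gz"

    mkdir -p "$dest_dir" || return 1
    # -C keeps the paths inside the archive relative to src_dir.
    tar -czpf "$archive" -C "$src_dir" . || return 1
    # Listing the archive catches a truncated or unreadable file early.
    tar -tzf "$archive" > /dev/null || return 1
    printf '%s\n' "$archive"
}
```

A cron entry could then call a script that runs, for example, archive_repos /repos /backup/gitBackup. Note that copying a repository while someone is pushing to it can still capture a half-written state, which this sketch does not address.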

Git-level backup

The second method the IT department is likely to propose is a Git-level backup, that is, a backup made with Git’s built-in tools. Here we can of course use the git archive command, but also commands such as git clone or git bundle. For example:

#GIT Clone
git clone --mirror repos/repo/ /backup/gitBackup/backup.repo

#GIT Bundle
git bundle create backup.bundle master

#GIT Archive
git archive --output=./backup/gitBackup/backup.tar --format=tar HEAD
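The bundle command above stores only master. As a hedged sketch of a variant, the helper below bundles every ref with --all and then asks Git to verify the result. The name bundle_repo is made up for this example, and the bundle path should be absolute because the commands run inside the repository directory.

```shell
# Sketch: bundle every branch and tag of one repository, then verify the bundle.
# bundle_repo is a hypothetical helper name used for illustration only.
bundle_repo() {
    local repo_dir="$1"     # working copy or bare repository
    local bundle_file="$2"  # absolute path of the bundle to create

    # --all includes every ref, not only master.
    git -C "$repo_dir" bundle create "$bundle_file" --all || return 1
    # verify checks that the bundle is valid and applies to this repository.
    git -C "$repo_dir" bundle verify "$bundle_file" || return 1
}
```

Restoring is then a plain git clone of the bundle file into a new directory. Keep in mind that a bundle holds refs and the commits reachable from them, not hooks or per-repository configuration.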

What do you mean by backup?

The question is whether these are really backups. After all, the built-in tools mentioned above differ considerably. The proposed git clone --mirror gives us a copy of the entire repo with all of its refs, but backing up an entire forest of different repositories is another story. git bundle, as invoked above, packs only the history reachable from master, and git archive exports a snapshot of the files at a single commit with no history at all. Neither includes local state such as hooks, per-repository configuration or the stash. The problem is that while we can assume we don’t need that extra data in production, developers may have a different opinion. These aren’t the only ways of handling backup at the Git level, either. After all, you can easily push repositories to external resources or even to sites like GitHub. There are many ways, but can they be called full-fledged backups?
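To give a feel for what backing up a whole forest of repositories involves, here is a minimal sketch that mirrors every repository found directly under one directory. The function names and the directory layout (one repository per subdirectory) are assumptions made for illustration, not a description of any particular server.

```shell
# Sketch: mirror every repository found directly under src_root into dest_root.
# is_git_repo and mirror_all_repos are hypothetical helper names.
is_git_repo() {
    # A working copy has a .git entry; a bare repository has HEAD and objects/.
    [ -e "$1/.git" ] || { [ -f "$1/HEAD" ] && [ -d "$1/objects" ]; }
}

mirror_all_repos() {
    local src_root="$1"
    local dest_root="$2"
    local repo name target

    mkdir -p "$dest_root" || return 1
    for repo in "$src_root"/*/; do
        is_git_repo "$repo" || continue
        name="$(basename "$repo")"
        target="${dest_root}/${name%.git}.git"
        if [ -d "$target" ]; then
            # An existing mirror only needs new objects and ref updates.
            git -C "$target" remote update --prune || return 1
        else
            git clone --mirror "$repo" "$target" || return 1
        fi
    done
}
```

Called as mirror_all_repos /repos /backup/gitMirror, it creates one bare mirror per repository on the first run and refreshes the mirrors on later runs. Hooks and server-side configuration are still left out, which is one reason the question above remains open.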

Of course, some kind of dispute between developers and sysadmins/DevOps may also arise in a third scenario. Let’s say the choice fell on backing up entire servers and storing the backups on LTO tapes. Everything is fine, but such backups usually take much longer, and restoring from them means much longer downtime. Many more questions appear here, though. First, how do you know that such backups ensure data integrity and consistency? We could assume that administrators restore repositories from these backups to a test infrastructure every day, and that programmers then check their repositories. Let’s agree, however, that finding time for such activities on both sides is close to impossible nowadays.
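Part of that restore test can be scripted for Git-level backups. The sketch below restores a backup into a scratch directory, runs git fsck on it, and compares its branch and tag tips with the live repository. The names list_refs and verify_backup are illustrative, and the check assumes the backup is a mirror clone or a bundle; it says nothing about full-server or tape backups.

```shell
# Sketch: test-restore a Git backup and compare it with the live repository.
# list_refs and verify_backup are hypothetical helper names.

# Print "<object> <ref>" for every branch and tag, sorted by ref name.
list_refs() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags
}

verify_backup() {
    local live_repo="$1"     # repository currently in use
    local backup_repo="$2"   # mirror directory or bundle file to test
    local scratch result=0

    scratch="$(mktemp -d)" || return 1
    # Restore into a scratch directory, as a real recovery would.
    if git clone --quiet --mirror "$backup_repo" "$scratch/restored.git" &&
       git -C "$scratch/restored.git" fsck --full > /dev/null 2>&1; then
        # Matching branch and tag tips mean the backup is current.
        [ "$(list_refs "$live_repo")" = "$(list_refs "$scratch/restored.git")" ] || result=1
    else
        result=1
    fi
    rm -rf "$scratch"
    return "$result"
}
```

The function returns 0 only when the restore succeeds, the objects pass fsck, and the refs match, so a scheduled job can alert on any non-zero exit. It will also report a mismatch when the live repository has simply moved on since the last backup, which is worth knowing in its own right.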

Xopero decided to take the initiative and present GitProtect.io, an application that answers all of the above questions. We invite you to follow the GitProtect Blog. Soon, we will present and discuss the application that will change your approach to Git backup, and answer your questions about it.
