Your code is your intellectual property, the most valuable asset inside your company. And you know that. As a CTO, IT manager, software-house owner, or team leader, you know how much time and money went into building it, and how important it is to protect it and not lose a single line of code. That’s why you should back up your Git repositories.
Now you can say, “OK, but Git itself is a backup tool.” Well... not really. You don’t have to look far: April 2020 (GitHub), June 2020 (GitHub, Bitbucket), and December 2020 (Bitbucket) are just a few outages during which “your” code was unavailable for a while. Should a ‘backup’ ever be unavailable? That sounds ridiculous and shouldn’t happen in a modern organization. We should treat our repositories as production environments, and we cannot afford to lose access to a production environment. Do you back up your production data? Then please back up your repositories too.
How to clone a Git repository
Speaking of Git, backup is not that straightforward. One of Git’s primary functions is clone. As you probably know, clone allows you to … clone a repository, which means creating a local, fully functional copy. But what does this feature actually do? Unlike, for example, SVN and its checkout command, git clone makes a complete copy of the repository: every version of every file from the beginning of the project. Nice feature. To achieve that, Git creates the project directory on your machine, sets up remote-tracking branches, runs a fetch (which downloads the history behind those branches), and finally checks out the default branch. It also stores the address of the remote under the name origin, and that’s it! Simple as that. You have the perfect copy of the cloned repository. Well, almost perfect, but more on that in a moment.
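To make those steps visible, here is a minimal sketch that clones a throwaway local repository and then inspects what clone set up. The scratch directory, the branch name main, and the demo identity are assumptions made only for this illustration.

```shell
set -eu
work=$(mktemp -d)   # throwaway scratch directory (illustrative)

# Build a tiny repository that plays the role of the remote.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/remote/README"
git -C "$work/remote" add README
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# Clone it, then look at what clone did for us.
git clone -q "$work/remote" "$work/copy"
git -C "$work/copy" remote get-url origin          # address stored for origin
git -C "$work/copy" branch -r                      # remote-tracking branches
git -C "$work/copy" rev-parse --abbrev-ref HEAD    # branch that was checked out
git -C "$work/copy" rev-list --count HEAD          # commits available locally
```

Everything the commands print comes from the local copy; no further contact with the remote is needed.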
git clone command
Of course, the clone command can be parameterized. We can choose which branch gets checked out, since it doesn’t have to be the default one, and we can also, for example, set config params or filter out elements that we do not want to download. Sample calls:
git clone https://myrepo.com/project --no-tags --filter=blob:none
– it won’t download tags, and file contents (blobs) are fetched only when they are needed
git clone https://myrepo.com/project --config core.editor=vim
– it will set our favorite editor in the new repository’s configuration
We can also clone with the --mirror option. That one will finally clone the entire repository. Literally the entire repository: the mirror is a bare repository that contains all the refs of the remote repository (not only its branches) and keeps the configuration needed to refresh them from the remote.
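As a rough illustration of a mirror clone, the sketch below mirrors a throwaway local repository, lists the refs it received, and refreshes it. The paths, the branch and tag names, and the demo identity are assumptions for this example only.

```shell
set -eu
work=$(mktemp -d)   # throwaway scratch directory (illustrative)

# Stand-in "remote" with two branches and a tag.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
echo "v1" > "$work/remote/file.txt"
git -C "$work/remote" add file.txt
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first"
git -C "$work/remote" tag v1.0
git -C "$work/remote" branch feature

# Mirror clone: a bare repository that receives every ref.
git clone -q --mirror "$work/remote" "$work/mirror.git"
git -C "$work/mirror.git" rev-parse --is-bare-repository   # prints "true"
git -C "$work/mirror.git" for-each-ref --format='%(refname)'

# Later on, refresh the mirror; --prune drops refs deleted on the remote.
git -C "$work/mirror.git" remote update --prune
```

Because a mirror has no working tree, it is meant for storage and restoring rather than for day-to-day development.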
git bundle command
It might seem that this feature is enough not to worry about backing up our repository. But that’s not entirely true. Git itself already hints that cloning is not enough by providing the bundle command. This command creates a single archive file that contains the refs, and the objects behind them, needed to restore the repository. You can run clone or pull directly against that bundle file. It is not very useful for developers, as they want a working copy so they can start their work easily. But bundle is a great way to create a copy of our repository.
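Use git bundle create <file> --all to dump all refs and their history into a single file (passing a revision range instead of --all produces an incremental bundle), and copy configuration files separately.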
Would such a feature exist if cloning were enough? Well, I don’t think so. In addition, GitHub, for example, provides an API to create a repository backup, and its official documentation also encourages the use of third-party tools to do so.
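The bundle workflow described above could look roughly like this. It is a sketch against a throwaway local repository; the file names repo.bundle and repo.config.bak, the branch name, and the demo identity are made up for the example.

```shell
set -eu
work=$(mktemp -d)   # throwaway scratch directory (illustrative)

# Small repository to back up.
git init -q "$work/repo"
git -C "$work/repo" symbolic-ref HEAD refs/heads/main
echo "data" > "$work/repo/file.txt"
git -C "$work/repo" add file.txt
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first"

# Pack every ref and its history into a single file.
git -C "$work/repo" bundle create "$work/repo.bundle" --all

# Check that the bundle is valid before trusting it.
git -C "$work/repo" bundle verify "$work/repo.bundle"

# The repository configuration is not part of the bundle, so copy it separately.
cp "$work/repo/.git/config" "$work/repo.config.bak"

# Restoring is an ordinary clone from the bundle file.
git clone -q "$work/repo.bundle" "$work/restored"
```

The bundle is a regular file, so it can be copied, encrypted, or uploaded like any other archive.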
So why do we need Git backup?
So now we know how the clone and bundle functions work in Git. We have tools that we can use to create copies, but we don’t want to do it manually every time! Usually, we create scripts that execute the right commands at the right time. For this we also need our own hosting to keep these copies. Simple in theory, but in practice we have to maintain these scripts constantly, which can be problematic and time-consuming, e.g.
– there is a new repository
– we want to change hosting
– another repository appeared and some old project has been closed and archived
There can be many reasons, but each of them forces us to bear the costs of maintaining such scripts. Which, moreover, should also have a backup of themselves!
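A home-grown script of the kind described here might start out like the sketch below. The helper backup_repo, the repository names app and docs, and the directory layout are all hypothetical; the demo runs against throwaway local repositories instead of a real hosting service.

```shell
set -eu

# backup_repo URL DEST (hypothetical helper):
# create a mirror on the first run, refresh it on later runs.
backup_repo() {
    url=$1
    dest=$2
    if [ -d "$dest" ]; then
        git -C "$dest" fetch -q --prune origin
    else
        git clone -q --mirror "$url" "$dest"
    fi
}

# Demo setup: two throwaway repositories standing in for hosted projects.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backups"
for name in app docs; do
    git init -q "$work/src/$name"
    git -C "$work/src/$name" symbolic-ref HEAD refs/heads/main
    git -C "$work/src/$name" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "init"
done

# The repository list is hard-coded: every new or archived project
# means editing this script again.
for name in app docs; do
    backup_repo "$work/src/$name" "$work/backups/$name.git"
done

# A new commit appears upstream; the next scheduled run picks it up.
git -C "$work/src/app" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "more work"
for name in app docs; do
    backup_repo "$work/src/$name" "$work/backups/$name.git"
done
```

Even this small version already hard-codes the repository list and the storage location, which is exactly where the maintenance cost comes from.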
There is another huge disadvantage of that approach. Even if your script creates a copy, how do you restore the data? With another script? That means more maintenance work. And are you sure that this copy even works? Creating a copy of your data is a very good idea, but it’s not enough to call it a proper backup. You should be able to easily create copies, check their correctness, version them, and restore them if needed.
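Checking and restoring a copy by hand could be sketched as follows: git fsck verifies the integrity of the stored objects, and git push --mirror sends every ref to a new, empty repository. The paths, branch name, and demo identity are assumptions, and the "new hosting" here is just a local bare repository.

```shell
set -eu
work=$(mktemp -d)   # throwaway scratch directory (illustrative)

# Source repository and its mirror copy.
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
git clone -q --mirror "$work/src" "$work/backup.git"

# 1. Check the copy: fsck exits non-zero if it finds corrupt or missing objects.
git -C "$work/backup.git" fsck --full

# 2. Restore: push every ref from the copy into a fresh, empty repository.
git init -q --bare "$work/restored.git"
git -C "$work/backup.git" push -q --mirror "$work/restored.git"
```

This covers only the Git data itself; it still has to be scripted, scheduled, and monitored, which is the maintenance work in question.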
Maintaining your own scripts certainly seems beneficial and cheaper in the early days of a project, but over time it tends to turn into a nightmare and consume enormous amounts of money and resources if you want to get it done right.
Using third-party backup tools seems to be a good solution in this situation. OK, we will have an initial cost related to the configuration, but we would have to pay this cost anyway. However, we have an advantage in the long term: our costs do not accumulate, the backup of our repositories is automated, we have access to the backups anytime, anywhere, and our employees can take care of the company’s development, not maintenance.
Summing up, there is a huge difference between a ‘copy’ and a ‘backup’. A copy is fine for daily work, but it’s not enough for the real protection of your data. To create a complete backup you need to take care of encryption, versioning, data retention, and so on, to be prepared for unexpected situations. It would also be nice to have audit logs, notifications, and a nice UI for administering all of this: some out-of-the-box tool that can take multiple repository backups off your shoulders while providing a lot of flexibility.
Be a smart leader – do your backups in a proper way!