How to write a GitLab backup script and why (not) to do it?
Backing up your data is an essential part of any good data management strategy. Whether you are a business owner, a student, or a DevOps specialist, it is important to have a plan in place for protecting your important files and information from being lost or corrupted. This can help prevent data loss due to hardware failure, accidental deletion, or other unforeseen issues. By understanding the importance of backups, you can take steps to ensure that your data is always safe and secure.
The need to secure source code and maintain regular backups is a common problem in software development. One solution is to create a script that automatically clones the entire repository on a periodic basis. The main challenge in implementing this solution is ensuring proper authorization, but you can easily overcome it with a few lines of code. By automating the backup process, we can be sure that our code is properly protected without having to devote significant time and resources to the task. Yet, can we?
While a simple script for cloning repositories may be effective for a small number of repositories, it becomes less practical as the number of repositories grows or when new ones are added regularly. Additionally, such a script does not protect the metadata associated with the repositories, such as the information contained in merge requests and issues, because that data is not part of the Git repository itself.
Ensuring proper authorization is crucial to protecting against the risk of exposing sensitive credentials and compromising security. Therefore, it is important to carefully consider the limitations and potential risks of using a simple cloning script for backup purposes.
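To make the cloning approach concrete, here is a minimal sketch of what such a script could look like. Everything in it is an assumption made for illustration: the function names, the example URL and path, the GITLAB_TOKEN environment variable, and the "oauth2" placeholder username. It answers Git's credential prompt through the GIT_ASKPASS mechanism, so the token is not written into the remote URL or into .git/config. Note that it copies Git data only, with the limitations described above.

```shell
#!/usr/bin/env bash
# Sketch of a clone-based backup; the URL, paths, and function names are
# hypothetical and only illustrate the idea.
set -eu

# Write a small askpass helper so the token never appears in the remote URL,
# in .git/config, or on the command line. Prints the helper's path.
write_askpass_helper() {
  local helper
  helper="$(mktemp)"
  cat > "$helper" <<'EOF'
#!/bin/sh
# Git passes the prompt text as $1; answer it from the environment.
case "$1" in
  Username*) echo "oauth2" ;;
  *)         echo "$GITLAB_TOKEN" ;;
esac
EOF
  chmod 700 "$helper"
  echo "$helper"
}

# Create a mirror clone of REPO_URL under DEST_DIR, or refresh an existing one.
mirror_repo() {
  local repo_url="$1" dest_dir="$2"
  local target helper
  target="$dest_dir/$(basename "$repo_url" .git).git"

  # Fail early if the token is missing; it must be exported so that the
  # helper, which runs as a child of git, can read it.
  : "${GITLAB_TOKEN:?export GITLAB_TOKEN before running}"

  helper="$(write_askpass_helper)"
  mkdir -p "$dest_dir"
  if [ -d "$target" ]; then
    # Existing mirror: fetch new refs and drop refs deleted on the server.
    GIT_ASKPASS="$helper" GIT_TERMINAL_PROMPT=0 git -C "$target" remote update --prune
  else
    # First run: a mirror clone copies all branches and tags.
    GIT_ASKPASS="$helper" GIT_TERMINAL_PROMPT=0 git clone --mirror "$repo_url" "$target"
  fi
  rm -f "$helper"
}

# Example call (hypothetical URL and path):
# mirror_repo "https://gitlab.example.com/group/project.git" /srv/backups/git
```

Even in this small form, the script already needs a token with read access to every repository, and somebody has to add each new repository to it by hand.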
How to write a proper GitLab backup script
To create an effective backup script, several key elements must be considered. These include parameters such as:
- address of the repository (or organization)
- storage location
- filename convention for versioning
Additionally, it may be useful to include other elements such as a hostname, reports on who performed the backup and when, options for deleting old data, or scheduling for the backup process. While a simple script can be useful, it may lack flexibility for more specific needs, such as making a backup after each code merge to a specific branch. Overall, creating an effective backup script requires careful planning and consideration of all relevant elements.
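The parameters listed above (storage location, filename convention, deletion of old data, scheduling) could be wired together roughly as follows. This is only a sketch under stated assumptions: the function names, the timestamp format, the seven-day retention in the example, and the script path in the cron line are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: versioned archive names plus simple retention.
# All names and paths are hypothetical.
set -eu

# Build an archive name of the form <name>_<UTC timestamp>.tar.gz
backup_filename() {
  echo "${1}_$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
}

# Archive SOURCE_DIR into STORAGE_DIR and print the path of the new archive.
archive_dir() {
  local source_dir="$1" storage_dir="$2"
  local archive
  archive="$storage_dir/$(backup_filename "$(basename "$source_dir")")"
  mkdir -p "$storage_dir"
  tar -czf "$archive" -C "$(dirname "$source_dir")" "$(basename "$source_dir")"
  echo "$archive"
}

# Delete archives in STORAGE_DIR last modified more than KEEP_DAYS days ago.
prune_old() {
  local storage_dir="$1" keep_days="$2"
  find "$storage_dir" -maxdepth 1 -type f -name '*.tar.gz' \
    -mtime +"$keep_days" -delete
}

# Example calls (hypothetical paths):
# archive_dir /srv/backups/git/project.git /mnt/backup-storage
# prune_old /mnt/backup-storage 7
#
# Example crontab entry to run a script containing those calls nightly at 02:00:
# 0 2 * * * /usr/local/bin/repo-backup.sh >> /var/log/repo-backup.log 2>&1
```

Time-based scheduling is the easy part. Triggering a backup after each merge to a specific branch would need a hook or a CI job on top of this, which is where a simple script starts to lose its simplicity.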
GitLab has, since its inception, been not only a repository host but also a DevOps tool. As such, it lets us create manual backups quite easily and offers quite a few configuration options. But let’s start with the warnings we’ve found in the official documentation:
“GitLab doesn’t back up items that aren’t stored on the file system. If you’re using object storage, be sure to enable backups with your object storage provider, if desired.”
“The backup command doesn’t verify if another backup is already running, as described in issue 362593. We strongly recommend you make sure that all backups are complete before starting a new one.”
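Since the backup command does not check for a run that is already in progress, one way to guard against overlapping runs is to wrap it in a lock. The sketch below uses flock from util-linux. The lock file path, the function name, and the exit code 75 are assumptions, and the lock only protects runs that are started through this same wrapper.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start a backup while another wrapped run holds the lock.
# The lock path and the function name are hypothetical.
set -eu

LOCK_FILE="${LOCK_FILE:-/var/lock/gitlab-backup.lock}"

# Run the given command under an exclusive, non-blocking lock.
# Returns 75 if the lock is already held, otherwise the command's exit status.
run_exclusive() {
  local rc=0
  flock -n -E 75 "$LOCK_FILE" "$@" || rc=$?
  if [ "$rc" -eq 75 ]; then
    echo "Another backup holds $LOCK_FILE; skipping this run." >&2
  fi
  return "$rc"
}

# Example call:
# run_exclusive sudo gitlab-backup create
```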
Yet before we start using the options built into GitLab, we need to make sure that rsync is installed on our machine, for example by running “sudo apt-get install rsync” on a Debian-based system.
As for the filename, the default convention is simple: [TIMESTAMP]_gitlab_backup.tar, where TIMESTAMP identifies the moment the backup was created (plus the GitLab version). It is also worth noting that a GitLab backup covers the entire instance, including elements like repositories, CI/CD job output logs, group wikis, snippets, and GitLab Pages content.
The basic scenario is very simple. All you need to do is run the following command (available from GitLab 12.2):
sudo gitlab-backup create
or if our instance is running GitLab from within a Docker container:
docker exec -t <container name> gitlab-backup create
And that’s basically it for the most basic scenario. We must be aware, however, that GitLab will not back up the configuration files located in the “/etc/gitlab” directory, so it is worthwhile to make a manual copy of at least the gitlab-secrets.json and gitlab.rb files from that directory.
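The manual copy of the two configuration files could look like the sketch below. The function name, the archive naming, and the destination path in the example are assumptions. Since gitlab-secrets.json holds encryption keys, it is generally sensible to store this archive apart from the main backup.

```shell
#!/usr/bin/env bash
# Sketch: archive the two GitLab configuration files mentioned above.
# The function name and destination path are hypothetical.
set -eu

# Copy gitlab-secrets.json and gitlab.rb from CONFIG_DIR into a timestamped
# tar archive under DEST_DIR and print the archive path.
backup_config() {
  local config_dir="$1" dest_dir="$2"
  local archive
  archive="$dest_dir/gitlab_config_$(date -u +%Y%m%dT%H%M%SZ).tar"
  mkdir -p "$dest_dir"
  # The subshell keeps the restrictive umask local, so the new archive
  # is readable by its owner only.
  ( umask 077; tar -cf "$archive" -C "$config_dir" gitlab-secrets.json gitlab.rb )
  echo "$archive"
}

# Example call, run as root because /etc/gitlab is not world-readable:
# backup_config /etc/gitlab /mnt/secure-config-backups
```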
Of course, we can parameterize the “create” command, for example by setting the optional STRATEGY, SKIP, or INCREMENTAL variables. It is also possible to write a backup directly to cloud storage, but this requires additional configuration of the GitLab instance, so I will skip the details here.
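As an illustration of those options, the sketch below assembles a gitlab-backup call from a few environment variables. The wrapper function, the BACKUP_* and DRY_RUN variable names, and the sample values are assumptions. STRATEGY, SKIP, INCREMENTAL, and PREVIOUS_BACKUP are the option names as I understand them from the GitLab documentation, so check them against the version you run.

```shell
#!/usr/bin/env bash
# Sketch: assemble a gitlab-backup call from optional settings.
# The wrapper and its BACKUP_* / DRY_RUN variables are hypothetical.
set -eu

run_backup() {
  # Collect the arguments in the function's own positional parameters.
  set -- create
  if [ -n "${BACKUP_STRATEGY:-}" ]; then
    set -- "$@" "STRATEGY=${BACKUP_STRATEGY}"        # for example: copy
  fi
  if [ -n "${BACKUP_SKIP:-}" ]; then
    set -- "$@" "SKIP=${BACKUP_SKIP}"                # for example: db,uploads
  fi
  if [ -n "${BACKUP_PREVIOUS:-}" ]; then
    # Incremental backups build on an earlier backup, named by its ID.
    set -- "$@" "INCREMENTAL=yes" "PREVIOUS_BACKUP=${BACKUP_PREVIOUS}"
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show what would be executed.
    echo "sudo gitlab-backup $*"
  else
    sudo gitlab-backup "$@"
  fi
}

# Example calls:
# BACKUP_STRATEGY=copy run_backup
# BACKUP_SKIP=db,uploads DRY_RUN=1 run_backup
```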
As you can see, performing such a backup seems trivial, but the above actions apply to the entire GitLab instance. What if we wanted to make a copy of just one repository? This is where the problems begin. Not only with the script itself, but most importantly with maintenance, because each repository would then need its own backup and restore scripts. It’s a nightmare!
Tools for backup
Backup is a crucial process for safeguarding your data. It involves creating a copy of your computer data and storing it in a separate location. This copy can be used to restore your original data in case of an emergency. A good backup should have several key features, including:
- data retention
- recovery process
The backup script for GitLab created in “the old way” falls short of meeting the definition of a good backup. One of the major issues is scalability, as it may not be able to handle large amounts of data. Additionally, there is a lack of encryption for added security. Another major concern is recovery, as it is crucial to be able to restore a backup quickly in case of an emergency.
Downtime can be costly and unacceptable in today’s fast-paced world, so the ability to restore a backup quickly is essential. Overall, it may be difficult to achieve all of these features with a simple, manually created backup script.
Why you should pick a backup tool instead of the script
To address the challenges with the manually created backup script for GitLab, we can consider using third-party tools. While this may come at a cost, it allows us to overcome many obstacles and ensures a more reliable backup. Initially, creating your own scripts may seem like a more cost-effective and customizable solution. However, in the long run, the benefits of using third-party tools far outweigh the initial cost and lack of customization.
One of the major drawbacks of creating your own Git backup scripts is the high maintenance costs and time required for management and administration. Additionally, there is no guarantee of the reliability of the backup, making it a risky solution. In the event of a disaster, we would need some additional scripts for recovery, creating a never-ending cycle of making backups of backups. It is not a sustainable or efficient use of time and resources. Instead, it may be more worthwhile to invest in third-party tools that provide reliable and efficient backups.
Pros and cons of GitProtect
To overcome the challenges of creating your own Git backup scripts, we can consider using a third-party tool such as GitProtect. While it may be troubling at first to use a tool that is not fully customizable and comes at a cost, it offers several benefits, such as central management, extensive configuration options, and the ability to assign permissions and roles. Together, these ensure efficient management, because you know who has access to what, and they enable teams and departments within an organization to take responsibility for backup management.
GitProtect offers convenience and transparency through email or Slack notifications and daily reports with compliance and audit information. It also includes Disaster Recovery capabilities, ensuring a reliable and efficient backup solution. For more details, please visit the GitProtect website.
In summary, the title of this article should be changed to “Why should you never ever use a GitLab backup script?” Instead of wasting time and resources on creating our own backup scripts, we should focus on developing our business and let third-party tools handle the tedious and necessary administrative tasks. This allows us to benefit from a more reliable and efficient backup solution without sacrificing our time and expertise.