This chapter covers the different types of tools you can use to back up your version control system data. Please bear in mind that GitHub, GitLab, and Atlassian, like most cloud service providers, rely on shared responsibility models in which you are responsible for protecting your data. They even encourage you to use third-party backup software to make sure your data is accessible and recoverable. So let’s take a look at the external options you have.
Option 1: Git backup scripts
Developers who use GitHub, Bitbucket, and GitLab often see the need to back up their work. As a result, developer forums and Git communities offer many self-written scripts that provide some basic features and a bit of peace of mind.
Examples of such backup scripts include:
- Python GitHub Backup
- a script based on PowerShellForGitHub
- ghe-backup by Zalando Tech
and many more…
PRO: Using an open-source script is free, and the code is usually fairly stable. You can review the entire script on GitHub or Bitbucket and decide whether it meets your own or your organization’s requirements.
CON: Some scripts do not include a restore option, offer archiving rather than true backup, or require a specific type of data storage. So you can perform backups, but if a failure occurs, you will need to write your own script to recover the data. Backups without a reliable recovery method are of very limited use. In the event of failure every minute matters, so a recovery option that makes your data accessible and recoverable from any point in time, exactly when you need it, may matter even more than the backups themselves.
Scripts – for who? Individual developers or small project owners. It’s a solution for those who need backups only rarely and sporadically. Because it usually doesn’t include a recovery option, it suits only people whose business does not rely on version control system data and can survive without access to that data for a while. Finally, it works only if you or your team can later write a recovery script so that no data is lost.
You can even solve this by writing your own scripts if you have enough skills, time, and resources. But please bear in mind that it can be tricky to get right. Moreover, maintaining such a tool can generate long-term costs and consume administrative time (as mentioned in the previous chapter).
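To give a rough idea of what such a home-grown script involves, here is a minimal Python sketch that builds the Git commands for a mirror backup and for a restore. It assumes Git is installed and on the PATH; the repository URLs, the backup directory, and all function names are hypothetical placeholders. It covers repositories only, not metadata such as issues or pull requests.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def mirror_path(backup_root: Path, repo_url: str, when: datetime) -> Path:
    """Return a timestamped destination such as <root>/project.git/20240131T120000Z."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    return backup_root / name / when.strftime("%Y%m%dT%H%M%SZ")


def backup_command(repo_url: str, dest: Path) -> list[str]:
    # `git clone --mirror` copies all refs (branches, tags, notes) into a bare repository.
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def restore_command(mirror_dir: Path, target_url: str) -> list[str]:
    # `git push --mirror` pushes every ref from the mirror to a (typically empty) remote.
    return ["git", "-C", str(mirror_dir), "push", "--mirror", target_url]


def run(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if Git exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

A backup run would call `run(backup_command(url, mirror_path(root, url, datetime.now(timezone.utc))))`, and a restore would call `run(restore_command(saved_mirror, new_remote_url))`. Keeping each run in its own timestamped directory is a design choice made here so that older copies are not overwritten; scheduling, pruning, and error reporting would still have to be added and maintained.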
Option 2: S3 Backup
Amazon S3 Backup is an open-source script built as a GitHub Action. GitHub Actions is GitHub’s built-in tool for building, testing, and deploying applications. In short, it automates the steps of the development workflow.
The S3 Backup action lets you back up Git repositories, but only to Amazon S3-compatible object storage.
PRO: S3 Backup is an open-source script, which means it’s free and quite simple to use. You don’t need to install any additional software. Because it runs on every push, all changes are captured and you don’t need to worry about data missing from your backups.
CON: S3 Backup covers only repositories, so all metadata (pull requests, wikis, projects, issues, etc.) is unprotected and can be lost. The biggest disadvantage is that it offers no retention settings: each new backup immediately replaces the old one. Recovery is therefore very limited, because you can restore only the last full version and point-in-time recovery is impossible. With only one backup version stored, malicious actors can compromise your backups by infecting the new copy that replaces an old, reliable one, which opens the door to further attacks. Moreover, it works only with cloud storage, so if you want to use your own local infrastructure, it’s not for you.
S3 Backup – for who? Because S3 Backup does not differ much from the scripts mentioned above, we can also recommend it to individual developers or small project owners who need backups only sporadically. With no retention settings and old copies simply replaced by new ones, it is useful only when you don’t need to track changes or recover historical data. In short, it fits when all you need is the last state of your data. Moreover, if you consider using S3 Backup, please make sure you have the most important security measures in place to prevent malicious actions.
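The difference between replacing one copy and keeping history comes down to how each upload is named. The sketch below is not the S3 Backup action’s code; it is a hypothetical illustration in which a plain dictionary stands in for a bucket, and the key layouts and function names are assumptions made for the example.

```python
from datetime import datetime, timezone


def fixed_key(repo: str) -> str:
    # One key per repository: every upload overwrites the previous archive.
    return f"backups/{repo}.tar.gz"


def versioned_key(repo: str, when: datetime) -> str:
    # One key per upload: older archives stay available for point-in-time restore.
    return f"backups/{repo}/{when.strftime('%Y%m%dT%H%M%SZ')}.tar.gz"


def simulate(key_for, uploads):
    """Store each (timestamp, payload) upload in a dict standing in for a bucket."""
    bucket = {}
    for when, payload in uploads:
        bucket[key_for(when)] = payload
    return bucket


uploads = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), b"good copy"),
    (datetime(2024, 1, 2, tzinfo=timezone.utc), b"infected copy"),
]

overwritten = simulate(lambda when: fixed_key("project"), uploads)
history = simulate(lambda when: versioned_key("project", when), uploads)

print(len(overwritten))  # 1: only the latest (infected) archive remains
print(len(history))      # 2: the earlier good archive is still there
```

With a single fixed key, the good copy from the first upload is gone as soon as the second upload lands. With timestamped keys both remain, at the cost of having to prune old archives yourself.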
Option 3: Git and GitHub API
Finally, you can use the official GitHub API to back up your Git repository. First, you clone the repository or wiki to your local machine. Once that is done, you use the API to export the other elements of your GitHub Enterprise Server repository (issues, pull requests, forks, comments, etc.) to your local machine, create a zip archive, and save it in a secure place such as an external hard drive or a cloud service.
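The export step can be sketched roughly as follows. This is a minimal illustration, not a complete exporter: it uses the GitHub REST API endpoint for listing repository issues (which also returns pull requests) and Python’s standard library only. The owner, repository, and token values are placeholders, and the function names are hypothetical. Comments, wikis, and other metadata would need further calls.

```python
import json
import urllib.request
import zipfile
from pathlib import Path

# Public GitHub API root. On GitHub Enterprise Server the root is typically
# https://HOSTNAME/api/v3 instead.
API_ROOT = "https://api.github.com"


def issues_request(owner: str, repo: str, token: str, page: int = 1) -> urllib.request.Request:
    """Build a request for one page of issues (open and closed)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100&page={page}"
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })


def fetch_all_issues(owner: str, repo: str, token: str) -> list:
    """Page through the endpoint until an empty page is returned."""
    items, page = [], 1
    while True:
        with urllib.request.urlopen(issues_request(owner, repo, token, page)) as resp:
            batch = json.load(resp)
        if not batch:
            return items
        items.extend(batch)
        page += 1


def write_archive(archive: Path, exports: dict[str, list]) -> None:
    """Write each export (e.g. {"issues": [...]}) as a JSON file inside a zip archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, records in exports.items():
            zf.writestr(f"{name}.json", json.dumps(records, indent=2))
```

A run would look like `write_archive(Path("export.zip"), {"issues": fetch_all_issues(owner, repo, token)})`. Note that the resulting archive is plain, unencrypted JSON, and nothing here schedules the export or handles API rate limits.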
PRO: Using Git and the GitHub API allows you to create copies of the entire data set: repositories together with all metadata.
CON: We would say that this option is a replica of your repositories and metadata rather than a true backup: it’s not encrypted, and it can be compromised or lost. It’s like an external copy of your repository saved in a separate location. It doesn’t run automatically, so you have to repeat the procedure over and over again.
Git and GitHub API – for who? Because this method doesn’t run automatically, it’s better suited to archiving a GitHub repository and old projects. It doesn’t provide any security measures for such copies, so it shouldn’t be treated as a backup. It might be useful for small project owners who simply want to keep access to older projects for future use.
Option 4: GitProtect.io powered by Xopero ONE
If your organization expects more from a backup than a single copy of your data set (and usually a copy of the repository only), consider professional backup and recovery software for GitHub, Bitbucket, and GitLab such as GitProtect.io powered by Xopero ONE, which automatically protects all your version control system data, including repositories and metadata (pull requests, wikis, issues, branches, projects, etc.).
PRO: GitProtect.io is software dedicated to GitHub, Bitbucket, and GitLab (soon) that includes the best features of enterprise backup software, created by a vendor with more than 10 years of experience in the backup market. It is trusted by thousands of customers and partners worldwide (including T-Mobile, Orange, ESET, Subway, and AVIS).
Whether you use GitHub, Bitbucket, or GitLab (soon), you can protect all your data, including repositories and metadata. And it does not matter whether you use your version control system as a SaaS application or locally on your developers’ devices.
When it comes to storage, you don’t need to invest in additional IT infrastructure: you can store backups locally (on your local machine or any NAS or SAN device) as well as in any private or public cloud compatible with Amazon S3 (AWS, Azure, Wasabi, Xopero, etc.). It can even be a hybrid or multi-cloud environment, because one license covers multiple storages.
All you need to do is set up your administrative account and use the central web-based management console to set backup plans, recover data, and manage users, devices, and storages. Thanks to this cloud-based architecture, you have access anytime and anywhere you need it.
Once you add your GitHub, Bitbucket, or GitLab (soon) account to GitProtect.io, you can set automatic backup plans that define the data to be protected, the storage where the copies should be kept, the schedule (when the automatic backup should run), and the backup execution manner. New repo? It can be added automatically to your scheduled backup plan. Moreover, you can set a push as a trigger so that a backup runs automatically with every push you make.
To make it even easier for you – you can choose a predefined backup plan from the list.
You have full control over retention thanks to the Grandfather-Father-Son scheme, probably the most efficient rotation method: it lets you manage copies over the long term, requires minimal storage space, and enables fast recovery.
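To make the Grandfather-Father-Son idea concrete, here is a generic sketch of one common variant of the rotation. It is not GitProtect.io’s implementation; the function name, the default counts, and the choice of ISO weeks and calendar months as buckets are assumptions made for illustration.

```python
from datetime import date, timedelta


def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates a simple GFS policy would retain.

    Sons: the newest `daily` backups.
    Fathers: the newest backup from each of the latest `weekly` ISO weeks.
    Grandfathers: the newest backup from each of the latest `monthly` months.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])

    def newest_per_bucket(bucket_of, limit):
        seen = {}
        for d in ordered:
            # Iterating newest first, the first date seen in a bucket is its newest.
            seen.setdefault(bucket_of(d), d)
        # Dicts preserve insertion order, so the first `limit` buckets are the latest.
        return list(seen.values())[:limit]

    keep.update(newest_per_bucket(lambda d: d.isocalendar()[:2], weekly))
    keep.update(newest_per_bucket(lambda d: (d.year, d.month), monthly))
    return keep


# Example: one backup per day for 90 days, ending on 2024-03-31.
end = date(2024, 3, 31)
backups = [end - timedelta(days=i) for i in range(90)]
kept = gfs_keep(backups, daily=7, weekly=4, monthly=3)
print(f"{len(kept)} of {len(backups)} backups retained")  # 12 of 90 backups retained
```

In this example 12 copies still cover the last week day by day, the last month week by week, and the last quarter month by month, which is where the storage savings come from.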
Moreover, full control over retention lets you archive unused repositories, which frees up space in your version control system and saves money.
During backup plan setup, you can also choose the encryption level (all copies are encrypted with AES, which is considered practically impossible to break, and you can additionally adjust the encryption strength) and the compression level to control your storage usage.
To make it even safer for you, we have implemented a Secure Password Manager that enables you to create strong passwords that you don’t need to memorize.
GitProtect.io also provides audit logs and notifications, so you can stay up to date and keep track of your copies for security and compliance purposes.
And finally, you have a wide range of data recovery options. Flexible, point-in-time recovery to a repository or local device makes GitProtect.io a very reliable and complete backup and disaster recovery solution for your version control system.
CON: GitProtect.io powered by Xopero ONE is a paid solution, and the price depends on the number of repositories you protect: the more repos you have, the less you pay per repository. Unlike with your own script, you are not the architect here, so you have less control over how it works and what features it has. But the list of features is long (and based on years of market experience), so it may well be broader than you expect. Considering that you can use your own infrastructure and nearly any storage, and even archive Git repositories (saving space in your version control system), the price seems relatively low and reasonable. Once you factor in possible attacks and failures, it may even become an investment with a fairly high return. It is said that in the event of failure you can save $4 for every $1 spent on backup and disaster recovery solutions.
GitProtect.io – for who? For every organization that treats its code as intellectual property and relies on version control systems like Git and hosting platforms like GitHub, Bitbucket, or GitLab, regardless of its size, revenue, or industry. It can be an enterprise, a small or medium-sized business with an IT department, a software house, or even an individual developer. It’s for all organizations that are aware of data breach costs and legal penalties and therefore want to prevent data loss and ensure business continuity.