Last Updated on September 24, 2024

The IT world is evolving at a tremendous pace. But if I had to pick just one word that best describes what the industry looks like today, it would be “service.” Nowadays, most things are, or are becoming, services: from simple image hosting all the way to AWS, which is, after all, a service too.

This time I’ll focus on a specific, extremely popular one: GitHub. What is more, I’ll look at its on-premises side, namely GitHub Enterprise.

GitHub Enterprise

Being a paid plan for advanced collaboration for individuals and organizations, GitHub Enterprise includes security, compliance, and flexible deployment features. In fact, with it, we can deploy and manage GitHub on our own, with extensions for advanced auditing, single sign-on, LDAP, or some environment protection rules. However, we are most interested in GitHub Enterprise Server, its network connectivity, capabilities, and how we can use it.

GitHub Enterprise Server

It is a self-hosted platform for software development. I have already mentioned some of its functionalities above. From the perspective of an enterprise, the most important benefits are better security and increased control, which give us many opportunities to minimize the risks associated with the public cloud.

Enterprise Server runs on your own infrastructure, with your firewalls, policies, access controls, monitoring, identity and access management, VPNs, network policies, etc. It is up to the company to decide how to secure its source code, and the organization is solely responsible for that security. The company decides how to back up its Enterprise Server instance and its source code, which rotation schemes to use and how many, and which specific repositories and metadata to protect.

However, GitHub Enterprise Server provides some measures to help customers prevent the devastating consequences of data loss or infrastructure outages. For example, you can increase reliability by configuring a passive replica instance, so that if your system or network fails, you have an additional one to fail over to. If you also want to improve performance, you can distribute the load across multiple instances by configuring active replicas. And what about backup, one of the key requirements when it comes to security compliance?

Well, to back up the configuration and GitHub data, GitHub Enterprise Server users can use the GitHub Backup Utilities system to take snapshots of their instances. Let’s look at how it happens more precisely…

GitHub Enterprise Server Backup Utilities

It is a backup system installed on a separate host. It works like a typical backup solution: it takes an initial full backup and then regular snapshots of the original server, which allows us to restore our GitHub Enterprise instance if needed. The snapshots are taken over a secure SSH network connection. This solution doesn’t require constant full snapshots. It takes incremental backups, that is, it transfers only the changes between the last snapshot and the current state. Moreover, backups are usually performed online and run under the lowest CPU/IO priority, which minimizes the performance impact on the instance.

Starting point – how to use GitHub Enterprise Server Backup Utilities

In GitHub Docs we can find step-by-step instructions on how to set up this backup solution:

  1. From the Releases page of the github/backup-utils repository, download the appropriate GitHub Enterprise Server Backup Utilities release.
  2. Extract the archive by using tar: tar -xzvf /path/to/github-backup-utils-vMAJOR.MINOR.PATCH.tar.gz
  3. Change into the local repo directory by running the cd backup-utils command.
  4. Run cp backup.config-example backup.config. This copies the example file to use as your backup configuration.
  5. Customize your configuration by setting GHE_HOSTNAME and GHE_DATA_DIR.
  6. Allow your backup host to access your instance by adding the backup host’s SSH key to the list of authorized SSH keys on the instance.
  7. Verify the SSH connection between your backup host and the GitHub Enterprise Server instance. For that, you can use the ghe-host-check command.
  8. Perform your first full backup by running the command ./bin/ghe-backup.
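
For step 5, the relevant part of backup.config might look like the excerpt below. This is only a sketch: the hostname and data directory are placeholder values, not taken from any real setup, and GHE_NUM_SNAPSHOTS (the number of snapshots to keep) is an optional setting from backup.config-example that is not covered in the steps above. Note that the setting names are written with underscores.

```shell
# backup.config (excerpt) - placeholder values, adjust to your environment

# Hostname or IP address of the GitHub Enterprise Server instance (assumed value)
GHE_HOSTNAME="github.example.com"

# Directory on the backup host where snapshots are stored (assumed value)
GHE_DATA_DIR="/data/ghe-backups"

# Number of snapshots to retain; older ones are pruned (optional setting)
GHE_NUM_SNAPSHOTS=10
```

The file uses plain shell variable assignments, so there must be no spaces around the equals sign.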

Using cron(8), you can schedule regular backups on the backup host. To meet your RPO and prepare for the worst-case scenario, you can start with an hourly backup schedule. That way, in case of a catastrophe you will lose at most an hour of data.
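
As an illustration, an hourly schedule could be expressed with a crontab entry like the one built below. The install path /opt/backup-utils and the log file location are assumptions for this sketch; the -v flag asks ghe-backup for verbose output. The snippet only prints the entry, so nothing changes on the host until you add the line yourself via crontab -e.

```shell
# Assumed install location of the backup utilities on the backup host
BACKUP_UTILS_DIR="/opt/backup-utils"

# Minute 0 of every hour: run a backup and append its output to a log file
CRON_LINE="0 * * * * $BACKUP_UTILS_DIR/bin/ghe-backup -v >> $BACKUP_UTILS_DIR/backup.log 2>&1"

# Print the entry so it can be pasted into the crontab (crontab -e)
printf '%s\n' "$CRON_LINE"
```

The five leading fields are minute, hour, day of month, month, and day of week, so changing the first field to */30 would halve the worst-case data loss window, provided a backup run finishes within that time.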

The worst case scenario – what should you know about restore?

Outages and cyber-attacks happen every day. And, actually, that’s the reason why it’s critical to have a working backup of your GitHub repositories and metadata. So that in case of a disaster, you will be able to restore them and continue your work. 

GitHub Enterprise Server Backup Utilities let you restore your GitHub Enterprise Server instance. To do so, you need another instance to restore to, and you run the restore process from the backup host. Before the restore starts, don’t forget to add the backup host’s SSH key to the target GitHub Enterprise instance as an authorized SSH key, so that the backup host is able to connect to it.

💡 Important: Remember that this method only allows you to restore data from at most two feature releases behind. For example, if you have a backup from GitHub Enterprise Server 3.0.x, you can restore it to an instance running 3.2.x, but not to a later release.
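
The version rule can be expressed as a small check. The can_restore function below is a hypothetical helper written for this article, not part of Backup Utilities, and it makes simplifying assumptions: both versions share the same major number, and feature releases are counted by the minor number only.

```shell
# Hypothetical helper: succeeds if a snapshot taken on version $1
# can be restored to an instance running version $2.
# Simplification: only versions within the same major release are compared.
can_restore() {
  backup_major=${1%%.*}
  target_major=${2%%.*}

  # Strip the major part, then keep only the minor number
  backup_rest=${1#*.}
  backup_minor=${backup_rest%%.*}
  target_rest=${2#*.}
  target_minor=${target_rest%%.*}

  [ "$backup_major" -eq "$target_major" ] || return 1

  # The target may be 0, 1, or 2 feature releases ahead of the backup
  diff=$((target_minor - backup_minor))
  [ "$diff" -ge 0 ] && [ "$diff" -le 2 ]
}

if can_restore 3.0.4 3.2.1; then
  echo "3.0.4 -> 3.2.1: restore allowed"
fi
if ! can_restore 3.0.4 3.3.0; then
  echo "3.0.4 -> 3.3.0: restore not allowed"
fi
```

A check like this could run before a restore to catch a version mismatch early, but the official upgrade and restore documentation remains the authority on which combinations are supported.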

Well, you can use the ghe-restore command to restore your GitHub Enterprise Server instance from your latest snapshot. Moreover, ghe-restore accepts some additional options, like the -c and -s flags. The -c flag overwrites the settings, certificate, and license data on the restore host even if it is already configured, and the -s flag lets you select a different backup snapshot to restore.

To monitor your backup or restore processes, you can run the ghe-backup-progress utility, which shows the progress of each job in sequence.
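
To make the flag combinations concrete, here is a dry-run sketch. The restore_cmd function is a hypothetical helper that only prints the ghe-restore command line instead of running it; the target address 203.0.113.10 and the snapshot ID are made-up values, and the snapshot ID is assumed to be the name of a snapshot directory inside your backup data directory.

```shell
# Hypothetical dry-run helper: prints the ghe-restore invocation
# for a target host and an optional snapshot ID, without executing it.
restore_cmd() {
  host=$1
  snapshot=${2:-}
  if [ -n "$snapshot" ]; then
    # -c overwrites existing configuration, -s picks a specific snapshot
    printf 'ghe-restore -c -s %s %s\n' "$snapshot" "$host"
  else
    # Without -s, the latest snapshot is restored
    printf 'ghe-restore -c %s\n' "$host"
  fi
}

restore_cmd 203.0.113.10
# prints: ghe-restore -c 203.0.113.10

restore_cmd 203.0.113.10 20240924T120000
# prints: ghe-restore -c -s 20240924T120000 203.0.113.10
```

Printing the command first gives you a chance to review the target host and snapshot before anything is overwritten, since -c replaces the configuration of an already configured instance.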

Can the High Availability replica serve as a backup?

We don’t want to stray far from the topic, but it’s important to mention the High Availability replica, which, like Backup Utilities, is used as part of a GitHub Enterprise Server deployment but serves an entirely different role.

The main purpose of the High Availability replica is not to provide you with a backup copy but to help minimize service disruption should a hardware failure or a major network outage affect your primary instance. The High Availability replica is a fully redundant secondary GitHub Enterprise Server instance, run in an active or passive configuration, that is kept in sync with your primary instance. It means that if ransomware hits your main GitHub Enterprise Server instance or some data is deleted, the change will be replicated immediately to your High Availability replica. Thus, it won’t help you avoid data loss, and you still need to have a disaster recovery plan up your sleeve.

Third-party backup software – GitProtect.io backup for GitHub

As always, there is an alternative. GitHub Backup Utilities are good, no doubt. However, to build a reliable enterprise backup strategy we should at least consider other options, for example, third-party backup software for GitHub Enterprise backup and recovery. In general, third-party backup software is common in IT, and solutions dedicated to data backup and recovery are becoming more and more popular, as nobody wants to experience data loss or a failed restore, and everybody wants their data to be available, accessible, and recoverable.

Just look at the GitHub Marketplace and the number and popularity of backup solutions for creating and managing secure backups. Third-party backup software can back up data from your entire environment to multiple locations, which enhances your data protection and makes recovery operations easier. Here you can see a comparison between GitProtect.io and Rewind (formerly BackHub) backup software for continuous data protection.

GitProtect.io backup and Disaster Recovery software for GitHub Enterprise Server will help you minimize the time your IT team spends on backups, as you can automate your backups, meet compliance and security requirements, and reduce the risks related to data restore. Following the backup best practices, you can ensure that you have:

  • a full-data coverage backup (including repositories and metadata),
  • the possibility of keeping your backup copies in several storage destinations to meet the 3-2-1 backup rule, plus replication between storage instances,
  • unlimited retention, so that you can restore your data from any point in time, even from the oldest copies,
  • a monitoring center to make sure that all your backups are performed effectively,
  • advanced security measures, including encryption in-flight and at rest with your own encryption key, data center region of choice, least privileged model, etc.,
  • ransomware protection,
  • restore and Disaster Recovery capabilities, full and granular restore, point-in-time restore, restore to your local machine or a cloud, and cross-over recovery (e.g. from GitHub to GitLab or Bitbucket).

GitHub Enterprise Server Backup Utility vs. GitProtect.io Backup for GitHub

Let me focus now on the details regarding GitHub. Why could we even be interested in choosing a third-party backup software when there is an Enterprise backup solution? Let’s check out some functionality for ourselves:

| Feature | GitProtect Backup & DR software for GitHub Enterprise | GitHub Enterprise Server Backup Utilities |
|---|---|---|
| Backup performance | | |
| Automated scheduled backups | ✔️ you can automatically schedule backups at regular intervals (set an interval as short as 10 minutes to meet the strictest RTO and RPO needs) | ✔️ you can automatically schedule backups at regular intervals using cron(8) or a similar command (GitHub advises an hourly backup interval) |
| Manual backup triggering/backup on demand | ✔️ | ✔️ depends on your storage type and configuration |
| Full backup copy | ✔️ | ✔️ |
| Differential backup copy | ✔️ | |
| Incremental backup copy | ✔️ | ✔️ |
| Multiple backup plans/policies | ✔️ create as many backup policies as your organization requires | single backup.config file |
| Storage type | ✔️ multi-storage support (any S3-compatible cloud, local storage, NAS devices, GitProtect Cloud, etc.) | ✔️ any S3-compatible cloud |
| Multi-storage compatibility | ✔️ you can assign as many storage instances (cloud, local) as you need according to your compliance and organizational needs | you can have a High Availability replica |
| Monitoring and Audit | ✔️ advanced audit logs, Webhooks, Slack, email notifications, data-driven dashboards | audit logs related to Enterprise Server itself; to monitor backup performance you need to run the ghe-backup-progress utility |
| Management | ✔️ user-friendly web GUI or REST API | manual |
| User Access Control for backups | ✔️ you can set different roles and privileges, e.g. a backup viewer, a backup operator, a restore operator, a system administrator | |
| Backup retention policy | ✔️ you can set long-term retention up to unlimited to meet even the strictest compliance regulations | manual |
| Network throttling during backup | ✔️ lets you limit the network bandwidth used by backup operations to avoid slowing down other processes | |
| Compression level customization | ✔️ lets users choose different levels of compression during backups | |
| Backup security | | |
| SAML SSO | ✔️ | ✔️ GitHub Enterprise Server itself has it |
| Encryption | ✔️ encrypts your data in-flight and at rest | |
| Encryption level customization | ✔️ moreover, you can even set up your own encryption key | |
| Zero-knowledge encryption | ✔️ | |
| Data Center region of choice | ✔️ EU/US/AUS/custom to meet your needs and requirements | depends on your storage |
| Ransomware protection | ✔️ | |
| The possibility of meeting the GitHub Shared Responsibility Model | ✔️ | |
| Compliance | ✔️ GitProtect.io is SOC 2 Type II and ISO 27001 compliant among other compliance and security audits | ✔️ GitHub itself is compliant with SOC and ISO standards |
| Restore and Disaster Recovery | | |
| Full data restore | ✔️ | ✔️ restore is possible using the ghe-restore command |
| Point-in-time data restore | ✔️ | manual |
| Granular recovery | ✔️ | |
| Cross-over recovery | ✔️ you can restore your data to GitLab, Bitbucket, Azure DevOps | |
| Disaster Recovery Technology | ✔️ GitProtect.io allows you to foresee any possible disaster scenario | manual |

As you can see, there are some differences that may be important during the decision-making process for choosing a backup and recovery solution.

GitProtect.io’s web user interface allows us to configure the appropriate plans, select storage, schedule regular backups, change schedules, create a backup window, set up automated backups, create differential or incremental backups, or make whatever other change we want with just a few clicks. It is a more convenient and versatile backup solution that requires less technical knowledge. Moreover, it provides several recovery options, like point-in-time restore, cross-over recovery, and granular restore of only the data you need, to ensure data availability and protection.

Conclusion

Let’s also note the basic difference between the two options. GitHub Backup Utilities is a part of GitHub Enterprise Server, whose main purpose is quite different from taking care of backups and disaster recovery. Yes, we do have data backup configuration options, but that is not the main reason for using Enterprise Server.

On the other hand, there is GitProtect.io, which was created precisely to protect workloads and business value, prevent data loss, and take care of backup continuity, long-term data retention, and recovery plans, all through a user-friendly interface. And in this field, it is simply better and offers more possibilities.

Nothing prevents you from using both solutions. For now, all we need to do is decide for ourselves which one is the better choice for us and which we will ultimately rely on for recovery assurance.
