Fortunately, Git has the right tools to recover deleted files. We are going to discuss one of them here: the GIT RESTORE command.
Behold – git restore
The RESTORE command was added in git version 2.23 (August 2019), so it is relatively new. It is nevertheless becoming an increasingly popular option, even though the official documentation still says:
“THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.”
Anyway, it is a nice tool. It lets us unstage changes from the Staging Area or discard local changes in the working tree. When we run the GIT STATUS command, GIT RESTORE is the suggested way to undo changes; in that role it replaces the RESET command that older versions of git suggested. Here you can read more about other ways to restore or remove.
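The two uses mentioned above can be tried out in a throwaway repository. This is only a minimal sketch: the temporary directory, the file name notes.txt and the demo identity are assumptions made for illustration.

```shell
set -e
# Throwaway repository (path, file name and identity are illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Change the file and stage the change.
echo "edited" > notes.txt
git add notes.txt

# Unstage: the index goes back to HEAD, the working tree keeps the edit.
git restore --staged notes.txt
git status --short

# Discard: the working tree copy goes back to the version in the index.
git restore notes.txt
cat notes.txt
```

After the first restore the change is still in the file but no longer staged; after the second one the file matches the last commit again.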
Restore removed file
Today, however, we will focus not on cleaning up the Staging Area but on restoring files that have already been deleted. Let’s consider two cases here:
- restoring a locally deleted file
- restoring a file removed from the external repository
As we know from the aforementioned article, changing the git history is a dangerous action. The first case is easier, because we can still manipulate local history and local commits (those that have not yet been synchronized with the external repository). In the second case we need to be more careful and make sure the commit history is correct before we push our code. Fortunately, GIT RESTORE lets us recover files without changing the history.
Whichever of these situations we are in, to restore a removed file we first need to find out in which commit the file was deleted. We will use the REV-LIST operation for this. Let’s assume that we deleted the README.md file and now want to get it back.
The command git rev-list HEAD -- README.md lists the commits that changed the file, newest first.
The first commit on the list (the newest one) is most likely the one in which our file was deleted, so we are more interested in the next one, in this case 6b2f73.
Once you know the hash of the last commit that still contained the file, just run git restore with the appropriate parameters, i.e. the --source flag, the hash and the filename:
git restore --source 6b2f73 README.md
As a result, our file will be restored in the working tree and marked as Untracked. This is not a problem for us: we can add it and commit it as a new change at any time.
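The whole procedure can be rehearsed end to end in a scratch repository. The sketch below is an illustration under stated assumptions: the temporary directory, the README contents and the demo identity are made up, and taking the second line of the REV-LIST output only works because this toy history has exactly one commit before the deletion.

```shell
set -e
# Throwaway repository (path, contents and identity are illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

# Delete the file and commit the deletion.
git rm -q README.md
git commit -q -m "Remove README"

# Commits that touched README.md, newest first:
# the deletion, then the commit that added the file.
git rev-list HEAD -- README.md

# Take the second entry: the last commit that still contained the file.
last_good=$(git rev-list HEAD -- README.md | sed -n '2p')

# Bring the file back into the working tree without touching history.
git restore --source "$last_good" README.md
git status --short
```

In a real repository you would read the hash off the REV-LIST output yourself instead of relying on its position in the list.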
You could also use the CHECKOUT operation here to switch to a specific commit and manually recover the file you are interested in from there, but that is neither the simplest nor the smartest approach. In addition, checking out an old commit leaves us in a detached HEAD state, where it is easy to make a mistake. Using the GIT RESTORE operation allows us to keep the history and only make a new change that restores our file.
Restore deleted branches
We already know how to restore individual files. But what if we delete a branch? Is it possible to recover the entire branch? Of course it is, and I’ll show you how to do it in a moment. The only difficulty is obtaining the SHA of the right commit on the branch. Usually this is relatively easy, but I can imagine a complicated situation with a branch deleted long ago, which will be difficult to dig up.
We can use the familiar REV-LIST operation to find the SHA of a commit, or we can use the REFLOG tool, which I think is a better idea, because it records where HEAD has pointed, including the commits we made while on the deleted branch. Having the appropriate SHA, we use the CHECKOUT command, and specifically build the following command:
git checkout -b BRANCH SHA, where BRANCH is the name of the deleted branch and SHA is the hash of the commit that the branch pointed to.
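Here is a small rehearsal of recovering a branch through the REFLOG. Treat it as a sketch: the branch name feature, the file names and the demo identity are assumptions, and HEAD@{1} is the right reflog entry only for this exact sequence of commands. In practice you would read the reflog and pick the entry yourself.

```shell
set -e
# Throwaway repository (names and identity are illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base commit"
main_branch=$(git symbolic-ref --short HEAD)

# Do some work on a separate branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Go back and delete the branch, unmerged work included.
git checkout -q "$main_branch"
git branch -q -D feature

# The HEAD reflog still remembers the commit made on the deleted branch.
git reflog

# In this sequence HEAD@{0} is the switch back to the main branch
# and HEAD@{1} is the "Feature work" commit.
sha=$(git rev-parse 'HEAD@{1}')

# Recreate the branch at that commit and switch to it.
git checkout -q -b feature "$sha"
```

The recreated branch points at the same commit as before, so the work done on it is back in the working tree.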
Let’s consider how it is possible to recover data in git at all after we have deleted it. The GIT RESTORE operation should not raise any doubts: git keeps the full history. Once a given file has existed in the repository, it still exists after its removal, saved in previous commits. We simply recover its contents. But what happens when we delete a branch? What is a branch, anyway? By itself it does not store any content; the commits do. A branch is just a pointer to a commit, and the only information it contains is the SHA of the commit it points to. This is why branches in git are so fast and so “light”.
But this also has consequences. When we delete a branch, i.e. a pointer, we don’t delete the commit it was pointing to. So this commit and the changes from that branch are still in our repository. Such commits are called “orphaned commits”. They exist, but no branch points to them. They are alone, unrelated to anything, and in practice invisible. Once we delete a branch, we will not see these commits either with the LOG function or in the browser, when we open our repository on GitHub or Bitbucket.
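This behaviour is easy to observe locally. The sketch below (branch name, file names and demo identity are illustrative assumptions) deletes a branch and then checks that its commit no longer shows up in the log of any branch, yet can still be read when addressed by its SHA.

```shell
set -e
# Throwaway repository (names and identity are illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base commit"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "secret" > secret.txt
git add secret.txt
git commit -q -m "Add secret"

# Remember the commit the branch points to, then delete the branch.
sha=$(git rev-parse feature)
git checkout -q "$main_branch"
git branch -q -D feature

# No branch leads to the commit any more, so the log does not list it.
if git log --all --format=%H | grep -q "$sha"; then
  visible="yes"
else
  visible="no"
fi
echo "listed by git log --all: $visible"

# The object itself is still stored and readable by its SHA.
kind=$(git cat-file -t "$sha")
echo "object type: $kind"
git show --stat "$sha"
```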
The only way to view or recover them is to know their SHA. Now let’s look at it from a security perspective. We’ve deleted a branch with critical data and we think we’re safe. Well, not really. These files could still be recovered by a criminal or hacker.
In the IT world, you may have come across the term Garbage Collector (GC). It is a mechanism that cleans up, for example, unused objects or files. Such a mechanism exists in the Java Virtual Machine, among others, but it also exists in git, which many people don’t even realize. By default, reflog entries are kept for 90 days (30 days if they are no longer reachable from the current branch tip), and objects that nothing references are pruned once they are older than two weeks. The Garbage Collector gets rid of a commit only if nothing points to it. It may happen that we still have reflog entries, or that some other branch points to our commit, etc. Only completely “useless” items will be removed. GC also optimizes the repository, for example by packing objects, which reduces disk usage.
Knowing about data recovery, deletion (previous article) and GC, we can see both great opportunities and great risks here. We must always bear in mind that we may do something we did not intend, and that criminals may act to harm us. We should always be prepared for this and make regular backups.
If critical data has been introduced to the repository and then deleted, we should also make sure that GC cooperates with us. Appropriate configuration and the ability to start the GC manually help us maximize the security of our data, but they will never give us 100% certainty. A suitable backup tool can prove to be an indispensable ally when we need to restore removed files or data.
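Starting the GC by hand could look like the sketch below. It is an illustration only: the repository, branch and file names are made up, and the two cleanup commands are destructive, since after them the orphaned commit can no longer be recovered from this clone. The retention periods can also be tuned through the gc.reflogExpire, gc.reflogExpireUnreachable and gc.pruneExpire settings.

```shell
set -e
# Throwaway repository (names and identity are illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base commit"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "secret" > secret.txt
git add secret.txt
git commit -q -m "Add secret"
sha=$(git rev-parse feature)

git checkout -q "$main_branch"
git branch -q -D feature

# Step 1: drop all reflog entries, so nothing references the commit.
git reflog expire --expire=now --all

# Step 2: run the garbage collector and prune unreachable objects
# immediately instead of waiting for the default grace period.
git gc -q --prune=now

# The orphaned commit is no longer stored in this clone.
if git cat-file -e "$sha" 2>/dev/null; then
  result="still present"
else
  result="gone"
fi
echo "$result"
```

Keep in mind that this cleans only the local clone. Copies that were already pushed to an external repository or fetched by other people are not affected.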