Git is the most popular version control system today. It is a completely free, open-source tool that lets you, among other things, collaborate on source code. Linus Torvalds created Git while working on the Linux kernel; the first version was released in 2005, and it has been gaining popularity ever since. According to Stack Overflow’s 2018 survey, as many as 87.2% of programmers use Git. The 2020 survey did not include that question, but it did ask about “Collaboration Tools”, and 82.8% of respondents indicated GitHub. GitHub is only one of several popular services built on Git, so Git’s overall popularity is likely even higher.
Many popular open-source projects use Git as well. It is enough to mention the raster graphics editor GIMP, the Perl programming language, the Ruby on Rails framework, or the jQuery library. Anyone can contribute to these projects using Git.
What is a Git clone?
To work with Git, whether on open-source, commercial, or our own projects, we need a copy of the repository on our computer. Git is a distributed version control system, which means that each clone is, by default, a full copy of the underlying repository. In an extreme case, for example when the external server fails and there are no backups, we can restore the entire repository from such a copy.
So what is a clone? It is literally a clone: a complete copy of the repository, along with the whole history of changes from the beginning of the project. At the same time, clone is also the name of the Git command that makes this copy. Importantly, this is a one-time operation: once the repository has been cloned, we no longer need the command during further work on it.
We already know that clone makes a local copy of the entire repository, but there still needs to be some external synchronization point. This is the place where everyone publishes their changes and from which they download changes made by others. Thanks to this setup, regardless of how many people work on a project at the same time, each local copy is connected only to this one so-called remote repository and doesn’t need to know anything about the others. The clone command automatically connects our local repository to the remote one, which by default is named origin.
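To see this link in practice, the sketch below builds a throwaway repository in a temporary directory, clones it, and lists the remotes of the clone. The directory names (remote, local) and the demo identity are assumptions made for illustration, and a local path stands in for a real server address.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every path and name below is hypothetical.
work=$(mktemp -d)

# Create a small repository that plays the role of the remote.
git init -q "$work/remote"
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

# Clone it; Git registers the source as a remote named "origin".
git clone -q "$work/remote" "$work/local"

# List the configured remotes with their fetch and push URLs.
git -C "$work/local" remote -v
```

The last command should list a single remote, origin, pointing at the path we cloned from.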
Let’s clone a Git repository
The clone operation, like any other Git command, has a default behavior that can be extended with various parameters. Let’s look at this git clone example:
git clone <repo_address> <directory> --no-tags --filter=blob:none
This command copies a project from the given address into the given directory. In addition, it skips tags (--no-tags) and does not download file contents (blobs) up front (--filter=blob:none); Git fetches them later, on demand. Only the repository address is required; the remaining parameters are optional. And there are quite a few of them! If you are looking for detailed information about the available parameters, please refer to the official documentation: https://git-scm.com/docs/git-clone
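The command can be tried without a network connection. The sketch below is only an illustration under a few assumptions: the repository, file, and tag names are made up, the source repository is served through a file:// URL (a plain local path would ignore the filter), and the source is explicitly configured to accept filters, which a real hosting service may or may not do.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Source repository with one file, one commit and one tag (hypothetical names).
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/src/README"
git -C "$work/src" add README
git -C "$work/src" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "Initial commit"
git -C "$work/src" tag v1.0

# Let the source accept partial-clone filters and on-demand object requests.
git -C "$work/src" config uploadpack.allowFilter true
git -C "$work/src" config uploadpack.allowAnySHA1InWant true

# The command from the article, with the placeholders filled in.
git clone -q --no-tags --filter=blob:none "file://$work/src" "$work/dest"

# Prints nothing: the tag v1.0 was not copied.
git -C "$work/dest" tag

# Shows the filter that the clone recorded for its origin remote.
git -C "$work/dest" config remote.origin.partialclonefilter
```

Note that README is still present in the working directory: the blob it needs is downloaded during checkout, while blobs that only appear in older history would be fetched when they are first used.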
I already mentioned that clone downloads all the data along with the entire history of changes. But how does it treat branches? Remote-tracking branches are created for every branch in the remote repository and all their data is fetched, but a local branch is created and checked out only for the main (default) branch. What does this mean? By default, only the main branch exists locally as a branch we can work on directly. All the others exist only as remote-tracking branches, such as origin/develop. From the perspective of an ordinary user this may not change much, but from the perspective of Git’s local files it changes a lot, because local branches are created from, and track, the “real” ones marked as remote.
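The difference between local and remote-tracking branches is easy to inspect. The following sketch assumes a hypothetical source repository with the branches master, develop and feature; it clones the repository with default settings and then checks out develop, at which point Git creates a local branch from origin/develop.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Source repository with three branches (names are illustrative).
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"
git -C "$work/src" branch develop
git -C "$work/src" branch feature

git clone -q "$work/src" "$work/dest"

# Right after cloning, master is the only local branch.
local_after_clone=$(git -C "$work/dest" for-each-ref --format='%(refname:short)' refs/heads)
echo "local branches: $local_after_clone"

# All three branches are present as remote-tracking branches.
git -C "$work/dest" branch -r

# Checking out develop creates a local branch that tracks origin/develop.
git -C "$work/dest" checkout -q develop
git -C "$work/dest" rev-parse --abbrev-ref "develop@{upstream}"
```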
How to Git clone a specific branch?
One of the parameters of the clone command is --branch (or -b). By default, clone fetches all branches and checks out only the main one. This parameter lets us check out a branch of our choice instead. However, Git still fetches all the other branches, which is not what we want to achieve here.
Imagine a repository that has three branches, with master being the main one. A clone with the --branch develop parameter creates and checks out the develop branch, but what happens to the other two? Check out the pictures below:
As you can see, all the branches were downloaded anyway. Let’s modify our command so that it clones a single branch only. Since Git 1.7.10 (the current version at the time of writing is 2.32.0, released on the 6th of June, 2021), the clone operation has had the --single-branch parameter. What does the documentation tell us about it?
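The same scenario can be reproduced in a terminal. The sketch below is a minimal setup under stated assumptions: the third branch is called feature (a made-up name), and a local throwaway repository plays the role of the remote.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Source repository with three branches; master is the main one.
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"
git -C "$work/src" branch develop
git -C "$work/src" branch feature

# Clone and check out develop instead of the main branch.
git clone -q --branch develop "$work/src" "$work/dest"

# Local branches: develop only, and it is the current branch.
git -C "$work/dest" branch

# Remote-tracking branches: master and feature are listed as well.
git -C "$work/dest" branch -r
```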
“git clone” learned “--single-branch” option to limit cloning to a single branch (surprise!); tags that do not point into the history of the branch are not fetched.
So let’s check this in practice. We will reuse the command from the previous example, but this time we will add the extra parameter. Let’s see its effect.
It worked! This time we managed to clone a single branch only. Why would we need this? Sometimes the repository we are working on is very large and we don’t need to download all of its branches. Both to save disk space and to keep things tidy, such a command can be useful.
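A terminal version of this experiment might look as follows. It is a sketch rather than a definitive recipe: the branch name feature and the helper function commit are hypothetical, and the remote is again a local throwaway repository.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Hypothetical helper: create an empty commit in the source repository.
commit() {
    git -C "$work/src" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

# Source repository: master, feature, and develop with a commit of its own.
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
commit "Initial commit"
git -C "$work/src" branch feature
git -C "$work/src" checkout -q -b develop
commit "Work on develop"
git -C "$work/src" checkout -q master

# Clone develop and nothing else.
git clone -q --branch develop --single-branch "$work/src" "$work/dest"

# origin/develop is listed; origin/master and origin/feature are not.
git -C "$work/dest" branch -r

# The fetch configuration is limited to develop, so later fetches stay narrow.
git -C "$work/dest" config --get-all remote.origin.fetch
```

The second command shows why the effect is lasting: the clone stores a fetch rule that mentions only develop, so a later git fetch or git pull will not bring in the other branches either.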
Conclusion on how to clone a repository and how to clone a specific branch
Today we learned how to clone a repository and how to clone a specific branch in Git. This gives us more control over what we download, but it also has consequences. At the beginning, I mentioned that, in an extreme case, a local copy of the repository allows you to restore the project, so each local clone works a bit like a backup of the base repository. The problem appears when the copy contains only a single branch: then we do not have the entire repository, and we must be aware of that. Proper repository backup is important and should never rely on local clones, because the parameters of the clone command can filter out many items, and we can never be sure how our copy differs from the remote repository. I recommend using dedicated backup solutions like GitProtect.io to avoid surprises.