Managing Git Projects: Git Subtree vs. Submodule
Last Updated on October 16, 2024
While working on a project, it is common to combine it with another one, especially when you collaborate with other people. It might be a library built by other developers, or a component developed independently and then reused in several projects. In such cases, you want to keep the two projects distinct while still being able to use one inside the other. This post explains how to manage such projects with Git submodules and Git subtrees, and shows the key differences so you can decide which option suits you best.
What is git submodule – why and how to use it?
Put simply, a Git submodule is a separate repository embedded within another repository. The parent repository records a pointer to one specific commit of the submodule, and that pointer must be updated manually whenever you want to move to a newer version. Because the submodule remains an independent repository, a team can keep working on it in parallel with the parent project.
When you use submodules, you don’t copy any of the actual code into the parent repository. Instead, the parent stores links to the external repository (for example, one hosted on GitHub). Each of these pointers leads to a submodule commit in that other repository.
Git submodules let you keep one Git repository as a subdirectory of another, so you can include external code in your repository and track exactly which version of it you use.
Submodules are part of the core Git distribution, so no extra tooling is needed to nest repositories within other, separate repositories. To be exact, a submodule entry corresponds to one specific commit of the child repository.
To manage the versioning of external dependencies for a project, you can use the Git submodules feature. For example, here are the scenarios in which you can use git submodules:
- When an external component or subproject is changing too quickly, or upcoming modifications would break the API, you can lock the code to a specific commit for your own safety.
- When you wish to track a vendored dependency for a component that isn’t updated very often.
- When you delegate a project component to a third party and wish to include their work at a certain time or release. When the changes aren’t too frequent, this method works well.
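The first scenario, locking a dependency to a known-good commit, can be sketched in a throwaway directory. This is a minimal illustration, not a recommended layout: the repository names (lib, app), the path vendor/lib, and the tag v1.0 are all made up for the demo, and the protocol.file.allow setting is only there because recent Git versions refuse to clone submodules from local paths by default.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical upstream library: a tagged release followed by a breaking change.
git init -q "$work/lib"
cd "$work/lib"
echo "v1" > api.txt && git add api.txt && git commit -q -m "lib: release 1"
git tag v1.0
echo "v2 (breaking)" > api.txt && git commit -q -am "lib: breaking change"

# Parent project: add the library as a submodule, then pin it to the tag.
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C vendor/lib checkout -q v1.0
git add vendor/lib
git commit -q -m "Pin vendor/lib to v1.0"

# The parent records exactly one commit of the library.
pinned=$(git rev-parse HEAD:vendor/lib)
release=$(git -C "$work/lib" rev-parse v1.0)
echo "parent records: $pinned"
echo "v1.0 tag is:    $release"
```

Even though the library has already moved on to a breaking change, the parent keeps using the tagged commit until someone deliberately updates the pointer.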
How to use git submodules?
First, create a new submodule with the git submodule add command, which saves the submodule’s path and URL in a file called .gitmodules.
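As a rough sketch of what that looks like in practice, the script below adds a submodule and then inspects what Git recorded. The repositories and the vendor/lib path are hypothetical, created locally just for the demo; with a real project you would pass the remote URL instead, and the protocol.file.allow override (needed only for local paths on recent Git versions) would not be required.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical external project.
git init -q "$work/lib"
cd "$work/lib"
echo "hello" > README && git add README && git commit -q -m "lib: initial commit"

# Parent project: register the external project under vendor/lib.
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add vendor/lib as a submodule"

# .gitmodules is a plain config file holding the path and URL.
cat .gitmodules
url=$(git config -f .gitmodules submodule.vendor/lib.url)

# In the parent's tree the submodule is a single "commit" entry, not a folder of files.
entry=$(git ls-tree HEAD vendor/lib)
echo "$entry"
```

The ls-tree line starts with mode 160000 and type commit, which is how Git marks a pointer to a commit in another repository.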
To clone a repository together with its submodules, use:
git clone --recursive <URL to Git repo>
If you’ve previously cloned a repository and wish to load its submodules, use:
git submodule update --init
If there are nested submodules, do the following:
git submodule update --init --recursive
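The difference between a plain clone and a recursive one is easy to see in a scratch directory. The sketch below is only an illustration: lib, app, plain, and full are made-up local repositories, and protocol.file.allow is overridden solely because the demo clones submodules from local paths.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library and a parent project that uses it as a submodule.
git init -q "$work/lib"
cd "$work/lib"
echo "hello" > README && git add README && git commit -q -m "lib: initial commit"
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add vendor/lib as a submodule"

# A plain clone creates the submodule directory but leaves it empty.
git clone -q "$work/app" "$work/plain"
before=$(ls -A "$work/plain/vendor/lib")

# Initializing and updating fills it in, including nested submodules.
git -C "$work/plain" -c protocol.file.allow=always submodule --quiet update --init --recursive
after=$(cat "$work/plain/vendor/lib/README")

# Cloning with --recursive performs both steps at once.
git -c protocol.file.allow=always clone -q --recursive "$work/app" "$work/full"
ls "$work/full/vendor/lib"
```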
Specify a branch for a submodule using:
git submodule set-branch --branch <branch name> -- <submodule path>
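To change the branch later, run the same command with a different branch name, or go back to the remote’s default branch using:
git submodule set-branch --default -- <submodule path>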
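Setting a branch only records a preference in .gitmodules; the submodule moves when you run an update with --remote. Here is a small sketch of that sequence, assuming Git 2.22 or newer (where set-branch is available) and using invented names throughout (lib, app, vendor/lib, and a stable branch).

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library with a default branch and a separate "stable" branch.
git init -q "$work/lib"
cd "$work/lib"
echo "1.0" > VERSION && git add VERSION && git commit -q -m "lib: 1.0"
git checkout -q -b stable
echo "1.0.1" > VERSION && git commit -q -am "lib: 1.0.1 on stable"
git checkout -q -            # back to the default branch

# Parent project with the library as a submodule (local path, hence the override).
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add vendor/lib"

# Record the branch in .gitmodules, then move the submodule to that branch's tip.
git submodule set-branch --branch stable -- vendor/lib
git -c protocol.file.allow=always submodule --quiet update --remote vendor/lib
git commit -q -am "Track the stable branch of vendor/lib"

tracked=$(git config -f .gitmodules submodule.vendor/lib.branch)
version=$(cat vendor/lib/VERSION)
echo "branch=$tracked version=$version"
```

Note that the parent still stores a specific commit; the branch setting only tells update --remote where to look for a newer one.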
A great git submodule alternative – git subtree
Imagine your Git repository as a tree. Within this structure, a subtree is a smaller tree that lives in one of its directories. Like a submodule, a subtree nests one repository inside another as a subdirectory. Unlike a submodule, the nested project’s files (and, by default, its history) are copied into the parent repository instead of being referenced by a pointer. The nested code can then be committed to, branched, and merged like any other part of the repository, which makes subtrees a good alternative to submodules when you need to incorporate and manage one project within another. According to Git’s official documentation, the idea of a subtree merge is that one project maps to a subdirectory of the other, and Git is often able to work out that relationship and merge accordingly. This approach is especially useful for projects that need close integration without the overhead of submodule management.
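To make the subtree merge idea concrete, here is a minimal sketch that uses only core Git commands, following the recipe from Git’s own subtree merge how-to. Everything specific in it is an assumption made for the demo: the lib and app repositories, the vendor/lib prefix, and the commit messages. It needs Git 2.9 or newer for --allow-unrelated-histories.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical library and parent project, each with its own history.
git init -q "$work/lib"
cd "$work/lib"
echo "lib v1" > lib.txt && git add lib.txt && git commit -q -m "lib: v1"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on config
git init -q "$work/app"
cd "$work/app"
echo "app" > app.txt && git add app.txt && git commit -q -m "app: initial commit"

# Bring the library in as the subdirectory vendor/lib, keeping its history.
git remote add -f lib "$work/lib"
git merge -q -s ours --no-commit --allow-unrelated-histories "lib/$branch"
git read-tree --prefix=vendor/lib/ -u "lib/$branch"
git commit -q -m "Merge lib into vendor/lib/"

# Later: the library changes, and a subtree merge maps it onto vendor/lib.
cd "$work/lib"
echo "lib v2" > lib.txt && git commit -q -am "lib: v2"
cd "$work/app"
git fetch -q lib
git merge -q -X subtree=vendor/lib -m "Update vendor/lib" "lib/$branch"
cat vendor/lib/lib.txt
```

The git subtree command wraps this kind of merge in a friendlier interface.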
Why consider git subtree?
- The subtree’s files are ordinary tracked files, so all the usual Git commands work on them.
- It’s easy to use alongside your main repository because its content is stored as regular commits.
- The module’s contents can be changed without the need to create a separate repository copy of the dependency.
- Users of your repository do not need to learn anything new to work with a subtree. They can ignore the fact that you’re managing dependencies with git subtree.
- Unlike git submodule, git subtree does not create new metadata files (such as .gitmodules).
How to use git subtree?
Adding a subtree to a parent repository takes two steps: first register the external repository as a remote, then run the subtree add command:
git remote add remote-name <URL to Git repo>
git subtree add --prefix=folder/ remote-name subtree-branch-name
After these commands, the commit history of the whole child project is merged into the parent repository. Add the --squash option if you prefer to import it as a single commit.
Changes to and from the subtree are pushed and pulled using:
git subtree push --prefix=folder/ remote-name subtree-branch-name
git subtree pull --prefix=folder/ remote-name subtree-branch-name
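The whole add, pull, and push cycle can be tried locally. The sketch below is an illustration under several assumptions: git subtree is installed (some distributions package it separately from Git), and the repositories lib-dev, lib.git, and app, the remote name lib, and the prefix vendor/lib are all invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical upstream: a developer clone plus a bare repository that acts as the server.
git init -q "$work/lib-dev"
cd "$work/lib-dev"
echo "lib v1" > lib.txt && git add lib.txt && git commit -q -m "lib: v1"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on config
git clone -q --bare "$work/lib-dev" "$work/lib.git"

# Parent project: register the remote, then add it as a subtree.
git init -q "$work/app"
cd "$work/app"
echo "app" > app.txt && git add app.txt && git commit -q -m "app: initial commit"
git remote add lib "$work/lib.git"
git subtree add --prefix=vendor/lib lib "$branch"

# Upstream moves on; pull the change into the subtree.
cd "$work/lib-dev"
echo "lib v2" > lib.txt && git commit -q -am "lib: v2"
git push -q "$work/lib.git" "$branch"
cd "$work/app"
git subtree pull --prefix=vendor/lib -m "Pull lib v2" lib "$branch"

# Change the subtree inside the parent project and push it back upstream.
echo "patched in app" >> vendor/lib/lib.txt
git commit -q -am "vendor/lib: local patch"
git subtree push --prefix=vendor/lib lib "$branch"

upstream=$(git --git-dir="$work/lib.git" show "$branch:lib.txt")
echo "$upstream"
```

Because no --squash was used, the library’s own commits show up in the parent’s git log alongside the parent’s commits.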
Git subtree vs submodule
Similarities
External git repositories can be incorporated into other git repositories using git submodule and git subtree. Both techniques allow you to link a specific version of an external component to the local repository and bundle them. Both tools keep tracking the external repository’s history, enabling you to check out previous commits.
Submodules or subtrees?
Submodules have been around for a long time and have their own command (git submodule) and extensive documentation. Compared with adding a subtree, adding a submodule is fairly straightforward. The hazards and flaws tend not to show up until later, which can be annoying.
Submodules are sometimes the best option. This is especially true if the codebase is big and you don’t want everyone to download all of it every time, which is the situation many existing codebases are in. Submodules then make it easy to collaborate with users who have no need to download complete blocks of code. Because the submodule’s code is shared by every container project that includes it, you should aim to keep it independent of the details of any particular container.
Shortcomings of git submodules
- Cloning a repository that contains submodules requires downloading the submodules separately. The submodule folders stay empty after a plain clone until they are initialized, and they cannot be populated at all if the source repository has moved or become unavailable.
- Other major disadvantages of Git submodules include being locked to a certain version of the external repository, a lack of good merge management, and the fact that the parent Git repository is largely unaware that it has become a multi-module repository.
Shortcomings of git subtrees
- A new merging approach must be learned.
- It’s a little more difficult to contribute code for the sub-projects upstream.
- You must make sure that superproject and subproject code are not mixed in the same commit.
Summary
Each tool has advantages and disadvantages. Here are some aspects to consider when you decide which one is ideal for you.
- Component-based development favors Git submodules, whereas system-based development favors Git subtrees.
- Git submodules keep the parent repository smaller, since each one is just a link to a single commit in a subproject, whereas Git subtrees store the whole subproject, including its history.
- Subtrees are decentralized because the parent repository carries the code itself, while the repositories behind Git submodules must remain accessible on a server.
A Git subtree isn’t the same thing as a Git submodule, and each comes with restrictions on when and how it can be used. If you’re going to push code back to the external repository, consider a Git submodule, since pushing is more straightforward. If you have third-party code that you are unlikely to push to, use a Git subtree, since it is easier to pull.
Frequently Asked Questions:
What is the difference between Git main module and Git submodule?
The main module is the primary (parent) repository for a project, which contains all of its data. Its version history is managed independently, and it tracks its own commits and branches. Submodules, by contrast, are repositories placed inside a parent repository at a specific path in the parent’s working directory.
The main difference between them is the purpose they serve. Using submodules permits cloning another repo into your project while keeping your commits separate.
When should we use Git LFS?
Git LFS is typically needed when a repository contains large files, because the repository history is transferred to the client during cloning. The problem is most noticeable with large files that are modified regularly, since many different versions must be downloaded, and that can be time-consuming.
By using Git LFS, you ensure that these large files are downloaded during checkout instead of during cloning or fetching. This reduces the effect of large files on your repo. It is achieved by replacing the large files with small pointer files that map to the location of the actual content.
Why do we need multiple branches in Git?
The need for multiple branches in Git comes from the way projects are developed and from the wish to keep the main line safe. The main idea is to have another line of development that is completely isolated from the main (master) branch. Separate branches make it possible to work on several modules at the same time without affecting the main branch. If your development team works from different remote locations, this can simplify the workflow.
Also, while developing or reworking features or fixing bugs, you can do the work in separate branches and simply merge into your master branch once it is finished. Without branching this is more complicated. If you develop one feature on your main branch, then a second feature on that same branch, and later need to get rid of the first feature, you will have to delete parts of the code and adjust the rest to make sure nothing breaks.
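A tiny example shows the isolation in action. The project, the branch name feature-login, and the file names below are made up for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/project"
cd "$work/project"
echo "stable" > app.txt && git add app.txt && git commit -q -m "Initial commit"
main=$(git rev-parse --abbrev-ref HEAD)     # master or main, depending on config

# Develop the feature on its own branch.
git checkout -q -b feature-login
echo "login form" > login.txt && git add login.txt && git commit -q -m "Add login feature"

# The main branch is untouched until the feature is merged.
git checkout -q "$main"
before=$(ls)
git merge -q --no-ff -m "Merge feature-login" feature-login
after=$(ls)
echo "before merge: $before"
echo "after merge:  $after"
```

If the feature had to be abandoned, deleting the branch would leave the main branch exactly as it was.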
What is the minimum Git version that supports submodules?
The minimum version of git that supports submodules is 1.5.3.