Managing Git Projects: Git Subtree vs. Submodule
Last Updated on January 2, 2025
While working on a project, it’s common to combine it with another one, especially when you collaborate with other people. It might be a library built by other developers, or a part of the project that is developed independently and then reused in several projects. In such cases, you want to keep the two projects distinct yet still be able to use one inside the other. This post explains how to manage such projects with Git subtree and Git submodule. We will show you the key differences, so you can decide which choice is the best for you.
What is a git submodule?
A Git submodule is, to put it simply, a separate repository nested inside another repository. The parent repository does not track the submodule’s files. Instead, it records a pointer to one specific commit of the submodule, and that pointer must be updated manually whenever you want a newer version. Because the submodule remains a repository in its own right, a team can keep working on it independently of the parent project.
Why should I use submodules?
When you use submodules, you don’t copy or integrate any of the actual code into your repository. It’s more accurate to say that you include links to the external repository (for example, one hosted on GitHub). Each of these pointers leads to a specific submodule commit in that other repository.
Git submodules enable you to preserve one git repository as a sub directory of another. Also, Git submodules allow you to include and track the version history of external code in your Git repository.
Submodules are part of core Git and allow Git repositories to be nested within other, separate repositories. To be exact, a submodule entry in the parent corresponds to one specific commit of the child repository.
Scenarios where submodules are useful
To manage the versioning of external dependencies for a project, you can use the Git submodules feature. For example, here are some scenarios in which you can use git submodules:
- You can lock the code to a specific commit for your own safety when an external component or subproject is changing too quickly or forthcoming modifications would break the API.
- When you wish to track a vendor dependency for a component that isn’t updated very often.
- When you delegate a project component to a third party and wish to include their work at a certain time or release. When the changes aren’t too frequent, this method works well.
How to use git submodules?
First, add a new submodule with the git submodule add command, which takes the repository URL and a target path and saves both in a file called .gitmodules.
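As a minimal sketch of this first step, the script below builds two throwaway repositories in a temporary directory and adds one to the other as a submodule. The names lib, parent, and vendor/lib are made up for the demo, and the protocol.file.allow setting is only there because the demo uses a local path instead of a real URL (recent Git versions block local-path submodule clones by default).

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Hypothetical external project that will become the submodule.
git init -q lib
echo "library code" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial library commit"

# Parent project that embeds the library at vendor/lib.
git init -q parent
cd parent
# protocol.file.allow=always is needed only because the demo URL is a local path.
git -c protocol.file.allow=always submodule add "$work/lib" vendor/lib
git commit -q -m "Add lib as a submodule"

cat .gitmodules              # path and url of the submodule
git ls-tree HEAD vendor/lib  # mode 160000: a pointer to one commit of lib
```

The last command shows what the parent actually stores: not the library’s files, but a single entry that names one commit of the library.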
To clone a repository together with its submodules, use:
git clone --recursive <URL to Git repo>
If you’ve previously cloned a repository and wish to load its submodules, use:
git submodule update --init
If there are nested submodules, do the following:
git submodule update --init --recursive
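To see what these commands do, here is a small sketch that clones a parent repository without its submodules and then loads them afterwards. The repositories and the names lib, parent, plain-clone, and vendor/lib are invented for the demo. The protocol.file.allow setting is an assumption tied to the local-path URLs used here and should not be needed with ordinary remote URLs.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Demo setup: a library and a parent that uses it as a submodule.
git init -q lib
echo "library code" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial library commit"

git init -q parent
git -C parent -c protocol.file.allow=always submodule add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"

# A plain clone creates the submodule directory but leaves it empty.
git clone -q parent plain-clone
before="$(ls -A plain-clone/vendor/lib)"
echo "Files before update: '$before'"

# Load the submodule (and any nested submodules) after the fact.
cd plain-clone
git -c protocol.file.allow=always submodule update --init --recursive
ls -A vendor/lib
```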
Specify the branch a submodule should track using:
git submodule set-branch --branch <branch name> -- <submodule path>
Or switch the branch that is checked out inside the submodule using:
git -C <submodule path> checkout <branch name>
When you make changes within a submodule, commit them inside the submodule first, and then commit the updated submodule pointer in the main repo so that it tracks the submodule’s current state. Also, remember that changes in a submodule are not automatically pushed with the main project. So, you should go to each submodule and push changes individually.
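The two-step workflow described above might look like the following sketch. All repository and branch names (lib, parent, vendor/lib, demo-fix) are hypothetical. The push goes to a side branch only because the demo remote is an ordinary non-bare repository, which refuses pushes to its checked-out branch. With a hosted remote you would typically push to the usual branch.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Demo setup: a library and a parent that uses it as a submodule.
git init -q lib
echo "library code" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial library commit"

git init -q parent
git -C parent -c protocol.file.allow=always submodule add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"

# Step 1: commit and push inside the submodule.
cd parent/vendor/lib
echo "bug fix" >> README
git commit -q -am "Fix a bug in lib"
git push -q origin HEAD:refs/heads/demo-fix

# Step 2: record the new submodule commit in the parent.
cd ../..
git status --short   # vendor/lib is reported as modified
git add vendor/lib
git commit -q -m "Update lib to include the bug fix"
```

If step 2 is skipped, other people who clone the parent still get the old library commit, even though the fix exists in the library’s own repository.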
A great alternative for git submodule – git subtree
Imagine your Git repository as a tree. Within this structure, a git subtree serves as a smaller, manageable version of the main tree. Like submodules, git subtrees allow you to nest one repository inside another as a subdirectory, but the nested project’s files are stored directly in the parent repository, which offers a more seamless and flexible integration. The subtree’s contents can be committed to, branched, and merged together with the rest of the parent repository. This flexibility makes subtrees an excellent alternative to submodules, particularly when you need to incorporate and manage a project within another.
According to Git’s official documentation, when performing a subtree merge, Git recognizes the relationship between the two projects, allowing for intuitive merging and management. This approach is especially beneficial for projects requiring close integration without the overhead of submodule management.
Why consider git subtree?
- The subtree’s contents behave like any other directory in a standard repository.
- It’s easy to use with your main repository because its contents are stored as regular commits.
- The module’s contents can be changed without having to create a separate repository copy of the dependency.
- Users of your current repository do not need to learn anything new to use the git subtree. They can forget the fact that you’re managing dependencies with git subtree.
- Unlike git submodule, git subtree does not create new metadata files (i.e., .gitmodules).
How to use git subtree?
To add a new subtree to a parent repository, first register the external repository as a remote, then run the subtree add command. The two commands look like this:
git remote add remote-name <URL to Git repo>
git subtree add --prefix=folder/ remote-name subtree-branch-name
After these commands, the commit history of the whole child project is merged into the parent repository. To import the child project as a single commit instead, add the --squash option.
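Here is a minimal sketch of the two commands in action, using throwaway local repositories. The names lib, parent, lib-remote, and vendor/lib are invented for the demo, and the sketch assumes your Git installation includes the git subtree command (some minimal distributions package it separately).

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Hypothetical external project.
git init -q lib
echo "library code" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial library commit"
# The default branch is main or master, depending on the Git configuration.
branch="$(git -C lib symbolic-ref --short HEAD)"

# Parent project; the demo gives it one commit before adding the subtree.
git init -q parent
cd parent
git commit -q --allow-empty -m "Initial parent commit"

git remote add lib-remote "$work/lib"
git subtree add --prefix=vendor/lib lib-remote "$branch"

ls vendor/lib      # the library's files are now regular files in the parent
git log --oneline  # the library's commits appear in the parent's history
```

Note that no .gitmodules file is created, and anyone who clones the parent gets the library’s files without running extra commands.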
Changes to and from the subtree are pushed and pulled using:
git subtree push --prefix=folder/ remote-name subtree-branch-name
git subtree pull --prefix=folder/ remote-name subtree-branch-name
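The round trip can be sketched as follows, again with throwaway repositories. The names lib, parent, lib-remote, vendor/lib, and demo-contrib are hypothetical. The push targets a side branch only because the demo remote is a non-bare repository, and the sketch assumes the git subtree command is available in your installation.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Demo setup: a library and a parent that embeds it as a subtree.
git init -q lib
echo "library code" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial library commit"
branch="$(git -C lib symbolic-ref --short HEAD)"

git init -q parent
cd parent
git commit -q --allow-empty -m "Initial parent commit"
git remote add lib-remote "$work/lib"
git subtree add --prefix=vendor/lib lib-remote "$branch"

# The library moves on upstream...
echo "upstream change" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "Upstream change"

# ...and the parent pulls that change into its subtree.
git subtree pull --prefix=vendor/lib -m "Merge upstream lib changes" lib-remote "$branch"

# A change made inside the parent's copy of the library...
echo "local note" > vendor/lib/NOTES
git add vendor/lib/NOTES
git commit -q -m "Add NOTES to lib"

# ...is pushed back to the library's repository.
git subtree push --prefix=vendor/lib lib-remote demo-contrib
```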
Best scenarios to use git subtree
Now, let’s address some situations where the use of git subtrees is applicable and beneficial. To start off, take a look at managing heavy dependencies (i.e. external tools or libraries) – git subtree allows you to integrate the entire history of a dependency right into the parent repo. In this case, the user does not need any extra tools or specialized knowledge to be able to work with a dependency and it actually functions just like any other part of the repository. Subtrees can prove useful for projects with complex dependencies that require simple integration into the main codebase. Using subtrees takes away the need for separate .gitmodules files, which makes it easier for users to manage.
Another case is the handling of frequent updates. When using subtrees, you can pull and push updates to a dependency as part of the regular commit workflow. This way you can further simplify development processes. How? Well, changes to dependencies and the parent project can be synchronized and therefore, there is no need to manage any external links or metadata files (which often complicates the use of submodules). Subtrees can be rather useful for teams that frequently update libraries or modules.
Git subtree vs submodule
As you can see, both options can be used for different things. Now let’s put these two directly against each other to help you find the right option for your needs.
Similarities
External git repositories can be incorporated into other git repositories using git submodule and git subtree. Both techniques allow you to link a specific version of an external component to the local repository and bundle them. Both tools keep tracking the external repository’s history, enabling you to check out previous commits.
So, git submodule or git subtree?
Submodules have been around for a long time, and have their own command (git submodule) and extensive documentation. Compared with adding a subtree, adding a submodule is fairly straightforward. However, many of its hazards and flaws only surface later on, which can be annoying.
Git submodule is sometimes the best option. This is especially true if your codebase is big and you don’t want to download all of it every time, as is the case with many existing codebases. Submodules then make it easier to collaborate with users who have no need to download complete blocks of code. Because the submodule’s code is shared by all the container projects that use it, you should aim to keep it independent of the details of any single container.
As for git subtree – it integrates the external repository’s history directly into the main project. As a result, you get a unified commit history, unlike the git submodule which keeps separate histories. However, in terms of impact on the main repo’s size, subtrees add the actual code of the external repository and potentially increase the size. In contrast, the git submodule has minimal impact.
Git subtree also allows updates and integration of changes from external code. This is beneficial for projects that require frequent and two-way integration. Now, this makes them a good pick for managing large and/or complex repos. Why? Because they streamline the process of adding external changes without the complexity that is typically associated with the git submodule.
Shortcomings of git submodules
- Cloning a repository that contains submodules requires downloading the submodules separately. After a plain git clone, the submodule folders are empty until you initialize and update them, and they stay empty if the submodule’s source repository has been moved or becomes unavailable.
- Other major disadvantages of Git submodules include locking to a certain version of an external repository, a lack of good merge management, and the general impression that Git itself is unaware that the repository has become a multi-module repository.
Shortcomings of git subtrees
The challenges include:
- A new merging approach must be learned.
- It’s a little more difficult to contribute code for the sub-projects upstream.
- You must be sure that super and sub-project code is not mixed in new commits.
Summary
Each tool has advantages and disadvantages. Here are some aspects to consider when you decide which one is ideal for you.
- Component-based development favors Git submodules, whereas system-based development favors Git subtrees.
- Git submodules have a smaller repository size since they are just links to a single commit in a subproject; whereas Git subtree lets you store the whole subproject, including its history.
- Subtrees are decentralized: the subproject’s code lives in the parent repository itself, whereas a submodule’s repository must remain accessible on a server.
A Git subtree isn’t the same thing as a Git submodule, and there are certain restrictions on when and how each of them can be used. If you’re going to push code back to a third-party repository, consider a Git submodule, since pushing is easier that way. Use a Git subtree if you have third-party code that you probably won’t push, since it is easier to pull.
Frequently Asked Questions:
What is the difference between Git main module and Git submodule?
The main module is the primary (parent) repository for a project which contains all the data. In terms of version history, it is managed independently and it tracks its commits along with branches. As for submodules, they are essentially repositories within a parent repository, at a specific path in the parent repository’s working directory.
The main difference between them is the purpose they serve. Using submodules permits cloning another repo into your project while keeping your commits separate.
When should we use Git LFS?
Git LFS is typically needed when a repository contains large files, because the whole repository history is transferred to the client during cloning. The problem is most pronounced for large files that are modified regularly, since many different versions must be downloaded, which can be time-consuming.
By using Git LFS, you guarantee that these larger files will not be downloaded during cloning or fetching but during the checkout process. This way the effect of such large files on your repo is reduced. That is achieved by replacing the large files with smaller pointer files that can be used to map the locations of files later.
Why do we need multiple branches in Git?
The need for multiple branches in Git derives from project development and related security concerns. The main idea is to have another line of development that is completely isolated from the main branch. Different branches make it possible to work on several modules simultaneously without affecting the main branch. If your development team works from different remote locations, this can simplify the workflow.
Also, while developing or reworking features or fixing bugs, you can do the work in separate branches and simply merge into your main branch once the job is finished. Without branching, things get more complicated. If you develop one feature on your main branch, then a second feature on that same branch, and later need to get rid of the first feature, you will have to delete parts of the code and adjust the rest carefully to make sure there are no glitches.
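A minimal sketch of that workflow is shown below. The repository, file, and branch names (project, app.txt, login.txt, feature-login) are made up for illustration.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

git init -q project
cd project
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
# The default branch is main or master, depending on the Git configuration.
main_branch="$(git symbolic-ref --short HEAD)"

# Develop the feature in isolation on its own branch.
git checkout -q -b feature-login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login feature"

# The main branch is untouched until the feature is merged.
git checkout -q "$main_branch"
ls
git merge -q --no-ff -m "Merge feature-login" feature-login
ls
```

The first ls shows only app.txt, because the feature still lives on its own branch. After the merge, login.txt is part of the main branch as well.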
What is the minimum version of git submodule?
The minimum version of git that supports submodules is 1.5.3.