The world is constantly changing, and nowadays it does so at a very fast pace. My parents, when they were my age, didn’t have a color TV at home, the phone was only a landline, and something like the Internet existed only in sci-fi books. One generation has passed and we have smart homes and Boston Dynamics robots, each of us carries a powerful networked computer in our pocket, and humans are getting ready to land on Mars. Yes, change and development define our times.

Of course, it’s the same in the IT world. Here the changes are even faster than elsewhere. In 1954, the FORTRAN programming language was created; code written in it was compiled into a machine-readable form. It was a milestone in software development. Initially, programs were created on punched cards or paper tapes. It was not until the late 1960s that data storage devices and computer terminals appeared.

The development of the Internet made it possible for many programmers to share work on one program. The first version control systems began to emerge. From then on, the source code no longer had to live only on the developers’ physical devices; it could be kept on an external server to which anyone could connect and quickly and easily download the code or add new changes. It was another, now somewhat forgotten, breakthrough in software development. Thanks to this solution, practically anyone with a computer and an Internet connection could be a programmer. From any place in the world!

Currently, working with version control systems is the bread and butter of every developer. Git is the undisputed king here. It’s very fast, its basics are easy to learn, and it is free. Most often we come across web services that offer Git hosting plus many additional options. The most popular services are GitHub, GitLab, and Bitbucket. If someone starts to learn programming today, they will most likely use one of them from the beginning.

What is SVN?

This abbreviation stands for Subversion, an open-source version control system whose development started in 2000. Its version 1.0, however, was released in February 2004. Currently, the newest version is 1.14 from May 2020, so as you can see, it is still supported and in use. However, the best years of this system are long over, and that is very clearly visible: Stack Overflow’s 2021 Developer Survey shows that nearly 95% of developers use Git. So let’s check how these two systems differ from each other and where the overwhelming advantage of one of them comes from.

Git vs. SVN

architecture
  • Git: distributed; every clone acts as both server and client.
  • SVN: centralized; requires a single point of sync.

cloning
  • Git: possible for anyone (only permissions can block git clone).
  • SVN: no cloning option. The checkout function copies the files from the SVN repository to the local working copy.

branching
  • Git: branches are references to a specific commit. Easy to switch.
  • SVN: branches are directories. All are public. Switching is time-consuming (reloading the project).

access control
  • Git: every user has the same write/read access (by default).
  • SVN: path-based user permissions. Easier to keep the access control.

offline work
  • Git: possible, because we have a copy of the repo on our local machine. After getting back online we only need to sync the changes.
  • SVN: not possible, because every change or action needs to go through the central server.

popularity
  • Git: the Stack Overflow 2021 survey shows that 94% of professional programmers use Git. GitHub, GitLab, and Bitbucket have more than 100 million registered users combined. 73% of repositories are Git-based*.
  • SVN: some popular projects like PuTTY or WordPress use SVN. 23% of repositories are SVN-based*.

* according to OpenHub.

SVN to Git migration

Suppose we have already made a decision to migrate. Now we have to ask ourselves how to do it correctly. But before we get into that, one more important thing needs to be established. When moving from one VCS to another, we have two strategies to choose from:

  • create a mirrored repository and maintain both repositories for some time,
  • transfer the entire development process to the new system and stop using the old one.


If we have the option of using the second strategy, it is definitely better. In this way, we avoid double maintenance and thus minimize the risk. So a one-time migration seems to be the preferred and more secure way.

First things first. Before we start migrating our repository, we need to prepare a few things. First of all, we need to have Git and SVN installed on our machine. And of course we need our source: a full local copy of the entire SVN repository. Make sure to download the current version of the main directory of our project. Sample code (here we check out Subversion’s own repository):

svn co https://svn.apache.org/repos/asf/subversion/trunk

Another very important step is getting the list of authors, namely all SVN committers. This will be necessary for the next steps. This part of the migration is tricky and can be different for each company. The problem is that Subversion stores author data differently than Git, which requires the user’s name and email address. Suppose our developer is John Doe and his login is jodoe. How will the two systems store this information?

SVN: jodoe = jodoe <jodoe>
Git: jodoe = John Doe <john-doe@domain.com>

There are several ways to automatically get a list of all authors from our repo. Below you will find a popular script that will make such a list for us. Remember to run it in the main directory of our repository:

svn log -q | 
awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | 
sort -u > authors-transform.txt

Having such a list, we need to create an account for each developer, for example on GitHub, and then map this list to match the Git engine format. This can be tedious work, but necessary for the migration to be fully successful.
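As a rough sketch of that mapping step, assume we keep the real names and email addresses in a small CSV file. The file name users.csv, its login,name,email layout, and the output name authors-git.txt are my own assumptions for illustration, and the sample data is made up. With that in place, awk can rewrite the generated list for us. Logins missing from the CSV are left untouched and reported on stderr, so we can fix them by hand:

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the sample files do not touch a real project.
workdir="$(mktemp -d)"
cd "$workdir"

# Sample output of the "svn log" pipeline (made-up logins).
cat > authors-transform.txt <<'EOF'
jodoe = jodoe <jodoe>
asmith = asmith <asmith>
EOF

# Hypothetical lookup table: login,full name,email
cat > users.csv <<'EOF'
jodoe,John Doe,john-doe@domain.com
EOF

# map_authors LOOKUP_CSV AUTHORS_FILE
# Prints the authors file with known logins expanded to "Name <email>".
map_authors() {
  awk -F',' '
    # First file: remember name and email for every login.
    NR == FNR { name[$1] = $2; mail[$1] = $3; next }
    # Second file: rewrite lines whose login is known, keep the rest as-is.
    {
      split($0, parts, " = ")
      login = parts[1]
      if (login in name) {
        print login " = " name[login] " <" mail[login] ">"
      } else {
        print $0
        print "unmapped: " login > "/dev/stderr"
      }
    }
  ' "$1" "$2"
}

map_authors users.csv authors-transform.txt > authors-git.txt
cat authors-git.txt
```

If the result looks right, authors-git.txt is the file to pass to the -A option of git svn clone (or it can simply replace authors-transform.txt).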

Now, the thing is simple but also time-consuming. We can either use the tool built into Git or use third-party software to perform the migration. The latter option can be beneficial when you decide to maintain both repositories after the migration. However, today I am focusing on a one-time migration, and the built-in tools are enough for this. All we have to do is run the command below and wait. Important notice! Depending on the size of the repository, it may take up to several hours:

git svn clone <SVN_URL> --no-metadata -A authors-transform.txt --stdlayout <TARGET_DIR>

Here I refer you to the documentation of the git svn command, because its parameters are so numerous and complex that they would require a separate article. I will only mention the --stdlayout flag. We can use it if the project has the standard directory layout (trunk, branches, tags). However, if we have something non-standard, e.g. a /trunk/root directory instead of /trunk, then we cannot use it and we have to enter the paths manually with the --trunk, --branches, and --tags flags.
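For a non-standard layout, the call could look roughly like the sketch below. The URL, the target directory, and the trunk/root, branches, and tags paths are placeholders I made up for illustration, so check the real layout of your repository first. By default the script only prints the command, so that we can review it before starting a clone that may run for hours:

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): replace them with your own.
SVN_URL="${SVN_URL:-https://svn.example.com/repos/project}"
TARGET_DIR="${TARGET_DIR:-project-git}"

clone_custom_layout() {
  # Build the argument list. The layout paths are relative to SVN_URL.
  set -- git svn clone "$SVN_URL" \
    --no-metadata \
    -A authors-transform.txt \
    --trunk=trunk/root \
    --branches=branches \
    --tags=tags \
    "$TARGET_DIR"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # only show the command
  else
    "$@"                 # really run the clone
  fi
}

clone_custom_layout
```

Running it with DRY_RUN=0 starts the actual clone.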

.gitignore

As you know, ignoring some elements in our version control system is very useful and even necessary. If our SVN repository uses svn:ignore properties (and I assume it does), then we have no problem, because we just need to execute a simple command that converts them into the matching Git format.

git svn show-ignore > .gitignore

If we do not have such a file, unfortunately, we have to create our own .gitignore file. Fortunately, there are many ready-made templates, or we can use plugins that will create such a file for us based on the content of the project. Either way, after this step is complete, we should commit to keep the new file.

GitHub SVN migration

Theoretically, we could already push our project to an external repository, e.g. on GitHub, at this point, but a different approach would be safer. We should create a new, bare repository first. What does the --bare flag mean? In short, it omits the working directory. So if we are creating a new shared repository, we definitely want to use this option. More on this can be found in this fairly old but still current post.

So what are the next steps? We create a new repository, add symbolic-ref, and then push our created copy of the SVN repo to the newly created Git repo. When we do this, we are getting closer and closer to the goal. The next step is to “replace” our trunk with the main branch master / main or whatever we want it to be called. These few steps could look like this.

Migration from SVN to GitHub
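A minimal sketch of these steps is shown below. It assumes that git svn placed its refs under refs/remotes/origin/ (the default prefix in current Git versions) and that the paths svn-clone and project.git, the remote name bare, and the final branch name main are placeholders of my choosing. By default it only prints the commands; set DRY_RUN=0 to execute them:

```shell
#!/bin/sh
set -eu

# Placeholder paths (assumptions): adjust them to your setup.
SVN_CLONE="${SVN_CLONE:-./svn-clone}"    # result of "git svn clone"
BARE_REPO="${BARE_REPO:-./project.git}"  # new bare repository

# Print each command by default; run it for real when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

migrate_to_bare() {
  # 1. Create the bare repository and point its HEAD at the future trunk branch.
  run git init --bare "$BARE_REPO"
  run git --git-dir="$BARE_REPO" symbolic-ref HEAD refs/heads/trunk

  # 2. Push every remote-tracking ref of the git-svn clone as a regular branch.
  run git -C "$SVN_CLONE" remote add bare "$BARE_REPO"
  run git -C "$SVN_CLONE" config remote.bare.push 'refs/remotes/origin/*:refs/heads/*'
  run git -C "$SVN_CLONE" push bare

  # 3. "Replace" trunk with the main branch name of our choice.
  run git --git-dir="$BARE_REPO" branch -m trunk main
}

migrate_to_bare
```

With this refspec, SVN tags arrive in the bare repository as branches named tags/<name>, so they still need to be turned into real Git tags afterwards.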

Be aware that by default, downloaded branches and tags are marked as remotes and we don’t have local versions of them. Here we can manually assign (e.g. using a script) local branches, and then check whether the assignment was done correctly and whether our tags and branches were actually rewritten.
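One possible way to script this in the local git-svn clone is sketched below. The helper name plan_local_refs is my own, and the sketch assumes the refs/remotes/origin/ prefix and the standard tags/ sub-path. It does not change anything by itself: it reads ref names and prints the git commands that would create the local branches and tags, so we can review the plan before applying it. The ref names in the demo are made up:

```shell
#!/bin/sh
set -eu

# Reads remote-tracking ref names on stdin and prints the git commands
# that would create matching local branches and tags.
plan_local_refs() {
  prefix="refs/remotes/origin/"   # assumed git-svn prefix
  while IFS= read -r ref; do
    name="${ref#"$prefix"}"
    case "$name" in
      trunk)  ;;  # skipped: git svn normally creates a local branch for trunk already
      tags/*) printf 'git tag %s %s\n' "${name#tags/}" "$ref" ;;
      *)      printf 'git branch %s %s\n' "$name" "$ref" ;;
    esac
  done
}

# Demo with made-up ref names.
plan_local_refs <<'EOF'
refs/remotes/origin/trunk
refs/remotes/origin/feature-x
refs/remotes/origin/tags/v1.0
EOF
```

In a real clone the input would come from git for-each-ref --format='%(refname)' refs/remotes/origin, and the reviewed output can be piped to sh. Afterwards, git branch and git tag show whether everything was rewritten as expected.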

After this stage, the only thing left to do is to push our code from the local copy to an external service like GitHub or any other. I recommend that you read about the differences between the selected services in these posts – Git Battles: GitHub vs. Bitbucket, GitHub vs. GitLab, and GitLab vs. Bitbucket.

Benefits of the migration to Git from SVN

This is how an SVN to Git migration works, with history, tags, and everything else needed to continue our development on the new VCS. At the beginning of this article, I briefly showed the differences between the two systems. So what benefits will such a migration bring us? First of all, the market has verified these systems, which can be seen in the popularity of the technologies themselves and in the fact that many projects are being moved to Git. Personally, I see the greatest benefit in how branching works, for example in the ease and speed of creating hotfixes. Another great advantage is how pull requests and the code review process work on Git hosting services. In my experience, code quality and pipeline stability are clearly better with Git.

In addition, there is the foundation of this engine: speed. It seems like a small thing, but across an entire company it can save quite a lot of time. As a developer, I will add my own opinion: working with Git is simply nicer, and I feel more in control of the workflow. A happy programmer is an effective programmer, and this should matter to everyone who is in charge.

There is another benefit of such migration, which is the ability to use modern backup automation tools, such as GitProtect. There is no need to talk about the importance of this topic. And the moment of performing the migration is a very good moment to take care of the backup and restore plan from the very beginning. Forget about the past, let’s jump into the future. And the future is Git.
