Have you ever caught yourself wondering what life will be like 1,000 years from now? We have already noticed that life moves in a spiral, though with huge modifications each time around. So it is difficult to imagine how future generations will live, but technology will certainly be part of their lives. What technologies will they have? It’s hard to predict, but our great-great-(many times over)-grandchildren will be able to learn a lot about the technologies of today thanks to GitHub and its Arctic Code Vault.

Steve Jobs left us some very inspiring words: “We’re here to put a dent in the universe. Otherwise why else even be here?” And that is true! Just look: the Mayans left us their calendars and the Polynesians left us their massive stone heads on Easter Island, so we know at least something about their lives. GitHub, however, decided to go further and show not only how we live but also how developers think. So the company came up with an ambitious Arctic project: the GitHub Archive Program.

What is the GitHub Archive Program?

GitHub brought in a number of organizations as archive partners, among them the Software Heritage Foundation, the Internet Archive, the Long Now Foundation, the Arctic World Archive, Microsoft Research’s Project Silica, GHTorrent, and GHArchive. One result is a repository unlike anything GitHub usually hosts: the GitHub Arctic Code Vault. The program aims to preserve the world’s open source software for the long term. The main idea is to leave repositories, with all their metadata, for future generations, so that they get a detailed understanding of the technologies of the past. In other words, it leaves food for thought for the technology historians of the future.

Where to find this Arctic Code Vault?

To keep it away from curious onlookers, GitHub tucked all this information about modern software development inside the Arctic Circle: to be precise, in a decommissioned coal mine deep beneath an Arctic mountain in Svalbard, a Norwegian archipelago between mainland Norway and the North Pole.

What does this Arctic GitHub data repository contain?

The Arctic vault holds roughly 21 trillion bytes (about 21 TB) of data, a snapshot captured on the “mirror” date 02/02/2020. It is a single copy, taken at a single point in time, of every active public GitHub repository that was being developed on the platform at that time, together with its metadata (including every pull request, issue, wiki, etc.).

Which repositories were snapshotted? 

  • every active public repository with any commits within the 80 days before the “mirror” date.
  • every active public repository that had at least one star within the 365 days before 02/02/2020.
  • every active public repository that had at least 250 stars, with no time limit.
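
The three rules above can be read as a simple "any of these" check. The sketch below is only an illustration of that reading: the `Repo` record and its field names are hypothetical and are not GitHub’s actual data model or selection code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    # Hypothetical fields, invented for this illustration only.
    name: str
    is_public: bool
    stars: int
    days_since_last_commit: int          # counted back from 02/02/2020
    days_since_last_star: Optional[int]  # None if the repo was never starred


def in_snapshot(repo: Repo) -> bool:
    """Return True if the repo matches at least one of the three rules."""
    if not repo.is_public:
        return False
    recent_commit = repo.days_since_last_commit <= 80
    recent_star = (
        repo.days_since_last_star is not None
        and repo.days_since_last_star <= 365
    )
    popular = repo.stars >= 250
    return recent_commit or recent_star or popular


# An old but popular project qualifies through the 250-star rule alone.
classic = Repo("classic-lib", True, stars=900,
               days_since_last_commit=1200, days_since_last_star=700)
# A quiet project with no stars and no recent commits does not qualify.
dormant = Repo("dormant-tool", True, stars=0,
               days_since_last_commit=500, days_since_last_star=None)
```

Here `in_snapshot(classic)` is true and `in_snapshot(dormant)` is false, because a single matching rule is enough.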

If you are curious how many people contributed and who exactly they are, it is easy to check. To honor every developer whose public repository made it into the vault, GitHub created a special badge, the Arctic Code Vault badge, which is displayed in the highlights section of the developer’s GitHub profile.

How is the data kept?

One question remains open: “How will future generations of developers read all that code?” The answer lies on the surface: GitHub archived all the software and recorded it on reels of film. There are 187 reels of digital photosensitive archival film that store the data in the form of QR codes, plus 1 reel written in human language. That one is the most important, as it is the “guide reel”, which is called the Tech Tree. Why? Because it contains human-readable guidance that helps developers of the future understand how to work with all those QR codes.




No way! How to read it?

GitHub has created a complicated but smart system for storing the data. Let’s see how it all works.

Because the information is stored as QR codes, future programmers will first need to decode it, in other words, turn it into something a machine can read. This is only the first step, though. The decoded information is compressed, so future developers will need to decompress it to make it meaningful. That gives them an archive file containing a software project’s repository. It will be like a book for them to read, since one repository can contain many files. But they will be able to read it only if they have a machine that can process binary data, made of ones and zeros, like 11010100.
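
To make those steps a bit more tangible, here is a minimal Python sketch of the same chain: bits, then bytes, then decompression, then an archive with files inside. It rests on several assumptions: the QR decoding itself is skipped and its result is represented as a plain string of ones and zeros, and gzip and tar are used purely as stand-ins, since the article does not say which compression or archive formats the vault uses. The helper names are made up for the example.

```python
import gzip
import io
import tarfile


def bits_to_bytes(bits: str) -> bytes:
    """Turn a string of ones and zeros into raw bytes, 8 bits per byte."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def read_repository(bits: str) -> dict:
    """Decode -> decompress -> unpack, returning {file name: file contents}."""
    compressed = bits_to_bytes(bits)       # step 1: decoded bits become bytes
    archive = gzip.decompress(compressed)  # step 2: decompress (gzip is an assumption)
    files = {}
    # step 3: open the archive (tar is an assumption) and read every file in it
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


def make_demo_bits() -> str:
    """Build a tiny fake 'repository' and encode it as bits, for the demo only."""
    data = b"print('hello from 2020')\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name="demo-repo/hello.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    compressed = gzip.compress(buffer.getvalue())
    return "".join(f"{byte:08b}" for byte in compressed)


files = read_repository(make_demo_bits())
for name, content in files.items():
    print(name, len(content), "bytes")
```

With this helper, the example bit pattern 11010100 from the text becomes a single byte with the value 212, which is all that "reading ones and zeros" means at the lowest level.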

So what is next?

Once all those steps are done comes the most interesting part: how to read and run the data. Computers of the future will surely be far more advanced, but will they cope with the configurations of the past, the ones we use today? That is where the Tech Tree comes back in. This human-readable guidance includes instructions on how to build a machine that can read all that code, so that future generations can make the best use of it.

What about the language?

Today the human language of coding is English, no matter which popular programming language is used (Java, Python, JavaScript, Ruby), but who knows what will happen in the future and which language will be in favor. That is why the information is provided in other languages as well. Future ‘kids’ will be able to read the “Guide to the GitHub Code Vault” in five languages.

What is more, an uncompressed UTF-8 file containing the Universal Declaration of Human Rights in more than 500 languages is included at the beginning of every reel and also in the Tech Tree. Why? Because it is important for future generations to know what rights and freedoms we have now, as they are an essential part not only of human history but of technical history as well.

What is the trick?

At the beginning we said that there is just one single snapshot of the data. But what if a natural disaster strikes or an intruder gets in? What would become of all that effort and those resources? They would be lost for good. Because of this “just in case”, it is always better to have a backup. It helps protect your code not only for future generations but also for your own future reference, which you will definitely need much sooner. Moreover, GitProtect.io provides unlimited retention, which is much more than 1,000 years, so you can use it to archive your old, unused repositories.
