Git is a Map and SHA1 hash

Reading time: 30 minutes

Welcome to OpenGenus. This is the second session in the advanced series on Git. In this article, we will see how Git is just a map.


At its core, Git is a map. This means it is a table with keys and values.

What are the keys and values?

The values are sequences of bytes, for example the content of a text file, or even a binary file. Any sequence of bytes can be a value. You give a value to Git, and Git calculates a key for it: a hash.

Git calculates hashes with the SHA1 algorithm. Every piece of content has its own SHA1.
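To make the map idea concrete, here is a minimal sketch of a content-addressed table in Python. The class name ContentMap and its put/get methods are made up for this illustration and are not part of Git. As a simplification, the sketch hashes the raw bytes directly, whereas Git's real key computation also covers a short header.

```python
import hashlib


class ContentMap:
    """Toy content-addressed map: the key is derived from the value itself."""

    def __init__(self):
        self._objects = {}  # 40-digit hex key -> raw bytes

    def put(self, content: bytes) -> str:
        # Simplification (assumption of this sketch): hash the raw bytes only.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentMap()
key = store.put(b"OpenGenus\n")
same_key = store.put(b"OpenGenus\n")  # store identical bytes a second time

print(key == same_key)  # True: same content, same key
print(len(store))       # 1: the map still holds a single entry
print(store.get(key))   # b'OpenGenus\n'
```

Because the key is computed from the value, storing the same content twice cannot create a second entry.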

For example, let's take a piece of content: the string OpenGenus. If you ask Git to generate a SHA1 for this string, you will get one specific hash. Exactly that one: the same content always produces the same hash.

[Figure: SHA1 hash of OpenGenus]

A SHA1 is 20 bytes long. Written in hexadecimal, it is a sequence of 40 hex digits. This will be Git's key to store this content in the map.

We can also calculate the SHA1 on the command line. To do this, we need a command that you might never have heard of, because it is a low-level plumbing command: git hash-object. We can use the echo command to output the content and then pipe the result into hash-object. We also need to tell hash-object to read its content from standard input, using the --stdin option.

[Figure: git hash-object printing the SHA1 hash of OpenGenus]

It prints out the hash for this piece of content. This is the SHA1 for the string OpenGenus.
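The pipeline described above would typically be typed as echo "OpenGenus" | git hash-object --stdin. As a rough cross-check that does not need Git installed, the sketch below reproduces the same kind of key in Python. It relies on two details: for file content (a "blob"), Git hashes a header of the form "blob <size>" followed by a zero byte and then the content, and echo appends a newline to the string. The helper name git_blob_hash is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the key Git would assign to this content when stored as a blob."""
    # Git hashes a small header plus the content, not the bare content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the bytes reaching Git are b"OpenGenus\n".
digest = git_blob_hash(b"OpenGenus\n")

print(digest)
print(len(digest))                 # 40 hex digits
print(len(bytes.fromhex(digest)))  # 20 bytes
```

If Git is available, the first printed line can be compared with the output of the command itself.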

If you change anything in the content, even a single letter or a newline, then you get a completely different SHA1. Every object in a Git repository has a SHA1.
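The effect of a small change is easy to observe. The following sketch hashes three nearly identical byte strings with plain SHA1 (without Git's object header, which is a simplification of this example) and counts how many of the 40 hex digits differ. The variant strings and the helper names are chosen only for illustration.

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Plain SHA1 of the given bytes, as 40 hex digits."""
    return hashlib.sha1(content).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = sha1_hex(b"OpenGenus")
one_letter = sha1_hex(b"OpenGenuz")      # last letter changed
with_newline = sha1_hex(b"OpenGenus\n")  # newline appended

for label, digest in [
    ("original", original),
    ("one letter changed", one_letter),
    ("newline appended", with_newline),
]:
    print(f"{label:20} {digest}  differs in {differing_positions(original, digest)} digits")
```

The three hashes share no visible structure, even though the inputs differ by a single byte.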

If you put the string OpenGenus in a file and store this file in Git, then the SHA1 we just generated will identify the file's content. As we will see later, directories also have their own SHA1s, as do commits, and so on.

With so many SHA1s around, you might wonder: what happens if they collide?

After all, the number of possible SHA1s is large but it is not infinite. What if I have two different pieces of content, and just by chance, they happen to have the same SHA1? Wouldn't that make a mess of my project and cause me to lose my data?

SHA1 has $2^{160}$ possible values, so an accidental collision between two different pieces of content is so unlikely that you can treat it as something that will not happen in our Universe. If you have ever worried that two SHA1s might collide by chance in your Git project, then stop worrying now.
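To get a feeling for the numbers, the sketch below uses the standard birthday-bound approximation, $p \approx n(n-1) / (2 \cdot 2^{160})$, for the chance that any two of $n$ objects share a hash. It assumes the hashes behave like uniformly random values, so it only speaks about accidental collisions. The object counts are arbitrary examples.

```python
HASH_BITS = 160
SPACE = 2 ** HASH_BITS  # number of possible SHA1 values


def collision_probability(n_objects: int) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Valid as an approximation while the result is much smaller than 1.
    """
    return n_objects * (n_objects - 1) / (2 * SPACE)


for n in (10**6, 10**9, 10**12):
    print(f"{n:.0e} objects -> p ~ {collision_probability(n):.3e}")
```

Even with a trillion objects in one repository, the estimate stays many orders of magnitude below anything you would meet in practice.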