Git is an Onion

Reading time: 30 minutes

Welcome to OpenGenus. This is the introduction to our advanced series on Git.


Even if you are a beginner, you will be able to follow along and learn something useful. We will explore how Git works under the hood.

Why is that important? Well, of course there is some geeky pleasure in understanding how things work, but that's not the most important reason to know this stuff. Give me one minute to tell you the real reason we're talking about the internals of Git.

When you think about Git, you most likely think about the high-level user commands, the so-called porcelain commands. You are probably familiar with the basic ones, such as add and commit. If you have worked with a remote repository, you have also used push and pull, and if you have worked with branches, you have used branch, checkout, merge, and rebase. The list goes on.

Some people go a little deeper, into the low-level commands, the so-called plumbing commands, such as cat-file, hash-object, and a few more. These are the basic building blocks that the porcelain commands are built upon. You may never need the plumbing commands unless you are doing advanced Git scripting. Understanding all of these commands can be hard, and some of them are confusing. But here is the key point: the secret to Git is arguably not knowing the commands, porcelain or plumbing. The secret to Git is knowing the conceptual model behind it.
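To make the plumbing layer a little more concrete, here is a minimal Python sketch of the computation that `git hash-object` performs on a file's content: Git prefixes the data with a header of the form `blob <size>\0` (object type, a space, the length in bytes, a NUL byte) and takes the SHA-1 of the result. The function name `blob_id` is our own, and the sketch assumes a repository using Git's default SHA-1 object format (Git can also be configured for SHA-256).

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the object ID Git would assign to `data` stored as a blob.

    Assumes the default SHA-1 object format. The header is the object
    type, a space, the content length in bytes, and a NUL byte.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same input as: echo 'test content' | git hash-object --stdin
    print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the ID depends only on the content, running the same bytes through the real `git hash-object` should give the same result, on any machine.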

If you want to use Git safely, unleash all of its power, and stay out of trouble, do not look at the commands; look at the model instead.

Git is a Distributed Revision Control System

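There are over a billion computing devices all over the world, and the people who collaborate on a project are often in different geographical regions. To make that collaboration possible, Git is distributed by nature: every collaborator typically works with a full copy of the repository.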


Git is a Revision Control System

Imagine that Git is not distributed at all. Imagine that there is only one computer in the world, with one repository on it. That is all you need to think about for the moment. Git then becomes just a revision control system, with no distribution. Even so, a revision control system is a complex beast: it includes history, branches, and merges, and these features make things more complicated. So let us simplify further and peel off one more layer.


Git is a Simple Content Tracker

What happens if you forget about branches, history, and the like?

Now we have a smaller onion. You can call it a simple content tracker, because that is all it does: it tracks content, meaning files and directories. If you look at Git's documentation, you will see that this is actually how Git defines itself: the git(1) manual page calls Git "the stupid content tracker". Seen as a content tracker, Git is easier to understand, but let us take this process one step further.
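The idea that Git tracks content, not file names, can be illustrated with a small Python sketch. It takes a hypothetical in-memory snapshot (a dict from path to file bytes, standing in for a working directory) and records which content ID each path points to. Files with identical content share one ID, so their content only needs to be stored once. The names `track`, `index`, and `store` are our own; real Git records this structure in tree objects, which the sketch deliberately leaves out.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 object ID of `data` as a Git blob (header: type, size, NUL byte)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def track(snapshot: dict[str, bytes]) -> tuple[dict[str, str], dict[str, bytes]]:
    """Return (index, store) for a hypothetical snapshot of files.

    index maps each path to the ID of its content; store maps each
    distinct ID to the content itself, so duplicates are stored once.
    """
    index: dict[str, str] = {}
    store: dict[str, bytes] = {}
    for path, content in snapshot.items():
        key = blob_id(content)
        index[path] = key
        store.setdefault(key, content)  # identical content is kept only once
    return index, store


if __name__ == "__main__":
    files = {
        "README": b"hello\n",
        "docs/copy-of-readme": b"hello\n",
        "main.py": b"print('hi')\n",
    }
    index, store = track(files)
    print(len(index), "paths,", len(store), "distinct blobs")  # 3 paths, 2 distinct blobs
```

Keying by content rather than by path is what makes renames and duplicate files cheap in this model: the path changes, the content key does not.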


Git is a Persistent Map

Forget even about tracking files, forget about the notion of a commit or versioning.

Let us look at the very core of the onion, the basic idea behind Git.

At its core, Git is just a map: a simple structure that maps keys to values. This map is persistent, meaning it is stored on your disk. We have now reached the core of the onion.
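As a rough illustration, here is a minimal Python sketch of such a persistent map, modeled loosely on how Git stores loose objects. The key is the SHA-1 of a `blob <size>\0` header plus the content, and the value is that header plus content, zlib-compressed and written to `objects/<first 2 hex digits>/<remaining 38>` under a root directory. The class `ObjectStore` and its `put`/`get` methods are hypothetical names of our own; real Git also stores other object types and packs objects into packfiles, all of which this sketch ignores.

```python
import hashlib
import os
import tempfile
import zlib


class ObjectStore:
    """A hypothetical content-addressed key-value store persisted on disk."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # Fan out into subdirectories by the first two hex digits of the key.
        return os.path.join(self.root, "objects", key[:2], key[2:])

    def put(self, content: bytes) -> str:
        """Store `content` and return its key; storing it again is a no-op."""
        obj = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
        key = hashlib.sha1(obj).hexdigest()
        path = self._path(key)
        if not os.path.exists(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(zlib.compress(obj))
        return key

    def get(self, key: str) -> bytes:
        """Read the object for `key` back from disk and return its content."""
        with open(self._path(key), "rb") as f:
            obj = zlib.decompress(f.read())
        header, _, content = obj.partition(b"\0")
        kind, size = header.split(b" ")
        # Sanity-check the header written by put().
        if kind != b"blob" or int(size) != len(content):
            raise ValueError(f"corrupt object {key}")
        return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        key = ObjectStore(root).put(b"test content\n")
        print(key)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
        # A fresh instance reads the same data back: the map lives on disk.
        print(ObjectStore(root).get(key))  # b'test content\n'
```

The map is persistent only because it lives in ordinary files; any process that knows the root directory and a key can read the value back.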

During this series, we will rebuild the onion from the inside out, explaining each layer in depth in its own article.


To summarize, Git can be viewed as the following simplified layers of abstraction, from the outermost to the core:

  • A Distributed Revision Control System

  • A Revision Control System

  • A Simple Content Tracker

  • A Persistent Map

Do take a look at the other articles in this series to master Git.