Reading time: 30 minutes
Welcome to OpenGenus. This is the introduction to the advanced series about Git.
Even if you are a beginner, you will be able to follow along and learn a lot. We will explore how Git works internally, under the hood.
Why is that important? Well, of course there is some geeky pleasure in understanding how things work, but that's not the most important reason to know this stuff. Give me one minute to tell you the real reason we're talking about the internals of Git.
When you think about Git, you probably think about the high-level user commands, the so-called porcelain commands. You are probably familiar with basic ones such as add and commit. If you have worked with a remote repository, you have probably also used push and pull, and if you have worked with branches, you have used branch, checkout, merge, and rebase. The list goes on.
Some people go a little deeper and use the low-level commands, the so-called plumbing commands, such as cat-file, hash-object, and a few more. These are the basic building blocks that the porcelain commands are built upon. You might never need the plumbing commands unless you are doing some advanced Git scripting. Understanding all these commands can be hard, and some of them can be confusing. However, here is the key point: you could argue that the secret to Git is not knowing the commands, either porcelain or plumbing. Instead, the secret to Git is knowing the conceptual model behind it.
If you want to use Git safely, stay out of trouble, and unleash all of its power, do not look at the commands; look at the model instead.
Git is a Distributed Revision Control System
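There are over a billion computing devices all over the world, and the people working on a project may live in very different geographical regions. To make collaboration among them possible, Git is distributed: every clone of a repository is a complete repository with its full history, so developers can work independently and exchange changes with one another.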
Git is a Revision Control System
Imagine that Git is not distributed at all. Imagine that there is only one computer in the world, and there is a repository on that computer. That's all you want to think about for the moment. Git then becomes just a revision control system, with no distribution. However, a revision control system is still a complex beast: it includes features such as history, branches, and merges, and these features make things more complicated. So let us keep it simple and peel off one more layer.
Git is a Simple Content Tracker
What happens if you forget about branches, history, and the like?
Now we have a smaller onion. You can call it a simple content tracker, because that's all it does: it tracks content, that is, files and directories. If you look at Git's documentation, you will see that this is actually how Git defines itself: "git - the stupid content tracker". Looking at Git as a content tracker makes it easier to understand, but let's take this process one step further.
Git is a Persistent Map
Forget even about tracking files, forget about the notion of a commit or versioning.
Let us look at the very core of the onion, the basic idea behind Git.
At its core, Git is just a map: a simple structure that maps keys to values. This structure is persistent, meaning it is stored on your disk. Now we have reached the core of the onion.
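To make the "persistent map" idea concrete, here is a minimal Python sketch of a toy content-addressable store. It is not Git's own code; the class `ToyObjectStore` and its `put`/`get` methods are hypothetical names chosen for illustration. It does follow what Git does for file contents (blob objects) in a default SHA-1 repository: the key is the SHA-1 hash of a `blob <size>\0` header followed by the content, and each value is zlib-compressed into a file named after its key, using the first two hex digits as a directory name. This is roughly what the plumbing commands `git hash-object -w` and `git cat-file -p` do for a single file's content.

```python
import hashlib
import os
import tempfile
import zlib


class ToyObjectStore:
    """Hypothetical toy version of Git's object database.

    A persistent key-value map where each key is derived from its value
    (content-addressable storage), so identical content gets identical keys.
    """

    def __init__(self, root: str) -> None:
        # root plays the role of .git/objects in this sketch
        self.root = root
        os.makedirs(root, exist_ok=True)

    @staticmethod
    def _frame(content: bytes) -> bytes:
        # Git prefixes blob content with a header: b"blob <size>\0"
        return f"blob {len(content)}\0".encode() + content

    @classmethod
    def hash_blob(cls, content: bytes) -> str:
        # The key is the SHA-1 of header + content, as 40 hex digits
        return hashlib.sha1(cls._frame(content)).hexdigest()

    def _path(self, key: str) -> str:
        # Loose-object layout: objects/<first 2 hex digits>/<remaining 38>
        return os.path.join(self.root, key[:2], key[2:])

    def put(self, content: bytes) -> str:
        framed = self._frame(content)
        key = hashlib.sha1(framed).hexdigest()
        path = self._path(key)
        if not os.path.exists(path):  # identical content is stored only once
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(zlib.compress(framed))
        return key

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            framed = zlib.decompress(f.read())
        # Drop the "blob <size>\0" header to recover the original content
        _, _, content = framed.partition(b"\0")
        return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyObjectStore(os.path.join(tmp, "objects"))
        key = store.put(b"test content\n")
        print(key)             # the key, computed from the content
        print(store.get(key))  # the original bytes, read back from disk
```

Because the key is computed from the value, storing the same content twice yields the same key and a single file on disk. For example, empty content always maps to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same ID Git assigns to an empty file.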
In this series, we will rebuild the onion from the inside out, explaining one layer in depth in each article.
To summarize, Git can be described through the following simplified abstractions, from the outermost layer to the innermost:
- A Distributed Revision Control System
- A Revision Control System
- A Simple Content Tracker
- A Persistent Map
Take a look at the other articles in this series to master Git.