System Design of GitLab

This article covers the system design of GitLab: a simplified overview of its architecture and an explanation of how its most important components work. The system design of GitHub follows broadly similar lines.

GitLab is a web-based DevOps lifecycle tool developed by GitLab Inc. It provides a Git repository manager with wiki, issue-tracking, and continuous integration and deployment (CI/CD) pipeline features, and it is distributed under an open-source license.

The simplified GitLab architecture below can be used to understand how GitLab works.

[Figure: simplified GitLab architecture]

GitLab is typically installed on GNU/Linux; the largest known GitLab instance is GitLab.com. Now, let's look at the GitLab architecture in detail.

The GitLab architecture is made up of the following components:

NGINX
PUMA
SIDEKIQ
GITLAB WORKHORSE
POSTGRESQL
HTTP/HTTPS
SSH
GITALY
REDIS

Let's now look at each component in more detail.

NGINX

GitLab uses NGINX as a web server that proxies requests through GitLab Workhorse and into the Puma application server.

NGINX has an Ingress port for all HTTP requests and routes them to the appropriate sub-systems within GitLab.

GitLab uses NGINX because its event-driven design can handle a high volume of concurrent connections.

NGINX is commonly used as a reverse proxy and load balancer that manages incoming traffic and distributes it to slower upstream servers, anything from legacy database servers to microservices. It is also widely regarded as one of the fastest web servers.
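To make the reverse proxy and load balancer roles concrete, here is a minimal Python sketch. It is not how NGINX is implemented or configured: the class names, the path prefixes, and the backend addresses are all hypothetical, and the only two ideas shown are longest-prefix routing and round-robin selection among upstream servers.

```python
from itertools import cycle


class UpstreamPool:
    """Round-robin pool of backend addresses (all addresses here are made up)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._next_backend = cycle(list(backends))

    def pick(self):
        # Each call hands out the next backend, wrapping around at the end.
        return next(self._next_backend)


class ReverseProxy:
    """Maps a request path to an upstream pool by longest matching prefix."""

    def __init__(self):
        self._routes = []  # list of (prefix, pool) pairs

    def add_route(self, prefix, pool):
        self._routes.append((prefix, pool))
        # Keep longer prefixes first so "/registry/" wins over "/".
        self._routes.sort(key=lambda route: len(route[0]), reverse=True)

    def route(self, path):
        for prefix, pool in self._routes:
            if path.startswith(prefix):
                return pool.pick()
        raise LookupError(f"no upstream configured for {path!r}")


def build_demo_proxy():
    # Hypothetical layout: two Workhorse instances and one registry backend.
    proxy = ReverseProxy()
    proxy.add_route("/", UpstreamPool(["workhorse-1:8181", "workhorse-2:8181"]))
    proxy.add_route("/registry/", UpstreamPool(["registry-1:5000"]))
    return proxy
```

With the demo proxy, successive requests for ordinary pages alternate between the two hypothetical Workhorse addresses, while any path under /registry/ always goes to the single registry backend.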

Alternatives to NGINX: Apache, Apache Tomcat, LiteSpeed Web Server, and AWS Elastic Load Balancing can also be used for similar purposes.

PUMA

GitLab serves web pages and the GitLab API using the Puma application server. Puma is a Ruby application server that runs the core Rails application, which provides the user-facing features in GitLab. GitLab previously served web pages with Unicorn; Puma became the default in GitLab 13.0, and Unicorn support was removed in GitLab 14.0.

The preference for Puma is probably because, unlike some other Ruby web servers, Puma was built for speed and parallelism: each worker process serves several requests concurrently on a pool of threads. It provides a fast, concurrent HTTP 1.1 server for Ruby web applications.
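The thread-based concurrency described above can be sketched in Python. This is an illustration of the idea only, not Puma's code: handle_request, serve_all, the 50 ms sleep that stands in for I/O, and the thread name prefix are all assumptions made for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(path):
    """Pretend to serve one request; the sleep stands in for waiting on I/O."""
    time.sleep(0.05)
    return f"200 OK {path}", threading.current_thread().name


def serve_all(paths, max_threads=4):
    """Serve every path on a shared thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_threads,
                            thread_name_prefix="puma-sketch") as pool:
        return list(pool.map(handle_request, paths))


if __name__ == "__main__":
    paths = [f"/projects/{n}" for n in range(8)]
    started = time.perf_counter()
    responses = serve_all(paths)
    elapsed = time.perf_counter() - started
    threads_used = {thread_name for _, thread_name in responses}
    print(f"{len(responses)} requests on {len(threads_used)} threads "
          f"in {elapsed:.2f}s")
```

Because each simulated request spends its time waiting, several of them overlap on the pool, so the batch should finish well before the sum of the individual waits. A one-request-per-process server would need more processes, and therefore more memory, to get the same overlap.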

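Other Ruby application servers that fill a similar role include Unicorn, Phusion Passenger, and Thin. General-purpose web servers such as NGINX, Apache HTTP Server, and Microsoft IIS sit in front of an application server rather than replace it.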

SIDEKIQ

GitLab uses Sidekiq as a job queue which, in turn, uses Redis as a non-persistent database backend for job information, metadata, and incoming jobs. Sidekiq is a Ruby background job processor that pulls jobs from the Redis queue and processes them.

Sidekiq is usually preferred for Rails applications because of its simplicity and efficiency: it uses threads to handle many background jobs simultaneously in the same process.
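The pattern described above, where jobs are pushed onto a queue and a pool of threads in one process pulls and runs them, might look roughly like this in Python. It is a sketch under stated assumptions: an in-process queue.Queue stands in for the Redis queue, the job names and handlers are invented, and the JSON payload with "class" and "args" keys is a simplified imitation of a job payload rather than Sidekiq's full format.

```python
import json
import queue
import threading

HANDLERS = {}  # job class name -> callable (hypothetical registry)


def register(name):
    """Decorator that registers a function as the handler for a job name."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator


@register("WelcomeEmailJob")
def welcome_email(username):
    return f"welcome email queued for {username}"


@register("PipelineJob")
def run_pipeline(project, ref):
    return f"pipeline started for {project}@{ref}"


def enqueue(jobs, name, *args):
    # Jobs travel as JSON text, much like a payload pushed onto a Redis list.
    jobs.put(json.dumps({"class": name, "args": list(args)}))


def worker(jobs, results, lock):
    while True:
        raw = jobs.get()
        if raw is None:  # sentinel: no more work for this thread
            return
        payload = json.loads(raw)
        outcome = HANDLERS[payload["class"]](*payload["args"])
        with lock:
            results.append(outcome)


def process(job_specs, concurrency=3):
    """Run every (name, args) job on `concurrency` threads in this process."""
    jobs, results, lock = queue.Queue(), [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(concurrency)]
    for thread in threads:
        thread.start()
    for name, args in job_specs:
        enqueue(jobs, name, *args)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker thread
    for thread in threads:
        thread.join()
    return results
```

The web request that calls enqueue returns immediately; the slow work happens later on a worker thread. Completion order is not guaranteed, which is why callers of process should not rely on the order of the returned list.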

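Kafka is sometimes named as an alternative to Sidekiq, although it is a distributed event-streaming platform rather than a background job processor. Closer equivalents in the Ruby ecosystem are Resque and Delayed Job.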

GITLAB WORKHORSE

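GitLab Workhorse is a smart reverse proxy that sits between NGINX and Puma and takes over slow or large HTTP requests, such as file uploads and downloads and Git push and pull. Communication between Puma and Workhorse usually goes through a Unix domain socket, but requests can also be forwarded over TCP. Workhorse accesses the gitlab/public directory directly, bypassing the Puma application server, to serve static pages, uploads, and pre-compiled assets.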
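The static-file bypass described above comes down to one decision per request: if the path maps to a real file under the public directory, serve it directly; otherwise hand the request to the application server. The Python sketch below shows that decision only. Workhorse itself is not written this way, and resolve_request, demo, and the sample paths are hypothetical names chosen for the example.

```python
import tempfile
from pathlib import Path


def resolve_request(public_dir, url_path):
    """Return ("static", file_path) or ("upstream", url_path)."""
    root = Path(public_dir).resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Only files strictly inside the public directory qualify; resolving
    # first means "../" tricks cannot escape the directory.
    inside_root = root in candidate.parents
    if inside_root and candidate.is_file():
        return "static", candidate
    return "upstream", url_path


def demo():
    """Build a throwaway public directory and classify three sample paths."""
    with tempfile.TemporaryDirectory() as tmp:
        assets = Path(tmp) / "assets"
        assets.mkdir()
        (assets / "app.css").write_text("body { margin: 0 }")
        paths = ["/assets/app.css", "/api/v4/projects", "/../etc/passwd"]
        return {path: resolve_request(tmp, path)[0] for path in paths}
```

In the demo, only the stylesheet is classified as static. The API path has no file behind it and the traversal attempt resolves outside the public directory, so both are forwarded upstream, where the application can decide how to respond.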

POSTGRESQL

The GitLab application uses PostgreSQL for persistent database information such as users, permissions, and issues. The Git repositories themselves are not kept in PostgreSQL: GitLab stores the bare Git repositories on the file system, in the location defined in the repositories: section of the configuration file, and keeps default branch and hook information with each bare repository.
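To illustrate the kind of relational data described above (users, permissions, issues), here is a small Python sketch. Several things are assumptions: the three-table schema is invented and far simpler than GitLab's real one, the numeric access levels are illustrative, and the standard-library sqlite3 module stands in for PostgreSQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not GitLab's actual tables.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE members (
    user_id      INTEGER NOT NULL REFERENCES users(id),
    project      TEXT    NOT NULL,
    access_level INTEGER NOT NULL,
    PRIMARY KEY (user_id, project)
);
CREATE TABLE issues (
    id        INTEGER PRIMARY KEY,
    project   TEXT    NOT NULL,
    author_id INTEGER NOT NULL REFERENCES users(id),
    title     TEXT    NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with two users and one issue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO members VALUES (?, ?, ?)",
                     [(1, "group/app", 40), (2, "group/app", 20)])
    conn.execute("INSERT INTO issues (project, author_id, title) VALUES (?, ?, ?)",
                 ("group/app", 2, "Login page returns 500"))
    conn.commit()
    return conn


def can_push(conn, username, project, required_level=30):
    """True if the user's access level on the project meets the threshold."""
    row = conn.execute(
        "SELECT m.access_level FROM members m "
        "JOIN users u ON u.id = m.user_id "
        "WHERE u.username = ? AND m.project = ?",
        (username, project),
    ).fetchone()
    return row is not None and row[0] >= required_level


def issues_with_authors(conn, project):
    """List (title, author username) pairs for a project, oldest first."""
    return conn.execute(
        "SELECT i.title, u.username FROM issues i "
        "JOIN users u ON u.id = i.author_id "
        "WHERE i.project = ? ORDER BY i.id",
        (project,),
    ).fetchall()
```

The point of the sketch is the shape of the data: permissions and issues are rows that reference users, so questions such as "may this user push?" or "who opened this issue?" become joins, which is the kind of work a relational database handles well.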

Why should anyone use PostgreSQL?

PostgreSQL is an open-source relational database that stores large and complex data sets safely, with transactions and integrity constraints to keep the data consistent. This lets developers build complex applications on top of it and run administrative tasks without risking data integrity.

HTTP/HTTPS

When serving repositories over HTTP/HTTPS, GitLab uses the GitLab API to resolve authorization and access, and to serve Git objects.

Git operations over HTTP use the stateless “smart” protocol described in the Git documentation, but responsibility for handling these operations is split across several GitLab components.

[Figure: HTTP/HTTPS in GitLab]
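The stateless "smart" protocol mentioned above frames its messages as pkt-lines: four hexadecimal digits giving the total length (including those four digits), followed by the payload, with 0000 acting as a flush packet. The Python sketch below encodes and decodes that framing and builds the header that opens a smart HTTP reference advertisement. The function names are hypothetical, and this is only the framing layer, not a Git server.

```python
FLUSH_PKT = b"0000"
MAX_PKT_LEN = 65520  # length prefix plus payload, per the Git protocol docs


def pkt_line(payload: bytes) -> bytes:
    """Prefix the payload with its total length as four hex digits."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for one pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def service_advertisement_header(service: str) -> bytes:
    """First bytes of a smart HTTP info/refs response for the given service."""
    return pkt_line(f"# service={service}\n".encode("ascii")) + FLUSH_PKT


def parse_pkt_lines(data: bytes):
    """Split a byte string into payloads; a flush packet is returned as None."""
    packets, offset = [], 0
    while offset < len(data):
        length = int(data[offset:offset + 4], 16)
        if length == 0:
            packets.append(None)
            offset += 4
        elif length < 4:
            # Lengths 1 to 3 are reserved for special packets.
            raise ValueError("special packets are not handled in this sketch")
        elif offset + length > len(data):
            raise ValueError("truncated pkt-line")
        else:
            packets.append(data[offset + 4:offset + length])
            offset += length
    return packets
```

For example, service_advertisement_header("git-upload-pack") yields the bytes 001e# service=git-upload-pack followed by a newline and 0000: the payload is 26 bytes, so the total length is 30, or 1e in hexadecimal.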

SSH

The add-on component GitLab Shell serves repositories over SSH. It manages the SSH keys in the location defined in the GitLab Shell section of the configuration file. The file in that location should never be edited manually.

GitLab Shell accesses the bare repositories through Gitaly to serve Git objects, and communicates with Redis to submit jobs to Sidekiq for GitLab to process. GitLab Shell queries the GitLab API to determine authorization and access.

[Figure: SSH in GitLab]
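The authorization step described above can be sketched as follows: work out which Git command and repository the SSH client asked for, then check whether the key that authenticated is allowed to do that. This is a simplified illustration, not GitLab Shell's code. The grants dictionary is a hypothetical stand-in for the call to the GitLab API, and the key identifiers and repository paths are invented.

```python
import shlex

# Git commands a client may request over SSH, mapped to the access they need.
GIT_ACTIONS = {
    "git-upload-pack": "read",      # clone and fetch
    "git-upload-archive": "read",   # git archive --remote
    "git-receive-pack": "write",    # push
}


def parse_ssh_command(original_command):
    """Split e.g. "git-upload-pack 'group/app.git'" into (verb, repo, action)."""
    parts = shlex.split(original_command)
    if len(parts) != 2 or parts[0] not in GIT_ACTIONS:
        raise ValueError(f"unsupported command: {original_command!r}")
    verb, repo = parts
    return verb, repo.lstrip("/"), GIT_ACTIONS[verb]


def is_allowed(grants, key_id, repo, action):
    # Hypothetical stand-in for asking the GitLab API about this key.
    return action in grants.get((key_id, repo), set())


def authorize(grants, key_id, original_command):
    """Return (verb, repo) if the key may run the command, else raise."""
    verb, repo, action = parse_ssh_command(original_command)
    if not is_allowed(grants, key_id, repo, action):
        raise PermissionError(f"{key_id} may not {action} {repo}")
    return verb, repo
```

Anything that is not one of the known Git commands is rejected before any permission lookup happens, so an SSH key registered for Git access cannot be used to run arbitrary shell commands in this model.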

GITALY

Gitaly executes Git operations from GitLab Shell and the GitLab web app, and provides an API to the GitLab web app to get attributes from Git (for example, title, branches, tags, or other metadata), and to get blobs (for example, diffs, commits, or files).

Gitaly provides high-level RPC access to Git repositories. GitLab uses it to read and write Git data, so other components call Gitaly instead of accessing the repositories on disk themselves.
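The idea of high-level RPC access can be illustrated with a toy service in Python. Gitaly itself exposes its API over gRPC; this sketch uses plain in-process method calls instead, and the class, the method names, the repository path, and the commit ids are all hypothetical. What it shows is the boundary: callers name a repository and an operation, and only the service touches the stored data.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """In-memory stand-in for a bare repository on disk."""
    default_branch: str
    branches: dict = field(default_factory=dict)  # branch name -> commit id
    tags: dict = field(default_factory=dict)      # tag name -> commit id


class RepositoryService:
    """The only component that reads repository data in this sketch."""

    def __init__(self, storage):
        self._storage = storage  # relative path -> Repository

    def call(self, method, repo_path, **params):
        """Dispatch a named RPC against one repository."""
        handler = getattr(self, f"rpc_{method}", None)
        if handler is None:
            raise LookupError(f"unknown RPC: {method}")
        if repo_path not in self._storage:
            raise KeyError(f"repository not found: {repo_path}")
        return handler(self._storage[repo_path], **params)

    def rpc_default_branch(self, repo):
        return repo.default_branch

    def rpc_list_branches(self, repo):
        return sorted(repo.branches)

    def rpc_resolve_ref(self, repo, ref):
        # Branches take precedence over tags in this simplified lookup.
        if ref in repo.branches:
            return repo.branches[ref]
        return repo.tags.get(ref)


def build_demo_service():
    repo = Repository(
        default_branch="main",
        branches={"main": "a1b2c3", "feature/login": "d4e5f6"},
        tags={"v1.0": "a1b2c3"},
    )
    return RepositoryService({"group/app.git": repo})
```

A caller such as the web application would use service.call("list_branches", "group/app.git") and never open the repository itself, which is what makes it possible to move repository storage onto separate machines.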

REDIS

Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability.

In the GitLab architecture, Redis is packaged to provide a place to store session data, temporary cache information, and background job queues.

GitLab uses Redis for caching, as a job processing queue with Sidekiq, to manage shared application state, and as a pub/sub queue for Action Cable.
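Two of the roles listed above, a cache whose entries expire and a pub/sub channel, can be imitated in a few lines of Python. This is an in-process sketch of the concepts, not a Redis client: TTLCache, PubSub, the key names, and the channel names are all hypothetical, and nothing here is shared between processes the way real Redis data is.

```python
import time


class TTLCache:
    """Key-value store whose entries disappear after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict the expired entry
            return default
        return value


class PubSub:
    """Delivers each published message to every subscriber of the channel."""

    def __init__(self):
        self._subscribers = {}  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers that received it
```

A session or cached page fragment would be written with set and a time-to-live, so stale data drops out without an explicit delete. With pub/sub, a message published to a channel with no subscribers is simply lost, which is acceptable for live page updates but is the reason job queues use a different data structure.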

With this article at OpenGenus, you should now have a good overall picture of the system design of GitLab.