A distributed version control system (DVCS), such as Git, replicates the repository onto each user’s machine; that is, each user has a self-contained, first-class repository. The other type is the centralised version control system (CVCS).
No privileged master repository is required, though teams usually designate one by convention, for example as the source for continuous integration.
DVCS grew out of the idea that every developer should have their own local repository in addition to the central one. Developers do not check out a single snapshot of the source code; they fully mirror the central repository. As a result, a DVCS does not rely on the central server for every version of the source code. Every developer has a clone of what is in the central repository, including all of its metadata, and the complete history of the project is available on their own hard drive.
A developer ‘pulls’ new changes from the central repository into the local repository, commits their own changes to the local repository, and then ‘pushes’ those changes back to the central repository.
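The clone, commit, push and pull cycle described above can be sketched with a toy model. This is only an illustration, not how Git or any real DVCS stores data: the `Repo` class, its methods and the changeset messages are all hypothetical, and the model ignores branching and merge conflicts.

```python
class Repo:
    """Toy repository (hypothetical): history is a list of changeset messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone copies the complete history, not just the latest snapshot.
        return Repo(self.history)

    def commit(self, message):
        # Committing touches only the local repository; no server is involved.
        self.history.append(message)

    def push(self, remote):
        # Send every local changeset the remote does not have yet.
        missing = [c for c in self.history if c not in remote.history]
        remote.history.extend(missing)
        return missing

    def pull(self, remote):
        # Fetch every remote changeset the local repository does not have yet.
        missing = [c for c in remote.history if c not in self.history]
        self.history.extend(missing)
        return missing


central = Repo(["initial import"])
alice = central.clone()
bob = central.clone()

alice.commit("add parser")      # local only
alice.commit("fix parser bug")  # local only
pushed = alice.push(central)    # both changesets travel in one push
pulled = bob.pull(central)      # bob catches up with the central repository
```

Only `push` and `pull` touch another repository here; `commit` works entirely on the local copy, which is the property the rest of this article relies on.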
Advantages of Distributed Version Control System
- Other than push and pull, all actions are very fast, since they access the local hard drive rather than a remote server.
- Changesets can be committed to the local repository first and then a group of these changesets can be pushed to the central repository in a single shot.
- Only pushing and pulling need network connectivity; everything else can be managed locally.
- Every developer has a complete copy of the entire repository, so the impact of any change can be checked locally before the code is pushed to the central repository.
- DVCS is built to handle changes efficiently, since every change has a globally unique identifier (in Git, a hash computed from the content) that makes it easy to track.
- Tasks like branching and merging can be done with ease, since every developer effectively has their own branch and every shared change is like a reverse integration.
- A DVCS is generally easier to manage than a CVCS, since day-to-day work does not depend on a single server.
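The point above about every change carrying a globally unique identifier can be made concrete with Git, which derives object IDs by hashing content. The sketch below reproduces the ID Git assigns to a file's content (a "blob") in a SHA-1 repository; commit IDs are computed the same way over the commit's own data, which includes its parent IDs. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = git_blob_id(b"hello world\n")
again = git_blob_id(b"hello world\n")
changed = git_blob_id(b"hello world!\n")

print(first == again)    # identical content always gets the identical ID
print(first == changed)  # any change in content produces a different ID
```

Because the ID depends only on content, two developers who have never contacted each other still agree on the identifier of the same change, with no central server handing out revision numbers.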
Disadvantages of Distributed Version Control System
- In many projects, large binary files that are difficult to compress take up more space, because every clone stores them.
- Projects with a long history, i.e., a large number of changesets, may take a long time to clone and occupy more disk space.
- With DVCS, a backup is still needed, since not every developer’s clone is guaranteed to hold the latest version.
- Though DVCS doesn’t prevent having a central server, not having one can cause confusion about which version is the most recent.
- Though every repo has its own revision identifiers, releases have to be tagged with meaningful names to avoid confusion.
Distributed Version Control (DVCS) vs Centralised Version Control (CVCS)
- DVCS focuses on sharing changes; every change has a GUID, or unique ID.
- Every developer has one local copy of the source code repository, in addition to the central source code repository.
- Distributed systems have no forced structure. You can create “centrally administered” locations or keep everyone as peers.
- DVCS enables working offline. Apart from push and pull actions, everything is done locally.
- CVCS focuses on synchronizing, tracking, and backing up files.
- CVCS works based on a client-server relationship, with the source repository located on one single server, providing access to developers across the globe.
- In a DVCS, recording or downloading a change and applying it are separate steps; in a centralized system, they happen together.
- CVCS relies on network connectivity for access to the server.
Distributed Version Control Key Points
1. Private Workspace
Almost all version control tools offer a private workspace. In CVCS, developers get a working copy of the files, which acts as the private space. With DVCS, developers get the complete repository as a private copy, which is the most important point to note about DVCS.
This private workspace means developers do not have to coordinate with others at every step of development. When a team has multiple developers, the situation becomes complex, and the version control system normally takes on the job of managing that complexity. With the private space in a DVCS, a developer can work as if they were alone on the project, at least for a while. Developers are free to do anything within their private workspace without affecting the workflow of other developers.
2. Easier Merging
Branching is easy compared with merging. Branching is like two people going off in their own directions and not collaborating.
People using a CVCS usually avoid branching because most of the centralized tools aren’t good at merging. But on switching to a DVCS, they tend to bring that attitude with them, even though it’s not really necessary. Decentralized tools are better at merging.
The reasons are as follows:
- They’re built on a directed acyclic graph (DAG). Merge algorithms need information about history, and a DAG represents that kind of information better than the techniques used by most centralized tools.
- DVCS keeps the developer’s changes distinct from the merge they had to do in order to get the changes shared. This approach is less error-prone, as the developer’s changes are already cleanly tucked away (at commit time) in an immutable changeset. Since the merge is the only thing left to do, it gets all the attention it needs. Also, while tracking down a problem, it is easier to figure out whether the problem was introduced by the changes or by the merge, since those two things are distinct in the history.
- They deal with whole-tree branches, not directory branches. The path names are independent for each branch of the tree, which improves interoperability with other tooling.
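To see why a DAG helps merging, consider how a tool can find the point where two branches diverged (the "merge base"), which a three-way merge then uses as its reference version. The sketch below is a simplified illustration under assumed data, not the algorithm of any particular tool: the commit names and the `PARENTS` table are hypothetical.

```python
from collections import deque

# Hypothetical history: each commit maps to the list of its parent commits.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # main line continues from B
    "D": ["B"],       # feature branch starts at B
    "E": ["D"],
    "M": ["C", "E"],  # a merge commit has two parents
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def merge_bases(left, right, parents):
    """Return common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(left, parents) & ancestors(right, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


base = merge_bases("C", "E", PARENTS)
print(base)  # {'B'}
```

Both `A` and `B` are shared by the two branches, but `B` is the most recent shared point, so the merge only has to reconcile what happened after `B`. Because the merge commit `M` records both parents, a later merge of the same branches starts from `E` rather than from `B` again.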
3. Easy to Scale Horizontally
For CVCS, the server with the central repository needs to be powerful enough to serve the needs of the entire team. For a team of 10 people, this is not an issue. For larger teams, the hardware limitations of the server can become a performance bottleneck. Some systems expect the server to do a lot of work. It can be challenging and expensive to set up a server to support thousands of users.
A DVCS has much more modest hardware requirements for a central server. Users interact with the server only when they need to push or pull. All other work happens on the client side, so simple server hardware can serve its purpose. With a DVCS, it is also possible to scale the central server out by turning it into a server farm. Instead of using one large server machine, we can add capacity by adding small server machines and using scripts to keep them all in sync with each other.
With this, you have a good understanding of Distributed Version Control Systems (DVCS). Enjoy.