Reading time: 20 minutes
Monorepo and multirepo are two architectures for managing source code. Google, Facebook, and Twitter use a monorepo, while Amazon and Netflix use multiple repositories. Both approaches work, but they differ in significant ways that you should weigh before committing to one.
To choose between monorepo and multirepo, we need to consider three main factors:
- Tooling infrastructure and investment
- Source code version control
- Large scale refactoring
Tooling infrastructure and investment
In short, managing a multirepo setup is easy and cheap: all the work can be done with open source tools, and the skills required are minimal. A monorepo, on the other hand, requires developing custom tools, which means a significant investment in code management. Both paths are justifiable, depending on the investment you can make and the benefits you want to enjoy.
Tooling challenge: Build
In a monorepo, running a build is not as trivial as in a multirepo.
You should not run the tests and builds of every project on every change, as that would waste time and computing resources. So the first challenge is to figure out, given a change with one or more commits, which project(s) should be built and which tests should run.
To solve this, you need a directed acyclic graph (DAG) of the dependencies between all projects. When a change is submitted, it is checked against this DAG to see which projects are affected. Any affected project could break, so tests are run only for the affected projects and their transitive dependents. Google and Facebook have open-sourced the build tools they use to tackle this problem:
- Bazel (by Google)
- Buck (by Facebook)
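The core of this "affected projects" computation can be sketched in a few lines. The graph below is purely illustrative (the project names and edges are made up, not from any real repo); the idea is to reverse the dependency edges and walk outward from the changed projects:

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: "app" depends on "lib-ui" and "lib-core", etc.
DEPENDENCIES = {
    "app": ["lib-ui", "lib-core"],
    "lib-ui": ["lib-core"],
    "lib-core": [],
    "tool": ["lib-core"],
}

def transitive_dependents(changed, dependencies):
    """Return every project that directly or transitively depends on a
    changed project -- these are the builds and tests that must run."""
    # Reverse the edges: for each project, record who depends on it.
    dependents = defaultdict(set)
    for project, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(project)

    # Breadth-first walk from the changed projects along reversed edges.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        project = queue.popleft()
        for dependent in dependents[project]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

print(sorted(transitive_dependents({"lib-core"}, DEPENDENCIES)))
# ['app', 'lib-core', 'lib-ui', 'tool']
```

In this toy graph, a change to lib-core affects everything, while a change to app affects only app. Bazel and Buck do essentially this query at much larger scale, with fine-grained build targets instead of whole projects.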
In a multirepo, this problem does not exist: there is no need to figure out which project to build. Whenever a project changes, only that project's deployment pipeline is triggered.
Tooling challenge: Source code version control
Source code version control is another tooling challenge imposed by a monorepo.
It is well known that Git and Mercurial are bad at scaling.
"Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (i.e. check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around." — Linus Torvalds
Although sparse checkout and shallow clone may alleviate the scaling problem, they are not a sustainable solution for large organizations.
To solve this problem:
- Microsoft developed GVFS
- Facebook fixed Mercurial
- Google developed Piper
Tooling challenge: Large scale refactoring
Large-scale refactoring in a monorepo requires dedicated tooling support.
Tooling challenge: Deployment pipeline
Setting up a deployment pipeline is also complicated in a monorepo.
In a multirepo, the practice is simple: each project has its own pipeline.
In a monorepo, one possible approach is to have the first stage figure out the relevant projects and then trigger a child pipeline for each of them. Each child pipeline may in turn trigger other pipelines, following the dependency DAG.
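That first stage can be sketched as a small script. The conventions here are assumptions for illustration only: one project per top-level directory, and a child pipeline named "<project>-pipeline" — real setups will differ:

```python
def projects_from_changed_files(changed_files):
    """Map each changed path to its top-level project directory
    (assumes the hypothetical one-project-per-top-level-dir layout)."""
    return {path.split("/", 1)[0] for path in changed_files if "/" in path}

def child_pipelines(changed_files):
    """Names of the child pipelines the first stage would trigger,
    using the assumed '<project>-pipeline' naming convention."""
    return sorted(f"{project}-pipeline"
                  for project in projects_from_changed_files(changed_files))

# Example: a commit touching two projects fans out to two child pipelines.
changed = ["payments/api/handler.go", "payments/api/handler_test.go", "search/index.go"]
print(child_pipelines(changed))
# ['payments-pipeline', 'search-pipeline']
```

A production version would also consult the dependency DAG from the build section, so that downstream projects fan in once all their upstream pipelines succeed.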
GoCD is the only Continuous Delivery (CD) tool on the market that supports pipeline fan-in and fan-out. Other CD solutions offer only very simple pipeline modeling; they are designed for multirepo, not monorepo.
For instance, GitLab and Travis CI have no solution for monorepo.
Multirepo and monorepo also differ in engineering culture and philosophy.
Multirepo values decoupling and engineering velocity, while monorepo favours standardization and consistency. In a monorepo, teams work in close collaboration; in a multirepo, they work separately.
Netflix favours freedom and responsibility, so it prefers multirepo. Google values consistency and code quality, so it prefers monorepo.