Scaling the Codebase (Part 4): Wrapping Everything in a Monorepo

15 Nov 2021

Greeting card with a 'one' title (photo by Lanty on Unsplash)

This is the fourth post in a four-part blog series:

Introduction
Modular Code Architecture
Reducing Developer Friction
Wrapping Everything in a Monorepo (this part)

Managing hundreds of packages is not trivial. The standard approach is to store each package in its own Git repository, but with a modular codebase of hundreds of packages, that quickly becomes unmanageable.

The alternative is a monorepo: all packages are stored in one large repository, the “monorepository” (or monorepo for short), and each package is a directory in it. This approach has some disadvantages:

Git is unwieldy with huge repositories. Operations like git status take a long time, and cloning a repository can take tens of minutes. While this is true, it is a problem only in REALLY large repos. It is not currently a real problem at Roundforest, nor do we expect it to be in the coming years. Also, companies like Microsoft are actively working to fix this problem. As for cloning, which does take a long time, it happens only once, when a developer joins the company!
CI systems don’t support monorepos well. When a push happens, a CI job runs regardless of which package changed and what actually needs to be built. Also, the CI server does not understand the dependency graph between the packages, and thus cannot build them in the correct order. While this is true, it isn’t currently a problem, because we don’t use CI servers at Roundforest! Moreover, if we need to build a set of packages together (very rare, but it happens), our Bilt tool can execute the builds together and in the correct order.
IDEs don’t support monorepos well: from their point of view, each repo is one big codebase, not a set of folders that are each a separate codebase. While this was true in the past, Visual Studio Code Workspaces changed all that. They are perfect for monorepos!
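To make the dependency-graph point concrete, here is a minimal TypeScript sketch of what a monorepo-aware build tool has to work out after a change: which packages are affected (the changed ones plus everything that depends on them, transitively), and an order in which to build them so that dependencies come first. The graph representation, the function names, and the package names are all hypothetical; this is not how Bilt is implemented.

```typescript
// Hypothetical dependency graph: package name -> names of the packages it
// depends on. An illustrative data model only, not Bilt's.
type DependencyGraph = Map<string, string[]>;

// Returns the changed packages plus every package that (transitively)
// depends on one of them.
function affectedPackages(graph: DependencyGraph, changed: string[]): Set<string> {
  // Invert the graph so we can walk from a package to its dependents.
  const dependents = new Map<string, string[]>();
  graph.forEach((deps, pkg) => {
    for (const dep of deps) {
      const list = dependents.get(dep) ?? [];
      list.push(pkg);
      dependents.set(dep, list);
    }
  });

  // Depth-first walk over the inverted graph, starting from the changed packages.
  const affected = new Set<string>();
  const stack = [...changed];
  while (stack.length > 0) {
    const pkg = stack.pop()!;
    if (affected.has(pkg)) continue;
    affected.add(pkg);
    stack.push(...(dependents.get(pkg) ?? []));
  }
  return affected;
}

// Orders the given packages so each one comes after those of its
// dependencies that are also in the set (depth-first topological sort).
// Throws if the dependencies form a cycle.
function buildOrder(graph: DependencyGraph, packages: Set<string>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (pkg: string): void => {
    if (state.get(pkg) === "done") return;
    if (state.get(pkg) === "visiting") {
      throw new Error(`Dependency cycle detected at ${pkg}`);
    }
    state.set(pkg, "visiting");
    for (const dep of graph.get(pkg) ?? []) {
      if (packages.has(dep)) visit(dep);
    }
    state.set(pkg, "done");
    order.push(pkg); // all in-set dependencies are already in `order`
  };

  packages.forEach((pkg) => visit(pkg));
  return order;
}

// Example: a change to commons/http-client.
const graph: DependencyGraph = new Map<string, string[]>([
  ["commons/logger", []],
  ["commons/http-client", ["commons/logger"]],
  ["shop/backend", ["commons/http-client", "commons/logger"]],
  ["shop/frontend", ["commons/http-client"]],
  ["reports/job", []],
]);
const affected = affectedPackages(graph, ["commons/http-client"]);
// Logs the order: commons/http-client, shop/frontend, shop/backend
console.log(buildOrder(graph, affected));
```

Note that `commons/logger` and `reports/job` are not rebuilt, because nothing they depend on changed. A CI job that rebuilds everything on every push would build all five packages.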

The advantages of using a monorepo outweigh the disadvantages:

The context switch between packages is minimal. It’s just a “cd folder” away. No need to clone a git repository, or pull from it. The code is just there.
Searching for a usage of a package or function is easy, and crosses applications and microservices. This is really huge when doing a refactoring, or when troubleshooting something in production.
There is no friction when creating a new package: no need to create and configure a repo. We just create a directory in the monorepo. That’s it. See also “Templates for Packages” in the previous part of the series to see how we make it even easier to create a package in the repository.

Our monorepo is not an “app” monorepo, and we don’t have separate monorepos for separate apps. All apps are in the monorepo, with frontend and backend code together. Visual Studio Code Workspaces enable a developer who is focused on one specific app to see only the packages they need for that project, so our developers are usually not “lost” among all the packages that are irrelevant to their work. The folder structure of the monorepo reflects this as well: the packages are not put in one big folder, but are divided into folders based on the project they belong to, with a “commons” folder that holds the packages shared by all projects.
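As an illustration of how such a per-project view can be set up, here is a minimal TypeScript sketch that generates a VS Code multi-root workspace file (`.code-workspace`) listing one project's packages plus the `commons` packages it uses. The `<project>/<package>` and `commons/<package>` layout, the `workspaceFor` function, and the package names are assumptions for the example, not Roundforest's actual layout or tooling.

```typescript
// Minimal shape of a VS Code multi-root workspace file (.code-workspace).
// Real files can also contain "settings", "extensions", etc.; only "folders"
// is used here. Relative paths are resolved from the workspace file's location.
interface WorkspaceFile {
  folders: { path: string }[];
}

// Hypothetical layout: <project>/<package> for project packages and
// commons/<package> for shared ones. Returns a workspace that shows only
// the given project's packages and the commons packages it needs.
function workspaceFor(
  project: string,
  projectPackages: string[],
  commonsPackages: string[],
): WorkspaceFile {
  return {
    folders: [
      ...projectPackages.map((pkg) => ({ path: `${project}/${pkg}` })),
      ...commonsPackages.map((pkg) => ({ path: `commons/${pkg}` })),
    ],
  };
}

// Saved as shop.code-workspace in the monorepo root, this JSON makes
// VS Code show just these three package folders.
const workspace = workspaceFor("shop", ["backend", "frontend"], ["http-client"]);
console.log(JSON.stringify(workspace, null, 2));
```

The repository itself stays whole: the workspace file only changes what the editor shows, so a developer can open another project's workspace, or the monorepo root, whenever they need a wider view.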

Lerna

Whenever we admit that Roundforest uses monorepos, someone invariably jumps up and says that they’re doing monorepos and using Lerna. Lerna is great, and lots of people use it, but it is anathema to a modular code architecture.

The way people use Lerna is, in fact, the opposite of what we’d like to see in a modular monorepo. Lerna enables, and in fact encourages, developers to develop all packages at the same time. By default, Lerna links the source code of all the packages, so that changes to one package’s source code are immediately reflected in all packages that depend on it. Thus, a developer can work on two or more packages at the same time.

Why is this bad? It sounds like a good thing! It’s bad because once you develop packages in tandem, they stop being independent:

They’re not independently coded. Well, obviously. They’re being coded together!
They’re not independently tested. While in theory tests can be independent for each package when using Lerna, in practice developers get lazy and stop writing tests for lower-level packages, because “we’re writing tests for the packages that use them anyway”. This kind of thinking also happens at Roundforest sometimes, but MUCH less frequently.
They’re not independently built. This is a major problem! Because the packages are linked by source, you need to build all the packages before you can start working on one of them. This is true both locally and in CI. It is a huge time waster, and it also means that if one package fails to build (because these things happen), you are stuck, unable to continue coding.
Another problem with source code linking is that it doesn’t work well when building Docker images. So building the Docker image of a package is a challenge, usually solved with some ad hoc solution in the CI server.
(Note: there are now tools like Nx and Turborepo that solve this problem, but adopting them adds another level of complexity.)

I like to call Lerna monorepos “project monorepos” or “monolithic monorepos” to differentiate them from the modular monorepo we use at Roundforest. Lerna is good for small-scale monorepos, when you want to generate multiple packages for a single project and work on all of them at once, because they’re published in tandem.

But if you want all the advantages of a modular code architecture, you should avoid Lerna.

Summary

That’s it! We’ve covered the basics of the Roundforest methodology, and reached the end of the road.

At Roundforest, we pride ourselves on an engineering culture that enables the business to implement changes rapidly, while maintaining the quality of the codebase. We avoid the spaghetti code of rapid changes by using a modular code architecture that splits the code into many independent packages that are easy to develop.

This keeps development focused, controlled, fast, safe, and frequent, and is part of our broader goal of removing friction from the development process.

And all this we do in a monorepo, which houses all the company’s code.

(Did you enjoy the series? Check out our open positions here)

Previous: Scaling the Codebase (Part 3): Reducing Developer Friction