To Love and to Learn (Gil Tayar's Blog)

Scaling the Codebase (Part 3): Reducing Developer Friction

Image: ice skater (photo by Vidar Nordli-Mathisen on Unsplash)

This is the third post in a four-part blog series:

  1. Introduction
  2. Modular Code Architecture
  3. Reducing Developer Friction (this part)
  4. Wrapping everything in a Monorepo

Independence helps developers maintain their velocity in the face of an increasingly large codebase. The reason is that it removes the friction from the development process. And reducing friction is critical to the ability of Roundforest to move forward quickly.

This frictionless development process has five aspects to it: focus, control, speed, safety, and frequency.

Thanks to the modularization and independence of modules, reducing the friction does not come at the cost of engineering quality, which is very important for us at Roundforest.

Focus

When developing a feature, our developers work on one package at a time. Even if the feature spans two microservices, or a microservice and a library, the developer first focuses on one package: they develop it, run the tests, build it, and deploy. Once that is done, they move on to the second library or microservice and do the same. They almost never develop both at the same time.

This means that at any specific point in time, the developer is focused on one package, and one package only. No matter if there are ten packages, a hundred packages, or a thousand, the focus of a developer is on one package, and they need to understand only that package, without needing to really understand all the hundreds of other packages…

This means that the velocity of the development is the same, no matter what the size of the codebase is. The ability to focus on small independent packages is the basis for everything else that comes next.

Control

An interesting side effect of having independent packages that are fast to develop, build, test, and deploy, is that CI/CD becomes unnecessary.

Yes, unnecessary. Well, not exactly unnecessary. A build process that is consistent, standard, and safe (CI) is critical, and a deployment process that is consistent, standard, and safe (CD) is critical.

To get processes that are consistent, standard, and safe, most companies use CI/CD servers that run those processes (think Jenkins, CircleCI, and others). They use CI/CD servers because they don’t trust “whatever is on the developer machine” and want to avoid the “it works on my machine” problem. They also run it on other servers because it takes a lot of time to run the build and the tests of a monolithic application (10 minutes? 30 minutes? 1 hour? Something like that).

Processes that are consistent, standard, and safe are important for a good developer process. But what is not necessary are CI/CD servers. If building, testing, and deploying a package takes 1 to 3 minutes, why not just run it locally? Sending it to a CI server will take longer than that, and while the CI build runs, the developer has to switch to another task, which is a costly context switch. Moreover, CI-ing things locally has the added advantage that if something fails during the process, the troubleshooting is much easier.

It used to be that building and testing needed to be run in a “clean” environment, to ensure that anything on the developer machine can’t “dirty” the build. But this is not true in modern development environments. Moreover, modular code architecture ensures that packages can be coded independently (see above), and all the things needed to build and test are available in the package itself. This means that there is no inherent advantage in running in a “clean” environment such as a CI.

Note that the building of a package at Roundforest is done using Bilt, a tool that takes care of the steps needed to run the build, so there is no chance of a developer specifying those steps incorrectly. Bilt ensures that what needs to be done to build the package is done, and in the correct order. This build tool can also figure out what packages depend on the package to be built and thus also need to be built.
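To make that last point concrete, here is a minimal sketch of how a build tool could work out which packages need rebuilding and in what order. This is not Bilt’s actual implementation: the `PackageGraph` shape, the `packagesToBuild` function, and the package names are all hypothetical, chosen only for illustration.

```typescript
// Hypothetical shape: each package name maps to the packages it depends on.
type PackageGraph = Record<string, string[]>;

// Returns the changed package plus everything that transitively depends on it,
// ordered so that every package comes after the affected packages it depends on.
function packagesToBuild(graph: PackageGraph, changed: string): string[] {
  // Step 1: collect the changed package and all of its transitive dependents.
  const affected = new Set<string>([changed]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [name, deps] of Object.entries(graph)) {
      if (!affected.has(name) && deps.some((dep) => affected.has(dep))) {
        affected.add(name);
        grew = true;
      }
    }
  }

  // Step 2: depth-first topological sort, restricted to the affected packages.
  const ordered: string[] = [];
  const visited = new Set<string>();
  const visit = (name: string): void => {
    if (visited.has(name)) return;
    visited.add(name);
    for (const dep of graph[name] ?? []) {
      if (affected.has(dep)) visit(dep);
    }
    ordered.push(name);
  };
  for (const name of affected) visit(name);
  return ordered;
}

// Hypothetical packages, for illustration only.
const graph: PackageGraph = {
  "logging-lib": [],
  "pricing-lib": ["logging-lib"],
  "catalog-service": ["pricing-lib"],
  "checkout-service": ["pricing-lib", "logging-lib"],
  "email-service": ["logging-lib"],
};

const buildOrder = packagesToBuild(graph, "pricing-lib");
// → ["pricing-lib", "catalog-service", "checkout-service"]
```

Note that `email-service` is left out: it does not depend on `pricing-lib`, so a change there does not require rebuilding it.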

The same thing goes for deployment: in a modern deployment environment (e.g. Kubernetes or AWS CloudFormation), do we really need an external CD server to run our deployments for us, when all the deployment does is send Kubernetes a set of YAML files?

Our answer at Roundforest is: no, there is no need to deploy using an external CD server. Deployment can be done locally. The YAML files of a microservice sit in a “deploy” package, and deploying is done via a command line tool that we built. The tool modifies those YAML files to deploy the correct version of the microservice, gives them to Kubernetes, and, when the deploy is done, commits the changes to Git, so there’s an audit log of that deployment.
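The first step of such a tool, pointing a manifest at the new version, can be sketched as a small text transformation. This is only an illustration under assumptions and not Roundforest’s tool: the `setImageVersion` function, the registry name, and the abbreviated manifest are hypothetical.

```typescript
// Rewrites the tag of one container image inside a Kubernetes manifest.
// Works on the raw YAML text so the sketch needs no YAML library;
// it only handles unquoted "image: name:tag" lines.
function setImageVersion(manifest: string, image: string, version: string): string {
  // Escape characters in the image name that are special in a regular expression.
  const escaped = image.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match "image: <name>" with an optional ":<tag>", one line at a time.
  const pattern = new RegExp(
    `^([ \\t]*(?:-[ \\t]+)?image:[ \\t]*${escaped})(?::\\S+)?[ \\t]*$`,
    "gm",
  );
  // Keep everything up to the image name and append the new tag.
  return manifest.replace(pattern, `$1:${version}`);
}

// Abbreviated, hypothetical manifest for illustration only.
const before = [
  "apiVersion: apps/v1",
  "kind: Deployment",
  "spec:",
  "  template:",
  "    spec:",
  "      containers:",
  "        - name: catalog-service",
  "          image: registry.example.com/catalog-service:1.4.1",
].join("\n");

const after = setImageVersion(before, "registry.example.com/catalog-service", "1.4.2");
// The last line of `after` is now:
// "          image: registry.example.com/catalog-service:1.4.2"
```

A real tool would then hand the updated file to Kubernetes (for example with `kubectl apply -f`) and commit the change to Git, which is what gives the audit log.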

Given that CI and CD are done locally, the developer has total control over the whole process. And that control ensures speed, as well as ensuring that if there’s a problem somewhere in the build or deploy, the troubleshooting is easier than if those processes ran in a CI/CD server.

Speed

Once you work in an environment where you can focus on code that is small in size, and you have control over the development and deployment process, speed becomes a side effect that you get from the other two traits.

If a microservice, microfrontend, or a library is small enough to focus on, then its set of tests is (almost always) very fast to run, as we saw in the section about Control. And if the developer is in control of the CI and the CD, then the time it takes to develop and deploy a feature does not depend, again, on the size of the codebase, and is usually very fast.

Fast is obviously relative to the difficulty of the feature, but what one can say absolutely is that development at Roundforest does not have the regular friction developers encounter when writing features for a codebase that is monolithic.

Note: while there is no friction that comes from the size of the codebase, there will still be friction from the complexity of the architecture and design of the app. Modular code architecture does not help with that, and we still need good architects and designers that KEEP IT SIMPLE. But that’s a story for another blog post.

Safety

“Move fast and break things” was an old motto of Facebook, which they abandoned a few years ago. And rightly so: you cannot move fast if you break things along the way. So if you want speed, you must also have safety. Safety to add a feature and know you’re not breaking anything; safety to remove a feature and know that you’re not breaking anything; safety to refactor code and know you’re not breaking anything; safety to deploy knowing that you probably are not going to break anything; and safety in knowing that if you did break something, you’ll know about it immediately, and can roll back the change.

How do we make coding safe? One word: tests. Roundforest developers write tests. Much of their coding time is spent writing tests. This is a tax that you pay for modular code architecture, but if you don’t pay it this way, by writing tests, you pay it in other ways. So we pay it gladly.

All packages have tests: tests that check the code of that package, and that package only. We do not “assume” that because another package uses this package, it does not need to be tested. We test it.

These tests allow us to change code without worrying that this change will break anything. So we can add and refactor very quickly, with lots of confidence. And because after each deployment we run a set of tests in production, we can know with a high degree of confidence that the deploy we did didn’t break anything. And if it did, we can easily roll back the change because the deployment is always focused: just one microservice that can be easily rolled back using Kubernetes.
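The deploy, test in production, and roll back loop described above can be sketched as follows. This is an illustration under assumptions and not Roundforest’s actual tooling: the `DeploySteps` hooks and the `deployWithSafetyNet` function are hypothetical, and the real deploy, test, and rollback commands would be injected in place of the in-memory fakes.

```typescript
// Hypothetical hooks: the real deploy, test, and rollback steps are injected.
interface DeploySteps {
  deploy: (version: string) => Promise<void>;
  runProductionTests: () => Promise<boolean>;
  rollback: (toVersion: string) => Promise<void>;
}

interface DeployResult {
  status: "deployed" | "rolled-back";
  liveVersion: string;
}

// Deploys the new version, checks it with production tests,
// and rolls back to the previous version if the tests fail or throw.
async function deployWithSafetyNet(
  steps: DeploySteps,
  currentVersion: string,
  newVersion: string,
): Promise<DeployResult> {
  await steps.deploy(newVersion);

  let healthy: boolean;
  try {
    healthy = await steps.runProductionTests();
  } catch {
    // A crashing test run is treated the same as a failing one.
    healthy = false;
  }

  if (healthy) {
    return { status: "deployed", liveVersion: newVersion };
  }
  await steps.rollback(currentVersion);
  return { status: "rolled-back", liveVersion: currentVersion };
}

// In-memory fakes that simulate a deploy whose production tests fail.
const log: string[] = [];
const fakeSteps: DeploySteps = {
  deploy: async (version) => { log.push(`deploy ${version}`); },
  runProductionTests: async () => false,
  rollback: async (toVersion) => { log.push(`rollback ${toVersion}`); },
};

const outcome = deployWithSafetyNet(fakeSteps, "1.4.1", "1.4.2");
// `outcome` resolves to { status: "rolled-back", liveVersion: "1.4.1" }
```

Because only one microservice is deployed at a time, the rollback target is always a single, known previous version.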

What is our testing methodology? That’s the subject of another whole series of blog posts!

IMPORTANT NOTE: without tests that have good coverage and give safety to change code and deploy the changes, all the advantages of a modular code architecture disappear. Confidence from tests is crucial to how we do things at Roundforest. It wouldn’t work otherwise.

Frequency

Once you have that speed, and that safety, nothing stops a Roundforest developer from changing the code often, and deploying often. That is what we do: we write part of the feature, and deploy. Then we write some more code of that feature, and deploy again. We continue to do this in small increments until that feature is complete.

Ironically, and very differently from development teams at other companies, we feel safe only when our code is in production, because once it’s in production we know for certain that it works! This is why a Roundforest developer will deploy their code often.

This ensures even more safety, because the smaller the deltas that are deployed, the less risk they carry. And because we’re deploying often, and in little increments, the speed of development also grows.

In essence, high frequency deployments change the way you write and deploy code. And this change generates even more safety and speed. High frequency deploys are a game changer.

Reducing Friction

It’s not just modularity that helps reduce friction of the development process. Sometimes it’s unrelated things that are only indirectly connected to modularity. Here are some of those we implemented at Roundforest:

Templates for Packages

As we saw in the section on monorepos, creating a package is just a “mkdir” operation. But if we want to encourage developers to create packages, we need a way to “scaffold” a package, and make it really easy to create a “starter” package for all kinds of purposes.

So we created a scaffolding tool that is easy to use and easy to create templates for. A Roundforest developer, after creating the directory, can just run:

npm init @roundforest <template-name>

And a package for that template will appear, ready to install, build, and test. The package will also include tests, so all the developer needs to do is start modifying.
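For context, `npm init @roundforest <template-name>` makes npm run the `@roundforest/create` package and pass it the template name. What that initializer does internally is not described here, so the sketch below is only one plausible shape for the core of a scaffolding tool: filling placeholders in a set of template files. The `Template` type, the `renderTemplate` function, the `{{name}}` placeholder syntax, and the file contents are all hypothetical.

```typescript
// Hypothetical template: file paths mapped to contents with {{placeholders}}.
type Template = Record<string, string>;

// Fills {{placeholders}} in both file paths and file contents.
// Placeholders without a matching value are left untouched.
function renderTemplate(template: Template, values: Record<string, string>): Template {
  const fill = (text: string): string =>
    text.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string): string => {
      // Only use keys that were explicitly provided, not inherited properties.
      const value = Object.prototype.hasOwnProperty.call(values, key)
        ? values[key]
        : undefined;
      return value ?? whole;
    });

  const rendered: Template = {};
  for (const [path, content] of Object.entries(template)) {
    rendered[fill(path)] = fill(content);
  }
  return rendered;
}

// Hypothetical "library" template, for illustration only.
const libraryTemplate: Template = {
  "package.json": '{ "name": "{{name}}", "version": "1.0.0" }',
  "src/{{name}}.js": "export function hello() { return 'hello from {{name}}' }",
  "test/{{name}}.test.js": "// tests for {{name}} go here",
};

const files = renderTemplate(libraryTemplate, { name: "pricing-lib" });
// Object.keys(files) → ["package.json", "src/pricing-lib.js", "test/pricing-lib.test.js"]
```

A real initializer would then write these files into the newly created directory, so the package is ready to install, build, and test.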

We have template packages for:

Trunk-based development

Another practice that is widely used in lots of companies, but isn’t at Roundforest, is pull requests and branches. Pull requests are a kind of gating mechanism whereby code is not merged into the main branch until it is reviewed.

This has big advantages, and there are enough blog posts discussing the advantages of code reviews and pull requests. Unfortunately, there is one disadvantage that is big enough to make us not use pull requests: gating code creates friction in the development process. It stops developers in their tracks. And this is especially true in a modular code architecture, where a developer wants to build and publish one package before working on the next.

So we don’t do pull requests. A developer pushes to the main branch and only to the main branch. We still do code reviews, but these are post-push code reviews, and if there are comments to fix, the developer fixes them later.

This ability to push directly to the main branch is possible because of the safety of our development process. Roundforest developers can push to main and deploy to production, because there are safety mechanisms like tests that guard them.

Checking a feature is done only in tests

How does one encourage developers to write tests? Lots of best practices around this have been built, but at Roundforest we’ve found a simple and infallible method to do this (at least for the backend): do not allow developers to run the app on their machine!

We jokingly say to a new developer that if they want to write the code for a feature and then deploy immediately to production without running the microservice or writing the tests, they’re entitled to!

Obviously, nobody chooses to do that; they opt for the alternative: writing tests for the feature. And that is why we have good test coverage: there is no other way to check that the code you wrote works.

No end-to-end tests for an app

One interesting thing we found out during the implementation of a modular architecture is that those big end-to-end tests that test the whole app together become unnecessary. We found that testing each microservice/microfrontend separately is enough.

It’s not that we don’t want a big E2E test that spans the whole application. Having one would be nice to test features that span microservices. But the cost of creating such tests is prohibitive while the value of these tests is low.

We found that adding production tests after each deployment, coupled with a very fast rollback mechanism in case the test fails, is enough. The failures happen only a few times a year, and the disruption is minimal. The alternative—to write and maintain large, slow, and flaky tests that need to be run before each flow—is much worse, and probably won’t catch the bugs a production test can catch, because it’s not really running in production and is using some kind of mock data.

Next

Want to continue on to the important part? I’ve saved the best for last: monorepos and how they are crucial to implementing a modular code architecture and reducing developer friction. Read all about this in the next part of the series, here.