Greetings, fellow devs. We are thrilled to announce that after three long years of planning, designing, and building, our team has released our righteous app update. Sure, we’ve been toiling tirelessly to update the look and feel of our UI, but we’ve done an even more bodacious overhaul of our foundational technology stack.
What does this mean for you? Better performance, a better interface, and a foundation that scales for future growth. We’re journeying back in time to bring you the full report on how we made it happen. Join us for the ride.
But wait! Before we jump back in time, it’s important to introduce our improved app. “Improved” is an understatement. It may surprise you that this UI and performance facelift was not a revenue driver. So, why did we do it?
Two reasons:
What we’ve created is Portal, an epic interface to better connect you with your deliverability products and tools now, and as we scale. Portal lets you move between our Mailgun and Mailgun Optimize products with ease and serves up a highly performant experience for our power users who send tens of thousands of emails a day.
So, stick with us as we geek out, dial in, and journey through the past three years of our development process.
We started by creating a laundry list of all the different things we wanted – it ran the gamut from updating our repository structure to implementing a new deploy system, and we knew we were going to have to build it from the ground up as our SRE team was migrating from AWS to Google Cloud. We also wanted to move away from our tightly coupled monolithic design to a more distributed design. This would make everything easier for the team moving forward: from developing comprehensive tests to targeting rebuilds without needing to wade through or change all the code.
With any redesign we always want to consider the user, but a big part of the drive behind our app update was also to improve the developer experience: significantly reducing the complexity of getting data from the API to the rendered page, and improving maintainability by reducing the total number of patterns in our toolkit.
Stacks can become outdated, and our original front-end stack was built nearly a decade ago – which is 350 in internet years. It included a Python Flask web server that probably handled more tasks than it should and had very tightly coupled state shared with the client. It used multiple layers of abstraction, originally intended to ease getting data from our APIs.
As that surface grew larger and larger, the patterns provided more overhead than convenience. It became very difficult to make foundational changes because there were so many places that could be impacted. By this point in time, our front-end client had become a museum of obsolete React patterns.
All our interactions with our own public and private APIs were hidden behind multiple layers of excess code complexity. This created a giant chasm between the Mailgun front-end developers and their understanding of our APIs and customer experiences.
The kludge of API abstractions not only resulted in excess complexity, it significantly inhibited our ability to scale our application. We knew that we wanted to eventually bring more and more products into the codebase, so we’d need to totally rethink how we consumed our data.
When we were planning the new framework, we knew we wanted to be able to make API requests directly from the client to the public APIs with as little middleware as possible. Why? This way, we would be able to understand and consume our APIs in the same way our customers do by “eating our own dog food” as they say. I don’t know about you, but I’d rather eat steak than dog food.
Another significant change we wanted to make was to replace Redux, which managed our app-wide state (around 150 states). Each state mapped to a network call or a chain of data transformations, and our legacy structure didn’t really lay out guidelines for how to use it. The result? A lot of redundancies.
When a developer introduced a call to manipulate data, they had to work through three levels of abstraction. Additionally, we had our own custom clients for our browser and for our Flask server (which interfaced with all our APIs). So, for a developer to add a new feature, they would have to go through around seven levels of abstraction in total, with bugs possible at any level.
What we eventually discovered was that 95% of our app-wide state was just a glorified network cache. Only we weren’t even reaping the benefits of cached data: we often made redundant calls for data we already had, causing nested layers of needless re-renders in our view components. Talk about a bumpy ride.
This system breakdown explains things a bit more clearly, and while we would never speak badly against our origins… a facetious picture is worth a thousand words.
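To make that concrete, here’s a minimal sketch of the kind of plumbing the old pattern demanded for a single piece of server data. The names, routes, and shapes are illustrative, not our actual code, and the real thing had even more layers once the custom clients and Flask proxy were involved.

```ts
// A sketch of the legacy pattern – names, routes, and shapes are illustrative.
// One piece of server data touched every layer below.

// 1. Action type constants
const FETCH_DOMAINS_REQUEST = 'domains/FETCH_REQUEST';
const FETCH_DOMAINS_SUCCESS = 'domains/FETCH_SUCCESS';

// 2. Action creators
const fetchDomainsRequest = () => ({ type: FETCH_DOMAINS_REQUEST } as const);
const fetchDomainsSuccess = (domains: string[]) =>
  ({ type: FETCH_DOMAINS_SUCCESS, payload: domains } as const);

// 3. A thunk that calls the API client (which in turn wrapped a server-side proxy)
const fetchDomains = () => async (dispatch: (action: unknown) => void) => {
  dispatch(fetchDomainsRequest());
  const response = await fetch('/internal/proxy/domains'); // illustrative internal route
  dispatch(fetchDomainsSuccess(await response.json()));
};

// 4. A reducer keeping what is really just a network cache in global state
interface DomainsState {
  loading: boolean;
  items: string[];
}
const initialState: DomainsState = { loading: false, items: [] };

function domainsReducer(state = initialState, action: any): DomainsState {
  switch (action.type) {
    case FETCH_DOMAINS_REQUEST:
      return { ...state, loading: true };
    case FETCH_DOMAINS_SUCCESS:
      return { loading: false, items: action.payload };
    default:
      return state;
  }
}

// 5. A selector consumed by view components
const selectDomains = (state: { domains: DomainsState }) => state.domains.items;
```

Multiply something like this by roughly 150 states and the “glorified network cache” problem becomes hard to ignore.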
Our team is the kind of team that builds on itself and iterates, but for this app rebuild we had to rewind and rework some foundational components of our infrastructure – starting with source code control.
We used a polyrepo organization style, which just means that each front-end project had its own repository, even though many of the projects repeated the same tasks and reproduced the same features. Polyrepos become a challenge to maintain as the number of projects grows. In many cases, we had to deploy each repository simultaneously with the same updates to ensure a consistent experience across our products.
From a maintenance perspective, this made it difficult for teams to locate and work with the code they needed. Managing multiple repos with redundant features usually means slower, more resource-draining workflows, which is a big problem if you frequently push and pull code. And because features can be distributed across multiple codebases, polyrepos can make finding bugs harder than finding the proverbial needle in a haystack.
With our pre-rebuild polyrepo structure, we knew the future would mean hunting through repos with a flashlight looking for what we needed, and that collaborating, scaling, and sharing code would only get harder as we grew.
These were our main concerns:
When it comes to source code management, there are a lot of solutions, but we decided that a modular, monorepo structure would alleviate the majority of our development pains going forward.
A monorepo approach involves storing all of a project’s code in a single, large repository. This would solve the code sharing and duplication problems we experienced with our polyrepo organization, but monorepos aren’t a silver bullet: if the code is tightly coupled, a monorepo still has to be designed with modularity in mind.
A modular approach involves dividing a large project into smaller, independent modules that can be developed, tested, and maintained separately. This allows for greater flexibility and reusability, as well as the ability to easily update individual modules without affecting the rest of the project.
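As a hedged illustration of what that modularity buys us (the package names and helper below are hypothetical, not our actual libraries), a shared module lives in one place and every app in the monorepo consumes the same version:

```ts
// packages/metrics/src/index.ts – a small shared module that can be developed
// and tested on its own (package and path names are hypothetical)
export function formatDeliveryRate(delivered: number, sent: number): string {
  return sent === 0 ? '0%' : `${((delivered / sent) * 100).toFixed(1)}%`;
}
```

```ts
// apps/portal/src/deliverability.ts – any app in the monorepo imports the same
// package, so the logic exists in exactly one place and ships as one version
import { formatDeliveryRate } from '@portal/metrics';

export const badgeLabel = (delivered: number, sent: number): string =>
  `Delivery rate: ${formatDeliveryRate(delivered, sent)}`;
```

A fix to the shared module reaches every consumer in a single change, instead of the same edit being replayed across a pile of repositories.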
We knew that a modular, monorepo solution would allow us to be more efficient with:
Hands down, the biggest challenge of living in a monorepo (monolithic or otherwise) is that the more people contribute, the more likely we are to step on each other’s toes. What good is an overhauled dev experience if your code gets overridden in a bad merge? If we were truly doing a holistic overhaul of our stack, then we also had to look at how we could improve our deployment process. These were the most pressing challenges:
Projects of any kind have challenges, both expected and surprising. We knew we needed to plan to overcome some problems, and that the new version of the app needed to focus on performance from multiple angles.
| Identified problems | Proposed solutions |
|---|---|
| **Unclear code conventions.** Code conventions refer to the set of coding styles, solutions, and reusable patterns within a codebase. Imagine a team of carpenters working on a project: if they keep 1,000 specialized tools on the jobsite, it’s likely that a worker will choose the wrong tool for the job or use it improperly. And unlike physical tools, software dependencies can become deprecated, inhibiting future updates to the codebase. | **Code style guide.** Maintaining explicit coding style guides and a minimal set of simple, powerful patterns helps reduce redundancy, inconsistency, and complexity, and can even improve performance. Reusing a small set of generalized solutions and patterns makes code easier to understand and reduces the likelihood of hard-to-find bugs. |
| **Overburdened state.** We used Redux as a one-size-fits-all solution for global application state, but 90% of our Redux store was just glorified network caching. Redux requires multiple layers of overhead for every state member, which resulted in many redundancies and needless complexity. | **Separate network cache and app state.** Separating network cache features from application state lets us keep a much smaller state footprint, and state can be implemented more locally. Reduced overhead improves performance and gives developers more control over how the data is used. We chose React Query as the interface to our backend APIs, allowing us to easily manage data asynchronously and reduce network overhead. |
| **Multi-layered network architecture.** Our application grew organically from a basic API-centric control panel to the multi-product suite of customer tools we have today. Initially, the client was built on top of common Python tooling shared by many of our API backends, which led to numerous middleware layers for even the simplest API call. This not only made it difficult to understand any given piece of data’s origin, it also made bugs in the network stack very time-consuming to find. | **Public API first.** Rather than passing requests through multiple proxies, auth layers, and Python interfaces, we decided early on that wherever possible we’d fetch the data we need directly from the client using the publicly available APIs. This not only significantly reduced our overhead and complexity, it also helped us see our API services from our customers’ perspective. We intend to take this initiative further in the near future by making more of our APIs public. |
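Here’s a minimal sketch of what the “public API first” plus React Query combination looks like in practice. The endpoint, response shape, and auth handling are illustrative assumptions, not the exact Portal implementation:

```ts
import { useQuery } from '@tanstack/react-query';

// Illustrative response shape – not the real API contract.
interface DomainStats {
  delivered: number;
  failed: number;
}

// Call the public API directly from the browser – no server-side proxy in between.
// The endpoint and basic-auth scheme here are placeholders.
async function fetchDomainStats(domain: string, apiKey: string): Promise<DomainStats> {
  const res = await fetch(`https://api.example.com/v3/${domain}/stats`, {
    headers: { Authorization: `Basic ${btoa(`api:${apiKey}`)}` },
  });
  if (!res.ok) throw new Error(`Stats request failed with status ${res.status}`);
  return res.json();
}

// React Query owns caching, deduping, and refetching, so none of this data
// needs to live in a global Redux store.
export function useDomainStats(domain: string, apiKey: string) {
  return useQuery({
    queryKey: ['domain-stats', domain],
    queryFn: () => fetchDomainStats(domain, apiKey),
    staleTime: 60_000, // treat data as fresh for a minute to avoid redundant calls
  });
}
```

The query cache replaces what used to be a Redux slice, dedupes concurrent requests, and only re-renders the components that actually consume the data.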
We have written hundreds of thousands of lines of code over the last decade. With the scale and scope of this new plan, it would take nearly as many years to port and rewrite our old features into the new paradigm. This would be a non-starter. We needed to find a way we could continue to incorporate these existing features without compromising our objectives for the new stack. We found a solution in a technology called module federation.
With Webpack’s module federation plugin, we could create a new build of our old application with minimal changes, one that would be able to integrate existing features into the new Portal stack. We’d still have a long-term objective to port the now-legacy code to newer standards, but this would allow us to start writing new features with the new benefits immediately.
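For a sense of the mechanics, here is a hedged sketch of what the legacy build’s Webpack configuration could look like. The module, file, and package names are made up for illustration; they are not our actual build settings:

```ts
// webpack.config.ts for the legacy build – a sketch only.
import webpack from 'webpack';

const config: webpack.Configuration = {
  mode: 'production',
  plugins: [
    new webpack.container.ModuleFederationPlugin({
      name: 'legacyApp',            // how the host build refers to this bundle
      filename: 'remoteEntry.js',   // manifest the Portal shell loads at runtime
      exposes: {
        // legacy features published as federated modules
        './DomainSettings': './src/features/domain-settings',
      },
      shared: {
        // share single copies so React isn't duplicated across builds
        react: { singleton: true },
        'react-dom': { singleton: true },
      },
    }),
  ],
};

export default config;
```

On the other side, the Portal shell would declare this build as a `remote` and load `remoteEntry.js` at runtime, letting old features render inside the new app without a wholesale rewrite.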
We knew that consolidating our platforms while continually growing features and product sets would require a massive effort. Our planning shaped both what we wanted for our new front-end stack and how we were going to manage the undying legacy of antiquated CSS styling systems, deprecated class-component patterns, and clunky app-state management.
Because we were building from the ground up – especially with the move from AWS to Google Cloud – the best solution was a clean break: a new repo with higher standards, module federation, modular shared libraries, and more comprehensive user-facing and internal documentation.
It was an intense planning and preparation stage. As with all big projects, generating the ideas and proof of concept was a technical and creative endeavor that involved only a few select people.
Phase 2? Turning the idea into reality – and that involved a lot more teams. Devs, I believe our adventure is about to take a most interesting turn.