IT & Engineering

How we realized a 30x performance improvement in MongoDB read preference

Performance metrics can be sensitive. When response times suddenly spike – figuring out why can be a challenge. Is it your servers? Your code? Or is it a tiny setting or best practice buried somewhere in MongoDB…

PUBLISHED ON

PUBLISHED ON

You know what they say…the more redundant you are, the more redundant you are.

Replica sets are database processes that provide high availability and reduce server failover through redundancy. They are the foundation for production deployments and aren’t really intended to be stand-alone tools that improve your response time and performance metrics. At Mailgun, we noticed high response times in one of our nodes that supports these processes. We hate being slowed down – especially when it doesn’t make sense. So, we set out to fix it.

Abnormal response times: Finding root cause

Back in 2021, we noticed some strange performance issues in our response times for USW2. Why are we bringing this up now? Well, we’re still benefiting from the updates we made and we’re looking to share the love.

We first noticed our rate limit service (based of Gubernator) was showing consistent response times of 150ms for our processes in USW2. Weirdly, the same code was averaging around 5ms in the 99th percentile of requests. This latency was particularly odd since we average around a 2ms response time in the 50th percentile. So, there was something there we really needed to investigate.

Our starting point was to compare metrics between USW2 and USW1. We added some additional metrics to report how long it took to fetch rate limit definitions from MongoDB (when the cache missed), which allowed us to get a better understanding of what was going on.

Graph shows 150ms response time trend in USW2.

The resulting graph confirmed the 150ms trend in USW2 but also showed 2-5ms response times from MongoDB in USE1. This isolated the issue, confirming that something was up with the MongoDB configuration affecting USW2 specifically.

MongoDB performance: Documentation insights

So, that was ‘Step 1: Locating the issue’ completed.

Next, we took a deep dive into MongoDB documentation. The eureka moment came once we understood the MongoDB main topology and discovered the MongoDB default read preferences. MongoDB clients prefer the primary instance in a cluster when performing reads. It’s not always nice when your DB plays favorites.

Image illustrates USW2 reading cross region to the primary node in USE1.

In the default MongoDB client configuration, the client in USW2 was reading cross region to the primary node in USE1. MongoDB defaults are set to where operations read from secondary members, unless the set has a single primary.

Their documentation indicated that if we were to use readPreference=secondaryPreferred in our MongoDB client, we would be able to connect to the local MongoDB nodes instead of only the primary (USE1).

Image shows how updating read preference to “equals secondary preference” allows us to connect to the local nodes instead of going across region.

Perfect. This structuring would allow us to optimize performance, spreading reads across multiple secondary servers where each server is responding to fewer read requests. We thought that updating our read preference would solve the issue but it’s never that simple, is it? Sometimes the situation is more of a clusterf@%* (pun intended).

Troubleshooting the Golang Mongo Client 

We really thought we nailed it with those read preferences but now we were failing to connect to the cluster using the URI:

Turns out, we were using an older version of the MongoDB client (go.mongodb.org/mongo-driver v1.0.2) that didn’t support our TLS options. Once we figured that out, we were finally able to deploy and see if the update to our read preference had any impact once we upgraded to go.mongodb.org/mongo-driver v1.7.3, and fixed a minor URI format issue with our framework:

PSA: If your URI Options are not working and the Mongo driver doesn’t give you an error, you are running an older version of Mongo and will need to update.

MongoDB performance improvement

This finally did the trick. The effect was dramatic, resulting in a ~30x improvement (response time of 5-10ms in the 99th percentile) in read performance for USW2.  Not bad.

Graph shows a 30-time improvement and response times hovering between 5-10ms in the 99th percentile.

Our solution started with an investigation of fetch rates to isolate the affected server and led us to optimize our read preference logic based on MongoDB client documentation. Ultimately, we discovered this was also a version issue and we needed to update our MongoDB Go Driver to successfully connect to the cluster URI and improve our response time.

How we continue to optimize

Diagnosing root cause is part of the dev journey and resolving performance issues is done a layer at a time. If you’re experiencing similar latency in your metrics, start with assessing your servers, rate limiting services, and don’t forget to check the feature supports of your Mongo driver version.

Investigations and updates like this are how we work to create a better overall system and user experience. So, if you want to hear more from our Engineering team, be sure to subscribe so you don’t miss out on future stories and operation insights.

Keep me posted! Get our news and tips every week.

Send me the newsletter. I expressly agree to receive the newsletter and know that I can easily unsubscribe at any time.

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Related readings

Popular posts

Email inbox.

Build Laravel 10 email authentication with Mailgun and Digital Ocean

When it was first released, Laravel version 5.7 added a new capability to verify user’s emails. If you’ve ever run php artisan make:auth within a Laravel app you’ll know the...

Read more

Mailgun statistics.

Sending email using the Mailgun PHP API

It’s been a while since the Mailgun PHP SDK came around, and we’ve seen lots of changes: new functionalities, new integrations built on top, new API endpoints…yet the core of PHP...

Read more

Statistics on deliverability.

Here’s everything you need to know about DNS blocklists

The word “blocklist” can almost seem like something out of a movie – a little dramatic, silly, and a little unreal. Unfortunately, in the real world, blocklists are definitely something you...

Read more

See what you can accomplish with the world's best email delivery platform. It's easy to get started.Let's get sending
CTA icon