Mailgun on Performance Cloud Servers

PUBLISHED ON

This was posted on August 28, 2014.

Until recently, Mailgun ran in a managed colo environment hosted by our parent company, Rackspace, using bare metal servers. This decision goes back to many of the same philosophical reasons that led Rackspace to launch OnMetal servers in July. We weren’t comfortable deploying Mailgun on the public cloud given the reliability and performance problems that you typically need to engineer around. This changed when Rackspace rolled out Performance Cloud Servers in November 2013.

In this post, we’ll provide some background on our rationale for the change and on how the performance of our infrastructure changed as a result.

Managed Colo

We were pretty happy with our managed colo deployment, and we aren’t the only ones. There are many arguments for using dedicated hardware over cloud, but the biggest advantage is having complete control of your hardware and networks, which cannot be achieved with cloud deployments (yet). Other arguments against using cloud over dedicated include reliability and performance: cloud is perceived as less robust and less performant when it comes to I/O-intensive loads.

A typical Mailgun deployment would look like this:

In this diagram, the majority of traffic hits a highly available (HA) pair of F5 5000s load balancers, which route all traffic to our dedicated API servers.

F5 Load Balancers

These F5 load balancers are pretty performant hardware servers.

They are capable of handling:

  • L7 requests per second: 750K

  • L4 connections per second: 350K

  • L4 HTTP requests per second: 3.5M

  • Maximum L4 concurrent connections: 24M

  • Throughput: 30 Gbps/15 Gbps L4/L7

In addition to that, these beasts deal with SSL termination and are capable of mitigating some DDoS attacks.

R720

Dell R720s are Mailgun workhorses, used as both database and processing servers. They are equipped with 64GB of RAM, one or more 15K RPM SAS drives (depending on configuration), and one or two 10Gb/s NICs.

Cloud

Cloud servers were used for auxiliary tasks such as logging, creating backups, and running various jobs; 90% of the environment was located on dedicated hardware.

Overall, the Mailgun team was pleased with the existing state of things. So, why did we migrate?

Reasons to Migrate

Performance

There are different clouds available, but we’ve been most excited about one particular cloud – Rackspace Performance Cloud. Here are the benchmarks comparing the performance of the previous generation servers to the new Performance Cloud Servers that show a huge improvement:

These benchmarks indicate that we can start using Cloud Servers to host our Cassandra clusters as SSDs are adding a huge boost in speed:

In addition to that, the new cloud boxes are more performant than the standard R720s that we’ve been setting up in our managed colo environments. For example, the UnixBench score for our R720s is 1219, compared to an impressive 4876 on the new Performance Cloud Servers.
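To put the two UnixBench figures side by side, here is a minimal sketch that computes the ratio between them. Only the two scores come from the post; the function and variable names are illustrative.

```python
# UnixBench scores quoted in the post.
R720_SCORE = 1219
PERFORMANCE_CLOUD_SCORE = 4876


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times higher the candidate score is than the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return candidate / baseline


ratio = speedup(R720_SCORE, PERFORMANCE_CLOUD_SCORE)
print(f"Performance Cloud Servers score {ratio:.1f}x the R720 baseline")
```

The ratio works out to 4x. A single UnixBench index is a coarse summary, so it says little about any one specific workload.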

API

Rackspace Cloud Servers can be deployed rapidly and operated through an API, which really helps with automated provisioning, auto scaling, and all the usual perks of operating in the cloud. (Note: Rapid API-based provisioning is also available for bare metal servers through Rackspace’s OnMetal offering. We look forward to augmenting our infrastructure with those beefy servers in the future.)
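As a rough illustration of what API-driven provisioning can look like, the sketch below builds "create server" request bodies in the shape used by the OpenStack Compute API. It assumes the Cloud Servers API accepts that shape, and it is not Mailgun's actual tooling. The helper names, server names, and placeholder image and flavor IDs are all hypothetical, and nothing is sent over the network.

```python
import json


def build_server_request(name: str, image_ref: str, flavor_ref: str) -> dict:
    """Build an OpenStack-Compute-style 'create server' request body."""
    if not name:
        raise ValueError("server name must not be empty")
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,    # ID of the image to boot from
            "flavorRef": flavor_ref,  # ID of the server size (flavor)
        }
    }


def build_fleet(prefix: str, count: int, image_ref: str, flavor_ref: str) -> list:
    """Build request bodies for `count` identically sized servers."""
    return [
        build_server_request(f"{prefix}-{i:02d}", image_ref, flavor_ref)
        for i in range(1, count + 1)
    ]


# Placeholder IDs: real values would come from the provider's image and flavor lists.
fleet = build_fleet("api", 3, "IMAGE-ID-PLACEHOLDER", "FLAVOR-ID-PLACEHOLDER")
print(json.dumps(fleet[0], indent=2))
```

Each body would then be sent as an authenticated POST to the compute endpoint's servers resource. Keeping request construction separate from sending makes the provisioning logic easy to test without touching a real account.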

Concerns

We still had two major concerns that held us back from migrating to the cloud.

Networking

Cloud networking may be a bottleneck, especially when it comes to high-frequency packet exchange. We saw huge performance degradation with Redis once we hit 5-6K packets per second.

Throughput was also a major concern: 1Gb/s was not enough.

This was a major roadblock for us until Rackspace released dual, bonded 10Gbps non-virtualized networking for Performance Cloud Servers, plus separate NICs for Cloud Block Storage (CBS). Our benchmarks showed that the new networking is robust and does not suffer from the degradation that the previous software networking did.
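The packet-rate figure mentioned above can be checked with a small calculation over two readings of a cumulative packet counter. This is only a sketch: the sample values, the `CounterSample` type, and the choice of 5,000 packets per second as the warning threshold (the lower end of the 5-6K range) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    timestamp: float  # seconds since an arbitrary start
    packets: int      # cumulative packet count at that time


def packets_per_second(first: CounterSample, second: CounterSample) -> float:
    """Average packet rate between two cumulative counter samples."""
    elapsed = second.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (second.packets - first.packets) / elapsed


# Assumed threshold: lower end of the 5-6K packets/s range quoted in the post.
DEGRADATION_THRESHOLD_PPS = 5000


def in_danger_zone(rate: float) -> bool:
    """True if the rate is at or above the assumed degradation threshold."""
    return rate >= DEGRADATION_THRESHOLD_PPS


# Made-up samples taken ten seconds apart.
before = CounterSample(timestamp=0.0, packets=1_000_000)
after = CounterSample(timestamp=10.0, packets=1_055_000)
rate = packets_per_second(before, after)
print(f"{rate:.0f} packets/s, danger zone: {in_danger_zone(rate)}")
```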

Reliability

The internet is full of stories of cloud instability bringing entire businesses down, so it’s kinda scary to rely on it. However, the Cloud Servers team promised robust reliability relative to typical cloud providers, so we decided to test it out.

New Mailgun Environment

Our new environment uses the same type of load balancers, but in this case they are connected via Rackspace’s RackConnect and route all traffic directly to the cloud.

Networking

We spent several weeks load testing this link, hitting our SMTP and HTTP API servers in the cloud, and didn’t notice any performance degradation.

Reliability

A typical Mailgun Performance Cloud Servers deployment uses around a hundred servers per region. Over the last three months, two servers went down due to problems with the host server, which is roughly equal to what we experienced on our dedicated servers, where around 1 box out of 100 went down every month.

We should note, though, that Mailgun uses large and extra large (64GB and 128GB) flavors of Performance Cloud Servers, so you may observe different results when choosing smaller flavors.
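To compare the failure figures above on the same footing, the following sketch converts both into failures per server-month. It treats "around a hundred servers" as exactly 100, which is an assumption, and the function name is illustrative.

```python
def monthly_failure_rate(failures: int, servers: int, months: float) -> float:
    """Failures per server per month over the observation window."""
    if servers <= 0 or months <= 0:
        raise ValueError("servers and months must be positive")
    return failures / (servers * months)


# Cloud: 2 failures across ~100 servers over 3 months (fleet size assumed to be 100).
cloud_rate = monthly_failure_rate(failures=2, servers=100, months=3)

# Dedicated: around 1 box out of 100 per month.
dedicated_rate = monthly_failure_rate(failures=1, servers=100, months=1)

print(f"cloud:     {cloud_rate:.2%} per server-month")
print(f"dedicated: {dedicated_rate:.2%} per server-month")
```

With so few failures in the sample, the gap between the two rates is well within noise, which fits the "roughly equal" reading.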

Drawbacks

One drawback that we’ve seen so far is the Rackspace policy for DC-wide maintenance. If you can’t tolerate downtime, this may require your application to be multi-DC from the start, as a maintenance window can lead to multi-second downtime for a significant portion of an environment. We experienced one such maintenance in April, but thankfully, we have multiple environments to fall back on.

Overall Results

Overall, we have been quite pleased with our migration. We can now use the Rackspace Cloud API to provision our servers, with the additional benefit of a more performant fleet compared to our managed colo.
