Mailgun on Performance Cloud Servers

This was posted on August 28, 2014.

Until recently, Mailgun ran in a managed colo environment hosted by our parent company, Rackspace, using bare metal servers. This decision goes back to many of the same philosophical reasons that led Rackspace to launch OnMetal servers in July. We weren’t comfortable deploying Mailgun on the public cloud, given the reliability and performance problems that you typically need to engineer around. This changed when Rackspace rolled out Performance Cloud Servers in November 2013.

In this post, we’ll provide some background on our rationale for the change and on how the performance of our infrastructure changed as a result.

Managed Colo

We were pretty happy with our managed colo deployment, and we aren’t the only ones. There are many arguments for using dedicated hardware over cloud, but the biggest advantage is having complete control of your hardware and networks, which cannot be achieved with cloud deployments (yet). Other arguments against choosing cloud over dedicated include reliability and performance: cloud is perceived as less robust and less performant when it comes to I/O-intensive loads.

A typical Mailgun deployment would look like this:

In this diagram, the majority of traffic hits a highly available (HA) pair of F5 5000s load balancers, which route all traffic to our dedicated API servers.

F5 Load Balancers

These F5 load balancers are pretty performant hardware servers.

They are capable of handling:

  • L7 requests per second: 750K

  • L4 connections per second: 350K

  • L4 HTTP requests per second: 3.5M

  • Maximum L4 concurrent connections: 24M

  • Throughput: 30 Gbps (L4) / 15 Gbps (L7)

In addition to that, these beasts handle SSL termination and are capable of mitigating some DDoS attacks.

R720

Dell R720s are Mailgun workhorses, used as both database and processing servers. They are equipped with 64GB of RAM, one or several 15K RPM SAS drives, depending on configuration, and one or two 10Gb/s NICs.

Cloud

Cloud servers were used for auxiliary tasks such as logging, creating backups, and running various jobs; 90% of the environment was located on dedicated hardware.

Overall, the Mailgun team was pleased with the existing state of things. So, why did we migrate?

Reasons to Migrate

Performance

There are different clouds available, but we’ve been most excited about one in particular: Rackspace Performance Cloud. Benchmarks comparing the previous generation servers to the new Performance Cloud Servers show a huge improvement.

These benchmarks indicate that we can start using Cloud Servers to host our Cassandra clusters, as SSDs add a huge boost in speed.

In addition to that, the new cloud boxes are more performant than the standard R720s that we’ve been setting up in our managed colo environments. For example, the UnixBench score for our R720s is 1219, compared to an impressive 4876 on the new Performance Cloud Servers.
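To put those two scores side by side, the ratio can be worked out in a few lines. This is only a sketch built on the two UnixBench figures quoted above; the variable names are ours for illustration, and a single index score says nothing about how any particular workload will behave.

```python
# UnixBench index scores quoted in this post (higher is better).
r720_score = 1219
performance_cloud_score = 4876

# Ratio of the new score to the old one.
speedup = performance_cloud_score / r720_score

print(f"Performance Cloud Servers score {speedup:.1f}x the R720 score")
```

In other words, the quoted scores put the Performance Cloud Servers at four times the R720 figure.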

API

Rackspace Cloud Servers can be deployed rapidly and operated through an API, which really helps with automated provisioning, auto scaling, and all the usual perks of operating in the cloud. (Note: Rapid API-based provisioning is available for bare metal servers through Rackspace’s OnMetal offering. We look forward to augmenting our infrastructure with those beefy servers in the future.)
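As a rough illustration of what API-driven provisioning looks like, here is a minimal sketch that builds "create server" request bodies in the shape used by OpenStack Compute-style APIs. It is not Mailgun's actual tooling: the function name, the server names, and the IMAGE_ID and FLAVOR_ID placeholders are all hypothetical, and nothing is sent over the network.

```python
import json


def build_create_server_request(name, image_id, flavor_id):
    """Return a request body in the OpenStack Compute 'create server' shape."""
    return {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
        }
    }


# Placeholder values: real image and flavor IDs come from your provider.
requests_to_send = [
    build_create_server_request(f"api-{i:02d}", "IMAGE_ID", "FLAVOR_ID")
    for i in range(1, 4)
]

for body in requests_to_send:
    # A real script would POST each body, with an auth token, to the
    # compute endpoint's /servers resource. Here we only print it.
    print(json.dumps(body))
```

Because each server is just a small JSON document, growing a pool from three boxes to thirty is a change to a loop bound rather than a hardware order.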

Concerns

We still had two major concerns that held us back from migrating to the cloud.

Networking

Cloud networking may be a bottleneck, especially when it comes to high-frequency packet exchange. We saw huge performance degradation with Redis once we hit 5-6K packets per second.

Throughput was also a major concern: 1Gb/s was not enough.

This was a major roadblock for us until Rackspace released dual, bonded 10Gbps non-virtualized networking for Performance Cloud Servers, plus separate NICs for Cloud Block Storage (CBS). Our benchmarks showed that the new networking is robust and does not suffer from the degradation that the previous software networking did.

Reliability

The internet is full of stories of cloud instability bringing entire businesses down, so it’s kinda scary to rely on the cloud. However, the Cloud Servers team promised robust reliability relative to typical cloud providers, so we decided to test it out.

New Mailgun Environment

Our new environment uses the same type of load balancers, but in this case they are connected via Rackspace’s RackConnect and route all traffic directly to the cloud.

Networking

We spent several weeks load testing this link, hitting our SMTP and HTTP API servers residing in the cloud, and didn’t notice any performance degradation.

Reliability

A typical Mailgun Performance Cloud Servers deployment uses around a hundred servers per region. Over the last three months, two servers went down due to problems with the host server. That is roughly in line with what we experienced on our dedicated servers, where around 1 box out of 100 went down every month.
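The comparison is easier to see as a per-server monthly failure rate. The sketch below uses only the numbers quoted above and assumes that "around a hundred" means exactly 100 servers in both environments; the function name is ours for illustration.

```python
def monthly_failure_rate(failures, servers, months):
    """Failures per server per month over the observation window."""
    return failures / (servers * months)


# Cloud: two servers lost over three months (assuming exactly 100 servers).
cloud = monthly_failure_rate(failures=2, servers=100, months=3)

# Dedicated: about one box out of 100 lost every month.
dedicated = monthly_failure_rate(failures=1, servers=100, months=1)

print(f"cloud: {cloud:.2%} per server per month")
print(f"dedicated: {dedicated:.2%} per server per month")
```

With samples this small (two failures in total), the two rates are best read as being in the same ballpark rather than as evidence that one environment is more reliable than the other.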

We should note, though, that Mailgun uses the large and extra large (64GB and 128GB) flavors of Performance Cloud Servers, so you may observe different results when choosing smaller flavors.

Drawbacks

One drawback that we’ve seen so far is the Rackspace policy for DC-wide maintenance. It can cause multi-second downtime for a significant portion of an environment, so your application may need to be multi-DC from the start if you can’t tolerate downtime. We experienced one such maintenance in April, but thankfully, we have multiple environments to fall back on.

Overall Results

Overall, we have been quite pleased with our migration. We can now use the Rackspace Cloud API to provision our servers, with the additional benefit of a more performant fleet compared to our managed colo deployment.
