This week we focused on infrastructure improvements to reduce the chances of sending delays like those that occurred for some customers last week.
We also rolled out an improved version of our status page last week but didn’t really go into details, so we want to share some of the coolest and most useful features of the new status page with you here today.
We recently switched to a new status page provider, Statuspage.io. In addition to showing the up-to-date status of our service, the new page lets you subscribe to status updates by email, SMS, and webhook, so you always have the latest information on Mailgun’s health.
The Riak cluster we use for processing email messages is redundant by design. We store two replicas of every message on different machines, and the cluster is large enough to provide adequate performance. But we’ve discovered that some actions can cause the whole cluster to degrade, which leads to delays.
To provide failover in these cases, we’ve rolled out a fully functioning, hot-standby reserve cluster that is always serving just a portion of traffic. If something goes terribly wrong with the main cluster, we can fail over to this redundant cluster and avoid email delays.
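As a rough illustration of the hot-standby idea described above, here is a minimal Python sketch. It is not our production code: the `ClusterRouter` class, the `standby_share` parameter, and the 10% default are hypothetical names and values chosen for the example. It shows the standby cluster always taking a small share of traffic, and taking all of it once the main cluster is marked unhealthy.

```python
import random


class ClusterRouter:
    """Hypothetical router: picks a cluster for each incoming message."""

    def __init__(self, standby_share=0.1, rng=None):
        # Fraction of traffic the standby serves while the main cluster
        # is healthy (assumed value, for illustration only).
        self.standby_share = standby_share
        self.main_healthy = True
        self.rng = rng or random.Random()

    def pick_cluster(self):
        # If the main cluster is degraded, everything goes to the standby.
        if not self.main_healthy:
            return "standby"
        # Otherwise the standby still gets a small share, so it stays warm.
        if self.rng.random() < self.standby_share:
            return "standby"
        return "main"


router = ClusterRouter(standby_share=0.1)
router.main_healthy = False  # simulate a degraded main cluster
print(router.pick_cluster())
```

Keeping the standby under real load all the time means a failover sends traffic to a cluster that is already known to work, instead of one that has been sitting idle.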
Mailgun monitors the time between the moment a message is sent and the moment it is received in order to identify delays as they occur. We do this using a special bot that sends emails to itself through the same infrastructure that our customers use (it’s like we built our own version of Pingdom for monitoring delivery speed). This week we’ve tuned the algorithm that monitors delivery time to be stricter, so it alerts us earlier when delivery times start to increase.
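To make the monitoring idea above a little more concrete, here is a minimal Python sketch of a delivery-time check. It is an assumption-laden illustration, not the actual algorithm: the `DeliveryMonitor` class, the rolling-median rule, the 10-second threshold, and the window size are all hypothetical. It records the send-to-receive latency of each probe email and raises an alert when the median of recent probes exceeds the threshold.

```python
from collections import deque
from statistics import median


class DeliveryMonitor:
    """Hypothetical monitor for probe emails the bot sends to itself."""

    def __init__(self, threshold_seconds=10.0, window=3):
        # Both values are assumptions; lowering the threshold makes the
        # check stricter, so it fires earlier.
        self.threshold_seconds = threshold_seconds
        self.samples = deque(maxlen=window)

    def record(self, sent_at, received_at):
        # Latency is the time between sending and receiving, in seconds.
        latency = received_at - sent_at
        self.samples.append(latency)
        return latency

    def should_alert(self):
        # Wait for a full window so one slow probe does not trigger an alert.
        if len(self.samples) < self.samples.maxlen:
            return False
        return median(self.samples) > self.threshold_seconds


monitor = DeliveryMonitor(threshold_seconds=10.0, window=3)
for sent, received in [(0, 5), (60, 66), (120, 127), (180, 200), (240, 265)]:
    monitor.record(sent, received)
print(monitor.should_alert())
```

Using the median of a short window rather than a single probe is one way to trade a little reaction time for fewer false alarms; a smaller window or a lower threshold would alert sooner.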
That’s it for this week. Have a great weekend and happy sending!
Last updated on August 27, 2019