The stuff of urban legends? An uncanny coincidence? Perhaps. What we do know is that this past Friday the 13th was not a great day for us.
Some of our customers were impacted by downtime, and we took immediate action to determine the root cause. We would like to be transparent and take a moment to share the details of our findings:
As part of ongoing work by our engineering teams, several of our internal and external services were updated to delegate authentication to a centralized authentication service. One of those updated services was deployed just after 10:00 UTC.
At 11:00 UTC on Friday, July 13, Mailgun engineering began receiving alerts of problems with several services. Our initial investigation suggested that the problem was related to this software change released earlier in the day, and we initiated immediate efforts to roll back that release.
Continued investigation revealed that, despite the rollback, our authentication services were still not responding in a timely manner. Authentication (and related) services were restarted, and systems began to resume normal operations. By 12:44 UTC, all services were fully functional again.
Before this release, we had deployed an unrelated set of changes to the authentication service. These changes introduced additional latency to the authentication flow and reduced the rate at which requests could be serviced. Combined with the additional load generated by our updated services, authentication requests arrived faster than they could be serviced, so the request queue kept growing. Additionally, failed requests were being retried, which further compounded the load problem.
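To illustrate why the backlog could not clear on its own, here is a minimal sketch of a discrete-time queue model. All rates and the retry fraction are hypothetical numbers chosen for illustration, not measurements from this incident, and `simulate_backlog` is an invented helper, not part of any Mailgun system.

```python
def simulate_backlog(arrival_rate, service_rate, retry_fraction, seconds):
    """Return the queue depth after each second of a simple model.

    arrival_rate   -- new authentication requests arriving per second (assumed)
    service_rate   -- requests the service can complete per second (assumed)
    retry_fraction -- share of queued requests whose callers give up waiting
                      and send the request again each second (assumed)
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        retries = backlog * retry_fraction          # timed-out callers resend
        backlog += arrival_rate + retries           # new work plus retried work
        backlog = max(0.0, backlog - service_rate)  # service drains what it can
        history.append(backlog)
    return history


# Capacity above demand: the queue stays empty.
healthy = simulate_backlog(90, 100, retry_fraction=0.1, seconds=60)

# Demand above capacity, no retries: the queue grows steadily.
overloaded = simulate_backlog(110, 100, retry_fraction=0.0, seconds=60)

# Same overload with retries: the queue grows much faster.
with_retries = simulate_backlog(110, 100, retry_fraction=0.1, seconds=60)

print(f"healthy:      {healthy[-1]:.0f} queued after 60 s")
print(f"overloaded:   {overloaded[-1]:.0f} queued after 60 s")
print(f"with retries: {with_retries[-1]:.0f} queued after 60 s")
```

In this model, once the service rate drops below the arrival rate the queue never drains, and retries add to the backlog in proportion to its size. That is consistent with the rollback alone not restoring normal response times while the existing backlog remained.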
To reduce the impact and restore services, we took several immediate measures:
- Reduced authentication load by reverting the most recently updated service
- Removed a circular dependency to reduce latency
- Restarted authentication services to clear the request backlog
Mailgun engineering has performed a comprehensive root cause analysis of this incident, and we have identified several actions we’ll be taking to reduce the likelihood of future incidents.
In addition to code and configuration changes made to remove unnecessary response latency, we are also in the process of formalizing service level objectives (SLOs). This will help increase our visibility into service latency and introduce more comprehensive data collection, monitoring, and alerting to aid in SLO enforcement.
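As a rough sketch of what a latency SLO check can look like, the example below measures the share of requests that finish within a threshold and compares it to a target. The 250 ms threshold, the 99% target, the sample latencies, and the function names are all assumptions made for illustration. They are not Mailgun's actual objectives or tooling.

```python
def slo_compliance(latencies_ms, threshold_ms):
    """Return the fraction of requests that completed within threshold_ms."""
    if not latencies_ms:
        return 1.0  # no traffic means nothing violated the objective
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms)


def slo_breached(latencies_ms, threshold_ms, target):
    """Return True when compliance falls below the target fraction."""
    return slo_compliance(latencies_ms, threshold_ms) < target


# Hypothetical authentication latencies in milliseconds for one window.
samples = [120, 95, 180, 2400, 110, 130, 3100, 105, 90, 140]

compliance = slo_compliance(samples, threshold_ms=250)
print(f"{compliance:.0%} of requests completed within 250 ms")

if slo_breached(samples, threshold_ms=250, target=0.99):
    print("ALERT: authentication latency objective not met")
```

Evaluating a check like this over short rolling windows is one way to surface a latency regression soon after a deployment, before a backlog has time to build.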
We are also developing tooling to identify potential problem areas earlier in the development and release cycle in order to keep incidents like this from impacting our customers.
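This post does not describe that tooling in detail. As one illustration of a check that can run before release, the sketch below looks for a circular dependency in a declared service dependency graph using a depth-first search. The service names and the `find_cycle` helper are hypothetical and do not reflect Mailgun's actual architecture.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of service names, or None.

    dependencies maps each service name to the services it calls.
    """
    visiting = set()  # services on the current search path
    done = set()      # services fully explored with no cycle found
    path = []

    def visit(service):
        visiting.add(service)
        path.append(service)
        for dep in dependencies.get(service, []):
            if dep in visiting:
                # dep is already on the path, so the path loops back to it
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        visiting.discard(service)
        done.add(service)
        path.pop()
        return None

    for service in dependencies:
        if service not in done:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# Hypothetical graph: "auth" calls "accounts", which calls "auth" again.
services = {
    "api": ["auth"],
    "auth": ["accounts"],
    "accounts": ["auth"],
}

cycle = find_cycle(services)
if cycle:
    print("circular dependency: " + " -> ".join(cycle))
```

A check of this kind only sees the dependencies that are declared, so it works best when each service's outbound calls are recorded in configuration that can be validated automatically.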
We really appreciate our customers’ understanding while we worked to resolve the issue. We’d be happy to answer any questions or address concerns for impacted accounts. Just open a support ticket, and our team will get back to you.
Last updated on August 28, 2020