Today, Mailgun is excited to open source Gubernator, a high-performance, distributed rate-limiting microservice. What does Gubernator do? In short, it decides, quickly and at scale, whether a given request is over its rate limit.
Now, we’re sure you have a lot of questions as to why we decided to open source Gubernator. So before we get into how Gubernator works, let’s answer a few of those questions first. The biggest one you’re probably asking is: why not just use Redis?
A few things jumped out at us while evaluating Redis.
The optimal Redis solution is to write a Lua script that implements the rate limit algorithm. That script is then stored on the Redis server and is called for each rate limit request. In this scenario, most of the work is done by Redis and our microservice is basically a proxy for accessing Redis. This being the case, we had two options: implement rate limiting as a shared library each service links against, or run it as a standalone microservice.
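For concreteness, the Lua-on-Redis option looks roughly like the sketch below, here using the third-party go-redis client with a simple fixed-window counter; the script and client choice are our illustration, not production code:

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// A simple fixed-window rate limit script. Redis runs the whole script
// atomically, so the increment and expiry can't race with other clients.
// KEYS[1] = rate limit key, ARGV[1] = hits, ARGV[2] = limit, ARGV[3] = duration (ms)
const rateLimitScript = `
local count = redis.call('INCRBY', KEYS[1], ARGV[1])
if count == tonumber(ARGV[1]) then
    redis.call('PEXPIRE', KEYS[1], ARGV[3])
end
if count > tonumber(ARGV[2]) then
    return 1
end
return 0
`

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Every rate limit check is a round trip to Redis; the calling
	// service is effectively a proxy.
	over, err := rdb.Eval(ctx, rateLimitScript,
		[]string{"rl:account_id=123"}, 1, 100, 1000).Int()
	if err != nil {
		panic(err)
	}
	fmt.Println("over limit:", over == 1)
}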
Mailgun is a polyglot company, with Python and Go making up the majority of our codebase. If we implemented rate limiting as a library, we would need at minimum a Python and a Go version. We have gone this route before internally, maintaining Python and Go versions of the same library, and in our experience shared libraries across services come with a major downside: every bug fix or new feature requires updating and redeploying every dependent service, in every language.
As the number of microservices and languages in our ecosystem continues to grow, these problems compound. In comparison, gRPC and HTTP clients are easily created and maintained for every language that needs access to Gubernator.
For microservices, bug fixes and new features can be rolled out with no disruption to dependent services. As long as breaking API changes are not allowed, dependent services can opt into new features without every dependent service needing an update.
The killer feature of Gubernator as a microservice is that it creates a synchronization point for the many requests that enter the system. Requests received within a few microseconds of each other can be optimized and coordinated into batches, which reduces the overall bandwidth and round-trip latency the service uses under heavy load. Several services running on the same host, each using the same library in its own process, have no such capability.
Gubernator is stateless in that it doesn’t require disk space to operate. No configuration or cache data is ever synced to disk, and that’s because every request to Gubernator includes the config for the rate limit.
At first, this might seem like unnecessary overhead on each request. In reality, however, a rate limit config is only four 64-bit integers: the Limit, Duration, Algorithm, and Behavior (see below for details on how this works). It is because of this simple configuration that Gubernator can serve a wide variety of rate limit use cases for its clients.
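In Go terms, the entire per-request configuration is roughly the following struct; this is our own sketch of the four fields named above, not Gubernator's actual type:

// RateLimitConfig is everything a request needs to carry: four 64-bit
// integers, cheap to serialize and send with every single request.
type RateLimitConfig struct {
	Limit     int64 // total hits allowed within the duration
	Duration  int64 // window length in milliseconds
	Algorithm int64 // 0 = token bucket, 1 = leaky bucket
	Behavior  int64 // 0 = batching, 1 = no batching, 2 = global
}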
In addition to enabling those use cases, a configuration-less design has major implications for microservice design and deployment: since no rate limit configuration lives on the server, clients can introduce or change limits without any Gubernator deployment or restart.
With those questions out of the way, let’s talk more about Gubernator as a whole, starting with how it works.
Gubernator is designed to run as a distributed cluster of peers which utilize an in-memory cache of all the currently active rate limits; as such, no data is ever synced to disk. Since most network-based rate limit durations are held for only a few seconds, losing the in-memory cache during a reboot or scheduled downtime isn’t a huge deal. For Gubernator, we chose performance over accuracy, as it’s acceptable for a small subset of traffic to over-request for a short period of time (usually seconds) in the case of cache loss.
When a rate limit request is made to Gubernator, the request is keyed and a consistent hashing algorithm is applied to determine which of the peers will be the owner of the rate limit request. Choosing a single owner for a rate limit makes atomic increments of counts very fast and avoids the complexity and latency involved in distributing counts consistently across a cluster of peers.
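To illustrate, here is a minimal consistent-hash ring in Go; this is our own sketch of the technique, not Gubernator's actual implementation:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring maps hashed keys to peers. Each peer gets a point on the ring;
// a key is owned by the first peer clockwise from the key's hash.
type Ring struct {
	points []uint32
	peers  map[uint32]string
}

func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

func NewRing(peers []string) *Ring {
	r := &Ring{peers: map[uint32]string{}}
	for _, p := range peers {
		pt := hashKey(p)
		r.points = append(r.points, pt)
		r.peers[pt] = p
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Owner returns the peer responsible for a rate limit key.
func (r *Ring) Owner(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.peers[r.points[i]]
}

func main() {
	ring := NewRing([]string{"peer-1:9041", "peer-2:9041", "peer-3:9041"})
	fmt.Println(ring.Owner("requests_per_sec_account_id=123"))
}

Because every peer runs the same function over the same peer list, any peer can receive a client request and immediately know which peer owns the underlying rate limit.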
Although simple and performant, this design could be susceptible to a thundering herd of requests since a single coordinator is responsible for possibly hundreds of thousands of requests to a rate limit.
To combat this, clients can request `Behavior=BATCHING`, which allows peers to take multiple requests within a specified window (default is 500 microseconds) and batch them into a single peer request, dramatically reducing the number of over-the-wire requests to any single Gubernator peer.
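The idea behind that window looks roughly like this sketch (our simplified illustration; the real implementation batches per owning peer and returns responses to each caller):

package main

import (
	"fmt"
	"time"
)

// batcher collects rate limit requests arriving within the window and
// flushes them to the owning peer as one RPC instead of many.
func batcher(in <-chan string, window time.Duration, flush func([]string)) {
	var batch []string
	var timer <-chan time.Time // nil channel blocks until a window opens
	for {
		select {
		case req := <-in:
			if batch == nil {
				timer = time.After(window) // first request opens the window
			}
			batch = append(batch, req)
		case <-timer:
			flush(batch) // one wire request carrying the whole batch
			batch, timer = nil, nil
		}
	}
}

func main() {
	in := make(chan string)
	go batcher(in, 500*time.Microsecond, func(b []string) {
		fmt.Printf("flushing %d requests in one peer RPC\n", len(b))
	})
	for i := 0; i < 10; i++ {
		in <- "requests_per_sec"
	}
	time.Sleep(time.Millisecond) // let the window close
}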
In order to ensure each peer in the cluster accurately calculates the correct hash for a rate limit key, the list of peers must be distributed to every peer in a timely and consistent manner. Currently, Gubernator supports using etcd or the Kubernetes endpoints API to discover Gubernator peers.
When a client or service makes a request to Gubernator, the rate limit config is provided with each request by the client. The config is then stored, along with the current rate limit status, in the local cache of the rate limit owner. A rate limit and its configuration only live in the cache for the duration specified in the config. If a rate limit is not requested again within its duration, it is dropped from the cache; subsequent requests for the same name and unique_key pair will recreate the config and rate limit in the cache, and the cycle repeats. On the other hand, subsequent requests with a different config will overwrite the previous config and apply the new config immediately.
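Putting the pieces together, the owner's cache behaves roughly like this sketch (reusing the RateLimitConfig struct from earlier; a simplified, single-threaded illustration, not Gubernator's implementation):

import "time"

// entry holds a rate limit's config and status in the owner's local cache.
type entry struct {
	config    RateLimitConfig
	remaining int64
	expiresAt time.Time
}

// hit applies a request against the cache, recreating expired entries
// and overwriting changed configs as described above.
func hit(cache map[string]entry, key string, cfg RateLimitConfig, hits int64) entry {
	e, ok := cache[key]
	switch {
	case !ok || time.Now().After(e.expiresAt):
		// Expired or never seen: recreate from the request's config.
		e = entry{config: cfg, remaining: cfg.Limit,
			expiresAt: time.Now().Add(time.Duration(cfg.Duration) * time.Millisecond)}
	case e.config != cfg:
		// A different config overwrites the old one immediately
		// (resetting the count is one reasonable interpretation).
		e.config, e.remaining = cfg, cfg.Limit
	}
	e.remaining -= hits
	cache[key] = e
	return e
}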
An example rate limit request sent via gRPC might look like the following:
rate_limits:
  # Scopes the request to a specific rate limit
  - name: requests_per_sec
    # A unique_key that identifies this rate limit request
    unique_key: account_id=123|source_ip=172.0.0.1
    # The number of hits we are requesting
    hits: 1
    # The total number of requests allowed for this rate limit
    limit: 100
    # The duration of the rate limit in milliseconds
    duration: 1000
    # The algorithm used to calculate the rate limit
    # 0 = Token Bucket
    # 1 = Leaky Bucket
    algorithm: 0
    # The behavior of the rate limit in Gubernator.
    # 0 = BATCHING (Enables batching of requests to peers)
    # 1 = NO_BATCHING (Disables batching)
    # 2 = GLOBAL (Enables global caching for this rate limit)
    behavior: 0
An example response would be:
rate_limits:
  # The status of the rate limit. OK = 0, OVER_LIMIT = 1
  - status: 0
    # The current configured limit
    limit: 100
    # The number of requests remaining
    remaining: 99
    # A unix timestamp in milliseconds of when the rate limit will reset,
    # or if OVER_LIMIT is set it is the time at which the rate limit
    # will no longer return OVER_LIMIT.
    reset_time: 1551309219226
    # Additional metadata about the request the client might find useful
    metadata:
      # This is the name of the node that owns this request
      "owner": "api-n03.staging.us-east-1.mailgun.org:9041"
Since Gubernator rate limits are hashed to and handled by a single peer in the cluster, a rate limit that applies to every request in a data center would be handled by one peer for the entire data center. For example, consider a rate limit with name=requests_per_datacenter and unique_key=us-east-1, and imagine a request is made to Gubernator with this rate limit for every HTTP request that enters the us-east-1 data center. That could be hundreds of thousands, potentially millions, of requests per second, all hashed to and handled by a single peer in the cluster. Because of this potential scaling issue, Gubernator introduces a configurable behavior called GLOBAL.
When a rate limit is configured with behavior=GLOBAL, the rate limit request received from a client will not be forwarded to the owning peer. Instead, it will be answered from an internal cache handled by the peer that received the request. Hits toward the rate limit will be batched by the receiving peer and sent asynchronously to the owning peer, where the hits will be totaled and OVER_LIMIT calculated. It is then the responsibility of the owning peer to update each peer in the cluster with the current status of the rate limit, such that peer internal caches routinely get updated with the most current rate limit status from the owner.
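Conceptually, the receiving peer's side of GLOBAL looks something like this sketch (our own illustration of the idea, not Gubernator's code): hits are counted locally, clients are answered from the local cache, and accumulated hits are drained to the owner in the background.

import (
	"sync"
	"time"
)

// globalCounter accumulates hits locally and asynchronously forwards
// them to the owning peer.
type globalCounter struct {
	mu   sync.Mutex
	hits map[string]int64 // pending hits per rate limit key
}

// Hit records a hit; the client has already been answered from the
// receiving peer's local cache, so nothing blocks on the owner here.
func (g *globalCounter) Hit(key string, n int64) {
	g.mu.Lock()
	g.hits[key] += n
	g.mu.Unlock()
}

// drain periodically ships batched hits to the owning peer, which totals
// them, computes OVER_LIMIT, and broadcasts fresh status to all peers.
func (g *globalCounter) drain(interval time.Duration, sendToOwner func(map[string]int64)) {
	for range time.Tick(interval) {
		g.mu.Lock()
		batch := g.hits
		g.hits = map[string]int64{}
		g.mu.Unlock()
		if len(batch) > 0 {
			sendToOwner(batch)
		}
	}
}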
Since Hits are batched and forwarded to the owning peer asynchronously, the immediate response to the client will not include the most accurate remaining counts. That count will only get updated after the async call to the owner peer is complete and the owning peer has had time to update all the peers in the cluster. As a result, the use of GLOBAL allows for greater scale but at the cost of consistency. Using GLOBAL can also increase the amount of traffic per rate limit request if the cluster is large enough, so GLOBAL should only be used for extremely high-volume rate limits that don’t scale well with the traditional non-GLOBAL behavior.
In our production environment, for every request to our API we send two rate limit requests to Gubernator: one to rate limit the HTTP request itself, and another to rate limit the number of recipients a user can send an email to within the specified duration. Under this setup, a single Gubernator node fields over 2,000 requests a second, with most batched responses returned in under 1 millisecond.
Peer requests forwarded to owning nodes typically respond in under 30 microseconds.
Since many of our public-facing APIs are written in Python, we run many Python interpreter instances on a single node. Those Python instances forward requests locally to the Gubernator instance, which then batches and forwards requests to the owning nodes.
Gubernator allows users to choose non-batching behavior, which would further reduce latency for client rate limit requests. However, because of our throughput requirements, our production environment uses Behavior=BATCHING with the default 500-microsecond window. In production, we have observed batch sizes of 1,000 during peak API usage. Users who don’t have the same high traffic demands can disable batching and see lower latencies, at the cost of throughput.
If you are using Golang, you can use Gubernator as a library. This is useful if you wish to implement a rate limit service with your own company-specific model on top. We do this internally here at Mailgun with a service we creatively called ratelimits, which keeps track of the limits imposed on a per-account basis. In this way, you can utilize the power and speed of Gubernator but still layer business logic and integrate domain-specific problems into your rate limiting service.
When you use the library, your service becomes a full member of the cluster, participating in the same consistent hashing and caching as a stand-alone Gubernator server would. All you need to do is provide the gRPC server instance and tell Gubernator where the peers in your cluster are located.
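A rough sketch of what embedding might look like is below; the constructor, config fields, and import path here are assumptions for illustration only, so consult the Gubernator README for the actual library API:

package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	// Assumed import path; see the Gubernator repo for the real one.
	gubernator "github.com/mailgun/gubernator"
)

func main() {
	// Your service's own gRPC server; Gubernator registers itself on it
	// and your process becomes a full member of the cluster.
	srv := grpc.NewServer()

	// Hypothetical constructor and config for illustration: you provide
	// the gRPC server and tell Gubernator where its peers are.
	guber, err := gubernator.New(gubernator.Config{
		GRPCServer: srv,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer guber.Close()

	// Peer discovery (etcd, Kubernetes, or a static list) feeds the
	// consistent hash ring; a static list is shown here for brevity.
	guber.SetPeers([]string{"10.0.0.1:9041", "10.0.0.2:9041"})

	lis, err := net.Listen("tcp", ":9041")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(srv.Serve(lis))
}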
Using Gubernator as a general-purpose rate limiting service allows us to rely on a microservice architecture without compromising service independence, and without the duplication of work that common rate-limiting solutions require of each service. We hope that by open-sourcing this project, others can collaborate on and benefit from the work we began here.
Interested in working at Mailgun? We’re hiring! And there are several dev positions available. Check out our current openings here.