It doesn’t matter whether the intent is malicious (brute‑force password guessing and DDoS attacks) or benign (customers flocking to a sale) – a high volume of HTTP requests can overwhelm your services and cause your apps to crash. An easy solution to the problem is rate limiting, which restricts the number of requests each user can make in a given time period. In a Kubernetes environment, however, a significant part of the total volume of traffic reaching a service might be outside of the purview of the Ingress controller, in the form of communication with other services. In this situation it often makes sense to set up rate‑limiting policies using a service mesh.
Configuring rate limiting with NGINX Service Mesh is a simple task which you can complete in less than 10 minutes. Check out this demo to see it in action and read on to learn how to define and apply a rate‑limiting policy.
Demo: Configuring Rate Limiting with NGINX Service Mesh
The demo uses three containers injected with the NGINX Service Mesh sidecar: a backend service, a frontend service, and a bash terminal. The NGINX Service Mesh control plane has also been deployed.
The frontend service sends a request to the backend service every second, and we can see the responses in the frontend service’s log:
backend v1
backend v1
backend v1
backend v1
Applying a Rate-Limiting Policy (1:00)
Suppose we don’t want the backend service to be receiving so many requests. We can define a rate‑limiting policy as a custom resource with the following fields:
destination – The service that's receiving requests; here it's our backend service.
sources – A list of clients from which requests come, each subjected to the rate limit. Here we're defining just one source, our frontend service.
rate – The rate limit. Here it's 10 requests per minute, or 1 every 6 seconds.
apiVersion: specs.smi.nginx.com/v1alpha1
kind: RateLimit
metadata:
  name: backend-rate-limit
  namespace: default
spec:
  destination:
    kind: Service
    name: backend-svc
    namespace: default
  sources:
  - kind: Deployment
    name: frontend
    namespace: default
  name: 10rm
  rate: 10r/m
  burst: 0
  delay: nodelay
We run this command to activate the policy:
$ kubectl create -f rate-limit.yaml
In the log for the frontend, we see that five of every six requests are denied with this message:
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>
Applying the Rate Limit to All Clients (2:32)
The rate limit applies only to the client named in the sources field (our frontend service). The backend service accepts requests from all other clients at whatever rate they send them. We can illustrate this by repeatedly sending requests in the bash terminal; each request receives the v1 response that indicates success.
There are two ways to apply the rate limit to all clients. The first is to add their names to the sources field. The second, and much simpler, way is to remove the sources field entirely. We do that by running this command to edit the policy:
$ kubectl edit ratelimits.specs.smi.nginx.com backend-rate-limit
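In the editor, the policy with the sources field removed might look like this – a sketch based on the resource defined earlier, with all other fields unchanged:

apiVersion: specs.smi.nginx.com/v1alpha1
kind: RateLimit
metadata:
  name: backend-rate-limit
  namespace: default
spec:
  destination:
    kind: Service
    name: backend-svc
    namespace: default
  name: 10rm
  rate: 10r/m
  burst: 0
  delay: nodelay

With no sources list, the limit applies to traffic from every client of the backend service.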
After saving the edited policy, we again make requests in the bash terminal and see that requests from that source that exceed the rate limit get rejected with the formatted 503 error shown above.
Allowing Bursts of Requests (3:43)
There are a couple of other fields we can add to the policy to customize the rate limit. We know that some apps are “bursty”, sending multiple requests in rapid succession. To accommodate this, we can add the burst field. Here we set it to 3, meaning that the backend service accepts that many additional requests in each six‑second period. Requests beyond that are rejected.
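In the policy spec, that change touches only the burst field – a sketch showing just the relevant lines, with the other fields staying as defined above:

spec:
  # destination, sources, and name unchanged
  rate: 10r/m
  burst: 3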
The delay field controls how the allowed burst of requests is fed to the backend service. Without it (that is, by default), burst requests are queued and sent according to the rate limit, interleaved with new requests. To send burst requests immediately, we set the delay field to the value nodelay.
You can also set the delay field to an integer. For example, if we set it to 3 and increase the burst field to 5, then when five or more burst requests arrive within a six‑second period, three are sent immediately, two are queued, and the rest are rejected.
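Expressed in the policy spec, that configuration might look like this – a sketch showing only the fields being discussed, with values taken from the example above:

spec:
  # destination, sources, and name unchanged
  rate: 10r/m
  burst: 5
  delay: 3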
We can observe the effect of setting nodelay in the log: three extra requests are accepted before a request is rejected:
backend v1
backend v1
backend v1
backend v1
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>
. . .
Removing the Rate Limit (6:30)
The final action in our demo is to run this command to deactivate the rate‑limiting policy so that all requests are accepted again:

$ kubectl delete -f rate-limit.yaml
Try NGINX Service Mesh for Rate Limiting
For details on the burst and delay parameters, see the reference documentation. For a discussion of other traffic‑management patterns, read How to Improve Resilience in Kubernetes with Advanced Traffic Management on our blog.
Also check out these video demos of NGINX Service Mesh features:
- How to Use NGINX Service Mesh for Traffic Splitting
- How to Use NGINX Service Mesh for Secure Access Control