The Traefik rateLimit middleware throttles incoming traffic with a token-bucket algorithm: an average request rate plus a burst allowance for momentary spikes. It is the right primitive for protecting expensive endpoints (login, search, write APIs) from credential stuffing, scrapers, and accidental client loops. Most trouble comes from the parameters: a limit that is too tight blocks legitimate traffic, and one that is too loose does not stop the abuse. This article covers the two core parameters (average and burst), the sourceCriterion that decides what a limit is keyed on, and the patterns we deploy.
How to verify
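# Hammer the route and watch for 429
for i in $(seq 1 50); do
  curl -sI https://api.example.com/login -o /dev/null -w "%{http_code}\n"
done | sort | uniq -c
# Expected: the route's normal status until the bucket drains, then a tail of 429.
# A sequential loop only drains a tight limit (such as the login limit below);
# send requests in parallel to exercise a limit in the 100 RPS range.
# Confirm the middleware is wired (match on the rateLimit key rather than the type string)
curl -s http://127.0.0.1:8082/api/http/middlewares | jq '.[] | select(.rateLimit != null)'
# 429 count in Prometheus. The limiter answers before the request reaches the service,
# so look at the entrypoint series (and the router series if router labels are enabled).
curl -s http://127.0.0.1:9100/metrics | grep -E 'traefik_(entrypoint|router)_requests_total.*code="429"'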
What’s happening
The token bucket starts full. Every incoming request consumes one token, and tokens refill at a rate of average per period (one second by default). When the bucket is empty, requests are rejected with HTTP 429. The burst parameter is the maximum number of tokens the bucket holds, so a higher burst lets a larger instantaneous spike through.
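The mechanics described above can be sketched in a few lines of Python. This is a toy model for building intuition, not Traefik's implementation; the class name `TokenBucket` and the helper `simulate` are made up for illustration, and time is passed in explicitly so the behavior is deterministic.

```python
from typing import List


class TokenBucket:
    """Toy token bucket (illustrative only, not Traefik's code)."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate              # tokens added per second (average / period)
        self.burst = burst            # bucket capacity
        self.tokens = float(burst)    # the bucket starts full
        self.last = 0.0               # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never above the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # this is where a limiter would answer 429


def simulate(rate: float, burst: int, arrivals: List[float]) -> List[bool]:
    """Feed a sorted list of arrival times through a fresh bucket."""
    bucket = TokenBucket(rate, burst)
    return [bucket.allow(t) for t in arrivals]
```

Calling `simulate(1.0, 3, [0, 0, 0, 0, 1.0, 1.0])` admits the first three requests, rejects the fourth, admits one more after a second of refill, and rejects the last.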
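A common confusion: average is expressed in requests per period, and period defaults to 1 second. So average: 100, burst: 50 means 100 RPS sustained, with at most 50 requests admitted at the same instant. Burst is the bucket size, not an allowance stacked on top of average within a single spike; only over a full second, starting from a full bucket, can roughly 150 requests pass. Set period: 1m to think in requests per minute; the math is the same and only the unit changes.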
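The arithmetic is easy to check by hand. The sketch below assumes the standard token-bucket bound (a full bucket plus whatever refills during the window); the function names `refill_rate` and `max_allowed` are hypothetical helpers, not part of Traefik.

```python
def refill_rate(average: int, period_seconds: float = 1.0) -> float:
    """Tokens added per second: `average` requests per `period`."""
    return average / period_seconds


def max_allowed(average: int, burst: int, window_seconds: float,
                period_seconds: float = 1.0) -> float:
    """Upper bound on requests admitted in a window of the given length:
    a full bucket plus the tokens refilled during that window."""
    return burst + refill_rate(average, period_seconds) * window_seconds


# average: 100, burst: 50 with the default 1s period
instant = max_allowed(100, 50, window_seconds=0.0)     # spike at a single instant
one_second = max_allowed(100, 50, window_seconds=1.0)  # full bucket + 1s of refill

# average: 5, period: 1m is one token every 12 seconds
login_rate = refill_rate(5, period_seconds=60.0)
seconds_per_token = 1.0 / login_rate
```

With these inputs, `instant` is 50 and `one_second` is 150, which is why burst should be sized against the largest legitimate instantaneous spike rather than against the per-second rate.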
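The middleware needs to know what to limit by. By default it keys on the source IP of the immediate peer (the request's remote address). Behind a proxy, set sourceCriterion.ipStrategy.depth to walk X-Forwarded-For from the right (same semantics as ipAllowList). You can also key on a request header value (X-Tenant, Authorization) with sourceCriterion.requestHeaderName, which is useful for per-tenant or per-API-key limits. Only key on a header that a trusted upstream sets or validates; a client that can pick the value can give itself a fresh bucket on every request.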
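To make the depth semantics concrete, here is a small sketch of how the key IP could be chosen. It models the documented behavior (depth counts entries from the right of X-Forwarded-For, and depth 0 means the immediate peer); the function `client_ip` is a hypothetical illustration, and the empty-string result for a too-short header is an assumption about how a missing hop is treated.

```python
from typing import Optional


def client_ip(remote_addr: str, x_forwarded_for: Optional[str], depth: int = 0) -> str:
    """Pick the IP a per-IP limiter would key on.

    depth 0  -> the immediate peer (remote address)
    depth N  -> the Nth entry from the right of X-Forwarded-For
    Returns "" when the header has fewer than N entries (assumed behavior).
    """
    if depth <= 0:
        return remote_addr
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    if depth > len(hops):
        return ""
    return hops[-depth]


# Client -> CDN -> load balancer -> Traefik.
# The load balancer is the remote address; it appended the CDN's IP to the header.
xff = "203.0.113.7, 198.51.100.2"
peer = client_ip("10.0.0.5", xff, depth=0)      # the load balancer
cdn = client_ip("10.0.0.5", xff, depth=1)       # the CDN edge
visitor = client_ip("10.0.0.5", xff, depth=2)   # the real client
```

A depth that is too small keys every visitor on the same proxy IP, which turns a per-client limit into a global one; a depth that is too large reads entries the client itself can forge.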
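The store is in-memory and per Traefik instance. If you run two Traefik replicas, each keeps its own bucket, so the effective limit is up to 2x average when traffic is spread across both. Historically there was no shared-store option in core, and a true global limit needed a plugin backed by Redis or similar; recent v3 releases document a Redis-backed store for rateLimit, so check the middleware reference for the version you run.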
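Dividing a target rate across replicas rarely gives a whole number, and average takes an integer. One way around that, sketched below, is to use period as the denominator, since the refill rate is average divided by period. The helper `per_replica_settings` is hypothetical, and the whole approach assumes the load balancer in front spreads traffic evenly across replicas.

```python
from fractions import Fraction
from typing import Dict, Union


def per_replica_settings(target_rps: int, replicas: int) -> Dict[str, Union[int, str]]:
    """Express target_rps / replicas as an integer `average` plus a `period`.

    rate = average / period, so the reduced fraction's numerator becomes
    `average` and its denominator becomes the period in seconds.
    Assumes traffic is evenly balanced across replicas.
    """
    rate = Fraction(target_rps, replicas)
    return {"average": rate.numerator, "period": f"{rate.denominator}s"}


def effective_global_rps(average: int, period_seconds: int, replicas: int) -> Fraction:
    """Worst-case combined rate when every replica's bucket is in use."""
    return Fraction(average, period_seconds) * replicas


three = per_replica_settings(100, 3)   # 100 RPS target on 3 replicas
two = per_replica_settings(100, 2)     # 100 RPS target on 2 replicas
```

For three replicas this yields average 100 with period 3s (about 33.3 RPS each); for two it yields average 50 with period 1s. Burst is not scaled here and still needs its own decision per replica.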
The procedure
- Basic per-IP rate limit. 100 RPS sustained, 50-request burst, default per-IP keying.

    http:
      middlewares:
        rl-api:
          rateLimit:
            average: 100
            burst: 50
      routers:
        api:
          rule: "Host(`api.example.com`)"
          entryPoints: [websecure]
          service: api-backend
          middlewares: [rl-api]

- Tight limit on a sensitive endpoint. Login gets a stricter limit and a longer window: 5 attempts per minute with burst: 1, which is effectively no burst. With a bucket of one token refilling every 12 seconds, attempts are spaced out rather than allowed five at once; raise burst if a user should be able to retry a typo immediately.

    http:
      middlewares:
        rl-login:
          rateLimit:
            average: 5
            period: 1m
            burst: 1
      routers:
        login:
          rule: "Host(`api.example.com`) && Path(`/login`)"
          entryPoints: [websecure]
          service: api-backend
          middlewares: [rl-login]

- Per-tenant rate limit. Limit by the X-Tenant header; each tenant gets its own bucket.

    http:
      middlewares:
        rl-tenant:
          rateLimit:
            average: 1000
            burst: 200
            sourceCriterion:
              requestHeaderName: X-Tenant
      routers:
        api:
          rule: "Host(`api.example.com`)"
          entryPoints: [websecure]
          service: api-backend
          middlewares: [rl-tenant]

- Behind a CDN / load balancer. Set ipStrategy.depth to walk past trusted hops.

    http:
      middlewares:
        rl-api:
          rateLimit:
            average: 100
            burst: 50
            sourceCriterion:
              ipStrategy:
                depth: 2   # behind CDN + LB

- Combine with a retry middleware, but be careful. With rateLimit listed before retry in the chain, Traefik's own retries re-run only the handlers after the limiter (the attempt against the backend), so they do not consume tokens. A 429 returned to the client is different: it can trigger client-side retries, which do pass through the limiter again and add load.

    http:
      routers:
        api-reads:
          rule: "Host(`api.example.com`) && Method(`GET`)"
          entryPoints: [websecure]
          service: api-backend
          # retry-safe is a retry middleware defined elsewhere
          middlewares: [rl-api, retry-safe]

- Kubernetes IngressRoute equivalent.

    apiVersion: traefik.io/v1alpha1
    kind: Middleware
    metadata:
      name: rl-api
      namespace: my-app
    spec:
      rateLimit:
        average: 100
        burst: 50
        sourceCriterion:
          ipStrategy:
            depth: 2
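What "each tenant gets its own bucket" means in practice can be sketched as one bucket per key, created on first sight of that key. This is a conceptual model only; `KeyedLimiter` is a made-up name, the injectable `clock` exists purely so the example is deterministic, and a real limiter would also expire idle keys.

```python
import time
from typing import Callable, Dict, Tuple


class KeyedLimiter:
    """One token bucket per key (IP, tenant header value, API key, ...)."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # tokens per second, shared by every key's bucket
        self.burst = burst    # capacity of every key's bucket
        self.clock = clock
        # key -> (tokens remaining, time of last request)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        # An unseen key starts with a full bucket.
        tokens, last = self._state.get(key, (float(self.burst), now))
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[key] = (tokens, now)
        return allowed


# Deterministic demo with a fake clock.
fake_now = [0.0]
limiter = KeyedLimiter(rate=1.0, burst=2, clock=lambda: fake_now[0])
tenant_a = [limiter.allow("tenant-a") for _ in range(3)]
tenant_b = limiter.allow("tenant-b")
fake_now[0] = 1.0
tenant_a_later = limiter.allow("tenant-a")
```

Here `tenant-a` exhausts its two tokens and is rejected on the third call, while `tenant-b` is unaffected. The same picture explains the NAT problem with per-IP keys: everyone behind one egress IP shares a single entry in that table.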
Operational notes
- Rate-limit values should map to realistic traffic — an audit of the existing P99 RPS per route is the right baseline before flipping the middleware on.
- Per-IP limits punish NAT’d users — an office of 50 sharing one egress IP hits the limit immediately. Soften with a higher limit, or layer with a per-session limit upstream.
- The default in-memory store means N replicas of Traefik = N times the effective limit. Set the average to 1/N of your target when running multiple replicas, or use a single-replica Traefik for the rate-limited surface.
- 429 responses should carry a Retry-After header so well-behaved clients back off. Traefik's rateLimit middleware sets Retry-After (and X-Retry-In) on the 429s it generates; confirm with curl -i on your version that the header is present and survives any CDN or load balancer in front.
- A rate-limit alert on a sudden jump in 429s is more useful than the limit itself, because it tells you when the limit is working overtime.
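On the client side, backing off on a 429 could look like the sketch below. It uses only the Python standard library; the function names, the 0.5 second base, and the 30 second cap are illustrative assumptions, not values taken from Traefik or from this article.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header (delta-seconds or HTTP-date).

    Returns None when the header is absent or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(retry_after: Optional[str], attempt: int,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers the server's hint; falls back to capped exponential backoff.
    """
    hinted = retry_after_seconds(retry_after)
    if hinted is not None:
        return min(cap, hinted)
    return min(cap, base * (2 ** attempt))
```

Capping the server's hint is a design choice made here so a client never stalls for an unbounded time; a client that retries immediately instead is exactly the pattern that turns a 429 into extra load.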
Rate limiting is one of the most underused middlewares because the parameters require thought — a value picked from a blog post rarely matches your traffic. The audit-then-tune approach is part of how Stack Harbor deploys managed operations. For the related retry middleware see traefik-retry-middleware.