The Traefik rateLimit middleware throttles incoming traffic using a token-bucket algorithm: an average number of requests per period (one second by default), plus a burst allowance for momentary spikes. It is the right primitive for protecting expensive endpoints (login, search, write APIs) from credential stuffing, scrapers, and accidental client loops. What trips up real users is getting the parameters wrong: a limit so tight that it blocks legitimate traffic, or one so loose that it doesn’t stop the abuse. This article covers the two core parameters (average and burst, plus the optional period), the sourceCriterion that decides “limit by what”, and the patterns we deploy.

How to verify

# Hammer the route and watch for 429
for i in $(seq 1 50); do
  curl -sI https://api.example.com/login -o /dev/null -w "%{http_code}\n"
done | sort | uniq -c
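# Expected: the route's normal status code first, then a tail of 429 once the bucket drains
# (a sequential loop may be too slow to drain a generous limit; the tight login limit below trips quickly)
# Confirm the middleware is wired (match on the rateLimit config key rather than the type string)
curl -s http://127.0.0.1:8082/api/http/middlewares | jq '.[] | select(.rateLimit != null)'
# 429 count in Prometheus. A 429 generated by the middleware is returned before the
# request reaches the service, so look at the entrypoint or router counters
# (router labels may need to be enabled in the metrics configuration)
curl -s http://127.0.0.1:9100/metrics | grep -E 'traefik_(entrypoint|router)_requests_total.*code="429"'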
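
If you would rather have a number than a grep, the scrape can be summed with a few lines of Python. This is a minimal sketch: the sample scrape text and its label values are made up for illustration, and the metric family is a parameter because which family records middleware-generated 429s depends on where in the request path the counter is attached.

```python
import re


def count_429(metrics_text: str, metric: str = "traefik_router_requests_total") -> float:
    """Sum the samples of one metric family whose labels include code="429"."""
    # One Prometheus text-format sample: name{labels} value
    pattern = re.compile(r"^" + re.escape(metric) + r"\{([^}]*)\}\s+(\S+)$")
    total = 0.0
    for line in metrics_text.splitlines():
        match = pattern.match(line.strip())
        if match and 'code="429"' in match.group(1):
            total += float(match.group(2))
    return total


# Hypothetical scrape output, for illustration only.
example = """
# TYPE traefik_router_requests_total counter
traefik_router_requests_total{code="200",method="GET",router="api@file"} 940
traefik_router_requests_total{code="429",method="GET",router="api@file"} 12
traefik_router_requests_total{code="429",method="POST",router="login@file"} 30
"""
print(count_429(example))  # 42.0
```

Summing a single family at a time avoids double counting the same rejected request in both the entrypoint and router counters.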

What’s happening

The token bucket starts full. Every incoming request consumes one token. Tokens refill at a rate of average per period (one second by default), up to the bucket's capacity. When the bucket is empty, requests get HTTP 429. The burst parameter is the maximum number of tokens the bucket holds, so a higher burst allows a bigger instantaneous spike.
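
The mechanics described above can be sketched in a few lines of Python. This is a simplified model for building intuition, not Traefik's implementation; the class name, the field names, and the explicit `now` timestamps (used instead of a real clock so the run is deterministic) are our own choices.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified token bucket: `average` tokens per `period` seconds, capped at `burst`."""
    average: float
    burst: int
    period: float = 1.0
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (in seconds) is admitted."""
        rate = self.average / self.period  # tokens added per second
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # this is where the proxy would answer 429


bucket = TokenBucket(average=100, burst=50)
spike = [bucket.allow(now=0.0) for _ in range(60)]
print(sum(spike))              # 50: only the bucket's capacity gets through at once
print(bucket.allow(now=0.1))   # True: 0.1 s later, 10 tokens have refilled
```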

A common confusion: average is in requests per period, and period defaults to 1 second. So average: 100, burst: 50 means 100 RPS sustained, with at most 50 requests admitted in a single instantaneous spike (burst is the size of the bucket, not an allowance on top of it). You can set period: 1m to express the limit in requests per minute; the math is the same, only the unit changes.
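
The unit conversion is easy to check by hand. A small sketch (the function name is ours) that turns an average and period pair into the gap between token refills:

```python
def refill_interval_seconds(average: float, period_seconds: float = 1.0) -> float:
    """Seconds between token refills for `average` requests per `period_seconds`."""
    if average <= 0:
        raise ValueError("average must be positive")
    return period_seconds / average


print(refill_interval_seconds(100))    # 0.01 -> average: 100 with the default 1s period
print(refill_interval_seconds(5, 60))  # 12.0 -> average: 5 with period: 1m
```

Read this way, average: 5 with period: 1m is one new token every 12 seconds, which is why a small burst makes a per-minute limit feel much stricter than the headline number suggests.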

The middleware needs to know what to limit by. By default it limits per source IP using the immediate peer. Behind a proxy the immediate peer is the proxy itself, so every client would share one bucket; set sourceCriterion.ipStrategy.depth to pick the client address out of X-Forwarded-For, counting from the right (same semantics as ipAllowList). You can also limit by a request header value (X-Tenant, Authorization) using sourceCriterion.requestHeaderName, which is useful for per-tenant or per-API-key limits.
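
To make the depth semantics concrete, here is a sketch of the selection rule: depth N picks the Nth address from the right of X-Forwarded-For. The function name is ours, and treating a depth below 1 as "no match" is a simplification in this sketch (in Traefik an unset depth means the strategy is not used at all).

```python
def client_ip_at_depth(x_forwarded_for: str, depth: int) -> str:
    """Return the address `depth` positions from the right, or "" if out of range."""
    ips = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    if depth < 1 or depth > len(ips):
        return ""
    return ips[-depth]


header = "10.0.0.1,11.0.0.1,12.0.0.1,13.0.0.1"
print(client_ip_at_depth(header, 1))        # 13.0.0.1
print(client_ip_at_depth(header, 3))        # 11.0.0.1
print(repr(client_ip_at_depth(header, 5)))  # '' (fewer than 5 entries)
```

Counting from the right matters because the rightmost entries are appended by your own trusted hops, while anything further left can be supplied by the client.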

The store is in-memory and per-Traefik-instance. If you run two Traefik replicas, each has its own bucket and the effective limit is up to 2x average. For a true global limit you need a shared store: recent Traefik v3 releases document a redis option on rateLimit (check the reference for the version you run); on versions without it, use a plugin backed by Redis or similar.
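
With per-instance buckets, the usual workaround is to divide the global target across replicas. A minimal sketch of that arithmetic, assuming the load balancer spreads traffic evenly and that average is configured as a whole number (the function name is ours):

```python
def per_replica_average(global_target: int, replicas: int) -> int:
    """Largest whole-number per-replica average that keeps the combined ceiling at or under the target."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Round down so the combined limit never exceeds the target; never go below 1.
    return max(1, global_target // replicas)


avg = per_replica_average(100, 3)
print(avg)      # 33
print(avg * 3)  # 99: combined ceiling when all three replicas are saturated
```

If traffic is skewed toward one replica, clients pinned to it see only the per-replica share, so the real ceiling sits somewhere between that share and the full target.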

The procedure

Basic per-IP rate limit. 100 RPS sustained, 50-request burst, default per-IP keying.

http:
  middlewares:
    rl-api:
      rateLimit:
        average: 100
        burst: 50
  routers:
    api:
      rule: "Host(`api.example.com`)"
      entryPoints: [websecure]
      service: api-backend
      middlewares: [rl-api]

Tight limit on a sensitive endpoint. Login gets a stricter limit and a longer window — 5 attempts per minute, no burst.

http:
  middlewares:
    rl-login:
      rateLimit:
        average: 5
        period: 1m
        burst: 1
  routers:
    login:
      rule: "Host(`api.example.com`) && Path(`/login`)"
      entryPoints: [websecure]
      service: api-backend
      middlewares: [rl-login]

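Per-tenant rate limit. Limit by the X-Tenant header so that each tenant gets its own bucket. This only holds up if the header is set or validated by something the client cannot spoof; otherwise a client can rotate the value to get a fresh bucket each time.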

http:
  middlewares:
    rl-tenant:
      rateLimit:
        average: 1000
        burst: 200
        sourceCriterion:
          requestHeaderName: X-Tenant
  routers:
    api:
      rule: "Host(`api.example.com`)"
      entryPoints: [websecure]
      service: api-backend
      middlewares: [rl-tenant]

Behind a CDN / load balancer. Set ipStrategy.depth to walk past trusted hops.

http:
  middlewares:
    rl-api:
      rateLimit:
        average: 100
        burst: 50
        sourceCriterion:
          ipStrategy:
            depth: 2     # behind CDN + LB

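Combine with a retry middleware, but be careful. List rateLimit before retry so the limiter runs first: re-attempts are then issued by the retry middleware toward the backend, downstream of the limiter, and do not consume tokens. A 429 returned to the client is a different matter: it can trigger client-side retries, and those arrive as new requests that DO count against the bucket and add load.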
```
http:
  routers:
    api-reads:
      rule: "Host(`api.example.com`) && Method(`GET`)"
      entryPoints: [websecure]
      service: api-backend
      middlewares: [rl-api, retry-safe]   # applied in order; retry-safe is a retry middleware defined elsewhere
```

Kubernetes IngressRoute equivalent.

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rl-api
  namespace: my-app
spec:
  rateLimit:
    average: 100
    burst: 50
    sourceCriterion:
      ipStrategy:
        depth: 2

Operational notes

Rate-limit values should map to realistic traffic — an audit of the existing P99 RPS per route is the right baseline before flipping the middleware on.
Per-IP limits punish NAT’d users — an office of 50 sharing one egress IP hits the limit immediately. Soften with a higher limit, or layer with a per-session limit upstream.
The default in-memory store means N replicas of Traefik = up to N times the effective limit. Set the average to 1/N of your target when running multiple replicas (this assumes traffic is spread evenly across them), or use a single-replica Traefik for the rate-limited surface.
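429 responses should include a Retry-After header so well-behaved clients back off. Traefik's rateLimit middleware sets Retry-After (along with X-Retry-In) on the 429s it generates; confirm with curl -I against the version you run, and check that any CDN or proxy in front passes the header through to the client.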
A rate-limit alert on a sudden jump in 429s is more useful than the limit itself — it tells you when the limit is working overtime.
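
On the client side, backing off after a 429 can be as simple as the sketch below. It assumes a numeric Retry-After value and falls back to capped exponential backoff with jitter when the header is missing; the function name and the base and cap defaults are illustrative, not taken from any particular client library.

```python
import random


def backoff_seconds(headers: dict[str, str], attempt: int,
                    base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retrying a request that was answered with 429."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    retry_after = normalized.get("retry-after", "").strip()
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    # Retry-After may also be an HTTP date; this sketch does not parse that form.
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)  # full jitter avoids synchronized retries


print(backoff_seconds({"Retry-After": "3"}, attempt=0))  # 3.0
```

Capping the server's hint is a design choice: it keeps a misconfigured upstream from stalling a client for hours, at the cost of occasionally retrying earlier than asked.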

Rate limiting is one of the most underused middlewares because the parameters require thought — a value picked from a blog post rarely matches your traffic. The audit-then-tune approach is part of how Stack Harbor deploys managed operations. For the related retry middleware see traefik-retry-middleware.