
HAProxy health checks

TCP, HTTP, and external health checks in HAProxy — inter, fall, rise, expect rules, and the production patterns that catch sick backends without flapping.

A backend without health checks is one dead pod away from serving 502s to half your users. HAProxy supports TCP probes, HTTP probes, agent checks, and external command checks; this article covers the patterns we put on every production backend, the timing parameters that prevent flapping, and the http-check rule chain that lets you assert on the response body.

How to verify

For an existing backend, the runtime API and stats page tell you what state each server is in and why.

echo "show servers state" | sudo socat /run/haproxy/admin.sock -
echo "show stat" | sudo socat /run/haproxy/admin.sock - | column -ts,
curl -s 'http://127.0.0.1:8404/stats?stats;csv' | awk -F, '/^be_/ { print $1,$2,$18,$37 }'
sudo journalctl -u haproxy -n 50 --no-pager | grep -iE 'health|server'

The stats CSV columns include status, check_status, last_chk (text of the last check result) and lastchg (seconds since the last status change). In the awk line above, $18 is status and $37 is check_status. When a server goes DOWN, last_chk tells you what failed: connection refused, HTTP 503, timeout, body mismatch.
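To list only the servers that need attention, you can filter the same CSV by column name instead of by position. This is a minimal sketch. haproxy_not_up is a hypothetical helper, and it splits naively on commas, so a last_chk text that contains a comma would be truncated.

```shell
# Hypothetical helper: read "show stat" CSV on stdin and print every server
# whose status is not exactly "UP" (DOWN, "UP 1/3" while failing, MAINT,
# DRAIN, "no check", ...). Columns are located by header name, not position.
haproxy_not_up() {
  awk -F, '
    NR == 1 {
      sub(/^# /, "")                  # header line starts with "# pxname,svname,..."
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $2 == "FRONTEND" || $2 == "BACKEND" { next }   # aggregate rows, not servers
    NF > 1 && $col["status"] != "UP" {
      print $1 "/" $2, $col["status"], $col["check_status"], $col["lastchg"] "s", $col["last_chk"]
    }'
}

# Usage against the admin socket used above:
#   echo "show stat" | sudo socat /run/haproxy/admin.sock - | haproxy_not_up
```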

What’s happening

For every server line that carries the check keyword, HAProxy runs a probe on a timer. inter sets the interval, fall is the number of consecutive failures that marks the server DOWN, and rise is the number of consecutive successes that brings it back UP. The probe protocol comes from the backend's option directive: option httpchk for HTTP, option tcp-check for a scripted TCP exchange, and no option means a bare TCP connect.
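The rise/fall hysteresis is easier to see in a toy model. This sketch is a simplification, not HAProxy's code. It assumes the server starts UP and treats a single result of the opposite kind as resetting the streak, which is what "consecutive" means here. simulate_checks is a hypothetical name.

```shell
# Simplified model of HAProxy's rise/fall hysteresis (illustrative only).
# Usage: simulate_checks RISE FALL RESULT...   (RESULT is "ok" or "fail")
# Prints the server state after each probe.
simulate_checks() {
  rise=$1 fall=$2
  shift 2
  state=UP streak=0 out=""
  for r in "$@"; do
    if [ "$state" = UP ]; then
      # While UP, count consecutive failures; any success resets the count.
      if [ "$r" = fail ]; then streak=$((streak + 1)); else streak=0; fi
      if [ "$streak" -ge "$fall" ]; then state=DOWN streak=0; fi
    else
      # While DOWN, count consecutive successes; any failure resets the count.
      if [ "$r" = ok ]; then streak=$((streak + 1)); else streak=0; fi
      if [ "$streak" -ge "$rise" ]; then state=UP streak=0; fi
    fi
    out="$out${out:+ }$state"
  done
  printf '%s\n' "$out"
}

# simulate_checks 2 3 fail fail ok fail fail fail ok ok
# -> UP UP UP UP UP DOWN DOWN UP
```

The lone success at the third probe keeps the server in rotation, and a DOWN server needs two successes in a row before it takes traffic again. Without rise and fall, an intermittently failing server would change state on nearly every probe.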

The probe types in order of fidelity:

  • No check: the server line without check. HAProxy never probes, so you discover failures only when client traffic hits the dead server.
  • TCP connect check: check with no option. HAProxy opens a TCP connection and closes it immediately. It tells you the port is listening and nothing about whether the app is healthy.
  • HTTP check (option httpchk): HAProxy sends an HTTP request and asserts on the response. This is the production default for HTTP backends.
  • TCP send/expect check (option tcp-check): a script of tcp-check send / tcp-check expect lines. Used for non-HTTP protocols (Redis PING, SMTP HELO, MySQL).
  • Agent check (agent-check): a separate TCP probe answered by an agent on the server with a weight or status string. The app announces its own health.
  • External check (option external-check): HAProxy forks a script for every probe. It is powerful but the slowest option. It must also be enabled with external-check in the global section, and it is best avoided on backends with many servers.
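For the agent check, HAProxy connects to the server's agent-port and reads a short ASCII reply. It applies what it finds: a weight such as 75%, or keywords such as up, down, drain, ready or maint. The sketch below shows the agent-side logic for a hypothetical worker-pool app that knows its busy and total worker counts. agent_reply and port 9777 are illustrative.

```shell
# Hypothetical agent logic: advertise spare capacity as a weight percentage.
# Usage: agent_reply BUSY_WORKERS TOTAL_WORKERS
# Matching server line (illustrative port):
#   server app1 10.0.1.11:8080 check agent-check agent-port 9777 agent-inter 5s
# One way to serve it: a script calling this function behind
#   socat TCP-LISTEN:9777,reuseaddr,fork EXEC:<that script>
agent_reply() {
  busy=$1 total=$2
  if [ "$total" -le 0 ]; then
    echo "down"                 # no workers at all: take the server out
  elif [ "$busy" -ge "$total" ]; then
    echo "drain"                # saturated: weight 0, no new load-balanced traffic
  else
    # integer percentage of idle workers, e.g. 1 busy of 4 -> "up 75%"
    echo "up $(( (total - busy) * 100 / total ))%"
  fi
}
```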

The probe address and port can differ from the traffic address and port. The addr and port keywords on a server line redirect the check, so check port 9000 probes 9000 even though traffic goes to 8080. This is how you put a thin /healthz server on a sidecar port without polluting the main app.

The procedure

  1. HTTP backend with a real /healthz. The expected production pattern:

    backend be_app
        option httpchk
        http-check send meth GET uri /healthz ver HTTP/1.1 hdr Host app.internal
        http-check expect status 200
        server app1 10.0.1.11:8080 check inter 2s fall 3 rise 2
        server app2 10.0.1.12:8080 check inter 2s fall 3 rise 2

    The modern syntax (http-check send / http-check expect, available since HAProxy 2.2) replaces the older one-liner option httpchk GET /healthz. It is more readable and supports chaining several checks.
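When a server flips DOWN, you can replay the probe by hand from the HAProxy host to see what the check sees. Here is a sketch using curl, which speaks HTTP/1.1 like the ver HTTP/1.1 rule above. probe_http is a hypothetical helper. The 2-second cap assumes timeout check is unset, in which case the probe must finish within inter 2s.

```shell
# Hypothetical helper: replay the HTTP probe by hand from the HAProxy host.
# Usage: probe_http IP PORT PATH HOST_HEADER [EXPECTED_STATUS]
# Prints the HTTP status (000 if no response) and succeeds only on a match.
probe_http() {
  ip=$1 port=$2 path=$3 host=$4 want=${5:-200}
  code=$(curl -sS -o /dev/null --max-time 2 -w '%{http_code}' \
    -H "Host: $host" "http://$ip:$port$path") || code=000
  echo "$code"
  [ "$code" = "$want" ]
}

# probe_http 10.0.1.11 8080 /healthz app.internal
```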

  2. Time the parameters deliberately. Production defaults we use:

    • inter 2s: probe every 2 seconds (also HAProxy's default). A shorter interval catches outages sooner, and halving it to 1s doubles the probe traffic.
    • fall 3: 3 consecutive failures before marking DOWN (the default). With inter 2s, an outage is detected in about 6s.
    • rise 2: 2 consecutive successes before marking UP again (the default). Prevents a flapping server from cycling in and out of rotation.
    • slowstart 30s: when a server transitions from DOWN to UP, ramp its weight from 0 to full over 30 seconds. Critical for apps with cold caches. It does not apply when HAProxy itself starts.
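The arithmetic behind these numbers fits in a small helper. This is a sketch under a few assumptions. Detection time is taken as roughly inter times fall, and recovery as inter times rise, ignoring probe duration and the fastinter/downinter server keywords. Every HAProxy node in front of the pool is assumed to probe every server independently. check_timing is a hypothetical name.

```shell
# Hypothetical helper: rough detection/recovery time and probe load.
# Usage: check_timing INTER_MS FALL RISE SERVERS [HAPROXY_NODES]
check_timing() {
  inter=$1 fall=$2 rise=$3 servers=$4 nodes=${5:-1}
  echo "down_after=$((inter * fall))ms"          # consecutive failures needed
  echo "up_after=$((inter * rise))ms"            # consecutive successes needed
  # each node probes each server once per inter
  echo "probes_per_min=$((servers * nodes * 60000 / inter))"
}

# check_timing 2000 3 2 2
# down_after=6000ms
# up_after=4000ms
# probes_per_min=60
```

Only inter drives probe volume. fall and rise change how long a state change takes, not how much traffic the checks generate.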
  3. Chain multiple HTTP checks. Probe two endpoints, fail if either fails:

    backend be_app
        option httpchk
        http-check connect
        http-check send meth GET uri /healthz hdr Host app.internal
        http-check expect status 200
        http-check connect
        http-check send meth GET uri /readyz hdr Host app.internal
        http-check expect status 200
        server app1 10.0.1.11:8080 check inter 5s fall 3 rise 2

    Each http-check connect opens a fresh connection for the request that follows, and the rules run in order within a single probe cycle. Fail any one of them and the whole probe fails.
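To find which link of a chain is failing, replay the same requests in order and stop at the first failure, as the rule chain does. Here is a curl-based sketch. probe_chain is a hypothetical helper that only checks for status 200, and the 2-second per-request cap is an assumption.

```shell
# Hypothetical helper: replay a chain of GET probes against one server.
# Usage: probe_chain IP:PORT HOST_HEADER PATH...
# Stops at the first path that does not return 200 and reports it.
probe_chain() {
  addr=$1 host=$2
  shift 2
  for path in "$@"; do
    code=$(curl -sS -o /dev/null --max-time 2 -w '%{http_code}' \
      -H "Host: $host" "http://$addr$path") || code=000
    if [ "$code" != 200 ]; then
      echo "FAIL $path -> $code"
      return 1
    fi
    echo "ok   $path"
  done
}

# probe_chain 10.0.1.11:8080 app.internal /healthz /readyz
```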

  4. Assert on the response body, not just the status code:

    backend be_redis_health_proxy
        option httpchk
        http-check send meth GET uri /healthz
        http-check expect rstring "redis:ok"
        server cache1 10.0.2.11:8080 check inter 2s fall 3 rise 2

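    Besides status (a single code or a range such as 200-399), the expect matchers include rstatus (regex on the status code), string (literal text in the body), rstring (regex on the body) and hdr (response headers). Prefix a matcher with ! to invert it.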
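To test a pattern before putting it in the config, run it against a captured body with grep. This is an approximation. grep matches line by line and HAProxy uses its own regex engine, so multi-line or engine-specific patterns can behave differently. body_matches is a hypothetical helper.

```shell
# Hypothetical helper: approximate "http-check expect string|rstring" on a body.
# Usage: body_matches string|rstring PATTERN < body
body_matches() {
  case $1 in
    string)  grep -qF -- "$2" ;;   # literal text anywhere in the body
    rstring) grep -qE -- "$2" ;;   # extended regular expression
    *) echo "unknown matcher: $1" >&2; return 2 ;;
  esac
}

# curl -s http://10.0.2.11:8080/healthz | body_matches rstring 'redis:ok'
```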

  5. TCP send/expect for non-HTTP backends. Redis PING:

    backend be_redis
        mode tcp
        option tcp-check
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        server redis1 10.0.2.11:6379 check inter 2s fall 3 rise 2
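
You can replay the same exchange with netcat to see exactly what the server answers. This sketch assumes an nc that supports -w, as the common variants do. redis_ping_ok is a hypothetical helper. Sending QUIT after PING makes Redis reply and close the connection, so nc does not sit out its timeout. If the server requires AUTH, PING is answered with an error instead of +PONG, and both this helper and the tcp-check above report failure.

```shell
# Hypothetical helper: replay the tcp-check PING/PONG exchange by hand.
# Usage: redis_ping_ok HOST PORT
redis_ping_ok() {
  # QUIT makes Redis reply +OK and close, so nc exits without waiting for -w.
  printf 'PING\r\nQUIT\r\n' | nc -w 2 "$1" "$2" | grep -q '^+PONG'
}

# redis_ping_ok 10.0.2.11 6379 && echo UP || echo DOWN
```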

    MySQL handshake:

    backend be_mysql
        mode tcp
        option mysql-check user haproxy_check
        server db1 10.0.3.11:3306 check inter 5s fall 3 rise 2

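    option mysql-check is purpose-built. It performs a real MySQL handshake, and with user a login followed by QUIT, not just a TCP connect. The user must exist on the database server with no password.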
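HAProxy logs in as this user without a password, so the account should hold no privileges and accept connections only from the load balancers. The sketch below prints the SQL for review. mysql_check_user_sql is a hypothetical helper, and 10.0.0.5 and 10.0.0.6 are placeholder HAProxy addresses. CREATE USER IF NOT EXISTS assumes MySQL 5.7+ or MariaDB 10.1+.

```shell
# Hypothetical helper: print the SQL that creates the passwordless check user
# that "option mysql-check user haproxy_check" logs in as, one account per
# HAProxy node address. Review it, then pipe it into the mysql client.
# Usage: mysql_check_user_sql USER HAPROXY_IP...
mysql_check_user_sql() {
  user=$1
  shift
  for ip in "$@"; do
    # No password and no GRANT: the account holds nothing beyond USAGE.
    printf "CREATE USER IF NOT EXISTS '%s'@'%s';\n" "$user" "$ip"
  done
}

# mysql_check_user_sql haproxy_check 10.0.0.5 10.0.0.6 | mysql -u root -p
```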

  6. Separate probe port from traffic port. When the app and the health endpoint live on different ports:

    backend be_app
        option httpchk
        http-check send meth GET uri /healthz
        http-check expect status 200
        server app1 10.0.1.11:8080 check port 9090 inter 2s fall 3 rise 2

    Probes hit 9090, traffic flows to 8080.
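The probe port is a natural home for a small dependency-aware responder, which also keeps /healthz honest. This is a minimal sketch of the response logic. healthz_response, the script path and the dependency command are hypothetical, and socat is one assumed way to serve it on 9090.

```shell
# Hypothetical sidecar logic: run a dependency check and emit an HTTP response.
# Usage: healthz_response CHECK_COMMAND [ARGS...]
# Reads (and discards) the request headers from stdin first, so the client
# does not get a reset caused by unread request data.
# Serving it (hypothetical path): end a script with e.g.
#   healthz_response curl -fsS --max-time 1 http://127.0.0.1:8080/
# and run: socat TCP-LISTEN:9090,reuseaddr,fork EXEC:/usr/local/bin/healthz-sidecar
healthz_response() {
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}                  # strip CR from CRLF line endings
    [ -z "$line" ] && break             # blank line ends the request headers
  done
  if "$@" >/dev/null 2>&1; then
    status="200 OK" body="ok"
  else
    status="503 Service Unavailable" body="unhealthy"
  fi
  printf 'HTTP/1.0 %s\r\nContent-Type: text/plain\r\nContent-Length: %s\r\n\r\n%s\n' \
    "$status" "$(( ${#body} + 1 ))" "$body"
}
```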

  7. Maintenance and disabled state. A server can be drained or put into maintenance via the runtime API without modifying the config:

    echo "disable server be_app/app1" | sudo socat /run/haproxy/admin.sock -
    echo "set server be_app/app1 state drain" | sudo socat /run/haproxy/admin.sock -
    echo "enable server be_app/app1" | sudo socat /run/haproxy/admin.sock -

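    drain sets the server's effective weight to 0. It gets no new load-balanced sessions, while existing sessions continue, persistent (sticky) clients can still reach it, and health checks keep running. disable puts the server in MAINT: no traffic at all and no health checks until it is re-enabled, though connections already open are not cut. enable takes it out of maintenance and re-enables its health checks.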
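A common sequence is to drain, wait for in-flight sessions to finish, then switch to MAINT. The sketch below runs that loop against the same admin socket. server_sessions and drain_then_maint are hypothetical helpers. The 60-second default wait is an assumption; tune it to your longest request.

```shell
SOCK=/run/haproxy/admin.sock   # admin socket path used throughout this article

# Print the current session count (scur) of one server from "show stat" CSV
# on stdin; exits non-zero if the server is not found.
server_sessions() {
  awk -F, -v px="$1" -v sv="$2" '
    NR == 1 { sub(/^# /, ""); for (i = 1; i <= NF; i++) col[$i] = i; next }
    $1 == px && $2 == sv { print $col["scur"]; found = 1 }
    END { exit !found }'
}

# Hypothetical helper: drain a server, wait up to TIMEOUT seconds for its
# sessions to reach zero, then put it in MAINT. If the wait runs out, it still
# switches to MAINT; sessions that are still open are not killed.
# Usage: drain_then_maint BACKEND SERVER [TIMEOUT_SECONDS]
drain_then_maint() {
  be=$1 srv=$2 left=${3:-60}
  echo "set server $be/$srv state drain" | sudo socat "$SOCK" - || return 1
  while [ "$left" -gt 0 ]; do
    n=$(echo "show stat" | sudo socat "$SOCK" - | server_sessions "$be" "$srv") || return 1
    [ "$n" -eq 0 ] && break
    sleep 1
    left=$((left - 1))
  done
  echo "set server $be/$srv state maint" | sudo socat "$SOCK" -
}
```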

Common pitfalls

  • A health check that just opens a TCP connection on the app port reports UP even when the app is wedged. Always probe an HTTP /healthz that actually exercises the app’s dependencies.
  • fall 3 rise 2 with inter 2s means a sick backend takes about 6s to be marked DOWN and about 4s to come back UP. The aggressive inter 1s fall 2 detects outages in about 2s but doubles the probe traffic, since probe volume depends only on inter.
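  • If timeout check is unset, the whole probe (connect plus response) must finish within inter. With inter 2s, a /healthz that takes 3s to respond fails every time, and you wonder why the backend is DOWN. Once timeout check is set, HAProxy allows min(timeout connect, inter) for the connect and timeout check for the response, so set it to cover your slowest healthy /healthz.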
  • A /healthz that returns 200 unconditionally is a lie. The endpoint should fail when the app cannot serve traffic — DB unreachable, cache cold, dependency dead. See HAProxy troubleshooting.
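  • option httpchk and option ssl-hello-chk both select the probe type, so pick one. For TLS backends you usually want the HTTP check over TLS: add ssl verify required ca-file ... on the server line rather than relying on ssl-hello-chk, which only confirms that the server answers a hello message. Checks follow the server's ssl setting unless port or addr sends them elsewhere, in which case add check-ssl.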
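The timeout check rule in the pitfalls above is easy to get backwards, so here it is as arithmetic. This sketch encodes only the documented rule. check_timeouts is a hypothetical helper and takes milliseconds.

```shell
# Hypothetical helper: effective health-check time budget, in milliseconds.
# Usage: check_timeouts INTER TIMEOUT_CONNECT [TIMEOUT_CHECK]
check_timeouts() {
  inter=$1 tconnect=$2 tcheck=${3:-}
  if [ -z "$tcheck" ]; then
    # "timeout check" unset: connect + response must fit within inter
    echo "probe_total=${inter}ms"
  else
    # "timeout check" set: connect bounded by min(timeout connect, inter),
    # then "timeout check" bounds the wait for the response
    if [ "$tconnect" -lt "$inter" ]; then c=$tconnect; else c=$inter; fi
    echo "connect=${c}ms read=${tcheck}ms"
  fi
}

# check_timeouts 2000 5000         -> probe_total=2000ms (a 3s /healthz always fails)
# check_timeouts 2000 5000 10000   -> connect=2000ms read=10000ms
```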

Stack Harbor wires health checks as part of the bring-up checklist on every backend — never the check-with-no-option default. We also wire slowstart on cold-cache backends and a separate probe port for apps that need a fat /healthz without exposing it on the traffic port. This is part of how we run Clustered Environments.