A backend without health checks is one dead pod away from serving 502s to half your users. HAProxy supports TCP probes, HTTP probes, agent checks, and external command checks; this article covers the patterns we put on every production backend, the timing parameters that prevent flapping, and the http-check rule chain that lets you assert on the response body.
How to verify
For an existing backend, the runtime API and stats page tell you what state each server is in and why.
echo "show servers state" | sudo socat /run/haproxy/admin.sock -
echo "show stat" | sudo socat /run/haproxy/admin.sock - | column -ts,
curl -s 'http://127.0.0.1:8404/stats?stats;csv' | awk -F, '/^be_/ { print $1,$2,$18,$37 }'
sudo journalctl -u haproxy -n 50 --no-pager | grep -iE 'health|server'
The stats CSV columns include status, check_status, last_chk (the response text or error from the last check), and lastchg (seconds since the last status change). When a server goes DOWN, the last_chk text tells you what failed: connection refused, HTTP 503, timeout, body mismatch.
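The hard-coded column numbers in the awk one-liner are easy to get wrong. As a minimal sketch, the columns can be looked up by header name instead. The function name check_summary is made up for this example, and it assumes no field contains an embedded comma:

```shell
#!/bin/sh
# check_summary: read a HAProxy stats CSV on stdin and print one line per
# server with its status and last check result. Columns are located by the
# names in the CSV header, not by position.
# Assumption: no field contains an embedded comma (naive comma split).
check_summary() {
    awk -F, '
        NR == 1 {
            sub(/^# /, "", $1)                  # header line starts with "# "
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Skip blank lines and the per-proxy FRONTEND/BACKEND summary rows.
        NF > 1 && $(col["svname"]) != "FRONTEND" && $(col["svname"]) != "BACKEND" {
            printf "%s/%s status=%s check=%s last_chk=\"%s\" lastchg=%ss\n",
                $(col["pxname"]), $(col["svname"]), $(col["status"]),
                $(col["check_status"]), $(col["last_chk"]), $(col["lastchg"])
        }
    '
}
```

A typical invocation would pipe the runtime API output into it: echo "show stat" | sudo socat /run/haproxy/admin.sock - | check_summary.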
What’s happening
HAProxy probes each backend server on a timer. It marks the server UP after rise consecutive successes and DOWN after fall consecutive failures; the time between probes is inter. The probe protocol comes from the backend's option directive: option tcp-check for a scripted TCP exchange, option httpchk for HTTP, and no option at all for a bare TCP connect.
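The rise/fall counting is easier to reason about when you replay a sequence of probe results by hand. The sketch below is a simplified model, not HAProxy's actual implementation; simulate_checks is a made-up name, and the server is assumed to start UP:

```shell
#!/bin/sh
# simulate_checks RISE FALL RESULT...
# Each RESULT is "ok" or "fail". Prints "<result> <state>" after every probe.
# Simplified model: only consecutive results count, and any opposite result
# resets the streak. Variables are global because POSIX sh has no "local".
simulate_checks() {
    rise=$1; fall=$2; shift 2
    state=UP; streak=0
    for result in "$@"; do
        if [ "$state" = UP ]; then
            # While UP, count consecutive failures toward "fall".
            if [ "$result" = fail ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$fall" ]; then state=DOWN; streak=0; fi
        else
            # While DOWN, count consecutive successes toward "rise".
            if [ "$result" = ok ]; then streak=$((streak + 1)); else streak=0; fi
            if [ "$streak" -ge "$rise" ]; then state=UP; streak=0; fi
        fi
        printf '%s %s\n' "$result" "$state"
    done
}

# rise 2, fall 3: two failures, a recovery, then a real outage and a comeback.
simulate_checks 2 3 fail fail ok fail fail fail ok ok
```

In this run the single ok after two failures resets the count, so the server only goes DOWN on the third consecutive failure that follows, and returns to UP on the second consecutive success.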
The probe types in order of fidelity:
- No check: a server line without the check keyword. HAProxy never probes; you discover failures only when client traffic hits the dead server.
- TCP connect check: check on the server line with no check option in the backend. HAProxy opens a TCP connection and immediately closes it. This tells you the port is listening and nothing about whether the app is healthy.
- HTTP check (option httpchk): HAProxy sends an HTTP request and asserts on the response. This is the production default for HTTP backends.
- TCP send/expect check (option tcp-check): a script of tcp-check send / tcp-check expect lines. Used for non-HTTP protocols (Redis PING, SMTP HELO, MySQL).
- Agent check (agent-check): a separate TCP probe that returns a weight or status string. The app announces its own health.
- External check (option external-check): runs a script. Powerful but the slowest; avoid in high-cardinality backends.
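For the agent check, the reply is a short ASCII line such as a weight percentage or a state word. The sketch below shows one possible mapping from a load figure to a reply. The function name agent_reply, the 90% threshold, and the weight formula are assumptions for illustration, and something still has to serve the string on the agent port:

```shell
#!/bin/sh
# agent_reply LOAD_PERCENT
# Print the line an agent could send back to HAProxy:
#   "drain"      when the host is nearly saturated (hypothetical 90% cutoff)
#   "ready N%"   otherwise, where N = 100 - load is the advertised weight;
#                "ready" cancels a drain requested by an earlier reply.
agent_reply() {
    load=$1
    if [ "$load" -ge 90 ]; then
        printf 'drain\n'
    else
        printf 'ready %s%%\n' "$((100 - load))"
    fi
}

# Matching server line (port and interval are example values):
#   server app1 10.0.1.11:8080 check agent-check agent-port 9999 agent-inter 5s
agent_reply 25
agent_reply 95
```

The agent reply adjusts weight and administrative state on top of the regular health check; it does not replace the check keyword on the server line.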
The probe IP and port can differ from the traffic IP and port: port 9000 on a server line probes 9000 even though traffic goes to 8080. This is how you put a thin /healthz server on a sidecar port without polluting the main app.
The procedure
- HTTP backend with a real /healthz. The expected production pattern:

      backend be_app
          option httpchk
          http-check send meth GET uri /healthz ver HTTP/1.1 hdr Host app.internal
          http-check expect status 200
          server app1 10.0.1.11:8080 check inter 2s fall 3 rise 2
          server app2 10.0.1.12:8080 check inter 2s fall 3 rise 2

  The modern syntax (http-check send / http-check expect, available since HAProxy 2.2) replaces the older one-liner option httpchk GET /healthz. It is more readable and supports chaining multiple checks.

- Time the parameters deliberately. Production defaults we use:
  - inter 2s: probe every 2 seconds. Halving the interval catches outages sooner but doubles the probe traffic.
  - fall 3: 3 consecutive failures before marking DOWN. With inter 2s, worst-case outage detection is about 6s, plus the check timeout when probes hang instead of failing fast.
  - rise 2: 2 consecutive successes before marking UP again. Prevents a flapping server from cycling traffic.
  - slowstart 30s: when a server transitions from DOWN to UP, ramp its weight from 0 to full over 30 seconds. Critical for apps with cold caches.

- Chain multiple HTTP checks. Probe two endpoints, fail if either fails:

      backend be_app
          option httpchk
          http-check send meth GET uri /healthz ver HTTP/1.1 hdr Host app.internal
          http-check expect status 200
          http-check send meth GET uri /readyz ver HTTP/1.1 hdr Host app.internal
          http-check expect status 200
          server app1 10.0.1.11:8080 check inter 5s fall 3 rise 2

  Each http-check send runs sequentially within a single probe cycle, on the same connection. Fail any one of them and the whole probe fails. Both requests use ver HTTP/1.1 so the connection stays open between them; with the default HTTP/1.0 the server typically closes the connection after the first response.

- Assert on response body. Not just the status code:

      backend be_redis_health_proxy
          option httpchk
          http-check send meth GET uri /healthz
          http-check expect rstring "redis:ok"
          server cache1 10.0.2.11:8080 check inter 2s fall 3 rise 2

  rstring, string, status, rstatus, hdr, and fhdr are the supported expect predicates. rstring and rstatus match a regular expression; string and status match literally.

- TCP send/expect for non-HTTP backends. Redis PING:

      backend be_redis
          mode tcp
          option tcp-check
          tcp-check send PING\r\n
          tcp-check expect string +PONG
          server redis1 10.0.2.11:6379 check inter 2s fall 3 rise 2

  MySQL handshake:

      backend be_mysql
          mode tcp
          option mysql-check user haproxy_check
          server db1 10.0.3.11:3306 check inter 5s fall 3 rise 2

  option mysql-check is purpose-built: a real MySQL handshake, not just a TCP connect. The haproxy_check user has to exist on the MySQL server.

- Separate probe port from traffic port. When the app and the health endpoint live on different ports:

      backend be_app
          option httpchk
          http-check send meth GET uri /healthz
          http-check expect status 200
          server app1 10.0.1.11:8080 check port 9090 inter 2s fall 3 rise 2

  Probes hit 9090, traffic flows to 8080.

- Maintenance and disabled state. A server can be drained or put into maintenance via the runtime API without modifying the config:

      echo "disable server be_app/app1" | sudo socat /run/haproxy/admin.sock -
      echo "set server be_app/app1 state drain" | sudo socat /run/haproxy/admin.sock -
      echo "enable server be_app/app1" | sudo socat /run/haproxy/admin.sock -

  drain stops new sessions from being load-balanced to the server and lets existing ones finish. disable is harder: it puts the server in the MAINT state, which takes it out of rotation entirely and stops its health checks until you enable it again.
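For the drain step, it helps to know when the server has actually emptied before you restart it. The following is a minimal sketch, assuming the admin socket path used above and that scur is the fifth column of the stats CSV. hap_cmd, current_sessions, and drain_and_wait are hypothetical helper names:

```shell
#!/bin/sh
# Path of the runtime API socket; override via the environment if it differs.
HAPROXY_SOCK=${HAPROXY_SOCK:-/run/haproxy/admin.sock}

# hap_cmd COMMAND: send one runtime API command and print the reply.
hap_cmd() {
    echo "$1" | socat "$HAPROXY_SOCK" -
}

# current_sessions BACKEND SERVER: print scur (5th CSV column) for one server.
current_sessions() {
    hap_cmd "show stat" |
        awk -F, -v px="$1" -v sv="$2" '$1 == px && $2 == sv { print $5 }'
}

# drain_and_wait BACKEND SERVER [MAX_POLLS]
# Put the server in drain, then poll once per second until it has no current
# sessions. Returns 0 when drained, 1 if MAX_POLLS (default 60) is exhausted.
drain_and_wait() {
    max=${3:-60}
    hap_cmd "set server $1/$2 state drain" >/dev/null
    n=0
    while [ "$n" -lt "$max" ]; do
        cur=$(current_sessions "$1" "$2")
        [ "${cur:-0}" -eq 0 ] && return 0
        sleep 1
        n=$((n + 1))
    done
    return 1
}

# Example (run as a user allowed to use the socket):
#   drain_and_wait be_app app1 120 && echo "app1 is idle, safe to restart"
```

The poll limit matters because long-lived or persistent connections can keep scur above zero indefinitely; at that point you decide whether to wait longer or cut them off.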
Common pitfalls
- A health check that just opens a TCP connection on the app port reports UP even when the app is wedged. Always probe an HTTP /healthz that actually exercises the app's dependencies.
- fall 3 rise 2 with inter 2s means a sick backend takes 6s to mark DOWN and 4s to come back UP. An aggressive setting (inter 1s fall 2) detects outages in 2s but doubles the probe traffic.
- timeout check interacts with inter and timeout connect. If timeout check is unset, the whole probe (connect plus response) has to finish within inter. If it is set, the connect phase is bounded by the smaller of timeout connect and inter, and timeout check bounds the wait for the response. If your app's healthz takes 8s to respond and these limits are shorter, the probe fails and you wonder why the backend is DOWN.
- A /healthz that returns 200 unconditionally is a lie. The endpoint should fail when the app cannot serve traffic: DB unreachable, cache cold, dependency dead. See HAProxy troubleshooting.
- option httpchk and option ssl-hello-chk are mutually exclusive; pick one. For TLS backends you usually want an HTTP check over the TLS connection: add ssl verify required ca-file ... on the server line, not ssl-hello-chk alone.
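The timing trade-off can be worked out with simple arithmetic before you touch the config. This sketch uses a hypothetical helper name, check_timing, handles whole-second intervals only, and ignores check timeouts:

```shell
#!/bin/sh
# check_timing INTER_SECONDS FALL RISE
# Print how long a server takes to be marked DOWN and UP again, and how many
# probes per minute HAProxy sends to each server, for the given parameters.
# Assumptions: failures are detected immediately (no check timeout added) and
# INTER_SECONDS is a whole number that divides 60 evenly.
check_timing() {
    inter=$1; fall=$2; rise=$3
    echo "down_after=$((inter * fall))s up_after=$((inter * rise))s probes_per_minute=$((60 / inter))"
}

check_timing 2 3 2   # the defaults used in this article
check_timing 1 2 2   # an aggressive profile
```

The first call prints down_after=6s up_after=4s probes_per_minute=30 and the second prints down_after=2s up_after=2s probes_per_minute=60. Multiply the per-server probe rate by the number of servers and HAProxy instances to estimate the total probe load.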
Stack Harbor wires health checks as part of the bring-up checklist on every backend — never the check-with-no-option default. We also wire slowstart on cold-cache backends and a separate probe port for apps that need a fat /healthz without exposing it on the traffic port. This is part of how we run Clustered Environments.