HAProxy fails in patterns. Once you have triaged a few dozen incidents, the same shapes repeat — backend marked DOWN, TLS handshake refused, queue building, 503 NOSRV in logs. This article is the diagnostic playbook we use: how to read an HAProxy log line, which runtime commands answer which question, and how to triage the five most common production failures.
How to verify
The first three commands on any HAProxy incident:
sudo journalctl -u haproxy -n 200 --no-pager | tail -50
echo "show info" | sudo socat /run/haproxy/admin.sock - | head -20
echo "show stat" | sudo socat /run/haproxy/admin.sock - | column -ts,
show info reports uptime, version, current connection count, and connection limits: enough to tell whether the process looks healthy at the macro level. show stat lists every frontend, backend, and server with its state and counters. The journal tail shows what HAProxy has been complaining about.
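The show stat CSV has dozens of columns, and awk field numbers are easy to get wrong under pressure. As a minimal sketch, the hypothetical helper stat_cols below (not part of HAProxy) selects columns by the names in the CSV header line. It assumes the header line starts with "# pxname,svname," and prints "-" for empty values and "?" for column names it cannot find.

```shell
# stat_cols: print selected "show stat" columns by header name (hypothetical helper).
# Reads the CSV on stdin; column names are passed as arguments.
stat_cols() {
  awk -F, -v want="$*" '
    BEGIN { n = split(want, names, " ") }
    # Header line: strip the leading "# " and remember each column position.
    /^# / {
      sub(/^# /, "")
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data lines: emit the requested columns, space separated.
    NF > 1 {
      line = ""
      for (j = 1; j <= n; j++) {
        v = (names[j] in col) ? $(col[names[j]]) : "?"
        if (v == "") v = "-"
        line = line (j > 1 ? " " : "") v
      }
      print line
    }
  '
}
```

Typical use would be: echo "show stat" | sudo socat /run/haproxy/admin.sock - | stat_cols pxname svname status check_status | column -t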
What’s happening
A production HAProxy log line in httplog format:
192.0.2.1:54321 [02/Jun/2026:14:23:01.123] fe_http be_app/app1 0/0/1/15/16 200 1234 - - ---- 5/4/3/2/0 0/0 "GET /api/items HTTP/1.1"
The fields that matter for triage:
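- be_app/app1: the backend and server that handled the request.
- 0/0/1/15/16: Tq/Tw/Tc/Tr/Tt in milliseconds, that is, time to receive the request / time in queue / connect time / server response time / total time.
- 200: HTTP status code.
- ----: termination state. The first character says who or what ended the session: C = client aborted, S = server aborted or refused, P = HAProxy aborted (for example a deny rule), R = HAProxy ran out of resources, c = client-side timeout, s = server-side timeout. The second character says which phase the session was in: R = waiting for the request, Q = queued, C = connecting to the server, H = waiting for response headers, D = data transfer.
- 5/4/3/2/0: actconn/feconn/beconn/srv_conn/retries, the concurrent connections at process / frontend / backend / server level, plus the retry count.
- 0/0: server queue / backend queue lengths.
The termination state is where you start. ---- is normal. SD-- means the server aborted the connection during data transfer, usually because the backend crashed or restarted mid-response. cD-- means the client-side timeout expired during data transfer, typically a client that stalled or went away.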
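To make the field positions concrete, here is a small sketch of a decoder for one httplog line. decode_httplog is a hypothetical helper, it covers only the most common termination codes, and it assumes the default httplog layout shown above. It locates the bracketed timestamp first so that a syslog prefix in front of the line does not shift the offsets.

```shell
# decode_httplog: summarise server, status, timers and termination state
# for each httplog line on stdin (hypothetical helper, common codes only).
decode_httplog() {
  awk '
    function cause(c) {
      if (c == "-") return "normal"
      if (c == "C") return "client aborted"
      if (c == "S") return "server aborted or refused"
      if (c == "P") return "proxy aborted"
      if (c == "R") return "proxy resource exhausted"
      if (c == "c") return "client-side timeout"
      if (c == "s") return "server-side timeout"
      return "other (" c ")"
    }
    function phase(c) {
      if (c == "-") return "after transfer completed"
      if (c == "R") return "waiting for the client request"
      if (c == "Q") return "waiting in queue"
      if (c == "C") return "connecting to the server"
      if (c == "H") return "waiting for response headers"
      if (c == "D") return "data transfer"
      return "other (" c ")"
    }
    {
      # Find the bracketed timestamp; the other fields sit at fixed offsets after it.
      d = 0
      for (i = 1; i <= NF; i++) if ($i ~ /^\[[0-9]+\/[A-Za-z]+\/[0-9]+:/) { d = i; break }
      if (d == 0) next
      split($(d + 3), t, "/")     # Tq/Tw/Tc/Tr/Tt
      flags = $(d + 8)            # termination state, for example ---- or sH--
      printf "server=%s status=%s Tw=%s Tc=%s Tr=%s flags=%s (%s, %s)\n", $(d + 2), $(d + 4), t[2], t[3], t[4], flags, cause(substr(flags, 1, 1)), phase(substr(flags, 2, 1))
    }
  '
}
```

Feeding it the sample line above should report server be_app/app1, status 200, Tr=15 and a normal termination; a line ending in sH-- would be reported as a server-side timeout while waiting for response headers.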
The procedure
- Backend marked DOWN (503 NOSRV). Symptom: logs show 503 with NOSRV in place of the server name, and clients see “Service Unavailable”:
  echo "show servers state" | sudo socat /run/haproxy/admin.sock -
  echo "show stat" | sudo socat /run/haproxy/admin.sock - | awk -F, '/^be_/ {print $1,$2,$18,$37}' | column -t
  sudo journalctl -u haproxy -n 200 --no-pager | grep -i 'health\|DOWN\|UP'
  The show stat columns include check_status (L4OK, L4CON, L4TOUT, L7OK, L7STS, L7TOUT, etc.) and last_chk (the raw error). L4CON = connection failed at TCP level, typically refused. L7TOUT = HTTP health check timed out. L7STS with check code 503 = the backend answered the check with 503.
  Triage: SSH to the backend host. Is the process running? Does curl localhost:8080/healthz work? If yes, check the firewall (security group / iptables / network policy). If no, the app is broken, and that is not an HAProxy problem.
- SSL handshake failures. Symptom: clients report “your connection is not secure” or curl returns SSL_ERROR_INTERNAL_ERROR. Logs show [ALERT] lines, or connections to :443 with no follow-up GET:
  sudo journalctl -u haproxy -n 500 --no-pager | grep -iE 'ssl|crt|alert|cert'
  openssl s_client -connect site.example.com:443 -servername site.example.com < /dev/null 2>&1 | head -40
  echo "show ssl cert" | sudo socat /run/haproxy/admin.sock -
  Common causes: expired certificate (check notAfter in s_client output), missing intermediate (chain incomplete), CA file mismatch for mTLS frontends, key file unreadable by the haproxy user. See HAProxy SSL termination for the chain build.
- Slow responses (high Tr). Symptom: clients see 30-second waits; HAProxy log lines show high Tr (response time from backend):
  tail -n 10000 /var/log/haproxy.log | awk '{print $11}' | sort | uniq -c | sort -rn | head
  sudo journalctl -u haproxy -n 200 --no-pager | grep -i 'timeout'
  echo "show table fe_http data.bytes_out_rate gt 0" | sudo socat /run/haproxy/admin.sock - | head
  The first command reads a fixed number of lines, because sort never produces output when fed from tail -f. Assuming the default rsyslog prefix, field 11 is the status code and field 10 holds the timers.
  If Tr is consistently >5s on one server but not others, that server is slow. If it is high across all servers, a shared backend dependency (database, cache) is slow. If Tt is high but Tr is normal, the client connection is slow.
  The timeout server directive caps how long HAProxy waits for the backend. sH-- means the timeout fired before the backend sent its response headers; sD-- means it fired partway through the response. Increase the timeout if the backend genuinely needs more time, or fix the backend.
- Queue building (high beconn). Symptom: show stat shows qcur > 0 and growing; clients see latency increasing:
  echo "show stat" | sudo socat /run/haproxy/admin.sock - | awk -F, '/^be_/ && $3 > 0 {print $1,$2,$3,$4,$5,$15,$17,$18}' | column -t
  echo "show servers conn" | sudo socat /run/haproxy/admin.sock -
  qcur is current queue depth. A queue means more in-flight requests than the backend’s per-server maxconn allows. Either raise maxconn on the server line, add more servers, or fix the slow backend.
- Runaway connection count. Symptom: CurrConns climbs without bound, eventually hits the global maxconn, and HAProxy stops accepting new connections:
  echo "show info" | sudo socat /run/haproxy/admin.sock - | grep -E '^(CurrConns|Maxconn):'
  echo "show sess" | sudo socat /run/haproxy/admin.sock - | head -20
  ss -ant | awk '{print $1}' | sort | uniq -c
  show sess lists every session HAProxy holds. If thousands have the same client IP, you have a client opening too many connections; rate-limit them via stick-tables. If they all have very high age, they are leaking, usually a websocket or long-poll without a timeout tunnel.
- option http-server-close vs option http-keep-alive. The wrong choice can cause connection pool exhaustion at the backend:
  defaults
      option http-server-close
      timeout http-keep-alive 10s
  http-server-close closes the backend connection after each request and lets HAProxy keep the client side alive. http-keep-alive keeps backend connections open. For apps that struggle with many short-lived connections, http-keep-alive reduces churn.
- Logs not shipping. Symptom: an incident is under way and the logs you expected to consult are not in your aggregator:
  sudo journalctl -u rsyslog -n 50 --no-pager
  sudo tail -f /var/log/haproxy.log
  ls -lh /var/log/haproxy.log
  df -h /var
  Common causes: rsyslog dropped the config snippet, /var filled up, logrotate failed to rotate, or the agent (promtail/filebeat/cloudwatch) is dead. Check the destination first: the log is probably being written; it just is not arriving.
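For the slow-response case in the procedure above, it helps to see Tr per server rather than scrolling raw lines. The sketch below is one way to do that, assuming httplog-format lines on stdin. tr_by_server is a hypothetical name, and lines where Tr is -1 (no response received) are skipped rather than averaged.

```shell
# tr_by_server: request count, average and maximum Tr (ms) per backend/server,
# slowest maximum first (hypothetical helper; expects httplog lines on stdin).
tr_by_server() {
  awk '
    {
      # Locate the bracketed timestamp so a syslog prefix does not matter.
      d = 0
      for (i = 1; i <= NF; i++) if ($i ~ /^\[[0-9]+\/[A-Za-z]+\/[0-9]+:/) { d = i; break }
      if (d == 0) next
      split($(d + 3), t, "/")          # Tq/Tw/Tc/Tr/Tt
      tr = t[4] + 0
      if (tr < 0) next                 # -1 means the server never answered
      s = $(d + 2)                     # backend/server
      n[s]++
      sum[s] += tr
      if (tr > max[s] + 0) max[s] = tr
    }
    END {
      for (s in n) printf "%s requests=%d avg_tr_ms=%d max_tr_ms=%d\n", s, n[s], sum[s] / n[s], max[s]
    }
  ' | sort -t= -k4,4nr
}
```

A possible invocation is sudo tail -n 10000 /var/log/haproxy.log | tr_by_server. One server standing out at the top points at that host; uniformly high numbers point at a shared dependency.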
Common pitfalls
- The bracketed timestamp in an HAProxy log line is when the request was received, but the line is only written when the request completes (unless option logasap is set). A slow request therefore shows up in the log at the moment of completion, not the moment of arrival.
- show errors exposes the most recent protocol error captured per frontend and backend, which is invaluable for debugging malformed HTTP from a buggy backend.
- journalctl -u haproxy shows the systemd-level messages; the per-request logs are typically in /var/log/haproxy.log via rsyslog. Do not confuse the two.
- A reload while debugging resets the show info counters per worker, because the new worker processes start from zero. If you reloaded “to fix it” and the counters look fresh, you may have erased the evidence of both the symptom and the cause.
- Don’t assume a backend is bad because HAProxy says so. The HAProxy → backend path goes through L3/L4 (firewall, network policy, MTU). tcpdump on both sides settles which hop is dropping.
Stack Harbor maintains a per-client HAProxy runbook with the diagnostic commands above, mapped to common alert types. Incident triage starts with the runbook; first-principles debugging is reserved for symptoms the runbook doesn’t cover. We update the runbook after every novel incident. This is part of the on-call discipline behind our Managed Operations practice.