
HAProxy troubleshooting

A diagnostic playbook for HAProxy in production — 503s, SSL handshake failures, slow backends, runaway connections, and the logs/commands that tell you why.

HAProxy fails in patterns. Once you have triaged a few dozen incidents, the same shapes repeat: backend marked DOWN, TLS handshake refused, queue building, 503 NOSRV in the logs. This article is the diagnostic playbook we use: how to read an HAProxy log line, which runtime commands answer which question, and how to triage the most common production failures.

How to verify

The first three commands on any HAProxy incident:

sudo journalctl -u haproxy -n 200 --no-pager | tail -50
echo "show info" | sudo socat /run/haproxy/admin.sock - | head -20
echo "show stat" | sudo socat /run/haproxy/admin.sock - | column -ts,

show info reports uptime, version, the current connection count and the maxconn limits, which tells you whether the process looks healthy at the macro level. show stat lists every frontend, backend and server with its state and counters. The journal tail shows what HAProxy has been complaining about.
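show info prints one "Key: value" pair per line, so the macro-level check can be reduced to a single headroom figure. Here is a minimal sketch. It assumes the Maxconn and CurrConns keys that show info reports. conn_usage is a hypothetical helper name, not an HAProxy command:

```shell
# Sketch: summarize connection headroom from `show info` output on stdin.
# Assumes the "Key: value" lines of show info: Maxconn is the effective global
# maxconn, CurrConns the number of connections currently open.
conn_usage() {
  awk -F': ' '
    $1 == "Maxconn"   { max = $2 }
    $1 == "CurrConns" { cur = $2 }
    END {
      if (max == "" || cur == "") { print "Maxconn/CurrConns not found" > "/dev/stderr"; exit 1 }
      printf "CurrConns=%d Maxconn=%d usage=%.1f%%\n", cur, max, (max > 0 ? 100 * cur / max : 0)
    }'
}
```

Usage: echo "show info" | sudo socat /run/haproxy/admin.sock - | conn_usage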

What’s happening

A production HAProxy log line in httplog format:

192.0.2.1:54321 [02/Jun/2026:14:23:01.123] fe_http be_app/app1 0/0/1/15/16 200 1234 - - ---- 5/4/3/2/0 0/0 "GET /api/items HTTP/1.1"

The fields that matter for triage:

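  • be_app/app1: backend and server that handled the request. If no server could be selected, the server shows as <NOSRV>.
  • 0/0/1/15/16: the five timers, in milliseconds: time to receive the full request headers / time waiting in queues / time to connect to the server / time until the server returned its response headers / total time. Current releases label these TR/Tw/Tc/Tr/Ta (Ta being the total active time); older documentation uses Tq/Tw/Tc/Tr/Tt. A -1 means that phase never completed.
  • 200: HTTP status code.
  • ----: termination state, four characters in HTTP mode. The first character says what ended the session: C = client aborted, S = server aborted or refused the connection, P = HAProxy aborted it (deny rule, ACL, internal error), R = an HAProxy resource was exhausted, D = the server was marked DOWN, c = client timeout, s = server timeout, - = normal completion. The second character is the phase the session was in: R = waiting for the request, Q = waiting in the queue, C = connecting to the server, H = waiting for response headers, D = transferring data, - = normal.
  • 5/4/3/2/0: actconn/feconn/beconn/srv_conn/retries, the concurrent connections at process, frontend, backend and server level, plus the retry count.
  • 0/0: server queue and backend queue lengths.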
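To pull those fields out of a real log file, where a syslog prefix shifts the column numbers, it is more robust to find the timer field by its shape and then read its neighbours by offset. Below is a sketch under that assumption, for the default httplog layout shown above. A custom log-format will not match it. parse_httplog is a hypothetical helper:

```shell
# Sketch: print the triage fields of default-format httplog lines read on stdin.
# The timer field (five slash-separated numbers, any of which may be -1) is
# located by pattern, so a syslog prefix in front of the line does not matter.
parse_httplog() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/\+?-?[0-9]+$/) {
        split($i, t, "/")
        # Offsets after the timers: status, bytes, two cookie captures,
        # termination state, connection counts, queue lengths.
        printf "server=%s req=%s queue=%s connect=%s resp=%s total=%s status=%s flags=%s conns=%s queues=%s\n",
          $(i-1), t[1], t[2], t[3], t[4], t[5], $(i+1), $(i+5), $(i+6), $(i+7)
        next
      }
    }
  }'
}
```

Usage: tail -n 100 /var/log/haproxy.log | parse_httplog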

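The termination state is where you start. ---- is normal. SC-- means the server refused the TCP connection. SH-- means the server aborted before sending complete response headers, often a crash while processing the request. sH-- means timeout server expired before the response headers arrived. SD-- means the server connection died mid-transfer (usually an RST from the server). cD-- means timeout client expired during the data phase, typically a client that stopped reading. CD-- means the client closed the connection mid-transfer.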
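A small lookup makes the first two characters of the termination state readable during an incident. This sketch maps only the common codes from the termination-state table in the HAProxy configuration manual. Anything else prints as unknown. decode_term is a hypothetical helper name:

```shell
# Sketch: translate the first two characters of an HTTP termination state
# (for example "SD--" or "sH--") into words. Only common codes are mapped.
decode_term() {
  first=$(printf '%s' "$1" | cut -c1)    # what ended the session
  second=$(printf '%s' "$1" | cut -c2)   # which phase it was in
  case $first in
    C) why="client aborted" ;;
    S) why="server aborted or refused" ;;
    P) why="proxy aborted (deny/ACL/internal error)" ;;
    L) why="processed locally by HAProxy" ;;
    R) why="proxy resource exhausted" ;;
    D) why="server marked DOWN" ;;
    K) why="killed by an admin" ;;
    c) why="client timeout" ;;
    s) why="server timeout" ;;
    -) why="normal completion" ;;
    *) why="unknown ($first)" ;;
  esac
  case $second in
    R) phase="waiting for client request" ;;
    Q) phase="queued" ;;
    C) phase="connecting to server" ;;
    H) phase="waiting for response headers" ;;
    D) phase="data transfer" ;;
    L) phase="sending last data" ;;
    T) phase="tarpitted" ;;
    -) phase="normal" ;;
    *) phase="unknown ($second)" ;;
  esac
  printf '%s; %s\n' "$why" "$phase"
}
```

Usage: decode_term sH-- prints "server timeout; waiting for response headers".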

The procedure

  1. Backend marked DOWN (503 NOSRV). Symptom: log lines show status 503 with <NOSRV> as the server (be_app/<NOSRV>); clients see “Service Unavailable”:

    echo "show servers state" | sudo socat /run/haproxy/admin.sock -
    echo "show stat" | sudo socat /run/haproxy/admin.sock - | awk -F, '/^be_/ {print $1,$2,$18,$37}' | column -t
    sudo journalctl -u haproxy -n 200 --no-pager | grep -i 'health\|DOWN\|UP'

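    In that output, $18 is the server status (UP, DOWN, MAINT) and $37 is check_status. Its values include L4OK, L4TOUT, L4CON, L7OK, L7TOUT, L7RSP and L7STS. The last_chk column holds the raw error text. L4CON = TCP connection refused or unreachable. L4TOUT = TCP connect timed out. L7TOUT = HTTP health check timed out. L7STS = the check got an unexpected HTTP status, with the code (for example 503) in check_code.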

    Triage: SSH to the backend host. Is the process running? Does curl localhost:8080/healthz (or whatever port and path your health check uses) succeed? If yes, check the network path from HAProxy: security group, iptables, network policy. If no, the app is broken, and that is not an HAProxy problem.

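  2. SSL handshake failures. Symptom: browsers report “your connection is not secure”, or curl fails with exit code 35 (SSL connect error) or 60 (certificate verification failed). The journal shows [ALERT] lines, or the traffic log shows connections to the :443 frontend that never produce an HTTP request: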

    sudo journalctl -u haproxy -n 500 --no-pager | grep -iE 'ssl|crt|alert|cert'
    openssl s_client -connect site.example.com:443 -servername site.example.com < /dev/null 2>&1 | head -40
    echo "show ssl cert" | sudo socat /run/haproxy/admin.sock -

    Common causes: expired certificate (pipe the s_client output through openssl x509 -noout -dates to read notAfter), missing intermediate (incomplete chain), CA file mismatch on mTLS frontends, key file unreadable by the haproxy user. See HAProxy SSL termination for the chain build.

  3. Slow responses (high Tr). Symptom: clients see 30-second waits; HAProxy log lines show high Tr (response time from backend):

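    # Slowest requests among the last 10,000 lines: Tr (ms) and backend/server.
    # Assumes a 5-field syslog prefix, so $9 = backend/server and $10 = timers.
    tail -n 10000 /var/log/haproxy.log | awk '{split($10, t, "/"); print t[4], $9}' | sort -rn | head
    sudo journalctl -u haproxy -n 200 --no-pager | grep -i 'timeout'
    # Average queue/connect/response/total time (ms) over the last 1024 requests, per server.
    echo "show stat" | sudo socat /run/haproxy/admin.sock - | awk -F, '/^be_/ {print $1,$2,$59,$60,$61,$62}' | column -t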

    If Tr is consistently above 5s on one server but not the others, that server is slow. If it is high across all servers, a shared dependency (database, cache) is slow. If the total time is high but Tr is normal, the time is going into receiving the request or streaming the response body, which usually means a slow client.

    The timeout server directive caps how long HAProxy waits on the server. A termination state of sH-- means it expired before the response headers arrived (the client gets a 504). sD-- means it expired mid-transfer. Raise it if the backend genuinely needs more time; otherwise fix the backend.

  4. Queue building (high beconn). Symptom: show stat shows qcur > 0 and growing; clients see latency increasing:

    # pxname svname qcur qmax scur slim status, for backend rows with a non-empty queue
    echo "show stat" | sudo socat /run/haproxy/admin.sock - | awk -F, '/^be_/ && $3 > 0 {print $1,$2,$3,$4,$5,$7,$18}' | column -t
    echo "show servers conn" | sudo socat /run/haproxy/admin.sock -

    qcur is current queue depth. A queue means more in-flight requests than the backend’s per-server maxconn allows. Either raise maxconn on the server line, add more servers, or fix the slow backend.

  5. Runaway connection count. Symptom: CurrConns climbs without bound until it reaches the global maxconn. At that point HAProxy stops accepting new connections and clients start timing out:

    echo "show info" | sudo socat /run/haproxy/admin.sock - | grep -E '^(Maxconn|CurrConns):'
    echo "show sess" | sudo socat /run/haproxy/admin.sock - | head -20
    ss -ant | awk '{print $1}' | sort | uniq -c

    show sess lists every session HAProxy holds. If thousands share the same client IP, one client is opening too many connections; rate-limit it with stick-tables. If most sessions have a very high age, they are leaking. This is usually WebSocket or long-poll connections with no timeout tunnel set.

  6. option http-server-close vs option http-keep-alive. The wrong choice can exhaust the connection pool on the backend:

    defaults
        option http-server-close
        timeout http-keep-alive 10s

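    option http-server-close closes the server-side connection after each response and keeps only the client side alive. timeout http-keep-alive bounds how long an idle client connection waits for its next request. option http-keep-alive, the default in current HAProxy releases, keeps server-side connections open for reuse. For apps that struggle with many short-lived connections, http-keep-alive reduces churn.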

  7. Logs not shipping. Symptom: incident under way, the logs you expected to consult are not in your aggregator:

    sudo journalctl -u rsyslog -n 50 --no-pager
    sudo tail -f /var/log/haproxy.log
    ls -lh /var/log/haproxy.log
    df -h /var

    Common causes: the rsyslog config snippet for HAProxy was dropped, /var filled up, logrotate failed to rotate, or the shipping agent (promtail/filebeat/cloudwatch) is dead. Check the destination first: the log is probably still being written locally and just not arriving.
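The backend-DOWN check in step 1 can be narrowed to just the unhealthy servers. This sketch reads show stat CSV from stdin and looks up columns by their header names (pxname, svname, status, check_status, last_chk) rather than by position. It assumes the "# pxname,..." header line is present. servers_not_up is a hypothetical helper:

```shell
# Sketch: list servers whose status is not UP, with their last health-check result.
# Columns are found by header name, so field positions may differ between versions.
servers_not_up() {
  awk -F, '
    NR == 1 {
      sub(/^# */, "", $1)            # the header line starts with "# pxname"
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NF == 0 { next }                 # blank line at the end of socket output
    $(col["svname"]) == "FRONTEND" || $(col["svname"]) == "BACKEND" { next }
    $(col["status"]) !~ /^(UP|no check)/ {
      printf "%s/%s status=%s check=%s last_chk=%s\n",
        $(col["pxname"]), $(col["svname"]), $(col["status"]),
        $(col["check_status"]), $(col["last_chk"])
    }'
}
```

Usage: echo "show stat" | sudo socat /run/haproxy/admin.sock - | servers_not_up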
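For the expired-certificate case in step 2, the notAfter date can be turned into a day count. This sketch assumes GNU date (for date -d) and the date format that openssl x509 -enddate prints. days_until_expiry is a hypothetical helper:

```shell
# Sketch: whole days until a certificate's notAfter date (e.g. "Jun 12 00:00:00 2026 GMT").
# Requires GNU date. The optional second argument overrides "now" (epoch seconds)
# so the function can be tested. 0 or a negative number means expired or about to.
days_until_expiry() {
  end=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (end - now) / 86400 ))
}

# Usage, with the example hostname from step 2:
#   end=$(openssl s_client -connect site.example.com:443 -servername site.example.com </dev/null 2>/dev/null \
#         | openssl x509 -noout -enddate | cut -d= -f2)
#   days_until_expiry "$end"
```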
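For step 3, averaging Tr per server separates "one slow server" from "every server slow". This sketch assumes the default httplog layout. It finds the timer field by pattern rather than column number and skips Tr = -1 (no response received). tr_by_server is a hypothetical helper:

```shell
# Sketch: average and maximum Tr (ms) per backend/server from httplog lines on
# stdin, slowest average first. Tr is the 4th of the five slash-separated timers.
tr_by_server() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/\+?-?[0-9]+$/) {
        split($i, t, "/")
        if (t[4] >= 0) {
          srv = $(i-1); n[srv]++; sum[srv] += t[4]
          if (t[4] > max[srv]) max[srv] = t[4]
        }
        next
      }
    }
  }
  END {
    for (s in n) printf "%s avg_Tr=%d max_Tr=%d n=%d\n", s, sum[s] / n[s], max[s], n[s]
  }' | sort -t= -k2,2nr
}
```

Usage: tail -n 10000 /var/log/haproxy.log | tr_by_server | head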
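Step 4 hinges on whether qcur is growing, and a single show stat cannot tell you that. Here is a sketch that samples a few times. It assumes the admin socket path used throughout this article, which can be overridden through a hypothetical HAPROXY_SOCK variable. It also assumes the standard CSV positions of qcur (3), scur (5) and slim (7). haproxy_stat and queue_samples are hypothetical helpers:

```shell
# Sketch: sample `show stat` several times and print every row whose queue
# (qcur) is non-empty, so you can see whether it grows between samples.
HAPROXY_SOCK=${HAPROXY_SOCK:-/run/haproxy/admin.sock}

haproxy_stat() {
  echo "show stat" | sudo socat "$HAPROXY_SOCK" -
}

queue_samples() {
  count=${1:-5}      # number of samples
  interval=${2:-1}   # seconds between samples
  i=1
  while [ "$i" -le "$count" ]; do
    haproxy_stat | awk -F, -v n="$i" '!/^#/ && $3 > 0 { printf "%d %s/%s qcur=%s scur=%s slim=%s\n", n, $1, $2, $3, $5, $7 }'
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}
```

Usage: queue_samples 10 2 takes ten samples two seconds apart.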
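For step 5, grouping show sess output by client address shows whether one client is responsible. This sketch assumes each session line contains a src=<address>:<port> token, which is how current HAProxy versions print it. sessions_by_src is a hypothetical helper:

```shell
# Sketch: count `show sess` entries per client address, busiest first.
# The port is stripped so all sessions from one client group together.
sessions_by_src() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^src=/) {
        addr = substr($i, 5)
        sub(/:[0-9]+$/, "", addr)   # drop the trailing :port
        count[addr]++
        break
      }
    }
  }
  END { for (a in count) print count[a], a }' | sort -rn
}
```

Usage: echo "show sess" | sudo socat /run/haproxy/admin.sock - | sessions_by_src | head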
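For step 7, you can script the check that the local file is still being written, which rules out HAProxy and rsyslog. This sketch measures growth over a short window. log_growing is a hypothetical helper. Note that a log rotation during the window will also read as "did not grow":

```shell
# Sketch: report whether a log file grows within a time window.
# Returns 0 if it grew, 1 if it did not, 2 if the file could not be read.
log_growing() {
  file=$1
  wait_s=${2:-10}
  before=$(wc -c < "$file") || return 2
  sleep "$wait_s"
  after=$(wc -c < "$file") || return 2
  if [ "$after" -gt "$before" ]; then
    echo "$file grew by $((after - before)) bytes in ${wait_s}s"
    return 0
  fi
  echo "$file did not grow in ${wait_s}s"
  return 1
}
```

Usage: log_growing /var/log/haproxy.log 30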

Common pitfalls

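  • The bracketed timestamp in an HAProxy log line records when the request arrived, but the line is written only when the request completes (unless option logasap is set). A slow request therefore shows up in the log at the moment it finished, carrying the timestamp of when it started, so lines are not in timestamp order.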
  • show errors dumps the last invalid request and response captured by each frontend and backend, which is invaluable for debugging malformed HTTP from a buggy backend.
  • journalctl -u haproxy shows the systemd-level messages; the per-request logs are typically in /var/log/haproxy.log via rsyslog. Do not confuse the two.
  • A reload starts new worker processes, which normally begin with fresh show info and show stat counters. If you reloaded “to fix it” and the counters look fresh, you may have cleared the symptom and erased the evidence of its cause.
  • Don’t assume a backend is bad because HAProxy says so. The HAProxy → backend path goes through L3/L4 (firewall, network policy, MTU). tcpdump on both sides settles which hop is dropping.

Stack Harbor maintains a per-client HAProxy runbook with the diagnostic commands above, mapped to common alert types. Incident triage starts with the runbook; we fall back to first-principles debugging only when the runbook does not cover the symptom. We update the runbook after every novel incident. This is part of the on-call discipline behind our Managed Operations practice.