Longhorn: installing distributed block storage on a Kubernetes cluster

Install Longhorn on Kubernetes for replicated block storage volumes, with the storage class options, node tagging, and snapshot/backup configuration we deploy in production.

Longhorn is a CNCF-incubating distributed block storage system for Kubernetes that originated at Rancher. It gives you replicated PersistentVolumes with snapshots and backups without standing up Ceph or Portworx. Each volume’s replicas are stored as files on the node filesystems, so the operational model is “Kubernetes pods with replicated state on local disks.” This article walks through the Helm install we use on Kubernetes clusters, the storage class options that matter (numberOfReplicas, staleReplicaTimeout), node tagging for placement, and the snapshot-to-S3 backup wiring that makes Longhorn useful as more than just a local volume manager.

How to verify

# Longhorn namespace and pods
kubectl -n longhorn-system get pods
kubectl -n longhorn-system get nodes.longhorn.io

# Storage class
kubectl get storageclass longhorn -o yaml

# Volume and replica status
kubectl get volumes.longhorn.io -A
kubectl get replicas.longhorn.io -A

# Manager logs (engine and replica processes run inside the instance-manager pods)
kubectl -n longhorn-system logs -l app=longhorn-manager --tail=50

What’s happening

Longhorn has three core components. The manager is a controller running on every node that handles volume lifecycle: provisioning, attachment, and replica placement. The engine is a per-volume process that runs on the node where the volume is attached; it exposes the volume to that node as a block device (over iSCSI by default) and replicates writes to the volume’s replicas. Replicas are processes that hold the actual data as files on the node filesystems. A volume with numberOfReplicas: 3 has three replica processes, ideally on three different nodes, and the engine writes to all of them synchronously.
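
To check the “three different nodes” expectation on a running cluster, a small filter like the one below can flag any volume with more than one replica on the same node. This is a sketch rather than anything Longhorn ships: colocated_replicas is a hypothetical helper name, and the .spec.volumeName and .spec.nodeID field paths in the commented kubectl call are assumptions to verify against your installed CRD version.

```shell
# Hypothetical helper: reads "volume node" pairs on stdin and prints
# "<volume> <node> <count>" for every volume with 2+ replicas on one node.
colocated_replicas() {
  awk 'NF >= 2 { count[$1 " " $2]++ }
       END { for (k in count) if (count[k] > 1) print k, count[k] }' | sort
}

# Against a live cluster (field paths are assumptions, see above):
# kubectl -n longhorn-system get replicas.longhorn.io --no-headers \
#   -o custom-columns=VOLUME:.spec.volumeName,NODE:.spec.nodeID \
#   | colocated_replicas
```

Empty output means no volume has two replicas sharing a node. Failed or rebuilding replicas may still appear in the listing, so treat a hit as a prompt to look closer, not as proof of a problem.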

The CSI driver hands the volume to the kubelet as a block device on the node, backed by an iSCSI session to the engine. The CSI node plugin formats the device with the storage class’s fsType if needed and mounts it, and the kubelet bind-mounts it into the pod. This means Longhorn doesn’t need a special filesystem in the pod: it’s just a block device carrying ext4 or XFS. The downsides are the iSCSI overhead and the access model. Longhorn volumes are natively ReadWriteOnce (one node at a time can mount a given volume); ReadWriteMany is only available through the NFS-based share manager, and that’s a constraint to plan around for stateful sets that want ReadWriteMany semantics.

Backups go to an external backup target, either an S3-compatible bucket or an NFS export. You configure the BackupTarget, then trigger backups manually or via RecurringJob CRDs. Each backup is incremental, storing only the blocks changed since the previous backup, which is efficient but means a restore depends on the blocks written by earlier backups still being intact in the backup store. We always validate backups by doing periodic test restores; a backup you’ve never restored from is not a backup.
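
For S3, the BackupTarget value follows the pattern s3://<bucket>@<region>/<path>, and the credentials live in a Secret in the longhorn-system namespace. The sketch below composes the URL and shows, in comments, one way to create the secret. backup_target_url is a hypothetical helper, and the key values are placeholders to replace with your own.

```shell
# Hypothetical helper: compose a Longhorn S3 backup target URL of the form
# s3://<bucket>@<region>/<path>. Bucket and region are required.
backup_target_url() {
  bucket=${1:-} region=${2:-} path=${3:-}
  if [ -z "$bucket" ] || [ -z "$region" ]; then
    echo "usage: backup_target_url <bucket> <region> [path]" >&2
    return 1
  fi
  printf 's3://%s@%s/%s\n' "$bucket" "$region" "$path"
}

# Credentials secret referenced by BackupTargetCredentialSecret
# (placeholder values, not real keys):
# kubectl -n longhorn-system create secret generic longhorn-backup-secret \
#   --from-literal=AWS_ACCESS_KEY_ID=<key-id> \
#   --from-literal=AWS_SECRET_ACCESS_KEY=<secret-key>
```

Calling backup_target_url backups us-east-1 longhorn yields the same string used in the procedure, s3://backups@us-east-1/longhorn.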

The node story matters. Longhorn places replicas using two criteria: node and disk tags (Longhorn-level tags such as ssd, matched by the storage class’s nodeSelector and diskSelector parameters) and zone awareness (it reads the topology.kubernetes.io/zone label on each Kubernetes node and tries to put replicas in different zones). For a multi-rack or multi-AZ deployment, set the tags and zone labels so replicas don’t all end up on the same rack.
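
Zone awareness only helps if the nodes actually carry zone labels, so it is worth counting nodes per zone before relying on it. The filter below is a sketch: nodes_per_zone is a hypothetical helper, and the custom-columns expression in the commented kubectl call is an assumption about how your kubectl version escapes dotted label keys.

```shell
# Hypothetical helper: reads "node zone" pairs on stdin and prints
# "<zone> <node count>" per zone, sorted by zone name.
nodes_per_zone() {
  awk 'NF >= 2 { n[$2]++ } END { for (z in n) print z, n[z] }' | sort
}

# Against a live cluster (nodes without the label show up as <none>):
# kubectl get nodes --no-headers \
#   -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone' \
#   | nodes_per_zone
```

With fewer zones than replicas, at least two replicas of every volume necessarily share a zone, whatever the scheduler settings.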

The procedure

  1. Verify prerequisites — every node must have open-iscsi installed and the kernel modules iscsi_tcp and nfs available:

    # On every K8s node (Debian/Ubuntu shown; use your distro's package manager otherwise)
    sudo apt install -y open-iscsi nfs-common
    sudo systemctl enable --now iscsid
  2. Install Longhorn via Helm:

    helm repo add longhorn https://charts.longhorn.io
    helm repo update
    kubectl create namespace longhorn-system
    helm install longhorn longhorn/longhorn \
      --namespace longhorn-system \
      --set defaultSettings.defaultReplicaCount=3 \
      --set defaultSettings.replicaSoftAntiAffinity=false \
      --set defaultSettings.defaultDataPath=/var/lib/longhorn \
      --set persistence.defaultClassReplicaCount=3 \
      --version 1.7.0
  3. Verify pods are healthy across all worker nodes:

    kubectl -n longhorn-system get pods -o wide
    kubectl -n longhorn-system get nodes.longhorn.io
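  4. Tag disks for replica placement (e.g., for SSD-only volumes). Longhorn reads this annotation only when it first creates a node’s disks, and only if the “Create Default Disk on Labeled Nodes” setting is enabled and the node carries the create-default-disk=config label. On a node that already has Longhorn disks, edit the tags in the UI or on the nodes.longhorn.io object instead:

    kubectl label node worker-1 node.longhorn.io/create-default-disk=config
    kubectl annotate node worker-1 node.longhorn.io/default-disks-config='[{"path":"/var/lib/longhorn","allowScheduling":true,"tags":["ssd"]}]'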
  5. Create a storage class for SSD-tagged disks with 3 replicas:

    # storageclass-longhorn-ssd.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-ssd
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    parameters:
      numberOfReplicas: "3"
      staleReplicaTimeout: "2880"
      fromBackup: ""
      fsType: "ext4"
      diskSelector: "ssd"
      dataLocality: "best-effort"

    kubectl apply -f storageclass-longhorn-ssd.yaml
  6. Provision a PVC:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-pvc
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: longhorn-ssd
      resources:
        requests:
          storage: 10Gi
  7. Configure an S3 backup target (using the Longhorn UI port-forward or by CR):

    kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80 &
    # Then in browser: Settings → BackupTarget = s3://backups@us-east-1/longhorn
    # BackupTargetCredentialSecret = longhorn-backup-secret (containing AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)

    Then apply a RecurringJob resource for nightly backups (02:00 daily, keeping the last 14):

    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: backup-daily
      namespace: longhorn-system
    spec:
      cron: "0 2 * * *"
      task: backup
      groups: [default]
      retain: 14
      concurrency: 2
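
The prerequisite check in step 1 is the one most often skipped on newly added nodes, so it can help to script it. The sketch below reports which required commands are missing from PATH. missing_commands is a hypothetical helper; it checks for iscsiadm (shipped by open-iscsi) and mount.nfs (shipped by nfs-common) and does not verify that iscsid is running or that the kernel modules load.

```shell
# Hypothetical helper: print each named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# On each node, run as root so /sbin and /usr/sbin are on PATH:
# missing_commands iscsiadm mount.nfs || echo "install the packages from step 1"
```

No output and a zero exit status mean both client tools are present on that node.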

Common pitfalls

  • Forgetting open-iscsi on a worker node — Longhorn pods come up but volume attach fails with cryptic iSCSI errors when a workload schedules there.
  • replicaSoftAntiAffinity=true (the default in older versions) lets replicas land on the same node, defeating fault tolerance. Set to false.
  • ReadWriteMany requires the NFS-based RWX feature (shareManager), and it’s slower than RWO. Don’t enable it casually.
  • Volume expansion needs the underlying filesystem to support online resize. ext4 and XFS both do, but the pod consuming the PVC needs to be running for the expand to complete.
  • Longhorn data lives on a single path (defaultDataPath) by default; if you have multiple disks per node, add them explicitly on the node CRDs, otherwise only that one path is used.

In the engagements we run, Longhorn is the storage tier for Kubernetes-native stateful workloads where the customer doesn’t have a Ceph cluster or external SAN. We deploy with three replicas per volume, configure S3 backups against the customer’s object store or AWS S3, and validate restores monthly. For larger or more performance-sensitive workloads we’d reach for Rook-Ceph; for the more typical mid-tier, Longhorn is hard to beat on operational simplicity.