The TIG stack (Telegraf + InfluxDB + Grafana) is the push-based answer to the Prometheus question. Telegraf agents collect metrics and push to InfluxDB; InfluxDB stores them in buckets with a per-bucket retention; Grafana queries them with Flux or InfluxQL. It is the right shape when you have devices that cannot host a /metrics endpoint (network gear, MQTT sensors, app frameworks with native StatsD), or when you want a push model with strict per-tenant retention. This article walks through a working TIG install with the operational settings that prevent the disk and the cardinality from running away.

How to verify

After install, the three components should be talking and a Telegraf agent’s metrics should be queryable in Grafana.

sudo systemctl status influxdb telegraf grafana-server --no-pager
ss -lntp | grep -E ':(8086|3000)\b'
# InfluxDB health
curl -fsS http://127.0.0.1:8086/health | jq
influx ping
# A bucket exists and has received points in the last 5 minutes
influx bucket list
influx query 'from(bucket:"telegraf") |> range(start: -5m) |> count()' --token "$INFLUX_TOKEN"

# Telegraf is sending
sudo journalctl -u telegraf -n 50 --no-pager | grep -E 'Wrote|error'

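A line such as Wrote batch of 12 metrics in 23.4ms in Telegraf’s log means the agent is pushing successfully. Telegraf logs it at debug level, so set debug = true under [agent] if it never appears. failed to write metrics to bucket means the token, bucket name, or org is wrong; the message says which.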
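The health check above can also serve as a pass/fail gate in a script. This is a minimal sketch, assuming the /health response is JSON with a "status" field that reads "pass" when the server is ready; health_is_pass is a made-up helper name, not part of any CLI.

```shell
# Succeeds only if the JSON on stdin reports "status": "pass".
# health_is_pass is a hypothetical helper, not an InfluxDB command.
health_is_pass() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"pass"'
}

# Usage (not run here):
#   curl -fsS http://127.0.0.1:8086/health | health_is_pass && echo "influxdb ready"
```

grep is used instead of jq so the gate has no extra dependency; swap in jq if you need to inspect other fields.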

What’s happening

Telegraf is a Go agent with a plug-in architecture: dozens of inputs (system metrics, MySQL, Nginx, SNMP, Kafka, MQTT, StatsD, the list keeps growing) feed a buffer; that buffer is flushed to one or more outputs (InfluxDB, Prometheus remote-write, Kafka, file, anything). InfluxDB 2 is a time-series database with buckets (containers with a retention policy), measurements (analogous to tables), tags (indexed labels, low-cardinality), and fields (the actual values, not indexed).

The cardinality trap is the same as Loki and Prometheus. A tag with high cardinality (request ID, user ID) creates a new series per value and the index grows quickly; the right place for those is in a field. The retention story is the strength of InfluxDB compared to Prometheus — set a bucket to 30 days and the database evicts older points without intervention; multiple buckets with different retentions let you keep one-day raw and one-year downsampled in the same store.
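To make the tag/field split concrete, here is a sketch of one point in InfluxDB line protocol. The measurement http_request, the tag route, and the fields request_id and duration_ms are hypothetical names chosen for illustration; lp_point is a made-up helper.

```shell
# Line protocol shape: measurement,tag_set field_set timestamp
# route is a tag (indexed, few distinct values). request_id is a string field
# (not indexed), so a unique value per request does not create a new series.
lp_point() {  # lp_point <route> <request_id> <duration_ms> <timestamp_ns>
  printf 'http_request,route=%s request_id="%s",duration_ms=%s %s\n' "$1" "$2" "$3" "$4"
}

lp_point /login 9f3c2a 12.5 1700000000000000000
# prints: http_request,route=/login request_id="9f3c2a",duration_ms=12.5 1700000000000000000
```

Had request_id been placed before the space, as a tag, every request would have produced its own series.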

The procedure

Install InfluxDB 2 from the official APT repo.

curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata.gpg >/dev/null
echo 'deb https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt install -y influxdb2 influxdb2-cli
sudo systemctl enable --now influxdb

Bootstrap a primary org, user, and bucket.

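# Generate the admin password once and keep it; an inline $(openssl ...) would be lost after setup.
ADMIN_PASSWORD="$(openssl rand -base64 32)"
influx setup \
  --username sh-admin \
  --password "$ADMIN_PASSWORD" \
  --org stackharbor \
  --bucket telegraf \
  --retention 720h \
  --force
# The admin token is saved in the invoking user's CLI config (/root/.influxdbv2/configs when run as root);
# influx auth list shows it if you need to copy it elsewhere.
influx config ls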

--retention 720h is 30 days. Adjust to your real retention requirement.
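The conversion from days to the hour value used above is plain arithmetic. A small sketch, with retention_hours as a made-up helper name:

```shell
# Convert a retention in days to the hour form used with --retention above.
retention_hours() {  # retention_hours <days>
  printf '%dh\n' "$(( $1 * 24 ))"
}

retention_hours 30    # prints 720h
retention_hours 365   # prints 8760h
```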

Create a dedicated write token for Telegraf (do not use the admin token).

ORG_ID=$(influx org list --name stackharbor --json | jq -r '.[0].id')
BUCKET_ID=$(influx bucket list --org stackharbor --name telegraf --json | jq -r '.[0].id')
influx auth create --org stackharbor \
  --write-bucket "$BUCKET_ID" \
  --description 'telegraf write'
# Capture the printed token; Telegraf reads it from /etc/default/telegraf as INFLUX_TOKEN

Install and configure Telegraf.

sudo apt install -y telegraf

# /etc/telegraf/telegraf.conf
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  flush_interval = "10s"
  flush_jitter = "2s"
  omit_hostname = false

[global_tags]
  env = "prod"
  cluster = "ca-central-1"

[[outputs.influxdb_v2]]
  urls = ["http://127.0.0.1:8086"]
  token = "$INFLUX_TOKEN"
  organization = "stackharbor"
  bucket = "telegraf"

[[inputs.cpu]]
  percpu = true
  totalcpu = true
[[inputs.mem]]
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]
[[inputs.diskio]]
[[inputs.net]]
[[inputs.system]]
[[inputs.systemd_units]]
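  # Space-separated globs (as accepted by systemctl list-units), not a regex; ssh* covers ssh.service and sshd.service
  pattern = "nginx* postgresql* telegraf* ssh*"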

$INFLUX_TOKEN is read from /etc/default/telegraf:

# /etc/default/telegraf
INFLUX_TOKEN=<write-token-from-step-3>

sudo systemctl enable --now telegraf
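Whether metric_buffer_limit = 10000 is enough depends on how many metrics the agent gathers per interval and how long InfluxDB might be unreachable. A rough sizing sketch follows; the figure of 200 metrics per 10 s interval is an assumption for illustration, not a measurement, and buffer_limit is a made-up helper.

```shell
# Metrics that accumulate while the output is down:
#   metrics_per_interval * outage_seconds / interval_seconds
buffer_limit() {  # buffer_limit <metrics_per_interval> <interval_s> <outage_s>
  echo $(( $1 * $3 / $2 ))
}

buffer_limit 200 10 500    # prints 10000: the configured limit covers roughly 8 minutes
buffer_limit 200 10 3600   # prints 72000: riding out a one-hour outage needs a larger buffer
```

Measure the real per-interval count on your own hosts before picking a limit, since a larger buffer also costs memory.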

Wire Grafana as a datasource against InfluxDB 2. Use the Flux query language for new dashboards; InfluxQL also works against 2.x, but only through the v1 compatibility API, which needs a DBRP mapping for each bucket.

# /etc/grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    url: http://127.0.0.1:8086
    access: proxy
    jsonData:
      version: Flux
      organization: stackharbor
      defaultBucket: telegraf
      tlsSkipVerify: false
    secureJsonData:
      token: <a-read-token-created-the-same-way>
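
The read token for Grafana can be created with the same two CLI calls used for the Telegraf write token, swapping --write-bucket for --read-bucket. A sketch, where create_read_token is a hypothetical wrapper name and the description string is arbitrary:

```shell
# create_read_token <org> <bucket>: look up the bucket ID, then create a
# token that can only read that bucket. Hypothetical helper, not a CLI command.
create_read_token() {
  bucket_id=$(influx bucket list --org "$1" --name "$2" --json | jq -r '.[0].id') || return 1
  # jq prints "null" when the bucket was not found
  [ -n "$bucket_id" ] && [ "$bucket_id" != "null" ] || return 1
  influx auth create --org "$1" \
    --read-bucket "$bucket_id" \
    --description "grafana read"
}

# Usage (not run here):
#   create_read_token stackharbor telegraf
```

If dashboards also query a second bucket, the token needs read access to that bucket as well; --read-bucket can be repeated on the same influx auth create call.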

Set per-bucket retention and downsampling. A common pattern: keep telegraf (10 s resolution) for 30 days, downsample to a telegraf_1m bucket for one year.

influx bucket create --org stackharbor --name telegraf_1m --retention 8760h
# A task that aggregates every 1m and writes to the long-retention bucket:
influx task create -f /etc/influxdb/tasks/downsample-1m.flux --org stackharbor

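// /etc/influxdb/tasks/downsample-1m.flux
option task = { name: "downsample-telegraf-1m", every: 1m }
from(bucket: "telegraf")
  |> range(start: -2m, stop: -1m)
  // mean() fails on non-numeric fields; with the inputs configured above,
  // system.uptime_format is the string field to exclude
  |> filter(fn: (r) => r._field != "uptime_format")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
  |> to(bucket: "telegraf_1m", org: "stackharbor")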
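
The payoff of the two-bucket layout shows up in the number of points each series holds. A quick arithmetic sketch, with points_per_series as a made-up helper:

```shell
# Points one series holds: retention_days * 86400 / interval_seconds
points_per_series() {  # points_per_series <interval_s> <retention_days>
  echo $(( $2 * 86400 / $1 ))
}

points_per_series 10 30    # prints 259200  (telegraf: 10 s for 30 days)
points_per_series 60 365   # prints 525600  (telegraf_1m: 1 m for one year)
points_per_series 10 365   # prints 3153600 (raw 10 s kept for a year, for comparison)
```

A year of 1-minute points costs about twice the 30-day raw window, against roughly twelve times for a year of raw data.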

Operational notes

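A high-cardinality tag (anything per-request) causes InfluxDB’s TSI index to grow without bound. InfluxDB 2 has no _internal database (that is a 1.x feature), so watch series cardinality per bucket with the Flux influxdb.cardinality() function and the server’s Prometheus-format /metrics endpoint to catch this early.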
Telegraf’s metric_buffer_limit is per-output — exceed it and the agent starts dropping metrics with a log line; size it for your worst plausible InfluxDB outage.
InfluxDB 2 uses bcrypt for passwords and the setup --password flag accepts plaintext — rotate the admin password through influx user password after setup so the bash history does not hold it.
The retention enforcement is async — a bucket with --retention 1h does not delete points exactly at 1h; expect a few minutes lag.
Grafana panels written in InfluxQL keep working but cannot use the Flux-only features (joins, pivots); document which dashboards are stuck on the old query language.
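
One way to watch cardinality on 2.x is to wrap a Flux query in a small function and run it from cron or a check script. This sketch assumes your InfluxDB 2 build supports influxdb.cardinality() from the influxdata/influxdb Flux package, and that the active CLI config or INFLUX_TOKEN can read the bucket; bucket_cardinality is a hypothetical helper name.

```shell
# bucket_cardinality <bucket> <lookback>: series cardinality of a bucket
# over the lookback window (a negative Flux duration such as -30d).
bucket_cardinality() {
  influx query "import \"influxdata/influxdb\"
influxdb.cardinality(bucket: \"$1\", start: $2)"
}

# Usage (not run here):
#   bucket_cardinality telegraf -30d
```

Record the number over time; a sudden step up after a deploy usually points at a newly added tag.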

For the pull-based alternative — Prometheus on the same host — see prometheus-install-ubuntu. For a Netdata-based local agent that can push to InfluxDB, see netdata-install.

Stack Harbor runs TIG for clients whose source devices cannot host a /metrics endpoint, as part of the managed operations tier — retention buckets sized to the audit window, write tokens scoped to a bucket, and cardinality on every series watched alongside the standard health metrics.