The TIG stack (Telegraf + InfluxDB + Grafana) is the push-based alternative to Prometheus’s pull model. Telegraf agents collect metrics and push them to InfluxDB; InfluxDB stores them in buckets, each with its own retention period; Grafana queries them with Flux or InfluxQL. It is the right shape when you have devices that cannot host a /metrics endpoint (network gear, MQTT sensors, application frameworks that emit StatsD natively), or when you want a push model with strict per-tenant retention. This article walks through a working TIG install, with the operational settings that keep disk usage and series cardinality from running away.
How to verify
After install, the three components should be talking and a Telegraf agent’s metrics should be queryable in Grafana.
sudo systemctl status influxdb telegraf grafana-server --no-pager
ss -lntp | grep -E ':(8086|3000)\b'
# InfluxDB health
curl -fsS http://127.0.0.1:8086/health | jq .
influx ping
# The telegraf bucket exists and is receiving points
influx bucket list
influx query 'from(bucket:"telegraf") |> range(start: -5m) |> count()' --token "$INFLUX_TOKEN"
# Telegraf is sending
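sudo journalctl -u telegraf -n 50 --no-pager | grep -E 'Wrote batch|E!'
With debug = true in the [agent] section, a line such as Wrote batch of 12 metrics in 23.4ms in Telegraf’s log means the agent is pushing successfully; these write confirmations are debug-level messages and do not appear at the default log level. An E! line from [outputs.influxdb_v2] reporting a failed write usually means the token, bucket name, or org is wrong, and the error text says which.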
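To turn that check into a pass/fail result, a small bash function can count write confirmations and error lines in a journal sample. This is a sketch, not part of Telegraf: the name telegraf_write_summary is hypothetical, and it assumes debug = true in [agent] (so the "Wrote batch of ..." lines are logged) and Telegraf’s default text log format with its E! error prefix.

```shell
# Hypothetical helper: summarise Telegraf output health from log lines on stdin.
# Assumes debug = true in [agent] (successful writes are debug-level messages)
# and the default text log format, where errors carry the " E! " level prefix.
telegraf_write_summary() {
  local writes=0 errors=0 line
  while IFS= read -r line; do
    case "$line" in
      *'Wrote batch of '*) writes=$((writes + 1)) ;;
      *' E! '*) errors=$((errors + 1)) ;;
    esac
  done
  echo "writes=$writes errors=$errors"
  # Healthy: at least one successful write and no errors in the sample.
  [ "$writes" -gt 0 ] && [ "$errors" -eq 0 ]
}
```

Usage: sudo journalctl -u telegraf -n 200 --no-pager | telegraf_write_summary; a non-zero exit status means no confirmed writes or at least one error in the sample.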
What’s happening
Telegraf is a Go agent with a plugin architecture: a long and growing list of input plugins (system metrics, MySQL, Nginx, SNMP, Kafka, MQTT, StatsD, and many more) feeds an in-memory buffer, which is flushed to one or more outputs (InfluxDB, Prometheus remote-write, Kafka, files, and others). InfluxDB 2 is a time-series database organized into buckets (containers with a retention period), measurements (analogous to tables), tags (indexed key-value labels, which should stay low-cardinality), and fields (the actual values, not indexed).
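A line-protocol record makes the tag/field split concrete: the measurement and comma-separated tags come first, then a space, the fields, and an optional timestamp (nanoseconds by default). The bash helper below, lp_line, is a hypothetical name; it only concatenates those parts and does no escaping, so it assumes keys and values without spaces, commas, or equals signs.

```shell
# Hypothetical helper: print one InfluxDB line-protocol record.
#   $1 measurement
#   $2 comma-separated tag set (indexed; keep it low-cardinality), may be empty
#   $3 comma-separated field set (the values; not indexed)
#   $4 optional timestamp (nanosecond precision unless the writer says otherwise)
# No escaping: keys and values must not contain spaces, commas or '='.
lp_line() {
  local measurement=$1 tags=$2 fields=$3 ts=${4:-}
  local line="$measurement"
  [ -n "$tags" ] && line+=",$tags"
  line+=" $fields"
  [ -n "$ts" ] && line+=" $ts"
  printf '%s\n' "$line"
}

# The per-request ID goes in a (string) field, not a tag; 200i is an integer field.
lp_line http_requests "host=web01,route=/login" \
  'duration_ms=42.5,status=200i,request_id="a1b2c3"' 1700000000000000000
# → http_requests,host=web01,route=/login duration_ms=42.5,status=200i,request_id="a1b2c3" 1700000000000000000
```

A record built this way can be written by hand with influx write --bucket telegraf "$(lp_line ...)", which is a quick way to see how a new tag changes the series count.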
The cardinality trap is the same as in Loki and Prometheus. A tag with high cardinality (request ID, user ID) creates a new series for every value and the index grows quickly; values like those belong in a field. Retention is where InfluxDB is stronger than Prometheus: set a bucket to 30 days and the database evicts older points without intervention, and multiple buckets with different retentions let you keep short-lived raw data and one-year downsampled data in the same store.
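Because every distinct tag-value combination, times each field key, is its own series, the worst case for one measurement is a simple product. A sketch in bash; series_estimate is an illustrative name, and the fleet size (50 hosts with 16 cores) is an assumption.

```shell
# Illustrative helper: upper-bound series count for one measurement.
#   $1 number of field keys, then one argument per tag: its distinct-value count.
# Constant tags (env, cluster) have one value each and do not change the result.
series_estimate() {
  local fields=$1; shift
  local total=$fields card
  for card in "$@"; do
    total=$(( total * card ))
  done
  echo "$total"
}

# inputs.cpu with percpu + totalcpu: 10 usage_* fields,
# host tag = 50 hosts, cpu tag = cpu0..cpu15 plus cpu-total = 17 values.
series_estimate 10 50 17
# → 8500
```

A per-request tag does not multiply as a full cross product, since each request ID pairs with one host and route, but every new request still adds one series per field: a million requests a day on a two-field measurement adds two million series a day, and they never stop accumulating.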
The procedure
Install InfluxDB 2 from the official APT repository.
curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata.gpg >/dev/null
echo 'deb https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt install -y influxdb2 influxdb2-cli
sudo systemctl enable --now influxdb
Bootstrap a primary org, user, and bucket.
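# Keep the generated admin password in an owner-only file; otherwise it is lost
( umask 077 && openssl rand -base64 32 > ~/.influxdb-admin-password )
influx setup \
  --username sh-admin \
  --password "$(cat ~/.influxdb-admin-password)" \
  --org stackharbor \
  --bucket telegraf \
  --retention 720h \
  --force
# setup stores the admin (operator) token in the CLI config under
# ~/.influxdbv2/configs for the user who ran it; influx auth list also shows it.
# Treat that file as a secret.
influx config ls
--retention 720h is 30 days. Adjust it to your real retention requirement.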
Create a dedicated write token for Telegraf (do not use the admin token).
BUCKET_ID=$(influx bucket list --org stackharbor --name telegraf --json | jq -r '.[0].id')
influx auth create --org stackharbor \
  --write-bucket "$BUCKET_ID" \
  --description 'telegraf write'
# Capture the printed token; it goes into /etc/default/telegraf (next step)
Install and configure Telegraf.
sudo apt install -y telegraf
# /etc/telegraf/telegraf.conf
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  flush_interval = "10s"
  flush_jitter = "2s"
  omit_hostname = false

[global_tags]
  env = "prod"
  cluster = "ca-central-1"

[[outputs.influxdb_v2]]
  urls = ["http://127.0.0.1:8086"]
  token = "$INFLUX_TOKEN"
  organization = "stackharbor"
  bucket = "telegraf"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]

[[inputs.diskio]]

[[inputs.net]]

[[inputs.system]]

[[inputs.systemd_units]]
  ## Space-separated glob patterns handed to systemctl, not a regular expression
  pattern = "nginx* postgresql* telegraf* ssh*"
$INFLUX_TOKEN is read from /etc/default/telegraf, which the packaged systemd unit loads as an environment file:
# /etc/default/telegraf
INFLUX_TOKEN=<write-token-from-step-3>
# If the package already started Telegraf, enable --now would not reload the
# edited config, so enable and restart explicitly
sudo systemctl enable telegraf
sudo systemctl restart telegraf
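The token file is a secret, yet a file created under the default umask (022) is world-readable. A sketch of writing it with owner-only permissions; write_telegraf_env is a hypothetical name, it overwrites the whole file (so it assumes nothing else lives in /etc/default/telegraf), and it reads the token from stdin so the token never appears on a command line.

```shell
# Hypothetical helper: write the Telegraf environment file as mode 0600.
# systemd reads EnvironmentFile= as root, so root-only permissions are enough.
#   $1 target path (default /etc/default/telegraf); the token arrives on stdin.
write_telegraf_env() {
  local path=${1:-/etc/default/telegraf} token
  IFS= read -r token || true   # accept input with or without a trailing newline
  [ -n "$token" ] || { echo "no token on stdin" >&2; return 1; }
  # umask in a subshell leaves the caller's umask alone; 077 gives new files 0600
  ( umask 077 && printf 'INFLUX_TOKEN=%s\n' "$token" > "$path" ) || return 1
  # umask only applies to newly created files, so force the mode on an existing one
  chmod 600 "$path"
}
```

For example, paste the token with read -rs TELEGRAF_TOKEN, then run printf '%s' "$TELEGRAF_TOKEN" | sudo bash -c "$(declare -f write_telegraf_env); write_telegraf_env" followed by sudo systemctl restart telegraf.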
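Wire Grafana to InfluxDB 2 as a datasource. This guide uses Flux for new dashboards; InfluxQL also works against 2.x through the v1-compatible API once a DBRP mapping exists. Note that InfluxData has put Flux into maintenance mode, which matters if you expect to move to InfluxDB 3.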
# /etc/grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    url: http://127.0.0.1:8086
    access: proxy
    jsonData:
      version: Flux
      organization: stackharbor
      defaultBucket: telegraf
      tlsSkipVerify: false
    secureJsonData:
      token: <a-read-token-created-the-same-way>
# Grafana reads provisioning files at startup: sudo systemctl restart grafana-server
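The placeholder in that file still needs a read-only token, which influx auth create issues with --read-bucket instead of --write-bucket. A sketch that renders the provisioning file with the token taken from stdin; grafana_influx_datasource is a hypothetical name, and the values mirror the file above.

```shell
# Hypothetical helper: print the Grafana datasource provisioning YAML with a real
# read token in place of the placeholder. The token is read from stdin.
grafana_influx_datasource() {
  local token
  IFS= read -r token || true   # accept input with or without a trailing newline
  [ -n "$token" ] || { echo "no token on stdin" >&2; return 1; }
  # Quote the token in YAML so any leading '-' or trailing '=' stays a plain string
  cat <<EOF
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    url: http://127.0.0.1:8086
    access: proxy
    jsonData:
      version: Flux
      organization: stackharbor
      defaultBucket: telegraf
      tlsSkipVerify: false
    secureJsonData:
      token: "${token}"
EOF
}
```

For example: create the token with influx auth create --org stackharbor --read-bucket "$BUCKET_ID" --description 'grafana read', paste it with read -rs GRAFANA_TOKEN, then run printf '%s' "$GRAFANA_TOKEN" | grafana_influx_datasource | sudo tee /etc/grafana/provisioning/datasources/influxdb.yaml >/dev/null. Restrict the file with sudo chown root:grafana and sudo chmod 640, then restart grafana-server.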
Set per-bucket retention and downsampling. A common pattern: keep telegraf (10 s resolution) for 30 days, and downsample into a telegraf_1m bucket kept for one year.
influx bucket create --org stackharbor --name telegraf_1m --retention 8760h
# A task that aggregates every 1m and writes to the long-retention bucket:
influx task create -f /etc/influxdb/tasks/downsample-1m.flux --org stackharbor
// /etc/influxdb/tasks/downsample-1m.flux
option task = { name: "downsample-telegraf-1m", every: 1m }

from(bucket: "telegraf")
    |> range(start: -2m, stop: -1m)
    // mean() fails on string fields; inputs.system emits uptime_format as a string
    |> filter(fn: (r) => r._field != "uptime_format")
    |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
    |> to(bucket: "telegraf_1m", org: "stackharbor")
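Downsampling saves less disk than it may seem. Points kept per series is retention divided by sample interval, and a quick check shows the one-year, 1-minute bucket holds about twice as many points per series as the 30-day, 10-second bucket. A sketch in bash; points_per_series is an illustrative name.

```shell
# Illustrative helper: points retained per series = retention / sample interval.
#   $1 retention in seconds, $2 sample interval in seconds.
points_per_series() {
  local retention_s=$1 interval_s=$2
  echo $(( retention_s / interval_s ))
}

raw=$(points_per_series $((30 * 86400)) 10)     # telegraf: 10 s samples, 30 days
down=$(points_per_series $((365 * 86400)) 60)   # telegraf_1m: 1 min means, 365 days
echo "raw=$raw downsampled=$down"
# → raw=259200 downsampled=525600
```

So size disk for both buckets, not only the raw one; the bytes actually used per point depend on compression and are not estimated here.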
Operational notes
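- A high-cardinality tag (anything per-request) makes InfluxDB’s TSI index grow without bound; check series cardinality regularly (the Flux function influxdb.cardinality() from the influxdata/influxdb package reports it per bucket) and watch storage metrics such as storage_shard_disk_size on InfluxDB’s /metrics endpoint to catch this early.
- Telegraf’s metric_buffer_limit is per output: once the buffer is full, the agent drops the oldest metrics and logs it, so size it for your worst plausible InfluxDB outage.
- InfluxDB 2 stores passwords as bcrypt hashes, but the --password flag of influx setup takes the password in plaintext on the command line; if you typed a literal password there, change it afterwards with influx user password so your shell history does not hold it.
- Retention is enforced asynchronously: InfluxDB drops whole shard groups on a periodic check (every 30 minutes by default), so points outlive the nominal retention. A bucket with --retention 1h (1 h shard groups) can still return points up to about 2.5 h old; a 30-day bucket with 1-day shard groups can hold data roughly a day past its retention.
- Grafana panels written in InfluxQL keep working but cannot use Flux-only features (joins, pivots); document which dashboards use which query language.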
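The metric_buffer_limit sizing in the notes above reduces to arithmetic: metrics gathered per interval times the number of intervals the outage lasts. The figure of 60 metrics per 10 s interval below is an assumption for illustration; running telegraf --test once prints a single gather, which gives a rough count for your own config. buffer_limit_for_outage is a hypothetical helper name.

```shell
# Hypothetical helper: smallest metric_buffer_limit that rides out an output
# outage without drops.
#   $1 metrics gathered per interval, $2 interval in seconds, $3 outage in seconds.
buffer_limit_for_outage() {
  local metrics_per_interval=$1 interval_s=$2 outage_s=$3
  # Round the interval count up so a partial interval still counts.
  local intervals=$(( (outage_s + interval_s - 1) / interval_s ))
  echo $(( metrics_per_interval * intervals ))
}

buffer_limit_for_outage 60 10 3600   # assumed 60 metrics every 10 s, 1 h outage
# → 21600
```

At that assumed rate the configured 10000 covers about 28 minutes (10000 / 60 ≈ 167 intervals of 10 s), while an hour-long outage needs 21600. Every buffered metric is held in memory, so raise the limit deliberately rather than to an arbitrary large value.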
For the pull-based alternative — Prometheus on the same host — see prometheus-install-ubuntu. For a Netdata-based local agent that can push to InfluxDB, see netdata-install.
Stack Harbor runs TIG for clients whose source devices cannot host a /metrics endpoint, as part of the managed operations tier — retention buckets sized to the audit window, write tokens scoped to a bucket, and cardinality on every series watched alongside the standard health metrics.