
Metrics (Prometheus)

Pleiades exposes Prometheus-compatible metrics at a configurable HTTP endpoint when enabled. All metrics include constant labels cluster and node when these are set in the config.

Enable endpoint

metrics:
  enablePrometheus: true
  listenAddr: "0.0.0.0"
  port: 9090
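
- Metrics are available at http://<listenAddr>:<port>/metrics. With listenAddr set to "0.0.0.0" the server listens on all interfaces, so scrape it using the node's hostname or IP address.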

Metric catalog (namespace: gslbd)

- dns
  - gslbd_dns_requests_total
- gitops
  - gslbd_gitops_fetch_total{result}
  - gslbd_gitops_verify_total{result}
  - gslbd_gitops_apply_total{result}
  - gslbd_gitops_last_apply_info{sha,signer}: value 1 for the last applied commit
- state (NATS/JetStream)
  - gslbd_state_nats_connected (0/1)
  - gslbd_state_nats_published_total{type}
  - gslbd_state_nats_received_total{type}
  - gslbd_state_kv_put_total{bucket,result}
  - gslbd_state_kv_get_total{bucket,result}
  - gslbd_state_merge_lag_ms (histogram)
  - gslbd_state_active_members (gauge)
- health
  - gslbd_health_endpoints_total{family}
  - gslbd_health_endpoints_healthy{family}
- dnssec (emitted only when dnssec.enabled: true)
  - gslbd_dnssec_sign_latency_seconds (histogram): per-RRset signing latency; target p99 < 1 ms
  - gslbd_dnssec_key_days_remaining{key} (gauge): days until KSK/ZSK expiry; label value is ksk or zsk
  - gslbd_dnssec_response_bytes (histogram): signed response size in bytes; responses > 1232 bytes trigger TC=1 truncation
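
Several catalog entries are easier to graph once they are turned into rates or ratios. The recording rules below are a sketch rather than anything shipped with Pleiades. The group name, the recorded series names, and the 5m window are all hypothetical. The ratio rule also assumes that gslbd_health_endpoints_total and gslbd_health_endpoints_healthy are both gauges carrying the same label set.

```yaml
groups:
- name: gslbd-derived            # hypothetical group name
  rules:
  # Per-second DNS request rate, summed over all scraped nodes.
  - record: gslbd:dns_requests:rate5m
    expr: sum(rate(gslbd_dns_requests_total[5m]))
  # Per-second NATS publish rate, broken down by message type.
  - record: gslbd:state_nats_published:rate5m
    expr: sum(rate(gslbd_state_nats_published_total[5m])) by (type)
  # Fraction of known endpoints that are healthy, per family.
  # Assumes both series are gauges with matching labels; the result
  # is NaN while the total for a family is 0.
  - record: gslbd:health_endpoints_healthy:ratio
    expr: gslbd_health_endpoints_healthy / gslbd_health_endpoints_total
```

The recorded names follow the common level:metric:operations convention for Prometheus recording rules. Any other naming scheme works equally well.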

Scrape example

scrape_configs:
  - job_name: 'gslbd'
    scrape_interval: 15s
    static_configs:
      - targets: ['gslbd-hostname:9090']

Sample alerts

groups:
- name: gslbd
  rules:
  - alert: GslbdNATSDisconnected
    expr: gslbd_state_nats_connected == 0
    for: 2m
  - alert: GslbdMergeLagHigh
    expr: histogram_quantile(0.95, sum(rate(gslbd_state_merge_lag_ms_bucket[5m])) by (le)) > 2000
    for: 5m
  - alert: GslbdActiveMembersZero
    expr: gslbd_state_active_members == 0
    for: 5m
  - alert: GslbdDNSSECKeyExpiryWarning
    expr: gslbd_dnssec_key_days_remaining < 30
    for: 1h
    labels:
      severity: warning
  - alert: GslbdDNSSECKeyExpiryCritical
    expr: gslbd_dnssec_key_days_remaining < 7
    for: 1h
    labels:
      severity: critical
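
The catalog also suggests alerts for the DNSSEC latency target and for endpoint health. The rules below are a sketch in the same style as the samples above. The alert names, the for durations, and the severities are assumptions. The 0.001 threshold is the documented p99 target of 1 ms expressed in seconds.

```yaml
groups:
- name: gslbd-extra              # hypothetical group name
  rules:
  # p99 per-RRset signing latency above the documented 1 ms target.
  - alert: GslbdDNSSECSignLatencyHigh
    expr: histogram_quantile(0.99, sum(rate(gslbd_dnssec_sign_latency_seconds_bucket[5m])) by (le)) > 0.001
    for: 10m
    labels:
      severity: warning
  # A family has endpoints configured but none of them is healthy.
  # Assumes both health series are gauges with matching labels.
  - alert: GslbdNoHealthyEndpoints
    expr: gslbd_health_endpoints_healthy == 0 and gslbd_health_endpoints_total > 0
    for: 2m
    labels:
      severity: critical
```

The first rule only returns data on nodes running with dnssec.enabled: true, because the dnssec metrics are not emitted otherwise.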

Implementation references

- internal/metrics/* for registerers, collectors, and HTTP server.
- Metrics are registered with const labels via metrics.InitLabels(clusterID, nodeID) in main.