Metrics and Observability
This guide explains how to expose Prometheus metrics for Pleiades GSLB, which metrics are available, and how to build dashboards and alerts on top of them.
Enable Prometheus endpoint
Enable the metrics endpoint in /etc/gslb/config.yaml. Once enabled, the server listens at http://<listenAddr>:<port>/metrics and exports the Prometheus text format.
Constant labels
- Every metric carries the constant labels cluster and node. Each label is present only when the corresponding setting (cluster.id or node.id) is set.
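Because the constant labels are attached to every series, they can serve as aggregation keys on the Prometheus side. The fragment below is a minimal sketch of a Prometheus recording rule file that sums the DNS request rate per cluster. The group name, the recorded series name, and the 5m window are illustrative assumptions, not part of gslbd.

```yaml
# Sketch of a Prometheus rule file; names below are hypothetical.
groups:
  - name: gslbd-cluster-aggregations
    rules:
      # Per-cluster DNS request rate, summed over all nodes that
      # report the same `cluster` constant label.
      - record: cluster:gslbd_dns_requests:rate5m
        expr: sum by (cluster) (rate(gslbd_dns_requests_total[5m]))
```

If cluster.id is unset on some nodes, their series aggregate under an empty cluster label, which is one quick way to spot a missing setting.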
Key metrics (namespace: gslbd)
- DNS
  - gslbd_dns_requests_total
- GitOps
  - gslbd_gitops_fetch_total{result}
  - gslbd_gitops_verify_total{result}
  - gslbd_gitops_apply_total{result}
  - gslbd_gitops_last_apply_info{sha,signer}: value 1 for the last applied commit
- State sync (NATS/JetStream)
  - gslbd_state_nats_connected (0/1)
  - gslbd_state_nats_published_total{type}
  - gslbd_state_nats_received_total{type}
  - gslbd_state_kv_put_total{bucket,result}
  - gslbd_state_kv_get_total{bucket,result}
  - gslbd_state_merge_lag_ms (histogram)
  - gslbd_state_active_members (gauge)
- Health
  - gslbd_health_endpoints_total{family}
  - gslbd_health_endpoints_healthy{family}
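Several of the counters above carry a result label, so their rates can be broken down by outcome without knowing the label values in advance. The following is a sketch of recording rules that do this for the GitOps pipeline and the KV writes. The group name, recorded series names, and the 5m window are assumptions chosen for illustration.

```yaml
# Sketch of a Prometheus rule file; names below are hypothetical.
groups:
  - name: gslbd-result-rates
    rules:
      # GitOps stages, one series per value of the `result` label.
      - record: result:gslbd_gitops_fetch:rate5m
        expr: sum by (result) (rate(gslbd_gitops_fetch_total[5m]))
      - record: result:gslbd_gitops_verify:rate5m
        expr: sum by (result) (rate(gslbd_gitops_verify_total[5m]))
      - record: result:gslbd_gitops_apply:rate5m
        expr: sum by (result) (rate(gslbd_gitops_apply_total[5m]))
      # KV writes, split by bucket and outcome.
      - record: bucket_result:gslbd_state_kv_put:rate5m
        expr: sum by (bucket, result) (rate(gslbd_state_kv_put_total[5m]))
```

Grouping by result rather than filtering on a specific value avoids hard-coding label values that this guide does not list.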
Scrape configuration example
    scrape_configs:
      - job_name: 'gslbd'
        scrape_interval: 15s
        static_configs:
          - targets: ['gslbd-hostname:9090']
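For a deployment with more than one node, the same job can list several targets. The fragment below is a sketch only: the hostnames are hypothetical, and the port, path, and scheme are assumed to match whatever <listenAddr>:<port> is configured on each node.

```yaml
scrape_configs:
  - job_name: 'gslbd'
    scrape_interval: 15s
    scrape_timeout: 10s        # must not exceed scrape_interval
    metrics_path: /metrics     # Prometheus default, shown for clarity
    scheme: http
    static_configs:
      - targets:
          # Hypothetical hostnames; replace with the real node addresses.
          - 'gslbd-a.example.internal:9090'
          - 'gslbd-b.example.internal:9090'
```

Keeping all nodes in one job means the job="gslbd" label can be used to select every node in queries and alerts.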
Grafana dashboard
- Suggested panels:
  - DNS RPS: rate of gslbd_dns_requests_total.
  - NATS connectivity: gslbd_state_nats_connected.
  - Merge lag p95: histogram_quantile(0.95, sum(rate(gslbd_state_merge_lag_ms_bucket[5m])) by (le,cluster,node)).
  - Endpoint health by family: gslbd_health_endpoints_total and gslbd_health_endpoints_healthy, shown as gauges or graphs.
  - Active members over time: gslbd_state_active_members.
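Dashboard queries that involve histogram_quantile or a ratio can be precomputed so that panels load quickly. The following is a sketch of recording rules for the merge lag and endpoint health panels. The group name, recorded series names, and evaluation interval are assumptions, not names defined by gslbd.

```yaml
# Sketch of a Prometheus rule file; names below are hypothetical.
groups:
  - name: gslbd-dashboard
    interval: 30s
    rules:
      # p95 merge lag in milliseconds, per cluster and node.
      - record: cluster_node:gslbd_state_merge_lag_ms:p95_5m
        expr: >-
          histogram_quantile(0.95,
          sum by (le, cluster, node) (rate(gslbd_state_merge_lag_ms_bucket[5m])))
      # Fraction of healthy endpoints per family (0 to 1).
      - record: family:gslbd_health_endpoints:healthy_ratio
        expr: >-
          sum by (family) (gslbd_health_endpoints_healthy)
          / sum by (family) (gslbd_health_endpoints_total)
```

A Grafana panel can then query the recorded series directly. The ratio evaluates to NaN for a family with zero endpoints, which Grafana renders as a gap.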
Example alerts
    groups:
      - name: gslbd
        rules:
          - alert: GslbdNATSDisconnected
            expr: gslbd_state_nats_connected == 0
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: NATS connection down
          - alert: GslbdMergeLagHigh
            expr: histogram_quantile(0.95, sum(rate(gslbd_state_merge_lag_ms_bucket[5m])) by (le)) > 2000
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: High merge lag (p95)
          - alert: GslbdActiveMembersZero
            expr: gslbd_state_active_members == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: No active members detected
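The alerts above only fire while the metrics are being scraped. A sketch of two complementary rules follows: one uses the built-in Prometheus up series to catch an unreachable endpoint, the other warns when the share of healthy endpoints drops. The alert names, the 0.5 threshold, and the durations are illustrative assumptions to be tuned per deployment; job="gslbd" matches the scrape configuration example.

```yaml
# Sketch of additional alert rules; names and thresholds are hypothetical.
groups:
  - name: gslbd-extra
    rules:
      # `up` is set by Prometheus itself: 0 means the last scrape failed.
      - alert: GslbdTargetDown
        expr: up{job="gslbd"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "gslbd metrics endpoint {{ $labels.instance }} cannot be scraped"
      # Less than half of the endpoints in a family are healthy.
      - alert: GslbdHealthyEndpointRatioLow
        expr: >-
          (sum by (family) (gslbd_health_endpoints_healthy)
          / sum by (family) (gslbd_health_endpoints_total)) < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Healthy endpoint ratio low for family {{ $labels.family }}"
```

A family with zero endpoints yields NaN for the ratio, and the comparison then returns no result, so the second rule stays silent in that case.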
Troubleshooting
- No metrics visible: confirm that the server log shows the metrics server started, then check that the port is reachable and not blocked by a firewall.
- Missing cluster or node labels: set cluster.id and node.id. Each value is attached to every metric as a constant label.
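Missing constant labels can also be detected from Prometheus itself, because a matcher with an empty value selects series that lack the label. The rule below is a sketch under that standard PromQL behavior. The alert name, duration, and severity are assumptions, and gslbd_state_nats_connected is used only as a representative series that every node is expected to export.

```yaml
# Sketch of a label sanity check; alert name and severity are hypothetical.
groups:
  - name: gslbd-label-checks
    rules:
      - alert: GslbdConstantLabelsMissing
        # cluster="" / node="" match series where the label is absent.
        expr: >-
          gslbd_state_nats_connected{cluster=""}
          or gslbd_state_nats_connected{node=""}
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "{{ $labels.instance }} exports metrics without cluster or node labels"
```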