State Synchronization Setup (NATS + JetStream)
This guide walks you through enabling global state synchronization for health and membership across multiple gslbd instances using NATS with JetStream.
Overview
- Each node publishes local health and heartbeats to NATS subjects and KV buckets.
- Each node subscribes to the cluster subjects and KV snapshots to maintain a global view (GlobalHealthView).
- The load balancer uses a composite provider to merge local and global health according to policy.
When to enable
- You run multiple GSLB nodes across regions and want them to share health state.
- You want quorum-based decisions or fast propagation of health changes.
Prerequisites
- NATS 2.x with JetStream enabled.
- TLS/mTLS credentials for clients (recommended).
- A cluster ID configured in gslbd.
Recommended topology (super-cluster)
- 3+ NATS servers per region (odd number for availability).
- Connect regions via NATS gateways to form a super-cluster.
- Enable JetStream in each cluster; optionally set a JetStream domain (e.g., gslb).
Minimal NATS server (single region) example
# nats-server.conf (excerpt)
jetstream: {
  domain: gslb
}
server_name: n1
port: 4222
cluster: {
  name: gslb
  port: 6222
}
# TLS, accounts, and gateways omitted for brevity; see NATS docs
Configure gslbd
Add to /etc/gslb/config.yaml:
cluster:
  id: "prod-global"
state:
  healthPolicy: "prefer-local" # or local-only | global-any-healthy | global-quorum
  quorumMinPercent: 51 # used for global-quorum
  heartbeatInterval: "10s"
  heartbeatTTL: "30s"
  nats:
    servers: ["nats://n1.example.com:4222","nats://n2.example.com:4222","nats://n3.example.com:4222"]
    tls:
      caFile: "/etc/gslb/pki/ca.crt"
      certFile: "/etc/gslb/pki/client.crt"
      keyFile: "/etc/gslb/pki/client.key"
    jetStream:
      domain: "gslb"
Optional: Enable Configuration Sync (JetStream)
Note: Configuration Sync via JetStream is optional and independent of health/membership state sync. You can run health sync without config sync and vice versa.
If you want nodes to fetch and watch the full desired configuration as YAML via JetStream, add:
state:
  enableConfigSync: true
  config:
    mode: "jetstream"
    stream: "PLEIADES.cfg"
    subjectPrefix: "pleiades.cfg"
    kvBucket: "PLEIADES_CFG"
    applyTimeout: "5s"
Policies explained
- prefer-local (default): the local result is authoritative; global state is used mainly for hints.
- local-only: ignore global state (useful during incidents or testing).
- global-any-healthy: consider the endpoint healthy if any site reports healthy.
- global-quorum: healthy only if at least quorumMinPercent of active members report healthy within the staleness TTL.
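The four policies can be read as a small dispatch over the local result and the reports received from other sites. The following is a minimal sketch of that reading, not gslbd's actual composite provider: the function name `merge_health`, the `site_reports` mapping, and the choice to count the local result as one of the sites under global-any-healthy are assumptions made for illustration.

```python
def merge_health(policy, local_healthy, site_reports, quorum_min_percent=51):
    """Return the effective health of one endpoint under a given policy.

    site_reports is a hypothetical mapping of node ID -> bool, assumed to
    hold only fresh reports from active members.
    """
    if policy in ("prefer-local", "local-only"):
        # The local result is authoritative. Global "hints" under
        # prefer-local are not modeled in this sketch.
        return local_healthy
    if policy == "global-any-healthy":
        # Assumption: the local node counts as one of the sites.
        return local_healthy or any(site_reports.values())
    if policy == "global-quorum":
        total = len(site_reports)
        if total == 0:
            # No active members, so no quorum: fall back to local.
            return local_healthy
        healthy = sum(1 for ok in site_reports.values() if ok)
        # Same test as healthy*100/total >= quorum_min_percent,
        # written without division.
        return healthy * 100 >= quorum_min_percent * total
    raise ValueError(f"unknown health policy: {policy!r}")
```

With three reports of which one is healthy, global-any-healthy yields healthy while global-quorum at 51 percent does not.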
How quorum works
- Each node publishes periodic heartbeats.
- Active membership = nodes whose latest heartbeat is within heartbeatTTL.
- For an endpoint, only reports from active members within heartbeatTTL are counted.
- quorumMinPercent sets the healthy threshold: an endpoint is healthy when healthy*100/total >= quorumMinPercent.
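The steps above can be sketched as a filter followed by a threshold check. This is an illustrative sketch under stated assumptions rather than gslbd's implementation: the `Report` type and function names are hypothetical, timestamps are plain seconds, and `total` is taken to be the number of counted reports (the text does not say whether it is that or the number of active members).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class Report:
    node: str           # ID of the reporting node (hypothetical field)
    healthy: bool       # that node's view of the endpoint
    reported_at: float  # time of the report, in seconds


def active_members(heartbeats: Dict[str, float], now: float, ttl: float) -> Set[str]:
    """Nodes whose latest heartbeat is within the TTL."""
    return {node for node, seen in heartbeats.items() if now - seen <= ttl}


def quorum_healthy(reports: Iterable[Report], heartbeats: Dict[str, float],
                   now: float, ttl: float = 30.0,
                   quorum_min_percent: int = 51) -> Optional[bool]:
    """Return True/False for the quorum decision, or None if nothing counts."""
    active = active_members(heartbeats, now, ttl)
    # Count only fresh reports that come from active members.
    counted = [r for r in reports
               if r.node in active and now - r.reported_at <= ttl]
    total = len(counted)
    if total == 0:
        return None  # no quorum can be established
    healthy = sum(1 for r in counted if r.healthy)
    # Equivalent to healthy*100/total >= quorum_min_percent.
    return healthy * 100 >= quorum_min_percent * total
```

For example, with two counted reports of which one is healthy, the ratio is 50 percent, which fails a threshold of 51 but passes a threshold of 50.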
Verifying connectivity and state
- Metrics to watch:
  - gslbd_state_nats_connected (1 when connected)
  - gslbd_state_active_members
  - gslbd_state_nats_{published,received}_total{type}
  - gslbd_state_kv_{put,get}_total{bucket,result}
- When Configuration Sync is enabled, logs include messages such as "config snapshot loaded" and "config event: cluster=... version=...".
- If the NATS connection or KV bucket creation fails, the corresponding errors appear in the logs.
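A quick way to act on these metrics is to read the two gauges from a scrape of the metrics output. The sketch below parses Prometheus text exposition format from a string; how and where gslbd exposes its metrics is not covered here, so obtaining the text is left out, and the helper names and the `min_members` threshold are assumptions.

```python
def read_sample(exposition, name):
    """Return the first sample value for a metric name, or None if absent."""
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labeled sample: name{labels} value [timestamp]
            metric = line[:line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabeled sample: name value [timestamp]
            metric, _, rest = line.partition(" ")
        if metric == name:
            return float(rest.split()[0])
    return None


def sync_looks_healthy(exposition, min_members=1):
    """True if NATS is connected and enough active members are visible."""
    connected = read_sample(exposition, "gslbd_state_nats_connected")
    members = read_sample(exposition, "gslbd_state_active_members")
    if connected is None or members is None:
        return False
    return connected == 1 and members >= min_members
```

A check like this fits a readiness probe or an alert rule; a suitable `min_members` value depends on how many nodes the cluster is expected to have.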
Failure modes
- If NATS is unreachable: gslbd continues with local health only (policy prefer-local).
- If clocks skew: events carry timestamps, and the quorum logic tolerates skew within the TTL, but keep NTP enabled.
- If membership is zero: no quorum can be established; the provider falls back to local health (for global-quorum).
Security
- Use mTLS for client/server auth or NATS accounts/JWT with TLS.
- Restrict subjects and KV permissions to the cluster namespace (e.g., gslb.<cluster>.*).
- Rotate client certs periodically; see Security Guide.
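When writing subject permissions, keep in mind how NATS wildcards work: subjects are dot-separated tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens and must come last. The sketch below re-implements that matching rule to show what a pattern such as gslb.<cluster>.* does and does not cover. The concrete subjects used in the usage note are hypothetical; the subjects gslbd actually publishes to are not listed in this guide.

```python
def subject_matches(pattern, subject):
    """Check a NATS-style subject against a pattern with * and > wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # Must be the last pattern token and needs at least one
            # remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if tok != "*" and tok != s_tokens[i]:
            return False  # literal token mismatch
    # Without a trailing >, token counts must be equal.
    return len(p_tokens) == len(s_tokens)
```

For a cluster ID of prod-global, the pattern gslb.prod-global.* matches gslb.prod-global.health but not gslb.prod-global.health.eu-1; the pattern gslb.prod-global.> matches both. If the subject hierarchy under the cluster namespace is more than one token deep, a `>` pattern would be needed.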