Architecture Overview
Pleiades is a DNS-based Global Server Load Balancer (GSLB). It answers A/AAAA queries and selects backend IPs using a health-aware, client-IP-aware load balancer. Configuration is stored in SQLite and managed via a REST API. Runtime state (health, membership, config) is synchronized across geo-distributed nodes via NATS + JetStream. GitOps enables declarative configuration from a Git repository. Metrics are exposed for Prometheus. Licensing enforces request rate limits.
Core components
DNS Server (internal/dns)
- Listens for DNS queries for a configured domain.
- Extracts the DNS client's IP from the UDP remote address.
- Service-aware resolution: on each query, the DNS server first looks up the queried name in the services table via GetServiceByDomain. If a matching service has a pool assigned, the enabled members of that pool are served round-robin per service. Queries for domains not found in the DB fall back to the static load balancer.
- Passes the client IP to Balancer.GetNextIPv4For/GetNextIPv6For for client-aware fallback routing.
- Enforces licensing limits before answering.
- Emits Prometheus metrics.
- NewServer accepts a *storage.DB (may be nil; DB resolution is skipped when nil).
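The service-aware resolution described above can be sketched as follows. This is a minimal illustration, not the code in internal/dns: the `member`, `resolver`, and `resolve` names are hypothetical, the pool lookup is a plain map standing in for GetServiceByDomain, and the behavior when a pool has no enabled members is an assumption.

```go
package main

import "sync"

// member is a hypothetical stand-in for a pool member row.
type member struct {
	IP      string
	Enabled bool
}

// resolver keeps one round-robin cursor per service domain.
type resolver struct {
	mu       sync.Mutex
	pools    map[string][]member // service domain -> members of its assigned pool
	cursor   map[string]int      // service domain -> number of answers served so far
	fallback func() string       // stands in for the static load balancer
}

func newResolver(pools map[string][]member, fallback func() string) *resolver {
	return &resolver{pools: pools, cursor: make(map[string]int), fallback: fallback}
}

// resolve returns the next enabled member for qname, or the fallback answer
// when qname is not a known service.
func (r *resolver) resolve(qname string) string {
	r.mu.Lock()
	defer r.mu.Unlock()

	members, ok := r.pools[qname]
	if !ok {
		return r.fallback()
	}
	var enabled []string
	for _, m := range members {
		if m.Enabled {
			enabled = append(enabled, m.IP)
		}
	}
	if len(enabled) == 0 {
		return "" // assumption: no usable member means no answer
	}
	ip := enabled[r.cursor[qname]%len(enabled)]
	r.cursor[qname]++
	return ip
}
```

Keeping the cursor keyed by service domain means that traffic to one service does not disturb the rotation of another.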
TUI (cmd/gslbctl, internal/tui)
- Standalone binary; connects directly to the SQLite database (no API server required).
- Built with Bubble Tea, Lip Gloss, and Bubbles.
- Tab navigation between Pools and Services.
- Pool view: list, create, rename, delete; drill into pool for Members, Health Check, Geo Rules sub-screens.
- Service view: list (with assigned pool column), create, edit (pre-filled), delete; pool assignment uses a dedicated picker screen showing unassigned pools first (A–Z) then already-assigned pools (A–Z with annotation).
- Launch with: ./gslbctl -db /var/lib/gslbd/gslbd.db
Load Balancer (internal/loadbalancer)
- Algorithm interface: Next() net.IP for stateless selection.
- ClientAwareAlgorithm interface: SortedCandidatesFor(clientIP net.IP) []net.IP for client-IP-aware selection.
- RoundRobin: equal distribution; implements Algorithm.
- WeightedRoundRobin: smooth weighted distribution; implements Algorithm.
- GeoIPAlgorithm: sorts candidates by haversine great-circle distance from the client IP. Uses MaxMind GeoLite2 (.mmdb) for geolocation. Supports manual EndpointLocations overrides for private/DC IPs not in the DB. Implements ClientAwareAlgorithm and io.Closer.
- MapFileAlgorithm: CIDR prefix matching; matched endpoint sorts first. Supports hot-reload via UpdateRules. Implements ClientAwareAlgorithm.
- Balancer: wraps an algorithm, integrates with a health provider, and exposes GetNextIPv4For/GetNextIPv6For. When the algorithm implements ClientAwareAlgorithm, candidates are iterated in preference order. Member mutations from the REST API are reflected immediately in the running balancer.
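To illustrate how a GeoIP-style algorithm can order candidates, the sketch below sorts endpoints by haversine distance from a client position. It assumes the coordinates have already been resolved (the real GeoIPAlgorithm reads them from a MaxMind database or from EndpointLocations); the `location` type and the function names are hypothetical.

```go
package main

import (
	"math"
	"sort"
)

const earthRadiusKm = 6371.0

// location pairs an endpoint address with its coordinates in degrees.
type location struct {
	IP       string
	Lat, Lon float64
}

// haversineKm returns the great-circle distance between two points in kilometres.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	rad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := rad(lat2 - lat1)
	dLon := rad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(rad(lat1))*math.Cos(rad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	a = math.Min(1, a) // guard against rounding slightly above 1
	return 2 * earthRadiusKm * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
}

// sortedByDistance returns endpoint IPs ordered nearest-first for the client.
// The input slice is not modified.
func sortedByDistance(clientLat, clientLon float64, endpoints []location) []string {
	sorted := append([]location(nil), endpoints...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return haversineKm(clientLat, clientLon, sorted[i].Lat, sorted[i].Lon) <
			haversineKm(clientLat, clientLon, sorted[j].Lat, sorted[j].Lon)
	})
	ips := make([]string, len(sorted))
	for i, l := range sorted {
		ips[i] = l.IP
	}
	return ips
}
```

A stable sort keeps the configured order for endpoints that are equally distant, which makes the preference list deterministic.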
Storage (internal/storage)
- Pure-Go SQLite via modernc.org/sqlite (no CGO required).
- WAL mode, foreign keys enabled, cascade deletes.
- Schema: pools, members (cascade from pool), services (SET NULL pool_id on pool delete), health_checks (UNIQUE per pool, cascade), geo_rules (cascade from pool).
- Open(path) runs migrate() to create/upgrade schema on startup.
- All IDs are 32-char hex strings from crypto/rand.
- ErrNotFound sentinel for missing rows.
- GetServiceByDomain(ctx, domain): case-insensitive, trailing-dot-tolerant lookup used by the DNS server.
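Two of the storage conventions above are easy to show in isolation: 32-character hex IDs from crypto/rand, and a lookup key that ignores case and a trailing dot. The sketch below is one way to get both behaviors; `newID` and `normalizeDomain` are hypothetical helper names, and the real package may normalize inside the SQL query instead.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"strings"
)

// newID returns 16 random bytes encoded as a 32-character hex string.
func newID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// normalizeDomain lowercases the name and drops one trailing dot, so that
// "App.Example.COM." and "app.example.com" map to the same key.
func normalizeDomain(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}
```

DNS query names normally arrive fully qualified with a trailing dot, which is why the lookup has to tolerate it.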
REST API (internal/api)
- Go 1.22 http.NewServeMux() with {id} path patterns.
- 18 routes covering full CRUD for pools, members, services, health checks, and geo rules.
- Input validation: IP addresses (net.ParseIP), port range (1–65535), CIDR blocks (net.ParseCIDR).
- Member create/update/delete mutations are reflected immediately in the running Balancer.
- Empty collections return [] (not null) for JSON compatibility.
- See docs/API.md for full route reference.
Health Checker (internal/health)
- Active health checks (TCP or HTTP/HTTPS) at intervals with timeouts.
- Thread-safe status store; exposes IsHealthy(net.IP).
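A thread-safe status store of this kind can be as small as a map behind a read-write mutex. The sketch below assumes that an endpoint with no recorded result counts as unhealthy; that default, the `statusStore` type, and the string-based helpers are all hypothetical.

```go
package main

import (
	"net"
	"sync"
)

// statusStore holds the last-known health of each endpoint.
type statusStore struct {
	mu     sync.RWMutex
	status map[string]bool // keyed by ip.String()
}

func newStatusStore() *statusStore {
	return &statusStore{status: make(map[string]bool)}
}

// Set records the latest probe result for ip.
func (s *statusStore) Set(ip net.IP, healthy bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.status[ip.String()] = healthy
}

// IsHealthy reports the last-known status; unknown endpoints are unhealthy.
func (s *statusStore) IsHealthy(ip net.IP) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.status[ip.String()]
}

// SetAddr is a convenience wrapper that parses a textual address first.
// It returns false when addr is not a valid IP.
func (s *statusStore) SetAddr(addr string, healthy bool) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	s.Set(ip, healthy)
	return true
}

// IsHealthyAddr is the textual counterpart of IsHealthy.
func (s *statusStore) IsHealthyAddr(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && s.IsHealthy(ip)
}
```

Keying by ip.String() means an IPv4 address and its IPv4-mapped IPv6 form share one entry.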
GitOps Controller (internal/gitops)
- Polls a Git repo at intervals.
- Enforces GPG-signed commit policy.
- Validates and applies config (endpoints, health settings) with safe rollback (last-good retained in-process).
State Sync (internal/state)
- NATS connection + JetStream KV buckets for health and membership.
- Publishes local health and heartbeats; subscribes to global events and KV snapshot.
- Maintains GlobalHealthView and provides composite health policy (prefer-local, local-only, global-any-healthy, global-quorum).
- Config sync (configsync_nats.go), fully implemented:
  - PublishCluster: writes YAML + metadata to KV and publishes to a JetStream stream with Pleiades-Version, Pleiades-Commit, and Content-Type headers.
  - WatchCluster: watches KV for changes; a nil entry signals that the initial values have been delivered; delete operations emit type:"delete" events; uses a buffered channel of size 16.
  - SnapshotCluster: KV Get; returns nil YAML (not an error) if the key is not found.
  - EnsureConfigStream / EnsureConfigKVBucket: eagerly create the stream and KV bucket on startup.
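The WatchCluster behavior can be sketched without a NATS connection by modelling the watcher as a channel of entries. Everything here is a stand-in: `kvEntry` and `configEvent` are hypothetical types, the "update" event type is an assumed name (the text above only specifies "delete"), and the `ready` channel is one possible way to surface the nil marker.

```go
package main

// kvEntry is a hypothetical stand-in for an entry delivered by a KV watcher.
type kvEntry struct {
	Key     string
	Value   []byte
	Deleted bool
}

// configEvent is an assumed event shape; only the "delete" type is documented.
type configEvent struct {
	Type string // "update" (assumed name) or "delete"
	Key  string
	YAML []byte
}

// watch converts raw KV entries into config events on a buffered channel of 16.
// A nil entry marks the end of the initial replay: it closes ready and is not
// forwarded as an event. The events channel closes when updates closes.
func watch(updates <-chan *kvEntry) (<-chan configEvent, <-chan struct{}) {
	events := make(chan configEvent, 16)
	ready := make(chan struct{})
	go func() {
		defer close(events)
		readyClosed := false
		for e := range updates {
			if e == nil {
				if !readyClosed {
					close(ready)
					readyClosed = true
				}
				continue
			}
			ev := configEvent{Type: "update", Key: e.Key, YAML: e.Value}
			if e.Deleted {
				ev = configEvent{Type: "delete", Key: e.Key}
			}
			events <- ev
		}
	}()
	return events, ready
}
```

A consumer can wait on ready to know that it has seen the current state before treating further events as live changes.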
Metrics (internal/metrics)
- Prometheus endpoint and collectors. All metrics carry cluster and node labels when configured.
Licensing (internal/licensing)
- Validates HMAC-SHA256 signed license tokens and enforces RPS limits in DNS path.
- Secret loaded from PLEIADES_LICENSE_SECRET env var or config file.
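HMAC-SHA256 validation of a signed token might look like the sketch below. The token layout (payload, a dot, then a hex signature) and the `max_rps=...` payload are assumptions made for the example; the actual Pleiades token format is not described here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// signToken appends a hex HMAC-SHA256 signature to payload.
// The "payload.signature" layout is assumed for illustration.
func signToken(secret, payload string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	return payload + "." + hex.EncodeToString(mac.Sum(nil))
}

// verifyToken returns the payload and true when the signature matches.
func verifyToken(secret, token string) (string, bool) {
	i := strings.LastIndex(token, ".")
	if i < 0 {
		return "", false
	}
	payload, sigHex := token[:i], token[i+1:]
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	// hmac.Equal compares in constant time to avoid leaking timing information.
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return "", false
	}
	return payload, true
}
```

The signature is checked before the payload is trusted, so a tampered rate limit is rejected rather than enforced.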
Data flows
DNS query path
1. Request enters DNS server; client IP extracted from UDP remote address.
2. Licensing check for RPS.
3. DB lookup: GetServiceByDomain(qname) — if a service with an assigned pool is found, enabled members of that pool are used (round-robin per service, per IP family).
4. If no DB match, Balancer selects next healthy IP from the static endpoint set: if algorithm is ClientAwareAlgorithm, SortedCandidatesFor(clientIP) is called and candidates iterated by preference order, filtered by IP family and health.
5. Response constructed as A/AAAA answer.
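Step 4 of this path, walking candidates in preference order while filtering by IP family and health, can be sketched as follows. `firstHealthy` and `pickAddr` are hypothetical names, and the health lookup is a plain map rather than the real health provider.

```go
package main

import "net"

// firstHealthy walks candidates in preference order and returns the first one
// that matches the requested IP family and is reported healthy, or nil.
func firstHealthy(candidates []net.IP, wantIPv4 bool, isHealthy func(net.IP) bool) net.IP {
	for _, ip := range candidates {
		isV4 := ip.To4() != nil
		if isV4 != wantIPv4 {
			continue // an A query only gets IPv4 answers, AAAA only IPv6
		}
		if isHealthy(ip) {
			return ip
		}
	}
	return nil
}

// pickAddr is a string-based wrapper around firstHealthy. It returns "" when
// no candidate qualifies. Unparseable candidates are skipped.
func pickAddr(candidates []string, wantIPv4 bool, healthy map[string]bool) string {
	ips := make([]net.IP, 0, len(candidates))
	for _, c := range candidates {
		if ip := net.ParseIP(c); ip != nil {
			ips = append(ips, ip)
		}
	}
	ip := firstHealthy(ips, wantIPv4, func(ip net.IP) bool { return healthy[ip.String()] })
	if ip == nil {
		return ""
	}
	return ip.String()
}
```

Because the list is already sorted by preference, the first candidate that passes both filters is the answer.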
Health checking
1. Checker loops across endpoints at configured interval.
2. Performs TCP or HTTP(S) probe.
3. Stores last-known status in memory.
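A minimal sketch of the health loop, assuming TCP probes only and a caller-supplied callback for storing results. The function names and the `store` callback are hypothetical, and the real checker also supports HTTP(S) probes.

```go
package main

import (
	"net"
	"time"
)

// tcpProbe returns a probe that reports whether a TCP connection to addr
// (host:port) can be established within timeout.
func tcpProbe(timeout time.Duration) func(addr string) bool {
	return func(addr string) bool {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}
}

// checkOnce probes every endpoint once and returns the results by address.
func checkOnce(endpoints []string, probe func(addr string) bool) map[string]bool {
	status := make(map[string]bool, len(endpoints))
	for _, ep := range endpoints {
		status[ep] = probe(ep)
	}
	return status
}

// run repeats checkOnce at the given interval until stop is closed, handing
// each round of results to store. The first round runs immediately.
func run(endpoints []string, probe func(string) bool, interval time.Duration,
	stop <-chan struct{}, store func(map[string]bool)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		store(checkOnce(endpoints, probe))
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}
```

Passing the probe as a function keeps the loop independent of the probe type, so the same loop can drive TCP or HTTP checks.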
REST API → Balancer live update
1. API handler calls db.CreateMember / db.UpdateMember / db.DeleteMember.
2. Handler immediately calls balancer.AddEndpoint, SetWeight, or RemoveEndpoint.
3. Next DNS query picks up the change without restart.
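The live-update flow relies on a balancer whose endpoint set can change between queries. The sketch below pairs mutex-guarded mutations with a smooth weighted round-robin selection. The method names follow the text above, but their signatures here (string addresses, int weights) and the `liveBalancer` type are assumptions.

```go
package main

import "sync"

type endpoint struct {
	addr    string
	weight  int
	current int // running score used by smooth weighted round-robin
}

// liveBalancer can be mutated while it is serving selections.
type liveBalancer struct {
	mu        sync.Mutex
	endpoints []*endpoint
}

func (b *liveBalancer) AddEndpoint(addr string, weight int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.endpoints = append(b.endpoints, &endpoint{addr: addr, weight: weight})
}

func (b *liveBalancer) SetWeight(addr string, weight int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, e := range b.endpoints {
		if e.addr == addr {
			e.weight = weight
		}
	}
}

func (b *liveBalancer) RemoveEndpoint(addr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	kept := b.endpoints[:0]
	for _, e := range b.endpoints {
		if e.addr != addr {
			kept = append(kept, e)
		}
	}
	b.endpoints = kept
}

// Next raises every score by its weight, picks the highest score, and lowers
// the winner by the total weight. It returns "" when there are no endpoints.
func (b *liveBalancer) Next() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	total := 0
	var best *endpoint
	for _, e := range b.endpoints {
		e.current += e.weight
		total += e.weight
		if best == nil || e.current > best.current {
			best = e
		}
	}
	if best == nil {
		return ""
	}
	best.current -= total
	return best.addr
}
```

Because mutations and selection share one lock, a change made by an API handler is visible to the very next call to Next.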
GitOps reconciliation
1. Fetch repo; verify GPG signature.
2. Parse YAML; validate configuration.
3. Apply: replace endpoint set; recreate checker if settings changed.
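The validate-then-apply step with a retained last-good configuration can be sketched as below. The `config` shape, the validation rules, and the `reconciler` type are hypothetical, and the sketch only covers rejection at validation time, not failures partway through an apply.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// config is a hypothetical, reduced form of the parsed YAML.
type config struct {
	Endpoints []string
}

// validate rejects an empty endpoint set and any entry that is not an IP.
func validate(c config) error {
	if len(c.Endpoints) == 0 {
		return errors.New("no endpoints")
	}
	for _, e := range c.Endpoints {
		if net.ParseIP(e) == nil {
			return fmt.Errorf("invalid endpoint IP %q", e)
		}
	}
	return nil
}

// reconciler remembers the last configuration that passed validation.
type reconciler struct {
	lastGood *config
}

// apply makes candidate the active config only if it validates. On failure
// the previously applied config stays in effect.
func (r *reconciler) apply(candidate config) error {
	if err := validate(candidate); err != nil {
		return fmt.Errorf("config rejected, keeping last-good: %w", err)
	}
	r.lastGood = &candidate
	return nil
}

// active returns the last-good config, or false if none has been applied.
func (r *reconciler) active() (config, bool) {
	if r.lastGood == nil {
		return config{}, false
	}
	return *r.lastGood, true
}
```

Holding the last-good config in memory means a bad commit leaves the running endpoint set untouched.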
State synchronization
1. Publisher emits local health and heartbeats; writes KV with TTL.
2. Subscriber snapshots KV (health + membership) then subscribes to subjects.
3. GlobalHealthView merges per-node reports and membership; provider policy drives decisions.
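How a composite policy might turn merged reports into a single decision is sketched below. The policy names come from this document, but the semantics given to each one here are assumptions, as are the function name and its parameters.

```go
package main

// healthyUnder decides whether an endpoint is usable under the named policy.
// localKnown says whether this node has its own probe result, local is that
// result, and remote holds the reports from other nodes.
// The meaning given to each policy is an assumption for illustration.
func healthyUnder(policy string, localKnown, local bool, remote []bool) bool {
	remoteHealthy := 0
	for _, r := range remote {
		if r {
			remoteHealthy++
		}
	}
	switch policy {
	case "local-only":
		return localKnown && local
	case "prefer-local":
		if localKnown {
			return local
		}
		return remoteHealthy > 0
	case "global-any-healthy":
		return (localKnown && local) || remoteHealthy > 0
	case "global-quorum":
		total, healthy := len(remote), remoteHealthy
		if localKnown {
			total++
			if local {
				healthy++
			}
		}
		// Strict majority of all reporting nodes.
		return total > 0 && healthy*2 > total
	default:
		return false // unknown policy: fail closed
	}
}
```

Failing closed on an unknown policy avoids serving an endpoint whose status cannot be judged.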
UML
See diagrams in docs/diagrams/:
- GSLB_Components.puml (component diagram)
- GSLB_Classes.puml (class diagram)
- seq_DNS_Query.puml (sequence: DNS resolution)
- seq_Health_Checks.puml (sequence: health loop)
- seq_GitOps_Reconcile.puml (sequence: GitOps apply)
- seq_State_Sync.puml (sequence: state sync and quorum)
Operations at a glance
- Configure via YAML (local file or fetched by GitOps).
- Manage pools/members/services at runtime via REST API; changes take effect immediately.
- Optional NATS settings enable global state sync and config distribution.
- Prometheus metrics served at /metrics when enabled.
- Secure-by-default HTTP health checks (TLS verification on unless explicitly disabled).