Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, Nate, Dani Rojas, Zava, and Jamie Tartt #734

Open
bmertens-datum wants to merge 8 commits into main from add/networking-traffic-intelligence


bmertens-datum (Collaborator) commented May 19, 2026

Summary

Adds documents under enhancements/networking/traffic-intelligence/ introducing the Total Load Balancing vision, the signal distribution transport, and the first set of projects under it. All project documents follow the {service-name}-{codename}.md naming convention. Project codenames are internal — not go-to-market product names.

Documents

total-load-balancing.md

The concept document. Introduces Total Load Balancing — a network architecture where routing intelligence signals (geography, latency, ASN, congestion, sovereignty, health, risk, compute availability) move fluidly across the platform rather than being trapped in isolated appliances. Modeled on the Total Football philosophy. Covers the signal roadmap, routing decision vision, the two-layer load balancing stack (Cilium L4 + Envoy L7), project index, and the Distribution Transport section naming Higgins Bus as the signal distribution layer.

signal-distribution-higgins-bus.md

The transport design document. Defines Higgins Bus — the MOQT-based pub/sub layer that carries all Total Load Balancing signals to edge PoPs. Covers why MOQT fits (QUIC-native fan-out, relay topology, object TTLs), the full track namespace across all current projects (with future placeholders for RTT, sovereignty, model locality, and compute), the GeoDB hybrid distribution model with failure handling, named IP list real-time distribution with TTL/expiry semantics, relay infrastructure via moqstream, and protocol caveats.
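As a rough illustration of the TTL/expiry semantics mentioned for named IP lists, the sketch below shows how a subscriber might drop entries whose TTL has lapsed. The `NamedIPEntry` type, its fields, and `live_entries` are hypothetical names chosen for this example; the actual object layout is defined in the design document, not here.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import List, Union

Network = Union[IPv4Network, IPv6Network]


@dataclass(frozen=True)
class NamedIPEntry:
    """One entry of a customer-managed named IP list (hypothetical shape)."""
    list_name: str
    network: Network
    published_at: float  # seconds since epoch when the entry was published
    ttl_seconds: float   # how long the entry stays valid after publication

    def is_expired(self, now: float) -> bool:
        # An entry expires once its full TTL has elapsed.
        return now >= self.published_at + self.ttl_seconds


def live_entries(entries: List[NamedIPEntry], now: float) -> List[NamedIPEntry]:
    """Return only the entries whose TTL has not lapsed at time `now`."""
    return [e for e in entries if not e.is_expired(now)]


entries = [
    NamedIPEntry("blocklist", ip_network("203.0.113.0/24"), 1000.0, 60.0),
    NamedIPEntry("blocklist", ip_network("198.51.100.0/24"), 1000.0, 600.0),
]
remaining = live_entries(entries, now=1100.0)
```

Passing `now` explicitly rather than reading the clock inside the function keeps the expiry logic easy to test.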

ip-geo-roy-kent.md

The Roy Kent Project — geo data. The first Total Load Balancing project. Scopes to geography only — making IP-to-geo data broadly available and reusable across DNS (GSLB), Envoy (ACLs and ALB), metrics enrichment, Galactic VPC, and UFO Compute. Covers GeoDB requirements and vendor evaluation (see #732), customer-managed named IP lists, all seven consumers, and the distribution architecture using Higgins Bus as the transport. Updated to reflect Envoy Gateway 1.8.0 native GeoIP support for the geo-blocking consumer.

health-checks-nate.md

The Nate Project — active health checks. Named after Nathan Shelley, the kit man who obsessively catalogued every weakness nobody else noticed and that everyone eventually depended on.

Nate is Datum Cloud's active health checking system: distributed probes running at Datum PoPs measure availability, latency, and throughput across a wide range of protocols, then publish results as health signals on Higgins Bus so every routing and policy component has a current view of what is up, what is degraded, and what is unreachable.

Key design points:

  • Active only — Nate sends probes; passive health state is the responsibility of the systems that generate it
  • Distributed and geo-aware — probers run at Datum PoPs with region and ASN diversity; uses Roy Kent GeoDB for coordinate data
  • Protocol-agnostic — HTTP/HTTPS, TCP, TLS, ICMP, DNS, gRPC, UDP, SMTP, and extensible custom types
  • Dual-use — serves Datum-internal infrastructure monitoring and customer-defined health checks
  • Signal-first — Nate publishes HealthStatus objects to Higgins Bus; failover decisions live in the consuming systems
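To make the signal-first point concrete, here is a minimal sketch of what a published health signal could look like. Only the `HealthStatus` name and the `platform/health/pop/{pop-id}` track pattern come from this PR; every field, the JSON encoding, the `classify` helper, and the 250 ms threshold are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UP = "up"
    DEGRADED = "degraded"
    UNREACHABLE = "unreachable"


@dataclass(frozen=True)
class HealthStatus:
    """Hypothetical shape of a health signal; the real schema may differ."""
    pop_id: str
    protocol: str
    state: State
    latency_ms: float
    observed_at: float  # seconds since epoch

    def to_json(self) -> str:
        # JSON is an assumed encoding, used here only for readability.
        return json.dumps({
            "pop_id": self.pop_id,
            "protocol": self.protocol,
            "state": self.state.value,
            "latency_ms": self.latency_ms,
            "observed_at": self.observed_at,
        }, sort_keys=True)


def classify(probe_ok: bool, latency_ms: float,
             degraded_above_ms: float = 250.0) -> State:
    """Map one probe result to a state (the threshold is an assumed default)."""
    if not probe_ok:
        return State.UNREACHABLE
    return State.DEGRADED if latency_ms > degraded_above_ms else State.UP


def pop_health_track(pop_id: str) -> str:
    """Build the per-PoP health track name from the documented pattern."""
    return f"platform/health/pop/{pop_id}"


status = HealthStatus("ams1", "https", classify(True, 310.0), 310.0, 1_700_000_000.0)
track = pop_health_track(status.pop_id)
payload = status.to_json()
```

The sketch stops at producing a track name and payload: consistent with the design point above, it makes no failover decision and leaves that to whichever system consumes the signal.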

l4-load-balancing-dani-rojas.md

The Dani Rojas Project — Layer 4 load balancing. Named after Dani Rojas — the striker who does not overthink. He just gets the ball where it needs to go. "Traffic is life!"

Documents Cilium as Datum's L4 load balancer. Today Cilium is platform-managed and not customer-configurable. The goal is to expose it as a customer-facing product at every compute-enabled PoP.

Key design points:

  • Every compute-enabled PoP — L4 LB is a standard capability at any PoP running compute, not a separately provisioned add-on
  • Customer-configurable for compute targets — UFO Compute (Unikraft) and any other customer compute; the L4 to Envoy path remains platform-managed
  • Basic config in the UI — backend pools, round-robin/least-connections, active and passive health checks, source-IP affinity
  • Advanced config via datumctl and MCP — connection limits, weighted algorithms, PROXY protocol, timeout tuning, health check protocol selection
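The selection policies named in the basic config (round-robin, least-connections, source-IP affinity) can be sketched as follows. This illustrates the algorithms themselves and is not a description of how Cilium implements them; `BackendPool` and its methods are hypothetical names.

```python
import hashlib
from itertools import cycle
from typing import Dict, Iterable, List


class BackendPool:
    """Toy backend pool demonstrating three selection policies."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        # Open connection count per backend, maintained by the caller.
        self.active: Dict[str, int] = {b: 0 for b in self.backends}
        self._rr = cycle(self.backends)

    def round_robin(self) -> str:
        # Hand out backends in pool order, wrapping around at the end.
        return next(self._rr)

    def least_connections(self) -> str:
        # Pick the backend with the fewest open connections;
        # ties go to the backend listed first in the pool.
        return min(self.backends, key=lambda b: self.active[b])

    def source_ip_affinity(self, client_ip: str) -> str:
        # Hash the client address so the same client maps to the same
        # backend for as long as the pool membership is unchanged.
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]


pool = BackendPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first_four = [pool.round_robin() for _ in range(4)]
pool.active.update({"10.0.0.1": 5, "10.0.0.2": 1, "10.0.0.3": 3})
least_loaded = pool.least_connections()
```

The modulo-based affinity shown here remaps many clients whenever a backend is added or removed; a consistent-hashing scheme reduces that churn, at the cost of more bookkeeping.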

envoy-routing-zava.md

The Zava Project — Envoy L7 routing. Named after Zava — the misunderstood striker. Powerful, capable of things nobody else can do, and largely an enigma to everyone trying to work with him. "Avocados are misunderstood."

Maps all 19 routing capabilities Envoy needs to deliver for the Datum platform. 13 are native to Envoy (configuration only). 6 require integration with Datum systems:

  • Geo Aware Routing — geo authorization is native in Envoy Gateway 1.8.0 via SecurityPolicy and EnvoyProxy.spec.geoIP; geo upstream selection still requires GeoDB integration
  • Dynamic Latency Based Routing — a significant future project; Roy Kent lays the foundational infrastructure before real RTT signals can drive routing decisions
  • Traffic Shaping and Rate Limiting — local rate limiting native; platform-wide limits require an external rate limit service
  • Protocol Agnostic Routing — explicit listener and filter chain config required per protocol
  • Policy Driven Routing — ext_proc filter is the integration point for sovereignty, compliance, and cost constraints
  • Observability Hooks — native stats/logs/traces need wiring to Datum's OpenTelemetry pipeline
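The split between local and platform-wide rate limiting follows from where the limiter state lives. The sketch below is a generic token bucket of the kind used for local rate limiting; the class and parameter names are illustrative and are not Envoy's configuration API. Because each instance keeps its own token count, N proxies each enforcing this limit locally admit up to N times the configured rate in aggregate, which is why a platform-wide limit needs a shared external service.

```python
class TokenBucket:
    """Per-instance token bucket; state is local to one proxy."""

    def __init__(self, max_tokens: int, tokens_per_fill: int,
                 fill_interval_s: float, now: float = 0.0) -> None:
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens  # start with a full bucket
        self.last_fill = now

    def allow(self, now: float) -> bool:
        """Return True and consume a token if the request is admitted."""
        # Credit tokens for every whole fill interval that has elapsed.
        fills = int((now - self.last_fill) // self.fill_interval_s)
        if fills > 0:
            self.tokens = min(self.max_tokens,
                              self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval_s
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(max_tokens=2, tokens_per_fill=1, fill_interval_s=1.0)
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.2, 1.3)]
```

Here the first two requests drain the bucket, the third is rejected, one token is refilled after a full second, and the last request is rejected again.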

gslb-jamie-tartt.md

The Jamie Tartt Project — Global Server Load Balancing. Named after Jamie Tartt, who started thinking only about himself and became the player who optimized for the whole team. "Stop thinking locally and start optimizing for the whole team."

GSLB has matured into a foundation for globally distributed applications on the internet. This document covers Datum's GSLB design using PowerDNS.

Key design points:

  • DNS as the outermost steering layer — operates before any connection is established; shapes traffic distribution across the full PoP fleet
  • PowerDNS native GeoIP backend — MMDB format, continent/country/region granularity, weighted responses, EDNS Client Subnet support
  • Nate-driven health failover — control plane subscribes to platform/health/pop/{pop-id} tracks on Higgins Bus and updates PowerDNS when PoPs go unhealthy
  • Roy Kent GeoDB integration — same snapshot that feeds Envoy also feeds PowerDNS; MMDB format alignment required from vendor eval (GeoDB Vendor Evaluation #732)
  • Sovereignty as a hard constraint — architecture accommodates jurisdiction rules from the start
  • EDNS Client Subnet handling — covers all four resolver scenarios including the accuracy gap for users behind public resolvers
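The interaction between geo steering, health failover, and weighted responses can be sketched as a candidate-selection step. This is an illustration of the idea only, not PowerDNS behaviour or the control plane's actual logic; `PoP`, `candidate_pops`, `weighted_pick`, and the country-then-continent-then-global fallback order are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PoP:
    pop_id: str
    country: str    # ISO country code of the PoP
    continent: str  # continent code of the PoP
    weight: int     # relative share of answers among candidates


def candidate_pops(pops: List[PoP], healthy: Dict[str, bool],
                   country: str, continent: str) -> List[PoP]:
    """Prefer healthy PoPs in the client's country, then continent, then anywhere."""
    # PoPs with no reported health state are treated as unhealthy.
    alive = [p for p in pops if healthy.get(p.pop_id, False)]
    scopes = (
        [p for p in alive if p.country == country],
        [p for p in alive if p.continent == continent],
        alive,
    )
    for scope in scopes:
        if scope:
            return scope
    return []


def weighted_pick(candidates: List[PoP], rng: random.Random) -> PoP:
    """Choose one candidate with probability proportional to its weight."""
    return rng.choices(candidates, weights=[p.weight for p in candidates], k=1)[0]


pops = [PoP("ams1", "NL", "EU", 3), PoP("fra1", "DE", "EU", 1), PoP("iad1", "US", "NA", 2)]
healthy = {"ams1": False, "fra1": True, "iad1": True}
eu_candidates = candidate_pops(pops, healthy, country="NL", continent="EU")
answer = weighted_pick(eu_candidates, random.Random(0))
```

A sovereignty rule would need to filter the PoP list before this step, since the global fallback shown here would otherwise be free to answer with a PoP outside the permitted jurisdiction.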

Open questions tied to #733: runtime API update mechanism, GeoDB reload latency, TTL strategy, weighted multi-PoP answer behavior.

File naming convention

All project documents follow {service-name}-{codename}.md. The service name describes the capability; the codename is the internal project name. Codenames are Ted Lasso characters — internal references only, not go-to-market product names.
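Because both the service name and the codename may contain hyphens, a filename such as `l4-load-balancing-dani-rojas.md` cannot be split on a hyphen alone. A checker would need the list of known codenames, as in the hedged sketch below. The helper name `split_project_filename` is hypothetical; the codenames are those used by the files in this PR.

```python
from typing import Optional, Tuple

# Codenames taken from the filenames listed in this PR.
CODENAMES = ("higgins-bus", "roy-kent", "nate", "dani-rojas", "zava", "jamie-tartt")


def split_project_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Split '{service-name}-{codename}.md' into (service_name, codename).

    Returns None when the file is not Markdown or does not end in a known
    codename, as is the case for the concept document.
    """
    if not filename.endswith(".md"):
        return None
    stem = filename[: -len(".md")]
    for codename in CODENAMES:
        suffix = "-" + codename
        # Require a non-empty service name in front of the codename.
        if stem.endswith(suffix) and len(stem) > len(suffix):
            return stem[: -len(suffix)], codename
    return None


parsed = split_project_filename("l4-load-balancing-dani-rojas.md")
concept = split_project_filename("total-load-balancing.md")
```

A check like this could run in CI to flag project documents that drift from the convention.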

Related issues

Status

Provisional — working documents, subject to change as design and implementation planning progress.

Initial business requirements document for Traffic Intelligence Phase 1
(Geographic Intelligence). Covers GeoDB, customer-managed named IP lists,
GSLB, geo blocking, ALB, metrics enrichment, UFO Compute, and Galactic VPC
consumers. Includes Phase 1 roadmap and links to issues #732 and #733.
bmertens-datum (Collaborator, Author) commented:

@scotwells @privateip I have another round of edits I am going to do. I think this will work better as 2 documents.

- README.md: rewritten as concept doc — Total Load Balancing vision,
  signal roadmap, routing decision hierarchy, phase index
- geo-phase1.md: all Phase 1 Geographic Intelligence detail — Roy Kent
  Project intro, roadmap, GeoDB, named IP lists, consumers, distribution
  architecture
- README.md -> total-load-balancing.md (concept doc, Total Load Balancing)
- geo-phase1.md -> roy-kent-project.md (Phase 1 renamed to Roy Kent Project)
- Remove all Phase 1 terminology; use project names throughout
- Add Health signal above RTT in signal table
- Expand GPU Availability to Compute Availability (GPU, CPU, DPU)
bmertens-datum changed the title from "Add Traffic Intelligence Phase 1 enhancement" to "Add Total Load Balancing and Roy Kent Project docs" on May 20, 2026
bmertens-datum and others added 2 commits May 22, 2026 10:12
- Add higgins-bus.md: MOQT as the pub/sub transport layer for Total Load
  Balancing, covering track namespace design, GeoDB hybrid model (object
  storage for bulk + MoQ notifications/deltas), named IP list real-time
  distribution, relay topology via moqstream, and health signal tracks
- Update roy-kent-project.md: Distribution Architecture section now names
  Higgins Bus as the transport, calls out named IP list and GeoDB hybrid
  models explicitly, and links to the new doc
- Update total-load-balancing.md: add Distribution Transport section with
  track namespace table across all signal groups; note Higgins Bus in the
  signal distribution description
- Both project codenames carry an internal-only disclaimer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces the Nate Project design document covering Datum Cloud's
active health checking system — distributed probes, multi-protocol
support, geographic vantage point selection, MOQT-based signal
distribution, and dual Datum/customer use.

Also wires Nate into the existing Traffic Intelligence docs:
- total-load-balancing.md: Health signal row and Projects table now
  reference Nate; distribution track namespace table updated
- higgins-bus.md: Future Projects health entries replaced with a full
  Nate section (publisher, consumers, object content, publish triggers,
  bootstrap, and metrics path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bmertens-datum changed the title from "Add Total Load Balancing and Roy Kent Project docs" to "Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, and Nate" on May 24, 2026
bmertens-datum and others added 2 commits May 25, 2026 06:57
Renames nate/README.md -> nate.md to be consistent with
higgins-bus.md and roy-kent-project.md. Updates all cross-references
in total-load-balancing.md and higgins-bus.md accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces l4-load-balancing.md documenting Cilium as Datum's L4 LB —
current platform-managed state, goal to expose as customer-configurable
for compute targets (UFO Compute + other customer compute) at every
compute-enabled PoP. Covers configuration model (basic UI / advanced
datumctl + MCP), active and passive health checks, L4 vs L7 split, and
open questions.

Updates total-load-balancing.md to reference Dani Rojas in the load
balancing stack table and related areas.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bmertens-datum changed the title from "Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, and Nate" to "Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, Nate, and Dani Rojas" on May 26, 2026
All project documents now follow the pattern {service-name}-{codename}.md:
  higgins-bus.md           → signal-distribution-higgins-bus.md
  roy-kent-project.md      → ip-geo-roy-kent.md
  nate.md                  → health-checks-nate.md
  l4-load-balancing.md     → l4-load-balancing-dani-rojas.md
  envoy-routing.md         → envoy-routing-zava.md
  gslb-dns.md              → gslb-jamie-tartt.md

All cross-references updated across all files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bmertens-datum requested review from ecv and kevwilliams on May 26, 2026 19:27
bmertens-datum changed the title from "Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, Nate, and Dani Rojas" to "Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, Nate, Dani Rojas, Zava, and Jamie Tartt" on May 26, 2026