Add Traffic Intelligence docs: Total Load Balancing, Roy Kent, Higgins Bus, Nate, Dani Rojas, Zava, and Jamie Tartt #734
Open
bmertens-datum wants to merge 8 commits into
@scotwells @privateip I have another round of edits I am going to do. I think this will work better as 2 documents.
- README.md: rewritten as concept doc (Total Load Balancing vision, signal roadmap, routing decision hierarchy, phase index)
- geo-phase1.md: all Phase 1 Geographic Intelligence detail (Roy Kent Project intro, roadmap, GeoDB, named IP lists, consumers, distribution architecture)
- README.md -> total-load-balancing.md (concept doc, Total Load Balancing)
- geo-phase1.md -> roy-kent-project.md (Phase 1 renamed to Roy Kent Project)
- Remove all Phase 1 terminology; use project names throughout
- Add Health signal above RTT in signal table
- Expand GPU Availability to Compute Availability (GPU, CPU, DPU)
- Add higgins-bus.md: MOQT as the pub/sub transport layer for Total Load Balancing, covering track namespace design, GeoDB hybrid model (object storage for bulk + MoQ notifications/deltas), named IP list real-time distribution, relay topology via moqstream, and health signal tracks
- Update roy-kent-project.md: Distribution Architecture section now names Higgins Bus as the transport, calls out named IP list and GeoDB hybrid models explicitly, and links to the new doc
- Update total-load-balancing.md: add Distribution Transport section with track namespace table across all signal groups; note Higgins Bus in the signal distribution description
- Both project codenames carry an internal-only disclaimer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces the Nate Project design document covering Datum Cloud's active health checking system: distributed probes, multi-protocol support, geographic vantage point selection, MOQT-based signal distribution, and dual Datum/customer use.
Also wires Nate into the existing Traffic Intelligence docs:
- total-load-balancing.md: Health signal row and Projects table now reference Nate; distribution track namespace table updated
- higgins-bus.md: Future Projects health entries replaced with a full Nate section (publisher, consumers, object content, publish triggers, bootstrap, and metrics path)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renames nate/README.md -> nate.md to be consistent with higgins-bus.md and roy-kent-project.md. Updates all cross-references in total-load-balancing.md and higgins-bus.md accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces l4-load-balancing.md documenting Cilium as Datum's L4 LB: current platform-managed state, and the goal to expose it as customer-configurable for compute targets (UFO Compute + other customer compute) at every compute-enabled PoP. Covers configuration model (basic UI / advanced datumctl + MCP), active and passive health checks, L4 vs L7 split, and open questions.
Updates total-load-balancing.md to reference Dani Rojas in the load balancing stack table and related areas.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All project documents now follow the pattern {service-name}-{codename}.md:
higgins-bus.md → signal-distribution-higgins-bus.md
roy-kent-project.md → ip-geo-roy-kent.md
nate.md → health-checks-nate.md
l4-load-balancing.md → l4-load-balancing-dani-rojas.md
envoy-routing.md → envoy-routing-zava.md
gslb-dns.md → gslb-jamie-tartt.md
All cross-references updated across all files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds documents under enhancements/networking/traffic-intelligence/ introducing the Total Load Balancing vision, the signal distribution transport, and the first set of projects under it. All project documents follow the {service-name}-{codename}.md naming convention. Project codenames are internal, not go-to-market product names.
Documents
total-load-balancing.md
The concept document. Introduces Total Load Balancing — a network architecture where routing intelligence signals (geography, latency, ASN, congestion, sovereignty, health, risk, compute availability) move fluidly across the platform rather than being trapped in isolated appliances. Modeled on the Total Football philosophy. Covers the signal roadmap, routing decision vision, the two-layer load balancing stack (Cilium L4 + Envoy L7), project index, and the Distribution Transport section naming Higgins Bus as the signal distribution layer.
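As a rough illustration of how two of these signals could combine in a routing decision, the sketch below ranks candidate PoPs by health first and RTT second. It assumes that Health sitting above RTT in the signal table implies priority, which the summary here does not state; the `Candidate` type, the PoP names, and the RTT values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pop: str        # hypothetical PoP identifier
    healthy: bool   # health signal (assumed boolean for this sketch)
    rtt_ms: float   # RTT signal in milliseconds


def rank(candidates):
    """Order candidates so health is decided before RTT."""
    # False sorts before True, so `not c.healthy` places healthy PoPs first.
    # RTT only breaks ties among candidates with the same health value.
    return sorted(candidates, key=lambda c: (not c.healthy, c.rtt_ms))


candidates = [
    Candidate("pop-a", healthy=False, rtt_ms=5.0),
    Candidate("pop-b", healthy=True, rtt_ms=40.0),
    Candidate("pop-c", healthy=True, rtt_ms=12.0),
]
ranked = [c.pop for c in rank(candidates)]
```

Here the unhealthy PoP ends up last even though it has the lowest RTT. The real decision hierarchy is defined in total-load-balancing.md and may weigh signals differently.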
signal-distribution-higgins-bus.md
The transport design document. Defines Higgins Bus — the MOQT-based pub/sub layer that carries all Total Load Balancing signals to edge PoPs. Covers why MOQT fits (QUIC-native fan-out, relay topology, object TTLs), the full track namespace across all current projects (with future placeholders for RTT, sovereignty, model locality, and compute), the GeoDB hybrid distribution model with failure handling, named IP list real-time distribution with TTL/expiry semantics, relay infrastructure via moqstream, and protocol caveats.
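The TTL/expiry idea for named IP lists can be sketched on the consumer side as follows. This is a minimal illustration only: the `NamedIPList` class, its method names, and the rule that an entry expires exactly when its TTL elapses are assumptions, not the semantics defined in the Higgins Bus document. The addresses come from the reserved documentation ranges.

```python
import ipaddress


class NamedIPList:
    """Hypothetical consumer-side view of one named IP list."""

    def __init__(self, name):
        self.name = name
        self._entries = {}  # network -> absolute expiry time (seconds)

    def apply(self, cidr, ttl_seconds, now):
        # A new or repeated update for the same network refreshes its expiry.
        self._entries[ipaddress.ip_network(cidr)] = now + ttl_seconds

    def contains(self, ip, now):
        # Drop entries whose TTL has elapsed before answering.
        self._entries = {n: e for n, e in self._entries.items() if e > now}
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self._entries)


blocklist = NamedIPList("example-blocklist")
blocklist.apply("192.0.2.0/24", ttl_seconds=60, now=1000.0)
inside_ttl = blocklist.contains("192.0.2.10", now=1030.0)
other_range = blocklist.contains("198.51.100.1", now=1030.0)
after_ttl = blocklist.contains("192.0.2.10", now=1060.0)
```

Passing `now` explicitly keeps the sketch deterministic; a real consumer would more likely read a clock and rely on whatever expiry information the transport carries.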
ip-geo-roy-kent.md
The Roy Kent Project — geo data. The first Total Load Balancing project. Scopes to geography only — making IP-to-geo data broadly available and reusable across DNS (GSLB), Envoy (ACLs and ALB), metrics enrichment, Galactic VPC, and UFO Compute. Covers GeoDB requirements and vendor evaluation (see #732), customer-managed named IP lists, all seven consumers, and the distribution architecture using Higgins Bus as the transport. Updated to reflect Envoy Gateway 1.8.0 native GeoIP support for the geo-blocking consumer.
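To make the IP-to-geo lookup concrete, here is a minimal sketch of a longest-prefix match over a tiny in-memory table. It is not the GeoDB format or vendor API under evaluation: the table contents, country assignments, and the `lookup_country` helper are invented for illustration, and the prefixes are reserved documentation ranges.

```python
import ipaddress

# Hypothetical prefix-to-country table; real GeoDB data would come from a vendor.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "NL",
    ipaddress.ip_network("192.0.2.128/25"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def lookup_country(ip, table=GEO_TABLE):
    """Return the country of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    # The longest prefix is the most specific match.
    return table[max(matches, key=lambda net: net.prefixlen)]


more_specific = lookup_country("192.0.2.200")
less_specific = lookup_country("192.0.2.5")
unknown = lookup_country("203.0.113.1")
```

A linear scan is fine for a sketch; a production lookup over a full GeoDB would normally use a trie or the vendor's own reader.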
health-checks-nate.md
The Nate Project — active health checks. Named after Nathan Shelley, the kit man who obsessively catalogued every weakness nobody else noticed and that everyone eventually depended on.
Nate is Datum Cloud's active health checking system: distributed probes running at Datum PoPs measure availability, latency, and throughput across a wide range of protocols, then publish results as health signals on Higgins Bus so every routing and policy component has a current view of what is up, what is degraded, and what is unreachable.
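The three states named above (up, degraded, unreachable) could be derived from a single probe result along these lines. This is a sketch under stated assumptions: the `classify` function, its parameters, and the 250 ms latency threshold are hypothetical and are not taken from the Nate design document.

```python
from enum import Enum


class Health(Enum):
    UP = "up"
    DEGRADED = "degraded"
    UNREACHABLE = "unreachable"


def classify(success, latency_ms=None, degraded_above_ms=250.0):
    """Map one probe result to a health state (threshold is an assumption)."""
    # A failed probe, or one with no latency measurement, counts as unreachable.
    if not success or latency_ms is None:
        return Health.UNREACHABLE
    # A successful but slow probe counts as degraded.
    if latency_ms > degraded_above_ms:
        return Health.DEGRADED
    return Health.UP


fast = classify(True, latency_ms=20.0)
slow = classify(True, latency_ms=900.0)
failed = classify(False)
```

A real system would likely aggregate several probes from multiple vantage points before publishing a state, rather than classifying one sample in isolation.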
Key design points:
l4-load-balancing-dani-rojas.md
The Dani Rojas Project — Layer 4 load balancing. Named after Dani Rojas — the striker who does not overthink. He just gets the ball where it needs to go. "Traffic is life!"
Documents Cilium as Datum's L4 load balancer. Today Cilium is platform-managed and not customer-configurable. The goal is to expose it as a customer-facing product at every compute-enabled PoP.
Key design points:
envoy-routing-zava.md
The Zava Project — Envoy L7 routing. Named after Zava — the misunderstood striker. Powerful, capable of things nobody else can do, and largely an enigma to everyone trying to work with him. "Avocados are misunderstood."
Maps all 19 routing capabilities Envoy needs to deliver for the Datum platform. 13 are native to Envoy (configuration only). 6 require integration with Datum systems:
gslb-jamie-tartt.md
The Jamie Tartt Project — Global Server Load Balancing. Named after Jamie Tartt, who started thinking only about himself and became the player who optimized for the whole team. "Stop thinking locally and start optimizing for the whole team."
As GSLB has matured, it has become a foundation of global applications on the internet. This document covers Datum's GSLB design using PowerDNS.
Key design points:
Open questions tied to #733: runtime API update mechanism, GeoDB reload latency, TTL strategy, weighted multi-PoP answer behavior.
File naming convention
All project documents follow {service-name}-{codename}.md. The service name describes the capability; the codename is the internal project name. Codenames are Ted Lasso characters — internal references only, not go-to-market product names.
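A small check of the convention might look like the sketch below, which splits a project document filename into service name and codename. The codename list is taken from the renames in this PR; the `split_doc_name` helper itself is hypothetical and not part of the repository.

```python
# Codenames as they appear in the renamed files in this PR.
CODENAMES = ("higgins-bus", "roy-kent", "nate", "dani-rojas", "zava", "jamie-tartt")


def split_doc_name(filename):
    """Split '{service-name}-{codename}.md' into (service_name, codename)."""
    if not filename.endswith(".md"):
        raise ValueError(f"not a markdown file: {filename}")
    stem = filename[: -len(".md")]
    for codename in CODENAMES:
        suffix = "-" + codename
        # Require a non-empty service name in front of the codename.
        if stem.endswith(suffix) and len(stem) > len(suffix):
            return stem[: -len(suffix)], codename
    raise ValueError(f"no known codename in: {filename}")


parts = [
    split_doc_name(name)
    for name in (
        "signal-distribution-higgins-bus.md",
        "ip-geo-roy-kent.md",
        "health-checks-nate.md",
        "l4-load-balancing-dani-rojas.md",
        "envoy-routing-zava.md",
        "gslb-jamie-tartt.md",
    )
]
```

Matching against a known codename list avoids guessing where the service name ends, since both halves can contain hyphens. The concept document total-load-balancing.md carries no codename, so it is rejected by this helper.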
Related issues
Status
Provisional — working documents, subject to change as design and implementation planning progress.