Proposal: Multi-cluster routing via URL path #132

@BorisTyshkevich

Context

Today one altinity-mcp process serves exactly one ClickHouse cluster. The
upstream CH endpoint is fixed at startup via clickhouse.host /
clickhouse.port and every tool invocation forwards to that endpoint. To
serve N clusters we deploy N copies of altinity-mcp with N distinct helm
releases.

This proposal lets one altinity-mcp process serve N ClickHouse clusters by
routing on URL path — the path identifies which cluster the request
targets. All other ClickHouse configuration (port, TLS, mode, view/table
regexes, limits, …) is shared across clusters: only the hostname differs,
and hostnames are derived from the cluster name via a single template
substitution.

The motivating shape is the Altinity Cloud operator naming convention, e.g.
cluster otel → service chi-otel-otel-0-0. A template like
chi-{cluster}-{cluster}-0-0.demo expands at request time.

Non-goals

Explicitly out of scope; revisit in v2:

  • Per-cluster config blocks. All clusters share one clickhouse: block.
    If two clusters need different ports, modes, regexes, etc., that's v2.
  • Cluster discovery tool (list_clusters etc.). Users will configure
    multiple MCP connectors in Claude manually — one URL per cluster.
  • JWE × multi-cluster coexistence. JWE uses path-based token routing
    (/{token}) that conflicts with cluster-name path routing. Config load
    rejects the combination multicluster.path_regex != "" && jwe.enabled.
  • Local JWT validation on the MCP request path. Today's injector
    deliberately doesn't validate (the CH-side sidecar is the validator).
    v1.1 doesn't change that; the catalog cache key is the SHA-256 of the
    raw bearer (see "Cache key" below), explicitly not an identity assertion.
  • Helm chart restructuring. Only the host: field's semantics expand to
    allow a {cluster} placeholder. Existing values files keep working
    unchanged.
  • Per-cluster OpenAPI. OpenAPIHandler / ServeOpenAPISchema read the
    wrapper server's global s.dynamicTools map, populated by a once-gated
    EnsureDynamicTools. In multi-cluster mode that global would be filled
    by whichever cluster's user arrived first, and every subsequent OpenAPI
    request — across all clusters and users — would see the same frozen
    catalog. CHConfigFromContext would route the underlying query
    correctly but the schema itself would leak across tenants. Refactoring
    OpenAPI onto the per-request catalog cache is meaningful work that
    isn't on the critical path for the multi-cluster motivation. v1
    disables OpenAPI in multi-cluster mode at config load; v2 lands the
    refactor.

Design overview

One process, one HTTP server, one global ClickHouseConfig, one new config
directive (multicluster.path_regex) that flips the process into
multi-cluster mode.

In multi-cluster mode, an HTTP middleware:

  1. Matches the URL against the cluster regex, extracts the cluster name.
  2. Validates the cluster name against the DNS-label regex and optional
    allowlist.
  3. Expands {cluster} in clickhouse.host to produce reqCfg, injects
    (cluster, reqCfg) into context, and chains to the existing
    authInjector and serverInjector so the bearer and wrapper server
    land on context too.

Then the SDK's getServer(*http.Request) callback fires per request and:

  1. Reads the bearer from context (set by authInjector). If empty,
    returns a static-tools-only server — multi-cluster mode requires OAuth
    so authInjector will normally have already 401'd, but the guard
    prevents an unauthenticated request from ever colliding on the
    empty-bearer cache slot.
  2. Derives cacheKey = sha256(bearer) and best-effort exp via
    unverified JWT parse.
  3. Calls cache.GetOrDiscover(...) keyed on (cacheKey, cluster); on
    miss, runs full discovery under singleflight.
  4. Constructs a fresh *mcp.Server with this user's static + dynamic
    tools, resources, and prompts registered, and returns it.

The SDK then owns everything else — transport guardrails, initialize,
ping, tools/list, tools/call, framing. No custom JSON-RPC dispatcher.

Because each URL path scopes a request to exactly one cluster, the MCP
server appears to clients as N independent MCP servers sharing one process.
Each endpoint has its own tools/list containing the static tools
(execute_query, write_query) plus the views/tables visible to that user
on that cluster.

URL layout

Multi-cluster mode introduces two coupled config fields:

  • mount_prefix (default /mcp/) — the literal path prefix where the
    cluster middleware mounts on the outer mux. Must be a literal string, not
    a regex.
  • path_regex (default ^/mcp/(?P<cluster>[^/]+)(?P<rest>/.*)?$) — applied
    to requests reaching the mount. Must contain a cluster named group;
    may contain an optional rest named group that captures the suffix used
    to dispatch MCP transport vs OpenAPI within the cluster subtree.

The two are not derived from each other (deriving a mount prefix from an
arbitrary regex is fragile). Config-load validates that path_regex
matches paths that start with mount_prefix.

Pod-internal: the pod sees /mcp/{cluster} for the MCP transport.
OpenAPI is disabled in multi-cluster mode (see non-goals); there is no
per-cluster OpenAPI surface in v1.

Ingress: the external URL is https://mcp.host/mcp/{cluster} with
passthrough. Operators who want a different layout override both fields
consistently.

Path separation between system endpoints and cluster names is enforced
twice — once by mount_prefix (system endpoints live outside it), and
once by the cluster-name validator's leading-dot exclusion (see "Validation
at config load" below). Belt and suspenders: even under an aggressive
mount_prefix: /, .health, .well-known, .livez all fail the
DNS-label regex's first-character class.

Default mux layout (multi-cluster mode):

1. /health, /livez                           (operational, exact-match)
2. /jwe-token-generator                      (registered only when JWE enabled;
                                              incompatible with MC at config-load)
3. /.well-known/...                          (OAuth + MCP discovery, RFC-mandated path)
4. /oauth/...                                (OAuth endpoints; advertised in AS metadata,
                                              cannot be moved)
5. /mcp/{cluster}                            (MCP transport per cluster)

OpenAPI routes (/openapi/...) are not registered in multi-cluster mode —
config load rejects the combination. Routes 1–4 register with
exact-or-prefix patterns on the outer mux; the cluster matcher mounts on
the /mcp/ subtree. http.ServeMux longest-prefix semantics already place
exact-match routes ahead of subtree handlers, so the default layout has no
overlap. Operators choosing a non-default path_regex get the same
protection via the cluster-name regex.
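
The precedence claim above can be checked with a small sketch against the standard library mux. The handler bodies and the helper names newOuterMux and patternFor are hypothetical placeholders, not part of the proposal; the sketch assumes a Go version whose ServeMux.Handler returns the matched pattern.

```go
package main

import "net/http"

// newOuterMux registers the system endpoints plus a cluster subtree at
// mountPrefix. All handlers are placeholders (assumption for illustration).
func newOuterMux(mountPrefix string) *http.ServeMux {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux := http.NewServeMux()
	mux.Handle("/health", ok)       // exact match
	mux.Handle("/livez", ok)        // exact match
	mux.Handle("/.well-known/", ok) // subtree
	mux.Handle("/oauth/", ok)       // subtree
	mux.Handle(mountPrefix, ok)     // cluster middleware would mount here
	return mux
}

// patternFor reports which registered pattern the mux selects for path.
func patternFor(mux *http.ServeMux, path string) string {
	req, err := http.NewRequest(http.MethodGet, path, nil)
	if err != nil {
		return ""
	}
	_, pattern := mux.Handler(req)
	return pattern
}
```

With mountPrefix set to "/mcp/", a request for /mcp/otel resolves to the "/mcp/" subtree while /health resolves to its exact pattern. Even with an aggressive mountPrefix of "/", /health still resolves to the exact "/health" pattern, because the more specific pattern wins.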

Config schema (additive only)

# Existing ClickHouseConfig — unchanged shape; `host` gains optional {cluster}.
clickhouse:
  host: "chi-{cluster}-{cluster}-0-0.demo"   # {cluster} → URL-derived name
  port: 8443
  protocol: https
  view_regexp: "^v_.*"
  table_regexp: "^t_.*"
  # … all existing fields unchanged

# New top-level section. Presence of path_regex enables multi-cluster mode.
multicluster:
  # Literal mount point on the outer mux. The cluster middleware handles
  # every request under this prefix.
  mount_prefix: "/mcp/"
  # Regex applied to incoming paths reaching the mount. MUST contain a
  # "cluster" named group. v1 matches only the MCP transport path (no
  # trailing suffix); per-cluster OpenAPI is deferred to v2.
  path_regex: "^/mcp/(?P<cluster>[^/]+)/?$"
  # Cluster-name validation. Default is a strict RFC 1123 DNS label.
  cluster_name_regex: "^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$"
  # Optional explicit allowlist; if non-empty, only listed names are accepted.
  cluster_allowlist: []                          # e.g. ["otel", "antalya"]
  catalog_cache_max: 10000                       # hard cap on cache entries
  catalog_ttl_fallback: 15m                      # used when JWT exp is unknown

Path dispatch within the cluster subtree:

match path_regex → cluster
  on match → SDK StreamableHTTPHandler with reqCfg injected on ctx
  else     → 404
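
A minimal sketch of the dispatch above, assuming the v1 path_regex and the default cluster_name_regex from the schema. The context key type, the key names, and the functions resolveCluster and clusterMiddleware are hypothetical; the real router would inject a full ClickHouseConfig rather than just the expanded host, and it distinguishes "no match" from "unknown cluster" where this sketch returns one 404 for both.

```go
package main

import (
	"context"
	"net/http"
	"regexp"
	"strings"
)

// ctxKey and the two keys below are illustrative stand-ins for
// ClusterNameKey / RequestCHConfigKey.
type ctxKey string

const (
	clusterNameKey ctxKey = "cluster"
	requestHostKey ctxKey = "reqHost"
)

var (
	pathRe        = regexp.MustCompile(`^/mcp/(?P<cluster>[^/]+)/?$`)
	clusterNameRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$`)
)

// resolveCluster applies path_regex, then the DNS-label regex, then the
// optional allowlist (an empty allowlist accepts any valid name).
func resolveCluster(path string, allow []string) (string, bool) {
	m := pathRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	name := m[pathRe.SubexpIndex("cluster")]
	if !clusterNameRe.MatchString(name) {
		return "", false
	}
	if len(allow) == 0 {
		return name, true
	}
	for _, a := range allow {
		if a == name {
			return name, true
		}
	}
	return "", false
}

// clusterMiddleware expands {cluster} in hostTemplate and puts the cluster
// name and expanded host on the request context before calling next.
func clusterMiddleware(hostTemplate string, allow []string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		cluster, ok := resolveCluster(r.URL.Path, allow)
		if !ok {
			http.Error(w, "unknown cluster", http.StatusNotFound)
			return
		}
		host := strings.ReplaceAll(hostTemplate, "{cluster}", cluster)
		ctx := context.WithValue(r.Context(), clusterNameKey, cluster)
		ctx = context.WithValue(ctx, requestHostKey, host)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```

Because validation happens before host expansion, a name such as evil.example never reaches the template and cannot steer the upstream hostname outside the intended pattern.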

Validation at config load:

  • PathRegex non-empty → compiles, contains named group cluster.
  • MountPrefix non-empty → starts with /, ends with /, contains no
    regex metachars. Defaults to /mcp/ if PathRegex is set but
    MountPrefix is empty.
  • PathRegex must accept at least one path of the form
    MountPrefix + "<dns-label>"; checked by matching a synthetic sample at
    startup. Catches the "operator changed one but not the other" footgun.
  • PathRegex non-empty AND Server.JWE.Enabled → startup error. (JWE owns
    /jwe-token-generator and the /{token}/... path layout, both
    incompatible with cluster-name path routing.)
  • PathRegex non-empty AND Server.OAuth.Enabled == false → startup
    error. Multi-cluster mode requires per-request credentials; without
    OAuth there's no per-user identity for cache keying and the shared
    cfg.ClickHouse cannot meaningfully apply across a templated host.
  • PathRegex non-empty AND Server.OpenAPI.Enabled → startup error.
    See the OpenAPI non-goal above; v1 refuses to start with both on.
  • ClusterNameRegex compiles (default applied if empty). The default —
    ^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$ — already forbids a leading dot,
    which is the load-bearing property: cluster names cannot collide with
    /.well-known/*, /.health, or any other dot-prefixed system path
    regardless of the mount.
  • CatalogCacheMax defaults to 10000, min 100.
  • CatalogTTLFallback defaults to 15m, range [1m, 24h].
  • clickhouse.host containing {cluster} in single-cluster mode is logged
    as a config warning (works literally, almost certainly a misconfiguration).
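
The MountPrefix and PathRegex rules above could be checked roughly as follows. This is a sketch under assumptions: validateMulticluster and the synthetic label "sample-cluster" are hypothetical names, and the JWE, OAuth, and OpenAPI exclusions are omitted for brevity. It uses regexp.QuoteMeta as a cheap test for "contains no regex metachars": a literal prefix is unchanged by quoting.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// validateMulticluster checks the coupled mount_prefix / path_regex rules.
// An empty pathRegex means single-cluster mode and is always valid.
func validateMulticluster(mountPrefix, pathRegex string) error {
	if pathRegex == "" {
		return nil
	}
	if mountPrefix == "" {
		mountPrefix = "/mcp/" // documented default
	}
	if !strings.HasPrefix(mountPrefix, "/") || !strings.HasSuffix(mountPrefix, "/") {
		return errors.New("mount_prefix must start and end with /")
	}
	if regexp.QuoteMeta(mountPrefix) != mountPrefix {
		return errors.New("mount_prefix must be a literal, not a regex")
	}
	re, err := regexp.Compile(pathRegex)
	if err != nil {
		return fmt.Errorf("path_regex does not compile: %w", err)
	}
	if re.SubexpIndex("cluster") < 0 {
		return errors.New(`path_regex must contain a named group "cluster"`)
	}
	// Synthetic sample: catches "changed one field but not the other".
	sample := mountPrefix + "sample-cluster"
	if !re.MatchString(sample) {
		return fmt.Errorf("path_regex does not match %q under mount_prefix", sample)
	}
	return nil
}
```

For example, a mount_prefix of "/api/" combined with the default "^/mcp/..." regex fails on the synthetic sample, which is the mismatch the proposal wants to catch at startup rather than at request time.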

Request lifecycle (multi-cluster mode)

HTTP request → outer mux
  │
  ├─ /health, /livez, /oauth/*, /.well-known/*   → operational handlers
  └─ /mcp/{cluster}[...] (under mount_prefix)    → cluster middleware
        │
        1. Apply multicluster.path_regex; extract cluster.
        │    no match → 404
        │
        2. Validate cluster name against cluster_name_regex + allowlist.
        │    fail → 404 "unknown cluster"
        │
        3. Build per-request ClickHouseConfig and inject (cluster, reqCfg)
        │   into ctx:
        │     reqCfg := cfg.ClickHouse
        │     reqCfg.Host = strings.ReplaceAll(cfg.ClickHouse.Host, "{cluster}", cluster)
        │
        4. Existing auth injector runs (extracts bearer/JWE, no validation).
        │   Multi-cluster mode requires OAuth, so a missing bearer here is
        │   normally rejected as 401 by authInjector before reaching the SDK.
        │
        5. Existing serverInjector runs (puts *ClickHouseJWEServer on ctx).
        │   Preserves the CHJWEServerKey contract — static and dynamic
        │   tool handlers read it via GetClickHouseJWEServerFromContext.
        │
        6. Hand off to SDK StreamableHTTPHandler.
        │
        SDK path: SDK calls getServer(r) → buildServerForRequest(r):
              read bearer from ctx (set by authInjector in step 4)
              if bearer == "" → return static-only server  (defensive guard)
              compute cacheKey = sha256(bearer); exp = best-effort JWT exp
              load-or-discover catalog from cache (key = cacheKey \x00 cluster)
              construct fresh *mcp.Server with HasTools/HasResources/HasPrompts
              register static tools, resources, prompts
              register dynamic tools from the catalog
              return server
        SDK then owns initialize/tools/list/tools/call/ping/...

Cache key (token-bound)

The MCP auth injector intentionally doesn't validate inbound bearers — the
CH-side ch-jwt-verify sidecar is the validator. So we cannot derive a
trust-grade identity at the MCP request boundary.

v1.1 keys the catalog cache on the raw bearer itself, not on claims:

// Requires "crypto/sha256" and "encoding/hex".
func cacheKey(bearer string) string {
    sum := sha256.Sum256([]byte(bearer))
    return "tok:" + hex.EncodeToString(sum[:])
}

// Best-effort exp extraction for TTL only. Not authoritative.
func bearerExp(bearer string) (time.Time, bool) {
    if claims, ok := parseJWTUnverified(bearer); ok {
        if e, ok := claims["exp"].(float64); ok && e > 0 {
            return time.Unix(int64(e), 0), true
        }
    }
    return time.Time{}, false
}

Why token-bound, not claim-bound:

  • Claim-based hashing (sha256(iss\x00sub\x00aud)) lets an attacker who
    forges a JWT with another user's iss/sub/aud hit that user's cached
    catalog on tools/list, disclosing tool names and descriptions — which
    in this codebase are derived from view/table names and may carry sensitive
    schema metadata. tools/call would fail later at ClickHouse, but the
    catalog has already leaked.
  • Token-based hashing eliminates the attack: only the legitimate
    token-holder can produce a bearer that hashes to a given cache key. An
    attacker presenting a different token gets a different key and runs full
    discovery against ClickHouse, which then rejects them.
  • Cost: rotation invalidates the cache (one extra discovery on first call
    with a new token). Acceptable.

exp is still extracted via unverified parse for TTL calculation only.
This is not a security boundary: a forged exp can only shorten the TTL
(costing an extra discovery), because ExpiresAt is capped at
now + catalog_ttl_fallback.

The cache key is safe to log (opaque hash of an opaque secret).
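
bearerExp above relies on a parseJWTUnverified helper that the proposal does not spell out. One possible shape, as a sketch only: split the compact JWT, base64url-decode the payload segment, and unmarshal it as JSON, with no signature check. The implementation below is an assumption, not the project's code; bearerExp is repeated so the block stands alone.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"strings"
	"time"
)

// parseJWTUnverified decodes the claims of a compact JWS (three
// dot-separated segments) WITHOUT verifying the signature. Opaque tokens
// and five-segment JWE tokens return ok=false.
func parseJWTUnverified(bearer string) (map[string]any, bool) {
	parts := strings.Split(bearer, ".")
	if len(parts) != 3 {
		return nil, false
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, false
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, false
	}
	return claims, true
}

// bearerExp extracts exp for TTL purposes only. Not authoritative.
func bearerExp(bearer string) (time.Time, bool) {
	if claims, ok := parseJWTUnverified(bearer); ok {
		// encoding/json decodes JSON numbers into float64 for map[string]any.
		if e, ok := claims["exp"].(float64); ok && e > 0 {
			return time.Unix(int64(e), 0), true
		}
	}
	return time.Time{}, false
}
```

The result must only ever feed the TTL calculation; nothing here establishes who the caller is.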

Catalog cache

tools/call cannot afford to re-run discovery on every invocation — for
write-enabled setups that costs N+1 ClickHouse round-trips. Discovery on
tools/list is acceptable; tools/list is rare per session and clients
expect it slow.

type catalogEntry struct {
    Tools     map[string]dynamicToolMeta
    ExpiresAt time.Time
}

type catalogCache struct {
    mu      sync.RWMutex
    entries map[string]catalogEntry  // key: cacheKey(bearer) + "\x00" + cluster
    max     int
    sf      singleflight.Group       // collapses concurrent discoveries
}

Discipline:

  • Load-on-build. buildServerForRequest calls cache.GetOrDiscover(...)
    which returns the cached catalog on hit, or runs DiscoverTools(...) on
    miss under singleflight.Do(cacheKey + "\x00" + cluster, …). The first
    concurrent request triggers discovery; the rest wait and share the result.
  • TTL. ExpiresAt = min(jwtExp, now + catalog_ttl_fallback). If JWT
    exp is unknown (opaque token or unparseable JWT), ExpiresAt = now + catalog_ttl_fallback.
  • No sliding TTL, no LRU. Hard expiry only. A janitor goroutine sweeps
    expired entries once per minute.
  • Hard cap. When inserting would exceed max, sweep first; if still
    full, drop the insert and log at warn. Bounds worst-case memory at
    O(max × avg_tools_per_user).
  • Per-pod, in-memory. No Redis. Pod restart loses the cache; the next
    request rebuilds it.

tools/list is not specially handled — it reads from the per-request
*mcp.Server which was built from the cache. If the operator wants
tools/list to always rediscover, they can lower catalog_ttl_fallback;
that's the tuning knob.

What this is not: a session abstraction. There's no Session type, no
per-connection state. The cache is a TTL-keyed memo of discovery output.

Why per-request *mcp.Server

The SDK 1.6.1's NewStreamableHTTPHandler accepts
getServer func(*http.Request) *Server — it's invoked per incoming request.
Server.AddTool mutates the server's tool map, and the server's
tools/list/tools/call handlers read that map. So a fresh server per
request, populated with that user's catalog, gives us per-(user, cluster)
tools using only documented SDK APIs.

Cost per request:

  • One mcp.NewServer with capability flags matching today's server:
    HasTools: true, HasResources: true, HasPrompts: true. Resources and
    prompts are part of the existing server's advertised surface (see
    pkg/server/server.go:54–82) and must remain in multi-cluster mode —
    scoping them out would be a behavior change to existing clients.
  • N AddTool(...) calls plus the existing resource/prompt registrations
    (RegisterResources, RegisterPrompts). These are config-static, so
    the same call set runs every request.
  • Static tools: 1–2 AddTool calls (execute_query, optionally
    write_query).

The cache reuses the catalog itself across requests; only the server-and-tool
wiring is rebuilt. For typical catalogs (≤50 tools) this is microseconds.

Discovery timing (intentional)

getServer runs on every request, including initialize, ping, and
notifications, not just tools/list and tools/call. With Stateless: true,
there is no other request hook. This means:

  • On a cold cache, the very first request from a new (token, cluster) pair
    triggers full discovery, regardless of which JSON-RPC method it carries.
  • Subsequent requests in the same TTL window hit the cache and skip CH.

This is intentional, not a bug. A typical client opens a connection with
initialize and then quickly calls tools/list — both would have needed
discovery anyway, and doing it on initialize means tools/list is
already warm. The alternative (peek the body to skip discovery on
non-tool methods) would reintroduce body-sniffing complexity for marginal
benefit. If DiscoverTools fails (e.g. CH unreachable, missing creds),
buildServerForRequest falls back to a static-tools-only server so
initialize still succeeds and the client can surface the error via the
first tool call.

Inherited from the SDK for free

  • DNS-rebinding Host check (streamable.go:258–265).
  • Origin / CSRF check via CrossOriginProtection (streamable.go:267–273).
  • Content-Type enforcement (streamable.go:276–278).
  • Accept negotiation (streamable.go:283–293).
  • HTTP method allowlist with Allow headers (streamable.go:334–357).
  • MCP-Protocol-Version validation (streamable.go:391–397).
  • Session-ID handling and user-mismatch detection (streamable.go:295–317).
  • Full JSON-RPC framing for tool calls, including batched requests.

None of this needs to be re-implemented. The previous v1 design's rpcmux
package is dropped.

In-RAM session structure (v1.1, final)

There is no per-session structure. Stateless: true on the transport stays.
The "session" concept is:

  1. Request context carries cluster (string) and reqCfg
    (ClickHouseConfig) — alongside the existing CHJWEServerKey and
    JWE/OAuth token keys. Lives one HTTP request.
  2. Catalog cache holds discovery output keyed by (cacheKey, cluster)
    with hard expiry. A TTL-keyed memo of tools/list-equivalent output.
    The cache key is computed inside getServer from the bearer that
    authInjector already put on context; it is not a context value
    of its own.

No Session type, no per-connection state, no cross-pod replication.

Code-level changes

pkg/config/config.go

  • New MulticlusterConfig:
    type MulticlusterConfig struct {
        MountPrefix         string        `yaml:"mount_prefix"`          // default /mcp/
        PathRegex           string        `yaml:"path_regex"`
        ClusterNameRegex    string        `yaml:"cluster_name_regex"`    // default RFC 1123
        ClusterAllowlist    []string      `yaml:"cluster_allowlist"`
        CatalogCacheMax     int           `yaml:"catalog_cache_max"`     // default 10000
        CatalogTTLFallback  time.Duration `yaml:"catalog_ttl_fallback"`  // default 15m
    }
  • Add Multicluster MulticlusterConfig to Config.
  • Validation enforces all the rules in "Validation at config load" above,
    including the synthetic-sample match between MountPrefix and PathRegex.

Package placement

The cluster router and catalog cache live inside pkg/server, not in a
new subpackage. Reason: dynamicToolMeta and the dynamic-tool registration
helpers are unexported (server_dynamic_tools.go:28 and the
s.registerDynamicTools method), and a subpackage cannot reference them.
Exporting them just to allow a subpackage to consume them would widen the
public API for no other caller. Keeping the new code in the same package
avoids that.

New files under pkg/server:

  • multicluster_router.go: MulticlusterRouter middleware that applies
    path_regex, validates the cluster name, builds reqCfg, and injects context.
    Constructor: NewMulticlusterRouter(cfg config.MulticlusterConfig, ch config.ClickHouseConfig) (*MulticlusterRouter, error). Exposes
    (r *MulticlusterRouter) Match(req) (cluster, rest string, ok bool).
  • catalog_cache.go: CatalogCache, with constructor
    NewCatalogCache(cfg config.MulticlusterConfig) *CatalogCache and method:
    func (c *CatalogCache) GetOrDiscover(
        ctx context.Context,
        cacheKey, cluster string,
        reqCfg config.ClickHouseConfig,
        factory ClientFactory,
        rules []config.DynamicToolRule,
        readOnly bool,
        bearerExp time.Time, // zero → use fallback TTL
    ) (map[string]dynamicToolMeta, error)
    This is the single canonical signature; all other references in this doc
    conform to it.
  • multicluster_identity.go: CacheKey(bearer string) string,
    BearerExp(bearer string) (time.Time, bool), and the context helpers
    ClusterFromContext, CHConfigFromContext. There is no
    CacheKeyFromContext: cacheKey is computed inside getServer from the
    bearer already on context.

Roughly 400 LOC including tests.

pkg/server/server_dynamic_tools.go

The current discovery path reaches credentials via
s.GetClickHouseClientFromCtx(ctx), which extracts JWE / OAuth tokens and
claims from context and merges them with s.Config.ClickHouse as the
host source. In multi-cluster mode the host source must be reqCfg, not
the global — otherwise discovery and tool calls silently route to the
wrong backend.

Discovery takes the resolved config by value and a credential-only
factory:

type ClientFactory func(
    ctx context.Context,
    chCfg config.ClickHouseConfig,   // authoritative — host already expanded
) (*clickhouse.Client, error)

func DiscoverTools(
    ctx context.Context,
    chCfg config.ClickHouseConfig,
    factory ClientFactory,
    rules []config.DynamicToolRule,
    readOnly bool,
) (map[string]dynamicToolMeta, error)

factory must respect chCfg.Host as authoritative; the factory pkg/server
provides reads JWE/OAuth tokens and claims from ctx, then constructs a
*clickhouse.Client against the passed chCfg without consulting the
global. The existing GetClickHouseClientWithOAuth already accepts an
explicit chCfg-equivalent path; we add a tightened variant that refuses
to fall back to s.Config.ClickHouse and route through it from both modes:

  • Single-cluster mode keeps calling the factory with
    s.Config.ClickHouse. Behavior unchanged.
  • Multi-cluster mode calls it with reqCfg. The factory never sees the
    global, so host expansion is the only authoritative source.

Other changes:

  • EnsureDynamicTools becomes a thin caller for single-cluster mode: calls
    DiscoverTools(s.Config.ClickHouse, s.clientFactory, ...) once, then
    s.registerDynamicTools(...) as today.
  • Handler factories (makeDynamicToolHandler,
    makeDynamicWriteToolHandler) read CHConfigFromContext(ctx).
    CHConfigFromContext falls back to the global config when the context
    key is absent (single-cluster mode); in multi-cluster mode the router
    middleware always sets the key, so the global is never consulted.

pkg/server/server.go

  • New context keys: ClusterNameKey, RequestCHConfigKey. These
    coexist with the existing CHJWEServerKey, JWETokenKey,
    JWEClaimsKey, OAuthTokenKey, OAuthClaimsKey — none of those are
    changed. The multi-cluster middleware chain is
    router → authInjector → serverInjector → sdkHandler: router sets
    cluster/reqCfg first, auth puts the bearer on ctx, and serverInjector
    puts the wrapper server on ctx so handlers can find it via
    GetClickHouseJWEServerFromContext.
  • CHConfigFromContext(ctx) config.ClickHouseConfig — returns
    RequestCHConfigKey if set, otherwise the server's global
    s.Config.ClickHouse (single-cluster path).
  • No CacheKeyFromContext — cacheKey is derived inside the getServer
    closure from the bearer already on ctx.
  • GetClickHouseClientFromCtx switches to reading CHConfigFromContext
    for host/port/TLS, while still extracting JWE/OAuth tokens from ctx as
    today. The internal helper GetClickHouseClientWithOAuth gains an
    explicit chCfg parameter so it never reaches for the global.

Static-tool registration adapter. AltinityMCPServer.AddTool takes our
local ToolHandlerFunc; *mcp.Server.AddTool takes mcp.ToolHandler. The
two function signatures are identical but they are distinct named types so
Go requires a conversion. Add:

// sdkServerAdapter exposes a plain *mcp.Server as an AltinityMCPServer so
// it can be passed to registerStaticTool / registerDynamicTools without
// pulling in the wrapper-server's other state.
type sdkServerAdapter struct{ srv *mcp.Server }

func (a sdkServerAdapter) AddTool(tool *mcp.Tool, h ToolHandlerFunc) {
    a.srv.AddTool(tool, mcp.ToolHandler(h))
}
// AddResource / AddResourceTemplate / AddPrompt mirrored similarly.

// RegisterToolsOnSDKServer registers static + dynamic tools onto a fresh
// *mcp.Server. Used by the multi-cluster getServer closure.
func RegisterToolsOnSDKServer(srv *mcp.Server, cfg *config.Config,
                              dynamic map[string]dynamicToolMeta) {
    adapter := sdkServerAdapter{srv: srv}
    registerStaticToolsOn(adapter, cfg)
    registerDynamicToolsOn(adapter, dynamic)  // pure helper, not the method
}

registerDynamicToolsOn is a new pure helper that takes the catalog map
and an AltinityMCPServer; the existing
(*ClickHouseJWEServer).registerDynamicTools becomes a wrapper that calls
it (so single-cluster behavior is unchanged).
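
The named-type conversion the adapter depends on can be shown without the SDK. Everything in this sketch is a stand-in: sdkToolHandler and sdkServer mimic mcp.ToolHandler and *mcp.Server with a deliberately simplified handler signature, and toolRegistrar plays the role of the AltinityMCPServer interface.

```go
package main

// ToolHandlerFunc stands in for the local handler type.
type ToolHandlerFunc func(args map[string]any) (string, error)

// sdkToolHandler stands in for mcp.ToolHandler: same underlying signature,
// but a distinct named type.
type sdkToolHandler func(args map[string]any) (string, error)

// sdkServer stands in for *mcp.Server.
type sdkServer struct{ tools map[string]sdkToolHandler }

func (s *sdkServer) AddTool(name string, h sdkToolHandler) { s.tools[name] = h }

// toolRegistrar is the narrow interface the registration helpers need.
type toolRegistrar interface {
	AddTool(name string, h ToolHandlerFunc)
}

type sdkServerAdapter struct{ srv *sdkServer }

func (a sdkServerAdapter) AddTool(name string, h ToolHandlerFunc) {
	// Passing h directly would not compile: two named func types are not
	// assignable to each other even when their signatures are identical.
	a.srv.AddTool(name, sdkToolHandler(h))
}

// registerStaticToolsOn works against any registrar, wrapper or adapter.
func registerStaticToolsOn(r toolRegistrar) {
	r.AddTool("execute_query", func(args map[string]any) (string, error) {
		return "ok", nil
	})
}
```

The conversion is free at runtime; it only changes the static type, so a tool registered through the adapter behaves exactly like one registered directly.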

cmd/altinity-mcp/main.go

Three changes around the existing mcp.NewStreamableHTTPHandler setup.

  1. Mode select. If cfg.Multicluster.PathRegex == "" → unchanged path.
    The global *mcp.Server is pre-populated with static + dynamic tools
    as today, and getServer returns it.

  2. Multi-cluster path. Replace the getServer closure:

    router, err := server.NewMulticlusterRouter(cfg.Multicluster, cfg.ClickHouse)
    if err != nil { return err }
    cache := server.NewCatalogCache(cfg.Multicluster)
    
    // Factory closes over the JWE wrapper for credential extraction but
    // takes chCfg as authoritative (no fallback to s.Config.ClickHouse).
    factory := func(ctx context.Context, chCfg config.ClickHouseConfig) (*clickhouse.Client, error) {
        jwe   := a.mcpServer.ExtractTokenFromCtx(ctx)
        oauth := a.mcpServer.ExtractOAuthTokenFromCtx(ctx)
        claims := a.mcpServer.GetOAuthClaimsFromCtx(ctx)
        return a.mcpServer.GetClickHouseClientWithOAuthForConfig(ctx, chCfg, jwe, oauth, claims)
    }
    
    newSDKServer := func(tools map[string]dynamicToolMeta) *mcp.Server {
        srv := mcp.NewServer(&mcp.Implementation{Name: "altinity-mcp"},
            &mcp.ServerOptions{HasTools: true, HasResources: true, HasPrompts: true})
        server.RegisterToolsOnSDKServer(srv, &cfg, tools)
        server.RegisterResourcesOnSDKServer(srv)
        server.RegisterPromptsOnSDKServer(srv)
        return srv
    }
    
    getServer := func(r *http.Request) *mcp.Server {
        cluster, ok := server.ClusterFromContext(r.Context())
        if !ok { return nil } // SDK returns 400
    
        reqCfg := server.CHConfigFromContext(r.Context())
    
        // authInjector has already populated the bearer on ctx (or 401'd).
        // The empty-bearer guard is defensive: returning a static-only
        // server here prevents any unauthenticated request from colliding
        // on a single empty-bearer cache slot, even under a misconfigured
        // middleware chain.
        bearer := a.mcpServer.ExtractOAuthTokenFromCtx(r.Context())
        if bearer == "" {
            return newSDKServer(nil)  // static tools / resources / prompts only
        }
    
        key    := server.CacheKey(bearer)
        exp, _ := server.BearerExp(bearer)
    
        tools, err := cache.GetOrDiscover(r.Context(), key, cluster, reqCfg,
            factory, cfg.Server.DynamicTools, cfg.ClickHouse.ReadOnly, exp)
        if err != nil {
            log.Warn().Err(err).Msg("multicluster: discovery failed; static-only")
            tools = nil
        }
        return newSDKServer(tools)
    }
    
    sdkHandler := mcp.NewStreamableHTTPHandler(getServer,
        &mcp.StreamableHTTPOptions{Stateless: true})

  3. Mux registration. Outer mux registers /health, /livez, /oauth/*,
    /.well-known/*, then mounts the multi-cluster handler chain on
    cfg.Multicluster.MountPrefix. OpenAPI routes are not registered in
    multi-cluster mode (rejected at config load).

    mux.Handle(cfg.Multicluster.MountPrefix,
        corsHandler(stripTrailingSlash(
            router.Middleware(                      // steps 1–3: cluster, reqCfg
                authInjector(                       // step 4: bearer → ctx (or 401)
                    serverInjector(sdkHandler))))), // step 5: wrapper → ctx; then SDK
    )
    

    Outer → inner is execution order (outermost middleware runs first).
    router.Middleware does only steps 1–3 — extract cluster, validate
    name, build reqCfg, inject (cluster, reqCfg) into ctx. cacheKey is
    not computed here. serverInjector must remain in the chain — static
    and dynamic tool handlers all call GetClickHouseJWEServerFromContext
    and bail with "can't get JWEServer from context" if it's absent.

helm/altinity-mcp/values.yaml

  • Document the {cluster} placeholder semantic on clickhouse.host.
  • Add optional multicluster: block with the fields above.
  • Existing deployments need no values changes.
  • New values_examples/mcp-multicluster.yaml showing the
    chi-{cluster}-{cluster}-0-0.demo shape.

Test plan

Unit tests

  • pkg/config: regex validation (missing cluster group rejected, malformed
    regex rejected); MC + JWE combination rejected; defaults applied.
  • pkg/server (multicluster files):
    • MulticlusterRouter.Match: positive/negative cases; cluster-name
      regex rejects evil.example, .., IPv4 literals, overlong names,
      leading-dot names (.health, .well-known); allowlist enforced.
    • CatalogCache.GetOrDiscover: hit, miss, TTL sweep, hard-cap behavior,
      concurrent miss collapsed by singleflight (test with a counted fake
      discovery function), fallback TTL when JWT exp absent.
    • CacheKey: stable on identical bearer; differs across rotated tokens;
      two different bearers with identical iss/sub/aud claims hash to
      different keys (regression test for the v1.0 attack); empty bearer
      is never passed to CacheKey from getServer (covered by the
      empty-bearer guard).
    • BearerExp: extracts exp from JWT; returns ok=false for opaque or
      malformed bearer.
    • getServer empty-bearer guard: a request with no Authorization
      header that somehow reaches getServer returns a static-only server
      (no cache.GetOrDiscover call, no CH round-trip).
  • pkg/server/server_dynamic_tools: DiscoverTools(ctx, chCfg, factory, rules, readOnly) calls the factory once per discovery; handler factories
    read CHConfigFromContext.
  • pkg/server (adapter): sdkServerAdapter.AddTool correctly converts
    ToolHandlerFunc to mcp.ToolHandler; tool invocation through the
    adapter produces identical results to direct registration.
  • pkg/server: host expansion ({cluster} substitution; literal host with
    warning).
  • cmd/altinity-mcp: mux ordering — /health, /livez, /oauth/*,
    /.well-known/* are not interpreted as cluster names.

Integration / e2e (post-deploy)

End-to-end against the otel demo cluster:

  1. Deploy with:
    clickhouse:
      host: "chi-{cluster}-{cluster}-0-0.demo"
      port: 8443
      protocol: https
      mode: gating
    multicluster:
      mount_prefix: "/mcp/"
      path_regex: "^/mcp/(?P<cluster>[^/]+)/?$"
      cluster_allowlist: ["otel", "antalya"]
    openapi:
      enabled: false  # required in multi-cluster mode for v1
  2. Configure two MCP connectors in Claude:
    • https://mcp.../mcp/otel
    • https://mcp.../mcp/antalya
  3. With each connector:
    • tools/list returns static tools + cluster-specific views.
    • execute_query succeeds against the correct backend.
    • /mcp/bogus → 404 "unknown cluster" (allowlist).
    • /mcp/evil.example → 404 (DNS-label regex).
    • /mcp/otel/openapi/list_tables → 404 (OpenAPI disabled in MC mode).
  4. Single-cluster regression: a deployment without multicluster.path_regex
    behaves identically to today (including resources, prompts, OpenAPI).
  5. Config-load rejection: a config with multicluster.path_regex set AND
    openapi.enabled: true refuses to start.
  6. JWE regression: JWE deployments unaffected (MC + JWE rejected at startup).
  7. Concurrency: open both connectors simultaneously with a cold cache;
    verify only one discovery per (cacheKey, cluster) hits CH (sidecar log
    count, or a custom metric exposed by the cache).

Use the test-mcp-connector skill for the e2e step.
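
For the concurrency check in step 7 (and the matching unit-test case), the behavior under test can be sketched as below. The proposal names singleflight; this sketch swaps in a hand-rolled in-flight map so the block stays stdlib-only. The GetOrDiscover signature, the string key, and the countDiscoveries helper are assumptions for illustration, and TTL sweep and hard-cap handling are left out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight discovery so late arrivals can wait on it.
type call struct {
	done  chan struct{}
	tools []string
	err   error
}

// CatalogCache is a trimmed stand-in: no TTL, no hard cap.
type CatalogCache struct {
	mu       sync.Mutex
	entries  map[string][]string
	inflight map[string]*call
}

func NewCatalogCache() *CatalogCache {
	return &CatalogCache{
		entries:  make(map[string][]string),
		inflight: make(map[string]*call),
	}
}

// GetOrDiscover returns the cached catalog for key, or runs discover once
// while concurrent callers for the same key wait for that single result.
func (c *CatalogCache) GetOrDiscover(key string, discover func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	if tools, ok := c.entries[key]; ok {
		c.mu.Unlock()
		return tools, nil
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-cl.done // another goroutine is already discovering
		return cl.tools, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.tools, cl.err = discover()

	c.mu.Lock()
	if cl.err == nil {
		c.entries[key] = cl.tools // errors are not cached
	}
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done)
	return cl.tools, cl.err
}

// countDiscoveries fires n concurrent cold-cache lookups for one key and
// reports how many times the counted fake discovery function ran.
func countDiscoveries(n int) int32 {
	cache := NewCatalogCache()
	var calls int32
	discover := func() ([]string, error) {
		atomic.AddInt32(&calls, 1)
		time.Sleep(10 * time.Millisecond) // widen the race window
		return []string{"execute_query"}, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.GetOrDiscover("bearerhash|otel", discover)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println("discoveries for 16 concurrent misses:", countDiscoveries(16))
}
```

The counter is the unit-test analogue of the sidecar log count: a correct cache reports exactly one discovery per key no matter how many connectors open at once.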

Health and liveness endpoints

/livez and /health stay on the outer mux, outside mount_prefix. The
cluster middleware never sees them, and the leading-dot rule in the
cluster-name regex prevents a misconfigured mount from ever interpreting
health or livez as a cluster name.
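
A minimal sketch of that layout, assuming the path_regex from the e2e config, an otel/antalya allowlist, and a DNS-label rule of lowercase alphanumerics and hyphens. The labelRe pattern and the newMux, matchCluster, and get names are hypothetical, and the real handlers do more than echo a string.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
)

var (
	// path_regex as written in the e2e config.
	pathRe = regexp.MustCompile(`^/mcp/(?P<cluster>[^/]+)/?$`)
	// Assumed DNS-label rule: 1-63 chars, no dots, no leading/trailing hyphen.
	labelRe = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)
)

// matchCluster returns the cluster for a request path, or ok=false when
// the caller should answer 404.
func matchCluster(path string, allow map[string]bool) (string, bool) {
	m := pathRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	name := m[pathRe.SubexpIndex("cluster")]
	if !labelRe.MatchString(name) || !allow[name] {
		return "", false
	}
	return name, true
}

// newMux keeps /health and /livez on the outer mux; only paths under the
// mount prefix ever reach the cluster matcher.
func newMux(allow map[string]bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "health")
	})
	mux.HandleFunc("/livez", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "livez")
	})
	mux.HandleFunc("/mcp/", func(w http.ResponseWriter, r *http.Request) {
		cluster, ok := matchCluster(r.URL.Path, allow)
		if !ok {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "cluster="+cluster)
	})
	return mux
}

// get performs an in-memory GET and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux(map[string]bool{"otel": true, "antalya": true})
	for _, p := range []string{"/health", "/mcp/otel", "/mcp/bogus", "/mcp/.health"} {
		code, body := get(mux, p)
		fmt.Printf("%-14s %d %q\n", p, code, body)
	}
}
```

Registering the fixed endpoints as exact patterns means the cluster regex is only a second line of defense; the label check still rejects names such as .health if the mount is ever misconfigured.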

Neither endpoint changes shape in multi-cluster mode:

  • /livez is a pure process check that never touches ClickHouse. It
    reports {status:"alive"} for any running pod regardless of how many
    clusters are configured.
  • /health already short-circuits its ClickHouse-ping branch when JWE or
    OAuth is enabled (the existing credentialsArePerRequest gate at
    main.go:689). Multi-cluster mode requires OAuth (enforced at config
    load), so the gate is always true; /health returns 200 with
    auth: "per_request_credentials" — identical to today's OAuth
    deployments.
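
The /health gate can be pictured roughly as follows. This is a sketch rather than the code at main.go:689: healthHandler, probe, and errDown are hypothetical names, the 503 status and the bodies of the non-gated branches are placeholders, and only the auth: "per_request_credentials" field comes from the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

var errDown = errors.New("clickhouse unreachable")

// healthHandler mirrors the credentialsArePerRequest gate: when credentials
// arrive per request there is no single upstream to probe, so the
// ClickHouse ping is skipped entirely.
func healthHandler(perRequestCreds bool, ping func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if perRequestCreds {
			json.NewEncoder(w).Encode(map[string]string{"auth": "per_request_credentials"})
			return
		}
		if err := ping(); err != nil {
			// Placeholder failure shape for the static-credentials branch.
			w.WriteHeader(http.StatusServiceUnavailable)
			json.NewEncoder(w).Encode(map[string]string{"error": err.Error()})
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	}
}

// probe issues an in-memory GET /health and returns status code and body.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	down := func() error { return errDown }

	// Gate on (multi-cluster, OAuth): the failing ping is never called.
	code, body := probe(healthHandler(true, down))
	fmt.Println(code, body)

	// Gate off (static credentials): the ping decides the status.
	code, body = probe(healthHandler(false, down))
	fmt.Println(code, body)
}
```

With the gate on, the pod reports healthy even if every upstream is down, which is the trade-off the next paragraph argues for.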

This avoids the per-cluster-readiness question entirely: there is no
single answer to "is ClickHouse reachable" across N clusters with N
per-user credential sets, and probing each cluster on every health check
would scale CH load with (clusters × replicas × probe rate).
A future v2 may add an explicit multi-cluster readiness aggregator (e.g.
/readyz?cluster=otel) if a deployment needs it; v1 doesn't.

Backward compatibility

Pure superset of today's behavior:

  • multicluster.path_regex unset → behavior unchanged. Single-cluster
    deployments need zero config changes.
  • clickhouse.host without {cluster} → behavior unchanged.
  • All existing tools, handlers, and CH client construction logic keep
    working; they now read reqCfg from context in multi-cluster mode and
    from the server's global config otherwise. serverInjector and the
    CHJWEServerKey contract are preserved in both modes.
  • Resources and prompts continue to be advertised and registered in
    multi-cluster mode (per-request server uses
    HasTools/HasResources/HasPrompts matching today's defaults).
  • Helm chart: only host: semantics expand.
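
The context-or-global lookup and the host expansion can be sketched together as below. CHConfig is a trimmed, hypothetical stand-in for the shared clickhouse: block; the CHConfigFromContext signature (with the global config passed as a fallback) is an assumption, as are WithCHConfig, forCluster, and resolveHost.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// CHConfig is a trimmed stand-in for the shared clickhouse: block.
type CHConfig struct {
	Host string
	Port int
}

type ctxKey struct{}

// WithCHConfig attaches the per-request config (multi-cluster mode only).
func WithCHConfig(ctx context.Context, cfg CHConfig) context.Context {
	return context.WithValue(ctx, ctxKey{}, cfg)
}

// CHConfigFromContext returns the per-request config when one is attached
// and the process-wide config otherwise, so single-cluster handlers see
// exactly what they see today.
func CHConfigFromContext(ctx context.Context, global CHConfig) CHConfig {
	if cfg, ok := ctx.Value(ctxKey{}).(CHConfig); ok {
		return cfg
	}
	return global
}

// forCluster copies the shared config and expands every {cluster}
// placeholder in the host. A literal host comes back unchanged.
func forCluster(global CHConfig, cluster string) CHConfig {
	cfg := global
	cfg.Host = strings.ReplaceAll(global.Host, "{cluster}", cluster)
	return cfg
}

// resolveHost simulates one request; cluster == "" means single-cluster mode.
func resolveHost(global CHConfig, cluster string) string {
	ctx := context.Background()
	if cluster != "" {
		ctx = WithCHConfig(ctx, forCluster(global, cluster))
	}
	return CHConfigFromContext(ctx, global).Host
}

func main() {
	templated := CHConfig{Host: "chi-{cluster}-{cluster}-0-0.demo", Port: 8443}
	literal := CHConfig{Host: "clickhouse.internal", Port: 8443}

	fmt.Println(resolveHost(templated, "otel")) // multi-cluster request
	fmt.Println(resolveHost(literal, ""))       // single-cluster request
}
```

Copying the struct per request keeps the global config immutable, so concurrent requests for different clusters cannot observe each other's expanded host.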

Future work (v2 and beyond)

  1. list_clusters tool scoped to clusters reachable for the calling user.
  2. Per-cluster overrides (cluster_overrides: { name: { … } }) or full
    clusters: map with default_cluster.
  3. Local JWT validation on the request path. Today's "validate at CH
    only" model has a soft trust boundary; landing this would let the cache
    key migrate from raw-bearer hash to verified-claim hash (stable across
    token rotation) without re-opening the v1.0 forgery attack. Needs
    introspection-or-JWKS plumbing on the inbound side.
  4. Per-cluster OpenAPI. Requires refactoring OpenAPIHandler /
    ServeOpenAPISchema off the wrapper's global s.dynamicTools map and
    onto the per-(cacheKey, cluster) catalog cache. Probably an additive
    OpenAPIHandlerForCatalog(cat) entry point invoked by a
    /mcp/{cluster}/openapi[/...] route added to path_regex (the rest
    group from earlier drafts). Until then, OpenAPI is rejected at config
    load alongside multi-cluster.
  5. JWE × multi-cluster coexistence via a path layout that carries both
    token and cluster (e.g. /{token}/{cluster}).
  6. Negative caching of 401/403 per (identity, cluster).
  7. Sliding TTL / background refresh for catalogs.
  8. tools/list_changed push once the spec settles on its post-PR-2322
    server→client model.
