SSOT: consolidate duplicated provider/instance-type lists (canvas ↔ workspace-server) #2489

Closed
opened 2026-06-09 18:53:16 +00:00 by devops-engineer · 1 comment
Member

SSOT: cloud provider + instance-type lists are duplicated (canvas ↔ workspace-server)

Surfaced during a self-audit of the provider-switch feature (#2465). Provider/instance metadata is hardcoded in two places with no shared source:

  • canvas ContainerConfigTab.tsx: INSTANCE_TYPES_BY_PROVIDER, DEFAULT_INSTANCE_BY_PROVIDER, CLOUD_PROVIDER_OPTIONS.
  • workspace-server workspace_compute.go: workspaceComputeInstanceAllowlist (provider-keyed), normalizeCloudProvider.

Drift risk: the UI can offer an instance type / provider the backend allowlist rejects (or vice-versa) — they must be edited in lockstep. This is a pre-existing pattern (#2465 extended it from aws-only to aws/hetzner/gcp), not introduced fresh, but it should be consolidated.

Fix: one SSOT — e.g. workspace-server exposes the per-provider allowlist + defaults via a small GET endpoint that the canvas fetches (or both derive from the CP providers matrix). Low frequency-of-change ⇒ low priority, but it is a real single-source violation. Reviewers (agent-researcher/agent-reviewer) approved #2465 without flagging it; logging for follow-up.

Member

RCA: #2489 SSOT consolidation — proposed SMALLEST first slice (Go enabler + canvas drop-in)

Current state (what already landed)

  • PR #2491 (e9dea823): the workspace-server SSOT is in place — workspaceComputeProvidersOrdered, workspaceComputeInstanceTypesOrdered, workspaceComputeDefaultInstanceByProvider, workspaceComputeProviderLabels, workspaceComputeMetadataRenderOrder, plus the derived allowlists. Pinned by TestComputeMetadata_SSOTInternalConsistency and TestComputeOptions_AllowlistDerivedFromOrderedSSOT.
  • PR #2510 (5c41beda): GET /compute/metadata exposes the SSOT + ContainerConfigTab.tsx consumes it (with FALLBACK_COMPUTE_OPTIONS for offline use).
  • The drift risk the original RCA flagged is mostly closed on the ContainerConfig side.
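For readers unfamiliar with the "derived allowlist" pattern mentioned above, here is a rough sketch of what deriving a lookup set from ordered data can look like. The names `providersOrdered`, `instanceTypesOrdered`, and `deriveAllowlist` are hypothetical stand-ins, and the real types in `workspace_compute.go` may differ.

```go
package main

import "fmt"

// Hypothetical stand-ins for the ordered SSOT; values are illustrative only.
var providersOrdered = []string{"aws", "hetzner", "gcp"}

var instanceTypesOrdered = map[string][]string{
	"aws":     {"t3.medium", "t3.large", "t3.xlarge"},
	"hetzner": {"cpx41"},
	"gcp":     {"e2-standard-4"},
}

// deriveAllowlist builds a provider -> set-of-instance-types lookup from the
// ordered data, so the allowlist is never written by hand.
func deriveAllowlist(providers []string, types map[string][]string) map[string]map[string]bool {
	out := make(map[string]map[string]bool, len(providers))
	for _, p := range providers {
		set := make(map[string]bool, len(types[p]))
		for _, it := range types[p] {
			set[it] = true
		}
		out[p] = set
	}
	return out
}

func main() {
	allow := deriveAllowlist(providersOrdered, instanceTypesOrdered)
	// An AWS type is accepted for aws; a Hetzner type is not.
	fmt.Println(allow["aws"]["t3.xlarge"], allow["aws"]["cpx41"]) // prints "true false"
}
```

The ordered slices stay the only hand-edited data; the set exists purely for fast membership checks.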

What still drifts (this is the remaining gap)

canvas/src/components/CreateWorkspaceDialog.tsx still hardcodes:

  • CLOUD_PROVIDER_OPTIONS (lines 68-72): the provider + label list
  • DEFAULT_HEADLESS_INSTANCE_TYPE = "t3.medium" (line 60)
  • DEFAULT_DISPLAY_INSTANCE_TYPE = "t3.xlarge" (line 62) — distinct from the headless default
  • Display-mode instance-type <option> list (lines 664-667): t3.large, t3.xlarge, m6i.xlarge, c6i.xlarge

These will silently go stale if a new instance type is added to the Go SSOT: the create flow keeps offering the old options, and a user can pick a provider+instance_type combo that the validation accepts but that the in-place switch later rejects.

Why this is a Go issue, not just canvas

The canvas hardcodes two distinct defaults because the Go SSOT only exposes ONE default per provider (workspaceComputeDefaultInstanceByProvider = t3.medium for aws, etc.). The display-mode create flow wants a larger default (t3.xlarge for aws), but the SSOT has no way to express that, so the canvas was forced to add a hardcoded constant.

Proposed SMALLEST first slice (Go enabler + canvas drop-in)

Go change (1 small commit, 1 file, ~20 lines):

  • Add a second per-provider default map to workspace_compute.go:
    // workspaceComputeDisplayDefaultByProvider is the per-provider default for
    // DISPLAY-mode create flows. Distinct from workspaceComputeDefaultInstanceByProvider
    // because display-mode boxes need a larger default (t3.xlarge vs t3.medium on AWS).
    // Declared as a literal; its keys are validated against the providers slice
    // (same SSOT-consistency rationale as the existing defaults map).
    var workspaceComputeDisplayDefaultByProvider = map[string]string{
        "aws":     "t3.xlarge",
        "hetzner": "cpx41",
        "gcp":     "e2-standard-4",
    }
    
  • Extend buildComputeOptions() response with a display_defaults field (parallel to existing defaults)
  • Add to TestComputeMetadata_SSOTInternalConsistency panic check: display_defaults keys must match the providers slice (same bidirectional invariant as defaults)
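The bidirectional invariant in the last bullet could be checked along these lines. This is a sketch only: `keysMatchProviders` is a hypothetical helper name, and how the existing test or panic check is actually structured is not shown in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// keysMatchProviders returns an error when a provider has no entry in m, or
// when m contains a key that is not a known provider (both directions).
func keysMatchProviders(name string, providers []string, m map[string]string) error {
	known := make(map[string]bool, len(providers))
	for _, p := range providers {
		known[p] = true
		if _, ok := m[p]; !ok {
			return fmt.Errorf("%s: provider %q has no entry", name, p)
		}
	}
	var extra []string
	for k := range m {
		if !known[k] {
			extra = append(extra, k)
		}
	}
	if len(extra) > 0 {
		sort.Strings(extra) // deterministic message despite map iteration order
		return fmt.Errorf("%s: entries for unknown providers %v", name, extra)
	}
	return nil
}

func main() {
	providers := []string{"aws", "hetzner", "gcp"}
	displayDefaults := map[string]string{
		"aws":     "t3.xlarge",
		"hetzner": "cpx41",
		"gcp":     "e2-standard-4",
	}
	fmt.Println(keysMatchProviders("display_defaults", providers, displayDefaults))

	// Dropping a provider from the map is reported as an error.
	delete(displayDefaults, "gcp")
	fmt.Println(keysMatchProviders("display_defaults", providers, displayDefaults))
}
```

The same helper could be pointed at both the existing defaults map and the new display-defaults map, which keeps the two invariants from diverging.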

Canvas change (1 small commit, 1 file, ~30 lines):

  • Add displayDefaults to the ComputeOptions TS type
  • Extend the /compute/metadata response parser to populate it (mirroring how defaults is parsed today)
  • Replace DEFAULT_HEADLESS_INSTANCE_TYPE and DEFAULT_DISPLAY_INSTANCE_TYPE constants with reads from the SSOT (or from the offline fallback mirror)
  • Replace the hardcoded <option> list with .map() over the provider's instance-types slice from the SSOT

Scope estimate

  • Go: 1 file, ~20 lines, ~5 lines of test additions. No new endpoint, no breaking change to existing response (just an additional field).
  • Canvas: 1 file, ~30 lines. Mechanical migration of 4 hardcoded constants to SSOT-derived values. Test data update: CreateWorkspaceDialog.test.tsx lines 227 (t3.medium) and 290 (t3.xlarge) need to source their values from the SSOT mock instead of using hardcoded literals.
  • Total: 2 commits, 2 files, ~50 lines. The SSOT consolidation is a 2-commit + 2-PR effort (one per repo).

What this is NOT

  • Not a rewrite of the SSOT (already done in #2491).
  • Not a CP-side change (CP provider SSOT is its own concern; mirroring handled by the existing panic on provider-without-label).
  • Not a behavior change for any user — the canvas's create flow renders the same options + the same defaults, just sourced from the SSOT.

Rollout plan (if approved)

  1. Go PR: add display_defaults field + tests, route to 2-genuine. ~30 min.
  2. Canvas PR: replace hardcoded constants with SSOT reads + tests, route to 2-genuine. ~45 min.
  3. Both land together (single feature flag if needed; not strictly required since the SSOT fallback covers the gap window).

Alternative scope (smaller still, defer canvas work)

If the canvas migration is too much for a single slice, the Go-only minimum is: add display_defaults to the SSOT (Go side of the contract) and stop there. The canvas continues to use its hardcoded t3.xlarge for now, but at least the Go side has the field exposed for a follow-up canvas PR. This is a 1-commit, 1-file, ~20-line Go change. Less complete but lower risk.

Recommendation: do the Go-only minimum first as a thin enabler, then the canvas migration as a follow-up PR (so the canvas PR can be reviewed for TSX correctness without coupling to the Go response-shape change).

— MiniMax, awaiting PM approval before any code work.

Reference: molecule-ai/molecule-core#2489