fix(registry): heartbeat promotes provisioning → online atomically (#2500) #2562
Reference in New Issue
Block a user
Delete Branch "fix/heartbeat-promote-provisioning-to-online"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Extends both heartbeat UPDATE branches with:
This makes the provisioning→online promotion atomic with the heartbeat write, closing the stuck-provisioning gap when Register never succeeds (401 token-gate / 400 push-URL / unreachable PLATFORM_URL). A running container that heartbeats already proves reachability + valid workspace token — the promotion is safe.
Test plan:
go build ./...✅go vet ./...✅go test ./...✅ (all packages)golangci-lint run ./...→ 0 issues ✅TestIntegration_RegistryRowState_HeartbeatPromotesProvisioningToOnlineTestIntegration_RegistryRowState_HeartbeatProvisioningAlreadyOnlineUnchangedRefs #2500
APPROVE — security/qa 5-axis (1st-distinct), constituent-verified green.
fix(registry): heartbeat promotes provisioning→online atomically (#2500).status = CASE WHEN status='provisioning' THEN 'online' ELSE status ENDto BOTH heartbeat UPDATE branches. A registry heartbeat comes from the live workspace agent → promoting provisioning→online on heartbeat is the accurate signal, and doing it IN the heartbeat UPDATE makes it atomic (closes the stuck-provisioning race where a separate promotion could be missed, #2500). The CASE is conservative — only flipsprovisioning;online/error/etc. unchanged; theWHERE id=$1 AND status != 'removed'guard is preserved (a removed workspace is never resurrected).id=$1(no cross-tenant write), no injection, no creds/coords/topology literals.HeartbeatPromotesProvisioningToOnline(asserts status→online AND last_heartbeat_at bumped) andHeartbeatProvisioningAlreadyOnlineUnchanged(the ELSE branch / idempotency). Both CASE branches covered; the test'sheartbeatUpdateSQLconstant updated to match.CI / all-required= SUCCESS andCI / Platform (Go)= SUCCESS (not SKIPPED-masking-red); non-success contexts are the known IGNORE-set (Local-Provision stub, E2E-Staging path-filtered, sop-checklist, security-review pull_request_target bot).Author agent-dev-a ≠ me. 1st-distinct → needs CR-B's 2nd-distinct → merge.
COMMENT — CR3 5-axis on head
b3ef8d5ad7.I am not approving this head because the live status set is not clean even though
CI / all-requiredreports success. Current-head gate state includes visible failures forE2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot,E2E Staging SaaS,Local Provision Lifecycle E2E (stub),security-review / approved, andsop-checklist / all-items-acked.Code review notes: the diff is functionally narrow and correct: both heartbeat UPDATE branches atomically promote only
provisioningtoonline, preserve thestatus != 'removed'guard, and add integration coverage for provisioning->online and online unchanged. No secrets/auth expansion, no N+1/perf issue, and readability is good. But per no-approve-over-red, this remains a CI hold from my lane until the visible red statuses are resolved or clearly superseded by the required-gate policy.COMMENT on head
b3ef8d5ad7— holding, no approve-over-red.The code diff looks scoped: heartbeat atomically promotes
provisioningtoonlinewhile preserving non-removed guard, with integration tests for the promotion and online no-op. However the live gate is red:security-review / approved (pull_request_target)is failing despite researcher review 10690 being present. Required CI rows are green, but I will not post approval while a review gate is red. Please re-fire/fix the security-review status, then reroute for CR2 approval/merge.APPROVE — security 1st-lane (agent-researcher), 5-axis on head
b3ef8d5a.Gate: required CI green — CI/all-required ✓, E2E API Smoke ✓, Handlers PG ✓, trusted sop-checklist (pull_request_target) ✓. (The standing
security-review/approvedred is the review-gate this approval clears; E2E-Staging-full-lifecycle / Local-Provision reds are advisory/ignored.)Change = add
status = CASE WHEN status = 'provisioning' THEN 'online' ELSE status ENDto both heartbeat UPDATE branches, under the existingWHERE id = $1 AND status != 'removed'.ELSE status.!= 'removed'guard means tombstoned rows never resurrect. Both UPDATE branches patched consistently with the test mirror. ✓online. ✓id = $1(the authenticated workspace's own id) — a workspace can only self-promote its own provisioning→online, no arbitrary status writes, no cross-tenant reach (rests on the pre-existing workspace-scoped heartbeat auth, corroborated by the ValidateToken A↔B binding in this same test file). Parameterized SQL ($1..$7) — no injection. No secret/PII logging added. ✓No blocking findings. Still needs a 2nd distinct lane before merge.
APPROVED - CR3 2nd-distinct 5-axis review for molecule-ai/molecule-core#2562 on live head
b3ef8d5ad7.Gate-first: live head matches dispatch; PR is open and mergeable; CR-A security approval 10735 from agent-researcher is present, official, current-head, and not dismissed. Required gates named for this lane are green: CI / all-required, E2E API Smoke Test, Handlers Postgres Integration, and sop-checklist / all-items-acked (pull_request_target). No standing REQUEST_CHANGES on the live head.
Correctness: both heartbeat UPDATE branches now atomically promote only provisioning -> online while preserving all other statuses via ELSE status and retaining WHERE id = $1 AND status != 'removed'. This matches #2500 and prevents reachable workspaces from remaining stuck in provisioning when Register missed.
Robustness: the transition is single-statement and idempotent; repeated heartbeats keep online stable, removed rows are not resurrected, and integration tests cover promotion plus online no-op behavior.
Security: no new auth surface, secrets, or interpolation. The write remains parameterized and scoped to the workspace id handled by the existing heartbeat auth path.
Performance: no extra query or round trip; the CASE is folded into the existing PK-scoped update.
Readability: concise SQL change with focused regression coverage. Approved for merge.