RFC#2843 #32: fire reconcile on provisioning→online via /registry/register (CP-boot path; follow-up to #3002/#3004) #3005
Reference in New Issue
Block a user
Delete Branch "fix/rfc2843-32-fire-reconcile-on-register"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Follow-up to #3002 + #3004 (RFC#2843 #32). The final acceptance on a fresh PROD tenant (on the #3004 fix image,
/buildinfo=a0075b15) still FAILED Assertion E (seo-all never installed) — a SECOND root cause.Root cause: on the CP/SaaS boot path the seo-agent runtime calls
POST /registry/registerbefore it heartbeats, and the register upsert setsstatus = 'online'unconditionally. So by the first heartbeat the row is alreadyonline, and the heartbeat handler'sprevStatus == 'provisioning'trigger (#3002/#3004) never matches. #3002's premise ("the runtime only ever calls /registry/heartbeat, never /registry/register on boot") is wrong for CP workspaces — register IS the fresh-boot provisioning→online transition. Result: the declared-plugin reconcile has no trigger on the real CP path.Fix: fire the reconcile from
Registerwhen it performs the provisioning→online transition (read pre-upsert status, guarded onreconcilePlugins != nilso unit tests skip the extra read; fire after the upsert when prior status wasprovisioning). The heartbeat-path fire is kept as a fallback.Root-cause not symptom
Proven on a live prod tenant box (
i-0dfc9a492a5d1e00e, seo-agent wscb55f9b4): the box log showsPOST /registry/registerat 23:52:55 immediately followed by heartbeats, and zeroPlugin reconcilelog lines — the reconcile function was wired (SetReconcileFuncat router.go:751) and the declared plugin was recorded (recorded 1/1 template declared plugins), yet it never fired because the transition happened in register, not heartbeat. (The earlierinvalid input value for enum workspace_status: ""was a separate bug, fixed in #3004.)No backwards-compat shim / dead code added
No shim. Adds a guarded prev-status read + a fire-and-forget reconcile call on the provisioning→online register transition. The heartbeat fire stays as a genuine fallback (runtimes that reach online via heartbeat self-heal).
Comprehensive testing performed
TestRegister_FiresReconcile_OnProvisioningToOnlinewires a ReconcileFunc spy, mocks the prev-status SELECT returningprovisioning, and asserts the reconcile fires for the workspace on register. The prev-status read is guarded onreconcilePlugins != nilso existing Register tests (no ReconcileFunc) need no mock changes. The livetemplate-delivery-e2egate is the end-to-end backstop.Local-postgres E2E run
Handlers Postgres Integration(required) exercises real Postgres. The live prod acceptance harness is the full reproduction; this PR will be re-verified on a fresh prod tenant post-merge.Staging-smoke verified or pending
Mechanism verified on a live prod tenant box log (register-before-heartbeat, zero reconcile lines). Final acceptance re-run scheduled post-merge+deploy.
Five-Axis review walked
Correctness (fires on the real CP-boot transition = register; heartbeat fallback retained), security (read-only prev-status SELECT, no new surface), performance (one extra SELECT only when reconcile is wired; reconcile is fire-and-forget + idempotent), maintainability (comment documents why register is the trigger on CP), tests (register reconcile-spy regression added).
Memory consulted
Consulted:
feedback_no_such_thing_as_flakes(named the mechanism: register-sets-online-before-heartbeat),feedback_follow_dev_sop_phase1_evidence_first(dumped the raw box log before concluding — found register@23:52:55),project_rfc2843_rollout_authorization,reference_runtime_fix_deploy_path.🤖 Generated with Claude Code
QA: register is the CP-boot prov→online transition; firing reconcile there is correct; live box log confirms register-before-heartbeat; spy test added. Approving.
Security: read-only prev-status SELECT guarded on wired hook; fire-and-forget idempotent reconcile; no new surface. Approving.
/sop-ack comprehensive-testing verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack local-postgres-e2e verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack staging-smoke verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack root-cause verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack five-axis-review verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack no-backwards-compat verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.
/sop-ack memory-consulted verified — fire reconcile on register prov→online (CP-boot 2nd root cause); live box-log RCA; required CI green on head.