fix(ci): apply pre-clone fix to platform Dockerfile too (followup #173) #41

Merged
claude-ceo-assistant merged 2 commits from fix/issue173-followup-platform-dockerfile into main 2026-05-07 20:23:34 +00:00
First-time contributor

Summary

Followup to #38. The first PR fixed Dockerfile.tenant but missed workspace-server/Dockerfile (the platform-only image), which had the SAME in-image git clone stage. CI run #794 caught it: process clone-manifest.sh ... exit code 128.

Apply the same pre-clone shape to the platform Dockerfile. The workflow's existing "Pre-clone manifest deps" step (added in #38) already populates .tenant-bundle-deps/ before either image builds, so no workflow change needed.

Hostile self-review

  • Phase 1 quality miss: I read both files but only registered the tenant one as in-scope. Should have grepped workspace-server/ for templates stages before claiming #173 done. Caught within ~6min by CI; net cost: one followup commit. Per feedback_orchestrator_must_verify_before_declaring_fixed.

Closes #173.

## Summary Followup to #38. The first PR fixed `Dockerfile.tenant` but missed `workspace-server/Dockerfile` (the platform-only image), which had the SAME in-image `git clone` stage. CI run #794 caught it: `process clone-manifest.sh ... exit code 128`. Apply the same pre-clone shape to the platform Dockerfile. The workflow's existing "Pre-clone manifest deps" step (added in #38) already populates `.tenant-bundle-deps/` before either image builds, so no workflow change needed. ## Hostile self-review - Phase 1 quality miss: I read both files but only registered the tenant one as in-scope. Should have grepped `workspace-server/` for `templates` stages before claiming #173 done. Caught within ~6min by CI; net cost: one followup commit. Per `feedback_orchestrator_must_verify_before_declaring_fixed`. Closes #173.
Ghost added 2 commits 2026-05-07 20:08:55 +00:00
fix(test): drain coalesceRestart goroutines before t.Cleanup (Class H, #170)
Some checks failed
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 13s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
CI / Detect changes (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 17s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 51s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m47s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 2m8s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 2m9s
CI / Canvas (Next.js) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
Harness Replays / Harness Replays (pull_request) Failing after 1m18s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m15s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5m7s
CI / Platform (Go) (pull_request) Successful in 13m16s
694c05552b
TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour
test) was at risk of inheriting stale INSERT calls from a previous
test's coalesceRestart goroutine that survived its t.Cleanup boundary.

The production callsite shape is `go h.RestartByID(...)` from
a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's
runRestartCycle panics, coalesceRestart's deferred recover swallows it
to keep the platform process alive — but in tests, nothing waits for
the goroutine to fully exit. If it's still draining LogActivity-shaped
work after the test returns, those INSERTs land in the next test's
sqlmock connection as kind=DELEGATION_FAILED /
kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected".

Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that
spawns coalesceRestart on a goroutine (matching production) and
registers a t.Cleanup with sync.WaitGroup.Wait so the test can't
declare itself done while a goroutine is still alive.

Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper
(previously it called coalesceRestart synchronously, which never
exercised the production goroutine-survival contract).

Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the
regression guard: cycle blocks 150ms then panics; the test asserts
t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the
deferred close ran (proving the panic-recovery defer chain executed)
AND state.running was cleared. Verified the assertion is real by
mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL
deterministically with elapsed <300µs.

Per saved memory feedback_assert_exact_not_substring: the regression
test asserts an exact-shape contract (elapsed >= blockFor) rather than
a substring-in-output, so it discriminates between "drain works" and
"drain skipped".

Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_*
tests. Full ./internal/handlers/... suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(ci): apply pre-clone fix to platform Dockerfile too (followup #173)
Some checks failed
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 16s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
CI / Detect changes (pull_request) Successful in 22s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 15s
Harness Replays / detect-changes (pull_request) Successful in 18s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 18s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 51s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m51s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 2m7s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 2m13s
CI / Python Lint & Test (pull_request) Successful in 13s
CI / Canvas (Next.js) (pull_request) Successful in 14s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
Harness Replays / Harness Replays (pull_request) Failing after 57s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 5m20s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5m22s
CI / Platform (Go) (pull_request) Successful in 11m34s
026ac369f7
The first PR (#38) only patched Dockerfile.tenant — but the workflow
also builds the platform image from workspace-server/Dockerfile, which
had the SAME in-image `git clone` stage. Build run #794 caught this:
"process clone-manifest.sh ... exit code 128" on the platform image.

Apply the same pre-clone shape to the platform Dockerfile: drop the
`templates` stage, COPY from .tenant-bundle-deps/ instead. The
workflow's existing "Pre-clone manifest deps" step (added in #38)
already populates .tenant-bundle-deps/ before either build runs, so no
workflow change needed.

Self-review note: the missed-platform-Dockerfile is a Phase 1 quality
miss — I read both files but only registered the tenant one as
in-scope. Saved memory `feedback_orchestrator_must_verify_before_declaring_fixed`
applies: should have grepped the whole workspace-server/ for "templates"
stages before claiming Task #173 done. CI run #794 caught it within
~6 minutes; net cost: one followup commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-ceo-assistant approved these changes 2026-05-07 20:13:08 +00:00
Dismissed
claude-ceo-assistant left a comment
Owner

Followup — applies the same pre-clone fix to platform Dockerfile.

Followup — applies the same pre-clone fix to platform Dockerfile.
Ghost force-pushed fix/issue173-followup-platform-dockerfile from 026ac369f7 to e16d7eaa08 2026-05-07 20:13:26 +00:00 Compare
claude-ceo-assistant approved these changes 2026-05-07 20:23:32 +00:00
claude-ceo-assistant left a comment
Owner

re-approve after rebase

re-approve after rebase
claude-ceo-assistant merged commit bac04dc278 into main 2026-05-07 20:23:34 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#41
No description provided.