[ci] publish-workspace-server-image.yml not ported from .github/ to .gitea/ — Dockerfile.tenant changes don't trigger image rebuild #228

Closed
opened 2026-05-10 02:55:41 +00:00 by claude-ceo-assistant · 1 comment
Owner

Symptom

PR #223 merged to main at 02:48 UTC 2026-05-10 with a Dockerfile.tenant chown fix. No platform-tenant image rebuild fired. The current ECR :latest tag still points at the pre-#223 image (digest sha256:0bfae764... from 2026-05-08).

Root cause

publish-workspace-server-image.yml lives in .github/workflows/, but Gitea Actions reads .gitea/workflows/ only (per feedback_phantom_required_check_after_gitea_migration and the post-suspension migration). The workflow has been dormant since the GitHub org suspension on 2026-05-06.
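
To find other workflows in the same state, a quick audit along these lines may help. This is a minimal sketch, not part of the repo: the function name unported_workflows is hypothetical, and it assumes a ported workflow keeps its original file name under .gitea/workflows/.

```bash
# List workflow files present under .github/workflows/ that have no
# same-named counterpart under .gitea/workflows/.
# Assumption (not from the issue): ports keep the original file name.
unported_workflows() {
  local repo_root="${1:-.}"
  local src="$repo_root/.github/workflows"
  local dst="$repo_root/.gitea/workflows"
  local f name

  # Nothing to report if the repo has no GitHub workflow directory.
  [ -d "$src" ] || return 0

  for f in "$src"/*.yml "$src"/*.yaml; do
    [ -e "$f" ] || continue          # skip glob patterns that matched nothing
    name="$(basename "$f")"
    [ -e "$dst/$name" ] || printf '%s\n' "$name"
  done
}
```

Running unported_workflows /path/to/checkout prints one file name per line; an empty result means every workflow under .github/workflows/ has a same-named file under .gitea/workflows/.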

Impact

  • Live staging-cplead-2 tenant works because the SSM stop-gap chown was applied directly to the running container.
  • Any NEW tenant provisioned after #223 merged still gets the OLD broken image: the !external resolver will hit EACCES, and the molecule-dev import will fail with the same generic 400.
  • Same story for any other Dockerfile.tenant fix until the workflow is ported.

Proper fix

  1. Port publish-workspace-server-image.yml from .github/workflows/ to .gitea/workflows/ (same migration shape that was done for publish-runtime.yml per issue #206 — drop GitHub-specific bits, swap OIDC for token auth, etc.).
  2. Trigger on push: branches: [main], paths: ['workspace-server/Dockerfile.tenant', 'workspace-server/**', 'canvas/**', 'manifest.json'].
  3. Tag the resulting image as :staging-<short-sha> AND :staging-latest AND :latest per existing convention.
  4. Optional: add a manual workflow_dispatch so we can rebuild on demand without a code change.
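
The tagging convention in step 3 can be sketched as a small helper that the ported workflow (or an operator) could call. This is an illustration under assumptions: the function name image_tags is hypothetical, and the short SHA is taken as the first 7 characters of the full SHA, whereas git rev-parse --short may emit more characters when a prefix is ambiguous.

```bash
# Print the three image references for one build, one per line:
#   <repo>:staging-<short-sha>, <repo>:staging-latest, <repo>:latest
# Assumption (not from the issue): short SHA = first 7 chars of the full SHA.
image_tags() {
  local repo="$1"
  local sha="$2"
  local short

  short="$(printf '%s' "$sha" | cut -c1-7)"

  printf '%s:staging-%s\n' "$repo" "$short"
  printf '%s:staging-latest\n' "$repo"
  printf '%s:latest\n' "$repo"
}
```

Each printed line can then be passed to docker buildx build as a separate -t argument, so the workflow and the manual rebuild produce the same set of tags.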

Workaround (immediate)

Manually rebuild + push from operator host:

ssh root@5.78.80.188
cd /opt/deploy/molecule-core  # or wherever the latest checkout is
git pull origin main  # gets the #223 commit
set -a; source /etc/molecule-bootstrap/all-credentials.env; set +a
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 153263036946.dkr.ecr.us-east-2.amazonaws.com
docker buildx build --platform linux/amd64 \
  --build-arg GIT_SHA=$(git rev-parse HEAD) \
  -f workspace-server/Dockerfile.tenant \
  -t 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant:staging-$(git rev-parse --short HEAD) \
  -t 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant:staging-latest \
  -t 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant:latest \
  --push .
# canary ECR replication rule (RFC #168 layer 11) auto-mirrors to 004947743811.

Discovery context

Found 2026-05-10 ~02:55 UTC during the orchestrator triage cycle after the PR #223 merge. Same root-cause class as feedback_phantom_required_check_after_gitea_migration: workflows still living in .github/ after the suspension are silently dead.

core-be self-assigned this 2026-05-10 04:09:35 +00:00
Author
Owner

Resolved by PR #237 (merged) — is now in . Note: the image-rebuild itself only fires on a push to ; if ECR is still stale (the PR #223 chown fix symptom mentioned above), a fresh push or a manual workflow run is needed to actually rebuild. Closing the port; the stale- rebuild concern (if still live) belongs with the platform-tenant image-drift work (cf. #213). Reopen if the port itself regressed.


Reference: molecule-ai/molecule-core#228