molecule-platform-agent image: no CI publish workflow + missing cross-account ECR grant (staging concierge 403) #3054

Open
opened 2026-06-19 01:05:28 +00:00 by molecule-code-reviewer · 1 comment
Member

Problem

The molecule-platform-agent container image (the concierge runtime) has no discoverable CI publish workflow and its prod ECR repo lacks a cross-account pull grant. Two distinct gaps:

  1. No publish workflow. The platform-agent template repo (molecule-ai-workspace-template-platform-agent) has no Dockerfile and no .gitea/workflows publish job (only test-identity-fallback.yml). That's also why #3027's drift gate had to become skip-if-absent. The image at 153263036946:molecule-ai/molecule-platform-agent@sha256:1c3c... appears to have been built/pushed out-of-band.

  2. No cross-account pull grant. Staging tenants (account 004947743811) pull from the prod registry 153263036946 (migration 030 dropped the staging routing hop). Runtime image repos grant cross-account pull to the staging instance role; molecule-platform-agent does not. As a result, fresh staging concierges get a 403 on docker pull, cloud-init scripts_user aborts, and only a 218B stub config lands.

Evidence (SSM, live staging concierge box, 2026-06-18)

[user-data] pulling 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/molecule-platform-agent@sha256:1c3c... → 403 Forbidden
cc_scripts_user.py[WARNING]: Failed to run module scripts_user

Staging-only: prod tenants are in 153263036946 (same account as the image) so prod is unaffected.

Fix

  • Short term (CTO infra grant): add a cross-account ECR repository policy on 153263036946:molecule-ai/molecule-platform-agent granting ecr:BatchGetImage + ecr:GetDownloadUrlForLayer + ecr:BatchCheckLayerAvailability to arn:aws:iam::004947743811:root, mirroring an existing runtime repo's policy. (Tracked via CEO-asst approval fea5ec88.)
  • Durable: add a proper publish workflow for the platform-agent image (mirror molecule-ai-workspace-template-claude-code/.gitea/workflows/publish-image.yml) that builds, pushes, and then runs set-repository-policy to re-apply the cross-account grant after each push, so the grant self-heals and never silently drifts.
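
For reference, the repository policy described in the short-term item could be generated roughly as below. This is a minimal sketch, not the policy on any existing runtime repo: the function name `build_pull_policy` and the `Sid` value are hypothetical, and only the three actions and the principal ARN come from this issue.

```python
import json


def build_pull_policy(principal_arn: str) -> dict:
    """Build an ECR repository policy that lets one principal pull images.

    The Sid is an illustrative label (an assumption), not taken from an
    existing repo. Repository policies carry no Resource element because
    they are attached to the repository itself.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",  # hypothetical name
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchCheckLayerAvailability",
                ],
            }
        ],
    }


# Staging account root, as named in this issue.
policy = build_pull_policy("arn:aws:iam::004947743811:root")
policy_text = json.dumps(policy, indent=2)
print(policy_text)
```

The resulting JSON is the kind of document that `aws ecr set-repository-policy` accepts via `--policy-text`; a publish workflow could emit it after the push step.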

Unblocks

  • Honest staging end-to-end validation of the concierge create_workspace fix (PR #3049).
  • Task #48 (promote E2E Staging Platform Boot to merge-blocking) — should land only after this, else it would block all core merges on a staging infra gap.

Refs: #2385, #3027, PR #3049.

Author
Member

Corrected root cause + fix applied (2026-06-18)

The prod repo policy was not the gap — 153263036946:molecule-ai/molecule-platform-agent already grants cross-account pull to arn:aws:iam::004947743811:root and MoleculeTenantEICRole-staging.

The real gap is the staging-side identity policy: role MoleculeTenantEICRole-staging has inline policy WorkspaceTemplatePrimaryECRPull that allowlists cross-account pull for a specific list of prod ECR repos (workspace-template-claude-code, -codex, -hermes, -openclaw, -autogen, -langgraph) — and molecule-platform-agent was missing. Cross-account ECR requires both the resource (repo) policy and the principal's identity policy to allow the action; only the identity side was missing.
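
The "both sides must allow" rule can be modeled in a few lines. This is a deliberately simplified sketch: real IAM evaluation also handles explicit denies, wildcards, conditions, and boundaries, and the function name and the placeholder repo ARN are hypothetical.

```python
def cross_account_pull_allowed(
    repo_arn: str,
    principal: str,
    repo_policy_principals: set,
    identity_policy_repos: set,
) -> bool:
    """Simplified model of a cross-account ECR pull decision.

    The pull succeeds only when the repository (resource) policy names the
    principal AND the principal's identity policy lists the repository.
    """
    resource_side_allows = principal in repo_policy_principals
    identity_side_allows = repo_arn in identity_policy_repos
    return resource_side_allows and identity_side_allows


AGENT_REPO = (
    "arn:aws:ecr:us-east-2:153263036946:repository/"
    "molecule-ai/molecule-platform-agent"
)
ROLE = "MoleculeTenantEICRole-staging"

# Resource side already granted the staging role (per the finding above).
repo_grants = {ROLE}

# Identity side before the fix: the agent repo is absent from the allowlist.
# The entry below is a placeholder ARN, not a real repository.
before = {"arn:aws:ecr:us-east-2:153263036946:repository/example/other-repo"}
after = before | {AGENT_REPO}

print(cross_account_pull_allowed(AGENT_REPO, ROLE, repo_grants, before))
print(cross_account_pull_allowed(AGENT_REPO, ROLE, repo_grants, after))
```

Under this model the first call is denied and the second is allowed, which matches the 403 seen before the identity policy was updated.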

Fixed: added arn:aws:ecr:us-east-2:153263036946:repository/molecule-ai/molecule-platform-agent to that inline policy (iam put-role-policy, staging account 004947743811). Verified on a fresh staging concierge: docker pull completes (0× 403), container Up, runtime logs Registered with platform: 200, cloud-init scripts_user SUCCESS.
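
The policy edit itself amounts to appending one ARN to the pull statement's resource list. The sketch below shows that transformation on a policy document held as a dict. The shape of `WorkspaceTemplatePrimaryECRPull` is an assumption here (a `Statement` list whose pull statement includes `ecr:BatchGetImage`), and the helper names and the placeholder ARN are hypothetical.

```python
import copy


def _as_list(value):
    """IAM allows a single string or a list for Action/Resource."""
    return [value] if isinstance(value, str) else list(value)


def add_repo_to_allowlist(policy_doc: dict, repo_arn: str) -> dict:
    """Return a copy of policy_doc with repo_arn added to pull statements.

    Only statements whose Action includes ecr:BatchGetImage are touched, so
    an account-wide statement such as ecr:GetAuthorizationToken on "*" is
    left alone. Calling this twice with the same ARN is a no-op.
    """
    updated = copy.deepcopy(policy_doc)
    for stmt in updated["Statement"]:
        if "ecr:BatchGetImage" not in _as_list(stmt.get("Action", [])):
            continue
        resources = _as_list(stmt.get("Resource", []))
        if repo_arn not in resources:
            resources.append(repo_arn)
        stmt["Resource"] = resources
    return updated


# Assumed policy shape; the existing Resource entry is a placeholder ARN.
current = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
            "Resource": [
                "arn:aws:ecr:us-east-2:153263036946:repository/example/other-repo"
            ],
        },
    ],
}

AGENT_REPO = (
    "arn:aws:ecr:us-east-2:153263036946:repository/"
    "molecule-ai/molecule-platform-agent"
)
patched = add_repo_to_allowlist(current, AGENT_REPO)
```

The patched document would then be serialized to JSON and written back with `iam put-role-policy`, as was done by hand for this fix.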

Remaining (this issue stays open for the durable fix)

  • The allowlist is hand-maintained and has drifted: it is still missing seo-agent and google-adk (both in the 7-template manifest), so those staging runtimes would hit the identical 403. It also carries autogen and langgraph, which are not in the manifest.
  • Durable fix: manage WorkspaceTemplatePrimaryECRPull as IaC kept in sync with manifest.json workspace_templates + the special molecule-platform-agent image, so a new template never silently 403s on staging. (The earlier "no publish workflow for the platform-agent image" point still stands separately.)
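
Until the policy is managed as IaC, the drift described above could be caught by a small check that compares the manifest against the allowlist. This is a sketch under assumptions: the function name is hypothetical, templates are compared by short name, and the sample lists are illustrative rather than the actual contents of manifest.json or the live policy.

```python
def allowlist_drift(manifest_templates, allowlisted, extra_images=()):
    """Compare expected images (manifest plus extras) with the allowlist.

    Returns the names missing from the allowlist (these would 403 on
    staging) and the stale names present in the allowlist but not expected.
    """
    expected = set(manifest_templates) | set(extra_images)
    actual = set(allowlisted)
    return {
        "missing": sorted(expected - actual),
        "stale": sorted(actual - expected),
    }


# Illustrative inputs only; the real manifest lists 7 templates.
manifest = ["claude-code", "codex", "hermes", "openclaw", "seo-agent", "google-adk"]
allowlist = [
    "claude-code", "codex", "hermes", "openclaw",
    "autogen", "langgraph", "molecule-platform-agent",
]

report = allowlist_drift(manifest, allowlist, extra_images=["molecule-platform-agent"])
print(report)
```

Run in CI, a non-empty `missing` list could fail the job, so a new template cannot reach staging without a matching allowlist entry.
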
Reference: molecule-ai/molecule-core#3054