fix(image): build+push the molecule-platform-agent concierge image (concierge identity) #2976

Merged
core-devops merged 1 commits from fix/wire-platform-agent-image-build into main 2026-06-16 01:07:45 +00:00

Fixes: the concierge identity image is built by nothing and pushed nowhere

You flagged the misleading naming — this is the root of it. The concierge (kind=platform) is supposed to run a dedicated molecule-platform-agent image that bakes its identity (config.yaml/prompts/mcp_servers/identity-fallback.sh) from the platform-agent template via Dockerfile.platform-agent (#2919/#2955). But the pipeline was never wired:

  • manifest.json had no platform-agent entry → clone-manifest.sh never staged the template at .tenant-bundle-deps/workspace-configs-templates/platform-agent → the Dockerfile's COPY source was missing.
  • publish-workspace-server-image.yml never built Dockerfile.platform-agent → molecule-ai/molecule-platform-agent ECR is empty (0 images); the only "platform" image (molecule-ai/platform) is a month stale (2026-05-15).

So #2955's identity bake was built by nothing and pushed nowhere; the concierge falls back to a non-identity image and boots as generic Claude Code (verified on test2: 218 B config, no prompts, no identity after restart).
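
The first gap (template never staged, so the COPY source is missing) could be caught before the image build with a small pre-flight check. The following is a minimal sketch, not part of this PR: the function name `template_is_staged` is hypothetical, and it assumes only what the description states, namely the staging path and that the template carries a config.yaml.

```python
from pathlib import Path

# Staging path quoted in the PR description, relative to the build context.
TEMPLATE_SUBDIR = Path(".tenant-bundle-deps/workspace-configs-templates/platform-agent")


def template_is_staged(build_context: Path) -> bool:
    """Hypothetical pre-flight check: True only if the platform-agent
    template directory exists in the build context and holds config.yaml."""
    template_dir = build_context / TEMPLATE_SUBDIR
    return template_dir.is_dir() and (template_dir / "config.yaml").is_file()
```

Calling `template_is_staged(Path("."))` from the build context root before the build would turn a missing COPY source into an early, explicit failure.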

Changes

  1. manifest.json — pin the platform-agent template (ref e5c8302, config.yaml present) so clone-manifest.sh stages it into the build context. Supersedes the closed #2959.
  2. publish-workspace-server-image.yml — add a Build & push platform-agent image step: builds Dockerfile.platform-agent with BASE_IMAGE=<just-built platform image>, pushes molecule-ai/molecule-platform-agent:staging-<sha>+:staging-latest. Runs after the base platform build (it FROMs it).
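
To make the shape of the new build step concrete, here is a rough sketch of the equivalent `docker build` invocation and the two tags it pushes. It is illustrative only: the actual workflow step may use a build action rather than the CLI, the registry host and the value passed for PLATFORM_AGENT_TEMPLATE_DIR are assumptions supplied by the caller, and the helper names `staging_tags` and `build_command` are hypothetical. The Dockerfile path, build-arg names, repository name, and tag scheme come from this PR.

```python
from typing import List

# Repository name as given in the PR description.
REPO = "molecule-ai/molecule-platform-agent"


def staging_tags(registry: str, sha: str) -> List[str]:
    """Return the two tags the step pushes: staging-<sha> and staging-latest."""
    return [
        f"{registry}/{REPO}:staging-{sha}",
        f"{registry}/{REPO}:staging-latest",
    ]


def build_command(registry: str, sha: str, base_image: str, template_dir: str) -> List[str]:
    """Assemble a docker build argv; base_image is the just-built platform image."""
    cmd = [
        "docker", "build",
        "--file", "workspace-server/Dockerfile.platform-agent",
        "--build-arg", f"BASE_IMAGE={base_image}",
        "--build-arg", f"PLATFORM_AGENT_TEMPLATE_DIR={template_dir}",
    ]
    for tag in staging_tags(registry, sha):
        cmd += ["--tag", tag]
    cmd.append(".")  # build context: repository root (assumed)
    return cmd
```

Because BASE_IMAGE points at the image produced earlier in the same run, this step has to be ordered after the base platform build, as the description notes.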

Follow-ups after merge (operational)

  1. Promote runtime_image_pins('platform-agent') → molecule-platform-agent:staging-latest so the CP selects it (core#2495).
  2. Re-provision concierges → identity-fallback.sh fills /configs.
  3. The template-delivery-e2e gate (#2971) asserts the concierge boots with identity — it confirms the fix and guards against regression.
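
The kind of assertion such a gate makes can be sketched as a check on the populated /configs directory. This is not the #2971 implementation: `missing_identity_parts` is a hypothetical helper, and it assumes that identity shows up as a non-empty config.yaml plus a non-empty prompts entry (file or directory) under the config directory.

```python
from pathlib import Path
from typing import List

# Identity pieces checked here; the real gate may look at more (assumption).
IDENTITY_PARTS = ("config.yaml", "prompts")


def _is_populated(path: Path) -> bool:
    """A directory counts if it has at least one entry, a file if it is non-empty."""
    if path.is_dir():
        return any(path.iterdir())
    return path.is_file() and path.stat().st_size > 0


def missing_identity_parts(configs_dir: Path) -> List[str]:
    """Return the names of identity pieces that are absent or empty."""
    return [name for name in IDENTITY_PARTS if not _is_populated(configs_dir / name)]
```

On the broken state described above (a small config but no prompts), a check like this would report `prompts` as missing; an empty list would indicate that the identity files are in place.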

SOP Checklist (RFC#351)

Root-cause not symptom: The concierge identity image (molecule-platform-agent) was built by nothing and pushed nowhere — manifest had no platform-agent entry (template never staged) AND publish-workspace-server-image never built Dockerfile.platform-agent. Fixes both root causes (stage + build), not the symptom.

Comprehensive testing performed: manifest.json validated (JSON + TestManifest_RefPinning passes in CI with the manifest token; local 404s are auth-only, hit all 3 private templates equally). Workflow YAML lint-valid; build step mirrors the proven base-platform/tenant steps; BASE_IMAGE + PLATFORM_AGENT_TEMPLATE_DIR arg names verified against Dockerfile.platform-agent. End-to-end asserted by template-delivery-e2e (#2971).

Local-postgres E2E run: N/A — CI/image-pipeline + manifest change, no schema/handler logic.

Staging-smoke verified or pending: Pending — after merge the image builds; promote the pin + re-provision, then template-delivery-e2e confirms the concierge boots with identity.

Five-Axis review walked: Correctness (build after base, FROM it; template staged by manifest); Security (no token in image — clone-manifest strips .git; read-only template); Performance (one extra image build, cache-from/to); Observability (labels + tags); Tests (e2e gate asserts the outcome).

No backwards-compat shim / dead code added: No shim — adds the missing build + manifest entry; supersedes the closed #2959.

Memory consulted: project_rfc2843_rollout_authorization, reference_runtime_fix_deploy_path, project_platform_agent_org_root_shipped.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-16 00:33:27 +00:00
fix(image): build+push the molecule-platform-agent concierge image (fixes concierge identity never deploying)
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
qa-review / approved (pull_request_target) Failing after 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 19s
security-review / approved (pull_request_target) Failing after 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 18s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 17s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 14s
PR Diff Guard / PR diff guard (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E Chat / detect-changes (pull_request) Successful in 23s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 26s
lint-no-coe-on-required / lint-no-coe-on-required (pull_request) Successful in 22s
Lint publish-runner timeout-minutes / Lint publish-runner timeout-minutes (pull_request) Successful in 22s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
lint-setup-go-cache / lint-setup-go-cache (pull_request) Successful in 24s
CI / Canvas Deploy Status (pull_request) Successful in 2s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 31s
CI / all-required (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 33s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 37s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 45s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 33s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
gate-check-v3 / gate-check (pull_request_target) Failing after 15s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
audit-force-merge / audit (pull_request_target) Successful in 10s
ab3accdf66
The concierge (kind=platform) is meant to run a dedicated molecule-platform-agent
image that bakes its identity (config.yaml/prompts/mcp_servers/identity-fallback.sh)
from the platform-agent template via Dockerfile.platform-agent (#2919/#2955). But:
  - manifest.json had NO platform-agent entry → clone-manifest.sh never staged the
    template at .tenant-bundle-deps/workspace-configs-templates/platform-agent →
    Dockerfile.platform-agent's COPY source was missing.
  - publish-workspace-server-image.yml never built Dockerfile.platform-agent → the
    molecule-ai/molecule-platform-agent ECR repo is EMPTY (0 images); molecule-ai/
    platform is a month stale (2026-05-15).
So #2955's identity bake was built by nothing and pushed nowhere — the concierge
falls back to a non-identity image and boots as generic Claude Code (user-reported
on test2; verified: 218B config, no prompts, no identity after restart).

Changes:
  - manifest.json: pin the platform-agent template (ref e5c8302, config.yaml present)
    so clone-manifest.sh stages it into the build context. Updates the
    _pinning_contract note (supersedes the closed #2959).
  - publish-workspace-server-image.yml: add a "Build & push platform-agent image"
    step — builds ./workspace-server/Dockerfile.platform-agent with
    BASE_IMAGE=<just-built platform image>, pushes molecule-ai/molecule-platform-agent
    :staging-<sha>+:staging-latest. Runs after the base platform build (FROM it).

Follow-ups after merge: (1) promote runtime_image_pins('platform-agent') →
molecule-platform-agent:staging-latest so the CP selects it (core#2495); (2)
re-provision concierges → identity-fallback fills /configs. The template-delivery-e2e
gate (#2971) asserts the concierge boots WITH identity, so it confirms the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
core-devops requested review from agent-reviewer-cr2 2026-06-16 00:59:46 +00:00
core-devops merged commit e48b5f73ef into main 2026-06-16 01:07:45 +00:00

Reference: molecule-ai/molecule-core#2976