Some checks failed
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 17s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 30s
Harness Replays / Harness Replays (pull_request) Failing after 32s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m21s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m36s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m36s
CI / Platform (Go) (pull_request) Successful in 2m18s
Two coupled cleanups for the post-2026-05-06 stack:
============================================
The github-app-auth plugin injected GITHUB_TOKEN/GH_TOKEN via the App's
installation-access flow (~hourly rotation). Per-agent Gitea
identities replaced this approach after the 2026-05-06 suspension:
workspaces now provision with a per-persona Gitea PAT from .env
instead of an App-rotated token. The plugin code itself lived at
github.com/Molecule-AI/molecule-ai-plugin-github-app-auth, which is
also unreachable post-suspension, so checking it out at CI build time
was already failing.
Removed:
- workspace-server/cmd/server/main.go: githubappauth import + the
  `if os.Getenv("GITHUB_APP_ID") != ""` block that called
  BuildRegistry. gh-identity remains as the active mutator.
- workspace-server/Dockerfile + Dockerfile.tenant: COPY of the
  sibling repo + the
  `replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin`
  directive injection.
- workspace-server/go.mod + go.sum: github-app-auth dep entry
  (cleaned up by `go mod tidy`).
- 3 workflows: actions/checkout steps for the sibling plugin repo:
  - .github/workflows/codeql.yml (Go matrix path)
  - .github/workflows/harness-replays.yml
  - .github/workflows/publish-workspace-server-image.yml
Verified `go build ./cmd/server` + `go vet ./...` pass post-removal.
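A check along the following lines could confirm that the removal left no stray references behind. This is a minimal sketch and not part of the PR: the helper name `find_plugin_refs` and the idea of scanning a directory tree are assumptions; only the module path comes from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): list files under a directory
# that still mention the removed plugin's module path.
PLUGIN_PATH="molecule-ai-plugin-github-app-auth"

find_plugin_refs() {
  # $1: root directory to scan (defaults to the current directory).
  # Prints matching file paths, one per line; prints nothing when clean.
  grep -rl --exclude-dir=.git -- "$PLUGIN_PATH" "${1:-.}" 2>/dev/null || true
}
```

Running `find_plugin_refs workspace-server` from the repository root should print nothing if the import, the `replace` directive, and the go.mod/go.sum entries are all gone.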
=======================================================
The publish-workspace-server-image workflow used to push to
ghcr.io/molecule-ai/platform + platform-tenant, but ghcr.io/molecule-ai
is gone post-suspension. The operator's ECR org
(153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/) already
hosts platform-tenant + workspace-template-* + runner-base images and
is the post-suspension SSOT for container images. This PR aligns
publish-workspace-server-image with that stack.
- env.IMAGE_NAME + env.TENANT_IMAGE_NAME repointed to ECR URL.
- docker/login-action swapped for the
  aws-actions/configure-aws-credentials@v4 +
  aws-actions/amazon-ecr-login@v2 chain (the standard ECR auth pattern;
  uses AWS_ACCESS_KEY_ID/SECRET secrets bound to the molecule-cp IAM
  user).
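For local debugging, the two-action login chain corresponds roughly to the AWS CLI commands below. This is a sketch, assuming the `aws` and `docker` CLIs are installed and AWS credentials are already present in the environment; the helper names `registry_host` and `ecr_login` are illustrative and do not appear in the workflow.

```shell
#!/usr/bin/env bash
# Image name as set in env.IMAGE_NAME by this PR.
IMAGE_NAME="153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform"

registry_host() {
  # Strip everything from the first "/" onward, leaving the registry host.
  printf '%s\n' "${1%%/*}"
}

ecr_login() {
  # $1: image name, $2: AWS region. Needs valid AWS credentials to succeed.
  aws ecr get-login-password --region "$2" \
    | docker login --username AWS --password-stdin "$(registry_host "$1")"
}
```

A call such as `ecr_login "$IMAGE_NAME" us-east-2` would then authenticate the local Docker client against the same registry the workflow pushes to.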
The :staging-<sha> + :staging-latest tag policy is unchanged —
staging-CP's TENANT_IMAGE pin still points at :staging-latest, just
with the new registry prefix.
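The tag policy can be sketched as a small shell function. This is an illustration under stated assumptions, not code from the workflow: the name `staging_tags` is hypothetical, and `printf '%.7s'` stands in for the `${GITHUB_SHA::7}` substring used by the Compute tags step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two tags the workflow pushes for one image.
staging_tags() {
  # $1: image name (registry prefix + repository), $2: full commit SHA.
  local short
  short=$(printf '%.7s' "$2")   # first 7 characters of the SHA
  printf '%s:staging-%s\n' "$1" "$short"
  printf '%s:staging-latest\n' "$1"
}
```

Only the first argument changes with the registry move; the `:staging-<sha>` and `:staging-latest` suffixes stay the same, which is why the staging-CP pin needs nothing more than the new prefix.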
Refs molecule-core#157, #161; parallel to org-wide CI-green sweep.
184 lines
8.6 KiB
YAML
name: publish-workspace-server-image

# Builds and pushes Docker images to ECR on staging or main pushes.
# EC2 tenant instances pull the tenant image from ECR.
#
# Branch / tag policy (see Compute tags step for the per-branch logic):
#
#   staging push → builds image, tags :staging-<sha> + :staging-latest.
#                  staging-CP pins TENANT_IMAGE=:staging-latest, so it
#                  picks up staging-branch code automatically. This is
#                  what makes staging-CP actually test staging-branch
#                  code instead of "yesterday's main". Pre-fix, this
#                  workflow only ran on main, so staging tenants
#                  silently served stale code (#2308 fix RFC #2312
#                  landed on staging but never reached tenants because
#                  staging→main was wedged on path-filter parity bugs).
#
#   main push    → builds image, tags :staging-<sha> + :staging-latest
#                  (same as before). canary-verify.yml retags
#                  :staging-<sha> → :latest after canary tenants
#                  green-light the digest. The :staging-latest retag
#                  on main push is intentional: when main lands AFTER a
#                  staging push, staging-CP gets the post-promote code
#                  (which equals what it had + any merge resolution),
#                  so the canary-on-staging-CP step still runs against
#                  the prod-bound digest.
#
# In the steady state both branches refresh :staging-latest; the
# semantic is "most recent staging-or-main build of tenant code."
# Drift between the two is bounded by the staging→main auto-promote
# cadence and is corrected on the next staging push.

on:
  push:
    branches: [staging, main]
    paths:
      - 'workspace-server/**'
      - 'canvas/**'
      - 'manifest.json'
      - '.github/workflows/publish-workspace-server-image.yml'
  workflow_dispatch:

# Serialize per-branch so two rapid staging pushes don't race the same
# :staging-latest tag retag. Allow staging and main to run in parallel
# (different github.ref → different concurrency group) since they
# produce different :staging-<sha> tags and last-write-wins on
# :staging-latest is acceptable across branches (the post-promote
# main code equals current staging code in a healthy flow).
#
# cancel-in-progress: false → in-flight builds finish; the next push's
# build queues. This avoids a partially-pushed image and keeps the
# canary fleet pin (:staging-<sha>) consistent with what was actually
# tested at canary-verify time.
concurrency:
  group: publish-workspace-server-image-${{ github.ref }}
  cancel-in-progress: false

permissions:
  contents: read
  packages: write

env:
  IMAGE_NAME: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform
  TENANT_IMAGE_NAME: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      # github-app-auth sibling-checkout removed 2026-05-07 (#157):
      # plugin was dropped + workspace-server/Dockerfile no longer
      # COPYs it.

      - name: Configure AWS credentials for ECR
        # GHCR was the pre-suspension target; the molecule-ai org on
        # GitHub got swept 2026-05-06 and ghcr.io/molecule-ai/* is no
        # longer reachable. Post-suspension target is the operator's
        # ECR org
        # (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*),
        # which already hosts platform-tenant +
        # workspace-template-* + runner-base images. AWS creds come
        # from the AWS_ACCESS_KEY_ID/SECRET secrets bound to the
        # molecule-cp IAM user. Closes #161.
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-2

      - name: Log in to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0

      - name: Compute tags
        id: tags
        run: |
          echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

          # Canary-gated release flow:
          # - This step always publishes :staging-<sha> + :staging-latest.
          # - On staging push, staging-CP picks up :staging-latest immediately
          #   (its TENANT_IMAGE pin is :staging-latest), so staging-branch
          #   code reaches staging tenants without waiting for main.
          # - On main push, canary-verify.yml runs smoke tests against
          #   canary tenants (which pin :staging-<sha>), and on green retags
          #   :staging-<sha> → :latest. Prod tenants pull :latest.
          # - On red, :latest stays on the prior good digest, so prod is safe.
          #
          # Why :staging-latest is retagged on main push too: when main lands
          # after a staging promote, staging-CP gets the post-promote code so
          # the canary-on-staging-CP step still runs against the prod-bound
          # digest. In a healthy flow the post-promote main code == the
          # current staging code, so this is effectively a no-op except for
          # the canary fleet pin handoff.
          #
          # Pre-fix history: this workflow used to only trigger on main. That
          # meant staging-CP served "yesterday's main" indefinitely whenever
          # staging→main was wedged. The 2026-04-30 dogfooding session
          # surfaced this when RFC #2312 (chat upload HTTP-forward) landed on
          # staging but staging tenants kept failing chat upload because they
          # were running pre-RFC code. Adding the staging trigger above closes
          # that gap. Earlier 2026-04-24 incident: a static :staging-<sha> pin
          # drifted 10 days behind staging, the same class of bug with a
          # different mechanism.
      - name: Build & push platform image to ECR (staging-<sha> + staging-latest)
        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
        with:
          context: .
          file: ./workspace-server/Dockerfile
          platforms: linux/amd64
          push: true
          tags: |
            ${{ env.IMAGE_NAME }}:staging-${{ steps.tags.outputs.sha }}
            ${{ env.IMAGE_NAME }}:staging-latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
          # GIT_SHA bakes into the Go binary via -ldflags so /buildinfo
          # returns it at runtime (see Dockerfile + buildinfo/buildinfo.go).
          # This is the same value as the OCI revision label below; passing
          # it twice is intentional: the OCI label is for registry tooling
          # while /buildinfo is for the redeploy verification step.
          build-args: |
            GIT_SHA=${{ github.sha }}
          labels: |
            org.opencontainers.image.source=https://github.com/${{ github.repository }}
            org.opencontainers.image.revision=${{ github.sha }}
            org.opencontainers.image.description=Molecule AI platform (Go API server), pending canary verify

      - name: Build & push tenant image to ECR (staging-<sha> + staging-latest)
        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
        with:
          context: .
          file: ./workspace-server/Dockerfile.tenant
          platforms: linux/amd64
          push: true
          tags: |
            ${{ env.TENANT_IMAGE_NAME }}:staging-${{ steps.tags.outputs.sha }}
            ${{ env.TENANT_IMAGE_NAME }}:staging-latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
          # Canvas uses same-origin fetches. The tenant Go platform
          # reverse-proxies /cp/* to the SaaS CP via its CP_UPSTREAM_URL
          # env; the tenant's /canvas/viewport, /approvals/pending,
          # /org/templates etc. live on the tenant platform itself.
          # Both legs share one origin (the tenant subdomain) so
          # PLATFORM_URL="" forces canvas to fetch paths as relative,
          # which land same-origin.
          #
          # Self-hosted / private-label deployments override this at
          # build time with a specific backend (e.g. local dev:
          # NEXT_PUBLIC_PLATFORM_URL=http://localhost:8080).
          build-args: |
            NEXT_PUBLIC_PLATFORM_URL=
            GIT_SHA=${{ github.sha }}
          labels: |
            org.opencontainers.image.source=https://github.com/${{ github.repository }}
            org.opencontainers.image.revision=${{ github.sha }}
            org.opencontainers.image.description=Molecule AI tenant platform + canvas, pending canary verify