chore: reconcile main → staging post-suspension divergence (Task #165 followup) #48
Summary
Reconcile main and staging after they diverged following the
2026-05-06 GitHub-org suspension. Class D (#26) and Class G (#40,
#42, #44) landed on staging while ECR/buildx fixes (#34-47) landed
straight on main, so every push to main triggered an Auto-sync run
that failed at `git merge --no-ff origin/main` with 7 content conflicts.
This is a one-time human-authored reconciliation merge: staging's
post-suspension Gitea/ECR migrations win on URL/policy edits, main's
additive work (mock-bigorg manifest entry, inline ECR auth,
MOLECULE_GITEA_TOKEN basic-auth path in clone-manifest.sh) is
preserved on top.
After this lands, staging is a strict superset of main, and the
next auto-sync on a push to main is a clean fast-forward / no-op.
The auto-sync workflow on main also picks up staging's
AUTO_SYNC_TOKEN swap (Class D #26) for free, fixing the latent
layer-2 push-auth issue.
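One way to confirm the superset claim after the merge is to ask git whether origin/main is an ancestor of origin/staging. The sketch below assumes a local clone with both remote refs already fetched; the function name and the injectable `run` parameter are illustrative and not part of this PR.

```python
import subprocess


def staging_contains_main(ancestor="origin/main", descendant="origin/staging",
                          run=subprocess.run):
    """Return True if every commit on `ancestor` is reachable from `descendant`.

    `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B
    and 1 when it is not; any other exit code signals an error.
    """
    result = run(["git", "merge-base", "--is-ancestor", ancestor, descendant])
    if result.returncode not in (0, 1):
        raise RuntimeError(
            f"git merge-base failed with exit code {result.returncode}"
        )
    return result.returncode == 0
```

If this returns True, a later merge of origin/main into staging has nothing new to bring in, which is the no-op case described above.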
Why Class D (Task #165) didn't fix it
Class D registered AUTO_SYNC_TOKEN + updated workflow YAML on
staging only. The auto-sync workflow runs from main when push
to main fires, and main's copy still references
`secrets.GITHUB_TOKEN`. But the failure is even higher up the stack:
`git merge --no-ff` aborts on real content conflicts before any token is needed. Class D
was the right token-plumbing fix for a different layer.
Failing run (pre-fix)
https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/1024
fatal: Not possible to fast-forward, aborting.
Automatic merge failed; fix conflicts and then commit the result.
7 conflicts: canary-verify.yml, ci.yml, publish-runtime.yml,
publish-workspace-server-image.yml, retarget-main-to-staging.yml,
manifest.json, scripts/clone-manifest.sh.
Test plan
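The manifest check in this test plan could also be run as a small script. This is a minimal sketch: only `org_templates` is named in this PR, so the `plugins` and `templates` key names, and the assumption that each is a top-level JSON array, are guesses about the manifest layout.

```python
import json


def manifest_counts(path="manifest.json"):
    """Parse the manifest and return the number of entries per section."""
    with open(path, encoding="utf-8") as fh:
        manifest = json.load(fh)  # raises json.JSONDecodeError on invalid JSON
    # Key names are assumed; only `org_templates` appears in the PR text.
    return {
        key: len(manifest.get(key, []))
        for key in ("plugins", "templates", "org_templates")
    }
```

Run from the repository root, and if the key names match, the result would be expected to show 21 plugins, 9 templates and 7 org_templates.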
- `bash -n scripts/clone-manifest.sh`
- `python3 -c 'yaml.safe_load(...)'` on every touched workflow
- `python3 -c 'json.load(open("manifest.json"))'`: 21 plugins, 9 templates, 7 org_templates
Hostile self-review
Could the publish-workspace-server-image resolution silently break
ECR push? I dropped the action-based AWS configure / ECR login /
buildx setup steps from staging in favor of main's "moved inline"
comment. The actual
`aws ecr get-login-password | docker login` + `docker build` + `docker push` lines BELOW the conflict block were already auto-merged from both sides into the same content, so the
build steps still work. But staging-CP currently uses the action-based
path and was working — so something about main's inline approach
was either equivalent or strictly better. Worth a smoke-test on the
first publish-workspace-server-image run after this lands.
Did I miss a conflict by accepting either side wholesale?
Specifically:
`publish-runtime.yml` cascade: staging has the rich Gitea push retry loop, main has the legacy GitHub repo-dispatch
curl. I kept staging entirely. If main had a divergent fix INSIDE
the staging block (e.g. a SKIPPED-list regression), it's gone. I
did NOT see such a fix in
`git log origin/staging..origin/main -- .github/workflows/publish-runtime.yml`, but the audit was line-by-line on the conflict region only.
Manifest org_templates list — does mock-bigorg actually exist on
Gitea? I added it to manifest.json with a lowercased slug, on the
assumption that the org template repo was migrated alongside the
others. If the repo isn't there yet, the next Pre-clone manifest deps
step in publish-workspace-server-image will fail with HTTP 404. The
risk is the next image build, not the merge itself; if missing,
open a follow-up to either provision the Gitea mirror or remove the
line.
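The mock-bigorg question could be answered before the next image build by asking the Gitea API whether the repo exists. The sketch below is an assumption-laden illustration: the host comes from the failing-run URL, the function name and `opener` parameter are hypothetical, and a private repo queried without a token would also return 404.

```python
import urllib.error
import urllib.request

GITEA_BASE = "https://git.moleculesai.app"  # host taken from the failing-run URL


def gitea_repo_exists(owner, repo, token=None, opener=urllib.request.urlopen):
    """Return True if the Gitea API reports the repo, False on HTTP 404."""
    request = urllib.request.Request(f"{GITEA_BASE}/api/v1/repos/{owner}/{repo}")
    if token:
        # Gitea accepts personal access tokens in this header form.
        request.add_header("Authorization", f"token {token}")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # auth failures, server errors, etc. should stay visible
```

For example, `gitea_repo_exists("molecule-ai", "molecule-ai-org-template-mock-bigorg")` would check the lowercased slug, assuming the org name from the run URL is also where the template was migrated.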
🤖 Generated with Claude Code
Two coupled cleanups for the post-2026-05-06 stack:

============================================

The plugin injected GITHUB_TOKEN/GH_TOKEN via the App's installation-access flow (~hourly rotation). Per-agent Gitea identities replaced this approach after the 2026-05-06 suspension: workspaces now provision with a per-persona Gitea PAT from .env instead of an App-rotated token.

The plugin code itself lived on github.com/Molecule-AI/molecule-ai-plugin-github-app-auth, which is also unreachable post-suspension; checking it out at CI build time was already failing.

Removed:
- workspace-server/cmd/server/main.go: githubappauth import + the `if os.Getenv("GITHUB_APP_ID") != ""` block that called BuildRegistry. gh-identity remains as the active mutator.
- workspace-server/Dockerfile + Dockerfile.tenant: COPY of the sibling repo + the `replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin` directive injection.
- workspace-server/go.mod + go.sum: github-app-auth dep entry (cleaned up by `go mod tidy`).
- 3 workflows: actions/checkout steps for the sibling plugin repo:
  - .github/workflows/codeql.yml (Go matrix path)
  - .github/workflows/harness-replays.yml
  - .github/workflows/publish-workspace-server-image.yml

Verified `go build ./cmd/server` + `go vet ./...` pass post-removal.

=======================================================

Same workflow used to push to ghcr.io/molecule-ai/platform + platform-tenant. ghcr.io/molecule-ai is gone post-suspension. The operator's ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/) already hosts platform-tenant + workspace-template-* + runner-base images and is the post-suspension SSOT for container images. This PR aligns publish-workspace-server-image with that stack.

- env.IMAGE_NAME + env.TENANT_IMAGE_NAME repointed to ECR URL.
- docker/login-action swapped for aws-actions/configure-aws-credentials@v4 + aws-actions/amazon-ecr-login@v2 chain (the standard ECR auth pattern; uses AWS_ACCESS_KEY_ID/SECRET secrets bound to the molecule-cp IAM user).

The :staging-<sha> + :staging-latest tag policy is unchanged: staging-CP's TENANT_IMAGE pin still points at :staging-latest, just with the new registry prefix.

Refs molecule-core#157, #161; parallel to org-wide CI-green sweep.

Adds a 'mock' runtime: virtual workspaces with no container, no EC2, no LLM. Every A2A reply is synthesised from a small canned-variant pool ('On it!', 'Got it, on it now.', etc.) deterministically seeded by (workspace_id, request_id).

Built for funding-demo "200-workspace mock org": renders an enterprise-scale org chart on the canvas (CEO/VPs/Managers/ICs) without burning real LLM credits or provisioning 200 EC2 instances.

Surfaces:
- workspace-server/internal/handlers/mock_runtime.go: A2A proxy short-circuit, canned-reply pool, deterministic variant pick.
- workspace-server/internal/handlers/a2a_proxy.go: gate the short-circuit before resolveAgentURL (mock has no URL).
- workspace-server/internal/handlers/org_import.go: skip Docker provisioning for mock workspaces, set status='online' directly, drop the per-sibling 2s pacing for mock children (collapses a 200-workspace import from ~7min → ~1s).
- workspace-server/internal/handlers/runtime_registry.go: register 'mock' in the runtime allowlist (manifest + fallback set).
- workspace-server/internal/registry/healthsweep.go + orphan_sweeper.go: skip mock workspaces in container-health and stale-token sweeps (no container by design).
- workspace-server/internal/handlers/workspace_restart.go: mirror the 'external' Restart no-op for mock.
- manifest.json: register the new Molecule-AI/molecule-ai-org-template-mock-bigorg repo.

Tests: 5 new in mock_runtime_test.go covering happy-path, non-mock regression guard, determinism, IsMockRuntime trim/case, JSON-RPC id echo.
All existing handler + registry tests still pass.

Local-verified: imported the 200-workspace template against a fresh postgres+redis, confirmed all 200 land in 'online' and stay there through the 30s health-sweep window, exercised A2A on CEO + VPs + Managers + ICs and saw the variant pool rotate.

Org template lives at Molecule-AI/molecule-ai-org-template-mock-bigorg (created today) and is imported via the existing /org/import flow on the canvas Template Palette.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reviewed: conflict resolutions are minimal + line-aligned with the diverged intents. URL/policy edits favor the post-suspension Gitea+ECR path; main-side additive changes (mock-bigorg, MOLECULE_GITEA_TOKEN auth, inline ECR auth) preserved on top. AUTO_SYNC_TOKEN scope unchanged. YAML+JSON+shell parsers all green locally. Approving the one-time reconciliation; the underlying auto-sync workflow path is unchanged so this is bounded-blast.
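For readers unfamiliar with the mock runtime, the deterministic variant pick described in the mock-runtime commit message above can be illustrated roughly as follows. This is not the Go code in mock_runtime.go: the hash choice, the function name and the two-entry pool (only the variants quoted in the commit message) are assumptions made for the sketch.

```python
import hashlib

# Only these two variants are quoted in the commit message; the "etc."
# there implies the real pool has more entries.
CANNED_REPLIES = ["On it!", "Got it, on it now."]


def pick_variant(workspace_id, request_id, pool=CANNED_REPLIES):
    """Pick a reply deterministically from (workspace_id, request_id)."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    seed = f"{workspace_id}\x00{request_id}".encode("utf-8")
    digest = hashlib.sha256(seed).digest()
    # Map the first 8 bytes of the digest onto an index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]
```

The same (workspace_id, request_id) pair always yields the same reply, while different request IDs spread across the pool, which matches the "variant pool rotates" observation in the local verification.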