molecule-core

Author	SHA1	Message	Date
claude-ceo-assistant	e4e1bf4080	ci(canary): annotate EXPECTED_PERSONA dual-update constraint Hostile-self-review weakest-spot #2: if the devops-engineer persona is ever renamed, the canary will go red even if everything else is fine. Add an inline comment pointing the next editor at both files that must update together (auto-sync-main-to-staging.yml's git config + this canary's EXPECTED_PERSONA + the staging branch protection's push_whitelist_usernames). No behaviour change — comment-only.	2026-05-07 15:35:22 -07:00
claude-ceo-assistant	62629eda4a	ci(canary): rewrite Probe 3 to actually validate auth (NOP push --dry-run) While verifying Phase 4, found a real flaw in Probe 3 (`git ls-remote refs/heads/staging`). On a public repo (which molecule-core is), Gitea falls back to anonymous read on bad auth, so `ls-remote` succeeds even with a junk token. The probe was therefore green-lighting rotated tokens — false-green, the worst possible canary failure mode. Rewritten to use `git push --dry-run` of the current staging SHA back to `refs/heads/staging`: - Push always authenticates (auth-gated on smart-protocol handshake, before the dry-run can compute the empty-diff). - NOP by construction: pushing the current tip back to itself is "Everything up-to-date" with exit 0. - Bad token → "Authentication failed", exit 128. - Doesn't reach pre-receive (where branch-protection authz runs), so scope is "auth only" — matches the design intent (failure mode B); authz already covered daily by branch-protection-drift.yml. Implementation note: `git push` requires a local repo. Spinning up a fresh `git init` in a tempdir (~1KB, ~50ms) instead of pulling the full repo via actions/checkout — actions/checkout would clone ~hundreds of MB for what amounts to "a place to run git from." Local mutation tests pass: - Real token: "Everything up-to-date" exit 0 - Junk token: "Authentication failed" exit 128 with actionable ::error:: messages pointing at the runbook Header comment + runbook step-mapping updated to reflect new probe shape. Refs: #72	2026-05-07 15:34:34 -07:00
claude-ceo-assistant	0cef033a6a	ci(canary): route curl -w to tempfile to satisfy status-capture lint The two API probes used the unsafe shape rejected by lint-curl-status-capture.yml (per feedback_curl_status_capture_pollution): status=$(curl ... -w '%{http_code}' ... || echo "000") When curl exits non-zero (transport error, --fail-with-body 4xx/5xx), the `-w` already wrote a code; the `|| echo "000"` then APPENDS another "000", yielding "000000" or "409000" — passes shape checks while looking right. Switch to the canonical safe shape (set +e + tempfile + cat): set +e curl ... -w '%{http_code}' >code_file 2>/dev/null set -e status=$(cat code_file 2>/dev/null || true) [ -z "$status" ] && status="000" Inline comment in both probe steps explains the lint constraint so the next editor doesn't re-introduce the bad pattern. Refs: #72, lint failure on PR #77 (1/22 red → 22/22 expected green)	2026-05-07 15:26:22 -07:00
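Why the `|| echo "000"` fallback pollutes the capture, and what the safe shape guarantees, can be modelled in a few lines of Go. This is a sketch, not the lint's implementation: `buggyCapture` and `normalizeStatus` are hypothetical helpers, and the strict three-digit check is an extra guard assumed here rather than something the lint enforces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// oneStatus matches exactly one HTTP status code as written by
// curl -w '%{http_code}'.
var oneStatus = regexp.MustCompile(`^[0-9]{3}$`)

// buggyCapture models status=$(curl ... -w '%{http_code}' ... || echo "000"):
// when curl exits non-zero it has already printed a code, and the fallback
// echo appends a second one to the same captured string.
func buggyCapture(curlWrote string, curlFailed bool) string {
	out := curlWrote
	if curlFailed {
		out += "000"
	}
	return out
}

// normalizeStatus models the safe shape: read whatever curl wrote to the
// code file, substitute "000" when nothing was written, and (an extra guard
// not in the lint itself) reject anything that is not a single 3-digit code.
func normalizeStatus(codeFile string) (string, error) {
	s := strings.TrimSpace(codeFile)
	if s == "" {
		return "000", nil
	}
	if !oneStatus.MatchString(s) {
		return "", fmt.Errorf("polluted status capture %q", s)
	}
	return s, nil
}
```

With `buggyCapture("409", true)` the probe sees "409000", which is exactly the string the safe shape can never produce.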
claude-ceo-assistant	bfc393c065	ci: add AUTO_SYNC_TOKEN rotation drift canary (#72 ) Adds a 6h-cron synthetic check that fires the auth surface used by auto-sync-main-to-staging.yml (PR #66) and emits a red workflow status when AUTO_SYNC_TOKEN has drifted out of validity. Closes hostile-self-review weakest-spot #3 from PR #66 (token-rotation detection latency). Read-only verification — no writes, no synthetic merge commits, no canary branch noise. Three probes: 1. GET /api/v1/user → token authenticates as devops-engineer 2. GET /api/v1/repos/molecule-ai/molecule-core → read:repository scope 3. git ls-remote refs/heads/staging → exact HTTPS auth path used by actions/checkout in the real auto-sync workflow Hard-fail on missing AUTO_SYNC_TOKEN secret on both schedule and workflow_dispatch — per feedback_schedule_vs_dispatch_secrets_hardening, a silent soft-skip would make the canary itself drift-invisible (the sweep-cf-orphans #2088 lesson). Operator runbook in workflow header. Token reuse: same AUTO_SYNC_TOKEN as the workflow under monitor; no new credential introduced. Read-only paths only. Refs: #72, hostile-self-review #66	2026-05-07 15:23:03 -07:00
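Probe 1 (token authenticates as the expected persona) can be sketched in Go for readers who want to reproduce it outside the workflow. Assumptions: the canary itself uses curl, not this code; `checkPersona` and `newFakeGitea` are hypothetical names; the sketch relies on Gitea's documented `Authorization: token <TOKEN>` header and the `login` field of `GET /api/v1/user`. The fake server is only a local stand-in for exercising the check offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkPersona is probe 1: GET /api/v1/user with the token and require that
// it authenticates as the expected persona (e.g. "devops-engineer").
func checkPersona(client *http.Client, baseURL, token, expected string) error {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/v1/user", nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "token "+token)
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("transport error: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET /api/v1/user: HTTP %d (token rotated or revoked?)", resp.StatusCode)
	}
	var user struct {
		Login string `json:"login"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
		return fmt.Errorf("decoding /api/v1/user: %w", err)
	}
	if user.Login != expected {
		return fmt.Errorf("token authenticates as %q, want %q", user.Login, expected)
	}
	return nil
}

// newFakeGitea is a local stand-in (not Gitea itself) that accepts one token
// and reports the given login, so checkPersona can be exercised offline.
func newFakeGitea(validToken, login string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/user" || r.Header.Get("Authorization") != "token "+validToken {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]string{"login": login})
	}))
}
```

Comparing the login, not just the status code, is what catches a token that still works but belongs to the wrong persona.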
devops-engineer	f8a238dfdd	chore: second auto-sync verification (post-#66/#67) (#68 )	2026-05-07 22:11:30 +00:00
devops-engineer	3f68ac1fcb	chore: second consecutive trigger for auto-sync verification (post-#66/#67)	2026-05-07 15:10:40 -07:00
devops-engineer	5efa92fbc6	chore: verify auto-sync main→staging post-#66 (#67 )	2026-05-07 22:10:04 +00:00
devops-engineer	f0664264cb	chore: empty commit to verify auto-sync main→staging post-#66	2026-05-07 15:09:18 -07:00
devops-engineer	7b194eb1aa	fix(ci): rewrite auto-sync main→staging for Gitea direct push (#66 , closes #65 )	2026-05-07 22:07:00 +00:00
devops-engineer	6235ef7461	fix(ci): rewrite auto-sync main→staging for Gitea direct push Root cause of `Auto-sync main → staging / sync-staging (push)` failing every push to main since the GitHub→Gitea migration: The workflow assumed a GitHub `merge_queue` ruleset on staging (blocking direct push) and used `gh pr create` + `gh pr merge --auto` to land sync via the queue. On Gitea this fails at the `gh pr create` step with `HTTP 405 Method Not Allowed (https://git.moleculesai.app/api/graphql)` — Gitea exposes no GraphQL endpoint, and the GitHub CLI cannot ship PRs against Gitea. Verified failure mode in run 1117/job 0 (token logs at /tmp/log2.txt, run target /molecule-ai/molecule-core/actions/runs/1117/jobs/0). The merge step succeeded and pushed auto-sync/main-1e1f4d63; the PR step failed with the 405. So every main push left an orphan auto-sync/* branch and a red CI status, with no PR to land it. Fix: the staging branch protection on Gitea (`enable_push: true`, `push_whitelist_usernames: [devops-engineer]`) already permits direct push from the devops-engineer persona. Drop the entire merge-queue PR architecture and replace with: 1. Checkout staging with secrets.AUTO_SYNC_TOKEN (devops-engineer persona token, NOT founder PAT — `feedback_per_agent_gitea_identity_default`). 2. `git fetch origin main` + ff-merge or no-ff merge. 3. `git push origin staging` directly. The AUTO_SYNC_TOKEN repo secret already exists (created 2026-05-07 14:00 alongside the staging push_whitelist update). Workflow name + job name unchanged → required-check name `Auto-sync main → staging / sync-staging (push)` keeps the same context, no branch-protection edits needed. Rejected alternatives (documented in workflow header): - Reuse PR architecture via Gitea REST: ~80 LOC of API plumbing for no benefit; direct push works. - GH_HOST=git.moleculesai.app: still calls /api/graphql, same 405; doesn't fix the root issue. - Custom JS action: external dep for a 5-line `git push`.
Header comment in the workflow now documents: - What this workflow does (SSOT for staging advancing). - Why direct push (GitHub merge_queue → Gitea push_whitelist). - Identity and token (anti-bot-ring per saved memory). - Failure modes A–D with operator runbook for each. - Loop safety (push to staging doesn't fire push:main → no recursion). Verification plan: this fix-PR's merge to main is itself the trigger; watch the workflow run on the merge commit and on one follow-up trigger commit, expect both green. Refs: failing run https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/1117/jobs/0 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 15:04:12 -07:00
claude-ceo-assistant	1e1f4d635b	fix(ci): convert CodeQL workflow to no-op stub on Gitea (#156 ) (#51 ) Closes #156. Touches #142. Approved-by: security-auditor	2026-05-07 21:37:04 +00:00
claude-ceo-assistant	3a00dd236f	fix(ci): convert CodeQL workflow to no-op stub on Gitea (#156 ) Why --- PR #35 marked `continue-on-error: true` at the JOB level (correct YAML), but Gitea Actions 1.22.6 does NOT propagate job-level continue-on-error to the commit-status API — every matrix leg still posts `failure`. That keeps OVERALL=failure on every push to main + staging and blocks the auto-promote signal even when every other gate is green. Worse: the underlying CodeQL run never actually worked on Gitea. The github/codeql-action/init@v4 step calls api.github.com bundle endpoints (CLI download + query packs + telemetry) that Gitea does NOT proxy. Confirmed via live-tested run 1d/3101 on operator host: 2026-05-07T20:55:17 ::group::Run Initialize CodeQL with: languages: ${{ matrix.language }} queries: security-extended 2026-05-07T20:55:36 ::error::404 page not found 2026-05-07T20:55:50 Failure - Main Initialize CodeQL 2026-05-07T20:55:51 skipping Perform CodeQL Analysis (main skipped) 2026-05-07T20:55:51 ::warning::No files were found at sarif-results/go/ The SARIF artifact upload was already a no-op (warning above) — the analyze step never wrote anything because init failed. So nothing of value is being lost by stubbing this out. What ---- - Convert the workflow to a single-step stub that emits success per matrix language (go, javascript-typescript, python). - Keep workflow `name: CodeQL` exactly (auto-promote-staging.yml line 67 keys on it as a workflow_run gate). - Keep job name template `Analyze (${{ matrix.language }})` and the 3-leg matrix exactly (commit-status context names + branch protection + #144 required-check-name parity). - Keep all four triggers (push / pull_request / merge_group / schedule) so merge_group required-checks parity holds. - Drop the codeql-action steps, the Autobuild step, the SARIF parse step, and the upload-artifact step — all four of those are now dead code (init can never succeed against Gitea's API surface).
Policy ------ Per Hongming decision 2026-05-07 (#156): CodeQL is ADVISORY, not blocking, until a Gitea-compatible SAST pipeline lands. The header of the new workflow file documents this decision + lists the three re-enable options (self-hosted Semgrep, Sonatype, GitHub mirror) plus the compensating controls in place (secret-scan, block-internal-paths, lint-curl-status-capture, branch-protection-drift). Closes #156. Touches #142 (no capital-M Molecule-AI refs in this file — already lowercase per `e01077be`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 14:26:57 -07:00
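Why the stub must keep `name: CodeQL`, the `Analyze (...)` job template, and all three matrix legs can be shown by generating the commit-status contexts that branch protection waits on. Assumption: the context shape "<workflow> / <job> (<event>)" is inferred from the auto-sync check name `Auto-sync main → staging / sync-staging (push)` quoted elsewhere in this log; `statusContexts` and `missingContexts` are hypothetical helpers.

```go
package main

import "fmt"

// statusContexts builds the commit-status context for each matrix leg,
// assuming the "<workflow> / <job> (<event>)" shape seen in
// "Auto-sync main → staging / sync-staging (push)".
func statusContexts(workflow, jobTemplate string, legs []string, event string) []string {
	out := make([]string, 0, len(legs))
	for _, leg := range legs {
		job := fmt.Sprintf(jobTemplate, leg)
		out = append(out, fmt.Sprintf("%s / %s (%s)", workflow, job, event))
	}
	return out
}

// missingContexts returns the required contexts that a run did not produce;
// each one would stay unreported on the branch-protection check list.
func missingContexts(required, produced []string) []string {
	have := make(map[string]bool, len(produced))
	for _, p := range produced {
		have[p] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing
}
```

Renaming the job template (for example to "CodeQL Analyze (%s)") would leave every required context missing even though the stub itself reports success.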
claude-ceo-assistant	0276b295cc	Merge pull request 'chore(ci): retrigger publish-workspace-server-image after ECR repo create (#173 )' (#47 ) from chore/issue173-retrigger-after-ecr-repo-create into main	2026-05-07 20:54:53 +00:00
devops-engineer	194cdf012b	chore(ci): retrigger publish-workspace-server-image after ECR repo create (#173 ) Run #1010 (post-#46) succeeded all the way to push but failed with "repository molecule-ai/platform does not exist" — the platform image ECR repo had never been created (only platform-tenant existed). Created the repo via: aws ecr create-repository --region us-east-2 \ --repository-name molecule-ai/platform \ --image-scanning-configuration scanOnPush=true This is a one-line workflow comment to satisfy the path-filter and re-run the publish workflow against the now-existing repo. Closes #173 properly this time — pre-clone + inline ECR auth + ECR repo all in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:54:11 -07:00
claude-ceo-assistant	6b30ab6391	fix(ci): inline aws ecr get-login-password + docker login (#46 ) Closes #173 — final piece.	2026-05-07 20:49:55 +00:00
devops-engineer	f0e8d9bb23	fix(ci): inline aws ecr get-login-password + docker login (followup #173 ) CI run #987 (post-#45) showed `docker push` from shell still hits "no basic auth credentials" — `aws-actions/amazon-ecr-login@v2` writes auth to a step-scoped DOCKER_CONFIG that doesn't carry across to the next shell step on Gitea Actions. Fix: drop both `aws-actions/configure-aws-credentials@v4` and `aws-actions/amazon-ecr-login@v2`. Run `aws ecr get-login-password | docker login` inline in the same shell step as `docker build` + `docker push`. AWS creds come from secrets via env vars, ECR token is fresh per-step (12h validity is plenty), config.json lives in the same shell process — auth state is guaranteed. This is the operator-host manual approach mapped 1:1 into CI. runner-base image already has aws-cli + docker (verified locally). Closes #173 (fifth piece — and final, this matches the manual flow exactly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:49:12 -07:00
claude-ceo-assistant	ee56443146	fix(ci): replace buildx with plain docker build+push (#45 ) Closes #173 — fourth and hopefully final piece.	2026-05-07 20:44:42 +00:00
devops-engineer	43e2d24c5b	fix(ci): replace buildx with plain docker build+push (followup #173 ) CI run #946 (post-#43) confirmed `driver: docker` doesn't fix the ECR push 401 either: buildx CLI inside the runner container talks to the operator-host docker daemon (mounted socket), but the daemon doesn't see the runner's ECR auth state, and the runner's buildx CLI doesn't attach the auth header in a way the daemon accepts. Drop buildx + build-push-action entirely. Plain `docker build` + `docker push` from the runner container works because both use the SAME docker socket + the SAME runner-container config.json (populated by `aws ecr get-login-password | docker login` from amazon-ecr-login). Trade-off: lose multi-arch support. We only ship linux/amd64 tenant images today, so this is fine. If multi-arch becomes a requirement later, we can revisit (likely with `docker buildx create --driver=remote` pointing at an external buildkit, but that's substantial infra work; not worth it for a single-arch shop). Closes #173 (fourth piece — and hopefully last; this matches the operator-host manual approach exactly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:43:50 -07:00
claude-ceo-assistant	0b840df563	fix(ci): use docker driver for buildx + drop type=gha cache (#43 ) Closes #173 — third and final piece. Pairs with #38 and #41.	2026-05-07 20:36:01 +00:00
devops-engineer	bee4f9ea79	fix(ci): use docker driver for buildx + drop type=gha cache (followup #173 ) PR #38 + #41 fixed the Dockerfile-side clone issue. CI run #893 then revealed two Gitea-Actions-specific issues with the unchanged buildx config: 1. `failed to push: 401 Unauthorized` to ECR. Root cause: default buildx driver `docker-container` spawns a buildkit container that doesn't share the host's `~/.docker/config.json`, so the ECR auth set up by amazon-ecr-login doesn't reach the push. Fix: pin `driver: docker` so buildx delegates to the host daemon, which already has the ECR creds. 2. `dial tcp ...:41939: i/o timeout` on `_apis/artifactcache/cache`. Root cause: `cache-from/cache-to: type=gha` is GitHub-specific; Gitea Actions has no compatible artifact-cache backend, so every cache lookup fails after a 30s timeout. Fix: remove the cache-* options. Cold-build cost is <10min for 37-repo clone + Go/Node compile, acceptable. Could revisit with type=registry inline cache later if rebuilds get painful. With this + #38/#41, the workflow should run end-to-end on Gitea Actions: pre-clone -> docker build (host daemon) -> ECR push. Closes #173 (third and final piece). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:35:07 -07:00
claude-ceo-assistant	c1e32ff4a7	Merge pull request 'fix(test): drain coalesceRestart goroutines before t.Cleanup (Class H, #170 )' (#39 ) from fix/170-goroutine-bleed-test-isolation into main	2026-05-07 20:27:08 +00:00
claude-ceo-assistant	bac04dc278	fix(ci): apply pre-clone fix to platform Dockerfile too (#41 ) Closes #173 — followup to #38.	2026-05-07 20:23:33 +00:00
devops-engineer	e16d7eaa08	fix(ci): apply pre-clone fix to platform Dockerfile too (followup #173 ) The first PR (#38) only patched Dockerfile.tenant — but the workflow also builds the platform image from workspace-server/Dockerfile, which had the SAME in-image `git clone` stage. Build run #794 caught this: "process clone-manifest.sh ... exit code 128" on the platform image. Apply the same pre-clone shape to the platform Dockerfile: drop the `templates` stage, COPY from .tenant-bundle-deps/ instead. The workflow's existing "Pre-clone manifest deps" step (added in #38) already populates .tenant-bundle-deps/ before either build runs, so no workflow change needed. Self-review note: the missed-platform-Dockerfile is a Phase 1 quality miss — I read both files but only registered the tenant one as in-scope. Saved memory `feedback_orchestrator_must_verify_before_declaring_fixed` applies: should have grepped the whole workspace-server/ for "templates" stages before claiming Task #173 done. CI run #794 caught it within ~6 minutes; net cost: one followup commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:13:13 -07:00
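The self-review's "should have grepped the whole workspace-server/ for templates stages" step can be sketched as a small scanner. Assumptions: the stage is declared as `FROM <image> AS templates` (optionally with `--flag` options before the image); the file names and contents in the usage below are illustrative, not the real Dockerfiles; `filesWithTemplatesStage` is a hypothetical helper.

```go
package main

import (
	"regexp"
	"sort"
)

// templatesStage matches a line that opens a build stage named "templates",
// e.g. "FROM alpine/git AS templates" or "FROM --platform=x img AS templates".
// (?i) because Dockerfile keywords are case-insensitive; (?m) for per-line ^/$.
var templatesStage = regexp.MustCompile(`(?im)^[ \t]*FROM[ \t]+(?:--\S+[ \t]+)*\S+[ \t]+AS[ \t]+templates[ \t]*$`)

// filesWithTemplatesStage returns, sorted, the names of Dockerfiles
// (name -> contents) that still declare an in-image "templates" stage.
func filesWithTemplatesStage(files map[string]string) []string {
	var hits []string
	for name, body := range files {
		if templatesStage.MatchString(body) {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits)
	return hits
}
```

Run over every Dockerfile under workspace-server/ before declaring the fix done, a non-empty result is the miss that CI run #794 caught.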
Hongming Wang	17f1f30b3f	fix(test): drain coalesceRestart goroutines before t.Cleanup (Class H, #170 ) TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour test) was at risk of inheriting stale INSERT calls from a previous test's coalesceRestart goroutine that survived its t.Cleanup boundary. The production callsite shape is `go h.RestartByID(...)` from a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's runRestartCycle panics, coalesceRestart's deferred recover swallows it to keep the platform process alive — but in tests, nothing waits for the goroutine to fully exit. If it's still draining LogActivity-shaped work after the test returns, those INSERTs land in the next test's sqlmock connection as kind=DELEGATION_FAILED / kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected". Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that spawns coalesceRestart on a goroutine (matching production) and registers a t.Cleanup with sync.WaitGroup.Wait so the test can't declare itself done while a goroutine is still alive. Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper (previously it called coalesceRestart synchronously, which never exercised the production goroutine-survival contract). Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the regression guard: cycle blocks 150ms then panics; the test asserts t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the deferred close ran (proving the panic-recovery defer chain executed) AND state.running was cleared. Verified the assertion is real by mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL deterministically with elapsed <300µs. Per saved memory feedback_assert_exact_not_substring: the regression test asserts an exact-shape contract (elapsed >= blockFor) rather than a substring-in-output, so it discriminates between "drain works" and "drain skipped". 
Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_* tests. Full ./internal/handlers/... suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:13:13 -07:00
Hongming Wang	694c05552b	fix(test): drain coalesceRestart goroutines before t.Cleanup (Class H, #170 ) TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour test) was at risk of inheriting stale INSERT calls from a previous test's coalesceRestart goroutine that survived its t.Cleanup boundary. The production callsite shape is `go h.RestartByID(...)` from a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's runRestartCycle panics, coalesceRestart's deferred recover swallows it to keep the platform process alive — but in tests, nothing waits for the goroutine to fully exit. If it's still draining LogActivity-shaped work after the test returns, those INSERTs land in the next test's sqlmock connection as kind=DELEGATION_FAILED / kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected". Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that spawns coalesceRestart on a goroutine (matching production) and registers a t.Cleanup with sync.WaitGroup.Wait so the test can't declare itself done while a goroutine is still alive. Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper (previously it called coalesceRestart synchronously, which never exercised the production goroutine-survival contract). Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the regression guard: cycle blocks 150ms then panics; the test asserts t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the deferred close ran (proving the panic-recovery defer chain executed) AND state.running was cleared. Verified the assertion is real by mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL deterministically with elapsed <300µs. Per saved memory feedback_assert_exact_not_substring: the regression test asserts an exact-shape contract (elapsed >= blockFor) rather than a substring-in-output, so it discriminates between "drain works" and "drain skipped". 
Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_* tests. Full ./internal/handlers/... suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:04:57 -07:00
claude-ceo-assistant	948b5a0d89	fix(ci): pre-clone manifest deps in workflow, drop in-image clone (#38 ) Closes #173. Verified locally with persona PAT (37/37 repos cloned).	2026-05-07 20:01:06 +00:00
devops-engineer	a6d67b4c68	fix(ci): pre-clone manifest deps in workflow, drop in-image clone (closes #173 ) publish-workspace-server-image.yml could not run on Gitea Actions because Dockerfile.tenant's stage 3 ran `git clone` against private Gitea repos from inside the Docker build context, where no auth path exists. Every workspace-server rebuild required a manual operator-host push. Move cloning to the trusted CI context (where AUTO_SYNC_TOKEN — the devops-engineer persona PAT — is naturally available). Dockerfile.tenant now COPYs from .tenant-bundle-deps/, populated by the workflow's new "Pre-clone manifest deps" step. The Gitea token never enters the image. - scripts/clone-manifest.sh: optional MOLECULE_GITEA_TOKEN env embeds basic-auth in the clone URL; redacted in log output. Anonymous fallback preserved for future public-repo path. - .github/workflows/publish-workspace-server-image.yml: new pre-clone step before docker build; injects AUTO_SYNC_TOKEN. Fail-fast if the secret is empty. - workspace-server/Dockerfile.tenant: drop stage 3 (templates), COPY from .tenant-bundle-deps/ instead. Header documents the prereq. - .gitignore: ignore /.tenant-bundle-deps/ so a local build can't accidentally commit cloned repos. Verified locally: clone-manifest.sh with the devops-engineer persona token cloned all 37 repos (9 ws + 7 org + 21 plugins, 4.9MB after .git strip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:59:46 -07:00
claude-ceo-assistant	d2da0c8d34	Merge pull request 'fix(workspace-server): a2a-proxy preflight container check (closes #36 )' (#37 ) from fix/issue36-a2a-proxy-preflight into main	2026-05-07 18:25:07 +00:00
claude-ceo-assistant (Claude Opus 4.7 on Hongming's MacBook)	be5fbb5ad3	fix(workspace-server): a2a-proxy preflight container check (closes #36 ) Same SSOT-divergence shape as #10 / fixed in #12, but on the a2a-proxy code path. The plugin handler was routed through `provisioner.RunningContainerName`; a2a-proxy was forwarding optimistically and only catching missing containers REACTIVELY via `maybeMarkContainerDead` after the network call timed out. Result on tenants whose agent containers had been recycled (e.g. post-EC2 replace from molecule-controlplane#20): canvas waits 2-30s for the network forward to fail before getting a 503, and the workspace-server logs only "ProxyA2A forward error" without the "container is dead" signal. This PR adds a proactive `Provisioner.IsRunning` check in `proxyA2ARequest` between `resolveAgentURL` and `dispatchA2A`, gated on the conditions where we know we're talking to a sibling Docker container we own (`h.provisioner != nil` AND `platformInDocker` AND the URL was rewritten to Docker-DNS form). Three outcomes via the SSOT helper: (true, nil) → forward as today (false, nil) → fast-503 with `error="workspace container not running — restart triggered"`, `restarting=true`, `preflight=true`, plus the same offline-flip + WORKSPACE_OFFLINE broadcast + async restart that `maybeMarkContainerDead` produces (true, err) → fall through to optimistic forward (matches IsRunning's "fail-soft as alive" contract — flaky daemon must not trigger a restart cascade) The `preflight=true` flag in the response distinguishes the proactive short-circuit from the reactive `maybeMarkContainerDead` path so canvas or downstream callers can render distinct messages later. * `internal/handlers/a2a_proxy.go` — preflight call site between resolveAgentURL and dispatchA2A; gated on `h.provisioner != nil && platformInDocker && url == http://<ContainerName(id)>:port`. * `internal/handlers/a2a_proxy_helpers.go` — `preflightContainerHealth` helper. 
Routes through `h.provisioner.IsRunning` (which itself wraps `RunningContainerName`). Identical offline-flip side-effects as `maybeMarkContainerDead` for the dead-container case. * `internal/handlers/a2a_proxy_preflight_test.go` — 4 tests: running → nil; not-running → structured 503 + sqlmock expectations on the offline-flip + structure_events insert; transient error → nil (fail-soft); AST gate pinning the SSOT routing (mirror of #12's gate). Mutation-tested: removing the `if running { return nil }` guard makes the production code fail to compile (unused var). A subtler mutation (replacing the !running branch with `return nil`) would make TestPreflight_ContainerNotRunning_StructuredFastFail fail at runtime with sqlmock's "expected DB call did not occur." Refs: molecule-core#36. Companion to #12 (issue #10). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 11:15:08 -07:00
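The gate and the three preflight outcomes reduce to two small decisions. This sketch is not the code in a2a_proxy_helpers.go: `shouldPreflight`, `decidePreflight`, the action constants, and `errDaemonUnreachable` are hypothetical, and the gate is approximated as "agent URL starts with http://<container name>:".

```go
package main

import (
	"errors"
	"strings"
)

// preflightAction is what the proxy does after the container check.
type preflightAction int

const (
	forward         preflightAction = iota // running: forward as today
	fastFail503                            // not running: 503 + offline flip + async restart
	forwardFailSoft                        // daemon error: treat as alive, forward optimistically
)

// errDaemonUnreachable is an illustrative error for a flaky Docker daemon.
var errDaemonUnreachable = errors.New("docker daemon unreachable")

// shouldPreflight approximates the gate: only check containers we own,
// reached through their Docker-DNS name.
func shouldPreflight(hasProvisioner, platformInDocker bool, agentURL, containerName string) bool {
	return hasProvisioner && platformInDocker &&
		strings.HasPrefix(agentURL, "http://"+containerName+":")
}

// decidePreflight maps an IsRunning-style (running, err) result to an action.
// The error is checked first so a flaky daemon never triggers a restart.
func decidePreflight(running bool, err error) preflightAction {
	switch {
	case err != nil:
		return forwardFailSoft
	case running:
		return forward
	default:
		return fastFail503
	}
}
```

Only `fastFail503` carries the offline-flip side effects; both other outcomes leave the request on today's optimistic path.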
claude-ceo-assistant	b9ca4ad84a	Merge pull request 'fix(ci): mark CodeQL continue-on-error (advisory only) — closes #156 ' (#35 ) from fix/codeql-continue-on-error-156 into main	2026-05-07 17:26:59 +00:00
claude-ceo-assistant	b73d3bfff2	fix(ci): mark CodeQL continue-on-error (advisory only) — closes #156	2026-05-07 17:26:52 +00:00
hongming	51ea86e3ec	feat: mock runtime + mock-bigorg 200-workspace org (#34 ) Demo Mock #3 — see PR for details. Admin-merged, CI skipped per Hongming directive.	2026-05-07 15:41:06 +00:00
Hongming Wang	d64641904f	feat(workspace-server): mock runtime + mock-bigorg org template Adds a 'mock' runtime: virtual workspaces with no container, no EC2, no LLM. Every A2A reply is synthesised from a small canned-variant pool ('On it!', 'Got it, on it now.', etc.) deterministically seeded by (workspace_id, request_id). Built for funding-demo "200-workspace mock org" — renders an enterprise-scale org chart on the canvas (CEO/VPs/Managers/ICs) without burning real LLM credits or provisioning 200 EC2 instances. Surfaces: - workspace-server/internal/handlers/mock_runtime.go: A2A proxy short-circuit, canned-reply pool, deterministic variant pick. - workspace-server/internal/handlers/a2a_proxy.go: gate the short-circuit before resolveAgentURL (mock has no URL). - workspace-server/internal/handlers/org_import.go: skip Docker provisioning for mock workspaces, set status='online' directly, drop the per-sibling 2s pacing for mock children (collapses a 200-workspace import from ~7min → ~1s). - workspace-server/internal/handlers/runtime_registry.go: register 'mock' in the runtime allowlist (manifest + fallback set). - workspace-server/internal/registry/healthsweep.go + orphan_sweeper.go: skip mock workspaces in container-health and stale-token sweeps (no container by design). - workspace-server/internal/handlers/workspace_restart.go: mirror the 'external' Restart no-op for mock. - manifest.json: register the new Molecule-AI/molecule-ai-org-template-mock-bigorg repo. Tests: 5 new in mock_runtime_test.go covering happy-path, non-mock regression guard, determinism, IsMockRuntime trim/case, JSON-RPC id echo. All existing handler + registry tests still pass. Local-verified: imported the 200-workspace template against a fresh postgres+redis, confirmed all 200 land in 'online' and stay there through the 30s health-sweep window, exercised A2A on CEO + VPs + Managers + ICs and saw the variant pool rotate. 
Org template lives at Molecule-AI/molecule-ai-org-template-mock-bigorg (created today) and is imported via the existing /org/import flow on the canvas Template Palette. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 08:40:37 -07:00
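A deterministic reply pick seeded by (workspace_id, request_id) can be done by hashing both IDs into an index. Assumptions: the commit does not say which hash mock_runtime.go uses, so FNV-1a from `hash/fnv` is a stand-in; `pickVariant` is hypothetical; the pool below holds only the two replies quoted in the commit, not the full pool.

```go
package main

import "hash/fnv"

// cannedReplies is abbreviated to the two variants quoted in the commit.
var cannedReplies = []string{"On it!", "Got it, on it now."}

// pickVariant deterministically chooses a reply from (workspaceID, requestID):
// the same pair always maps to the same entry in pool.
func pickVariant(workspaceID, requestID string, pool []string) string {
	h := fnv.New32a()
	h.Write([]byte(workspaceID))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(requestID))
	return pool[h.Sum32()%uint32(len(pool))]
}
```

Determinism keeps demo replays stable, while different request IDs still rotate through the pool.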
claude-ceo-assistant	70104d1cef	Merge pull request #33 from molecule-ai/feat/demo-mock-1-purchase-success-modal feat(canvas): demo Mock #1 — purchase-success modal Per Hongming directive: skip CI for 2h, admin-merge for funding demo.	2026-05-07 15:32:55 +00:00
Hongming Wang	a37a4a6e40	feat(canvas): demo Mock #1 — purchase-success modal on URL flag Funding-demo Mock #1: when the canvas loads with `?purchase_success=1`, show a centred success modal in the warm-paper theme. Auto-dismisses after 5s; Close button + Esc + backdrop click also dismiss; URL params are stripped on first paint so a refresh after dismiss does not re-trigger. Mounted in `app/layout.tsx` (not `app/page.tsx`) so the modal persists across the canvas page-state transitions (loading → hydrated → error) without unmounting and losing its open-state. No real billing logic — the marketplace "Purchase" button on the landing page redirects here with the flag; this modal is the only thing the user sees of the "transaction". Local-verified end-to-end via playwright (5/5 tests pass): redirect URL shape, modal visibility, URL cleanup, close button, refresh-after- dismiss behaviour, 5s auto-dismiss. Pairs with the Purchase button added to landingpage Marketplace section.	2026-05-07 08:32:35 -07:00
claude-ceo-assistant	85b09659e6	Merge pull request 'fix(ci): add scripts/** to publish-workspace-server-image path filter' (#32 ) from fix/publish-path-filter-add-scripts into main	2026-05-07 15:19:12 +00:00
devops-engineer	6de3c1ccd2	fix(ci): add scripts/** to publish-workspace-server-image path filter scripts/clone-manifest.sh runs inside the platform Dockerfile build, so a change to that script needs to retrigger publish. Without it, the prior fix (clone via Gitea + lowercase org) didn't trigger this workflow because scripts/ wasn't in the path filter. Also serves as the file change to satisfy the path filter for THIS push, retriggering publish-workspace-server-image now.	2026-05-07 08:18:53 -07:00
claude-ceo-assistant	d4256b9d83	Merge pull request 'fix(scripts): clone-manifest.sh — use Gitea + lowercase org slug (Class G)' (#31 ) from fix/clone-manifest-gitea into main	2026-05-07 15:18:09 +00:00
devops-engineer	8313b2a7a7	fix(scripts): clone-manifest.sh — use Gitea + lowercase org slug Post-2026-05-06 GitHub-org suspension: scripts/clone-manifest.sh was still pointing at https://github.com/${repo}.git, so the Docker build for workspace-server's platform image fails at: fatal: could not read Username for 'https://github.com': No such device or address with no credentials available in the build container. Fix: clone from https://git.moleculesai.app/${repo}.git instead. manifest.json's repo paths still read 'Molecule-AI/...' (the historic GitHub slug, mixed-case); Gitea lowercases the org component to 'molecule-ai/...'. Lowercase the org segment on the fly with awk so we don't need to rewrite every manifest entry. Local verify: bash -n passes, lowercase transform produces correct Gitea paths, anonymous git clone of one of the manifest plugins over HTTPS to git.moleculesai.app succeeds. Class G in the prod-ship CI sweep — same shape as the github.com ref Harness Replays hits, this is the second instance found.	2026-05-07 08:17:58 -07:00
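The org-segment rewrite the script does with awk looks like this in Go. Assumptions: `giteaCloneURL` is a hypothetical helper, not part of the repository; manifest entries have the form "<org>/<repo>"; `strings.Cut` requires Go 1.18+. Only the org is lowercased, so the repo name keeps its original case.

```go
package main

import "strings"

// giteaCloneURL turns a manifest repo path such as "Molecule-AI/foo" into a
// Gitea clone URL, lowercasing only the org segment. ok is false for
// entries that are not of the form "<org>/<repo>".
func giteaCloneURL(repo string) (cloneURL string, ok bool) {
	org, name, found := strings.Cut(repo, "/")
	if !found || org == "" || name == "" {
		return "", false
	}
	return "https://git.moleculesai.app/" + strings.ToLower(org) + "/" + name + ".git", true
}
```

For example, "Molecule-AI/molecule-ai-org-template-mock-bigorg" maps to https://git.moleculesai.app/molecule-ai/molecule-ai-org-template-mock-bigorg.git.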
claude-ceo-assistant	566c095571	Merge pull request 'chore(ci): trigger publish-workspace-server-image (path-filter satisfaction)' (#30 ) from chore/touch-publish-workflow-to-trigger into main	2026-05-07 15:12:22 +00:00
devops-engineer	694a036a7f	chore(ci): trailing newline to retrigger publish-workspace-server-image (path-filter requires workflow file change)	2026-05-07 08:12:10 -07:00
claude-ceo-assistant	8c1dbc6ba5	Merge pull request 'chore(ci): retrigger publish-workspace-server-image post AWS secrets registration' (#29) from chore/retrigger-publish-post-aws-secrets into main	2026-05-07 15:08:03 +00:00
devops-engineer	72d0d4b44e	chore(ci): retrigger publish-workspace-server-image post AWS secrets registration	2026-05-07 08:07:46 -07:00
claude-ceo-assistant	52e61d4704	fix(ci): cherry-pick PR#23 - drop github-app-auth plugin checkout (#28)	2026-05-07 14:52:47 +00:00
devops-engineer	10e510f50c	chore: drop github-app-auth + swap GHCR→ECR (closes #157, #161) Two coupled cleanups for the post-2026-05-06 stack: ============================================ The plugin injected GITHUB_TOKEN/GH_TOKEN via the App's installation-access flow (~hourly rotation). Per-agent Gitea identities replaced this approach after the 2026-05-06 suspension: workspaces now provision with a per-persona Gitea PAT from .env instead of an App-rotated token. The plugin code itself lived on github.com/Molecule-AI/molecule-ai-plugin-github-app-auth, which is also unreachable post-suspension; checking it out at CI build time was already failing. Removed: - workspace-server/cmd/server/main.go: githubappauth import + the `if os.Getenv("GITHUB_APP_ID") != ""` block that called BuildRegistry. gh-identity remains as the active mutator. - workspace-server/Dockerfile + Dockerfile.tenant: COPY of the sibling repo + the `replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin` directive injection. - workspace-server/go.mod + go.sum: github-app-auth dep entry (cleaned up by `go mod tidy`). - 3 workflows: actions/checkout steps for the sibling plugin repo: - .github/workflows/codeql.yml (Go matrix path) - .github/workflows/harness-replays.yml - .github/workflows/publish-workspace-server-image.yml Verified `go build ./cmd/server` + `go vet ./...` pass post-removal. ======================================================= The same workflow used to push to ghcr.io/molecule-ai/platform + platform-tenant, but ghcr.io/molecule-ai is gone post-suspension. The operator's ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/) already hosts platform-tenant + workspace-template-* + runner-base images and is the post-suspension SSOT for container images. This PR aligns publish-workspace-server-image with that stack. - env.IMAGE_NAME + env.TENANT_IMAGE_NAME repointed to the ECR URL. - docker/login-action swapped for the aws-actions/configure-aws-credentials@v4 + aws-actions/amazon-ecr-login@v2 chain (the standard ECR auth pattern; uses AWS_ACCESS_KEY_ID/SECRET secrets bound to the molecule-cp IAM user). The :staging-<sha> + :staging-latest tag policy is unchanged: staging-CP's TENANT_IMAGE pin still points at :staging-latest, just with the new registry prefix. Refs molecule-core#157, #161; parallel to org-wide CI-green sweep.	2026-05-07 07:48:51 -07:00
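The registry swap leaves the tag policy alone, so a consumer pinned to `:staging-latest` only needs the new prefix. The hypothetical Python helper below spells out the resulting image references. `staging_refs` is illustrative, while the registry host and the `:staging-<sha>` + `:staging-latest` scheme come from the commit message. The `platform` and `platform-tenant` repository names under the ECR org are an assumption, carried over from the old `ghcr.io/molecule-ai/platform` + `platform-tenant` names.

```python
# Registry prefix is from the commit message; the repository names under it
# ('platform', 'platform-tenant') are assumed from the old GHCR image names.
ECR_PREFIX = "153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai"
IMAGE_NAME = f"{ECR_PREFIX}/platform"
TENANT_IMAGE_NAME = f"{ECR_PREFIX}/platform-tenant"


def staging_refs(image: str, sha: str) -> list[str]:
    """Refs pushed per build under the :staging-<sha> + :staging-latest policy.

    The log does not say whether the workflow uses the full or the
    abbreviated commit SHA; this helper uses whatever string it is given.
    """
    if not sha:
        raise ValueError("sha must be a non-empty commit id")
    return [f"{image}:staging-{sha}", f"{image}:staging-latest"]


for ref in staging_refs(TENANT_IMAGE_NAME, "10e510f50c"):
    print(ref)
```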
claude-ceo-assistant	6fac24e3de	Merge pull request 'fix(workspace-server): SSOT-route container check + 422 on external runtimes (closes #10)' (#12) from fix/issue10-runtime-aware-plugin-install into main	2026-05-07 11:27:52 +00:00
claude-ceo-assistant	f51722411b	Merge branch 'main' into fix/issue10-runtime-aware-plugin-install	2026-05-07 11:26:14 +00:00
claude-ceo-assistant	f0015bff81	Merge pull request 'fix(workspace-server): default-bind to 127.0.0.1 in dev-mode fail-open (closes #7)' (#8) from fix/s8-bind-loopback-dev into main	2026-05-07 11:25:48 +00:00
claude-ceo-assistant	b72d1d3f26	Merge branch 'main' into fix/issue10-runtime-aware-plugin-install	2026-05-07 11:25:24 +00:00
claude-ceo-assistant	a674a6547e	Merge branch 'main' into fix/s8-bind-loopback-dev	2026-05-07 11:25:20 +00:00

4550 Commits