molecule-core

Author	SHA1	Message	Date
Hongming Wang	feec130685	fix(ci): use Dockerfile.tenant for Fly registry image (Go + Canvas) The publish workflow was pushing platform/Dockerfile (Go-only) to the Fly registry, but tenant machines run the combined image (Go + Canvas reverse proxy). This caused "canvas unavailable" after machine update. Changes: - Fly registry build: platform/Dockerfile → platform/Dockerfile.tenant - GHCR: keeps Go-only image (for self-hosted/dev use) - Path triggers: add canvas/** and manifest.json (tenant image includes both) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 09:31:51 -07:00
Hongming Wang	ca1d5741d5	fix(ci): bypass docker login + macOS Keychain for image publish Six prior PRs (#273, #319, #322, #341, #484, #486) all kept calling `docker login` and tried to coerce credsStore via increasingly elaborate config tricks. None worked. The latest publish-canvas-image and publish-platform-image runs on main are still failing with: error storing credentials - err: exit status 1, out: `User interaction is not allowed. (-25308)` Verified locally on the runner host (2026-04-16): `docker login` on macOS unconditionally writes credentials to osxkeychain after a successful login, regardless of the config presented to it. # I wrote this: { "auths": {}, "credsStore": "", "credHelpers": {} } # After `docker login --config <dir> ghcr.io ...` succeeded: { "auths": { "ghcr.io": {} }, # empty — auth is in Keychain "credsStore": "osxkeychain" # Docker rewrote it back } So `--config` flag, DOCKER_CONFIG env var, credsStore="" etc. all share the same fate: Docker re-enables osxkeychain after every successful login. The Mac mini runner is a launchd user agent with a locked Keychain, so storage fails with -25308. This PR replaces the `docker login` invocation entirely. We write `base64(user:pat)` directly into the disposable DOCKER_CONFIG's `auths` map. `docker/build-push-action@v5` and the daemon honor the auths map for push without ever calling `docker login`, so the Keychain is never involved. Same shape in both workflows: - publish-canvas-image.yml — single registry (ghcr.io) - publish-platform-image.yml — two registries (ghcr.io + registry.fly.io) Fly username remains literal "x". Security: - Token env vars never echoed. Heredoc writes the auth blob via `umask 077` (file mode 600). The temp config dir lives under RUNNER_TEMP and is reaped at job end. - Diagnostics preserved (docker version + binary ls + registry keys only, no values) so future runner permission regressions remain visible without leaking secrets. Equivalent to closed PR #464 — re-opening because main is still broken (verified by inspecting the most recent failure). The closing comment on #464 stated the issue was already addressed by #341, but it isn't.	2026-04-16 09:25:20 -07:00
Hongming Wang	83a1a28b3f	fix(ci): use docker login CLI instead of login-action to bypass macOS Keychain docker/login-action@v3 ignores DOCKER_CONFIG and still tries the macOS system keychain on the self-hosted runner, producing: error storing credentials: User interaction is not allowed. (-25308) Switch to `docker login ... --password-stdin` which respects DOCKER_CONFIG and writes credentials to the per-run config.json we created in the isolate step. Applied to both GHCR and Fly registry logins in both publish workflows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:45:20 -07:00
Hongming Wang	dbe96ca11d	fix(ci): replace heredoc JSON with printf in publish workflows The heredoc block writing Docker config.json had unindented `{` at column 1, which GitHub Actions' YAML parser interpreted as a flow mapping start — causing every publish-platform-image and publish-canvas-image run to fail with 0 jobs (startup_failure). Replace `cat <<'JSON' ... JSON` with a single `printf` call that produces identical config.json content without confusing the parser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:20:43 -07:00
Hongming Wang	0064e61881	feat(ci): add Fly deploy step to publish-platform-image workflow After pushing the tenant image to registry.fly.io, the workflow now lists all running/stopped molecule-tenant machines and updates each to the newly pushed image tag. Gracefully skips if no machines exist (control plane provisions on demand). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:29:42 -07:00
Hongming Wang	0071b66a59	fix(ci): heredoc indentation in publish workflows + add dev-start.sh Two fixes: 1. publish-canvas-image.yml + publish-platform-image.yml: the JSON heredoc for config.json had leading whitespace from YAML indentation, producing invalid JSON. Docker fell back to osxkeychain → -25308. Fixed by removing indentation inside the heredoc body. 2. Added scripts/dev-start.sh — one-command local dev environment. Starts infra (docker-compose), platform (Go), and canvas (Next.js) with proper health checks and cleanup on Ctrl-C.	2026-04-16 05:56:25 -07:00
Hongming Wang	558d5c456a	fix(ci): remove molecli build step — CLI moved to standalone repo	2026-04-16 05:28:10 -07:00
Hongming Wang	d106cad8ac	Merge pull request #468 from Molecule-AI/fix/issue-458-e2e-cancel-protection ci: extract e2e-api into dedicated workflow with run-level cancel protection (#458)	2026-04-16 05:16:45 -07:00
DevOps Engineer	9b72be75f6	ci: extract e2e-api into dedicated workflow with run-level cancel protection (#458) Job-level `concurrency.cancel-in-progress: false` only prevents sibling jobs from killing each other — it does not protect the parent workflow run from being cancelled when a new push arrives. Every PR push was cancelling the in-progress E2E run, forcing manual `gh run rerun` across 7+ active PRs. Fix: move e2e-api into `.github/workflows/e2e-api.yml` with a workflow-level concurrency group (`e2e-api-${{ github.ref }}`, cancel-in-progress: false). New pushes now queue behind the running E2E job instead of cancelling it. Fast jobs (platform-build, canvas-build, shellcheck, python-lint) stay in ci.yml and retain normal run-level cancellation for quick iteration feedback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:15:13 +00:00
Hongming Wang	8e304e69e8	chore: remove extracted directories, add manifest-driven Docker builds Remove plugins/, workspace-configs-templates/, org-templates/ dirs (now in standalone repos). Add manifest.json listing all 33 repos and scripts/clone-manifest.sh to clone them. Both Dockerfiles now use the manifest script instead of 33 hardcoded git-clone lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 04:13:29 -07:00
Hongming Wang	2abb267d97	Merge pull request #415 from Molecule-AI/fix/issue-399-canvas-image-publish feat(ci): auto-publish canvas Docker image to GHCR on canvas/** merges	2026-04-16 03:08:27 -07:00
Canvas Agent	4a95aa3e98	feat(ci): auto-publish canvas Docker image to GHCR on canvas/** merges Closes #399. ## Root cause `publish-platform-image.yml` existed for the Go platform image but there was no equivalent for the canvas. After every canvas PR merged, CI ran `npm run build` and passed — but the live container at :3000 was never updated. The `canvas-deploy-reminder` job only posted a comment asking operators to manually rebuild, which was consistently missed. ## What this adds - `.github/workflows/publish-canvas-image.yml`: triggers on `canvas/**` changes to main (and `workflow_dispatch`). Mirrors the platform workflow: macOS Keychain isolation, QEMU for linux/amd64, Buildx, GHCR push with `:latest` + `:sha-<7>` tags. - `NEXT_PUBLIC_PLATFORM_URL` / `NEXT_PUBLIC_WS_URL` resolve from `workflow_dispatch` inputs → `CANVAS_PLATFORM_URL` / `CANVAS_WS_URL` repo secrets → `localhost:8080` defaults (safe for self-hosted dev). - Inputs are passed via env vars (not direct `${{ }}` interpolation) to prevent shell injection from string inputs. - `docker-compose.yml`: adds `image: ghcr.io/molecule-ai/canvas:latest` to the canvas service so `docker compose pull canvas && docker compose up -d canvas` applies the new image. `build:` is retained for local development. Adds a comment clarifying that `NEXT_PUBLIC_*` runtime env vars are ignored by the standalone bundle (build-time only). - `ci.yml`: updates `canvas-deploy-reminder` commit comment to reference `docker compose pull` as the fast path, with `docker compose build` as the local-source fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:23:26 +00:00
Hongming Wang	51e3393ec0	fix(ops): bake workspace-configs-templates into platform Docker image Tenant machines were booting with no templates because the Dockerfile only shipped the Go binary + migrations. The canvas showed "0 templates" with an empty picker. Changes: - platform/Dockerfile: build context changed from ./platform to repo root so COPY can reach workspace-configs-templates/ alongside the Go source. COPY paths updated for platform/{go.mod,go.sum,*.go} and platform/migrations/. - .github/workflows/publish-platform-image.yml: context: . (was ./platform), paths trigger now includes workspace-configs-templates/ so template changes rebuild the image. Phase A of the template-registry plan. Phase B adds a DB registry + on-demand fetch for community templates (user pastes GitHub URL at workspace creation time). The baked defaults always ship in the image for zero-config tenant boot. Verified: `docker build -f platform/Dockerfile -t test .` succeeds, `docker run --rm test ls /workspace-configs-templates/` shows all 8 templates (autogen, claude-code-default, crewai, deepagents, gemini-cli, hermes, langgraph, openclaw). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 01:54:47 -07:00
Hongming Wang	8ad8ae1077	fix(ci): explicitly disable osxkeychain credsStore for self-hosted runner #273 tried to fix the macOS Keychain -25308 error by pointing DOCKER_CONFIG at a per-run temp dir with `{"auths": {}}`. That was necessary but not sufficient: Docker on macOS inherits `osxkeychain` as the default credsStore even when config.json doesn't declare one (comes from Docker Desktop's bundled binding), so the login-action still tried to call /usr/local/bin/docker-credential-osxkeychain which fails with -25308 from the non-interactive launchd session. Evidence: after #273, publish-platform-image still failed on every main merge with: error saving credentials: error storing credentials - err: exit status 1, out: `User interaction is not allowed. (-25308)` Fix: write a config.json that explicitly sets `credsStore: ""` and clears `credHelpers`, forcing Docker to store creds in the inline `auths` map of this disposable config.json instead of reaching for the keychain. Also print config.json at diagnostic time so a future regression surfaces in the log instead of at login. No runtime / test impact — this only changes what the runner writes to the workflow's temp DOCKER_CONFIG directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:20:06 -07:00
Hongming Wang	f2457ac287	chore(ci): serialize e2e-api across runs to prevent docker collision Now that the Molecule-AI org has two self-hosted Apple-silicon runners (`hongming-m1-mini` + `hongming-m1-mini-2`) servicing the same label set, two CI runs could execute the e2e-api job concurrently. Each run starts fixed-name docker containers (`molecule-ci-postgres`, `molecule-ci-redis`) bound to host ports 15432/16379 — a collision means the second run fails with "container name already in use" or "port already in use". Adds a workflow-level `concurrency: e2e-api` group to the job so GitHub Actions serializes e2e-api executions globally regardless of which runner picks them up. `cancel-in-progress: false` ensures later runs queue rather than cancelling the in-flight one (we want every PR's e2e check to actually execute, not get skipped by a newer push). Tradeoff: e2e-api is now effectively single-threaded across the whole org. Measured duration is ~1-2 min per run, so the added serialization latency is small relative to total CI wall time. All other jobs still parallelize across both runners.	2026-04-15 17:06:41 -07:00
Hongming Wang	0b403aeeab	fix(ci): publish-platform-image keychain + path diagnostics Every publish-platform-image run since the `aa41947` self-hosted runner migration has been failing with two runner-level issues that the workflow now works around (keychain) or surfaces clearly (path): 1. "error storing credentials - err: exit status 1, out: 'User interaction is not allowed. (-25308)'" docker/login-action tries to persist the GHCR + Fly tokens in the macOS Keychain, but the Mac mini runner runs as a non-interactive launchd service without an unlocked desktop session — keychain access raises -25308. Fix: set DOCKER_CONFIG to a per-run temp dir containing a plain config.json before the login step so credentials land in a file, not the keychain. This is the same trick the GitHub-hosted macos runners use in docker action examples. 2. "Unexpected error attempting to determine if executable file exists '/usr/local/bin/docker': Error: EACCES: permission denied, stat '/usr/local/bin/docker'" Not a workflow bug — the runner literally can't read the Docker binary path. Adds a diagnostic step before QEMU/buildx setup that prints: PATH, `command -v docker`, `docker --version`, and `ls -la` on both /usr/local/bin/docker and /opt/homebrew/bin/docker. Surfacing these in the log means the next failure (if any) shows the actual problem instead of hiding behind a cryptic buildx error. Does NOT fix the root cause of #2 — that needs the user to SSH into the Mac mini runner and reinstall / re-permission Docker Desktop (or switch to Colima/OrbStack). The diagnostic output will tell us exactly which path is broken. The 20+ queued CI runs from `ci.yml` are unrelated to this PR — they are stuck because the self-hosted runner has severely degraded queue throughput (runs wait 2+ hours before being picked up). That's a separate runner-health issue tracked as a user action in the triage report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:06:28 -07:00
Hongming Wang	b2761ba568	fix(ci): apply user's bypass-setup-python to main (missed in #186 squash-merge) #186's squash-merge commit (`aa419477`) took 15e15a21 (AGENT_TOOLSDIRECTORY override) but missed a6cfc5f (bypass setup-python entirely) which was pushed to the PR branch after the merge was initiated. The merge commit still has the old setup-python@v5 job config. Applies a6cfc5f's ci.yml verbatim via git checkout. Restores the Homebrew-python3.11 bypass path that the user prototyped. No other changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:58:22 -07:00
Hongming Wang	aa419477b7	chore(ci): migrate all jobs to self-hosted macOS arm64 runner * chore(ci): migrate all jobs to self-hosted macOS arm64 runner Switches every job in `ci.yml` and `publish-platform-image.yml` from `ubuntu-latest` to `[self-hosted, macos, arm64]` to avoid GitHub-hosted minute rate limits. All jobs run on a single Apple-silicon self-hosted runner registered at the Molecule-AI org level. Notable non-trivial adaptations (macOS runners can't use `services:` and some GHA marketplace actions are Linux-only): - e2e-api: `services: postgres/redis` replaced with inline `docker run` steps. Ports remapped to 15432/16379 to avoid collision with anything the host may already expose on the standard ports. Containers are named (`molecule-ci-postgres` / `molecule-ci-redis`) and torn down in an `if: always()` step. Postgres readiness is still gated on pg_isready via `docker exec`. - shellcheck: `ludeeus/action-shellcheck` is a Docker action, Linux-only. Replaced with a direct `shellcheck` invocation (pre-installed on the runner) that scans `tests/e2e/*.sh` with `--severity=warning`. - publish-platform-image: added `docker/setup-qemu-action@v3` and an explicit `platforms: linux/amd64` on both `docker/build-push-action` invocations. The runner is arm64 but Fly tenant machines pull amd64, so QEMU-emulated cross-arch builds are required. GHA cache-from/cache-to behavior is unchanged. Runner prereqs (one-time host setup): - Docker Desktop installed and running (for e2e-api + image publish) - `shellcheck` on PATH - `docker` on PATH - Go / Node / gh / Python are installed via setup-* actions per job * fix(ci): set AGENT_TOOLSDIRECTORY for python-lint on self-hosted runner setup-python@v5 defaults to /Users/runner/hostedtoolcache which doesn't exist on the hongming-claw self-hosted runner. AGENT_TOOLSDIRECTORY tells the action to use a writable path under the runner user's home directory. Fixes the only failing job in CI run 24469156329 on PR #186. --------- Co-authored-by: Hongming Wang <HongmingWang-Rabbit@users.noreply.github.com>	2026-04-15 10:48:27 -07:00
Hongming Wang	8decdd491e	fix(ci): revert Fly registry username to 'x' — 'molecule-ai' gets 401 Post-mortem on the failed publish-platform-image run on main (PR #82): Fly's Docker registry requires username EXACTLY equal to "x". My code-review "readability fix" changing it to "molecule-ai" caused every push to return 401 Unauthorized. Verified locally: echo $FLY_API_TOKEN | docker login registry.fly.io -u x --password-stdin → Login Succeeded echo $FLY_API_TOKEN | docker login registry.fly.io -u molecule-ai --password-stdin → 401 Unauthorized Lesson: don't second-guess docs that specify a literal value. Comment now says "MUST be literal 'x'" with a 2026-04-15 verification note to prevent future regressions. Code-review process improvement: when reviewing a change against a vendor API, prefer "preserve exact doc-specified values" over readability suggestions. Logged as a cron-learning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:21:53 -07:00
Hongming Wang	73dbca4e38	review: split push steps, runbook for secret rotation, username clarity Addresses PR #82 code review: 🟡×3 + 🔵×5. - Fly registry login username: 'x' → 'molecule-ai' + explanatory comment. - Build & push split into two steps (GHCR / Fly registry) so a single- registry outage can't fail the other. Second step uses 'if: always()' to ensure Fly mirror runs even if GHCR push flakes. - docs/runbooks/saas-secrets.md: full secret map + rotation procedures for every SaaS credential, with danger-case callouts. Documents the coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be rotated in both). - CLAUDE.md: new 'SaaS ops' section linking to the runbook.	2026-04-14 17:09:11 -07:00
Hongming Wang	6bcafd643e	feat(ci): mirror platform image to registry.fly.io/molecule-tenant Keeps ghcr.io/molecule-ai/platform private (per CEO direction — open- source when full SaaS ships) while still letting the private control plane's Fly provisioner boot tenant machines: Fly auto-authenticates same-org machines against registry.fly.io, no per-tenant pull credentials to wire. Workflow now logs into both GHCR (using built-in GITHUB_TOKEN) and Fly registry (using FLY_API_TOKEN secret) and pushes the same image to four tags total: - ghcr.io/molecule-ai/platform:latest - ghcr.io/molecule-ai/platform:sha-<short> - registry.fly.io/molecule-tenant:latest - registry.fly.io/molecule-tenant:sha-<short> Secret added via `gh secret set FLY_API_TOKEN` on the public repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:05:36 -07:00
Hongming Wang	92a06a8684	feat(ci): publish-platform-image workflow → ghcr.io/molecule-ai/platform Phase B.2 companion to the private molecule-controlplane provisioner PR. On every push to main that touches platform/**, builds platform/Dockerfile and pushes to GHCR with two tags: - :latest (floating, always main's tip) - :sha-<short-commit> (immutable, pin-friendly) Cache via GitHub Actions cache (cache-from: type=gha). Workflow_dispatch trigger so we can re-publish after a docs-only merge if needed. The private molecule-controlplane sets TENANT_IMAGE=ghcr.io/molecule-ai/platform:<tag> and the provisioner creates each tenant Fly Machine from this image. Staying on the same base image across tenants keeps upgrades atomic. CLAUDE.md updated to document the new workflow in the CI pipeline section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:37:49 -07:00
Dev Lead Agent	64c95edf8d	ci: post canvas deploy reminder comment after every main merge Adds a `canvas-deploy-reminder` job to ci.yml that fires on every push to main once `canvas-build` passes. It posts a commit comment via the built-in GITHUB_TOKEN (no new secrets needed) reminding whoever monitors CI to run: cd /g/personal_programs/molecule-monorepo git pull origin main docker compose build canvas && docker compose up -d canvas The comment includes the commit SHA and a direct link to the build log. Rationale: 5 consecutive merge cycles (PRs #21, #25, #30, #32, #34) went undeployed because there is no auto-deploy hook and the manual step was silently forgotten. A commit comment on the merge commit is the lowest-friction reminder that requires no external secrets or infra. Does NOT run on PRs — only on direct pushes to main (i.e. post-merge). Uses `needs: canvas-build` so the reminder only fires after build+tests pass; a failing build produces no comment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 08:28:42 +00:00
Hongming Wang	30b30b60dc	chore: apply round-7 review nits - _extract_token.py: narrow `except Exception` to `except (json.JSONDecodeError, ValueError)`. Prevents swallowing KeyboardInterrupt in edge cases and documents intent clearly. - ci.yml shellcheck job: switch to ludeeus/action-shellcheck@master (caches shellcheck binary across runs; saves the apt-get install). Both changes verified locally: YAML parses, extract script still extracts valid tokens and prints the stderr warning on malformed JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:08:45 -07:00
Hongming Wang	c84b9998b6	chore: apply code-review round-6 suggestions All 5 suggestions from the latest review pass. ## tests/e2e/_extract_token.py (new) Extracted the 14-line python-in-bash heredoc from _lib.sh into a real Python file. Easier to edit, fewer escaping traps, same behavior. Shell helper now just shells out to it. ## tests/e2e/_lib.sh - Replaced inline python with: python3 "$(dirname "${BASH_SOURCE[0]}")/_extract_token.py" - Removed redundant sys.exit(0) as part of the extraction ## Shellcheck-clean scripts (new CI job enforces) - Removed dead captures: BEFORE_COUNT (test_activity_e2e.sh), ORIG_SKILLS, REIMPORT_SKILLS (test_api.sh), QA_TOKEN (test_comprehensive_e2e.sh) - Renamed unused loop vars `i`, `j` -> `_` in 4 sites - Added `# shellcheck disable=SC2046` on the two intentional word-splits in test_claude_code_e2e.sh (docker stop/rm of multiple container IDs) - Removed a useless re-register of QA mid-script (was done in Section 2) ## CI (.github/workflows/ci.yml) - Replaced `sudo apt-get install postgresql-client` + psql with a direct `docker exec` into the existing postgres:16 service container. Saves ~10-20s per CI run. - Added new `shellcheck` job that lints tests/e2e/*.sh on every PR. Local: shellcheck --severity=warning returns 0 across all 5 scripts. ## Verification - go test -race ./internal/handlers/... : pass - mcp-server: 96/96 jest - canvas: 357/357 vitest + clean build - tests/e2e/test_api.sh: 62/62 - tests/e2e/test_comprehensive_e2e.sh: 67/67 - shellcheck tests/e2e/*.sh : clean - CI YAML: valid Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:08:45 -07:00
Hongming Wang	3130fe0144	chore: address follow-up review — dead helpers, lib polish, CI hardening Last sweep of code-review items before merging PR #5. ## _lib.sh cleanup - Removed unused e2e_register and e2e_heartbeat helpers (dead code — no caller ever invoked them) - Standardized on $BASE variable set via : "${BASE:=...}" so every script uses one name (was mixed $BASE / $e2e_base) - e2e_extract_token now writes stderr warnings on JSON parse failure or missing auth_token, instead of silently returning empty. Previous behavior made downstream "missing workspace auth token" 401s much harder to diagnose ## Script cleanup - test_api.sh, test_comprehensive_e2e.sh, test_activity_e2e.sh all drop the redundant `e2e_base + BASE="$e2e_base"` aliasing; sourcing _lib.sh sets BASE via : "${BASE:=...}" default ## CI hardening (.github/workflows/ci.yml) - Postgres credentials now match .env.example (dev:dev — was molecule:molecule, caused confusion for local repros) - Added Go module cache via actions/setup-go cache:true + cache-dependency-path: platform/go.sum. ~30s cold-run improvement - New pre-E2E step asserts migrations actually ran by checking for the 'workspaces' table. Catches future migration-author mistakes before they surface as obscure E2E failures ## Follow-up issue Filed Molecule-AI/molecule-monorepo#6 for the deterministic token- mint admin endpoint. PR #5 uses an empirical "beat the container" race (5/5 wins in benchmarks); issue #6 tracks the real fix for any future CI load that invalidates the assumption. ## Verification - bash tests/e2e/test_api.sh -> 62/62 - bash tests/e2e/test_comprehensive_e2e.sh -> 67/67 - python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci.yml'))" -> ok ## Operational note Hourly PR-triage + issue-pickup cron scheduled this session (job id 0328bc8f, fires at :17 past each hour). Runtime reports it as session-only despite durable:true — re-invoke via /loop or CronCreate in a fresh session if needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:08:45 -07:00
Hongming Wang	f9803ec55e	fix(e2e): comprehensive + activity_e2e + shared lib + CI smoke job Follow-up to the test_api.sh fix. Same Phase 30.1 + 30.6 staleness existed in the other E2E scripts; same pattern applied. ## New tests/e2e/_lib.sh Shared bash helpers so future scripts don't reimplement: - e2e_extract_token — parse auth_token from register response - e2e_register — register + echo token - e2e_heartbeat — heartbeat with bearer auth - e2e_cleanup_all_workspaces — pre-test state reset ## test_comprehensive_e2e.sh (14 fail -> 0 fail) Root cause was deeper than test_api.sh: the script creates workspaces at Section 2 but doesn't register them until Section 3. In between, the platform provisioner spawns the Docker container, whose main.py calls /registry/register first and claims the single-issue token. The script's later register gets no auth_token back. Fix: register each workspace immediately after POST /workspaces, beating the container to the token. Empirically 5/5 wins in a tight loop. PM/Dev/QA tokens captured at creation time; bearer auth threaded through all heartbeat/update-card/discover/peers calls. Removed the duplicate register calls in Section 3/4 that followed (tokens already captured). Result: 53/68 -> 67/67 (one duplicate check dropped). ## test_activity_e2e.sh Same pattern applied on faith. Script still SKIPs cleanly when no online agent is present; when an agent IS online, it now re-registers it to mint a fresh bearer token and threads Authorization: Bearer on the 3 heartbeat calls. ## test_api.sh refactor Now sources _lib.sh and uses the shared helpers. No behavior change, still 62/62. ## .github/workflows/ci.yml — new e2e-api job Spins up Postgres 16 + Redis 7 as GitHub Actions services, builds the platform binary, runs it in background with DATABASE_URL/REDIS_URL, polls /health for 30s, then runs tests/e2e/test_api.sh. On failure dumps platform.log for triage. 10-min job timeout. This is the watchdog that would have caught Phase 30.1 auth drift the day it landed. Picks test_api.sh not test_comprehensive_e2e.sh because the latter depends on Docker-in-Docker for container provisioning which is heavier than a PR gate should carry. ## Verification - bash tests/e2e/test_api.sh -> 62/62 - bash tests/e2e/test_comprehensive_e2e.sh -> 67/67 - bash tests/e2e/test_activity_e2e.sh -> cleanly SKIPs (no agent) - go build ./... -> clean - .github/workflows/ci.yml -> valid YAML, new job added Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:08:45 -07:00
Hongming Wang	24fec62d7f	initial commit — Molecule AI platform Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1) with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo. Brand: Starfire → Molecule AI. Slug: starfire / agent-molecule → molecule. Env vars: STARFIRE_* → MOLECULE_*. Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform. Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent. DB: agentmolecule → molecule. History truncated; see public repo for prior commits and contributor attribution. Verified green: go test -race ./... (platform), pytest (workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:55:37 -07:00
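
The tagging scheme described in the publish-platform-image commits above (a floating `:latest` tag plus an immutable `:sha-<short>` tag, pushed to both GHCR and the Fly registry for four tags total) can be sketched roughly as follows. This is only an illustration, not the workflow's actual code: the function name `image_tags` is hypothetical, and the 7-character short SHA is an assumption taken from the `:sha-<7>` wording in the canvas commit.

```python
def image_tags(repos, commit_sha, short_len=7):
    """Return ':latest' and ':sha-<short>' tags for every image repository.

    `short_len` is an assumed short-SHA length, not a value confirmed
    by the workflow files.
    """
    if len(commit_sha) < short_len:
        raise ValueError("commit SHA is shorter than the requested short length")
    short = commit_sha[:short_len]
    tags = []
    for repo in repos:
        tags.append(f"{repo}:latest")        # floating tag, follows main
        tags.append(f"{repo}:sha-{short}")   # immutable tag, safe to pin
    return tags


if __name__ == "__main__":
    repos = [
        "ghcr.io/molecule-ai/platform",
        "registry.fly.io/molecule-tenant",
    ]
    for tag in image_tags(repos, "feec130685"):
        print(tag)
```

Keeping the tag computation in one place means both registries always receive the same pair of tags for a given commit.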

28 Commits
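
Commit ca1d5741d5 describes writing `base64(user:pat)` straight into the `auths` map of a disposable Docker config instead of calling `docker login`. A minimal sketch of that idea is shown below. It is not the workflow's actual step (the commit says the workflow uses a shell heredoc under `umask 077`); the helper names `build_docker_config` and `write_docker_config` and the placeholder credentials are hypothetical.

```python
import base64
import json
import os
import tempfile


def build_docker_config(credentials):
    """Build a config.json dict with one base64("user:token") entry per registry.

    `credentials` maps registry host -> (username, token).
    """
    auths = {}
    for registry, (username, token) in credentials.items():
        raw = f"{username}:{token}".encode("utf-8")
        auths[registry] = {"auth": base64.b64encode(raw).decode("ascii")}
    return {"auths": auths}


def write_docker_config(config_dir, credentials):
    """Write config.json into config_dir, created with owner-only mode 600."""
    path = os.path.join(config_dir, "config.json")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(build_docker_config(credentials), handle)
    return path


if __name__ == "__main__":
    # Placeholder values; a real job would read tokens from its environment
    # and never print them.
    creds = {
        "ghcr.io": ("example-user", "example-token"),
        "registry.fly.io": ("x", "example-fly-token"),  # Fly username is literal "x"
    }
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_docker_config(tmp, creds)
        with open(config_path) as handle:
            # Print registry keys only, never the auth values.
            print(sorted(json.load(handle)["auths"]))
```

Pointing `DOCKER_CONFIG` at the directory holding this file is what, per the commit message, lets the push proceed without the Keychain being involved.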