Tonight's wire-real E2E sweep exposed 12+ root causes across the post-
#87 template extraction. Most would have been caught by an actual
provision-and-online test running on each template — but the test only
covered claude-code + hermes. Extending it to cover all 8 ensures any
future regression in any template fails the test, not production.
What's added:
- run_openai_runtime(runtime, label): generic provisioner for the 5
OpenAI-backed templates (langgraph, crewai, autogen, deepagents,
openclaw). Same shape as run_hermes minus the HERMES_* config block
that hermes-agent needs.
- run_gemini_cli: separate function — gemini-cli wants a Google AI
key (E2E_GEMINI_API_KEY), not OpenAI.
- Each new runtime registered in the dispatch loop. New `all` keyword
for E2E_RUNTIMES runs every covered runtime.
claude-code + hermes keep their dedicated functions; both have unique
provisioning quirks (claude-code OAuth + claude-code-specific volume
mounts; hermes 15-min cold-boot) that don't generalize cleanly.
Skip-if-no-key pattern matches the existing one — partially-keyed CI
gets clean skips, not false-fails.
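A sketch of the resulting shape — bodies reduced to stubs, the default
value and the dedicated claude-code function's name here are illustrative:

  #!/usr/bin/env bash
  set -euo pipefail
  : "${E2E_RUNTIMES:=all}"    # default shown for the sketch only

  run_claude_code()  { echo "claude-code: dedicated path (OAuth + volume mounts)"; }
  run_hermes()       { echo "hermes: dedicated path (HERMES_* block, slow cold boot)"; }
  run_gemini_cli() {
    [[ -n "${E2E_GEMINI_API_KEY:-}" ]] || { echo "SKIP gemini-cli (no key)"; return 0; }
    echo "gemini-cli: provision + online-wait + PONG round-trip"
  }
  run_openai_runtime() {    # generic provisioner for the 5 OpenAI-backed templates
    local runtime=$1 label=$2
    [[ -n "${E2E_OPENAI_API_KEY:-}" ]] || { echo "SKIP $label (no key)"; return 0; }
    echo "$runtime: provision + online-wait + PONG round-trip"
  }

  if [[ "$E2E_RUNTIMES" == "all" ]]; then
    E2E_RUNTIMES="claude-code hermes gemini-cli langgraph crewai autogen deepagents openclaw"
  fi
  for rt in ${E2E_RUNTIMES//,/ }; do    # comma- or space-separated
    case "$rt" in
      claude-code) run_claude_code ;;
      hermes)      run_hermes ;;
      gemini-cli)  run_gemini_cli ;;
      *)           run_openai_runtime "$rt" "Priority E2E ($rt)" ;;
    esac
  done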
Usage:
E2E_OPENAI_API_KEY=... E2E_RUNTIMES=langgraph ./test_priority_runtimes_e2e.sh
E2E_OPENAI_API_KEY=... E2E_RUNTIMES=all ./test_priority_runtimes_e2e.sh
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#2000 fixed one symptom — TENANT_IMAGE pinned to `staging-a14cf86`
(10 days stale) silently no-op'd four upstream fixes on 2026-04-24.
This adds the audit pattern as a re-runnable script so the broader
class is observable on demand without new CI infrastructure.
Audit results today (2026-04-27):
controlplane / production: 54 vars audited, 0 drift-prone pins
controlplane / staging: 52 vars audited, 0 drift-prone pins
So the immediate audit deliverable is clean — TENANT_IMAGE is the only
known violation and #2000 already fixed it. The script makes the
ongoing audit a 5-second command instead of a manual one.
Detection regex catches:
* branch-SHA suffixes (`(staging|main|prod|production)-<6+ hex>`)
— the exact 2026-04-24 incident shape
* version pins after `:` or `=` (`:v1.2.3`, `=v0.1.16`)
— same drift class, just rendered differently
Anchoring on `:` or `=` keeps prose like "version 1.2.3 of the api"
out of the false-positive set. UUIDs, ARNs, AMI IDs, secrets, and
floating tags (`:staging-latest`, `:main`) pass through untouched.
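A close approximation of the detection shape (the authoritative pattern is
the one inlined in the script and test; this sketch assumes grep -E):

  PIN_RE='(staging|main|prod|production)-[0-9a-f]{6,}|[:=]v?[0-9]+\.[0-9]+\.[0-9]+'
  printf '%s\n' \
    'TENANT_IMAGE=registry/app:staging-a14cf86' \
    'ADAPTER_VERSION=v0.1.16' \
    'IMAGE_TAG=app:staging-latest' \
    'NOTES=version 1.2.3 of the api' |
    grep -E "$PIN_RE" || true
  # flags the first two lines; the floating tag and the prose pass through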
Regression test (tests/ops/test_audit_railway_sha_pins.sh) pins 20
representative cases — 9 should-flag (covering all four branch
prefixes + semver variants + middle-of-value matches) and 11
should-pass (the false-positive guards). Same regex inlined in both
files so a future tweak that weakens detection fails the test in
lockstep with weakening the audit.
Both files shellcheck clean.
CI gate (acceptance criterion's "regression: add a CI check") is
deliberately scoped out — querying Railway from CI requires plumbing
RAILWAY_TOKEN as a repo secret, which is multi-step setup. The
re-runnable script + test cover the same surface today; the CI
workflow is a small follow-up once the token is provisioned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When E2E_INTENTIONAL_FAILURE=1 poisons the tenant token, step 5/11's
`tenant_call POST /workspaces` curl exits 22 (HTTP error under
--fail-with-body). `set -e` propagates rc=22 directly, but the
script's documented contract emits only {0,1,2,3,4}, and the sanity
workflow's case statement only matches those. rc=22 falls through
to "Unexpected rc — investigate harness" and opens a false-positive
priority-high "safety net broken" issue (#2159, weekly run on
2026-04-27).
The trap now captures $? at entry (must be the first statement
before any command clobbers it) and at the end normalizes any
non-contract code to 1 (generic failure). Leak detection continues
to exit 4 directly, so its semantics are preserved.
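The pattern, sketched (teardown elided):

  cleanup() {
    local rc=$?                  # MUST be first — any command run before
                                 # this line would clobber $?
    # ... teardown ...
    case "$rc" in
      0|1|2|3|4) exit "$rc" ;;   # contract codes pass through (4 = leak detection)
      *)         exit 1 ;;       # curl-22, signal deaths, etc. → generic failure
    esac
  }
  trap cleanup EXIT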
Adds tests/e2e/test_harness_rc_normalization.sh — a self-contained
regression test that builds a stub harness with the same trap
pattern, triggers controlled exit codes, and asserts the
normalization. Covers the 5 contracted codes + curl-22 (the bug) +
3 representative network-failure codes + sigsegv-139.
Verification:
- 10/10 regression tests pass
- shellcheck clean on both modified files
- production teardown path unchanged for legitimate {1,2,3,4}
failures and the leak-detection exit 4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained happy-path E2E for the two runtimes the project commits
to first-class support for (task #116, completes the loop on the
"both must work end-to-end with tests" requirement).
What it proves per runtime:
1. POST /workspaces succeeds with the runtime + secrets
2. Workspace reaches status=online within its cold-boot window
(claude-code: 240s, hermes: 900s on cold apt + uv + sidecar)
3. POST /a2a (message/send "Reply with PONG") returns a non-error,
non-empty reply
4. activity_logs row written with method=message/send and ok|error
status (a2a_proxy.LogActivity contract)
Skip semantics: each phase independently checks for its required env
key (CLAUDE_CODE_OAUTH_TOKEN / E2E_OPENAI_API_KEY) and skips cleanly
if absent. The script exits 0 whenever every phase either passed or
skipped — so wiring it into a no-keys CI job validates that the script
itself stays clean, without false-failing.
Idempotent: pre-sweeps any prior "Priority E2E (claude-code)" /
"Priority E2E (hermes)" workspaces so a run interrupted by SIGPIPE /
kill -9 (which bypasses the EXIT trap) doesn't poison the next run.
Same defensive pattern as test_notify_attachments_e2e.sh.
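A sketch of the pre-sweep, assuming the list endpoint returns a JSON
array with name/id fields (auth elided into $TOKEN):

  for name in "Priority E2E (claude-code)" "Priority E2E (hermes)"; do
    curl -sf "$BASE_URL/workspaces" -H "Authorization: Bearer $TOKEN" |
      jq -r --arg n "$name" '.[] | select(.name == $n) | .id' |
      while read -r id; do
        echo "pre-sweep: removing leftover $name ($id)"
        curl -sf -X DELETE "$BASE_URL/workspaces/$id?confirm=true" \
          -H "Authorization: Bearer $TOKEN" || true   # best-effort; trap is the backstop
      done
  done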
CI wiring:
- e2e-api.yml — runs on every PR with no LLM keys, both phases skip,
catches script-level regressions (set -u bugs, syntax issues, etc.)
- canary-staging.yml + e2e-staging-saas.yml already have the keys
via secrets.MOLECULE_STAGING_OPENAI_KEY and exercise wire-real
behavior — could be wired to opt-in if you want claude-code coverage
there too.
Local runs (from this branch, no keys):
=== Results: 0 passed, 0 failed, 2 skipped ===
Validates the capability primitives shipped in PRs #2137-2144: once
template PRs #12 (claude-code) + #25 (hermes) merge with their
declared provides_native_session=True + idle_timeout_override=900,
a manual run with both keys validates the full native+pluggable chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged a leftover "Notify E2E" workspace on the canvas — caused by
an earlier debug run getting SIGPIPE'd before the EXIT trap could fire.
Add an idempotent pre-sweep at the top of the script so the next run
cleans up any prior leftover with the same name. Belt-and-suspenders
with the existing trap; both have to fail for a leak to persist.
Verified:
- Normal run: 14/14 pass, 0 leftovers
- SIGTERM mid-setup: trap fires, 0 leftovers
- Re-run after interruption: pre-sweep + new run both clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local dev mode bypassed workspace auth, so my first push passed locally
but failed CI with HTTP 401 on /notify. The wsAuth-grouped endpoints
(notify, activity, chat/uploads) require Authorization: Bearer in any
non-dev environment. Mint the token via the existing e2e_mint_test_token
helper and thread it through every authenticated curl. Same pattern as
test_api.sh.
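The resulting shape (endpoint path and payload illustrative;
e2e_mint_test_token's exact signature lives in tests/e2e/_lib.sh):

  TOKEN=$(e2e_mint_test_token "$WORKSPACE_ID")
  curl -sf -X POST "$BASE_URL/workspaces/$WORKSPACE_ID/notify" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"message":"hello"}'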
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:
1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
no workspace container needed. 14 assertions covering: notify text-only
round-trip, notify-with-attachments persists parts[].kind=file in the
shape extractFilesFromTask reads, per-element validation rejects empty
uri/name (regression for the missing gin `dive` bug), and a real
/chat/uploads → /notify URI round-trip when a container is up.
2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
WebSocket-side filtering that drops malformed attachments, allows
attachments-only bubbles, ignores non-array payloads, and no-ops on
pure-empty events.
3. Persisted response_body shape test (message-parser.test.ts +1) — pins
the {result, parts} contract the chat history loader hydrates on
reload, so refreshing after an agent attachment restores both caption
and download chips.
Also wires the new shell E2E into e2e-api.yml so a regression in the
contract fails in CI rather than only in manual runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify pass on top of the canary fix:
- Drop the three CP commit SHAs from comments — issue #2090 covers
the audit trail, SHAs would rot.
- Pull the inline `900` into TLS_TIMEOUT_SEC=$((15 * 60)) so the
bash mirrors the TS side (15 min) at a glance.
- TENANT_HOST extraction now strips http(s) AND any port suffix, so
  getent doesn't silently fail on a ws://host:443 style URL — see the
  sketch below.
- sed-redact Authorization/Cookie out of the curl -v dump, defensive
against future callers adding an auth header to this probe.
Pure cleanup; no behaviour change to the happy path.
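Sketches of the touched spots, assuming plain parameter expansion does
the stripping:

  TLS_TIMEOUT_SEC=$((15 * 60))       # mirrors the TS side's 15 min

  TENANT_HOST=${TENANT_URL#*://}     # drop http://, https://, ws://, ...
  TENANT_HOST=${TENANT_HOST%%/*}     # drop any path
  TENANT_HOST=${TENANT_HOST%%:*}     # drop :port so getent gets a bare name
  getent hosts "$TENANT_HOST"

  curl -kv "https://$TENANT_HOST/health" 2>&1 |
    sed -E 's/(Authorization|Cookie): .*/\1: [redacted]/'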
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Canary #2090 has been red for 6 consecutive runs over 4+ hours, all
timing out at the TLS-readiness step exactly at the 10-min cap. Time
window correlates with three CP commits that landed today/yesterday
and changed EC2 boot behaviour:
- molecule-controlplane@a3eb8be — fix(ec2): force fresh clone of /opt/adapter
- molecule-controlplane@ed70405 — feat(sweep): wire up healthcheck loop
- molecule-controlplane@4ab339e — fix(provisioner): aggregate cleanup errors
Two changes here, both surgical:
1. Bump the bash-side TLS deadline from 600s to 900s, and the canvas TS
mirror from 10m to 15m. Stays below the 20-min provision envelope
(so a genuinely-stuck tenant still fails loud at the earlier
provision step instead of masquerading as TLS).
2. On TLS-timeout, dump a diagnostic burst before exiting:
- getent hosts $TENANT_HOST (DNS resolution state)
- curl -kv $TENANT_URL/health (TLS handshake + HTTP layer)
The previous failure log was just "no 2xx in N min" with no signal
for which layer was actually broken. After this, the next timeout
tells us whether DNS, TLS handshake, or HTTP layer is the culprit
so the CP root cause can be isolated without speculation.
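The burst amounts to two cheap probes, one per layer:

  echo "=== TLS readiness timed out; per-layer diagnostics ==="
  getent hosts "$TENANT_HOST" || echo "DNS: no resolution yet"   # DNS layer
  curl -kv --max-time 20 "$TENANT_URL/health" || true            # TLS + HTTP layers
  exit 1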
This is the unblock; a separate molecule-controlplane issue tracks the
underlying regression suspicion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three files conflicted with staging changes that landed while this PR
sat open. Resolved each by combining both intents (not picking one side):
- a2a_proxy.go: keep the branch's idle-timeout signature
(workspaceID parameter + comment) AND apply staging's #1483 SSRF
defense-in-depth check at the top of dispatchA2A. Type-assert
h.broadcaster (now an EventEmitter interface per staging) back to
*Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through
to no-op when the assertion fails (test-mock case).
- a2a_proxy_test.go: keep both new test suites — branch's
TestApplyIdleTimeout_* (3 cases for the idle-timeout helper) AND
staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated
the staging test's dispatchA2A call to pass the workspaceID arg
introduced by the branch's signature change.
- workspace_crud.go: combine both Delete-cleanup intents:
* Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas
hang-up doesn't cancel mid-Docker-call (the container-leak fix)
* Branch's stopAndRemove helper that skips RemoveVolume when Stop
fails (orphan sweeper handles)
* Staging's #1843 stopErrs aggregation so Stop failures bubble up
as 500 to the client (the EC2 orphan-instance prevention)
Both concerns satisfied: cleanup runs to completion past canvas
hangup AND failed Stop calls surface to caller.
Build clean, all platform tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
The staging E2E suite already greps for 5 known regression patterns
in the A2A response (hermes-agent 401, model_not_found, Encrypted
content, Unknown provider, hermes-agent unreachable). The comment
block at lines 386-395 lists "Invalid API key" as the signal for the
CP #238 boot-event 401 race + stale OPENAI_API_KEY paths, but the
explicit grep was never added — meaning a regression in that class
would slip through the generic `error|exception` catch-all.
Closes the gap with one specific-pattern check that fails loud with
the relevant bug references in the message.
Verified `bash -n` clean; pre-existing shellcheck SC2015 at line 88
is unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a 4+ cycle Canvas tabs E2E flake pattern that's been blocking
staging→main PRs since 2026-04-24 (#2096, #2094, #2055, #2079, ...).
Root cause: TLS_TIMEOUT_MS at 180 s (3 min) is too tight for the layered
realities of staging tenant TLS readiness:
1. Cloudflare DNS propagation through the edge (1-2 min typical)
2. Tenant CF Tunnel registering the new hostname (1-2 min)
3. CF edge ACME cert provisioning + cache (1-3 min)
Each layer can add 1-3 min on its own under heavy staging load — the
realistic worst case is well past the 3-min cap.
Provision and workspace-online timeouts were already raised to 20 min
(staging-setup.ts:42-46 history). The TLS gate was the remaining
under-budgeted step. Bumping to 10 min keeps it inside the 20-min
PROVISION envelope so a genuinely-stuck tenant still fails loud at
the earlier provision step rather than masquerading as a TLS issue.
Both call sites raised together:
- canvas/e2e/staging-setup.ts: TLS_TIMEOUT_MS = 10 * 60 * 1000
- tests/e2e/test_staging_full_saas.sh: TLS_DEADLINE += 600
Each carries an inline rationale comment so the next reviewer sees
the layer-by-layer decomposition without re-reading the issue thread.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
  auto-merged (mostly canvas/tabs/chat and workspace-server handlers;
  the earlier DIRTY marker was stale relative to current staging).
- Fix 7 test failures surfaced by the merge:
  1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
     inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
     object failed the type check. Fixed with an explicit return-type
     annotation.
2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
reads deletingIds.size (new multilevel-layout state). Both mock
stores lacked deletingIds; added new Set<string>() to each.
3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
format WorkspaceData (snake_case, with x/y/uptime_seconds). The
store's node.data is now WorkspaceNodeData (camelCase, no wire-
only fields). Rewrote makeWS to produce WorkspaceNodeData and
updated 5 call-site casts. No assertions changed.
4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
that the PR intentionally inverts:
a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
now contains only {"external"}, so the banner is no longer
shown for hermes. Inverted assertion: now pins ABSENCE of
the banner, with a comment noting the inversion.
b. "config.yaml runtime wins over DB" — priority reversed:
DB is now authoritative so the tier-on-node badge matches
the form. Inverted scenario: DB=hermes + yaml=crewai →
       form shows hermes. Switched the test's DB runtime away from
       langgraph because the dropdown collapses langgraph into an
       empty-valued "default" option that would hide the win signal.
- No production code changed — this commit is staging merge + test
realignment only. 953/953 canvas tests pass. tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session's accumulated UX work across frontend and platform. Reviewable
in logical sections — the diff is large but internally cohesive
(each section fixes a gap the next one depends on).
## Chat attachments — user ↔ agent file round trip
- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
25 MB per file, UUID-prefixed storage under
/workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
per-message attachment chips with fetch+blob download (anchor
navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
now consumes attachments via platform helpers.
## Platform attachment helpers (workspace/executor_helpers.py)
Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
multi-modal list (text + image_url) for images. Respects
MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
paths, stages each into chat-uploads/ (download endpoint
whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
so non-root agents can write /workspace without each template
having to chmod in its Dockerfile.
Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.
## Model selection & related platform fixes
- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
silently lost the model choice. Stores into workspace_secrets
(MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
so Restart propagates the stored model to HERMES_DEFAULT_MODEL
without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
(stale) config.yaml — fixes "badge shows T3, form shows T2".
## ChatTab & WebSocket UX fixes
- Send button no longer locks after a dropped TASK_COMPLETE —
`sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
the previous default aborted fetches while the server was still
replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
  the adapter's purpose IS the form; the banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
bare <runtime> for template path (hermes lives at the bare name).
## Org deploy/delete animation (theme-ready CSS)
- styles/theme-tokens.css — design tokens (durations, easings,
colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
(deploying children AND actively-deleting ids) — RF's
authoritative drag lock; useDragHandlers retains a belt-and-
braces check.
- Organ cancel button (red pulse pill on root during deploy)
cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
sibling arrivals coalesce into one fit (previous per-event
fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
timestamp only when event !== null (ignores programmatic
fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
useCanvasViewport (DRY'd 3 copies of split/filter/join).
## Server payload change
- org_import.go WORKSPACE_PROVISIONING broadcast now includes
parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
renders the child at the right parent-nested slot without doing
any absolute-position walk. createWorkspaceTree signature gains
relX, relY alongside absX, absY; both call sites updated.
## Tests
- workspace/tests/test_executor_helpers.py — 11 new cases
covering URI resolution (including traversal rejection),
attached-file extraction (both Part shapes), manifest-only
vs multi-modal content, large-image skip, outbound staging,
dedup, and ensure_workspace_writable (chmod 777 + non-root
tolerance).
- workspace-server chat_files_test.go — upload validation,
Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
plumbing check + round-trip across hermes/langgraph/claude-code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of PR #1981 E2E failures (step 7 timeout):
- hermes-agent install from NousResearch (Node 22 tarball + Python
deps from source) + gateway health wait takes 15-25 min on staging
Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs
2026-04-24 03h–04h): the A2A returns 200 with a JSON-RPC result whose
text is OpenRouter's error format —
{'message': 'Missing Authentication header', 'code': 401}
(integer code, not OpenAI's string 'invalid_api_key'). template-hermes's
derive-provider.sh was picking PROVIDER=openrouter for openai/* models
despite template-hermes#19 (the fix that flips openai/* → custom when
OPENAI_API_KEY is set) having merged at 01:30Z.
Verified via probe workspaces on the staging canary tenant:
probe 1 (just OPENAI_API_KEY): → OpenRouter's 401 shape
probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
→ OpenAI's 401 shape ('code': 'invalid_api_key')
So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or the CP's user-data snapshot
bundles a pre-fix template-hermes. That's a separate follow-up (needs
forced re-clone of /opt/adapter on every workspace boot).
This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.
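A sketch of the pinned bridge env — the HERMES_CUSTOM_* expansions below
are illustrative names for the probe-2 variables, and the base URL is an
assumption:

  SECRETS_JSON=$(jq -n --arg key "$E2E_OPENAI_API_KEY" '{
    OPENAI_API_KEY:            $key,
    HERMES_INFERENCE_PROVIDER: "custom",
    HERMES_CUSTOM_API_KEY:     $key,
    HERMES_CUSTOM_BASE_URL:    "https://api.openai.com/v1"
  }')
  # POSTed in the workspace body, these short-circuit derive-provider.sh.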
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three complementary regression tests for the chain of P0s fixed today.
Each targets a specific bug class that reached production, and will
fire loud if any of them regress.
## 1. E2E A2A assertion enhancements (tests/e2e/test_staging_full_saas.sh)
The existing A2A check looked for "error|exception" in the response text,
which was too coarse to pick out the actual error patterns we hit. Now
matches each known error class individually with a diagnostic fail
message pointing at the exact bug:
- "[hermes-agent error 401]" → hermes #12 (API_SERVER_KEY)
- "hermes-agent unreachable" → gateway process died
- "model_not_found" → hermes #13 (model prefix)
- "Encrypted content is not supported" → hermes #14 (api_mode)
- "Unknown provider" → bridge PROVIDER misconfig
Also asserts the response contains the PONG token the prompt asked for —
catches silent-truncation/echo regressions.
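Each check follows the same fail-loud shape ($A2A_TEXT holds the
extracted reply text; messages illustrative):

  if grep -qi 'model_not_found' <<<"$A2A_TEXT"; then
    echo "FAIL: model_not_found in A2A reply — see hermes #13 (model prefix)" >&2
    exit 1
  fi
  # positive assertion: the reply must contain the token the prompt asked for
  grep -q 'PONG' <<<"$A2A_TEXT" || { echo "FAIL: no PONG in A2A reply" >&2; exit 1; }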
## 2. Hermes install.sh bridge shell harness (tools/test-hermes-bridge.sh)
4 scenarios × 16 assertions, all offline (no docker, no network):
- openai-bridge-happy: OPENAI_API_KEY + openai/gpt-4o →
provider=custom, model="gpt-4o" (prefix stripped),
api_mode=chat_completions
- operator-custom-wins: explicit HERMES_CUSTOM_* → bridge skipped
- openrouter-not-touched: OPENROUTER_API_KEY → provider=openrouter,
slug kept
- non-prefixed-model: bare "gpt-4o" → prefix-strip is a no-op
Runs in <1s, can be wired into template-hermes CI. Pins the exact
config.yaml shape — any drift in derive-provider.sh or the bridge
if-block breaks a test.
## 3. Canvas ConfigTab hermes tests (ConfigTab.hermes.test.tsx)
5 vitest cases covering the #1894 bugs:
- Runtime loads from workspace metadata when config.yaml missing
- "No config.yaml found" red error hidden for hermes
- Hermes info banner shown instead
- Langgraph workspace still sees the red error (regression-guard the
other way)
- config.yaml runtime wins over workspace metadata when present
## Running
bash tools/test-hermes-bridge.sh # 16 assertions
cd canvas && npx vitest run src/components/tabs/__tests__/ConfigTab.hermes.test.tsx # 5 cases
# E2E enhancements ride on the existing staging E2E workflow
## Not yet covered (tracked in #1900)
CP admin delete-tenant EC2 cascade, cp-provisioner instance_id
lookup (#1738), purge audit SQL mismatch (#241), and pq prepared-
statement cache collision (#242). These are in-controlplane-repo
concerns — separate PR with CP-side sqlmock + integration tests.
Closes items in #1900.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today's E2E run 24864011116 timed out at 10 min waiting for workspace
to reach online. Hermes cold-boot measured 13 min on the same day's
apt mirror (my manual repro on 18.217.175.225). The original 10-min
deadline was roughly 2x too tight.
Also: the `failed` branch was a hard fail, but bootstrap-watcher
(cp#245) marks workspace=failed at 5 min if install.sh hasn't
finished yet. Heartbeat then transitions failed → online around
10-13 min. Pre this fix, the E2E bailed at the failed read and
missed the recovery that was seconds away.
## Changes
- Deadline: 10 min → 20 min (hermes worst-case 15 + slack)
- `failed` status: now tolerated as transient; loop logs once then
keeps polling. Only hard-fails at the final deadline.
- Added transition logging (`WS_LAST_STATUS`) so CI output shows
the provisioning → failed → online flow instead of silent polling.
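A sketch of the tolerant loop (the status field name follows the
workspace GET; endpoint and auth shapes are illustrative):

  DEADLINE=$((SECONDS + 20 * 60))
  WS_LAST_STATUS=""
  while (( SECONDS < DEADLINE )); do
    status=$(curl -sf "$TENANT_URL/workspaces/$WS_ID" \
               -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    if [[ "$status" != "$WS_LAST_STATUS" ]]; then
      echo "status transition: ${WS_LAST_STATUS:-<start>} → $status"
      WS_LAST_STATUS=$status
    fi
    if [[ "$status" == "online" ]]; then exit 0; fi
    # 'failed' is transient under cp#245 — logged once above, keep polling
    sleep 10
  done
  echo "FAIL: not online after 20 min (last status: $WS_LAST_STATUS)" >&2
  exit 1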
## Why not fix cp#245 instead
Both should be fixed. cp#245 (bootstrap-watcher deadline) is the
root cause; this E2E fix is the defense-in-depth. When cp#245 lands,
the `failed` transient log will stop firing but the rest of the
logic still protects against other slow-apt-day spikes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five additional breakages surfaced while testing the restored stack
end-to-end (spin up Hermes template → click node → open side panel →
configure secrets → send chat). Each fix is narrowly scoped and has
matching unit or e2e tests so they don't regress.
### 1. SSRF defence blocked loopback A2A on self-hosted Docker
handlers/ssrf.go was rejecting `http://127.0.0.1:<port>` workspace
URLs as loopback, so POST /workspaces/:id/a2a returned 502 on every
Canvas chat send in local-dev. The provisioner on self-hosted Docker
publishes each container's A2A port on 127.0.0.1:<ephemeral> — that's
the only reachable address for the platform-on-host path.
Added `devModeAllowsLoopback()` — allows loopback only when
MOLECULE_ENV ∈ {development, dev}. SaaS (MOLECULE_ENV=production)
continues to block loopback; every other blocked range (metadata
169.254/16, TEST-NET, CGNAT, link-local) stays blocked in dev mode.
Tests: 5 new tests in ssrf_test.go covering dev-mode loopback,
dev-mode short-alias ("dev"), production still blocks loopback,
dev-mode still blocks every other range, and a 9-case table test of
the predicate with case/whitespace/typo variants.
### 2. canvas/src/lib/api.ts: 401 → login redirect broke localhost
Every 401 called `redirectToLogin()` which navigates to
`/cp/auth/login`. That route exists only on SaaS (mounted by the
cp_proxy when CP_UPSTREAM_URL is set). On localhost it 404s — users
landed on a blank "404 page not found" instead of seeing the actual
error they should fix.
Gated the redirect on the SaaS-tenant slug check: on
<slug>.moleculesai.app, redirect unchanged; on any non-SaaS host
(localhost, LAN IP, reserved subdomains like app.moleculesai.app),
throw a real error so the calling component can render a retry
affordance.
Tests: 4 new vitest cases in a dedicated api-401.test.ts (needs
jsdom for window.location.hostname) — SaaS redirects, localhost
throws, LAN hostname throws, reserved apex throws.
### 3. SecretsSection rendered a hardcoded key list
config/secrets-section.tsx shipped a fixed COMMON_KEYS list
(Anthropic / OpenAI / Google / SERP / Model Override) regardless of
what the workspace's template actually needed. A Hermes workspace
declaring MINIMAX_API_KEY in required_env got five irrelevant slots
and nothing for the key it actually needed.
Made the slot list template-driven via a new `requiredEnv?: string[]`
prop passed down from ConfigTab. Added `KNOWN_LABELS` for well-known
names and `humanizeKeyName` to turn arbitrary SCREAMING_SNAKE_CASE
into a readable label (e.g. MINIMAX_API_KEY → "Minimax API Key").
Acronyms (API, URL, ID, SDK, MCP, LLM, AI) stay uppercase. Legacy
fallback preserved when required_env is empty.
Tests: 8 new vitest cases covering known-label lookup, humanise
fallback, acronym preservation, deduplication, and both fallback
paths.
### 4. Confusing placeholder in Required Env Vars field
The TagList in ConfigTab labelled "Required Env Vars (from template)"
is a DECLARATION field — stores variable names. The placeholder
"e.g. CLAUDE_CODE_OAUTH_TOKEN" suggested that, but users naturally
typed the value of their API key into the field instead. The actual
values go in the Secrets section further down the tab.
Relabelled to "Required Env Var Names (from template)", changed the
placeholder to "variable NAME (e.g. ANTHROPIC_API_KEY) — not the
value", and added a one-line helper below pointing to Secrets.
### 5. Agent chat replies rendered 2-3 times
Three delivery paths can fire for a single agent reply — HTTP
response to POST /a2a, A2A_RESPONSE WS event, and a
send_message_to_user WS push. Paths 2↔3 were already guarded by
`sendingFromAPIRef`; path 1 had no guard. Hermes emits both the
reply body AND a send_message_to_user with the same text, which
manifested as duplicate bubbles with identical timestamps.
Added `appendMessageDeduped(prev, msg, windowMs = 3000)` in
chat/types.ts — dedupes on (role, content) within a 3s window.
Threaded into all three setMessages call sites. The window is short
enough that legitimate repeat messages ("hi", "hi") from a real
user/agent a few seconds apart still render.
Tests: 8 new vitest cases covering empty history, different content,
duplicate within window, different roles, window elapsed, stale
match, malformed timestamps, and custom window.
### 6. New end-to-end regression test
tests/e2e/test_dev_mode.sh — 7 HTTP assertions that run against a
live platform with MOLECULE_ENV=development and catch regressions
on all the dev-mode escape hatches in a single pass: AdminAuth
(empty DB + after-token), WorkspaceAuth (/activity, /delegations),
AdminAuth on /approvals/pending, and the populated
/org/templates response. Shellcheck-clean.
### Test sweep
- `go test -race ./internal/handlers/ ./internal/middleware/
./internal/provisioner/` — all pass
- `npx vitest run` in canvas — 922/922 pass (up from 902)
- `shellcheck --severity=warning infra/scripts/setup.sh
tests/e2e/test_dev_mode.sh` — clean
- `bash tests/e2e/test_dev_mode.sh` — 7/7 pass against a live
platform + populated template registry
### SaaS parity
Every relaxation remains conditional on MOLECULE_ENV=development.
Production tenants run MOLECULE_ENV=production (enforced by the
secrets-encryption strict-init path) and always set ADMIN_TOKEN, so
none of these code paths fire on hosted SaaS. Behaviour on real
tenants is byte-for-byte unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The E2E posts a bare "gpt-4o" as the workspace model. Hermes
template's derive-provider.sh parses the slug PREFIX (before the
slash) to set HERMES_INFERENCE_PROVIDER at install time. With no
prefix, provider falls back to hermes's auto-detect, which picks
the compiled-in Anthropic default. Hermes-agent then tries the
Anthropic API with the OpenAI key the E2E passed in SECRETS_JSON
and returns 401 "Invalid API key" at step 8/11 (A2A call).
Same trap PR #1714 fixed for the canvas Create flow. The E2E
was quietly broken on the same vector — it masked before today
because workspaces never reached "online" (pre-#231 install.sh
hook missing on staging; staging now deploys #231 via CP #236).
Fix: pin MODEL_SLUG="openai/gpt-4o" since the E2E's secret is
always the OpenAI key. Non-hermes runtimes ignore the prefix.
Now that both layers are fixed (install.sh runs AND the slug
steers hermes to OpenAI), the E2E should reach step 11/11.
Evidence from run 24822173171 attempt 2 (post-CP-#236 deploy):
07:55:25 ✅ CP reachable
07:57:28 ✅ Tenant provisioning complete (2:03, canary)
08:04:56 ✅ Workspace 52107c1a online (7:28, install.sh ran!)
08:05:06 ✅ Workspace 34a286df online
08:05:06 ❌ A2A 401 — hermes tried Anthropic with OpenAI key
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Section 10's delegation call is a raw curl (not tenant_call, because
it carries an additional X-Source-Workspace-Id). It was missing
X-Molecule-Org-Id, which TenantGuard requires — so the tenant 404'd
every delegation probe despite section 8's A2A call (via tenant_call)
working correctly.
Repro: staging run 2026-04-21T17:40Z had section 8 green (PONG)
and section 10 red (rc=22) on the same workspace. Only difference
was the missing header.
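Fixed probe shape — both headers tenant_call sends, plus the
delegation-specific one:

  curl -s --fail-with-body -X POST "$TENANT_URL/workspaces/$CHILD_ID/a2a" \
    -H "Authorization: Bearer $TENANT_TOKEN" \
    -H "X-Molecule-Org-Id: $ORG_ID" \
    -H "X-Source-Workspace-Id: $PARENT_ID" \
    -H "Content-Type: application/json" \
    -d "$A2A_PAYLOAD"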
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
workspace/config.py:258 reads MODEL_PROVIDER as the full model string
(format 'provider:model', e.g. 'anthropic:claude-opus-4-7'). The prior
bare 'openai' got parsed as the model name → 404 model_not_found.
Use 'openai:gpt-4o' and also set OPENAI_BASE_URL to api.openai.com
(the default was openrouter.ai, which expects a different key format).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hermes's provider resolver checks ANTHROPIC_API_KEY first (resolution
order puts anthropic before openai). Without MODEL_PROVIDER=openai
explicitly set, Hermes defaults to claude-sonnet-4-6 against the
OpenAI endpoint and 404s with model_not_found.
Staging E2E run 2026-04-21T17:24Z hit this after every earlier fix
landed (workspace online, A2A ready) — last remaining blocker for
the happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace runtimes (hermes, langgraph, etc.) crash at boot with
'No provider API key found' when no ANTHROPIC_API_KEY / OPENAI_API_KEY /
etc. is set. Harness previously sent no secrets → workspace sat in
provisioning for 10 min → harness timed out.
Console log from staging run 2026-04-21T17:08Z showed the exact crash:
ValueError: No Hermes provider API key found. Set any one of:
ANTHROPIC_API_KEY, HERMES_API_KEY, NOUS_API_KEY, OPENROUTER_API_KEY,
OPENAI_API_KEY, ...
Read E2E_OPENAI_API_KEY from env and inject into both parent and
child workspace POST bodies via the secrets field (persists as
workspace_secret, materialises into container env). Empty key
falls through — dev can still run smoke tests, workspace just
won't reach online.
For CI, a new repo secret MOLECULE_STAGING_OPENAI_KEY needs to be
added and passed as E2E_OPENAI_API_KEY in the workflow env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TenantGuard middleware on the tenant platform returns 404 (not 403,
by design — avoid leaking tenant existence to org scanners) when
requests lack X-Molecule-Org-Id matching MOLECULE_ORG_ID. Harness
hit this on POST /workspaces (section 5) despite having a valid
Authorization bearer.
- Capture org_id from admin-create response
- Send X-Molecule-Org-Id on every tenant_call
Confirmed via manual repro 2026-04-21T14:56Z: curl with Bearer but
no org-id header → 404; with both headers → expected route reached.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous hardcode `$SLUG.moleculesai.app` only matched prod. Staging
tenants live at `$SLUG.staging.moleculesai.app`, so the harness hit
DNS for a nonexistent host and timed out at section 4 even after
provisioning succeeded.
Derive from CP URL: api.X → X, staging-api.X → staging.X. Override
via MOLECULE_TENANT_DOMAIN for self-hosted setups.
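A sketch of the derivation (override checked first; the unrecognized-host
branch is illustrative):

  derive_tenant_domain() {
    if [[ -n "${MOLECULE_TENANT_DOMAIN:-}" ]]; then
      echo "$MOLECULE_TENANT_DOMAIN"; return
    fi
    local cp_host=${CP_URL#*://}; cp_host=${cp_host%%/*}
    case "$cp_host" in
      staging-api.*) echo "staging.${cp_host#staging-api.}" ;;  # staging-api.X → staging.X
      api.*)         echo "${cp_host#api.}" ;;                  # api.X → X
      *)             echo "unrecognized CP host: $cp_host" >&2; return 1 ;;
    esac
  }
  TENANT_HOST="$SLUG.$(derive_tenant_domain)"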
Confirmed gap on manual run 2026-04-21T14:40Z: section 2 passed in
2min but section 4 timed out at 3min on the wrong hostname.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/cp/admin/orgs exposes `instance_status` (COALESCE'd from
org_instances.status), NOT a top-level `status` field. The harness
polled the wrong field and always read empty → timed out at 15min
on a tenant that had actually provisioned successfully (confirmed
2026-04-21T14:22Z: EC2 launched, canary ok, but harness never saw
status=running).
No code change to the admin API — the field has never been named
`status`. The harness simply polled a field that doesn't exist (the
Go struct hasn't changed; only the sh/py polling was wrong).
Now the harness correctly reads `instance_status` and the main
provision poll loop terminates on the expected transition.
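The corrected poll, sketched — a top-level array of orgs with slug +
instance_status is assumed from the description, and the auth header is
illustrative:

  status=$(curl -sf "$CP_URL/cp/admin/orgs" \
             -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" |
           jq -r --arg slug "$ORG_SLUG" \
             '.[] | select(.slug == $slug) | .instance_status // empty')
  if [[ "$status" == "running" ]]; then echo "tenant provisioned"; fi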
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verified against live staging: the admin endpoint returns 400 'confirm
field must equal the URL slug' when the body key is 'confirm_token'.
Every workflow's safety-net teardown step + the main harness + the
Playwright teardown all had the wrong key. Fixed all six call sites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduces required secret surface from 2 (session cookie + admin token)
to 1 (admin token). Pairs with molecule-controlplane#202 which adds:
- POST /cp/admin/orgs — server-to-server org creation
- GET /cp/admin/orgs/:slug/admin-token — per-tenant bearer fetch
With those endpoints live, CI doesn't need to scrape a browser WorkOS
session cookie. CP admin bearer (Railway CP_ADMIN_API_TOKEN) drives
provision + tenant-token retrieval + teardown through a single
credential.
Changes
-------
test_staging_full_saas.sh: admin bearer for provision/teardown,
fetched per-tenant token drives all tenant API calls. Added
E2E_INTENTIONAL_FAILURE=1 toggle that poisons the tenant token
after provisioning so the teardown path gets exercised when the
happy-path isn't.
canvas/e2e/staging-setup.ts: same pivot; exports STAGING_TENANT_TOKEN
instead of STAGING_SESSION_COOKIE.
canvas/e2e/staging-tabs.spec.ts: context.setExtraHTTPHeaders with
Authorization: Bearer on every page request, no cookie handling.
All three workflows (e2e-staging-saas, canary-staging,
e2e-staging-canvas): drop MOLECULE_STAGING_SESSION_COOKIE env +
verification step. One secret to set.
NEW e2e-staging-sanity.yml: weekly Mon 06:00 UTC. Runs the harness
with E2E_INTENTIONAL_FAILURE=1 and inverts the pass condition —
rc=1 is green, rc=0 (unexpected success) or rc=4 (leak) open a
priority-high issue labelled e2e-safety-net. This is the
answer to 'how do we know the teardown path still works when
nothing else has failed recently.'
STAGING_SAAS_E2E.md refreshed: single-secret setup, sanity workflow
documented, canvas workflow added to the coverage matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions on top of 187a9bf:
1. Canary (.github/workflows/canary-staging.yml)
30-min cron that runs the full-SaaS harness in E2E_MODE=canary: one
hermes workspace + one A2A PONG + teardown. ~8-min wall clock vs
~20-min for the full run.
Alerting is self-contained: opens a single 'Canary failing' issue on
first failure, comments on subsequent failures (no issue spam),
auto-closes the issue on the next green run. Labels: canary-staging,
bug. Safety-net teardown step sweeps e2e-YYYYMMDD-canary-* orgs
tagged today so a runner cancel can't leak EC2.
2. Canvas Playwright (canvas/e2e/staging-*.ts + playwright.staging.config.ts
+ .github/workflows/e2e-staging-canvas.yml)
staging-setup.ts provisions a fresh org + hermes workspace (same
lifecycle as the bash harness, just in TypeScript). staging-tabs.spec.ts
clicks through all 13 workspace-panel tabs (chat, activity, details,
skills, terminal, config, schedule, channels, files, memory, traces,
events, audit) and asserts each renders without crashing and without
'Failed to load' error toasts. Known SaaS gaps (Files empty, Terminal
disconnects, Peers 401) are documented in #1369 and whitelisted so
they don't fail the test — the gate is 'no hard crash', not 'no
issues'.
staging-teardown.ts deletes the org via DELETE /cp/admin/tenants/:slug.
playwright.staging.config.ts separates staging from local tests so
pnpm test in dev doesn't try to provision against staging. Retries=2
and timeouts are longer; workers=1 because the setup provisions one
shared workspace. Workflow uploads HTML report + screenshots on
failure for 14 days.
3. Delegation mechanics (tests/e2e/test_staging_full_saas.sh section 10)
Parent → child proxy test: POST /workspaces/CHILD/a2a with
X-Source-Workspace-Id=PARENT and verify the child responds + child
activity log captures PARENT as source. Intentionally LLM-free: the
mechanics regression is what matters; prompt-driven delegation
correctness belongs in canvas-driven tests.
Also reorders teardown step to 11/11 since delegation is 10/11.
Mode gating:
E2E_MODE=canary -> skips child workspace, HMA memory, peers,
activity, delegation (steps 6, 9, 10 no-op). Full-lifecycle still
runs every piece. Validated both paths via 'bash -n' syntax check
after each edit.
Secrets requirement unchanged (same two secrets as 187a9bf):
MOLECULE_STAGING_SESSION_COOKIE, MOLECULE_STAGING_ADMIN_TOKEN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dedicated CI/CD lane that exercises the whole SaaS cross-EC2 shape end to
end, against live staging:
1. Accept terms / create org (POST /cp/orgs) — catches ToS gate, slug
validation, billing/quota, member insert regressions.
2. Wait for tenant EC2 + cloudflared tunnel + TLS propagation (up to
15 min cold).
3. Provision a parent + child workspace via the tenant URL.
4. Wait both online (exercises the SaaS register + token bootstrap
flow fixed in #1364).
5. A2A round-trip on parent — validates the full LLM loop (MCP tools,
provider auth, JSON-RPC response shape, proxy SSRF gate).
6. HMA memory write + read — validates awareness namespace + scope
routing.
7. Peers + activity smoke — route-registration regression guard.
8. Teardown via DELETE /cp/admin/tenants/:slug + leak assertion — a
leaked org at teardown fails CI with exit 4.
Why a dedicated workflow (not folded into ci.yml):
- ~20 min wall clock per run (EC2 boot is the long pole). Too slow
for every PR push.
- Needs its own concurrency group (staging has an org-create quota
and two overlapping runs would race on slug prefix).
- Distinct secret surface (session cookie + admin bearer) — keep it
off PR jobs that don't need them.
Triggers: push to main (provisioning-critical paths only), PRs on the
same paths, manual workflow_dispatch (with runtime + keep_org inputs),
and 07:00 UTC nightly cron for drift detection.
Belt-and-braces teardown: the script installs an EXIT trap, and the
workflow has an always()-step that greps e2e-YYYYMMDD-* orgs created
today and force-deletes them via the idempotent admin endpoint. Covers
the case where GH cancels the runner before the trap fires.
Docs: tests/e2e/STAGING_SAAS_E2E.md — what's covered, how to provision
the two required secrets, local-dev notes, cost (~$0.007/run), known
gaps (canvas UI + delegation + claude-code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #966 intentionally stripped current_task, last_sample_error, and
workspace_dir from the public GET /workspaces/:id response to avoid
leaking task bodies to anyone with a workspace bearer. The E2E smoke
test hadn't caught up — it was still asserting "current_task":"..."
on the single-workspace GET, which made every post-#966 CI run fail
with '60 passed, 2 failed'.
Swap the per-workspace asserts to check active_tasks (still exposed,
canonical busy signal) and keep the list-endpoint check that proves
admin-auth'd callers still see current_task end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The migration SQL is read as raw SQL (not through Go fmt.Sprintf),
so %% is two parameters, not an escaped percent. Postgres RAISE
uses single % for parameter substitution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On re-registration (workspace already has tokens), the register endpoint
doesn't issue a new token — it returns the existing one in the response
or omits it. The e2e_extract_token helper returns empty in that case.
Fall back to the per-workspace token we already minted via test-token.
AdminAuth (admin token) gates workspace CRUD operations.
WorkspaceAuth (per-workspace token) gates register, heartbeat, discover.
The test now mints a workspace-specific token via test-token endpoint
for each workspace before calling register.
After the first workspace is created and the test-token endpoint mints
a bearer, HasAnyLiveTokenGlobal returns true. All subsequent calls to
AdminAuth-gated routes (workspace CRUD, events, bundles, etc.) need the
token. Added acurl() helper that attaches the token when available.
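A sketch of the helper (internals illustrative):

  acurl() {
    if [[ -n "${TOKEN:-}" ]]; then
      curl -H "Authorization: Bearer $TOKEN" "$@"
    else
      curl "$@"    # pre-token phase — AdminAuth fails open with no live tokens
    fi
  }
  acurl -sf "$BASE_URL/workspaces"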
PR #167 gated /events and /bundles/export/:id behind AdminAuth. The e2e
script's 3 calls to these routes were unauthenticated and broke when the
runner picked them up for the first time on PR #186 (self-hosted runner
migration). Same admin-gate contract, same fix pattern as the #99/#110
e2e hotfixes.
POST /bundles/import is left unauthenticated because by that point in
the script both workspaces have been deleted and #110 revoked their
tokens, so HasAnyLiveTokenGlobal=0 and AdminAuth fails-open.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Delete handler marked workspaces 'removed' but never touched
workspace_auth_tokens. That left stale live tokens in the table, so
HasAnyLiveTokenGlobal stayed true after the last workspace was deleted.
AdminAuth then blocked the unauthenticated GET /workspaces in the E2E
count-zero assertion with 401, and the previous commit worked around it
by commenting out the assertion.
This commit fixes the root cause:
- workspace.go Delete: batch-revoke auth tokens for all deleted
workspace IDs (including descendants) immediately after the canvas_layouts
clean-up, using the same pq.Array pattern as the status update.
- workspace_test.go TestWorkspaceDelete_CascadeWithChildren: add the
expected UPDATE workspace_auth_tokens SET revoked_at sqlmock expectation.
- tests/e2e/test_api.sh: restore the count=0 post-delete assertion
(now passes because tokens are revoked → fail-open), capture NEW_TOKEN
from the re-imported workspace registration for the final cleanup call
(SUM_TOKEN is revoked after SUM_ID is deleted).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Soft-delete leaves workspace_auth_tokens rows alive, so HasAnyLiveTokenGlobal
stays non-zero and admin-auth 401s an unauth GET /workspaces. The assertion
was verifying deletion, not auth; the bundle round-trip below still covers
the deletion path end-to-end.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C1 fix (#99) moved GET /workspaces behind AdminAuth. Three late-script
calls that run after tokens exist now include Authorization headers;
the post-delete-all call stays anonymous since revoked tokens trigger
the no-live-token fail-open path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).
- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves #17.
Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id
or name starts with known test placeholder prefixes (aaaaaaaa-, etc.)
and force-removes the paired Docker container. Documented in
tests/README.md.
Part B: add a pre-flight check in provisionWorkspace() — when neither a
template path nor in-memory configFiles supplies config.yaml, probe the
existing named volume via a throwaway alpine container. If the volume
lacks config.yaml, mark the workspace status='failed' with a clear
last_sample_error instead of handing it to Docker's unless-stopped
restart policy (which otherwise loops forever on FileNotFoundError).
New pure helper provisioner.ValidateConfigSource + unit tests.
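The probe amounts to one docker invocation (Part B is implemented in Go
inside provisionWorkspace; the equivalent shell, as a sketch):

  if ! docker run --rm -v "$VOLUME_NAME:/w:ro" alpine test -f /w/config.yaml; then
    echo "volume $VOLUME_NAME lacks config.yaml — mark workspace failed" >&2
  fi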
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>