GITHUB_TOKEN-initiated merges suppress the downstream `push` event on
main per GitHub's documented limitation:
https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow
Result before this fix: every staging→main promote landed silently;
publish-workspace-server-image, canary-verify, and redeploy-tenants-on-main
all stayed dark. The polling tail was the SOLE cascade trigger; if it
ever hit its 30-minute timeout, the chain deadlocked invisibly.
Symptom (from the issue body, 2026-04-30):
| Time     | Event                              | Triggered? |
|----------|------------------------------------|------------|
| 05:48:04 | Promote PR #2352 merged (c140ad28) | No         |
| 06:07:29 | Promote PR #2356 merged (5973c9bd) | No         |
Fix: mint the molecule-ai App token BEFORE the promote-PR step and
hand it to the auto-merge call. App-token-initiated merges DO trigger
downstream workflow_run cascades.
The polling tail stays as defense-in-depth (with comments updated):
once we've observed >=10 successful natural cascades it can be dropped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three loading-state divs were missing the role/aria pattern that
TemplatePalette.tsx and EmptyState.tsx already follow. Screen readers
get no signal that the page is waiting:
- canvas/src/app/page.tsx — full-screen "Loading canvas..." while
the websocket hydrates. First paint of the entire app.
- canvas/src/components/settings/TokensTab.tsx — "Loading tokens..."
- canvas/src/components/settings/OrgTokensTab.tsx — "Loading keys..."
Add role="status" + aria-live="polite" to the wrapping div so
assistive tech announces the wait and the eventual transition.
Visual rendering unchanged.
The tier system in CreateWorkspaceDialog and design-tokens is
T1 Sandboxed / T2 Standard / T3 Privileged / T4 Full Access, but two
chrome surfaces (plus one doc comment in tenant.ts) still carried the
older 3-tier mapping with T3 as "Full Access":
- Legend (bottom-left chrome on every canvas page) listed only T1/T2/T3
and called T3 "Full Access". On a SaaS tenant the actual workspace
badges render T4 (in amber/warm) — there was no T4 entry in the
legend at all, so the user sees an undocumented orange badge.
- ConfigTab tier dropdown (per-workspace settings → Sandboxing) had no
T4 option at all and called T3 "Full Access". So an existing T4
workspace would show "T3 — Full Access" as the selected option,
silently downgrading the displayed tier on the settings panel.
- tenant.ts isSaaSTenant() doc comment claimed SaaS workspaces are
"inherently T3 Full Access" — wrong on both the number and the lock
rationale (SaaS hides T1/T2/T3, not just T1/T2).
Fix:
- Legend now imports TIER_CONFIG and renders all four tiers
(Sandboxed/Standard/Privileged/Full Access) using the same color
swatches as the badges on workspace cards. Eliminates the previous
drift where Legend's hardcoded sky/violet/warm chips didn't match
the gray/sky/violet/amber actually rendered on nodes.
- ConfigTab adds the missing T4 — Full Access option and renames T3
to Privileged.
- tenant.ts comment updated to match the picker's actual hide list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cascade-list-vs-manifest drift gate (PR #2556's behavior-based
test) caught my previous-commit cascade additions as 'extra-in-cascade'.
The manifest is the source of truth, so this commit restores the
templates there.
All 5 templates have successful publish-image runs in the past 24h
(verified before the cascade fix), and continuous-synth-e2e defaults
to langgraph as its primary canary. None deprecated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PR #2536 cascade prune ('deprecated, no shipping images') was
empirically wrong. Re-confirmed 2026-05-03:
- continuous-synth-e2e.yml defaults to langgraph as its primary canary
- All 5 'deprecated' templates have successful publish-image runs in
the past 24h: langgraph, crewai, autogen, deepagents, gemini-cli
Symptom this fixes — issue #2566 (priority-high, failing 36+h):
Synthetic E2E (staging): langgraph adapter A2A failure
'Received Message object in task mode' — failing for >36h
Today at 11:06 commit e1628c4 fixed the underlying a2a-sdk strict-mode
issue in workspace/a2a_executor.py. publish-runtime fired at 11:13 and
cascaded — but only to claude-code, hermes, openclaw, codex. langgraph
was excluded by the prune, so its image stayed on the broken runtime
and the synth E2E (which defaults to langgraph) kept failing despite
the fix being live in PyPI.
After this lands + the next runtime publish fires, langgraph image
re-bakes with the fix and synth-E2E goes green.
Test plan:
- [x] yaml-validate the workflow
- [ ] After merge, watch publish-runtime cascade to all 9 templates
- [ ] Confirm langgraph publish-image fires + succeeds
- [ ] Confirm next continuous-synth-e2e run goes green
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original script hardcoded `MODEL_SLUG="openai/gpt-4o"` (slash) and
claimed "non-hermes runtimes ignore the prefix" — wrong for langgraph,
which delegates model resolution to langchain's `init_chat_model`. That
function requires `<provider>:<model>` (colon) and treats slash-form as
OpenRouter routing, falling through without auth even when
OPENAI_API_KEY is set.
Surfaced 2026-05-03 after the a2a-sdk v1 contract bugs (PR
#2558+#2563+#2567) cleared the masking layers — synth-E2E firing
2026-05-03T12:14 returned a properly-shaped task with state=failed +
"Could not resolve authentication method" inside the agent body.
continuous-synth-e2e.yml defaults E2E_RUNTIME=langgraph for the cron,
so every firing hit this. Hermes still gets the slash-form it
needs; claude-code uses the entry-id pattern.
Adds E2E_MODEL_SLUG override for operator-dispatched runs that want
to pin a specific slug.
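The slug-form split can be sketched as a small normalizer (hypothetical
helper for illustration only; the actual fix just sets the env var per
runtime in the workflow):

```python
def normalize_model_slug(slug: str, runtime: str) -> str:
    """Hypothetical sketch: langgraph delegates to langchain's
    init_chat_model, which expects '<provider>:<model>' (colon) and
    treats slash-form as OpenRouter-style routing; hermes keeps the
    slash form it needs."""
    if runtime == "hermes":
        return slug  # hermes wants the '<provider>/<model>' form as-is
    # convert 'openai/gpt-4o' -> 'openai:gpt-4o' for langchain resolution
    if "/" in slug and ":" not in slug:
        provider, model = slug.split("/", 1)
        return f"{provider}:{model}"
    return slug
```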
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workflow_dispatch input default and the workflow_run env fallback
both pointed at 'hongmingwang', which doesn't match any current prod
tenant (slugs are: hongming, chloe-dong, reno-stars). CP silently
skipped the missing canary and put every tenant in batch-1 in parallel,
defeating the canary-first soak gate that exists to catch image-boot
regressions before they hit the whole fleet.
Concrete example from today's c0838d6 redeploy at 11:53Z (run 25278434388):
the dispatched body was `{"target_tag":"staging-c0838d6","canary_slug":"hongmingwang",...}`
and the CP response showed all 3 tenants in `"phase":"batch-1"` — no
soak, no canary. The deploy happened to be safe, but a broken image
would have hit hongming + chloe-dong + reno-stars simultaneously.
Fixed in three places: the runtime ordering comment, the
workflow_dispatch default, and the env fallback used by the
workflow_run trigger. Comment documents the rationale so the next
slug rename doesn't silently regress this again.
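One way to keep a future slug rename from regressing this silently is
to validate the dispatched canary against the live tenant list; a
hedged sketch (function name and shape hypothetical, not the actual
workflow code):

```python
def check_canary_slug(canary_slug: str, tenant_slugs: list[str]) -> str:
    """Fail loudly when the canary slug matches no live tenant,
    instead of letting CP silently skip the soak phase and put the
    whole fleet in batch-1."""
    if canary_slug not in tenant_slugs:
        raise ValueError(
            f"canary_slug {canary_slug!r} matches no tenant in "
            f"{tenant_slugs}; refusing a soak-less redeploy"
        )
    return canary_slug
```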
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synth-E2E (#2342) provisions a langgraph tenant whose default
model `openai:gpt-4.1-mini` requires OPENAI_API_KEY for the first LLM
call. Sibling workflows already wire this:
- e2e-staging-saas.yml:89
- canary-staging.yml:63
continuous-synth-e2e.yml just forgot. Result: tenant boots, accepts
a2a messages, then returns:
Agent error: "Could not resolve authentication method. Expected
either api_key or auth_token to be set."
This was masked since 2026-04-29 (workflow creation) by a2a-sdk v0→v1
contract violations — PR #2558 (Task-enqueue) and #2563
(TaskUpdater.complete/failed terminal events) cleared those, exposing
the underlying auth gap on the synth-E2E firing at 11:39 UTC today.
The script tests/e2e/test_staging_full_saas.sh:325 already reads
E2E_OPENAI_API_KEY and persists it as a workspace_secret on tenant
create — only the workflow wiring was missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The redeploy-tenants-on-staging soft-warn filter and the
sweep-stale-e2e-orgs janitor both hardcoded `^e2e-` to identify
ephemeral test tenants. Runtime-test harness fixtures (RFC #2251)
mint slugs prefixed with `rt-e2e-`, which neither matcher recognized.
Concrete impact observed today:
- Two `rt-e2e-v{5,6}-*` tenants left orphaned 8h on staging
(sweep-stale-e2e-orgs ignored them).
- On the next staging redeploy their phantom EC2s returned
`InvalidInstanceId: Instances not in a valid state for account`
from SSM SendCommand → CP returned HTTP 500 + ok=false.
- The redeploy soft-warn missed them too, so the workflow went
red, which broke the auto-promote-staging chain feeding the
canvas warm-paper rollout to prod.
Fix: switch both matchers to recognize the alternation
`^(e2e-|rt-e2e-)`. Long-lived prefixes (demo-prep, dryrun-*, dryrun2-*)
remain non-ephemeral and continue to hard-fail. Comment documents
the source-of-truth list and the cross-file invariant.
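The widened matcher behaves like this (Python sketch of the regex; the
actual matchers live in the two shell scripts):

```python
import re

# Ephemeral test-tenant prefixes recognized by both janitors.
# Long-lived prefixes (demo-prep, dryrun-*, dryrun2-*) intentionally
# do NOT match and keep hard-failing.
EPHEMERAL_RE = re.compile(r"^(e2e-|rt-e2e-)")

def is_ephemeral(slug: str) -> bool:
    return bool(EPHEMERAL_RE.match(slug))
```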
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2555 (Tailwind v4 + warm-paper) migrated all canvas chrome (toolbar,
side panel, modal layer) to semantic tokens, but missed the React Flow
viewport's `colorMode="dark"` literal — and two paired hardcoded dark
literals on the Background dot color and MiniMap mask. Net result on
prod: the user picked light mode, the toolbar flipped warm-paper, but
the canvas backplate, edges, dots, controls, and minimap stayed black —
visibly half-themed.
Three coordinated fixes inside the canvas viewport:
- ReactFlow `colorMode={resolvedTheme}` so the library's own dark/light
styles flip with the user's choice.
- Background dot color picks the line-soft tone in light mode (zinc-800
was invisible-on-cream).
- MiniMap maskColor warm-tints the off-viewport dim so the unselected
region doesn't render as a hard black bar over warm-paper.
Verification:
- `npx tsc --noEmit` clean
- `npx vitest run` 188/188 pass
- (will browser-verify post-redeploy on hongming.moleculesai.app)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2558 enqueued a Task at the start of new requests so the v1 SDK
would accept TaskUpdater.start_work() — fix#1 of the v0→v1 migration
gap (PR #2170). But after Task is enqueued, the executor enters
"task mode" and the SDK rejects raw Message enqueues at the terminal
step:
{"code":-32603,"message":"Received Message object in task mode.
Use TaskStatusUpdateEvent or TaskArtifactUpdateEvent instead."}
Synth-E2E 2026-05-03T11:00:34Z surfaced this on the very first run
after the prior fix cascaded. Validation site is the same
a2a/server/agent_execution/active_task.py — the framework's job is
to enforce the v1 invariant; we're catching up to it.
The fix routes both terminal events through TaskUpdater helpers:
- success: updater.complete(message=msg) wraps in
TaskStatusUpdateEvent(state=COMPLETED, final=True)
- error: updater.failed(message=...) wraps in
TaskStatusUpdateEvent(state=FAILED, final=True)
Both helpers exist in a2a-sdk ≥ 1.0; verified via
TaskUpdater.complete signature.
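The routing change, sketched against a stubbed updater (StubUpdater is
a stand-in for illustration, not the a2a-sdk TaskUpdater class):

```python
class StubUpdater:
    """Stand-in recording terminal calls, mirroring the conftest stub."""
    def __init__(self):
        self.calls = []
    def complete(self, message=None):
        self.calls.append(("complete", message))
    def failed(self, message=None):
        self.calls.append(("failed", message))

def finish_task(updater, result, error=None):
    # Route BOTH terminal events through TaskUpdater helpers; a raw
    # Message enqueue here trips "Received Message object in task mode"
    # once the executor has entered task mode.
    if error is not None:
        updater.failed(message=error)   # -> TaskStatusUpdateEvent(FAILED, final=True)
    else:
        updater.complete(message=result)  # -> TaskStatusUpdateEvent(COMPLETED, final=True)
```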
Tests:
- conftest TaskUpdater stub now records complete/failed calls AND
routes the message back through event_queue.enqueue_event so the
~20 legacy tests asserting on enqueue_event keep working
- 2 new regression tests pin the contract:
* test_terminal_success_routes_via_updater_complete
* test_terminal_error_routes_via_updater_failed
- Both NEW tests verified to FAIL on staging-baseline (without this
fix) and PASS with it — they'd catch the regression before staging
if the wheel-smoke gate covered task-mode terminal events too
(separate yak-shave for #131 follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the recurrence path of PR #2556. The data fix realigned 8→4
templates in publish-runtime.yml's TEMPLATES variable, but the
underlying drift hazard was unguarded — the next manifest change
could silently leave cascade out of sync again.
This gate fails any PR that changes manifest.json or
publish-runtime.yml in a way that makes the cascade list diverge
from manifest workspace_templates (suffix-stripped). Either
direction is caught:
missing-from-cascade templates that won't auto-rebuild on a new
wheel publish (the codex-stuck-on-stale-runtime
bug class — PR #2512 added codex to manifest,
cascade wasn't updated, codex stayed pinned to
its last-built runtime version for weeks).
extra-in-cascade cascade dispatches to deprecated templates
(the wasted-API-calls + dead-CI-noise class —
PR #2536 pruned 5 templates from manifest;
cascade kept dispatching to all 8 until
PR #2556).
Triggers narrowly: only on PRs that touch manifest.json,
publish-runtime.yml, or the script itself. Fast (single grep+sed+comm
pipeline, no Go build).
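The gate's comparison boils down to a symmetric set difference; a
Python equivalent of the grep+sed+comm pipeline (the real gate is a
shell script, names here illustrative):

```python
def cascade_drift(cascade: set[str], manifest_templates: set[str]) -> dict:
    """Return both drift directions the gate fails on: templates the
    cascade misses (won't auto-rebuild) and templates it dispatches
    to that the manifest no longer supports."""
    return {
        "missing_from_cascade": sorted(manifest_templates - cascade),
        "extra_in_cascade": sorted(cascade - manifest_templates),
    }
```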
Surfaced during the RFC #388 prior-art audit; folded in as the
structural follow-up that the data fix in PR #2556 promised.
Self-tested both failure modes locally before commit:
- Drop codex from cascade → script fails with "MISSING: codex"
- Add langgraph to cascade → script fails with "EXTRA: langgraph"
Refs: https://github.com/Molecule-AI/molecule-controlplane/issues/388
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Boot smoke (#2275) exercises executor.execute() against stub deps
and never hits the real provider, so missing auth env is not a real
blocker. Without this bypass, every adapter that introduces a new
auth env var must be mirrored into molecule-ci's fake-env list — a
maintenance treadmill that just bit hermes-template:
- 2026-05-03 09:47 UTC: hermes publish-image smoke fails on
HERMES_API_KEY preflight (workflow injects CLAUDE_CODE_OAUTH_TOKEN,
ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY but not
HERMES_API_KEY or OPENROUTER_API_KEY). Failed for two cycles
before being noticed.
The bypass demotes Required-env failures to warnings when
MOLECULE_SMOKE_MODE is truthy, so the unset env stays visible in
the boot log without blocking. Production paths are unchanged
(env unset → fail).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a-sdk ≥ 1.0 raises InvalidAgentResponseError when an executor publishes a
TaskStatusUpdateEvent (e.g. via TaskUpdater.start_work) before any Task
event for fresh requests. The framework only auto-creates the Task on
continuation messages (existing task_id resolves via task_manager.get_task);
new requests leave _task_created unset and the SDK validation at
a2a/server/agent_execution/active_task.py rejects the first status update.
PR #2170 migrated the executor surface to v1 but missed this contract. The
synthetic E2E gate caught it on every staging run since (~1 week silent
fail) with:
{"jsonrpc":"2.0","id":"e2e-msg-1","error":{"code":-32603,
"message":"Agent should enqueue Task before TaskStatusUpdateEvent
event","data":null}}
The fix enqueues a Task(state=SUBMITTED) before the TaskUpdater is
constructed, gated on `context.current_task is None` so continuation
messages don't double-enqueue (which the SDK logs about but doesn't reject).
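The guard can be sketched like this (types stubbed for the sketch; the
real executor uses a2a-sdk's Task/TaskState and the request context):

```python
class FakeContext:
    """Stand-in for the SDK request context."""
    def __init__(self, current_task=None):
        self.current_task = current_task

def ensure_task(context, enqueue, make_task):
    """Enqueue a Task(state=SUBMITTED) for fresh requests only; a
    continuation (context.current_task already set) must NOT
    double-enqueue."""
    if context.current_task is None:
        task = make_task("submitted")
        enqueue(task)  # first event MUST be a Task under the v1 contract
        return task
    return context.current_task
```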
Tests:
- test_first_event_is_task_for_new_request — pins the new-request path:
first enqueue must be a Task with the expected id/context_id
- test_no_task_enqueue_on_continuation — pins the continuation path: when
context.current_task is set, the executor must NOT re-enqueue Task
- conftest: stub Task / TaskStatus / TaskState in the mocked a2a.types
module so the import inside the executor resolves under unit tests
google-adk adapter does not have this bug — its execute() only emits
Message events, not TaskStatusUpdateEvent. Its cancel() does emit one,
but cancel is rarely-invoked and out of scope for this fix.
Live verification path: this PR's merge → publish-runtime cascade → next
synth-E2E firing should go green at step "8/11 Sending A2A message to
parent — expecting agent response".
CP's deprovision flow calls Secrets.DeleteSecret() (provisioner/ec2.go:806)
but only when the deprovision runs to completion. Crashed provisions and
incomplete teardowns leak the per-tenant `molecule/tenant/<org_id>/bootstrap`
secret. At ~$0.40/secret/month, ~45 leaked secrets surfaced as ~$19/month
on the AWS cost dashboard.
The tenant_resources audit table (mig 024) tracks four kinds today —
CloudflareTunnel, CloudflareDNS, EC2Instance, SecurityGroup — and the
existing reconciler doesn't catch Secrets Manager orphans. The proper fix
(KindSecretsManagerSecret + recorder hook + reconciler enumerator) is filed
as a follow-up controlplane issue. This sweeper is the immediate stopgap.
Mirrors the shape of sweep-cf-tunnels.sh:
- Hourly schedule offset (:30, between sweep-cf-orphans :15 and
sweep-cf-tunnels :45) so the three janitors don't burst CP admin
at the same minute.
- 24h grace window — never deletes a secret younger than the
  provisioning roundtrip, so an in-flight provision can't have its
  secret deleted out from under it.
- MAX_DELETE_PCT=50 default (mirrors sweep-cf-orphans for durable
resources; tenant secrets should track 1:1 with live tenants).
- Same schedule-vs-dispatch hardening as the other janitors:
schedule → hard-fail on missing secrets, dispatch → soft-skip.
- 8-way xargs parallelism, dry-run by default, --execute to delete.
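The two safety rails combine like this (hypothetical Python sketch;
the real sweeper is a shell script):

```python
from datetime import datetime, timedelta, timezone

def plan_deletions(secrets, now, grace=timedelta(hours=24), max_delete_pct=50):
    """secrets: list of (name, created_at). Skip anything younger than
    the grace window, then refuse the whole batch if it would delete
    more than max_delete_pct of the fleet (a listing bug looks like
    mass orphaning; bail instead of sweeping live tenants)."""
    stale = [n for n, created in secrets if now - created >= grace]
    if secrets and 100 * len(stale) / len(secrets) > max_delete_pct:
        raise RuntimeError(
            f"refusing to delete {len(stale)}/{len(secrets)} secrets "
            f"(> {max_delete_pct}%)"
        )
    return stale
```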
Requires a dedicated AWS_JANITOR_* IAM principal — the prod molecule-cp
principal lacks secretsmanager:ListSecrets (it only has scoped
Get/Create/Update/Delete). The workflow's verify-secrets step will hard-fail
on the first scheduled run until those secrets are configured, surfacing
the missing setup loudly rather than silently no-op'ing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Layer 1 of the runtime-rollout plan. Decouples publish from promotion by
giving operators a `runtime_image_pins` table the provisioner consults at
container-create time. No row = legacy `:latest` behavior; row present =
provisioner pulls `<base>@sha256:<digest>`. One bad publish no longer
breaks every workspace simultaneously.
Mechanics:
- Migration 047: `runtime_image_pins` (template_name PK + sha256 digest +
audit columns) and `workspaces.runtime_image_digest` (nullable, with
partial index) for "show me workspaces still on the old digest" queries.
- `resolveRuntimeImage` (handlers/runtime_image_pin.go): looks up the
pin, returns `<base>@sha256:<digest>` on hit, "" on miss/error so the
provisioner falls through to the legacy tag map. Availability over
pinning — any DB error logs and returns "" rather than blocking the
provision. `WORKSPACE_IMAGE_LOCAL_OVERRIDE=1` short-circuits the
lookup so devs rebuilding template images locally see their fresh
build.
- `WorkspaceConfig.Image` carries the resolved value into the
provisioner. `selectImage` honors it ahead of the runtime→tag map and
falls back to DefaultImage on unknown runtime.
- The existing `imageTagIsMoving` predicate (#215) already returns false
on `@sha256:` form, so digest pins skip the force-pull path naturally.
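The resolution order reads roughly like this (Python sketch of the Go
logic; `lookup` stands in for the DB query, names follow the commit):

```python
import os

def resolve_runtime_image(template: str, lookup) -> str:
    """Return '<base>@sha256:<digest>' when a pin exists; return ''
    on miss OR error so the provisioner falls through to the legacy
    :latest tag map — availability over pinning."""
    if os.environ.get("WORKSPACE_IMAGE_LOCAL_OVERRIDE") == "1":
        return ""  # devs rebuilding template images locally see their fresh build
    try:
        pin = lookup(template)  # e.g. ("ghcr.io/org/runtime", "abc123...")
    except Exception:
        return ""  # any DB error falls through rather than blocking the provision
    if pin is None:
        return ""
    base, digest = pin
    return f"{base}@sha256:{digest}"
```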
Tests:
- Handler-side (sqlmock): no-pin/db-error/with-pin/empty/unknown/local-
override paths cover every branch of `resolveRuntimeImage`.
- Provisioner-side: `selectImage` table covers explicit-image preference,
runtime-map fallback, unknown-runtime → default, empty-config →
default. Plus a struct-literal compile-time pin on `Image` so a future
refactor can't silently drop the field.
Layer 2 (per-ring routing via `workspaces.runtime_image_digest`) and the
admin promote/rollback endpoint ride on top of this and ship separately.
The cascade `TEMPLATES` list in publish-runtime.yml had drifted from
manifest.json:
Currently dispatches to: claude-code, langgraph, crewai, autogen,
deepagents, hermes, gemini-cli, openclaw
manifest.json supports: claude-code, hermes, openclaw, codex (after
PR #2536 pruned the list to the 4 actively-supported templates)
Two consequences of the drift:
1. `codex` (added in PR #2512, supported in manifest) was never in the
cascade — fresh runtime publishes did NOT trigger a codex template
rebuild. Codex stayed pinned to whatever runtime version it last saw
at its own image-build time.
2. langgraph/crewai/autogen/deepagents/gemini-cli — deprecated, no
shipping images, no working A2A — were still receiving cascade
dispatches. Wasted API calls and (worse) green CI on dead repos
masking "this template is dead, stop maintaining it."
Now matches manifest.json workspace_templates exactly. Surfaced during
RFC #388 (fast workspace provision) prior-art audit.
Long-term fix is to derive TEMPLATES from manifest.json so this can't
drift again — captured as a Phase-1 invariant in RFC #388. This commit
is the data fix only; structural fix lands with the bake pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review of #2555 caught two contrast regressions left
by the bulk perl pass:
1. text-white → text-ink mass-substitution silently broke destructive
and primary buttons. text-ink resolves to #15181c (warm-paper
near-black) in light mode — dark text on bg-red-600 / bg-amber-600
/ bg-emerald-600 / bg-blue-600 / bg-accent / bg-accent-strong /
bg-good / bg-bad fails WCAG contrast and looks broken. Per-line
pass flips text-ink → text-white only when a saturated bg utility
is present; tinted-state pills (bg-red-950/50 etc.) keep their
intentionally-retained text-* literals.
2. Original mapping table was missing bg-zinc-600 (most-used
hover-state literal for cancel buttons — caused them to JUMP from
warm cream resting state to dark zinc on hover in light mode) and
text-zinc-700/800/900 (separator dots and decorative dim text
invisible on warm-paper light bg). Extended mapping fills these
gaps with bg-surface-card / text-ink-soft.
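The per-line heuristic amounts to the following (Python sketch; the
class lists are abbreviated and the helper is illustrative, not the
actual pass):

```python
import re

# Saturated backgrounds where dark text-ink fails WCAG contrast.
SATURATED_BG = re.compile(
    r"\bbg-(red|amber|emerald|blue)-600\b|\bbg-(accent(-strong)?|good|bad)\b"
)

def fix_line(line: str) -> str:
    """Flip text-ink back to text-white only when the same class list
    also carries a saturated bg utility; tinted-state pills
    (bg-red-950/50 etc.) don't match and keep their literals."""
    if "text-ink" in line and SATURATED_BG.search(line):
        return line.replace("text-ink", "text-white")
    return line
```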
Also: drop stale tailwind.config.ts reference from components.json
(file deleted by the v3→v4 migration); switch baseColor zinc →
neutral and enable cssVariables since v4 uses CSS-driven tokens.
Future shadcn-cli invocations would have failed or written malformed
components without this.
27 sites in 27 files affected by item 1; ~20 sites in 20 files by item 2.
1214/1214 unit tests still pass; build still clean.
Findings courtesy of multi-model review per code-review-and-quality
skill — different blind spots catch different bugs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `npm ci` failed because the previous lock was generated on macOS
arm64, which omits the Linux-specific optional deps that
@tailwindcss/postcss → lightningcss-linux-x64-gnu transitively need
(@emnapi/runtime, @emnapi/core).
Re-ran `npm install --include=optional` so the lock includes every
platform variant of lightningcss + the @emnapi packages they pull in.
Runner (Linux x64) now has what it needs; local macOS install still
fine (npm picks the matching binary at install time).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>