Commit Graph

3996 Commits

Author SHA1 Message Date
Hongming Wang
5e46ea70d6 ci(synth-e2e): wire MOLECULE_STAGING_OPENAI_KEY into provisioned tenant
The synth-E2E (#2342) provisions a langgraph tenant whose default
model `openai:gpt-4.1-mini` requires OPENAI_API_KEY for the first LLM
call. Sibling workflows already wire this:
- e2e-staging-saas.yml:89
- canary-staging.yml:63

continuous-synth-e2e.yml just forgot. Result: tenant boots, accepts
a2a messages, then returns:

  Agent error: "Could not resolve authentication method. Expected
  either api_key or auth_token to be set."

This was masked since 2026-04-29 (workflow creation) by a2a-sdk v0→v1
contract violations — PR #2558 (Task-enqueue) and #2563
(TaskUpdater.complete/failed terminal events) cleared those, exposing
the underlying auth gap on the synth-E2E firing at 11:39 UTC today.

The script tests/e2e/test_staging_full_saas.sh:325 already reads
E2E_OPENAI_API_KEY and persists it as a workspace_secret on tenant
create — only the workflow wiring was missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:43:07 -07:00
Hongming Wang
5cf3dc4369
Merge pull request #2565 from Molecule-AI/fix/redeploy-soft-warn-rt-e2e-prefix
ci(deploy): broaden ephemeral-prefix matchers to cover rt-e2e-*
2026-05-03 11:30:52 +00:00
Hongming Wang
596e797dca ci(deploy): broaden ephemeral-prefix matchers to cover rt-e2e-*
The redeploy-tenants-on-staging soft-warn filter and the
sweep-stale-e2e-orgs janitor both hardcoded `^e2e-` to identify
ephemeral test tenants. Runtime-test harness fixtures (RFC #2251)
mint slugs prefixed with `rt-e2e-`, which neither matcher recognized.

Concrete impact observed today:
  - Two `rt-e2e-v{5,6}-*` tenants left orphaned 8h on staging
    (sweep-stale-e2e-orgs ignored them).
  - On the next staging redeploy their phantom EC2s returned
    `InvalidInstanceId: Instances not in a valid state for account`
    from SSM SendCommand → CP returned HTTP 500 + ok=false.
  - The redeploy soft-warn missed them too, so the workflow went
    red, which broke the auto-promote-staging chain feeding the
    canvas warm-paper rollout to prod.

Fix: switch both matchers to recognize the alternation
`^(e2e-|rt-e2e-)`. Long-lived prefixes (demo-prep, dryrun-*, dryrun2-*)
remain non-ephemeral and continue to hard-fail. Comment documents
the source-of-truth list and the cross-file invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:28:29 -07:00
Hongming Wang
3ce638d6e6
Merge pull request #2564 from Molecule-AI/fix/canvas-react-flow-color-mode
fix(canvas): wire ReactFlow colorMode to resolvedTheme
2026-05-03 11:14:13 +00:00
Hongming Wang
df7edfcd3f fix(canvas): wire ReactFlow colorMode to resolvedTheme
PR #2555 (Tailwind v4 + warm-paper) migrated all canvas chrome (toolbar,
side panel, modal layer) to semantic tokens, but missed the React Flow
viewport's `colorMode="dark"` literal — and two paired hardcoded dark
literals on the Background dot color and MiniMap mask. Net result on
prod: the user picked light mode, the toolbar flipped warm-paper, but
the canvas backplate, edges, dots, controls, and minimap stayed black —
visibly half-themed.

Three coordinated fixes inside the canvas viewport:
- ReactFlow `colorMode={resolvedTheme}` so the library's own dark/light
  styles flip with the user's choice.
- Background dot color picks the line-soft tone in light mode (zinc-800
  was invisible-on-cream).
- MiniMap maskColor warm-tints the off-viewport dim so the unselected
  region doesn't render as a hard black bar over warm-paper.

Verification:
- `npx tsc --noEmit` clean
- `npx vitest run` 188/188 pass
- (will browser-verify post-redeploy on hongming.moleculesai.app)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:11:35 -07:00
Hongming Wang
3ecb25eb4f
Merge pull request #2563 from Molecule-AI/fix/a2a-v1-terminal-event
fix(a2a): route terminal Message via TaskUpdater.complete/failed in task mode
2026-05-03 11:09:09 +00:00
Hongming Wang
e1628c4d56 fix(a2a): route terminal Message via TaskUpdater.complete/failed in task mode
PR #2558 enqueued a Task at the start of new requests so the v1 SDK
would accept TaskUpdater.start_work() — fix #1 of the v0→v1 migration
gap (PR #2170). But after Task is enqueued, the executor enters
"task mode" and the SDK rejects raw Message enqueues at the terminal
step:

  {"code":-32603,"message":"Received Message object in task mode.
  Use TaskStatusUpdateEvent or TaskArtifactUpdateEvent instead."}

Synth-E2E 2026-05-03T11:00:34Z surfaced this on the very first run
after the prior fix cascaded. Validation site is the same
a2a/server/agent_execution/active_task.py — the framework's job is
to enforce the v1 invariant; we're catching up to it.

The fix routes both terminal events through TaskUpdater helpers:
- success: updater.complete(message=msg) wraps in
  TaskStatusUpdateEvent(state=COMPLETED, final=True)
- error: updater.failed(message=...) wraps in
  TaskStatusUpdateEvent(state=FAILED, final=True)

Both helpers exist in a2a-sdk ≥ 1.0; verified via
TaskUpdater.complete signature.

Tests:
- conftest TaskUpdater stub now records complete/failed calls AND
  routes the message back through event_queue.enqueue_event so the
  ~20 legacy tests asserting on enqueue_event keep working
- 2 new regression tests pin the contract:
  * test_terminal_success_routes_via_updater_complete
  * test_terminal_error_routes_via_updater_failed
- Both NEW tests verified to FAIL on staging-baseline (without this
  fix) and PASS with it — they'd catch the regression before staging
  if the wheel-smoke gate covered task-mode terminal events too
  (separate yak-shave for #131 follow-up)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:06:45 -07:00
Hongming Wang
78721f7a42
Merge pull request #2561 from Molecule-AI/fix/cascade-list-drift-gate
feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)
2026-05-03 10:55:08 +00:00
Hongming Wang
09010212a0 feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)
Closes the recurrence path of PR #2556. The data fix realigned 8→4
templates in publish-runtime.yml's TEMPLATES variable, but the
underlying drift hazard was unguarded — the next manifest change
could silently leave cascade out of sync again.

This gate fails any PR that changes manifest.json or
publish-runtime.yml in a way that makes the cascade list diverge
from manifest workspace_templates (suffix-stripped). Either
direction is caught:

  missing-from-cascade  templates that won't auto-rebuild on a new
                       wheel publish (the codex-stuck-on-stale-runtime
                       bug class — PR #2512 added codex to manifest,
                       cascade wasn't updated, codex stayed pinned to
                       its last-built runtime version for weeks).

  extra-in-cascade     cascade dispatches to deprecated templates
                       (the wasted-API-calls + dead-CI-noise class —
                       PR #2536 pruned 5 templates from manifest;
                       cascade kept dispatching to all 8 until
                       PR #2556).

Triggers narrowly: only on PRs that touch manifest.json,
publish-runtime.yml, or the script itself. Fast (single grep+sed+comm
pipeline, no Go build).

Surfaced during the RFC #388 prior-art audit; folded in as the
structural follow-up to the data fix #2556 promised.

Self-tested both failure modes locally before commit:
  - Drop codex from cascade → script fails with "MISSING: codex"
  - Add langgraph to cascade → script fails with "EXTRA: langgraph"

Refs: https://github.com/Molecule-AI/molecule-controlplane/issues/388

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:52:39 -07:00
Hongming Wang
bb63e60114
Merge pull request #2560 from Molecule-AI/fix/preflight-smoke-mode-bypass
fix(preflight): skip required_env check in MOLECULE_SMOKE_MODE
2026-05-03 10:46:20 +00:00
Hongming Wang
06240ab67b fix(preflight): skip required_env check in MOLECULE_SMOKE_MODE
Boot smoke (#2275) exercises executor.execute() against stub deps
and never hits the real provider, so missing auth env is not a real
blocker. Without this bypass, every adapter that introduces a new
auth env var must be mirrored into molecule-ci's fake-env list — a
maintenance treadmill that just bit hermes-template:

- 2026-05-03 09:47 UTC: hermes publish-image smoke fails on
  HERMES_API_KEY preflight (workflow injects CLAUDE_CODE_OAUTH_TOKEN,
  ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY but not
  HERMES_API_KEY or OPENROUTER_API_KEY). Failed for two cycles
  before being noticed.

The bypass demotes Required-env failures to warnings when
MOLECULE_SMOKE_MODE is truthy, so the unset env stays visible in
the boot log without blocking. Production paths are unchanged
(env unset → fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:44:05 -07:00
Hongming Wang
88ef70431e
Merge pull request #2505 from Molecule-AI/staging
staging → main: auto-promote d64570a
2026-05-03 03:36:56 -07:00
Hongming Wang
750b32c33f
Merge pull request #2558 from Molecule-AI/fix/a2a-v1-task-enqueue
fix(a2a): enqueue Task before TaskStatusUpdateEvent for v1 SDK contract
2026-05-03 10:18:36 +00:00
Hongming Wang
5c3b79a8ba fix(a2a): enqueue Task before TaskStatusUpdateEvent for v1 SDK contract
a2a-sdk ≥ 1.0 raises InvalidAgentResponseError when an executor publishes a
TaskStatusUpdateEvent (e.g. via TaskUpdater.start_work) before any Task
event for fresh requests. The framework only auto-creates the Task on
continuation messages (existing task_id resolves via task_manager.get_task);
new requests leave _task_created unset and the SDK validation at
a2a/server/agent_execution/active_task.py rejects the first status update.

PR #2170 migrated the executor surface to v1 but missed this contract. The
synthetic E2E gate caught it on every staging run since (~1 week silent
fail) with:

    {"jsonrpc":"2.0","id":"e2e-msg-1","error":{"code":-32603,
     "message":"Agent should enqueue Task before TaskStatusUpdateEvent
     event","data":null}}

The fix enqueues a Task(state=SUBMITTED) before the TaskUpdater is
constructed, gated on `context.current_task is None` so continuation
messages don't double-enqueue (which the SDK logs about but doesn't reject).

Tests:

  - test_first_event_is_task_for_new_request — pins the new-request path:
    first enqueue must be a Task with the expected id/context_id
  - test_no_task_enqueue_on_continuation — pins the continuation path: when
    context.current_task is set, the executor must NOT re-enqueue Task
  - conftest: stub Task / TaskStatus / TaskState in the mocked a2a.types
    module so the import inside the executor resolves under unit tests

google-adk adapter does not have this bug — its execute() only emits
Message events, not TaskStatusUpdateEvent. Its cancel() does emit one,
but cancel is rarely-invoked and out of scope for this fix.

Live verification path: this PR's merge → publish-runtime cascade → next
synth-E2E firing should go green at step "8/11 Sending A2A message to
parent — expecting agent response".
2026-05-03 03:15:54 -07:00
Hongming Wang
e014d22ee9
Merge pull request #2557 from Molecule-AI/feat/sweep-aws-secrets-orphans
feat(ops): sweep orphan AWS Secrets Manager secrets
2026-05-03 09:48:59 +00:00
Hongming Wang
18c2bdbe68
Merge pull request #2529 from Molecule-AI/dependabot/pip/workspace/starlette-gte-1.0.0
chore(deps)(deps): update starlette requirement from >=0.38.0 to >=1.0.0 in /workspace
2026-05-03 09:42:15 +00:00
Hongming Wang
6f8f7932d2 feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets
CP's deprovision flow calls Secrets.DeleteSecret() (provisioner/ec2.go:806)
but only when the deprovision runs to completion. Crashed provisions and
incomplete teardowns leak the per-tenant `molecule/tenant/<org_id>/bootstrap`
secret. At ~$0.40/secret/month, ~45 leaked secrets surfaced as ~$19/month
on the AWS cost dashboard.

The tenant_resources audit table (mig 024) tracks four kinds today —
CloudflareTunnel, CloudflareDNS, EC2Instance, SecurityGroup — and the
existing reconciler doesn't catch Secrets Manager orphans. The proper fix
(KindSecretsManagerSecret + recorder hook + reconciler enumerator) is filed
as a follow-up controlplane issue. This sweeper is the immediate stopgap.

Parallel-shape to sweep-cf-tunnels.sh:
  - Hourly schedule offset (:30, between sweep-cf-orphans :15 and
    sweep-cf-tunnels :45) so the three janitors don't burst CP admin
    at the same minute.
  - 24h grace window — never deletes a secret younger than the
    provisioning roundtrip, so an in-flight provision can't be racemurdered.
  - MAX_DELETE_PCT=50 default (mirrors sweep-cf-orphans for durable
    resources; tenant secrets should track 1:1 with live tenants).
  - Same schedule-vs-dispatch hardening as the other janitors:
    schedule → hard-fail on missing secrets, dispatch → soft-skip.
  - 8-way xargs parallelism, dry-run by default, --execute to delete.

Requires a dedicated AWS_JANITOR_* IAM principal — the prod molecule-cp
principal lacks secretsmanager:ListSecrets (it only has scoped
Get/Create/Update/Delete). The workflow's verify-secrets step will hard-fail
on the first scheduled run until those secrets are configured, surfacing
the missing setup loudly rather than silently no-op'ing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 02:38:08 -07:00
Hongming Wang
15124527da
Merge pull request #2276 from Molecule-AI/feat/layer1-runtime-digest-pinning
feat(provisioner): digest-pin runtime images via runtime_image_pins table (Layer 1 of #2272)
2026-05-03 09:32:54 +00:00
Hongming Wang
e9a1ce3591
Merge pull request #2556 from Molecule-AI/fix/cascade-list-align-to-manifest
fix(publish-runtime): align cascade list to 4 supported runtimes
2026-05-03 09:32:36 +00:00
Hongming Wang
1bff419833 feat(provisioner): digest-pin workspace images via runtime_image_pins (#2272 layer 1)
Layer 1 of the runtime-rollout plan. Decouples publish from promotion by
giving operators a `runtime_image_pins` table the provisioner consults at
container-create time. No row = legacy `:latest` behavior; row present =
provisioner pulls `<base>@sha256:<digest>`. One bad publish no longer
breaks every workspace simultaneously.

Mechanics:

  - Migration 047: `runtime_image_pins` (template_name PK + sha256 digest +
    audit columns) and `workspaces.runtime_image_digest` (nullable, with
    partial index) for "show me workspaces still on the old digest" queries.
  - `resolveRuntimeImage` (handlers/runtime_image_pin.go): looks up the
    pin, returns `<base>@sha256:<digest>` on hit, "" on miss/error so the
    provisioner falls through to the legacy tag map. Availability over
    pinning — any DB error logs and returns "" rather than blocking the
    provision. `WORKSPACE_IMAGE_LOCAL_OVERRIDE=1` short-circuits the
    lookup so devs rebuilding template images locally see their fresh
    build.
  - `WorkspaceConfig.Image` carries the resolved value into the
    provisioner. `selectImage` honors it ahead of the runtime→tag map and
    falls back to DefaultImage on unknown runtime.
  - The existing `imageTagIsMoving` predicate (#215) already returns false
    on `@sha256:` form, so digest pins skip the force-pull path naturally.

Tests:

  - Handler-side (sqlmock): no-pin/db-error/with-pin/empty/unknown/local-
    override paths cover every branch of `resolveRuntimeImage`.
  - Provisioner-side: `selectImage` table covers explicit-image preference,
    runtime-map fallback, unknown-runtime → default, empty-config →
    default. Plus a struct-literal compile-time pin on `Image` so a future
    refactor can't silently drop the field.

Layer 2 (per-ring routing via `workspaces.runtime_image_digest`) and the
admin promote/rollback endpoint ride on top of this and ship separately.
2026-05-03 02:30:00 -07:00
Hongming Wang
24276b9458 fix(publish-runtime): align cascade list to 4 supported runtimes
The cascade `TEMPLATES` list in publish-runtime.yml had drifted from
manifest.json:

  Currently dispatches to: claude-code, langgraph, crewai, autogen,
                           deepagents, hermes, gemini-cli, openclaw
  manifest.json supports:  claude-code, hermes, openclaw, codex (after
                           PR #2536 pruned to 4 actively-supported)

Two consequences of the drift:

1. `codex` (added in PR #2512, supported in manifest) was never in the
   cascade — fresh runtime publishes did NOT trigger a codex template
   rebuild. Codex stayed pinned to whatever runtime version it last saw
   at its own image-build time.

2. langgraph/crewai/autogen/deepagents/gemini-cli — deprecated, no
   shipping images, no working A2A — were still receiving cascade
   dispatches. Wasted API calls and (worse) green CI on dead repos
   masks "this template is dead, stop maintaining it."

Now matches manifest.json workspace_templates exactly. Surfaced during
RFC #388 (fast workspace provision) prior-art audit.

Long-term fix is to derive TEMPLATES from manifest.json so this can't
drift again — captured as a Phase-1 invariant in RFC #388. This commit
is the data fix only; structural fix lands with the bake pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 02:28:15 -07:00
Hongming Wang
aef9555b1d
Merge pull request #2555 from Molecule-AI/feat/canvas-warm-paper-tailwind-v4
feat(canvas): warm-paper theme + Tailwind v4
2026-05-03 09:27:23 +00:00
Hongming Wang
db48d1d261 fix(canvas): restore text-white on saturated buttons + close zinc gaps
Independent code review of #2555 caught two contrast regressions left
by the bulk perl pass:

1. text-white → text-ink mass-substitution silently broke destructive
   and primary buttons. text-ink resolves to #15181c (warm-paper
   near-black) in light mode — dark text on bg-red-600 / bg-amber-600
   / bg-emerald-600 / bg-blue-600 / bg-accent / bg-accent-strong /
   bg-good / bg-bad fails WCAG contrast and looks broken. Per-line
   pass flips text-ink → text-white only when a saturated bg utility
   is present; tinted-state pills (bg-red-950/50 etc.) keep their
   intentionally-retained text-* literals.

2. Original mapping table was missing bg-zinc-600 (most-used
   hover-state literal for cancel buttons — caused them to JUMP from
   warm cream resting state to dark zinc on hover in light mode) and
   text-zinc-700/800/900 (separator dots and decorative dim text
   invisible on warm-paper light bg). Extended mapping fills these
   gaps with bg-surface-card / text-ink-soft.

Also: drop stale tailwind.config.ts reference from components.json
(file deleted by the v3→v4 migration); switch baseColor zinc →
neutral and enable cssVariables since v4 uses CSS-driven tokens.
Future shadcn-cli invocations would have failed or written malformed
components without this.

27 sites in 27 files affected by #1, ~20 sites in 20 files by #2.
1214/1214 unit tests still pass; build still clean.

Findings courtesy of multi-model review per code-review-and-quality
skill — different blind spots catch different bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 02:04:20 -07:00
Hongming Wang
052575d773 fix(canvas): regenerate lockfile with cross-platform optional deps
CI's `npm ci` failed because the previous lock was generated on macOS
arm64, which omits the Linux-specific optional deps that
@tailwindcss/postcss → lightningcss-linux-x64-gnu transitively need
(@emnapi/runtime, @emnapi/core).

Re-ran `npm install --include=optional` so the lock includes every
platform variant of lightningcss + the @emnapi packages they pull in.
Runner (Linux x64) now has what it needs; local macOS install still
fine (npm picks the matching binary at install time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:52:42 -07:00
Hongming Wang
c0eca8d0e1 feat(canvas): warm-paper theme + Tailwind v4 migration
Brings the canvas onto the warm-paper design system already shipped to
landing, marketplace, and SaaS surfaces, and migrates the build from
Tailwind v3 → v4 to match molecule-app.

Plumbing:
- swap tailwindcss v3 → v4, drop autoprefixer, add @tailwindcss/postcss
- delete tailwind.config.ts (v4 reads tokens from @theme blocks in CSS)
- globals.css: @import "tailwindcss" + @plugin "@tailwindcss/typography"
- two @theme blocks: warm-paper light defaults + always-dark surface
  tokens (bg-bg / ink-mute / line-strong) for terminal/console panels
- [data-theme="dark"] cascade overrides the warm-paper tokens for dark
- React Flow edge stroke + scrollbar + selection colour pull from
  semantic tokens so they flip with the theme

Theme infra (ported from molecule-app, identical contracts):
- lib/theme-cookie.ts: mol_theme cookie + boot script (no "use client"
  so server components can read the constants)
- lib/theme-provider.tsx: ThemeProvider + useTheme + cookie writer with
  Domain=.moleculesai.app so the preference follows the user across
  canvas/app/market/landing subdomains AND tenant subdomains
- lib/theme.ts: ColorToken union + cssVar() helper
- components/ThemeToggle.tsx: 3-way System/Light/Dark picker
- layout.tsx: SSR cookie read + nonce'd inline boot script (CSP needs
  the explicit nonce — strict-dynamic doesn't forgive an un-nonce'd
  inline sibling) + ThemeProvider wrapper + bg-surface/text-ink body

Component migration (62 files):
- Mechanical bg-zinc-* / text-zinc-* / border-zinc-* / text-white →
  semantic surface/ink/line tokens via perl negative-lookahead pass
  (preserves opacity modifiers like /80, /60)
- bg-blue-500/600 → bg-accent / bg-accent-strong
- text-red-* / amber-* / emerald-* → text-bad / warm / good
- Tinted-state banner backgrounds (bg-red-950, bg-amber-950, bg-blue-950
  etc.) intentionally left literal — they remain readable on warm-paper
  in light mode without inventing new state-soft tokens
- TerminalTab.tsx skipped — xterm renders to canvas, not DOM
- 3 unit-test assertions updated to match new token strings (credits
  pillTone, AuthGate overlay class, A2AEdge accent)

Verification:
- pnpm test: 1214/1214 pass
- pnpm tsc --noEmit: clean
- next build: ✓ Compiled successfully (8 routes)
- dev server inspection: html data-theme stamped, body uses
  bg-surface text-ink, boot script carries nonce, compiled CSS
  contains both @theme blocks + [data-theme="dark"] override

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:43:55 -07:00
Hongming Wang
e4893f5a9a
Merge pull request #2552 from Molecule-AI/feat/wire-event-log-into-adapter-base
feat(workspace): wire EventLog into adapter base (#119 PR-3b)
2026-05-03 08:39:34 +00:00
Hongming Wang
872f8e8971
Merge pull request #2554 from Molecule-AI/chore/remove-dead-ast-defensive-block
chore(workspace): remove dead defensive block in load_skills AST gate
2026-05-03 08:33:18 +00:00
Hongming Wang
d58185b8a8 chore(workspace): remove dead defensive block in load_skills AST gate
Self-review of PR #2553 caught an unreachable defensive block at
test_load_skills_call_sites.py:99-103: the inner check guarded
`call.func.__class__.__name__ == "Name"` from a FunctionDef, but
`_find_load_skills_calls` already filters its return type to
`ast.Call` — `FunctionDef` cannot reach that loop body. The block
was a no-op `pass` with a misleading comment.

Removing keeps the gate behaviorally identical; tests still pass.

Same five-axis review pass that turned this up also approved the
substantive logic of #2553, so no behavior change here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:30:05 -07:00
Hongming Wang
3c0f7de4b9
Merge pull request #2553 from Molecule-AI/feat/skill-compat-audit
docs(skills): document SKILL.md `runtime` field + AST coverage gate (#119 PR-4)
2026-05-03 08:24:55 +00:00
Hongming Wang
f8b40d8d73 docs(skills): document SKILL.md runtime field + AST coverage gate (#119 PR-4)
Closes the documentation + audit gap for declarative skill-compat. The
plumbing has been live since PR #117 (RuntimeCapabilities) and
skill_loader's `_normalize_runtime_field` has been emitting filter
decisions for weeks, but:
- No public doc explained the `runtime` frontmatter field, so skill
  authors didn't know how to opt in / opt out.
- No structural gate ensured every load_skills() call site threads
  current_runtime — a future caller forgetting the kwarg silently
  force-loads runtime-incompatible skills (no AttributeError, just a
  delayed crash on first tool invocation).

Two changes:

1. docs/agent-runtime/skills.md
   - Adds `runtime`, `tags`, `examples` to the Frontmatter Fields table.
   - Adds a Runtime Compatibility section with example, accepted shapes
     (universal default, list, string sugar), and the "logged + omitted,
     not crashed" failure mode. Notes that match values come from each
     adapter's name() (the same string in config.yaml's runtime: field).

2. workspace/tests/test_load_skills_call_sites.py
   - Static AST gate: walks every workspace/*.py (excluding tests),
     finds load_skills(...) Call nodes, fails if any lacks
     current_runtime= as a keyword.
   - Defense-in-depth `test_known_call_sites_present` — pins that the
     scan actually sees the two known callers (adapter_base,
     skill_loader.watcher) so a refactor that moves them is loud.
   - Sanity-checked the matcher against a synthetic violating module.

Same-shape pattern as PR #2358 (tenant_resources audit-coverage AST
gate, #150) — pin the contract structurally, not just behaviorally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:22:34 -07:00
Hongming Wang
71e7a6ffee feat(workspace): wire EventLog into adapter base (#119 PR-3b)
Adds adapter.event_log property+setter on BaseAdapter so adapters can
emit structured events (tool dispatch, skill load, executor errors)
without coupling to the chosen backend. Default is a shared no-op
DisabledEventLog; main.py overrides at boot from the
observability.event_log config block (PR-2 schema).

The shape is intentionally additive:
- Property is invisible to the BaseAdapter signature snapshot drift
  gate (the helper walks vars(cls) for callables only — properties
  are not callable). Verified with a regression test in the new
  test_adapter_base_event_log.py.
- Existing adapters continue to work unchanged. Template repos that
  never call self.event_log get the no-op for free.
- Setter accepts any EventLogBackend, so swapping memory↔disabled
  at runtime (or to a future Redis backend) requires no adapter
  code change.

Sequels:
- PR-3c: emit events from claude-code/hermes adapters at the
  natural points (tool dispatch, skill load).
- PR-4: skill-compat audit + SKILL.md frontmatter docs.
- Platform-side /workspaces/:id/activity endpoint reads the buffer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:18:19 -07:00
Hongming Wang
e2b58f0fbc
Merge pull request #2551 from Molecule-AI/feat/wire-observability-config
feat(workspace): wire observability heartbeat + log_level into consumers (#119 PR-3a)
2026-05-03 08:05:00 +00:00
Hongming Wang
efa68a26b1 feat(workspace): wire observability config into heartbeat + uvicorn (#119 PR-3a)
Replaces the hard-coded HEARTBEAT_INTERVAL=30 in heartbeat.py and
log_level="info" in main.py with values from
ObservabilityConfig (#119 PR-1, schema landed in PR #2538).

Concrete plumbing:

  - heartbeat.HeartbeatLoop accepts an `interval_seconds=` keyword
    arg. Defaults to the legacy module constant so 2-arg callers
    (existing tests, any downstream code that hasn't been updated)
    keep their existing 30s behavior.
  - main.py constructs HeartbeatLoop with
    config.observability.heartbeat_interval_seconds — the value the
    config parser already clamped to [5, 300].
  - main.py's uvicorn.Config takes log_level from
    config.observability.log_level (lowercased — uvicorn's convention
    differs from Python logging's) with LOG_LEVEL env still winning
    as an ops-side debugging override.

Adapter EventLog wiring deferred to PR-3b (#208 follow-up) — touches
adapter_base interface + needs careful design, kept separate to keep
this PR small + reviewable.

Tests:
  - test_heartbeat.py: 3 new tests pin default interval, explicit
    override, and the [5, 300] band that the constructor accepts
    without re-clamping (clamping is the parser's job).
  - All 88 tests in test_heartbeat.py + test_config.py pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:01:57 -07:00
Hongming Wang
87e355c296
Merge pull request #2548 from Molecule-AI/feat/event-log-module
feat(workspace): event_log module + EventLogConfig (#119 PR-2)
2026-05-03 07:54:05 +00:00
Hongming Wang
67f3e49e42
Merge pull request #2549 from Molecule-AI/fix/orphan-sweeper-skip-external-runtime
fix(orphan-sweeper): exclude runtime='external' from stale-token revoke
2026-05-03 07:52:43 +00:00
Hongming Wang
be271aef8b fix(orphan-sweeper): exclude runtime='external' from stale-token revoke
The Docker-mode orphan sweeper was incorrectly targeting external runtime
workspaces, revoking their auth tokens ~6 minutes after creation (one
sweep cycle past the 5-min grace).

External workspaces have NO local container by design — their agent runs
off-host. The "no live container" predicate the sweep uses to detect
wiped-volume orphans matches every external workspace unconditionally,
which was killing the only auth credential the off-host agent has.

Reproducer: create runtime=external workspace, paste the auth token into
molecule-mcp / curl, wait 5 minutes. Next request returns
`HTTP 401 — token may be revoked`. Platform log shows
`Orphan sweeper: revoking stale tokens for workspace <id> (no live
container; volume likely wiped)`.

Fix: add `AND w.runtime != 'external'` to the sweep's SELECT. The
existing test regexes (third-pass query expectations + the shared
expectStaleTokenSweepNoOp helper) are tightened to require the new
predicate, so a regression that drops it fails CI immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:49:37 -07:00
Hongming Wang
9753d58539 fix(build): register event_log in TOP_LEVEL_MODULES
The wheel-build drift gate caught it correctly: any new top-level
module under workspace/ must be listed in TOP_LEVEL_MODULES so its
`from event_log import …` statements get rewritten to
`from molecule_runtime.event_log import …` at package time.

Without this entry, the published wheel ships event_log.py un-rewritten
and crashes at runtime with ModuleNotFoundError on first heartbeat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:19:30 -07:00
Hongming Wang
0fc2531250 feat(workspace): event_log module + EventLogConfig (#119 PR-2)
Adds workspace/event_log.py with an in-memory EventLog backend and a
disabled no-op variant, plus EventLogConfig nested in
ObservabilityConfig (backend / ttl_seconds / max_entries).

The event log is the append-and-query buffer that the canvas Activity
tab and platform `/activity` endpoint will read in PR-3 of the #119
stack. Two backends ship in this PR:

  - InMemoryEventLog: bounded ring buffer with TTL eviction, monotonic
    ids that survive eviction so cursors don't break, thread-safe for
    concurrent appends from heartbeat + main loop + A2A executor.
  - DisabledEventLog: no-op for `backend: disabled` — opts the
    workspace out without crashing callers that propagate event ids.

Schema-only PR — no consumers wired yet. Wiring lands in PR-3.

Test coverage:
  - 34 new test_event_log.py tests (100% line coverage on event_log.py)
  - 9 new test_config.py tests for EventLogConfig parsing
  - Concurrency stress with 8 threads × 200 appends — verifies unique
    monotonic ids under contention
  - TTL + max_entries eviction with injected clock (no time.sleep)
  - Disabled backend contract pinned

Closes #207.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:17:12 -07:00
Hongming Wang
350495f032
Merge pull request #2547 from Molecule-AI/perf/cache-platform-inbound-secret
perf(wsauth): in-process cache for platform_inbound_secret reads
2026-05-03 07:11:38 +00:00
Hongming Wang
384edb4af0
Merge branch 'staging' into perf/cache-platform-inbound-secret 2026-05-03 00:08:43 -07:00
Hongming Wang
b040171fa1 perf(wsauth): in-process cache for platform_inbound_secret reads
Heartbeats fire every 60s per workspace and were the dominant caller
of ReadPlatformInboundSecret — one DB SELECT each, purely to redeliver
the same value. For an N-workspace fleet that's N SELECTs/minute of
pure overhead, growing linearly with the fleet (#189).

This adds a sync.Map cache keyed by workspaceID with a 5-minute TTL:

- **Read-through**: cache miss → DB SELECT → populate → return.
- **Write-through**: every IssuePlatformInboundSecret call refreshes
  the cache with the new value before returning, so the lazy-heal mint
  path (readOrLazyHealInboundSecret) doesn't see a stale read of the
  value it just wrote.
- **TTL eviction**: 5 minutes — generous enough that the heartbeat
  hot path hits cache for ~5 reads in a row before re-validating, short
  enough that an out-of-band rotation (operator running
  `UPDATE workspaces SET platform_inbound_secret=...` directly)
  propagates within minutes without requiring a redeploy.
- **Absence not cached**: ErrNoInboundSecret skips the cache write so
  the lazy-heal recovery contract for the column-NULL case
  (readOrLazyHealInboundSecret in workspace_provision_shared.go) keeps
  working.

Memory footprint is bounded by the active workspace fleet (~200 bytes
per entry); deleted workspaces leave dead entries until process restart,
acceptable given workspace-deletion is operator-rare.

Why in-process instead of Redis: workspace-server runs as a single
Railway service today (per memory project_controlplane_ownership);
adding Redis for this single column read would be over-engineering.
The cache is a self-contained, Redis-free upgrade that keeps the same
semantic surface (read returns the latest secret) while collapsing
the heartbeat read storm. If the deployment ever fans out across
replicas, an operator-side rotation propagates per-replica TTL-bounded
without needing a shared write log.

Tests: 5 new cases covering cache hit within TTL, refresh after TTL
(simulating an operator rotation via SQL), write-through on Issue,
absence-not-cached, and Reset clearing all entries. The setupMock
helper in wsauth and setupTestDB helper in handlers both call
ResetInboundSecretCacheForTesting() at start + cleanup so write-through
state from one test doesn't shadow SELECT expectations in the next.
SetInboundSecretCacheNowForTesting() exposes a deterministic clock
override so the TTL test doesn't sleep.

Task: #189.
2026-05-03 00:04:38 -07:00
Hongming Wang
c4f64a11a8
Merge pull request #2546 from Molecule-AI/fix/provisioner-repull-moving-tags
fix(provisioner): force re-pull of moving image tags on workspace start
2026-05-03 06:59:36 +00:00
Hongming Wang
552602e462 fix(provisioner): force re-pull of moving image tags on workspace start
Previously Start() only pulled when the image was missing locally
(imgErr != nil). Once a tenant's Docker daemon had `:latest` cached,
it stuck on that snapshot forever even after publish-runtime pushed
a newer image with the same tag — the same image-cache class that
sibling task #232 closed on the controlplane redeploy path.

Now Start() additionally re-pulls when the tag is "moving"
(`:latest`, no tag, `:staging`, `:main`, `:dev`, `:edge`, `:nightly`,
`:rolling`). Pinned tags (semver, sha-prefixed, date-stamped, build-id)
and digest-pinned references (`@sha256:...`) skip the pull because
their contents are by definition immutable.

The classifier (imageTagIsMoving) is deliberately conservative on the
"moving" side — only the well-known moving tags trip it. Misclassifying
a pinned tag as moving wastes bandwidth on every provision; misclassifying
moving as pinned silently bricks the fleet on stale snapshots, which
is exactly the bug class this fix closes.

Edge cases handled:
- Registry hostname with port (`localhost:5000/foo`) — the `:5000` is
  not mistaken for a tag.
- Digest pinning (`image@sha256:...`) — never re-pulled even if a
  moving-looking tag is also present.
- Legacy local-build tags (`workspace-template:hermes`) — treated as
  pinned (no registry to move from).

Test coverage: 22 cases across all classifier shapes. No changes to
the pull-failure path (still best-effort, ContainerCreate still
surfaces the actionable "image not found" error if the pull failed
and the cache is also empty).

Task: #215. Companion to #232.
2026-05-02 23:56:32 -07:00
Hongming Wang
29261cee3d
Merge pull request #2537 from Molecule-AI/test/derive-provider-drift-gate
Test: AST drift gate for derive-provider.sh ↔ Go port
2026-05-03 06:54:22 +00:00
Hongming Wang
dfeefb0acc fix(workspace-server): vendor upstream derive-provider.sh + close 12-prefix drift
The drift gate's monorepoRoot walk-up looked for workspace-configs-templates/
which is gitignored locally and doesn't exist in this repo at all (the
canonical script lives in molecule-ai-workspace-template-hermes). Test
failed on CI from day one with "could not find monorepo root".

Two layered fixes in one PR:

1. Vendor upstream derive-provider.sh as testdata/ + drop monorepoRoot.
   The vendored copy has a header pointing operators at the upstream
   source and a one-line cp command for refresh. Test now reads two
   files (vendored shell + workspace_provision.go) via package-relative
   paths — Go test sets cwd to the package dir, so this is hermetic
   without any walk-up gymnastics.

2. Update the case-statement regex to match upstream's renamed variable
   (${_HERMES_MODEL} since v0.12.0, the resolved value of
   HERMES_INFERENCE_MODEL with a HERMES_DEFAULT_MODEL legacy fallback).
   Regex now accepts either spelling so a future rename fails loudly
   on the parser-sanity check rather than silently returning empty.

Vendoring upstream surfaced real drift the gate was designed to catch:
upstream v0.12.0 added 12 provider prefixes that deriveProviderFromModelSlug
didn't handle (xai/grok, bedrock/aws, tencent/tencent-tokenhub, gmi,
qwen-oauth, lmstudio/lm-studio, minimax-oauth, alibaba-coding-plan,
google-gemini-cli, openai-codex, copilot-acp, copilot). Without these,
Save+Restart on a workspace using one of those prefixes would persist
LLM_PROVIDER="" and the next boot would fall back to derive-provider.sh's
runtime *=auto branch — losing the user's explicit choice on every restart.

Added all 12 case clauses + 16 new table-driven test cases (covering
both canonical and aliased forms). Drift gate now passes; future
upstream additions will fail loudly with a "DRIFT: ..." message
pointing the engineer at the missing case.

Task: #242
2026-05-02 23:51:23 -07:00
Hongming Wang
284012a768 test(workspace-server): AST drift gate for derive-provider.sh ↔ Go port
PR #2535 added a Go port of derive-provider.sh
(deriveProviderFromModelSlug) so workspace-server can persist
LLM_PROVIDER into workspace_secrets at provision time. This created
two sources of truth — if a future PR adds a provider prefix to one
without the other, the platform's persisted LLM_PROVIDER silently
disagrees with what the container's derive-provider.sh produces at
boot, with no test going red.

This adds a hermetic drift gate that:

  1. Parses workspace-configs-templates/hermes/scripts/derive-provider.sh
     with regex (handling both single-line `pat/*) PROVIDER="x" ;;`
     clauses and multi-line conditional clauses) to build a
     map[prefix]provider.
  2. Walks workspace_provision.go's AST with go/ast, finds
     deriveProviderFromModelSlug, and extracts every case-clause
     prefix → return-string-literal pair.
  3. Cross-checks both directions and accepts only the two documented
     divergences (nousresearch/* and openai/* both → "openrouter" at
     provision time because derive-provider.sh's runtime-env checks
     aren't loaded yet) via a hardcoded acceptedDivergences map.
  4. Fails with an actionable message that names both files and
     suggests the exact fix (add the case OR add to divergence list
     with a comment).

Pattern: behavior-based AST gate from PR #2367 / memory feedback —
pin the invariant by what the function maps, not by what it's named.
Stdlib-only (go/ast, go/parser, go/token, regexp); no network, no DB,
no docker — reads two monorepo files in-process.

A second sanity-check test pins anchor prefixes the regex must find,
so a future shell-syntax change can't silently produce an empty map
and trivially pass the main gate.

Closes task #242.
2026-05-02 23:51:23 -07:00
Hongming Wang
a28f905c5a
Merge pull request #2545 from Molecule-AI/fix/configtab-single-source-of-truth-MODEL
fix(canvas): ConfigTab is single source of truth for tier/provider/model
2026-05-03 06:40:38 +00:00
Hongming Wang
bdd1d09dfb fix(canvas): tighten originalModel + pin store-flush failure-gating (review feedback)
PR #2545 self-review findings.

(1) originalModel was set from wsMetadataModel alone. On a hermes/pre-#240
workspace where MODEL_PROVIDER was never written but YAML has
runtime_config.model: "something", originalModel="" while the form
rendered "something" — handleSave's diff fired /model PUT on every
unrelated save (tier change → workspace auto-restart). Snapshot from
the actual rendered model in BOTH loadConfig branches so the diff
stays scoped to user-initiated changes.

(2) The store-flush test asserted the call happened but didn't pin
success-gating. A future refactor wrapping the PATCH in try/catch and
unconditionally calling updateNodeData would have shipped green and
left the badge lying about server-rejected writes. New test pins the
PATCH-rejects-no-flush invariant.

(3) Hermes-edge regression test for (1).

All 1214 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:37:52 -07:00
Hongming Wang
7f0c58d563 fix(canvas): ConfigTab is single source of truth for tier/provider/model
Three drift bugs in ConfigTab + ProviderModelSelector. Same root pattern:
the form's display, the diff baseline, and the canvas store all read or
write from different copies of the same data, so what the user sees and
what the runtime actually uses can diverge silently.

(1) currentModelId read runtime_config.model first; loadConfig overrode
only top-level config.model. With template YAML `runtime_config.model:
sonnet` and live MODEL_PROVIDER=`MiniMax-M2`, the form rendered
"Claude Code subscription / Claude Sonnet (OAuth)" while the container
env (and chat) used MiniMax-M2. Fix: loadConfig propagates
wsMetadataModel into BOTH places.

(2) handleSave's nextModel-vs-oldModel diff compared the form value to
the YAML default. After (1) mirrors wsMetadataModel into the form's
runtime_config.model for display, that diff was always non-zero on
no-op saves and would fire /model PUT — which auto-restarts. New
originalModel state tracks the loaded MODEL_PROVIDER and is the diff
baseline.

(3) handleSave PATCHed the workspace row but never pushed the same
fields into useCanvasStore.updateNodeData. User picked T3, hit Save &
Restart, DB updated to tier=3, header pill kept showing T2 until full
hydrate. Fix: mirror dbPatch into the store.

Bonus: ProviderModelSelector.handleProviderChange used to auto-default
the model to next.models[0] (alphabetically first) when switching
providers. User picked the MiniMax provider intending MiniMax-M2.7;
the form silently set MiniMax-M2 (first in the bucket) and the
workspace deployed with the wrong model. Now empty-default for
multi-model providers, force explicit pick — Save/Deploy already gate
on model.trim() === "".

Three new tests in ConfigTab.provider.test.tsx pin (1)/(2)/(3); two
existing ProviderModelSelector tests updated to reflect the no-silent-
default behaviour, with a new single-model-auto-pick test for the
0-vs-many boundary. 1212/1212 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:31:02 -07:00
Hongming Wang
5259ce3ea1
Merge pull request #2544 from Molecule-AI/fix/templates-yaml-log-and-coexistence-test
fix(workspace-server): log silent yaml.Unmarshal + coexistence test (#256, #257)
2026-05-03 06:04:51 +00:00