Commit Graph

3389 Commits

Author SHA1 Message Date
Hongming Wang
593f2bd2be
Merge pull request #2315 from Molecule-AI/auto/issue-2312-pr-c-platform-forward
feat(chat_files): rewrite Upload as HTTP-forward (RFC #2312, PR-C)
2026-04-29 15:30:19 -07:00
Hongming Wang
1557197289 build(runtime): register new modules in TOP_LEVEL_MODULES (RFC #2312, PR-B)
Drift gate at scripts/build_runtime_package.py asserts every workspace/*.py
appears in the TOP_LEVEL_MODULES allowlist before publishing. Without
this commit, the publish-runtime cascade would have failed on PR-B's
merge with:

  in workspace/ but NOT in TOP_LEVEL_MODULES (will ship un-rewritten):
    ['internal_chat_uploads', 'platform_inbound_auth']

This is the same incident class as the 0.1.16 transcript_auth outage
(per memory: feedback_runtime_publish_pipeline_gates.md): a new module
shipped with un-rewritten flat imports → ModuleNotFoundError on every
workspace boot.

Verified locally:

  $ python3 scripts/build_runtime_package.py --version 0.0.0-test --out /tmp/runtime-build-test
  [build] copied 66 .py files
  [build] rewrote imports in 40 files
  [build] done.

  $ grep "from molecule_runtime\." /tmp/runtime-build-test/molecule_runtime/internal_chat_uploads.py
  from molecule_runtime.platform_inbound_auth import get_inbound_secret, inbound_authorized

Refs #2312.
2026-04-29 15:05:11 -07:00
Hongming Wang
c02cb0e1b6 review: defer forward-time URL re-validation to follow-up (#2316)
Self-review found the original draft of this PR added forward-time
validateAgentURL() as defense-in-depth — paranoia layer on top of the
existing register-time gate. The validator unconditionally blocks
loopback (127.0.0.1/8), which makes httptest-based proxy tests
impossible without an env-var hatch I'd rather not add to a security-
critical path on first pass.

Trust note kept inline pointing at the upstream gate + tracking issue
so the gap is explicit, not invisible.

Refs #2312.
2026-04-29 14:33:41 -07:00
Hongming Wang
e632a31347 feat(chat_files): rewrite Upload as HTTP-forward to workspace (RFC #2312, PR-C)
Closes the SaaS upload gap (#2308) with the unified architecture from
RFC #2312: same code path on local Docker and SaaS, no Docker socket
dependency, no `dockerCli == nil` cliff. Stacked on PR-A (#2313) +
PR-B (#2314).

Before:
  Upload → findContainer (nil in SaaS) → 503

After:
  Upload → resolve workspaces.url + platform_inbound_secret
        → stream multipart to <url>/internal/chat/uploads/ingest
        → forward response back unchanged

Same call site whether the workspace runs on local docker-compose
("http://ws-<id>:8000") or SaaS EC2 ("https://<id>.<tenant>...").
The bug behind #2308 cannot exist by construction.

Why streaming, not parse-then-re-encode:
  * No 50 MB intermediate buffer on the platform
  * Per-file size + path-safety enforcement is the workspace's job
    (see workspace/internal_chat_uploads.py, PR-B)
  * Workspace's error responses (413 with offending filename, 400 on
    missing files field, etc.) propagate through unchanged

Changes:
  * workspace-server/internal/handlers/chat_files.go — Upload rewritten
    as a streaming HTTP proxy. Drops sanitizeFilename, copyFlatToContainer,
    and the entire docker-exec path. ChatFilesHandler gains an httpClient
    (broken out for test injection). Download stays docker-exec for now;
    follow-up PR will migrate it to the same shape.
  * workspace-server/internal/handlers/chat_files_external_test.go —
    deleted. Pinned the wrong-headed runtime=external 422 gate from
    #2309 (already reverted in #2311). Superseded by the proxy tests.
  * workspace-server/internal/handlers/chat_files_test.go — replaced
    sanitize-filename tests (now in workspace/tests/test_internal_chat_uploads.py)
    with sqlmock + httptest proxy tests:
      - 400 invalid workspace id
      - 404 workspace row missing
      - 503 platform_inbound_secret NULL (with RFC #2312 detail)
      - 503 workspaces.url empty
      - happy-path forward (asserts auth header, content-type forwarded,
        body streamed, response propagated back)
      - 413 from workspace propagated unchanged (NOT remapped to 500)
      - 502 on workspace unreachable (connect refused)
    Existing Download + ContentDisposition tests preserved.
  * tests/e2e/test_chat_upload_e2e.sh — single-script-everywhere E2E.
    Takes BASE as env (default http://localhost:8080). Creates a
    workspace, waits for online, mints a test token, uploads a fixture,
    reads it back via /chat/download, asserts content matches +
    bearer-required. Same script runs against staging tenants (set
    BASE=https://<id>.<tenant>.staging.moleculesai.app).

Test plan:
  * go build ./... — green
  * go test ./internal/handlers/ ./internal/wsauth/ — green (full suite)
  * tests/e2e/test_chat_upload_e2e.sh against local docker-compose
    after PR-A + PR-B + this PR all merge — TODO before merge

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:26:37 -07:00
Hongming Wang
d1de330152 feat(workspace): /internal/chat/uploads/ingest endpoint (RFC #2312, PR-B)
Stacked on PR-A (#2313). The platform-side rewrite that actually calls
this endpoint lands in PR-C; this PR adds the workspace-side consumer
+ hardening so PR-C is a small Go-only diff.

What this adds:

  * platform_inbound_auth.py — auth gate mirroring transcript_auth.py.
    Reads /configs/.platform_inbound_secret (delivered by the PR-A
    provisioner). Fail-closed when the file is missing/empty/unreadable.
    Constant-time compare via hmac.compare_digest.

  * internal_chat_uploads.py — POST /internal/chat/uploads/ingest.
    Multipart parse → sanitize each filename → write to
    /workspace/.molecule/chat-uploads/<random>-<name> with
    O_CREAT|O_EXCL|O_NOFOLLOW. Same response shape (uri/name/mimeType/
    size + workspace: URI scheme) as the legacy Go handler — canvas /
    agent code that resolves "workspace:..." paths keeps working.

  * Wired into workspace/main.py via starlette_app.add_route alongside
    the existing /transcript route.

  * python-multipart>=0.0.18 added to requirements.txt (Starlette's
    Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981).

Test coverage (36 tests, all green; full workspace suite 1266 passed):

  * test_platform_inbound_auth.py — 14 tests:
      happy path, fail-closed on missing file, empty file, whitespace-
      only file, missing/case-wrong/empty Bearer prefix, in-process
      cache, default CONFIGS_DIR fallback, end-to-end file → authorized.

  * test_internal_chat_uploads.py — 22 tests:
      sanitize_filename matrix (incl. ../traversal, CJK chars, length
      truncation), 401 on missing/wrong/no-secret-file bearer, single +
      batch upload happy paths, unique random prefix on duplicate names,
      mimetype guess fallback, 400 on missing files field, 413 on per-
      file + total-body oversize, symlink-at-target refusal (with
      sentinel-content unchanged assertion).

Why this is safe to ship before PR-C:

  * No platform-side caller yet → no behavior change visible to users.
  * Auth fails closed; nothing on the network can hit a write path
    until the platform forwards with the matching bearer.
  * Workspace's existing routes (/health, /transcript, /handle/*) are
    unchanged.

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:16:32 -07:00
Hongming Wang
1c9cea980d feat(wsauth): platform→workspace inbound secret (RFC #2312, PR-A)
Foundation for the HTTP-forward architecture that replaces Docker-exec
in chat upload + 5 follow-on handlers. This PR is intentionally scoped
to schema + token mint + provisioner wiring; no caller reads the secret
yet so behavior is unchanged.

Why a second per-workspace bearer (not reuse the existing
workspace_auth_tokens row):

  workspace_auth_tokens         workspaces.platform_inbound_secret
  ─────────────────────         ─────────────────────────────────
  workspace → platform          platform → workspace
  hash stored, plaintext gone   plaintext stored (platform reads back)
  workspace presents bearer     platform presents bearer
  platform validates by hash    workspace validates by file compare

Distinct roles, distinct rotation lifecycle, distinct audit signal —
splitting later would require a fleet-wide rolling rotation, so paying
the schema cost up front.

Changes:
  * migration 044: ADD COLUMN workspaces.platform_inbound_secret TEXT
  * wsauth.IssuePlatformInboundSecret + ReadPlatformInboundSecret
  * issueAndInjectInboundSecret hook in workspace_provision: mints
    on every workspace create / re-provision; Docker mode writes
    plaintext to /configs/.platform_inbound_secret alongside .auth_token,
    SaaS mode persists to DB only (workspace will receive via
    /registry/register response in a follow-up PR)
  * 8 unit tests against sqlmock — covers happy path, rotation, NULL
    column, empty string, missing workspace row, empty workspaceID

PR-B (next) wires up workspace-side `/internal/chat/uploads/ingest`
that validates the bearer against /configs/.platform_inbound_secret.

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:09:33 -07:00
Hongming Wang
e779a4ae7b
Merge pull request #2309 from Molecule-AI/auto/issue-2308-external-upload-422
fix(chat_files): return 422 (not misleading 503) for external workspace upload (#2308)
2026-04-29 19:45:05 +00:00
Hongming Wang
4a6095ee1a fix(chat_files): return 422 with structured detail for external workspaces (closes #2308)
Symptom: pasting a screenshot into the canvas chat for a runtime="external"
workspace returned `503 {"error":"workspace container not running"}` —
accurate from the upload handler's POV (no container exists for external
workspaces) but misleading because it implies the container has crashed.

Fix: detect runtime="external" via DB lookup BEFORE the container-find
step and return 422 with:
  - error: "file upload not supported for external workspaces"
  - detail: explains why + points at admin/secrets workaround +
    references issue #2308 for the v0.2 native-support roadmap
  - runtime: "external" (machine-readable for clients)

Why 422 not 200/501:
- 422 = Unprocessable Entity — the request is well-formed but the
  workspace's runtime can't accept it. Standard REST semantics.
- 200 with empty result would lie; 501 implies the API itself is
  unimplemented (it's not — works for non-external workspaces); 503
  was the misleading status this PR fixes.

Verified via live E2E against localhost:
- Created `runtime=external,external=true` workspace
- Posted multipart to /workspaces/:id/chat/uploads
- Got 422 with the expected structured body

Unit test (`chat_files_external_test.go`) pins the contract via sqlmock
+ httptest. Notable: the handler is constructed with `templates: nil`
to prove the runtime check happens BEFORE any docker plumbing — if a
future change moves the check below findContainer, the test crashes
on nil-deref instead of silently regressing.

Out of scope (for v0.2 follow-up):
- Native external-workspace file ingest via artifacts table or the
  channel-plugin's inbox/ pattern. Requires separate design pass.

Closes #2308

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:37:49 -07:00
Hongming Wang
2dcacb8f6b
Merge pull request #2302 from Molecule-AI/auto/issue-1815-canvas-coverage-ci-step
ci(canvas): wire vitest --coverage into CI for baseline observability (#1815)
2026-04-29 18:26:41 +00:00
Hongming Wang
c2191684bf
Merge pull request #2303 from Molecule-AI/auto/issue-1770-pre-commit-go-build
fix(pre-commit): add go build gate for staged Go changes (#1770)
2026-04-29 18:01:56 +00:00
Hongming Wang
2d1b15ecbc fix(pre-commit): add go build ./... gate for staged Go changes (#1770)
Catches the bot-generated-structurally-invalid-Go class that took
staging Platform(Go) red for hours on 2026-04-22 (PR #1769 commit
66ea0b64 nested a function declaration inside another function's body).
The patch tool applied it; the Go parser rejected it; every Go PR
targeting staging during the window failed CI through no fault of its
own.

Hook now runs `cd workspace-server && go build ./...` when any .go
file in workspace-server/ is staged. If the build fails, commit is
rejected with the first 20 lines of build output. Skip-with-warning
when go isn't installed (CI runners + bots without go bypass cleanly).

Cost: ~5-10s per commit that touches Go on a warm cache. Acceptable
for the class of bug it catches — the alternative (catch at PR-time
via CI) is too late, the malformed commit is already shared.

This is one of the three guards proposed in #1770. The other two
(branch-protection on `Platform (Go)` as required check; SHARED_RULES
clarification on bot-PR overrides) are admin / process changes that
need your action.

Closes the pre-commit half of #1770. Branch-protection + SHARED_RULES
work tracks separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:12:22 -07:00
Hongming Wang
9785f5ebb1
Merge pull request #2301 from Molecule-AI/auto/issue-2060-content-routing-doc
docs(CONTRIBUTING): add 'What goes where' section for content routing (#2060)
2026-04-29 16:22:36 +00:00
Hongming Wang
949b1b97a5
Merge pull request #2300 from Molecule-AI/auto/issues-2269-2268-restartstates-leak-and-since-secs
fix(workspace_crud) + feat(activity): restartStates leak (#2269) + since_secs param (#2268)
2026-04-29 16:22:34 +00:00
Hongming Wang
b733cf46c3
Merge pull request #2298 from Molecule-AI/fix/issue-2290-required-env-no-force-bypass
fix(org-import): remove force=true bypass of required-env preflight (#2290)
2026-04-29 16:22:32 +00:00
Hongming Wang
9a0d440fb7
Merge pull request #2296 from Molecule-AI/auto/issue-2270-readme-activity-rename
docs(scripts): rename /heartbeat-history → /activity in README
2026-04-29 16:22:30 +00:00
Hongming Wang
22326d6591
Merge pull request #2295 from Molecule-AI/auto/issue-2289-git-path-shutil-which
fix: resolve git/gh from PATH instead of hardcoded /usr/local/bin (closes #2289)
2026-04-29 16:22:28 +00:00
Hongming Wang
d8210514c1 ci(canvas): wire vitest --coverage into CI for baseline observability (#1815)
Step 2 of #1815. Step 1 (instrumentation in canvas/vitest.config.ts)
already shipped — the inline comment there explicitly defers wiring
into CI to a follow-up because turning on a 70% threshold blind would
either fail CI immediately or paper over a real gap with an ad-hoc
exclude list.

This PR ships the observability half:
- Replaces `npx vitest run` with `npx vitest run --coverage` in the
  canvas-build job. Coverage gets reported on every PR; no threshold
  gate yet (vitest.config.ts intentionally doesn't set thresholds).
- Adds an artifact upload step for canvas/coverage/ (HTML + json-summary)
  so reviewers can browse the coverage report from any PR. 7-day
  retention; if-no-files-found=warn so a step skip doesn't fail.

Step 3 (thresholds + hard gate) is the natural follow-up — track in a
new sub-issue once we've seen ~5-10 PRs of baseline data and know
where current coverage sits. The issue body proposed lines:70 /
functions:70 / branches:65 / statements:70; that may need adjustment
once the baseline lands.

Closes the Step-2 portion of #1815. Step 3 stays open or gets a fresh
issue depending on your preference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:51:34 -07:00
Hongming Wang
bfa54e2ee7 docs(CONTRIBUTING): add 'What goes where' section for content routing
Adds a prominent section to CONTRIBUTING.md documenting that public
content (blog, marketing, OG images, SEO briefs, DevRel demos) belongs
in Molecule-AI/docs, not molecule-core. Mirrors the routing cheat-sheet
from #2060 with the table of content-type → target repo, and points
contributors at the existing `Block forbidden paths` CI gate as the
loud-fail signal.

Per the issue: 11 content PRs were silently blocked over 48h before
being closed and redirected. This in-repo notice gives contributors
(human and agent) a discoverable spot to learn the rule before opening
the wrong PR. The CI gate is already enforcing the policy; this just
makes the rule self-service.

Closes #2060

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:51:30 -07:00
Hongming Wang
9559118678 feat(activity): accept ?since_secs= for time-window filtering (#2268)
The harness runner (scripts/measure-coordinator-task-bounds-runner.sh)
calls `/workspaces/:id/activity?since_secs=$A2A_TIMEOUT` to scope a
trace to a specific test window. The query param was silently
ignored — `ActivityHandler.List` accepted only `type`, `source`, and
`limit`, so the runner got the most-recent-100 events regardless of
how long ago they happened. Works for fresh-tenant tests where
activity_logs is ~empty pre-run, breaks on busy tenants and on tests
that exceed 100 events.

Adds `since_secs` parsing with three behaviors:

- Valid positive int → `AND created_at >= NOW() - make_interval(secs => $N)`
  on the SQL. Parameterised; values bound via lib/pq, not interpolated.
  `make_interval(secs => $N)` is required — the `INTERVAL '$N seconds'`
  literal form rejects placeholder substitution inside the string.
- Above 30 days (2_592_000s) → silently clamped to the cap. Defends
  against a paranoid client triggering a multi-month full-table scan
  via `since_secs=999999999`.
- Negative, zero, or non-integer → 400 with a structured error, NOT
  silently dropped. Silent drop is exactly the bug this is fixing
  — a typoed param shouldn't be lost as most-recent-100.

Tests cover all four paths: accepted (with arg-binding assertion via
sqlmock.WithArgs), clamped at 30 days, invalid rejected (5 sub-cases),
and omitted (verifies no extra clause / arg leak via strict WithArgs
count).

RFC #2251 §V1.0 step 6 (platform-side-transition audit) also depends
on this for time-window filtering of activity_logs.

Closes #2268

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:53:52 -07:00
Hongming Wang
f75599eba9 fix(workspace_crud): drop restartStates entries on workspace delete (#2269)
Per-workspace `restartState` entries (introduced under the name
`restartMu` pre-#2266, renamed to `restartStates` in #2266) are
created via `LoadOrStore` in `workspace_restart.go` but never
deleted. On a long-running platform process serving many short-lived
workspaces (E2E tests, transient sandbox tenants), the sync.Map grows
monotonically — ~16 bytes per workspace ever created.

Fix: call `restartStates.Delete(wsID)` after stopAndRemove +
ClearWorkspaceKeys for each cascaded descendant and the parent. Mirrors
the existing per-ID cleanup loop. `sync.Map.Delete` is safe on absent
keys, so workspaces that were never restarted (no LoadOrStore call)
are no-op.

This is a pre-existing leak — #2266 did not introduce it; just renamed
the holder. Filing as a separate commit to keep the change minimal and
reviewable.

Closes #2269

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 05:53:34 -07:00
Hongming Wang
80c612d987 fix(org-import): remove force=true bypass of required-env preflight
The pre-#2290 \`force: true\` flag on POST /org/import skipped the
required-env preflight, letting orgs import without their declared
required keys (e.g. ANTHROPIC_API_KEY). The ux-ab-lab incident: that
import path was used, the org shipped without ANTHROPIC_API_KEY in
global_secrets, and every workspace 401'd on the first LLM call.

Per #2290 picks (C/remove/both):
- Q1=C: template-derived required_env (no schema change — already
  the existing aggregation via collectOrgEnv).
- Q2=remove: drop the bypass entirely. The seed/dev-org flow that
  legitimately needs to skip becomes a separate dry-run-import path
  with its own audit trail, not a permission bypass.
- Q3=block-at-import-only: provision-time drift logging is a
  follow-up; for this PR, blocking at import is the gate.

Surface change:
- Force field removed from POST /org/import request body.
- 412 \"suggestion\" text drops the \"or pass force=true\" guidance.
- Legacy callers sending {\"force\": true} are silently tolerated
  (Go's json.Unmarshal drops unknown fields), so no client-side
  breakage; the bypass effect is just gone.

Audited callers in this repo:
- canvas/src/components/TemplatePalette.tsx — never sends force.
- scripts/post-rebuild-setup.sh — never sends force.
- Only external tooling sent force=true. Those callers must now set
  the global secret via POST /settings/secrets before importing.

Adds TestOrgImport_ForceFieldRemoved as a structural pin: if a future
change re-adds Force to the body struct, the test fails and forces an
explicit reckoning with the #2290 rationale.

Closes #2290

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 03:23:23 -07:00
Hongming Wang
4742b6c3f4
Merge pull request #2263 from Molecule-AI/auto-sync/main-d35a2420
chore: sync main → staging (auto, ff to d35a2420)
2026-04-29 02:32:35 -07:00
Hongming Wang
499fed5080 docs(scripts): rename /heartbeat-history → /activity in README
PR #2265 renamed the harness trace endpoint and event name; sync the
cross-repo scripts/README.md to match.

Closes #2270

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:23:00 -07:00
Hongming Wang
59d65ba557 fix: resolve git/gh from PATH instead of hardcoded /usr/local/bin
Closes #2289.

Some workspace template images ship `/usr/local/bin/{git,gh}` wrappers
that bake `GH_TOKEN` into argv handling (preferred — auto-PR creation
authenticates without explicit token plumbing); other templates have
plain `/usr/bin/git` installed via apt with no wrapper. The hardcoded
`_GIT = "/usr/local/bin/git"` crashed every auto-push attempt on the
latter image class:

    FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/git'
      File "/app/molecule_runtime/executor_helpers.py", line 524, in _auto_push_and_pr_sync
        subprocess.run(['/usr/local/bin/git', 'rev-parse', '--is-inside-work-tree'], ...)

`shutil.which("git")` walks PATH in order — finds the `/usr/local/bin/`
wrapper first when it exists, falls back to `/usr/bin/git` otherwise.
GH_TOKEN injection still wins on wrapper-equipped images; auto-push
no longer crashes on bare-apt images.

Verified locally: `shutil.which("git")` resolves to `/usr/bin/git` on
the bug-reporter's image; `shutil.which("gh")` resolves to the
homebrew path on dev. Both paths exist + are executable on respective
hosts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:07:19 -07:00
42cf4f444c
Merge pull request #2271 from Molecule-AI/feat/new-response-message-helper
feat(runtime): add new_response_message helper for adapter A2A responses
2026-04-29 08:33:11 +00:00
Hongming Wang
a57382e918 feat(runtime): add new_response_message helper for adapter A2A responses
Surfaced via cross-template review of the a2a-sdk v0→v1 migration:
every adapter executor (claude-code, gemini-cli, crewai, openclaw,
autogen) builds A2A response Messages independently using
`new_text_message(text)` from the SDK, which omits `task_id` and
`context_id`. The runtime's own canonical pattern in
`workspace/a2a_executor.py:466-475` correctly threads both:

    Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=_parts,
        task_id=task_id,           # ← canonical
        context_id=context_id,     # ← canonical
    )

Adapters skipping these correlation fields means the platform's a2a
proxy can't reliably tie the response back to the originating task.
This is a divergence from canonical, not necessarily a strict bug
(task_id may be optional with a default) — but it's enough of a
correlation/observability gap that the canonical pattern bothers to
thread it.

Add `new_response_message(context, text, files=None)` to
executor_helpers.py — single home for response Message construction.
Templates can migrate from `new_text_message(text)` to this helper
in stacked PRs once the runtime publishes to PyPI.

The helper:
  - Reads `context.task_id`/`context.context_id` from the inbound
    RequestContext, falling back to fresh UUIDs (RequestContextBuilder
    always sets them in production; fallback is for unit tests).
  - Sets `role=Role.ROLE_AGENT` (the v1 enum value).
  - Builds text Parts via `Part(text=...)` and file Parts via
    `Part(url="workspace:<path>", filename=..., media_type=...)`.
  - Returns a v1 protobuf Message ready for
    `event_queue.enqueue_event(...)`.

Why "files=None" with the workspace: URI scheme as the file Part
shape: matches the canonical pattern in a2a_executor.py exactly so
the platform's chat-attachment download path (executor_helpers.py
`resolve_attachment_uri`) interprets responses uniformly across all
adapters.

Tests (5, all pass with --no-cov against the live runtime image):
  - test_new_response_message_text_only
  - test_new_response_message_with_files
  - test_new_response_message_files_only_no_text
  - test_new_response_message_falls_back_when_context_ids_unset
  - test_new_response_message_handles_missing_attrs

The conftest's a2a stubs needed an extension for Message + Role +
Part with kwargs preservation. Strictly additive — no existing tests
affected. (The 19 pre-existing failures in test_executor_helpers.py
are unrelated debt from the commit_memory/recall_memory rewrite,
visible on staging baseline before this change.)

Per-template migration is the follow-up: claude-code, gemini-cli,
crewai, openclaw, autogen all call `new_text_message(text)` today;
each gets a per-repo PR replacing it with
`new_response_message(context, text)`. This PR ships the helper
first so the templates have something to import.

Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix),
gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:13:34 -07:00
aa6c42f042
Merge pull request #2267 from Molecule-AI/fix/restart-panic-recovery
fix(restart): clear running flag on panic in cycle()
2026-04-29 07:10:44 +00:00
Hongming Wang
bdfa45572e fix(restart): clear running flag on panic in cycle()
Self-review caught a regression I introduced in #2266: if cycle() panics
(e.g. a future provisionWorkspace nil-deref or any runtime error from
the DB / Docker / encryption stacks it touches), the loop never reaches
`state.running = false`. The flag stays true forever, the early-return
guard at the top of coalesceRestart fires for every subsequent call,
and that workspace is permanently locked out of restarts until the
platform process restarts.

The pre-fix code had similar exposure (panic killed the goroutine
before defer wsMu.Unlock() ran in some Go versions), but my pending-
flag version made it worse: the guard is sticky, not ephemeral.

Fix: defer the state-clear so it always runs on exit, including panic.
Recover (and DON'T re-raise) so the panic doesn't propagate to the
goroutine boundary and crash the whole platform process — RestartByID
is always called via `go h.RestartByID(...)` from HTTP handlers, and
an unrecovered goroutine panic in Go terminates the program. Crashing
the platform for every tenant because one workspace's cycle panicked
is the wrong availability tradeoff. The panic message + full stack
trace via runtime/debug.Stack() are still logged for debuggability.

Regression test in TestCoalesceRestart_PanicInCycleClearsState:

  1. First call's cycle panics. coalesceRestart's defer must swallow
     the panic — assert no panic propagates out (would crash the
     platform process from a goroutine in production).
  2. Second call must run a fresh cycle (proves running was cleared).

All 7 tests pass with -race -count=10.

Surfaced via /code-review-and-quality self-review of #2266; the
re-raise-after-recover anti-pattern (originally argued as "don't
mask bugs") came up in the comprehensive review and was corrected
to log-with-stack-and-suppress for availability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:00:12 -07:00
480b0576f2
Merge pull request #2265 from Molecule-AI/fix/harness-runner-activity-endpoint
fix(harness-runner): switch from non-existent /heartbeat-history to /activity
2026-04-29 06:53:54 +00:00
3623e975ec
Merge pull request #2266 from Molecule-AI/fix/restart-race-pending-flag
fix(restart): coalesce concurrent restart requests via pending flag
2026-04-29 06:53:40 +00:00
Hongming Wang
f088090b27 fix(restart): coalesce concurrent restart requests via pending flag
The naive mutex-with-TryLock pattern in RestartByID was silently dropping
the second of two close-together restart requests. SetSecret and SetModel
both fire `go restartFunc(...)` from their HTTP handlers, and both DB
writes commit before either restart goroutine reaches loadWorkspaceSecrets.
If the second goroutine arrives while the first holds the per-workspace
mutex, TryLock returns false and the second is logged-and-dropped:

    Auto-restart: skipping <id> — restart already in progress

The first goroutine's loadWorkspaceSecrets ran before the second write
committed, so the new container boots without that env var. Surfaced
during the RFC #2251 V1.0 measurement as hermes returning "No LLM
provider configured" when MODEL_PROVIDER landed after the API-key write
and lost its restart to the mutex (HERMES_DEFAULT_MODEL absent →
start.sh fell back to nousresearch/hermes-4-70b → derived
provider=openrouter → no OPENROUTER_API_KEY → request-time error).
The same race hits any back-to-back secret/model save flow including
the canvas's "set MiniMax key + pick model" UX.

Fix: pending-flag / coalescing pattern. Any restart request that arrives
while one is in flight sets `pending=true` and returns. The in-flight
runner, on completion, checks the flag and runs another cycle. This
collapses N concurrent requests into at most 2 sequential cycles (the
current one + one more that picks up everyone who arrived during it),
while guaranteeing the final container always sees the latest secrets.

Concrete contract:

  - 1 request, no concurrency:        1 cycle
  - N concurrent requests during 1 in-flight cycle: 2 cycles total
  - N sequential requests (no overlap):              N cycles
  - Per-workspace state — different workspaces never serialize

Coalescing is extracted into `coalesceRestart(workspaceID, cycle func())`
so the gate logic is testable without the full WorkspaceHandler / DB /
provisioner stack. RestartByID now wraps that with the production cycle
function. runRestartCycle calls provisionWorkspace SYNCHRONOUSLY (drops
the historical `go`) so the loop's pending-flag check happens AFTER the
new container is up — without that, the next cycle's Stop call would
race the previous cycle's still-spawning provision goroutine.
sendRestartContext stays async; it's a one-way notification.

Tests in workspace_restart_coalesce_test.go cover all five contract
points + race-detector clean over 10 iterations:

  - Single call → 1 cycle
  - 5 concurrent during in-flight → exactly 2 cycles total
  - 3 sequential → 3 cycles
  - Pending-during-cycle picked up (targeted bug repro)
  - State cleared after drain (running flag reset)
  - Per-workspace isolation (no cross-workspace serialization)

Refs: molecule-core#2256 (V1.0 gate measurement); root cause for the
"No LLM provider configured" symptom seen during hermes/MiniMax repro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:31:56 -07:00
Hongming Wang
de01544d6b fix(harness-runner): switch from non-existent /heartbeat-history to /activity
The runner was speculatively calling `/workspaces/:id/heartbeat-history` —
that endpoint doesn't exist on workspace-server. On local dev it 404'd;
on tenant builds the platform's :8080 canvas-proxy fallback intercepted
it and returned 28KB of Next.js HTML which then landed in the JSON event
log. Neither outcome was useful trace data.

`GET /workspaces/:id/activity` is the existing endpoint that reads
activity_logs. That table already records the events the RFC §V1.0
step 6 'platform-side transition' check needs (a2a_send / a2a_receive /
task_update / agent_log / error, plus duration_ms + status). Rename
the runner's fetch + emitted event accordingly.

Verified: GET /workspaces/<uuid>/activity?since_secs=60 returns 200
with `[]` against the local platform; no SaaS skip needed since the
endpoint exists in both environments.

Refs: molecule-core#2256 (V1.0 gate #1 measurement comment).
2026-04-28 23:12:51 -07:00
9d159f0a94
Merge pull request #2262 from Molecule-AI/docs/registry-pattern-and-harness-readmes
docs: registry pattern + harness scripts READMEs
2026-04-29 05:35:58 +00:00
a18d116606
Merge pull request #2261 from Molecule-AI/fix/harness-cleanup-failed-event
harness: SaaS routing + provider-agnostic config for RFC #2251 measurement
2026-04-29 05:35:43 +00:00
Hongming Wang
d35a2420eb
Merge pull request #2252 from Molecule-AI/staging
staging → main: auto-promote fcd87b9
2026-04-28 22:34:16 -07:00
Hongming Wang
dd5c54dbaa fix(harness-runner): WAIT_ONLINE_SECS round-up + SaaS heartbeat skip + UUID/slug validation
Three review-driven fixes to the runner before #2261 merges:

1. `WAIT_ONLINE_SECS / 3` truncated; an operator passing 200 actually
   waited 198s. Round up so 200 → 67 polls × 3s = 201s ≥ requested.

2. The heartbeat-history endpoint isn't on tenant workspace-servers —
   the platform's :8080 fallback proxies unmatched paths to the
   canvas Next.js, so the SaaS run captured 28KB of HTML in the
   `heartbeat_trace` event log. Skip the fetch in MODE=saas; emit an
   explicit `<skipped: ...>` placeholder. Local mode behaviour
   unchanged.

3. ORG_ID and ORG_SLUG had no client-side format check, so a typo'd
   value got swallowed by TenantGuard's intentionally-opaque 404
   (which doesn't tell the operator whether slug, UUID, or auth was
   wrong). Validate UUID and slug shape up front; matching errors
   are actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:29:29 -07:00
Hongming Wang
ef95628202 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:20:00 -07:00
Hongming Wang
00e4766046 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:40 -07:00
Hongming Wang
592f47694b feat(harness): SaaS routing + provider-agnostic config for RFC #2251 measurement
The original measure-coordinator-task-bounds.sh was hardcoded for
local-dev (workspace-server on :8080) with claude-code/langgraph
templates and OPENROUTER_API_KEY. Running it against staging requires
both auth-chain plumbing (per-tenant ADMIN_TOKEN + X-Molecule-Org-Id
TenantGuard header + tenant subdomain routing) and template/secret
flexibility (e.g. Hermes/MiniMax for Token Plan keys).

This adds:

* `measure-coordinator-task-bounds-runner.sh` — separate runner that
  wraps the same workspace-server API calls but takes everything as
  env-var inputs. Two MODE values:
  - `local`   → direct workspace-server (no auth/tenant scoping)
  - `saas`    → tenant subdomain + per-tenant ADMIN_TOKEN bearer +
                X-Molecule-Org-Id TenantGuard header. Auto-fetches
                tenant token via CP /cp/admin/orgs/<slug>/admin-token
                given ORG_SLUG + CP_ADMIN_API_TOKEN, OR accepts a
                pre-resolved TENANT_ADMIN_TOKEN.

* Configurable PM_TEMPLATE / CHILD_TEMPLATE / MODEL / SECRET_NAME /
  SECRET_VALUE — defaults match the original (claude-code-default +
  langgraph + OpenRouter). Hermes/MiniMax example documented in the
  header.

* Per-poll status_change events during wait_online, so a workspace
  that never reaches online surfaces its last status (provisioning,
  failed, etc.) instead of a bare timeout.

* WAIT_ONLINE_SECS knob (default 180s; SaaS cold-start needs ~420s
  for first hermes-image pull on a freshly-provisioned EC2 tenant).

* `${args[@]+...}` guard on the api() helper — avoids `set -u`
  exploding on an empty header array (the local-dev hot-path).

The original script also gained a SECRET_VALUE block earlier in the
session — that change (separately staged) makes the secret-name
configurable without forcing every operator through the new runner.

V1.0 gate #1 (RFC #2251, Issue 4 repro) measurement results posted
as a separate comment on molecule-core#2256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:06:18 -07:00
b78ca4ae02
Merge pull request #2259 from Molecule-AI/fix/harness-cleanup-failed-event
fix(harness): cleanup_failed event + drop misleading exit_code capture
2026-04-29 04:30:34 +00:00
5b2132b828
Merge pull request #2260 from Molecule-AI/chore/snapshot-lf-attribute
chore(gitattributes): pin LF on snapshot golden files
2026-04-29 04:30:15 +00:00
Hongming Wang
9d7bb58374 chore(gitattributes): pin LF on snapshot golden files
Self-review follow-up on #2258 (registry snapshot tests, just merged).

The byte-exact snapshot comparisons in test_platform_tools.py would
fail mysteriously on a Windows contributor's machine with
core.autocrlf=true: checkout would convert LF → CRLF, the test would
fail locally with no useful diagnostic, and the regen instructions
in the test-file header would produce LF files that disagree with
the working copy.

Pin workspace/tests/snapshots/*.txt to text eol=lf so this can't
happen. All three current snapshots are already LF; the attribute
ensures it stays that way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:01:44 -07:00
Hongming Wang
6e5b5c4142 fix(harness): cleanup_failed event + drop misleading exit_code capture
Self-review follow-ups on #2257:

- Drop `local exit_code=$?` from cleanup(). `trap`-handler return values
  are ignored, so capturing $? only misled a future reader into thinking
  exit-code preservation was happening.

- Replace silenced `>/dev/null 2>&1` DELETE with `-w '%{http_code}'`
  capture. ADMIN_TOKEN expiring mid-run was the realistic failure mode
  here — previously we swallowed it under the silenced redirect, leaving
  workspaces leaked with no signal. Now a 401/403/5xx surfaces as a
  `cleanup_failed` JSON event with a remediation hint pointing at
  cleanup-rogue-workspaces.sh; 404 is treated as success (the
  post-condition — workspace absent — holds).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:00:38 -07:00
0f50c5889b
Merge pull request #2258 from Molecule-AI/chore/registry-snapshot-and-alignment
chore(registry): snapshot tests + CLI-block alignment for #2240
2026-04-29 03:56:35 +00:00
ada0e1ddd0
Merge pull request #2253 from Molecule-AI/docs/auto-promote-staging-prereq-comment
docs(ci): document auto-promote-staging GITHUB_TOKEN PR-create prereq
2026-04-29 03:48:51 +00:00
Hongming Wang
07a17c2e59 Merge remote-tracking branch 'origin/staging' into docs/auto-promote-staging-prereq-comment
# Conflicts:
#	.github/workflows/auto-promote-staging.yml
2026-04-28 20:46:42 -07:00
9bc3d6e352
Potential fix for pull request finding 'Unused global variable'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-04-28 20:45:53 -07:00
de5828b957
Merge pull request #2257 from Molecule-AI/fix/rfc2251-harness-hardening
fix(harness): cleanup trap + tenant scoping + dry-run for #2254
2026-04-29 03:43:51 +00:00
Hongming Wang
ddf6720498 chore(registry): snapshot tests + CLI-block alignment for #2240
Two follow-ups from the #2240 code review:

1. Snapshot tests for the rendered tool-instruction blocks. The
   structural tests added in #2240 guarantee tool NAMES are present;
   these new tests pin the SHAPE — bullet ordering, heading style,
   footer placement — so a future contributor who reorders fields in
   `_render_section` or rewrites a `when_to_use` paragraph sees the
   diff in CI rather than shipping a silently-different system prompt.
   Golden files live under workspace/tests/snapshots/.

2. CLI-block alignment test + corrected source-of-truth comment.
   `_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for
   ollama and other non-MCP runtimes — the registry can't auto-generate
   it because the CLI subprocess interface uses different command
   shapes (`peers` vs `list_peers`, etc.). A new
   `_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool →
   CLI-keyword correspondence (or explicit `None` for tools not
   exposed via subprocess). Two tests enforce coverage:

     - every a2a tool in the registry is keyed in the mapping
     - every non-None subcommand keyword literally appears in
       `_A2A_INSTRUCTIONS_CLI`

   Caught one real gap: `send_message_to_user` is in the registry but
   has no CLI subcommand. Mapped to `None` with an explanatory comment.

   The "no other source of truth" claim in registry.py's docstring
   was wrong post-#2240 (the CLI block survived) — corrected to
   describe the two surfaces explicitly and point at the alignment
   tests as the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:42:15 -07:00
Hongming Wang
039a41cce3 fix(harness): cleanup trap + tenant scoping + dry-run for measure-coordinator-task-bounds
Three follow-ups from #2254 code review before the harness is safe to
run against staging:

1. Cleanup trap. Workspaces are now auto-deleted on EXIT/INT/TERM. A
   Ctrl-C mid-run no longer leaks the PM + Researcher pair against
   shared infra. KEEP_WORKSPACES=1 opts out for post-run inspection.

2. Tenant scoping + admin auth. Non-localhost PLATFORM values now
   require both ADMIN_TOKEN and TENANT_ID; the script refuses to run
   without them. The previous version sent unauthenticated POSTs that,
   on staging, would either 401 every request or — worse — provision
   into the wrong tenant. Memory `feedback_never_run_cluster_cleanup_
   tests_on_live_platform` calls out the same hazard class.

3. DRY_RUN=1 mode. Prints platform target, tenant id, auth fingerprint,
   and the planned actions, then exits before any state mutation. The
   intended pre-flight before running against staging.

Also tightened OR_KEY check (the chained default silently accepted an
empty OPENROUTER_API_KEY) and added a heartbeat-trace caveat to the
interpretation guide explaining what `<endpoint_unavailable>` means
for the bound question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:38:35 -07:00