External callers (third-party SDKs, the channel plugin) authenticate
purely via bearer and frequently don't set the X-Workspace-ID header.
Without the header, activity_logs.source_id ends up NULL — breaking the
peer_id signal on notifications, the "Agent Comms by peer" canvas tab,
and any analytics that breaks down inbound A2A by sender.
The bearer is the authoritative caller identity per the wsauth contract
(it's what proves who you are); the header is a display/routing hint
that must agree with it. So we derive callerID from the bearer's owning
workspace whenever the header is absent. The existing validateCallerToken
guard fires after this and enforces token-to-callerID binding the same
way it always has.
Org-token requests are skipped — those grant org-wide access and don't
bind to a single workspace, so the canvas-class semantics (callerID="")
are preserved. Bearer-resolution failures (revoked, removed workspace)
fall through to canvas-class as well, never 401.
New wsauth.WorkspaceFromToken exposes the bearer→workspace lookup as a
modular interface; mirrors ValidateAnyToken's defense-in-depth JOIN on
workspaces.status != 'removed'.
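A minimal sketch of that precedence, assuming an interface shape for the
lookup — only WorkspaceFromToken, the org-token skip, and the canvas-class
fallthrough come from this change; the other names are illustrative, and
validateCallerToken still runs on the result exactly as before:

    package a2aproxy

    import "context"

    type workspace struct{ ID string }

    type tokenStore interface {
        // WorkspaceFromToken resolves a bearer to its owning, non-removed workspace.
        WorkspaceFromToken(ctx context.Context, bearer string) (*workspace, error)
        IsOrgToken(bearer string) bool
    }

    // resolveCallerID picks the callerID ProxyA2A should record: the explicit
    // header when present, otherwise the bearer's owning workspace. Org tokens
    // and resolution failures keep canvas-class semantics (empty callerID).
    func resolveCallerID(ctx context.Context, ts tokenStore, header, bearer string) string {
        if header != "" {
            return header
        }
        if ts.IsOrgToken(bearer) {
            return "" // org-wide access, no single workspace to bind to
        }
        ws, err := ts.WorkspaceFromToken(ctx, bearer)
        if err != nil || ws == nil {
            return "" // revoked token / removed workspace: fall through, never 401
        }
        return ws.ID
    }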
Tests: 4 unit tests on WorkspaceFromToken + 3 integration tests on
ProxyA2A covering the three observable paths (bearer-derived,
org-token skipped, derive-failure fallthrough).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PRs that don't touch canvas/** paths skip the Canvas (Next.js) job via
its `if: needs.changes.outputs.canvas == 'true'` guard, and GitHub
reports a SKIPPED conclusion for that check. Branch protection on
staging requires Canvas (Next.js) — and treats SKIPPED as not-passed —
blocking merge on every workspace-server-only or migration-only PR.
This is the design pattern documented in feedback memory
"branch_protection_check_name_parity": split into a real job + a
no-op shadow that share the same `name:`. Exactly one runs per PR;
both report the same check context, and at least one always reports
SUCCESS, satisfying the required check.
The no-op job runs in a few seconds (a single `echo` step) and produces
the right check context for any PR that doesn't touch canvas/**.
Concrete blocker that prompted this: PR #2314 (RFC #2312 PR-B) sat
APPROVED + CI-green + UP-TO-DATE for half an hour with mergeStateStatus
BLOCKED, traced via the GraphQL `isRequired` field to a single
SKIPPED Canvas (Next.js) check. PRs #2319 (PR-F) and the rest of the
RFC #2312 stack would have hit the same wall.
Foundation for the HTTP-forward architecture that replaces Docker-exec
in chat upload + 5 follow-on handlers. This PR is intentionally scoped
to schema + token mint + provisioner wiring; no caller reads the secret
yet so behavior is unchanged.
Why a second per-workspace bearer (not reuse the existing
workspace_auth_tokens row):
workspace_auth_tokens          workspaces.platform_inbound_secret
─────────────────────          ───────────────────────────────────
workspace → platform           platform → workspace
hash stored, plaintext gone    plaintext stored (platform reads back)
workspace presents bearer      platform presents bearer
platform validates by hash     workspace validates by file compare
Distinct roles, distinct rotation lifecycle, distinct audit signal —
splitting them later would require a fleet-wide rolling rotation, so we
pay the schema cost up front.
Changes:
* migration 044: ADD COLUMN workspaces.platform_inbound_secret TEXT
* wsauth.IssuePlatformInboundSecret + ReadPlatformInboundSecret
* issueAndInjectInboundSecret hook in workspace_provision: mints
on every workspace create / re-provision; Docker mode writes
plaintext to /configs/.platform_inbound_secret alongside .auth_token,
SaaS mode persists to DB only (workspace will receive via
/registry/register response in a follow-up PR)
* 8 unit tests against sqlmock — covering the happy path, rotation, NULL
column, empty string, missing workspace row, and empty workspaceID
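A hedged sketch of the mint/read pair — the SQL shape and signatures are
assumptions; only the function names and the plaintext storage direction
are from this change:

    package wsauth

    import (
        "context"
        "crypto/rand"
        "database/sql"
        "encoding/hex"
    )

    // IssuePlatformInboundSecret mints a fresh platform→workspace bearer and
    // stores it in plaintext so the platform can read it back and present it.
    func IssuePlatformInboundSecret(ctx context.Context, db *sql.DB, workspaceID string) (string, error) {
        buf := make([]byte, 32)
        if _, err := rand.Read(buf); err != nil {
            return "", err
        }
        secret := hex.EncodeToString(buf)
        _, err := db.ExecContext(ctx,
            `UPDATE workspaces SET platform_inbound_secret = $1 WHERE id = $2`,
            secret, workspaceID)
        return secret, err
    }

    // ReadPlatformInboundSecret returns the stored plaintext; NULL columns and
    // missing rows are treated as empty-string here purely for illustration.
    func ReadPlatformInboundSecret(ctx context.Context, db *sql.DB, workspaceID string) (string, error) {
        var s sql.NullString
        err := db.QueryRowContext(ctx,
            `SELECT platform_inbound_secret FROM workspaces WHERE id = $1`, workspaceID).Scan(&s)
        if err == sql.ErrNoRows {
            return "", nil
        }
        return s.String, err
    }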
PR-B (next) wires up workspace-side `/internal/chat/uploads/ingest`
that validates the bearer against /configs/.platform_inbound_secret.
Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2309 added an early-return that 422'd uploads to external workspaces
with "file upload not supported." Both halves of that diagnosis were wrong:
1. External workspaces SHOULD support uploads — gating with a 422
locks out intended functionality and presents that as a design decision.
2. The 503 the user actually hit was on an INTERNAL workspace, not
an external one. The runtime check never even ran.
Real root cause (separate fix incoming):
- findContainer(...) requires a non-nil h.docker.
- In SaaS (MOLECULE_ORG_ID set), main.go selects the CP provisioner
instead of the local Docker provisioner — dockerCli is nil.
- findContainer short-circuits to "" → 503 "container not running"
on every workspace, internal or external, on Railway-hosted
SaaS where workspaces actually live on EC2.
This PR strips the misleading gate so #2308 can be re-investigated
against the real symptom. The proper fix routes the multipart upload
over HTTP to the workspace's URL when dockerCli is nil — tracked
as a follow-up.
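A hypothetical sketch of that follow-up routing (not in this PR); every
name below is an assumption except the nil-dockerCli condition and the
forward-over-HTTP direction:

    package chatupload

    import (
        "io"
        "net/http"
    )

    type handler struct {
        docker any // nil in SaaS: main.go picked the CP provisioner
        client *http.Client
    }

    // forwardOrExec decides the upload path: no local Docker client means no
    // container to exec into, so the multipart body goes to the workspace's URL.
    func (h *handler) forwardOrExec(w http.ResponseWriter, r *http.Request, workspaceURL string) {
        if h.docker == nil {
            target := workspaceURL + "/internal/chat/uploads/ingest" // illustrative ingest path
            req, err := http.NewRequestWithContext(r.Context(), http.MethodPost, target, r.Body)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadGateway)
                return
            }
            req.Header.Set("Content-Type", r.Header.Get("Content-Type")) // keep the multipart boundary
            resp, err := h.client.Do(req)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadGateway)
                return
            }
            defer resp.Body.Close()
            w.WriteHeader(resp.StatusCode)
            io.Copy(w, resp.Body)
            return
        }
        // existing Docker-exec path continues here for container-backed workspaces
    }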
Refs #2308.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Creates a fresh tenant via /cp/admin/orgs, provisions an internal CEO
(claude-code default) + external child as its sub-agent, registers the
child, and probes peer visibility from three angles:
- DB-shape: child appears in /workspaces?parent_id=<parent>
- /registry/<child>/peers (child's bearer): does it see parent?
- /registry/<parent>/peers (parent's bearer, if exposed)
EXIT-trap teardown sends DELETE /cp/admin/tenants/:slug with the
required {"confirm":slug} body and polls /cp/admin/orgs for purge
confirmation (mirrors test_staging_full_saas.sh).
The harness was authored as the staging counterpart to the local
two-workspace reproduction script: the local script doesn't generalize to
staging's tenant-proxy auth chain, so each surface needs its own probe.
Run:
MOLECULE_ADMIN_TOKEN=<CP admin bearer> tests/e2e/test_2307_peer_visibility_staging.sh
Refs #2307.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: pasting a screenshot into the canvas chat for a runtime="external"
workspace returned `503 {"error":"workspace container not running"}` —
accurate from the upload handler's POV (no container exists for external
workspaces) but misleading because it implies the container has crashed.
Fix: detect runtime="external" via DB lookup BEFORE the container-find
step and return 422 with:
- error: "file upload not supported for external workspaces"
- detail: explains why + points at admin/secrets workaround +
references issue #2308 for the v0.2 native-support roadmap
- runtime: "external" (machine-readable for clients)
Why 422 not 200/501:
- 422 = Unprocessable Entity — the request is well-formed but the
workspace's runtime can't accept it. Standard REST semantics.
- 200 with empty result would lie; 501 implies the API itself is
unimplemented (it's not — works for non-external workspaces); 503
was the misleading status this PR fixes.
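A hedged sketch of the ordering the fix relies on — the runtime lookup
and the 422 happen before any container plumbing; struct and column names
are assumptions, the response fields follow the contract above:

    package chatfiles

    import (
        "database/sql"
        "encoding/json"
        "net/http"
    )

    type handler struct{} // minimal receiver for the sketch

    // rejectExternalRuntime returns true when it has written the 422 response,
    // i.e. the caller must stop before ever reaching findContainer.
    func (h *handler) rejectExternalRuntime(w http.ResponseWriter, db *sql.DB, workspaceID string) bool {
        var rt string
        if err := db.QueryRow(`SELECT runtime FROM workspaces WHERE id = $1`, workspaceID).Scan(&rt); err != nil {
            return false // let the normal path surface lookup errors
        }
        if rt != "external" {
            return false // internal workspaces continue to the container-find step
        }
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(http.StatusUnprocessableEntity) // well-formed request, runtime can't accept it
        json.NewEncoder(w).Encode(map[string]string{
            "error":   "file upload not supported for external workspaces",
            "detail":  "external workspaces have no managed container; see #2308 for the v0.2 native-support roadmap",
            "runtime": "external",
        })
        return true
    }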
Verified via live E2E against localhost:
- Created `runtime=external,external=true` workspace
- Posted multipart to /workspaces/:id/chat/uploads
- Got 422 with the expected structured body
Unit test (`chat_files_external_test.go`) pins the contract via sqlmock
+ httptest. Notable: the handler is constructed with `templates: nil`
to prove the runtime check happens BEFORE any docker plumbing — if a
future change moves the check below findContainer, the test crashes
on nil-deref instead of silently regressing.
Out of scope (for v0.2 follow-up):
- Native external-workspace file ingest via artifacts table or the
channel-plugin's inbox/ pattern. Requires separate design pass.
Closes #2308
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catches the class of bot-generated, structurally invalid Go that took
the staging Platform (Go) check red for hours on 2026-04-22 (PR #1769 commit
66ea0b64 nested a function declaration inside another function's body).
The patch tool applied it; the Go parser rejected it; every Go PR
targeting staging during the window failed CI through no fault of its
own.
The hook now runs `cd workspace-server && go build ./...` when any .go
file in workspace-server/ is staged. If the build fails, the commit is
rejected with the first 20 lines of build output. When go isn't
installed, the hook skips with a warning (CI runners + bots without go
bypass it cleanly).
Cost: ~5-10s per commit that touches Go, on a warm cache. Acceptable
for the class of bug it catches — the alternative (catching at PR time
via CI) is too late: the malformed commit has already been shared.
This is one of the three guards proposed in #1770. The other two
(branch-protection on `Platform (Go)` as required check; SHARED_RULES
clarification on bot-PR overrides) are admin / process changes that
need your action.
Closes the pre-commit half of #1770. Branch-protection + SHARED_RULES
work tracks separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 2 of #1815. Step 1 (instrumentation in canvas/vitest.config.ts)
already shipped — the inline comment there explicitly defers wiring
into CI to a follow-up because turning on a 70% threshold blind would
either fail CI immediately or paper over a real gap with an ad-hoc
exclude list.
This PR ships the observability half:
- Replaces `npx vitest run` with `npx vitest run --coverage` in the
canvas-build job. Coverage gets reported on every PR; no threshold
gate yet (vitest.config.ts intentionally doesn't set thresholds).
- Adds an artifact upload step for canvas/coverage/ (HTML + json-summary)
so reviewers can browse the coverage report from any PR. 7-day
retention; if-no-files-found=warn so a skipped step doesn't fail the job.
Step 3 (thresholds + hard gate) is the natural follow-up — track in a
new sub-issue once we've seen ~5-10 PRs of baseline data and know
where current coverage sits. The issue body proposed lines:70 /
functions:70 / branches:65 / statements:70; that may need adjustment
once the baseline lands.
Closes the Step-2 portion of #1815. Step 3 stays open or gets a fresh
issue depending on your preference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a prominent section to CONTRIBUTING.md documenting that public
content (blog, marketing, OG images, SEO briefs, DevRel demos) belongs
in Molecule-AI/docs, not molecule-core. Mirrors the routing cheat-sheet
from #2060 with the table of content-type → target repo, and points
contributors at the existing `Block forbidden paths` CI gate as the
loud-fail signal.
Per the issue: 11 content PRs were silently blocked over 48h before
being closed and redirected. This in-repo notice gives contributors
(human and agent) a discoverable spot to learn the rule before opening
the wrong PR. The CI gate is already enforcing the policy; this just
makes the rule self-service.
Closes #2060
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The harness runner (scripts/measure-coordinator-task-bounds-runner.sh)
calls `/workspaces/:id/activity?since_secs=$A2A_TIMEOUT` to scope a
trace to a specific test window. The query param was silently
ignored — `ActivityHandler.List` accepted only `type`, `source`, and
`limit`, so the runner got the most-recent-100 events regardless of
how long ago they happened. That works for fresh-tenant tests where
activity_logs is nearly empty pre-run, but breaks on busy tenants and on
tests that exceed 100 events.
Adds `since_secs` parsing with three behaviors:
- Valid positive int → `AND created_at >= NOW() - make_interval(secs => $N)`
on the SQL. Parameterised; values bound via lib/pq, not interpolated.
`make_interval(secs => $N)` is required — the `INTERVAL '$N seconds'`
literal form rejects placeholder substitution inside the string.
- Above 30 days (2_592_000s) → silently clamped to the cap. Defends
against a paranoid client triggering a multi-month full-table scan
via `since_secs=999999999`.
- Negative, zero, or non-integer → 400 with a structured error, NOT
silently dropped. Silent drop is exactly the bug this is fixing
— a typoed param shouldn't be lost as most-recent-100.
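A minimal sketch of those three behaviors, assuming lib/pq-style positional
placeholders; everything except since_secs, make_interval, and the 30-day
cap is illustrative:

    package activity

    import (
        "fmt"
        "net/http"
        "strconv"
    )

    const maxSinceSecs = 2_592_000 // 30-day clamp

    // sinceClause turns ?since_secs= into a parameterised SQL fragment.
    // ok=false with a nil err means the param was omitted; a non-nil err
    // means the handler should respond 400 with a structured error.
    func sinceClause(r *http.Request, argPos int) (clause string, arg int, ok bool, err error) {
        raw := r.URL.Query().Get("since_secs")
        if raw == "" {
            return "", 0, false, nil // omitted: no extra clause, no extra bound arg
        }
        n, convErr := strconv.Atoi(raw)
        if convErr != nil || n <= 0 {
            return "", 0, false, fmt.Errorf("since_secs must be a positive integer, got %q", raw)
        }
        if n > maxSinceSecs {
            n = maxSinceSecs // silent clamp: defends against multi-month full-table scans
        }
        // make_interval(secs => $N) accepts a bound placeholder; INTERVAL '$N seconds' would not.
        return fmt.Sprintf(" AND created_at >= NOW() - make_interval(secs => $%d)", argPos), n, true, nil
    }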
Tests cover all four paths: accepted (with arg-binding assertion via
sqlmock.WithArgs), clamped at 30 days, invalid rejected (5 sub-cases),
and omitted (verifies no extra clause / arg leak via strict WithArgs
count).
RFC #2251 §V1.0 step 6 (platform-side-transition audit) also depends
on this for time-window filtering of activity_logs.
Closes #2268
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-workspace `restartState` entries (introduced under the name
`restartMu` pre-#2266, renamed to `restartStates` in #2266) are
created via `LoadOrStore` in `workspace_restart.go` but never
deleted. On a long-running platform process serving many short-lived
workspaces (E2E tests, transient sandbox tenants), the sync.Map grows
monotonically — ~16 bytes per workspace ever created.
Fix: call `restartStates.Delete(wsID)` after stopAndRemove +
ClearWorkspaceKeys for each cascaded descendant and the parent. Mirrors
the existing per-ID cleanup loop. `sync.Map.Delete` is safe on absent
keys, so workspaces that were never restarted (no LoadOrStore call)
are a no-op.
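Sketch of where the Delete lands; only restartStates.Delete(wsID) is from
this change, the surrounding loop shape and helper names are assumed:

    package workspaces

    import "sync"

    var restartStates sync.Map // workspaceID -> restart state, created via LoadOrStore

    func stopAndRemove(wsID string)      {} // existing teardown (stubbed)
    func clearWorkspaceKeys(wsID string) {} // existing key cleanup (stubbed)

    // deleteWorkspaces mirrors the existing per-ID cleanup loop: cascaded
    // descendants first, then the parent, each dropping its restart-state entry.
    func deleteWorkspaces(descendants []string, parent string) {
        for _, wsID := range append(descendants, parent) {
            stopAndRemove(wsID)
            clearWorkspaceKeys(wsID)
            restartStates.Delete(wsID) // safe on absent keys: never-restarted workspaces are a no-op
        }
    }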
This is a pre-existing leak — #2266 did not introduce it; just renamed
the holder. Filing as a separate commit to keep the change minimal and
reviewable.
Closes #2269
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-#2290 `force: true` flag on POST /org/import skipped the
required-env preflight, letting orgs import without their declared
required keys (e.g. ANTHROPIC_API_KEY). The ux-ab-lab incident: that
import path was used, the org shipped without ANTHROPIC_API_KEY in
global_secrets, and every workspace 401'd on the first LLM call.
Per the picks recorded on #2290:
- Q1=C: template-derived required_env (no schema change — already
the existing aggregation via collectOrgEnv).
- Q2=remove: drop the bypass entirely. The seed/dev-org flow that
legitimately needs to skip becomes a separate dry-run-import path
with its own audit trail, not a permission bypass.
- Q3=block-at-import-only: provision-time drift logging is a
follow-up; for this PR, blocking at import is the gate.
Surface change:
- Force field removed from POST /org/import request body.
- 412 "suggestion" text drops the "or pass force=true" guidance.
- Legacy callers sending {"force": true} are silently tolerated
(Go's json.Unmarshal drops unknown fields), so no client-side
breakage; the bypass effect is just gone.
Audited callers in this repo:
- canvas/src/components/TemplatePalette.tsx — never sends force.
- scripts/post-rebuild-setup.sh — never sends force.
- Only external tooling sent force=true. Those callers must now set
the global secret via POST /settings/secrets before importing.
Adds TestOrgImport_ForceFieldRemoved as a structural pin: if a future
change re-adds Force to the body struct, the test fails and forces an
explicit reckoning with the #2290 rationale.
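A hedged sketch of what the structural pin can look like — the request
struct and its fields are stand-ins; only the test name and intent are
from this change:

    package orgimport

    import (
        "reflect"
        "testing"
    )

    // importRequest stands in for the POST /org/import body struct.
    type importRequest struct {
        TemplateID string `json:"template_id"`
    }

    func TestOrgImport_ForceFieldRemoved(t *testing.T) {
        if _, found := reflect.TypeOf(importRequest{}).FieldByName("Force"); found {
            t.Fatal("Force re-added to the /org/import body struct; revisit the #2290 rationale before restoring the bypass")
        }
    }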
Closes #2290
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2265 renamed the harness trace endpoint and event name; sync the
cross-repo scripts/README.md to match.
Closes #2270
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2289.
Some workspace template images ship `/usr/local/bin/{git,gh}` wrappers
that bake `GH_TOKEN` into argv handling (preferred — auto-PR creation
authenticates without explicit token plumbing); other templates have
plain `/usr/bin/git` installed via apt with no wrapper. The hardcoded
`_GIT = "/usr/local/bin/git"` crashed every auto-push attempt on the
latter image class:
  File "/app/molecule_runtime/executor_helpers.py", line 524, in _auto_push_and_pr_sync
    subprocess.run(['/usr/local/bin/git', 'rev-parse', '--is-inside-work-tree'], ...)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/git'
`shutil.which("git")` walks PATH in order — finds the `/usr/local/bin/`
wrapper first when it exists, falls back to `/usr/bin/git` otherwise.
GH_TOKEN injection still wins on wrapper-equipped images; auto-push
no longer crashes on bare-apt images.
Verified locally: `shutil.which("git")` resolves to `/usr/bin/git` on
the bug-reporter's image; `shutil.which("gh")` resolves to the
homebrew path on dev. Both paths exist + are executable on respective
hosts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced via cross-template review of the a2a-sdk v0→v1 migration:
every adapter executor (claude-code, gemini-cli, crewai, openclaw,
autogen) builds A2A response Messages independently using
`new_text_message(text)` from the SDK, which omits `task_id` and
`context_id`. The runtime's own canonical pattern in
`workspace/a2a_executor.py:466-475` correctly threads both:
    Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=_parts,
        task_id=task_id,        # ← canonical
        context_id=context_id,  # ← canonical
    )
When adapters skip these correlation fields, the platform's a2a proxy
can't reliably tie the response back to the originating task.
This is a divergence from canonical, not necessarily a strict bug
(task_id may be optional with a default) — but it's enough of a
correlation/observability gap that the canonical pattern bothers to
thread it.
Add `new_response_message(context, text, files=None)` to
executor_helpers.py — single home for response Message construction.
Templates can migrate from `new_text_message(text)` to this helper
in stacked PRs once the runtime publishes to PyPI.
The helper:
- Reads `context.task_id`/`context.context_id` from the inbound
RequestContext, falling back to fresh UUIDs (RequestContextBuilder
always sets them in production; fallback is for unit tests).
- Sets `role=Role.ROLE_AGENT` (the v1 enum value).
- Builds text Parts via `Part(text=...)` and file Parts via
`Part(url="workspace:<path>", filename=..., media_type=...)`.
- Returns a v1 protobuf Message ready for
`event_queue.enqueue_event(...)`.
Why "files=None" with the workspace: URI scheme as the file Part
shape: matches the canonical pattern in a2a_executor.py exactly so
the platform's chat-attachment download path (executor_helpers.py
`resolve_attachment_uri`) interprets responses uniformly across all
adapters.
Tests (5, all pass with --no-cov against the live runtime image):
- test_new_response_message_text_only
- test_new_response_message_with_files
- test_new_response_message_files_only_no_text
- test_new_response_message_falls_back_when_context_ids_unset
- test_new_response_message_handles_missing_attrs
The conftest's a2a stubs needed an extension for Message + Role +
Part with kwargs preservation. Strictly additive — no existing tests
affected. (The 19 pre-existing failures in test_executor_helpers.py
are unrelated debt from the commit_memory/recall_memory rewrite,
visible on staging baseline before this change.)
Per-template migration is the follow-up: claude-code, gemini-cli,
crewai, openclaw, autogen all call `new_text_message(text)` today;
each gets a per-repo PR replacing it with
`new_response_message(context, text)`. This PR ships the helper
first so the templates have something to import.
Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix),
gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review caught a regression I introduced in #2266: if cycle() panics
(e.g. a future provisionWorkspace nil-deref or any runtime error from
the DB / Docker / encryption stacks it touches), the loop never reaches
`state.running = false`. The flag stays true forever, the early-return
guard at the top of coalesceRestart fires for every subsequent call,
and that workspace is permanently locked out of restarts until the
platform process restarts.
The pre-fix code had similar exposure (panic killed the goroutine
before defer wsMu.Unlock() ran in some Go versions), but my pending-
flag version made it worse: the guard is sticky, not ephemeral.
Fix: defer the state-clear so it always runs on exit, including panic.
Recover (and DON'T re-raise) so the panic doesn't propagate to the
goroutine boundary and crash the whole platform process — RestartByID
is always called via `go h.RestartByID(...)` from HTTP handlers, and
an unrecovered goroutine panic in Go terminates the program. Crashing
the platform for every tenant because one workspace's cycle panicked
is the wrong availability tradeoff. The panic message + full stack
trace via runtime/debug.Stack() are still logged for debuggability.
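A minimal sketch of that defer, with the state struct assumed; only the
recover-log-and-clear behaviour is from this change:

    package restart

    import (
        "log"
        "runtime/debug"
        "sync"
    )

    type restartState struct {
        mu      sync.Mutex
        running bool
    }

    func runCycle(state *restartState, cycle func()) {
        defer func() {
            if r := recover(); r != nil {
                // Swallow the panic: this runs on its own goroutine, and an
                // unrecovered goroutine panic would take down the whole process.
                log.Printf("restart cycle panicked: %v\n%s", r, debug.Stack())
            }
            state.mu.Lock()
            state.running = false // always clear, even on panic, so later restarts aren't locked out
            state.mu.Unlock()
        }()
        cycle()
    }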
Regression test in TestCoalesceRestart_PanicInCycleClearsState:
1. First call's cycle panics. coalesceRestart's defer must swallow
the panic — assert no panic propagates out (would crash the
platform process from a goroutine in production).
2. Second call must run a fresh cycle (proves running was cleared).
All 7 tests pass with -race -count=10.
Surfaced via /code-review-and-quality self-review of #2266; the
re-raise-after-recover anti-pattern (originally argued as "don't
mask bugs") came up in the comprehensive review and was corrected
to log-with-stack-and-suppress for availability.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The naive mutex-with-TryLock pattern in RestartByID was silently dropping
the second of two close-together restart requests. SetSecret and SetModel
both fire `go restartFunc(...)` from their HTTP handlers, and both DB
writes commit before either restart goroutine reaches loadWorkspaceSecrets.
If the second goroutine arrives while the first holds the per-workspace
mutex, TryLock returns false and the second is logged-and-dropped:
Auto-restart: skipping <id> — restart already in progress
The first goroutine's loadWorkspaceSecrets ran before the second write
committed, so the new container boots without that env var. Surfaced
during the RFC #2251 V1.0 measurement as hermes returning "No LLM
provider configured" when MODEL_PROVIDER landed after the API-key write
and lost its restart to the mutex (HERMES_DEFAULT_MODEL absent →
start.sh fell back to nousresearch/hermes-4-70b → derived
provider=openrouter → no OPENROUTER_API_KEY → request-time error).
The same race hits any back-to-back secret/model save flow including
the canvas's "set MiniMax key + pick model" UX.
Fix: pending-flag / coalescing pattern. Any restart request that arrives
while one is in flight sets `pending=true` and returns. The in-flight
runner, on completion, checks the flag and runs another cycle. This
collapses N concurrent requests into at most 2 sequential cycles (the
current one + one more that picks up everyone who arrived during it),
while guaranteeing the final container always sees the latest secrets.
Concrete contract:
- 1 request, no concurrency: 1 cycle
- N concurrent requests during 1 in-flight cycle: 2 cycles total
- N sequential requests (no overlap): N cycles
- Per-workspace state — different workspaces never serialize
Coalescing is extracted into `coalesceRestart(workspaceID, cycle func())`
so the gate logic is testable without the full WorkspaceHandler / DB /
provisioner stack. RestartByID now wraps that with the production cycle
function. runRestartCycle calls provisionWorkspace SYNCHRONOUSLY (drops
the historical `go`) so the loop's pending-flag check happens AFTER the
new container is up — without that, the next cycle's Stop call would
race the previous cycle's still-spawning provision goroutine.
sendRestartContext stays async; it's a one-way notification.
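A hedged sketch of the pending-flag gate, with the state shape assumed;
the control flow follows the contract above:

    package restart

    import "sync"

    type coalesceState struct {
        mu      sync.Mutex
        running bool
        pending bool
    }

    var states sync.Map // workspaceID -> *coalesceState: per-workspace, so workspaces never serialize

    func coalesceRestart(workspaceID string, cycle func()) {
        v, _ := states.LoadOrStore(workspaceID, &coalesceState{})
        st := v.(*coalesceState)

        st.mu.Lock()
        if st.running {
            st.pending = true // an in-flight cycle will run once more on our behalf
            st.mu.Unlock()
            return
        }
        st.running = true
        st.mu.Unlock()

        for {
            cycle() // synchronous: the pending check below happens only after the new container is up

            st.mu.Lock()
            if !st.pending {
                st.running = false
                st.mu.Unlock()
                return
            }
            st.pending = false // one more cycle picks up everyone who arrived during this one
            st.mu.Unlock()
        }
    }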
Tests in workspace_restart_coalesce_test.go cover the contract points
above and are race-detector clean over 10 iterations:
- Single call → 1 cycle
- 5 concurrent during in-flight → exactly 2 cycles total
- 3 sequential → 3 cycles
- Pending-during-cycle picked up (targeted bug repro)
- State cleared after drain (running flag reset)
- Per-workspace isolation (no cross-workspace serialization)
Refs: molecule-core#2256 (V1.0 gate measurement); root cause for the
"No LLM provider configured" symptom seen during hermes/MiniMax repro.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner was speculatively calling `/workspaces/:id/heartbeat-history` —
that endpoint doesn't exist on workspace-server. On local dev it 404'd;
on tenant builds the platform's :8080 canvas-proxy fallback intercepted
it and returned 28KB of Next.js HTML which then landed in the JSON event
log. Neither outcome was useful trace data.
`GET /workspaces/:id/activity` is the existing endpoint that reads
activity_logs. That table already records the events the RFC §V1.0
step 6 'platform-side transition' check needs (a2a_send / a2a_receive /
task_update / agent_log / error, plus duration_ms + status). Rename
the runner's fetch + emitted event accordingly.
Verified: GET /workspaces/<uuid>/activity?since_secs=60 returns 200
with `[]` against the local platform; no SaaS skip needed since the
endpoint exists in both environments.
Refs: molecule-core#2256 (V1.0 gate #1 measurement comment).
Three review-driven fixes to the runner before #2261 merges:
1. `WAIT_ONLINE_SECS / 3` truncated; an operator passing 200 actually
waited 198s. Round up so 200 → 67 polls × 3s = 201s ≥ requested.
2. The heartbeat-history endpoint isn't on tenant workspace-servers —
the platform's :8080 fallback proxies unmatched paths to the
canvas Next.js, so the SaaS run captured 28KB of HTML in the
`heartbeat_trace` event log. Skip the fetch in MODE=saas; emit an
explicit `<skipped: ...>` placeholder. Local mode behaviour
unchanged.
3. ORG_ID and ORG_SLUG had no client-side format check, so a typo'd
value got swallowed by TenantGuard's intentionally-opaque 404
(which doesn't tell the operator whether slug, UUID, or auth was
wrong). Validate the UUID and slug shapes up front so a mismatch
produces an actionable error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:
1. workspace/platform_tools/README.md — explains the ToolSpec
single-source-of-truth pattern (#2240), the CLI-block alignment
gap that hand-maintained generation can't close (#2258), the
snapshot golden files + LF-pinning (#2260), and the add/rename/
remove playbook. The next reader who lands in
workspace/platform_tools/ now has the design rationale + the
safe-edit procedure colocated with the code.
2. scripts/README.md — disambiguates the three measure-coordinator-
task-bounds.sh files that now exist across two repos:
- scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo)
- scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
- scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane)
Cross-references reference_harness_pair_pattern (auto-memory) for
the cross-repo design rationale. Documents the common safety
pattern (cleanup trap, DRY_RUN, non-target guard,
cleanup_*_failed events) and the heartbeat-trace caveat.
Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original measure-coordinator-task-bounds.sh was hardcoded for
local-dev (workspace-server on :8080) with claude-code/langgraph
templates and OPENROUTER_API_KEY. Running it against staging requires
both auth-chain plumbing (per-tenant ADMIN_TOKEN + X-Molecule-Org-Id
TenantGuard header + tenant subdomain routing) and template/secret
flexibility (e.g. Hermes/MiniMax for Token Plan keys).
This adds:
* `measure-coordinator-task-bounds-runner.sh` — separate runner that
wraps the same workspace-server API calls but takes everything as
env-var inputs. Two MODE values:
- `local` → direct workspace-server (no auth/tenant scoping)
- `saas` → tenant subdomain + per-tenant ADMIN_TOKEN bearer +
X-Molecule-Org-Id TenantGuard header. Auto-fetches
tenant token via CP /cp/admin/orgs/<slug>/admin-token
given ORG_SLUG + CP_ADMIN_API_TOKEN, OR accepts a
pre-resolved TENANT_ADMIN_TOKEN.
* Configurable PM_TEMPLATE / CHILD_TEMPLATE / MODEL / SECRET_NAME /
SECRET_VALUE — defaults match the original (claude-code-default +
langgraph + OpenRouter). Hermes/MiniMax example documented in the
header.
* Per-poll status_change events during wait_online, so a workspace
that never reaches online surfaces its last status (provisioning,
failed, etc.) instead of a bare timeout.
* WAIT_ONLINE_SECS knob (default 180s; SaaS cold-start needs ~420s
for first hermes-image pull on a freshly-provisioned EC2 tenant).
* `${args[@]+...}` guard on the api() helper — avoids `set -u`
exploding on an empty header array (the local-dev hot-path).
The original script also gained a SECRET_VALUE block earlier in the
session — that change (separately staged) makes the secret-name
configurable without forcing every operator through the new runner.
V1.0 gate #1 (RFC #2251, Issue 4 repro) measurement results posted
as a separate comment on molecule-core#2256.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review follow-up on #2258 (registry snapshot tests, just merged).
The byte-exact snapshot comparisons in test_platform_tools.py would
fail mysteriously on a Windows contributor's machine with
core.autocrlf=true: checkout would convert LF → CRLF, the test would
fail locally with no useful diagnostic, and the regen instructions
in the test-file header would produce LF files that disagree with
the working copy.
Pin workspace/tests/snapshots/*.txt to text eol=lf so this can't
happen. All three current snapshots are already LF; the attribute
ensures it stays that way.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review follow-ups on #2257:
- Drop `local exit_code=$?` from cleanup(). `trap`-handler return values
are ignored, so capturing $? only misled a future reader into thinking
exit-code preservation was happening.
- Replace silenced `>/dev/null 2>&1` DELETE with `-w '%{http_code}'`
capture. ADMIN_TOKEN expiring mid-run was the realistic failure mode
here — previously we swallowed it under the silenced redirect, leaving
workspaces leaked with no signal. Now a 401/403/5xx surfaces as a
`cleanup_failed` JSON event with a remediation hint pointing at
cleanup-rogue-workspaces.sh; 404 is treated as success (the
post-condition — workspace absent — holds).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>