* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled
With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.
Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.
Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.
* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)
Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.
Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.
Closes#1043.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix
Two regressions introduced by PR #1243 (fix issue #1207):
1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
`{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
expected only `{id, name}`. Added `hasChildren: false` to the assertion.
2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
without `act()`. With fake timers, `setState` (synchronous) is flushed by
`advanceTimersByTimeAsync`, but the React state update it triggers is a
microtask — so the test saw stale render. Wrapping in `act(async () =>
{ await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
before assertions run.
All 813 vitest tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add 100px proximity threshold to drag-to-nest detection
Fixes#1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.
The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.
Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go
Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.
- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
agentURL before outbound calls in mcpCallTool (line ~529) and
toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
(blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
isPrivateOrMetadataIP across all private/CGNAT/metadata ranges
1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go
Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.
* fix: remove orphaned commit-text lines from a2a_proxy.go
Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).
Deletion restores:
}
return agentURL, nil
}
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.
## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
[]string{"rm", "-rf", "/configs", filePath} (exec form). This
eliminates shell injection entirely — filePath is a plain argument,
never interpreted as shell code.
## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
injected args
Closes#1273.
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases
Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The provision-timeout sweeper was emitting a new WORKSPACE_PROVISION_TIMEOUT
event type, but the canvas event handler (canvas-events.ts:234) only
has a case for WORKSPACE_PROVISION_FAILED — the sweep's event fell
through silently. DB was being marked 'failed' but the UI stayed on
'starting' indefinitely until the user hard-refreshed.
Reusing the existing event name keeps the UI reaction uniform across
both fail paths (runtime-crash via bootstrap-watcher and boot-timeout
via sweeper). Operators who need to distinguish can read the `source`
payload field — "bootstrap_watcher" vs "provision_timeout_sweep".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
workspace_restart.go:127-133 accepted body.Template (attacker-controlled)
via raw filepath.Join(h.configsDir, template), allowing path traversal
(e.g. "../../../etc") to escape configsDir.
Fix: replace raw filepath.Join with resolveInsideRoot, same pattern as
workspace.go:102 (already fixed) and workspace.go:249 (already fixed).
Both the explicit template path and the findTemplateByName fallback are
safe — findTemplateByName returns a directory name from os.ReadDir which
is inherently bounded and cannot contain "/".
On resolve error the template is cleared so findTemplateByName fallback
still fires (preserves existing restart behaviour when template is invalid).
Closes: #1043
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
orgtoken.Validate now returns org_id (the org workspace UUID stored on
org_api_tokens rows, populated by #1212). Both call sites in
wsauth_middleware.go — WorkspaceAuth and AdminAuth — call
c.Set("org_id", orgID) after successful org-token validation.
This unbreaks orgCallerID(c) for org-token callers. Previously the
middleware populated org_token_id and org_token_prefix but never org_id,
so any handler reading c.Get("org_id") (e.g. requireCallerOwnsOrg) got
"" even for valid org tokens.
The change is additive: orgID may be empty for pre-migration tokens
minted before #1212. requireCallerOwnsOrg already handles empty org_id
by denying by default.
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: commits e6d48e6 and e085621 stored ci.yml with JSON-escaped
content (literal \n sequences, leading double-quote) instead of proper
YAML with actual newlines. All CI runs failed with "workflow file issue"
before any job could start.
Fix: restore from pre-corruption base (2517164), apply intended changes:
- concurrency.cancel-in-progress: true → false (queue rather than cancel)
- changes job: runs-on ubuntu-latest (frees mac mini for real work)
PR #1242 intent preserved, corruption from API commit removed.
orgtoken.Validate now returns org_id (the org workspace UUID stored on
org_api_tokens rows, populated by #1212). Both call sites in
wsauth_middleware.go — WorkspaceAuth and AdminAuth — call
c.Set("org_id", orgID) after successful org-token validation.
This unbreaks orgCallerID(c) for org-token callers. Previously the
middleware populated org_token_id and org_token_prefix but never org_id,
so any handler reading c.Get("org_id") (e.g. requireCallerOwnsOrg) got
"" even for valid org tokens.
The change is additive: orgID may be empty for pre-migration tokens
minted before #1212. requireCallerOwnsOrg already handles empty org_id
by denying by default.
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Store: pendingDelete now carries `hasChildren: boolean` (computed from
nodes.some(parentId === nodeId))
- ContextMenu: passes hasChildren into setPendingDelete
- Canvas: dialog title changes to "Delete Workspace and Children" with
⚠️ message when hasChildren; confirms with "Delete All"
Refs: #1137
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
- DetailsTab: use `(data.lastErrorRate ?? 0)` instead of bare multiply to
prevent NaN% when the field is absent on pre-provisioning workspaces.
- WorkspaceUsage: make formatPeriod accept optional start/end strings;
return "—" for undefined so the usage period shows blank rather than
"Invalid Date" for provisioning/partial workspaces.
Refs: #1139
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
PR #1229 sed command had no capture groups but used $1 in the
replacement, committing the literal string "defer func() { _ = \$1 }()"
instead of "defer func() { _ = resp.Body.Close() }()". Go does not
compile — $1 is not a valid identifier.
Fixed with: sed -i 's/defer func() { _ = \$1 }()/defer func() { _ = resp.Body.Close() }()/g'
Affected (all on origin/staging):
workspace-server/cmd/server/cp_config.go
workspace-server/internal/handlers/a2a_proxy.go
workspace-server/internal/handlers/github_token.go
workspace-server/internal/handlers/traces.go
workspace-server/internal/handlers/transcript.go
workspace-server/internal/middleware/session_auth.go
workspace-server/internal/provisioner/cp_provisioner.go (3 occurrences)
Closes: #1245
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Line 9 of ci.yml accidentally contained a bare string with the commit
SHA instead of the intended concurrency: block, causing all CI runs
to fail with a YAML parse error.
Also restores the changes from the PR #1242 intent (workflow-level
concurrency with cancel-in-progress: false).
Fixes: CI failure on staging after PR #1242 merge.
CPProvisioner env mutator error branch was left with unresolved conflict
markers after a prior rebase. Resolved to the HEAD-side generic message
"plugin env mutator chain failed" which is consistent with the same
message used in the Provisioner path (line 107/111).
No functional change.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cancel-in-progress: false queues new runs so the single mac mini
runner doesn't fight itself when pushes stack during rebases or
cross-PR contention. Existing e2e-api.yml already has this pattern.
Fixes: 19 queued runs on single self-hosted runner (02:55 UTC snapshot)
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
Root cause: tests used try/finally { vi.useRealTimers() / vi.useFakeTimers() }
back-and-forth. When any test's finally-block called vi.useFakeTimers(),
subsequent tests inherited fake timer state causing 50ms real setTimeouts
to not fire and mockFetch to accumulate calls across test boundaries.
Fix: consolidate timer management to beforeEach/afterEach hooks.
- beforeEach: vi.useFakeTimers() — all tests start from known fake state
- afterEach: cleanup() + vi.useRealTimers() — restore real timers for next test
- Individual tests: use vi.advanceTimersByTimeAsync(50) instead of real setTimeout
Also removed duplicate afterEach(cleanup()) and unused waitFor import.
Closes#1207.
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The post-fire UPDATE after s.proxy.ProxyA2ARequest() was using fireCtx,
which derives from the outer ctx passed into fireSchedule(). If that ctx
is cancelled — HTTP timeout, graceful shutdown, or any upstream deadline —
ExecContext returns context.Canceled and the UPDATE is silently skipped,
leaving next_run_at stale and causing the schedule to re-fire on the
next tick.
Fix: create a dedicated updateCtx from context.Background() with a 5s
deadline, independent of the outer ctx hierarchy. Also improved the
error log to include schedule name for easier debugging.
Complements PR #1241 (fix/f1089-scheduler-ctx-fix-main) which fixes
the goroutine-panic path in tick() — this fix covers the wider case of
normal-return + ctx-cancelled after the proxy call.
F1089 | Severity: HIGH+security
Co-authored-by: Molecule AI Infra Lead <infra-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(canvas): rewrite MemoryInspectorPanel to match backend API
Issue #909 (chunk 3 of #576).
The existing MemoryInspectorPanel used the wrong API endpoint
(/memory instead of /memories) and wrong field names (key/value/version
instead of id/content/scope/namespace/created_at). It also lacked
LOCAL/TEAM/GLOBAL scope tabs and a namespace filter.
Changes:
- Fix endpoint: GET /workspaces/:id/memories with ?scope= query param
- Fix MemoryEntry type to match actual API: id, content, scope,
namespace, created_at, similarity_score
- Add LOCAL/TEAM/GLOBAL scope tabs
- Add namespace filter input
- Remove Edit functionality (no update endpoint in backend)
- Delete uses DELETE /workspaces/:id/memories/:id (by id, not key)
- Full rewrite of 27 tests to match new API and UI structure
- Uses ConfirmDialog (not native dialogs) for delete confirmation
- All dark zinc theme (no light colors)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: tighten types + improve provision-timeout message (#1135, #1136)
#1135 — TypeScript: make BudgetData.budget_used and WorkspaceMetrics
fields optional to match actual partial-response shapes from provisioning-
stuck workspaces. Runtime already guarded with ?? 0.
#1136 — provisiontimeout.go: replace misleading "check required env vars"
hint (preflight catches that case upfront) with accurate message about
container starting but failing to call /registry/register.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(test): align ssrf_test.go localhost test cases with isSafeURL behaviour
isSafeURL blocks 127.0.0.1 via ip.IsLoopback() even in dev environments.
The test cases `wantErr: false` for localhost were incorrect — the
test would fail when go test runs. Fix by changing wantErr to true
for both localhost test cases.
Rationale: loopback blocking at this layer is intentional. Access
control is enforced by WorkspaceAuth + CanCommunicate at the A2A
routing layer, not by the URL validation. Opening this would widen
the SSRF attack surface without adding real dev flexibility.
Closes: ssrf_test.go inconsistency reported 2026-04-21
Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(canvas): rewrite MemoryInspectorPanel to match backend API
Issue #909 (chunk 3 of #576).
The existing MemoryInspectorPanel used the wrong API endpoint
(/memory instead of /memories) and wrong field names (key/value/version
instead of id/content/scope/namespace/created_at). It also lacked
LOCAL/TEAM/GLOBAL scope tabs and a namespace filter.
Changes:
- Fix endpoint: GET /workspaces/:id/memories with ?scope= query param
- Fix MemoryEntry type to match actual API: id, content, scope,
namespace, created_at, similarity_score
- Add LOCAL/TEAM/GLOBAL scope tabs
- Add namespace filter input
- Remove Edit functionality (no update endpoint in backend)
- Delete uses DELETE /workspaces/:id/memories/:id (by id, not key)
- Full rewrite of 27 tests to match new API and UI structure
- Uses ConfirmDialog (not native dialogs) for delete confirmation
- All dark zinc theme (no light colors)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: tighten types + improve provision-timeout message (#1135, #1136)
#1135 — TypeScript: make BudgetData.budget_used and WorkspaceMetrics
fields optional to match actual partial-response shapes from provisioning-
stuck workspaces. Runtime already guarded with ?? 0.
#1136 — provisiontimeout.go: replace misleading "check required env vars"
hint (preflight catches that case upfront) with accurate message about
container starting but failing to call /registry/register.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- mcp-server-setup.md: move 'Claude Code, Cursor, etc.' to a follow-on
sentence instead of parenthetical, fixing awkward 'agent (Claude Code,...)' pattern
- fly-machines-provisioner.md: replace 'as a Fly Machine' with 'on Fly Machines',
add clarification that the platform manages the workspace and Fly manages the machine
Co-authored-by: Molecule AI Documentation Specialist <documentation-specialist@agents.moleculesai.app>
* fix(security): call redactSecrets before seeding workspace memories (F1085)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(workspace_provision): add seedInitialMemories coverage for #1208
Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the
redactSecrets call (F1085 / #1132), both identified as untested in #1208.
- TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly
100k, 1 byte over, far over, and well under. Verifies INSERT receives
exactly maxMemoryContentLength bytes.
- TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs
before INSERT, regression test for F1085.
- TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently
skipped, no INSERT called.
- TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without
DB calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): Discord adapter launch visual assets (#1209)
Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging.
* fix(ci): golangci-lint errcheck failures on staging
Suppress errcheck warnings for calls where the return value is safely
ignored:
- resp.Body.Close() (artifacts/client.go): deferred cleanup — failure
to close a response body is non-critical; the defer itself is what
matters for connection reuse.
- rows.Close() (bundle/exporter.go): deferred cleanup in a loop where
rows.Err() already handles query errors.
- filepath.Walk (bundle/exporter.go): top-level walk call; errors in
sub-directory traversal are handled by the inner callback (which
returns nil for err != nil).
- broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget
event broadcast; errors are logged internally by the broadcaster.
- db.DB.ExecContext (bundle/importer.go): best-effort runtime column
update; non-critical auxiliary data that the provisioner re-extracts
if needed.
Fixes: #1143
* test(artifacts): suppress w.Write return values to satisfy errcheck
All httptest.ResponseWriter.Write calls in client_test.go now discard
the byte count and error return with _, _ = prefix. The Write method
is safe to discard in test handlers — httptest.ResponseWriter.Write
never returns an error for in-memory buffers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(CI): move changes job off self-hosted runner + add workflow concurrency
Cherry-pick from staging PR #1194 for main. Two changes to relieve
macOS arm64 runner saturation:
1. `changes` job: runs on ubuntu-latest instead of
[self-hosted, macos, arm64]. This job does a plain `git diff`
with zero macOS dependencies — moving it off the runner frees
a slot immediately on every workflow trigger.
2. Add workflow-level concurrency:
concurrency: group: ci-${{ github.ref }}; cancel-in-progress: true
Prevents multiple stale in-flight CI runs from queuing on the
same ref when new commits arrive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* tick: 2026-04-21 ~03:40Z — CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done
* fix(tenant-guard): allowlist /registry/register + /registry/heartbeat
Final layer of today's stuck-provisioning saga. With the private-IP
platform_url fix and the intra-VPC :8080 SG rule in place, workspace
EC2s finally reached the tenant on the right port — only to have every
POST bounced with a synthetic 404 by TenantGuard.
TenantGuard is the SaaS hook that rejects cross-tenant routing. It
demands X-Molecule-Org-Id on every request, but CP's workspace user-
data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL,
RUNTIME, PORT), so the runtime can't attach the header. Net effect:
every workspace's first heartbeat to /registry/heartbeat was a silent
404, and the workspace sat in 'provisioning' until the platform
sweeper timed it out.
Allowlist the two workspace-boot paths:
- /registry/register — one-shot at runtime startup
- /registry/heartbeat — every 30s
Both are still gated by wsauth.HasAnyLiveToken (workspaces with a
token on file must present it; legacy tokenless workspaces are
grandfathered). And the tenant SG already scopes :8080 to the VPC
CIDR, so only intra-VPC callers can reach these paths in the first
place. The allowlist bypasses cross-org routing, not auth.
Follow-up: passing MOLECULE_ORG_ID into the workspace env would let
the runtime attach the header and drop this allowlist entry. Tracked
separately; not urgent since the multi-layer auth above is already
adequate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with index. Existing pre-migration tokens get org_id=NULL → deny
by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
cross-org denial, session bypass, DB error denial.
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace err.Error() leaks with prod-safe messages (#1206)
- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
replaced 7 err.Error() calls with "provisioning failed" in both
Broadcast payloads and last_sample_error DB column. Full error
preserved in server-side log.Printf.
- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
calls with generic messages:
"invalid plugin source"
"plugin source not supported"
"invalid plugin name"
"staged plugin exceeds size limit"
"plugin manifest integrity check failed"
Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.
Added regression tests (workspace_provision_test.go):
- TestProvisionWorkspace_NoInternalErrorsInBroadcast
- TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
- TestResolveAndStage_NoInternalErrorsInHTTPErr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(F1089): log panic-recovery UPDATE errors in scheduler
The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.
context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(issue-1207): eliminate orgs-page test flakiness
Three root causes addressed:
1. Duplicate afterEach blocks (lines 97-103) — two identical
afterEach(() => { cleanup(); }) blocks collapsed to one.
2. Fake-timer isolation gap — if a polling test failed before its
finally-block ran, vi.useFakeTimers() persisted globally. The next
non-polling test's setTimeout(50) then hung indefinitely (fake
timers don't advance without vi.advanceTimersByTime), causing
waitFor/async timeouts. Fixed by calling vi.useRealTimers()
unconditionally in beforeEach (guaranteed clean slate) and
afterEach (even when a test fails before its own finally).
3. mockFetch.callHistory now cleared via mockReset() in beforeEach,
preventing "expected 2 calls but got N" failures from carry-over
between polling tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause (F1094 / #1200): requireCallerOwnsOrg in org_plugin_allowlist.go
was reading org_api_tokens.created_by — a provenance label string ("session",
"admin-token", "org-token:<prefix>"), NEVER a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller got 403
on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with partial index for fast lookups. Pre-migration tokens get org_id=NULL
→ deny by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL → callerOrg="" → deny.
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Tokens minted via another org token get org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Regression tests: happy path, unanchored denial, DB error denial.
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Issue #1196: golangci-lint errcheck flags bare resp.Body.Close()
calls because Body.Close() can return a non-nil error (e.g. when the
server sent fewer bytes than Content-Length). All occurrences fixed:
defer resp.Body.Close() → defer func() { _ = resp.Body.Close() }()
resp.Body.Close() → _ = resp.Body.Close()
12 files affected across all Go packages — channels, handlers,
middleware, provisioner, artifacts, and cmd. The body is already fully
consumed at each call site, so the error is always safe to discard.
🤖 Generated with [Claude Code](https://claude.ai)
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with index. Existing pre-migration tokens get org_id=NULL → deny
by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
cross-org denial, session bypass, DB error denial.
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace err.Error() leaks with prod-safe messages (#1206)
- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
replaced 7 err.Error() calls with "provisioning failed" in both
Broadcast payloads and last_sample_error DB column. Full error
preserved in server-side log.Printf.
- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
calls with generic messages:
"invalid plugin source"
"plugin source not supported"
"invalid plugin name"
"staged plugin exceeds size limit"
"plugin manifest integrity check failed"
Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.
Added regression tests (workspace_provision_test.go):
- TestProvisionWorkspace_NoInternalErrorsInBroadcast
- TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
- TestResolveAndStage_NoInternalErrorsInHTTPErr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Use explicit navigator.clipboard check instead of optional chaining so
the no-op case is handled explicitly. When clipboard API is unavailable
(non-HTTPS context) show a toast: "Copy requires HTTPS — please select
and copy manually". Production is always HTTPS so this only affects
local dev with http:// canvas.
Closes#1199.
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #1210 added org_api_tokens.org_id but c.Set("org_id", ...) was never
called — so orgCallerID() always returns "" and all token callers are
denied org-scoped access even within their own org.
Fix: after orgtoken.Validate succeeds in AdminAuth, look up the token's
org_id column and set it in the gin context. Pre-fix tokens (org_id=NULL)
get no org_id in context, which is correct — requireCallerOwnsOrg already
denies access for nil org_id.
Test: TestAdminAuth_OrgToken_SetsOrgID covers both post-fix tokens
(org_id set) and pre-fix tokens (org_id=NULL, not set).
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with index. Existing pre-migration tokens get org_id=NULL → deny
by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
cross-org denial, session bypass, DB error denial.
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace err.Error() leaks with prod-safe messages (#1206)
- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
replaced 7 err.Error() calls with "provisioning failed" in both
Broadcast payloads and last_sample_error DB column. Full error
preserved in server-side log.Printf.
- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
calls with generic messages:
"invalid plugin source"
"plugin source not supported"
"invalid plugin name"
"staged plugin exceeds size limit"
"plugin manifest integrity check failed"
Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.
Added regression tests (workspace_provision_test.go):
- TestProvisionWorkspace_NoInternalErrorsInBroadcast
- TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
- TestResolveAndStage_NoInternalErrorsInHTTPErr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(F1089): log panic-recovery UPDATE errors in scheduler
The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.
context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>