ev.HMAC[:12] panics when HMAC is shorter than 12 bytes.
Add len guards before truncation so the log line never panics —
the mismatch is still reported, just with whatever prefix is available.
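
A minimal sketch of the guard, assuming the HMAC is held as a string. hmacPrefix is a hypothetical helper name for illustration, not the function in the patch:

```go
package main

import "fmt"

// hmacPrefix returns at most n leading bytes of s, so the slice
// expression can never go out of range. Hypothetical helper.
func hmacPrefix(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if len(s) < n {
		return s // shorter than n: log whatever prefix is available
	}
	return s[:n]
}

func main() {
	fmt.Println(hmacPrefix("abc", 12))              // short input is returned whole
	fmt.Println(hmacPrefix("0123456789abcdef", 12)) // long input is cut to 12 bytes
}
```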
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): restore test regressions from PR #1243
PR #1243 introduced two regressions in the canvas vitest suite:
1. ContextMenu.keyboard.test.tsx: the setPendingDelete call now
passes `{hasChildren, id, name}` (not just `{id, name}`). Updated
the keyboard-a11y test assertion to match the new store shape.
2. orgs-page.test.tsx: mockFetch.mockResolvedValueOnce() returned a
plain object that didn't match the two-argument (url, options)
call signature used by the component's fetch wrapper. Switched to
mockImplementationOnce returning a rejected Promise — matching
real fetch's rejection contract — and added runAllTimersAsync after
advanceTimersByTimeAsync(50) to flush React state updates.
54 test files · 813 tests · all passing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): replace bounding-box intersection with distance threshold for nest detection
ReactFlow's getIntersectingNodes uses bounding-box overlap detection, which
fires the drag-over state whenever any part of two nodes' position rectangles
overlap — even when the dragged node is far from the target. This made the
"Nest Workspace" dialog appear from large distances.
Fix: scan all nodes on each drag tick and set dragOverNodeId to the closest
node within NEST_PROXIMITY_THRESHOLD (150 px, center-to-center). This matches
the intuitive behavior: nest only when the node is actually dropped near another.
Constants:
- NEST_PROXIMITY_THRESHOLD = 150px (~60% of a collapsed node's width)
- DEFAULT_NODE_WIDTH = 245px (mid-range of min/max node widths)
- DEFAULT_NODE_HEIGHT = 110px
Also removed the unused getIntersectingNodes import (it caused a
duplicate-identifier error when both onNodeDrag and the zoom handler called
useReactFlow in the same component scope).
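
The selection rule can be sketched as follows. The real change lives in the canvas TypeScript code; this Go version only illustrates the center-to-center test, and the node type, field names, and the inclusive `<=` comparison are assumptions:

```go
package main

import (
	"fmt"
	"math"
)

// node is a hypothetical stand-in for a canvas node (top-left position plus size).
type node struct {
	ID            string
	X, Y          float64
	Width, Height float64
}

// nestProximityThreshold mirrors NEST_PROXIMITY_THRESHOLD (150 px).
const nestProximityThreshold = 150.0

func center(n node) (float64, float64) {
	return n.X + n.Width/2, n.Y + n.Height/2
}

// closestNestTarget returns the ID of the nearest node whose center lies
// within the threshold of the dragged node's center, or "" if none does.
func closestNestTarget(dragged node, all []node) string {
	dx, dy := center(dragged)
	best, bestDist := "", math.Inf(1)
	for _, n := range all {
		if n.ID == dragged.ID {
			continue // never nest a node into itself
		}
		cx, cy := center(n)
		d := math.Hypot(cx-dx, cy-dy)
		if d <= nestProximityThreshold && d < bestDist {
			best, bestDist = n.ID, d
		}
	}
	return best
}

func main() {
	dragged := node{ID: "d", Width: 245, Height: 110}
	others := []node{
		{ID: "near", X: 50, Width: 245, Height: 110},
		{ID: "far", X: 400, Width: 245, Height: 110},
	}
	fmt.Printf("%q\n", closestNestTarget(dragged, others))
}
```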
Closes #1052.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): cascade-delete UX — show child count and require checkbox before Delete All
Issue #1137: with ?confirm=true always sent, a single confirmation silently
cascades — a team lead with 20 children gets nuked on one click.
Changes:
- store/canvas.ts: pendingDelete type now includes children: {id, name}[]
- ContextMenu.tsx: passes child list to setPendingDelete on Delete click
- DeleteCascadeConfirmDialog.tsx: new component — shows child names, a
cascade warning, and requires the operator to tick a checkbox before
Delete All activates. Disabled by default; only enables after checkbox.
- Canvas.tsx: conditionally renders DeleteCascadeConfirmDialog for
hasChildren workspaces, or plain ConfirmDialog for leaf workspaces.
confirmDelete requires cascadeConfirmChecked=true when hasChildren.
- ContextMenu.keyboard.test.tsx: updated setPendingDelete assertion to
include children:[] (no children in the test fixture).
813 tests pass.
Closes #1137.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(container_files): remove duplicate ContainerWait loop in deleteViaEphemeral
Issue #1334: Staging HEAD c90ada3 (PR #1328) left two identical
ContainerWait loops in deleteViaEphemeral. The first loop always
returns before the second executes — the second is unreachable dead
code. Remove it.
No functional change (the remaining loop handles the wait correctly).
---------
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Issue #1317: validateRelPath was called in deleteViaEphemeral but
never defined, so staging dc21821 would fail the Go build once CI ran to completion.
Changes:
- Add validateRelPath function (filepath.Clean + abs/traversal guard)
matching the pattern used on main (PR #1310).
- Upgrade deleteViaEphemeral to exec form ([]string{...}) so filePath
is passed as a plain argument, not interpolated into a shell string.
This eliminates shell injection (CWE-78) entirely.
- Add ContainerWait loop to guarantee rm completes before container
removal (avoids race on fast delete vs container-stop).
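
A rough sketch of the two ideas above, assuming Unix path separators. The body of validateRelPath here is a guess at the "filepath.Clean + abs/traversal guard" pattern, and rmCommand with its exact rm flags is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateRelPath cleans p and rejects absolute paths and anything that
// climbs out of the base directory. Sketch only, not the patched code.
func validateRelPath(p string) (string, error) {
	if p == "" {
		return "", errors.New("empty path")
	}
	clean := filepath.Clean(p)
	if filepath.IsAbs(clean) {
		return "", errors.New("absolute path not allowed")
	}
	if clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
		return "", errors.New("path traversal not allowed")
	}
	return clean, nil
}

// rmCommand builds an exec-form argv: the path is a single argument and is
// never interpolated into a shell string. Hypothetical helper.
func rmCommand(workdir, rel string) []string {
	return []string{"rm", "-f", "--", filepath.Join(workdir, rel)}
}

func main() {
	for _, p := range []string{"notes/a.txt", "../etc/passwd", "/etc/passwd"} {
		clean, err := validateRelPath(p)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%q -> %q\n", p, rmCommand("/workspace", clean))
	}
}
```

Because the argv is passed without a shell, characters such as `;` or `$` in the file name reach rm as literal bytes.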
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(plugins): close F1086 err.Error() leaks in plugin install pipeline
F1086 / #1206: Three err.Error() calls in the plugin install pipeline
leaked internal file paths, resolver state, and query parameters in API
responses. Replaced with context-appropriate generic messages:
- ParseSource error → "invalid plugin source"
- Resolve error → "plugin resolution failed" (available_schemes kept for
self-service, raw error hidden)
- validatePluginName error → "invalid plugin name" (path traversal/injection
risk means no diagnostic should be returned)
🤖 Generated with [Claude Code](https://claude.ai)
* fix(provision): close F1086 err.Error() leaks in workspace_provision.go
F1086 / #1206: env mutator and provisioner start errors in
workspace_provision.go leaked internal error strings (credential URIs,
docker/volume paths, AMI/VPC details) via:
- Broadcast payloads to canvas Events tab
- last_sample_error field in the workspaces DB row
Fixed all 6 occurrences across both the docker and CPProvisioner code paths:
- env mutator failures → "environment configuration failed"
- provisioner/docker start failures → "workspace start failed"
The verbose %v-logged errors are preserved for operator diagnostics;
only the broadcast and DB fields receive generic messages.
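
The split between operator logs and client-visible fields could look roughly like this. sanitizeForClient and errExample are hypothetical names, and the payload shape is an assumption:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errExample stands in for an internal error that must not reach clients.
var errExample = errors.New("dial unix /var/run/docker.sock: permission denied")

// sanitizeForClient logs the full error for operators and returns only the
// generic message for broadcasts and DB fields. Hypothetical helper.
func sanitizeForClient(workspaceID string, err error, generic string) string {
	log.Printf("workspace %s: %s: %v", workspaceID, generic, err)
	return generic
}

func main() {
	payload := map[string]string{
		"error": sanitizeForClient("ws-123", errExample, "workspace start failed"),
	}
	fmt.Println(payload["error"]) // the raw error text appears only in the log
}
```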
🤖 Generated with [Claude Code](https://claude.ai)
---------
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
The provision-timeout sweeper was emitting a new WORKSPACE_PROVISION_TIMEOUT
event type, but the canvas event handler (canvas-events.ts:234) only
has a case for WORKSPACE_PROVISION_FAILED, so the sweep's event fell
through silently. The DB row was marked 'failed', but the UI stayed on
'starting' until the user hard-refreshed.
Reusing the existing event name keeps the UI reaction uniform across
both fail paths (runtime-crash via bootstrap-watcher and boot-timeout
via sweeper). Operators who need to distinguish can read the `source`
payload field — "bootstrap_watcher" vs "provision_timeout_sweep".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
workspace_restart.go:127-133 accepted body.Template (attacker-controlled)
via raw filepath.Join(h.configsDir, template), allowing path traversal
(e.g. "../../../etc") to escape configsDir.
Fix: replace raw filepath.Join with resolveInsideRoot, same pattern as
workspace.go:102 (already fixed) and workspace.go:249 (already fixed).
Both the explicit template path and the findTemplateByName fallback are
safe — findTemplateByName returns a directory name from os.ReadDir which
is inherently bounded and cannot contain "/".
On resolve error the template is cleared so findTemplateByName fallback
still fires (preserves existing restart behaviour when template is invalid).
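
For illustration, a containment check in the spirit of resolveInsideRoot might look like the sketch below. The function body is an assumption (the real helper is not shown in this log), and Unix paths are assumed:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInsideRoot joins name onto root and rejects any result that lands
// outside root. Sketch only; the real helper may differ.
func resolveInsideRoot(root, name string) (string, error) {
	if filepath.IsAbs(name) {
		return "", errors.New("absolute path not allowed")
	}
	joined := filepath.Join(root, name) // Join also cleans ".." segments
	rel, err := filepath.Rel(root, joined)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes root")
	}
	return joined, nil
}

func main() {
	template := "../../../etc"
	if _, err := resolveInsideRoot("/configs", template); err != nil {
		template = "" // cleared so a name-based fallback can still run
	}
	fmt.Printf("template after validation: %q\n", template)
}
```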
Closes: #1043
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
orgtoken.Validate now returns org_id (the org workspace UUID stored on
org_api_tokens rows, populated by #1212). Both call sites in
wsauth_middleware.go — WorkspaceAuth and AdminAuth — call
c.Set("org_id", orgID) after successful org-token validation.
This unbreaks orgCallerID(c) for org-token callers. Previously the
middleware populated org_token_id and org_token_prefix but never org_id,
so any handler reading c.Get("org_id") (e.g. requireCallerOwnsOrg) got
"" even for valid org tokens.
The change is additive: orgID may be empty for pre-migration tokens
minted before #1212. requireCallerOwnsOrg already handles empty org_id
by denying by default.
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The sed command in PR #1229 had no capture groups but used $1 in the
replacement, so it committed the literal string "defer func() { _ = \$1 }()"
instead of "defer func() { _ = resp.Body.Close() }()". The affected files do
not compile because $1 is not a valid Go identifier.
Fixed with: sed -i 's/defer func() { _ = \$1 }()/defer func() { _ = resp.Body.Close() }()/g'
Affected (all on origin/staging):
workspace-server/cmd/server/cp_config.go
workspace-server/internal/handlers/a2a_proxy.go
workspace-server/internal/handlers/github_token.go
workspace-server/internal/handlers/traces.go
workspace-server/internal/handlers/transcript.go
workspace-server/internal/middleware/session_auth.go
workspace-server/internal/provisioner/cp_provisioner.go (3 occurrences)
Closes: #1245
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
CPProvisioner env mutator error branch was left with unresolved conflict
markers after a prior rebase. Resolved to the HEAD-side generic message
"plugin env mutator chain failed" which is consistent with the same
message used in the Provisioner path (line 107/111).
No functional change.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The post-fire UPDATE after s.proxy.ProxyA2ARequest() was using fireCtx,
which derives from the outer ctx passed into fireSchedule(). If that ctx
is cancelled — HTTP timeout, graceful shutdown, or any upstream deadline —
ExecContext returns context.Canceled and the UPDATE is silently skipped,
leaving next_run_at stale and causing the schedule to re-fire on the
next tick.
Fix: create a dedicated updateCtx from context.Background() with a 5s
deadline, independent of the outer ctx hierarchy. Also improved the
error log to include schedule name for easier debugging.
Complements PR #1241 (fix/f1089-scheduler-ctx-fix-main) which fixes
the goroutine-panic path in tick() — this fix covers the wider case of
normal-return + ctx-cancelled after the proxy call.
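
A small sketch of the detached-context pattern. fakeUpdate stands in for the ExecContext call and simply reports whether its context is still live; all helper names are hypothetical:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeUpdate stands in for the post-fire UPDATE: like ExecContext, it
// fails if the context it is given has already been cancelled.
func fakeUpdate(ctx context.Context) error {
	return ctx.Err()
}

// updateWithDetachedCtx runs update under a fresh 5s deadline rooted at
// context.Background(), independent of any caller context.
func updateWithDetachedCtx(update func(context.Context) error) error {
	updateCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return update(updateCtx)
}

// cancelledCtx returns a context that is already cancelled, mimicking an
// outer ctx after an HTTP timeout or shutdown.
func cancelledCtx() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	fmt.Println("outer ctx:", fakeUpdate(cancelledCtx()))
	fmt.Println("detached ctx:", updateWithDetachedCtx(fakeUpdate))
}
```

The trade-off is that the UPDATE no longer stops with the request; the 5s deadline is what bounds it instead.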
F1089 | Severity: HIGH+security
Co-authored-by: Molecule AI Infra Lead <infra-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): call redactSecrets before seeding workspace memories (F1085)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100,000 bytes, CWE-400)
is preserved: redaction runs after truncation so the size limit still applies.
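
The ordering can be sketched as below. The redaction regex and the [REDACTED:API_KEY] marker are invented for the example, prepareMemoryContent is a hypothetical name, and the limit assumes the 100,000-byte figure quoted elsewhere in this log:

```go
package main

import (
	"fmt"
	"regexp"
)

// maxMemoryContentLength assumes the 100,000-byte limit.
const maxMemoryContentLength = 100_000

// secretPattern is a toy pattern; the real redactSecrets rules are not shown here.
var secretPattern = regexp.MustCompile(`sk-[A-Za-z0-9]{8,}`)

// redactSecrets is a stand-in with the 2-arg (workspaceID, content) shape.
func redactSecrets(workspaceID, content string) string {
	_ = workspaceID // unused in this sketch
	return secretPattern.ReplaceAllString(content, "[REDACTED:API_KEY]")
}

// prepareMemoryContent truncates first, then redacts, before any INSERT.
func prepareMemoryContent(workspaceID, content string) string {
	if len(content) > maxMemoryContentLength {
		content = content[:maxMemoryContentLength]
	}
	return redactSecrets(workspaceID, content)
}

func main() {
	fmt.Println(prepareMemoryContent("ws-1", "key=sk-abcdefgh12345678 end"))
}
```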
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(workspace_provision): add seedInitialMemories coverage for #1208
Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the
redactSecrets call (F1085 / #1132), both identified as untested in #1208.
- TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly
100k, 1 byte over, far over, and well under. Verifies INSERT receives
exactly maxMemoryContentLength bytes.
- TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs
before INSERT, regression test for F1085.
- TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently
skipped, no INSERT called.
- TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without
DB calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): Discord adapter launch visual assets (#1209)
Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging.
* fix(ci): golangci-lint errcheck failures on staging
Suppress errcheck warnings for calls where the return value is safely
ignored:
- resp.Body.Close() (artifacts/client.go): deferred cleanup — failure
to close a response body is non-critical; the defer itself is what
matters for connection reuse.
- rows.Close() (bundle/exporter.go): deferred cleanup in a loop where
rows.Err() already handles query errors.
- filepath.Walk (bundle/exporter.go): top-level walk call; errors in
sub-directory traversal are handled by the inner callback (which
returns nil for err != nil).
- broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget
event broadcast; errors are logged internally by the broadcaster.
- db.DB.ExecContext (bundle/importer.go): best-effort runtime column
update; non-critical auxiliary data that the provisioner re-extracts
if needed.
Fixes: #1143
* test(artifacts): suppress w.Write return values to satisfy errcheck
All httptest.ResponseWriter.Write calls in client_test.go now discard
the byte count and error return with a `_, _ =` prefix. Both return
values are safe to discard in test handlers: httptest.ResponseWriter.Write
never returns an error for in-memory buffers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(CI): move changes job off self-hosted runner + add workflow concurrency
Cherry-pick from staging PR #1194 for main. Two changes to relieve
macOS arm64 runner saturation:
1. `changes` job: runs on ubuntu-latest instead of
[self-hosted, macos, arm64]. This job does a plain `git diff`
with zero macOS dependencies — moving it off the runner frees
a slot immediately on every workflow trigger.
2. Add workflow-level concurrency:
     concurrency:
       group: ci-${{ github.ref }}
       cancel-in-progress: true
Prevents multiple stale in-flight CI runs from queuing on the
same ref when new commits arrive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100,000 bytes, CWE-400)
is preserved: redaction runs after truncation so the size limit still applies.
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* tick: 2026-04-21 ~03:40Z — CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done
* fix(tenant-guard): allowlist /registry/register + /registry/heartbeat
Final layer of today's stuck-provisioning saga. With the private-IP
platform_url fix and the intra-VPC :8080 SG rule in place, workspace
EC2s finally reached the tenant on the right port — only to have every
POST bounced with a synthetic 404 by TenantGuard.
TenantGuard is the SaaS hook that rejects cross-tenant routing. It
demands X-Molecule-Org-Id on every request, but CP's workspace user-
data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL,
RUNTIME, PORT), so the runtime can't attach the header. Net effect:
every workspace's first heartbeat to /registry/heartbeat was a silent
404, and the workspace sat in 'provisioning' until the platform
sweeper timed it out.
Allowlist the two workspace-boot paths:
- /registry/register — one-shot at runtime startup
- /registry/heartbeat — every 30s
Both are still gated by wsauth.HasAnyLiveToken (workspaces with a
token on file must present it; legacy tokenless workspaces are
grandfathered). And the tenant SG already scopes :8080 to the VPC
CIDR, so only intra-VPC callers can reach these paths in the first
place. The allowlist bypasses cross-org routing, not auth.
Follow-up: passing MOLECULE_ORG_ID into the workspace env would let
the runtime attach the header and drop this allowlist entry. Tracked
separately; not urgent since the multi-layer auth above is already
adequate.
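
A simplified sketch of the allowlist, using net/http rather than the project's actual middleware stack. The header comparison, the tenantGuard signature, and statusFor are assumptions made for the example:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tenantGuardAllowlist holds the workspace-boot paths that skip the org check.
var tenantGuardAllowlist = map[string]bool{
	"/registry/register":  true,
	"/registry/heartbeat": true,
}

// tenantGuard answers 404 unless the path is allowlisted or the request
// carries the expected X-Molecule-Org-Id. Auth is a separate layer.
func tenantGuard(orgID string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if tenantGuardAllowlist[r.URL.Path] || r.Header.Get("X-Molecule-Org-Id") == orgID {
			next.ServeHTTP(w, r)
			return
		}
		http.NotFound(w, r)
	})
}

// statusFor sends one POST through the guard and returns the status code.
func statusFor(path, orgHeader string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, path, nil)
	if orgHeader != "" {
		req.Header.Set("X-Molecule-Org-Id", orgHeader)
	}
	rec := httptest.NewRecorder()
	tenantGuard("org-1", ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/registry/heartbeat", "")) // allowlisted, no header
	fmt.Println(statusFor("/workspaces", ""))         // guarded, no header
	fmt.Println(statusFor("/workspaces", "org-1"))    // guarded, matching header
}
```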
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Issue #1196: golangci-lint errcheck flags bare resp.Body.Close()
calls because Body.Close() can return a non-nil error (e.g. when the
server sent fewer bytes than Content-Length). All occurrences fixed:
defer resp.Body.Close() → defer func() { _ = resp.Body.Close() }()
resp.Body.Close() → _ = resp.Body.Close()
12 files affected across all Go packages — channels, handlers,
middleware, provisioner, artifacts, and cmd. The body is already fully
consumed at each call site, so the error is always safe to discard.
🤖 Generated with [Claude Code](https://claude.ai)
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
PR #1210 added org_api_tokens.org_id but c.Set("org_id", ...) was never
called — so orgCallerID() always returns "" and all token callers are
denied org-scoped access even within their own org.
Fix: after orgtoken.Validate succeeds in AdminAuth, look up the token's
org_id column and set it in the gin context. Pre-fix tokens (org_id=NULL)
get no org_id in context, which is correct — requireCallerOwnsOrg already
denies access for nil org_id.
Test: TestAdminAuth_OrgToken_SetsOrgID covers both post-fix tokens
(org_id set) and pre-fix tokens (org_id=NULL, not set).
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with index. Existing pre-migration tokens get org_id=NULL → deny
by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
cross-org denial, session bypass, DB error denial.
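
The deny-by-default rule for token callers can be sketched like this. orgIDFromColumn and callerOwnsOrg are hypothetical helpers; the session/ADMIN_TOKEN bypass is outside the sketch:

```go
package main

import (
	"database/sql"
	"fmt"
)

// orgIDFromColumn maps a nullable org_id column to "" for NULL, matching
// the ("", nil) behaviour described for unanchored tokens.
func orgIDFromColumn(col sql.NullString) string {
	if !col.Valid {
		return ""
	}
	return col.String
}

// callerOwnsOrg denies when the caller has no org and otherwise requires
// an exact match with the target org.
func callerOwnsOrg(callerOrg, targetOrgID string) bool {
	if callerOrg == "" {
		return false // pre-migration token: deny by default
	}
	return callerOrg == targetOrgID
}

func main() {
	anchored := orgIDFromColumn(sql.NullString{String: "org-a", Valid: true})
	unanchored := orgIDFromColumn(sql.NullString{})
	fmt.Println(callerOwnsOrg(anchored, "org-a"))   // same org
	fmt.Println(callerOwnsOrg(anchored, "org-b"))   // cross-org
	fmt.Println(callerOwnsOrg(unanchored, "org-a")) // NULL org_id
}
```

The explicit empty check matters: without it, an empty caller org compared against an empty target would count as a match.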
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace err.Error() leaks with prod-safe messages (#1206)
- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
replaced 7 err.Error() calls with "provisioning failed" in both
Broadcast payloads and last_sample_error DB column. Full error
preserved in server-side log.Printf.
- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
calls with generic messages:
"invalid plugin source"
"plugin source not supported"
"invalid plugin name"
"staged plugin exceeds size limit"
"plugin manifest integrity check failed"
Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.
Added regression tests (workspace_provision_test.go):
- TestProvisionWorkspace_NoInternalErrorsInBroadcast
- TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
- TestResolveAndStage_NoInternalErrorsInHTTPErr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(F1089): log panic-recovery UPDATE errors in scheduler
The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.
context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Multiple security findings addressed:
F1095 (BootstrapFailed): Replace err.Error() in ShouldBindJSON failure
response with generic "invalid request body" — raw gin binding errors
can expose validation detail, field names, and type mismatch info.
F1096 (BootstrapFailed): Handle RowsAffected() error instead of ignoring
it — the DB call can fail in ways the current code silently ignores.
#1206 (provision/plugin install): Replace raw err.Error() in API responses,
broadcasts, and last_sample_error DB fields across workspace_provision.go
(7 occurrences) and plugins_install_pipeline.go (6 occurrences). Replaced
with context-appropriate generic messages that don't leak internal DB
file paths, decrypt error details, or resolver internals to callers.
#1208 (test-gap): Add 3 new seedInitialMemories truncate tests:
- Exactly-at-limit (100k bytes → unchanged, boundary case)
- Empty content (skipped, no DB call)
- Oversized with embedded secrets (truncation fires before any other content inspection)
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with partial index for fast lookups. Existing pre-migration tokens
get org_id=NULL → deny by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added regression tests: happy path, unanchored denial, DB error denial.
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
* feat(canvas): rewrite MemoryInspectorPanel to match backend API
Issue #909 (chunk 3 of #576).
The existing MemoryInspectorPanel used the wrong API endpoint
(/memory instead of /memories) and wrong field names (key/value/version
instead of id/content/scope/namespace/created_at). It also lacked
LOCAL/TEAM/GLOBAL scope tabs and a namespace filter.
Changes:
- Fix endpoint: GET /workspaces/:id/memories with ?scope= query param
- Fix MemoryEntry type to match actual API: id, content, scope,
namespace, created_at, similarity_score
- Add LOCAL/TEAM/GLOBAL scope tabs
- Add namespace filter input
- Remove Edit functionality (no update endpoint in backend)
- Delete uses DELETE /workspaces/:id/memories/:id (by id, not key)
- Full rewrite of 27 tests to match new API and UI structure
- Uses ConfirmDialog (not native dialogs) for delete confirmation
- All dark zinc theme (no light colors)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: tighten types + improve provision-timeout message (#1135, #1136)
#1135 — TypeScript: make BudgetData.budget_used and WorkspaceMetrics
fields optional to match actual partial-response shapes from provisioning-
stuck workspaces. Runtime already guarded with ?? 0.
#1136 — provisiontimeout.go: replace misleading "check required env vars"
hint (preflight catches that case upfront) with accurate message about
container starting but failing to call /registry/register.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(test): align ssrf_test.go localhost test cases with isSafeURL behaviour
isSafeURL blocks 127.0.0.1 via ip.IsLoopback() even in dev environments.
The localhost test cases set `wantErr: false`, which was incorrect:
isSafeURL rejects them, so the test fails under go test. Fix by changing
wantErr to true for both localhost test cases.
Rationale: loopback blocking at this layer is intentional. Access
control is enforced by WorkspaceAuth + CanCommunicate at the A2A
routing layer, not by the URL validation. Opening this would widen
the SSRF attack surface without adding real dev flexibility.
Closes: ssrf_test.go inconsistency reported 2026-04-21
Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
F1089: PR #1032's panic-recovery defers used the outer `ctx` passed into
fireSchedule/tick. If that ctx was cancelled during the panic window
(HTTP timeout, graceful shutdown), ExecContext returned early and the
next_run_at UPDATE was silently skipped — leaving the schedule stuck.
Fix: both panic defers now call ExecContext(context.Background()) so the
recovery UPDATE is independent of the outer ctx's lifecycle.
Refs: #1201 (F1089, security audit 2026-04-21)
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
- TestRecordSkipped_AdvancesNextRunAt: call recordSkipped directly instead
of going through fireSchedule, which now has a 2-min deferral loop (#969)
that makes sqlmock-based end-to-end testing impractical.
- TestFireSchedule_NormalSuccess_AdvancesNextRunAt: add missing expectation
for the consecutive_empty_runs reset query (#795) that fires on non-empty
successful responses.
- TestFireSchedule_ComputeNextRunError: same consecutive_empty_runs fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add production fix and three new test cases verifying that workspace
deletion cascade-disables all workspace_schedules for the deleted
workspace and its descendants, preventing zombie schedule firings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When fireSchedule panics before reaching the next_run_at UPDATE,
the deferred recover catches the panic but never advances next_run_at,
leaving it stuck in the past forever. The schedule then fires every
tick (30s) in an infinite retry loop.
Add next_run_at advancement to both panic recovery defers (the
per-goroutine one in tick() and the inner one in fireSchedule()) so
the schedule always moves forward regardless of how the fire exits.
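
In outline, the recovery path looks like the sketch below. It uses an in-memory schedule with a fixed interval; the real code computes the next run with ComputeNextRun and persists it with ExecContext, so every name here is illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// schedule is a hypothetical in-memory stand-in for a workspace_schedules row.
type schedule struct {
	Name      string
	Interval  time.Duration
	NextRunAt time.Time
}

// fireSchedule advances NextRunAt on both the normal path and the panic path,
// so a panicking fire cannot leave the schedule stuck in the past.
func fireSchedule(s *schedule, now time.Time, fire func()) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("schedule %q panicked: %v\n", s.Name, r)
			s.NextRunAt = now.Add(s.Interval)
		}
	}()
	fire()
	s.NextRunAt = now.Add(s.Interval)
}

// panicAdvancesNextRun reports whether NextRunAt moved past now after a panic.
func panicAdvancesNextRun() bool {
	now := time.Date(2026, 4, 21, 0, 0, 0, 0, time.UTC)
	s := &schedule{Name: "demo", Interval: time.Minute, NextRunAt: now.Add(-time.Hour)}
	fireSchedule(s, now, func() { panic("boom") })
	return s.NextRunAt.After(now)
}

func main() {
	fmt.Println("advanced after panic:", panicAdvancesNextRun())
}
```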
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CP-QA approved. seedInitialMemories() now truncates mem.Content at 100,000 bytes before INSERT. Oversized content is logged with byte count before/after so operators can detect truncation. Fixes #1066 (CWE-400). NOTE: no unit tests in this commit; a follow-up issue is recommended.
CP-QA approved. Panic recovery in fireSchedule now advances next_run_at via ComputeNextRun + ExecContext, preventing a panicking cron from indefinitely starving all other schedules. 3 new tests: TestPanicRecovery_AdvancesNextRunAt, TestFireSchedule_NormalSuccess, TestRecordSkipped_AdvancesNextRunAt. Fixes #1029.
Security fixes for the memory backup/restore endpoints merged in PR #1051.
## F1084 / #1131: Memory export exposes all workspaces
GET /admin/memories/export now applies redactSecrets() to each content
field before including it in the JSON response. Pre-SAFE-T1201 memories
(stored before redactSecrets was mandatory on writes) no longer leak
credential patterns in the admin export.
## F1085 / #1132: Memory import does not call redactSecrets
POST /admin/memories/import now calls redactSecrets() on content before
BOTH the deduplication check and the INSERT. This ensures:
- Imported memories with embedded credentials cannot land unredacted in
agent_memories (SAFE-T1201 / #838 parity with the commit_memory path).
- Dedup is performed against the redacted value so two backups with
the same original secret both get [REDACTED:*] as their content and
are correctly treated as duplicates.
## New tests
admin_memories_test.go: 6 tests covering redactSecrets parity on
both Export and Import endpoints.
Closes #1131.
Closes #1132.
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
URLs returned from DB and Redis cache (db.GetCachedURL, workspaces.url column)
are now validated via validateAgentURL() before any HTTP request is made:
- mcpResolveURL (mcp.go): added validateAgentURL() calls on all three return
paths (internal cache, Redis cache, DB fallback).
- resolveAgentURL (a2a_proxy.go): added validateAgentURL() call before
returning agentURL to the A2A dispatcher.
validateAgentURL() was extended (registry.go) to resolve DNS hostnames and
check each returned IP against the blocklist (private ranges, loopback,
cloud-metadata 169.254.0.0/16). "localhost" is allowed by name for local dev.
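
The per-IP check might be sketched as follows, using only standard net.IP predicates. The helper names are hypothetical, and the DNS lookup itself (for example via net.LookupIP) and the by-name "localhost" exception are left out to keep the example deterministic:

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isBlockedIP reports whether ip is loopback, private, or link-local
// (169.254.0.0/16, which includes the cloud-metadata address).
func isBlockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast()
}

// checkResolvedIPs rejects a hostname if it resolved to nothing or if any
// returned address is on the blocklist.
func checkResolvedIPs(ips []net.IP) error {
	if len(ips) == 0 {
		return errors.New("hostname did not resolve")
	}
	for _, ip := range ips {
		if isBlockedIP(ip) {
			return fmt.Errorf("blocked address %s", ip)
		}
	}
	return nil
}

// blockedLiteral parses s and applies the blocklist; unparseable input is blocked.
func blockedLiteral(s string) bool {
	ip := net.ParseIP(s)
	return ip == nil || isBlockedIP(ip)
}

func main() {
	for _, s := range []string{"127.0.0.1", "10.0.0.5", "169.254.169.254", "8.8.8.8"} {
		fmt.Printf("%-16s blocked=%v\n", s, blockedLiteral(s))
	}
	fmt.Println(checkResolvedIPs(nil))
}
```

Checking every returned address, rather than only the first, means a hostname that resolves to a mix of public and private IPs is still rejected.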
GET /admin/memories/export now applies redactSecrets() to each content field
before including it in the JSON response. Pre-SAFE-T1201 memories (stored
before redactSecrets was mandatory on writes) no longer leak credentials.
POST /admin/memories/import now calls redactSecrets() on content before both
the deduplication check and the INSERT. Imported memories with embedded
credentials cannot bypass SAFE-T1201 (#838).
- admin_memories.go: GET /admin/memories/export + POST /admin/memories/import
handler (from PR #1051, with security fixes applied).
- admin_memories_test.go: 6 tests covering redactSecrets parity on both endpoints.
- registry_test.go: added DNS-lookup test cases for validateAgentURL (F1083).
"localhost" allowed by name (preserves existing test); nxdomain blocked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove duplicate-line ExecContext call that caused syntax error at mcp.go:784
- Update redactSecrets signature from 1-arg to 2-arg (workspaceID, content)
to match the canonical form established in PR #1017
- Update toolCommitMemory call site to use 2-arg form
- Add reserved workspaceID param note in docstring for future audit logging
Fixes PR #1036 compile-blocking issues (Platform Go job).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CP-QA approved. golangci-lint fixes in bundle/exporter.go + bundle/importer.go, redactSecrets in admin_memories.go, plus 489-line admin_memories_test.go.
Workspaces stuck in provisioning used to sit in "starting" for 10 minutes
until the sweeper flipped them. The real signal, a runtime crash at
EC2 boot, lands on the serial console within seconds, but nothing was
listening. These endpoints close the loop.
1. POST /admin/workspaces/:id/bootstrap-failed
The control plane's bootstrap watcher posts here when it spots
"RUNTIME CRASHED" in ec2:GetConsoleOutput. Handler:
- UPDATEs workspaces SET status='failed' only when status was
'provisioning' (idempotent — a raced online/failed stays put)
- Stores the error + log_tail in last_sample_error so the canvas
can render the real stack trace, not a generic "timeout" string
- Broadcasts WORKSPACE_PROVISION_FAILED with source='bootstrap_watcher'
2. GET /workspaces/:id/console
Proxies to CP's new /cp/admin/workspaces/:id/console endpoint so
the tenant platform can surface EC2 serial console output without
holding AWS credentials. CPProvisioner.GetConsoleOutput is the
client; returns 501 in non-CP deployments (docker-compose dev).
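The guarded transition in endpoint 1 can be sketched in memory as below. This is an assumption-laden illustration, not the handler: `markBootstrapFailed`, the `workspace` struct, and the `maxLogTail` cap are hypothetical, and keeping the end of the log when truncating is a guess at what "log_tail truncation" means. The map stands in for a conditional `UPDATE ... WHERE status='provisioning'`.

```go
package main

import (
	"errors"
	"fmt"
)

type workspace struct {
	Status          string
	LastSampleError string
}

// maxLogTail is a hypothetical cap; the real limit is not stated in the log.
const maxLogTail = 16

// markBootstrapFailed flips a workspace to "failed" only if it is still
// "provisioning". It reports whether a row changed, like RowsAffected would.
func markBootstrapFailed(rows map[string]*workspace, id, errMsg, logTail string) (bool, error) {
	if id == "" {
		return false, errors.New("empty workspace id")
	}
	w, ok := rows[id]
	if !ok || w.Status != "provisioning" {
		return false, nil // already online/failed (or unknown): leave it alone
	}
	if len(logTail) > maxLogTail {
		// Keep the newest bytes; byte slicing may split a multi-byte rune.
		logTail = logTail[len(logTail)-maxLogTail:]
	}
	w.Status = "failed"
	w.LastSampleError = errMsg + "\n" + logTail
	return true, nil
}

func main() {
	rows := map[string]*workspace{
		"ws-1": {Status: "provisioning"},
		"ws-2": {Status: "online"},
	}
	for _, id := range []string{"ws-1", "ws-1", "ws-2"} {
		changed, err := markBootstrapFailed(rows, id, "RUNTIME CRASHED", "traceback ...")
		fmt.Printf("%s changed=%v err=%v status=%s\n", id, changed, err, rows[id].Status)
	}
}
```

Putting the status check inside the update itself is what makes a repeated or late report harmless.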
Both endpoints are gated by AdminAuth. CP holds the tenant ADMIN_TOKEN that
the middleware accepts on its tier 2b branch.
Tests cover: happy-path fail, already-transitioned no-op, empty id,
log_tail truncation, and the 501 fallback when no CP is wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes: #177 (CRITICAL — Dockerfile runs as root)
Dockerfiles changed:
- workspace-server/Dockerfile (platform-only): addgroup/adduser + USER platform
- workspace-server/Dockerfile.tenant (combined Go+Canvas): addgroup/adduser + USER canvas
+ chown canvas:canvas on canvas dir so non-root node process can read it
- canvas/Dockerfile (canvas standalone): addgroup/adduser + USER canvas
- workspace-server/entrypoint-tenant.sh: update header comment (no longer starts
as root; both processes now start non-root)
The entrypoint no longer needs a root→non-root handoff since both the Go
platform and Canvas node run as non-root by default. The 'canvas' user owns
/app and /platform, so volume mounts owned by the host's canvas user work
without needing a root init step.
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes audit #125 findings for CWE-639:
1. admin_test_token.go — CRITICAL IDOR (finding #112)
When ADMIN_TOKEN is set in production, require it explicitly on
GET /admin/workspaces/:id/test-token. The original gap: AdminAuth
accepted any valid org-scoped token, letting an Org A token holder
mint workspace bearer tokens for ANY workspace UUID they could enumerate.
Now requires ADMIN_TOKEN when it's configured; the MOLECULE_ENV!=production
path still requires a valid bearer (any org token works for local dev).
2. org_plugin_allowlist.go — HIGH IDOR (finding #112)
GET and PUT /orgs/:id/plugins/allowlist: add requireOrgOwnership()
check after org existence verification. Org-token holders can only
read/write their own org's allowlist. Session and ADMIN_TOKEN callers
bypass the check (they have platform-wide access via the session
cookie path, not org tokens).
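The ownership rule in item 2 can be sketched as follows. The `caller` type, the `authKind` constants, and the error value are hypothetical; only the function name requireOrgOwnership and the bypass for session and ADMIN_TOKEN callers come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

type authKind int

const (
	authOrgToken authKind = iota
	authSession
	authAdminToken
)

// caller is a hypothetical summary of how the request authenticated.
type caller struct {
	Kind  authKind
	OrgID string // only meaningful for org-token callers
}

var errForbidden = errors.New("org token does not own this org")

// requireOrgOwnership lets org-token callers touch only their own org.
// Session and ADMIN_TOKEN callers skip the check.
func requireOrgOwnership(c caller, targetOrgID string) error {
	if c.Kind != authOrgToken {
		return nil
	}
	if c.OrgID == "" || c.OrgID != targetOrgID {
		return errForbidden
	}
	return nil
}

func main() {
	orgA := caller{Kind: authOrgToken, OrgID: "org-a"}
	admin := caller{Kind: authAdminToken}
	fmt.Println(requireOrgOwnership(orgA, "org-a"))  // own org
	fmt.Println(requireOrgOwnership(orgA, "org-b"))  // someone else's org
	fmt.Println(requireOrgOwnership(admin, "org-b")) // platform-wide caller
}
```

Running the check after the org-existence lookup, as the commit describes, means a missing org is reported as not found before ownership is evaluated.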
Closes: #112 (CWE-639 IDOR — tenant config access)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds isSafeURL() + isPrivateOrMetadataIP() in mcp.go and wires the
check into:
- MCP delegate_task (sync path) — line 530
- MCP delegate_task_async (fire-and-forget) — line 602
- a2a_proxy resolveAgentURL() — line 391
Blocklist covers: RFC-1918 private (10/8, 172.16/12, 192.168/16),
cloud metadata link-local (169.254/16), carrier-grade NAT (100.64/10),
documentation ranges (192.0.2/24, 198.51.100/24, 203.0.113/24),
loopback, unspecified, and link-local multicast.
For hostnames, DNS is resolved and every returned IP is validated —
blocks internal hostnames that resolve to private ranges.
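A blocklist like the one listed above might be expressed as in this sketch. It is not the code in mcp.go: `isBlockedIP`, `isBlockedAddr`, and `mustParseCIDRs` are hypothetical names, and failing closed on unparseable input is an assumption of this example.

```go
package main

import (
	"fmt"
	"net"
)

// blockedCIDRs mirrors the ranges named in the commit message.
var blockedCIDRs = mustParseCIDRs(
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // RFC 1918 private
	"169.254.0.0/16", // link-local, including cloud metadata
	"100.64.0.0/10", // carrier-grade NAT
	"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", // documentation
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err) // static list, so a parse error is a programmer bug
		}
		nets = append(nets, n)
	}
	return nets
}

// isBlockedIP reports whether ip falls in any range the proxy must not reach.
func isBlockedIP(ip net.IP) bool {
	if ip.IsLoopback() || ip.IsUnspecified() ||
		ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() {
		return true
	}
	for _, n := range blockedCIDRs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// isBlockedAddr parses s and fails closed when s is not an IP literal.
func isBlockedAddr(s string) bool {
	ip := net.ParseIP(s)
	return ip == nil || isBlockedIP(ip)
}

func main() {
	for _, s := range []string{"10.1.2.3", "169.254.169.254", "100.64.0.1", "127.0.0.1", "8.8.8.8"} {
		fmt.Printf("%-16s blocked=%v\n", s, isBlockedAddr(s))
	}
}
```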
Closes: #1130 (F1083 — SSRF in A2A proxy and MCP bridge)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three nits identified during post-merge review of #1119 and #1133:
1. ContextMenu.tsx imported `removeNode` from the canvas store but
stopped using it when the delete-confirm flow moved to Canvas in
#1133. Also removed the now-unused mock entry in the keyboard
test so the test inventory matches the real call list.
2. Preflight's YAML parse failure was a silent pass — defensible since
the in-container preflight owns the schema, but invisible to ops if
a template ships malformed YAML. Log at WARN so the signal surfaces
without blocking the provision.
3. formatMissingEnvError rendered its slice via %q, producing
`["A" "B"]` which is Go-literal-looking and ugly in a user-facing
error. Join with ", " instead. Test updated to assert the new
format.
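The formatting difference in item 3 is easy to reproduce in isolation. In the sketch below the message prefix and both function names are hypothetical; only the `%q` versus `", "` join behaviour is taken from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// formatWithQ shows the old rendering: %q on a []string yields a
// Go-literal-looking list with quoted, space-separated elements.
func formatWithQ(missing []string) string {
	return fmt.Sprintf("missing required env vars: %q", missing)
}

// formatJoined shows the new rendering: a plain comma-separated list.
func formatJoined(missing []string) string {
	return "missing required env vars: " + strings.Join(missing, ", ")
}

func main() {
	missing := []string{"A", "B"}
	fmt.Println(formatWithQ(missing))  // missing required env vars: ["A" "B"]
	fmt.Println(formatJoined(missing)) // missing required env vars: A, B
}
```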
No behavioural changes beyond the new WARN log line and the error-message
format; these are review nits, not bug fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>