* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with index. Existing pre-migration tokens get org_id=NULL → deny
by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
cross-org denial, session bypass, DB error denial.
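The deny-by-default rule described in this fix can be sketched roughly as follows. This is an illustration, not the code in org_plugin_allowlist.go: `orgLookup`, `staticLookup`, `errForbidden`, and the convention that an empty token ID means a session/ADMIN_TOKEN caller are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// orgLookup is a hypothetical stand-in for orgtoken.OrgIDByTokenID: it returns
// the org a token is anchored to, or "" for NULL/unanchored tokens.
type orgLookup func(tokenID string) (string, error)

// errForbidden is an illustrative sentinel; a real handler would write a 403.
var errForbidden = errors.New("forbidden")

// checkCallerOwnsOrg denies on lookup failure, on an unanchored token, and on
// an org mismatch. An empty tokenID models a session/ADMIN_TOKEN caller, which
// bypasses the check (assumption: the real handler detects this differently).
func checkCallerOwnsOrg(lookup orgLookup, tokenID, targetOrgID string) error {
	if tokenID == "" {
		return nil
	}
	callerOrg, err := lookup(tokenID)
	if err != nil {
		return fmt.Errorf("%w: org lookup failed", errForbidden)
	}
	if callerOrg == "" || callerOrg != targetOrgID {
		return errForbidden
	}
	return nil
}

// staticLookup builds a lookup that always answers orgID, or fails when fail is true.
func staticLookup(orgID string, fail bool) orgLookup {
	return func(string) (string, error) {
		if fail {
			return "", errors.New("db unavailable")
		}
		return orgID, nil
	}
}
```

The point of the sketch is that the empty string never compares equal to a target org, so pre-migration tokens fall into the denial branch without special handling.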
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace err.Error() leaks with prod-safe messages (#1206)
- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
replaced 7 err.Error() calls with "provisioning failed" in both
Broadcast payloads and last_sample_error DB column. Full error
preserved in server-side log.Printf.
- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
calls with generic messages:
"invalid plugin source"
"plugin source not supported"
"invalid plugin name"
"staged plugin exceeds size limit"
"plugin manifest integrity check failed"
Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.
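One way to keep the full error in server logs while exposing only a generic message is sketched below. The `publicError` type, `sanitize`, and `demoProvisionFailure` are hypothetical names for illustration; the actual handlers may structure this differently.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// publicError pairs a generic, client-safe message with the underlying cause.
type publicError struct {
	Public string // safe for HTTP JSON or a broadcast payload
	Cause  error  // full detail, for server-side logs only
}

func (e *publicError) Error() string { return e.Public }
func (e *publicError) Unwrap() error { return e.Cause }

// sanitize logs the full error server-side and returns only the generic text.
func sanitize(op string, err error, generic string) *publicError {
	log.Printf("%s: %v", op, err)
	return &publicError{Public: generic, Cause: err}
}

// demoProvisionFailure simulates a DB failure and returns what a client would see.
func demoProvisionFailure() string {
	cause := fmt.Errorf("insert workspace: %w", errors.New("pq: connection refused"))
	return sanitize("provisionWorkspace", cause, "provisioning failed").Error()
}
```

Because `Error()` returns only the public text, accidentally passing the value to a response writer still does not expose the driver message.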
Added regression tests (workspace_provision_test.go):
- TestProvisionWorkspace_NoInternalErrorsInBroadcast
- TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
- TestResolveAndStage_NoInternalErrorsInHTTPErr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(F1089): log panic-recovery UPDATE errors in scheduler
The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.
context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.
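A minimal sketch of the pattern, assuming a hypothetical `advanceFunc` in place of the real db.DB.ExecContext call and an injected `logf` so the behaviour can be observed; the names are illustrative and not taken from the scheduler source.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// advanceFunc stands in for the UPDATE that advances next_run_at.
type advanceFunc func(ctx context.Context) error

// runGuarded runs job; if it panics, it still advances the schedule and
// reports any failure from that UPDATE through logf instead of dropping it.
func runGuarded(job func(), advance advanceFunc, logf func(format string, args ...any)) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		logf("scheduler: recovered from panic: %v", r)
		// context.Background() as in the commit; the sketch does not reuse a tick context.
		if err := advance(context.Background()); err != nil {
			logf("scheduler: advancing next_run_at after panic failed: %v", err)
		}
	}()
	job()
}

// demoLogs runs a panicking job with a failing UPDATE and returns the log lines.
func demoLogs() []string {
	var lines []string
	logf := func(format string, args ...any) {
		lines = append(lines, fmt.Sprintf(format, args...))
	}
	runGuarded(
		func() { panic("boom") },
		func(context.Context) error { return errors.New("db unavailable") },
		logf,
	)
	return lines
}
```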
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Multiple security findings addressed:
F1095 (BootstrapFailed): Replace err.Error() in ShouldBindJSON failure
response with generic "invalid request body" — raw gin binding errors
can expose validation detail, field names, and type mismatch info.
F1096 (BootstrapFailed): Handle RowsAffected() error instead of ignoring
it — the DB call can fail in ways the current code silently ignores.
#1206 (provision/plugin install): Replace raw err.Error() in API responses,
broadcasts, and last_sample_error DB fields across workspace_provision.go
(7 occurrences) and plugins_install_pipeline.go (6 occurrences). Replaced
with context-appropriate generic messages that don't leak internal DB
errors, file paths, decrypt error details, or resolver internals to callers.
#1208 (test-gap): Add 3 new seedInitialMemories truncate tests:
- Exactly-at-limit (100k bytes → unchanged, boundary case)
- Empty content (skipped, no DB call)
- Oversized with embedded secrets (truncation fires before any other content inspection)
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.
Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
with partial index for fast lookups. Existing pre-migration tokens
get org_id=NULL → deny by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
→ denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
Token minted via another org token gets its org_id set at mint time.
Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added regression tests: happy path, unanchored denial, DB error denial.
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
* feat(canvas): rewrite MemoryInspectorPanel to match backend API
Issue #909 (chunk 3 of #576).
The existing MemoryInspectorPanel used the wrong API endpoint
(/memory instead of /memories) and wrong field names (key/value/version
instead of id/content/scope/namespace/created_at). It also lacked
LOCAL/TEAM/GLOBAL scope tabs and a namespace filter.
Changes:
- Fix endpoint: GET /workspaces/:id/memories with ?scope= query param
- Fix MemoryEntry type to match actual API: id, content, scope,
namespace, created_at, similarity_score
- Add LOCAL/TEAM/GLOBAL scope tabs
- Add namespace filter input
- Remove Edit functionality (no update endpoint in backend)
- Delete uses DELETE /workspaces/:id/memories/:id (by id, not key)
- Full rewrite of 27 tests to match new API and UI structure
- Uses ConfirmDialog (not native dialogs) for delete confirmation
- All dark zinc theme (no light colors)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: tighten types + improve provision-timeout message (#1135, #1136)
#1135 — TypeScript: make BudgetData.budget_used and WorkspaceMetrics
fields optional to match actual partial-response shapes from provisioning-
stuck workspaces. Runtime already guarded with ?? 0.
#1136 — provisiontimeout.go: replace misleading "check required env vars"
hint (preflight catches that case upfront) with accurate message about
container starting but failing to call /registry/register.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(test): align ssrf_test.go localhost test cases with isSafeURL behaviour
isSafeURL blocks 127.0.0.1 via ip.IsLoopback() even in dev environments.
The localhost test cases declared `wantErr: false`, which contradicts
that behaviour, so the test failed under `go test`. Fix by changing
wantErr to true for both localhost test cases.
Rationale: loopback blocking at this layer is intentional. Access
control is enforced by WorkspaceAuth + CanCommunicate at the A2A
routing layer, not by the URL validation. Opening this would widen
the SSRF attack surface without adding real dev flexibility.
Closes: ssrf_test.go inconsistency reported 2026-04-21
Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add production fix and three new test cases verifying that workspace
deletion cascade-disables all workspace_schedules for the deleted
workspace and its descendants, preventing zombie schedule firings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CP-QA approved. seedInitialMemories() now truncates mem.Content at 100,000 bytes before INSERT. Oversized content is logged with byte count before/after so operators can detect truncation. Fixes #1066 (CWE-400). NOTE: no unit tests in this commit; a follow-up issue is recommended.
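The truncation step might look roughly like this. `truncateContent` and `maxMemoryBytes` are hypothetical names, and the plain byte cut is an assumption: it can split a multi-byte UTF-8 character at the boundary, which the real code may or may not guard against.

```go
package main

import "log"

// maxMemoryBytes mirrors the 100,000-byte cap described above.
const maxMemoryBytes = 100_000

// truncateContent returns content cut to at most limit bytes and reports
// whether a cut happened. Content exactly at the limit is left unchanged.
func truncateContent(content string, limit int) (string, bool) {
	if len(content) <= limit {
		return content, false
	}
	// Log sizes only, never the content itself.
	log.Printf("seed memory truncated: %d -> %d bytes", len(content), limit)
	return content[:limit], true
}
```

A caller would use `truncateContent(mem.Content, maxMemoryBytes)` before the INSERT.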
Security fixes for the memory backup/restore endpoints merged in PR #1051.
## F1084 / #1131: Memory export exposes all workspaces
GET /admin/memories/export now applies redactSecrets() to each content
field before including it in the JSON response. Pre-SAFE-T1201 memories
(stored before redactSecrets was mandatory on writes) no longer leak
credential patterns in the admin export.
## F1085 / #1132: Memory import does not call redactSecrets
POST /admin/memories/import now calls redactSecrets() on content before
BOTH the deduplication check and the INSERT. This ensures:
- Imported memories with embedded credentials cannot land unredacted in
agent_memories (SAFE-T1201 / #838 parity with the commit_memory path).
- Dedup is performed against the redacted value so two backups with
the same original secret both get [REDACTED:*] as their content and
are correctly treated as duplicates.
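The ordering matters more than the patterns. A sketch under stated assumptions: `redact` is a stand-in for redactSecrets with two illustrative patterns and a simplified `[REDACTED]` marker, and an in-memory map replaces the agent_memories dedup query.

```go
package main

import "regexp"

// Illustrative patterns only; the real pattern list is not reproduced here.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`Bearer\s+[A-Za-z0-9._-]+`),
	regexp.MustCompile(`sk-[A-Za-z0-9]{8,}`),
}

// redact is a hypothetical stand-in for redactSecrets.
func redact(content string) string {
	for _, p := range secretPatterns {
		content = p.ReplaceAllString(content, "[REDACTED]")
	}
	return content
}

// memory is the dedup key: content plus scope.
type memory struct{ Scope, Content string }

// importMemories redacts first, then dedups and stores on the redacted value,
// so an unredacted secret is never compared against or written to the store.
func importMemories(existing map[memory]bool, incoming []memory) (inserted int) {
	for _, m := range incoming {
		m.Content = redact(m.Content)
		if existing[m] {
			continue
		}
		existing[m] = true
		inserted++
	}
	return inserted
}
```

If the dedup check ran on the raw content instead, a second import of the same backup would miss the already-redacted row and insert a duplicate.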
## New tests
admin_memories_test.go: 6 tests covering redactSecrets parity on
both Export and Import endpoints.
Closes #1131.
Closes #1132.
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
URLs returned from DB and Redis cache (db.GetCachedURL, workspaces.url column)
are now validated via validateAgentURL() before any HTTP request is made:
- mcpResolveURL (mcp.go): added validateAgentURL() calls on all three return
paths (internal cache, Redis cache, DB fallback).
- resolveAgentURL (a2a_proxy.go): added validateAgentURL() call before
returning agentURL to the A2A dispatcher.
validateAgentURL() was extended (registry.go) to resolve DNS hostnames and
check each returned IP against the blocklist (private ranges, loopback,
cloud-metadata 169.254.0.0/16). "localhost" is allowed by name for local dev.
GET /admin/memories/export now applies redactSecrets() to each content field
before including it in the JSON response. Pre-SAFE-T1201 memories (stored
before redactSecrets was mandatory on writes) no longer leak credentials.
POST /admin/memories/import now calls redactSecrets() on content before both
the deduplication check and the INSERT. Imported memories with embedded
credentials cannot bypass SAFE-T1201 (#838).
- admin_memories.go: GET /admin/memories/export + POST /admin/memories/import
handler (from PR #1051, with security fixes applied).
- admin_memories_test.go: 6 tests covering redactSecrets parity on both endpoints.
- registry_test.go: added DNS-lookup test cases for validateAgentURL (F1083).
"localhost" allowed by name (preserves existing test); nxdomain blocked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove duplicate-line ExecContext call that caused syntax error at mcp.go:784
- Update redactSecrets signature from 1-arg to 2-arg (workspaceID, content)
to match the canonical form established in PR #1017
- Update toolCommitMemory call site to use 2-arg form
- Add reserved workspaceID param note in docstring for future audit logging
Fixes PR #1036 compile-blocking issues (Platform Go job).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CP-QA approved. golangci-lint fixes in bundle/exporter.go + bundle/importer.go, redactSecrets in admin_memories.go, plus 489-line admin_memories_test.go.
Workspaces stuck in provisioning used to sit in "starting" for 10min
until the sweeper flipped them. The real signal — a runtime crash at
EC2 boot — lands on the serial console within seconds but nothing
listened. These endpoints close the loop.
1. POST /admin/workspaces/:id/bootstrap-failed
The control plane's bootstrap watcher posts here when it spots
"RUNTIME CRASHED" in ec2:GetConsoleOutput. Handler:
- UPDATEs workspaces SET status='failed' only when status was
'provisioning' (idempotent — a raced online/failed stays put)
- Stores the error + log_tail in last_sample_error so the canvas
can render the real stack trace, not a generic "timeout" string
- Broadcasts WORKSPACE_PROVISION_FAILED with source='bootstrap_watcher'
2. GET /workspaces/:id/console
Proxies to CP's new /cp/admin/workspaces/:id/console endpoint so
the tenant platform can surface EC2 serial console output without
holding AWS credentials. CPProvisioner.GetConsoleOutput is the
client; returns 501 in non-CP deployments (docker-compose dev).
Both gated by AdminAuth — CP holds the tenant ADMIN_TOKEN that the
middleware accepts on its tier 2b branch.
Tests cover: happy-path fail, already-transitioned no-op, empty id,
log_tail truncation, and the 501 fallback when no CP is wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes audit #125 findings for CWE-639:
1. admin_test_token.go — CRITICAL IDOR (finding #112)
When ADMIN_TOKEN is set in production, require it explicitly on
GET /admin/workspaces/:id/test-token. The original gap: AdminAuth
accepted any valid org-scoped token, letting an Org A token holder
mint workspace bearer tokens for ANY workspace UUID they could enumerate.
Now requires ADMIN_TOKEN when it's configured; MOLECULE_ENV!=production
path still requires a valid bearer (any org token works for local dev).
2. org_plugin_allowlist.go — HIGH IDOR (finding #112)
GET and PUT /orgs/:id/plugins/allowlist: add requireOrgOwnership()
check after org existence verification. Org-token holders can only
read/write their own org's allowlist. Session and ADMIN_TOKEN callers
bypass the check (they have platform-wide access via the session
cookie path, not org tokens).
Closes: #112 (CWE-639 IDOR — tenant config access)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds isSafeURL() + isPrivateOrMetadataIP() in mcp.go and wires the
check into:
- MCP delegate_task (sync path) — line 530
- MCP delegate_task_async (fire-and-forget) — line 602
- a2a_proxy resolveAgentURL() — line 391
Blocklist covers: RFC-1918 private (10/8, 172.16/12, 192.168/16),
cloud metadata link-local (169.254/16), carrier-grade NAT (100.64/10),
documentation ranges (192.0.2/24, 198.51.100/24, 203.0.113/24),
loopback, unspecified, and link-local multicast.
For hostnames, DNS is resolved and every returned IP is validated —
blocks internal hostnames that resolve to private ranges.
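A rough sketch of such a blocklist check using only the standard `net` package. `ipBlocked` and `hostBlocked` are hypothetical names (not the functions in mcp.go), and failing closed on lookup errors is an assumption of this example.

```go
package main

import "net"

// blockedCIDRs lists the ranges named above.
var blockedCIDRs = mustParseCIDRs(
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // RFC 1918 private
	"169.254.0.0/16", // link-local, including cloud metadata
	"100.64.0.0/10",  // carrier-grade NAT
	"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", // documentation
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// ipBlocked reports whether ip is loopback, unspecified, link-local multicast,
// or inside one of the blocked ranges.
func ipBlocked(ip net.IP) bool {
	if ip.IsLoopback() || ip.IsUnspecified() || ip.IsLinkLocalMulticast() {
		return true
	}
	for _, n := range blockedCIDRs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// hostBlocked checks a literal IP directly; for a hostname it resolves DNS and
// blocks if the lookup fails or ANY returned address is blocked.
func hostBlocked(host string) bool {
	if ip := net.ParseIP(host); ip != nil {
		return ipBlocked(ip)
	}
	ips, err := net.LookupIP(host)
	if err != nil || len(ips) == 0 {
		return true
	}
	for _, ip := range ips {
		if ipBlocked(ip) {
			return true
		}
	}
	return false
}
```

Checking every resolved address, not just the first, is what stops a hostname with one public and one private record from slipping through.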
Closes: #1130 (F1083 — SSRF in A2A proxy and MCP bridge)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three nits identified during post-merge review of #1119, #1133:
1. ContextMenu.tsx imported `removeNode` from the canvas store but
stopped using it when the delete-confirm flow moved to Canvas in
#1133. Also removed the now-unused mock entry in the keyboard
test so the test inventory matches the real call list.
2. Preflight's YAML parse failure was a silent pass — defensible since
the in-container preflight owns the schema, but invisible to ops if
a template ships malformed YAML. Log at WARN so the signal surfaces
without blocking the provision.
3. formatMissingEnvError rendered its slice via %q, producing
`["A" "B"]` which is Go-literal-looking and ugly in a user-facing
error. Join with ", " instead. Test updated to assert the new
format.
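The difference between the two renderings in item 3 is easy to reproduce. The message prefix and function names below are invented for the example; only the `%q` versus `strings.Join` contrast comes from the change itself.

```go
package main

import (
	"fmt"
	"strings"
)

// formatWithQ reproduces the old rendering of the slice via %q.
func formatWithQ(missing []string) string {
	return fmt.Sprintf("missing required env vars: %q", missing)
}

// formatJoined renders the same slice as a comma-separated list.
func formatJoined(missing []string) string {
	return "missing required env vars: " + strings.Join(missing, ", ")
}
```

For `[]string{"A", "B"}` the first form yields `missing required env vars: ["A" "B"]` and the second yields `missing required env vars: A, B`.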
No behavioural changes beyond the log line; fixes are review nits, not
bug fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspaces stuck in status='provisioning' previously surfaced in three
bad ways:
1. **Details tab crashed** with `Cannot read properties of undefined
(reading 'toLocaleString')`. `BudgetSection` + `WorkspaceUsage`
assumed full response shapes but a provisioning-stuck workspace
returns partial `{}`. Guard each deep field with `?? 0` and cover
the partial-response case with regression tests.
2. **Missing required env vars failed silently** 15+ minutes later as
a cosmetic "Provisioning Timeout" banner. The in-container preflight
catches them but by then the container has already crashed without
calling /registry/register, so the workspace sat in 'provisioning'
forever. Mirror the preflight server-side: parse config.yaml's
`runtime_config.required_env` before launch, fail fast with a
WORKSPACE_PROVISION_FAILED event naming the missing vars.
3. **No backend timeout** ever flipped a stuck workspace to 'failed'.
Add a registry sweeper (10m default, env-overridable) that detects
workspaces stuck past the window, flips them to 'failed', and emits
WORKSPACE_PROVISION_TIMEOUT. Race-safe: the UPDATE re-checks the
status + age predicate so a concurrent register/restart wins.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the Critical + Important findings from today's code
review of the org API keys feature (PRs #1105-1108).
## Critical-1: rate-limit mint endpoint
Previously POST /org/tokens had no mint-rate limit. A compromised
WorkOS session or leaked bearer could mint thousands of tokens in
seconds, forcing a painful manual cleanup of each one.
Fix: dedicated per-IP token bucket, 10 mints/hour/IP. Legitimate
bursts fit under the ceiling; abuse bounces. List + Delete stay
on the global limiter — they can't be used to generate new
secret material.
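A per-IP token bucket with these numbers could be sketched as below. This is not the limiter in the codebase: `mintLimiter` and `allowAt` are hypothetical, time is passed in as Unix seconds to keep the example deterministic, and the map of buckets is never evicted, which a production version would need to handle.

```go
package main

import "sync"

// mintLimiter is a per-IP token bucket: a burst of `capacity` tokens,
// refilled evenly over `windowSec` seconds.
type mintLimiter struct {
	mu        sync.Mutex
	capacity  float64
	windowSec float64
	buckets   map[string]*bucket
}

type bucket struct {
	tokens float64
	last   int64 // unix seconds of the last refill
}

func newMintLimiter(capacity int, windowSec int64) *mintLimiter {
	return &mintLimiter{
		capacity:  float64(capacity),
		windowSec: float64(windowSec),
		buckets:   map[string]*bucket{},
	}
}

// allowAt reports whether ip may mint at nowUnix, consuming one token if so.
func (l *mintLimiter) allowAt(ip string, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.capacity, last: nowUnix}
		l.buckets[ip] = b
	}
	if elapsed := nowUnix - b.last; elapsed > 0 {
		// Refill proportionally to elapsed time, capped at capacity.
		b.tokens += float64(elapsed) * l.capacity / l.windowSec
		if b.tokens > l.capacity {
			b.tokens = l.capacity
		}
		b.last = nowUnix
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

With `newMintLimiter(10, 3600)`, an IP can mint ten tokens back to back, is refused on the eleventh, and earns one more mint every six minutes.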
## Important-1: HTTP handler integration tests
internal/orgtoken had 9 unit tests; the HTTP layer (org_tokens.go)
had none. Adds org_tokens_test.go covering:
- List happy path + DB error → 500
- Create actor="admin-token" (bootstrap), actor="org-token:<prefix>"
(chained mint), actor="session" (canvas browser path)
- Create name>100 chars → 400
- Create with empty body mints with no name
- Revoke happy path 200, missing id 404, empty id 400
- Plaintext returned in response body and prefix matches first 8 chars
- Warning text present
A regression that breaks the tier-ordering, drops the createdBy
field, or accepts oversized names now fails in CI, not in prod.
## Important-2: bound List output
List() had no LIMIT, so a mint-storm bug or abuse could make the
admin UI slow to render, with allocations growing in proportion
to the row count. Adds LIMIT 500 at the SQL layer: about 10x the
realistic ceiling, and a guardrail against pathological cases.
## Important-3: audit provenance uses plaintext prefix, not UUID
orgTokenActor() was logging "org-token:<first-8-of-uuid>" which
couldn't be cross-referenced with the UI (which shows first-8
of the plaintext). Users could not correlate "who minted this"
audit entries with the revoke button they're looking at.
Fix: Validate() now returns (id, prefix, error). Middleware
stashes both on the gin context. Handler reads prefix for the
actor string. Audit rows now match UI prefixes exactly.
## Nit: named constants for audit labels
actorOrgTokenPrefix / actorSession / actorAdminToken replace
the hardcoded strings scattered across the handler. Greppable
across log pipelines + audit queries; one place to change if
the format evolves.
## Tests
- internal/orgtoken: 9 existing + 0 new, all still green (updated
signatures for Validate returning prefix).
- internal/handlers/org_tokens_test.go: new — 9 HTTP-layer tests
above. Full gin.Context + sqlmock harness.
- Full `go test ./...` green except one pre-existing
TestGitHubToken_NoTokenProvider flake unrelated to this change
(expects 404, gets 500 — tracked separately).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds user-facing API keys with full-org admin scope. Replaces the
single ADMIN_TOKEN env var with named, revocable, audited tokens
that users can mint/rotate from the canvas UI without ops
intervention.
Designed for the beta growth phase — one token tier (full admin).
Future work will split into scoped roles (admin / workspace-write
/ read-only) and per-workspace bindings. See docs/architecture/
org-api-keys.md for the design + follow-up roadmap.
## Surface
POST /org/tokens mint (plaintext returned once)
GET /org/tokens list live keys (prefix-only)
DELETE /org/tokens/:id revoke (idempotent)
All AdminAuth-gated. Bootstrap path: mint the first token via
ADMIN_TOKEN or canvas session; tokens can mint more tokens after.
## Validation as a new AdminAuth tier (2a)
AdminAuth evaluation order:
Tier 0 lazy-bootstrap fail-open (only when no live tokens AND
no ADMIN_TOKEN env)
Tier 1 verified WorkOS session via /cp/auth/tenant-member
Tier 2a org_api_tokens SELECT — NEW
Tier 2b ADMIN_TOKEN env (bootstrap / CLI break-glass)
Tier 3 any live workspace token (deprecated, only when ADMIN_TOKEN
unset)
Tier 2a runs ONE indexed lookup (partial index on
token_hash WHERE revoked_at IS NULL) + an async last_used_at
bump. No measurable latency cost on the hot path.
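Because the lookup is keyed on token_hash, the plaintext has to be hashed before it is compared. A minimal sketch of mint plus validate, assuming a hex-encoded sha256, a 32-byte random token, and an in-memory map in place of the org_api_tokens table; all function names here are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"errors"
)

// errInvalidToken is returned for every failure shape, so a caller cannot
// tell "never existed" from "revoked".
var errInvalidToken = errors.New("invalid token")

// hashToken returns the hex sha256 of the plaintext; only this value is stored.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintToken returns the plaintext (shown once), its hash, and an 8-char display prefix.
func mintToken() (plaintext, hash, prefix string, err error) {
	raw := make([]byte, 32) // token length is an assumption of this sketch
	if _, err = rand.Read(raw); err != nil {
		return "", "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(raw)
	return plaintext, hashToken(plaintext), plaintext[:8], nil
}

// validate looks the token up by hash. live maps hash -> true for tokens that
// are not revoked; unknown and revoked hashes both read as false.
func validate(live map[string]bool, plaintext string) error {
	if plaintext == "" || !live[hashToken(plaintext)] {
		return errInvalidToken
	}
	return nil
}
```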
## UI
New "Org API Keys" tab in the settings panel. Label field for
human-readable naming. Plaintext shown once + clipboard copy.
Revoke with confirm dialog. Mirrors the existing workspace-
TokensTab flow so users who've used one get the other for free.
## Security properties
- Plaintext never stored. sha256 hash + 8-char display prefix.
- Revocation is immediate: partial index on revoked_at IS NULL
means the next request validates or fails in microseconds.
- created_by audit field captures provenance: "org-token:<short>"
when a token mints another, "session" for browser-UI mints,
"admin-token" for the ADMIN_TOKEN bootstrap path.
- Validate() collapses all failure shapes into ErrInvalidToken
so response-shape can't distinguish "never existed" from
"revoked".
## Tests
- internal/orgtoken: 9 unit tests (hash storage, empty field
null-ing, validation happy path, empty plaintext, unknown hash,
revoked filtering, list ordering, revoke idempotency, has-any-
live short-circuit).
- AdminAuth tier-2a integration covered by existing middleware
tests unchanged (fail-open + bearer paths).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The github-app-auth plugin's go.mod had a relative replace directive
(../molecule-monorepo/platform) that didn't resolve in Docker where
the plugin is at /plugin/ and the platform at /app/. This caused the
plugin's provisionhook.TokenProvider interface to come from a different
package path than the platform's, so the type assertion in
FirstTokenProvider() failed — "no token provider registered".
Fix: sed the plugin's go.mod replace to point at /app during Docker build.
Also added debug logging to GetInstallationToken for future diagnosis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Soft-delete (status='removed') leaves orphan DB rows and FK data forever.
When ?purge=true is passed, after container cleanup the handler cascade-
deletes all leaf FK tables and hard-removes the workspace row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The org import fired all workspace provisioning goroutines concurrently,
overwhelming Docker when creating 39+ containers. Containers timed out,
leaving workspaces stuck in 'provisioning' with no schedules or hooks.
Fix:
- Add provisionConcurrency=3 semaphore limiting concurrent Docker ops
- Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings
- Pass semaphore through createWorkspaceTree recursion
With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead
of timing out. Each workspace gets its full template: schedules, hooks,
settings, hierarchy.
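The semaphore part of this fix follows a common Go idiom: a buffered channel used as a counting semaphore. The sketch below is illustrative only; `runBounded` and `demoImport` are hypothetical, the jobs are no-ops, and the 2000ms sibling pacing is left out.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// runBounded starts every job in its own goroutine but lets at most `limit`
// execute at once; it returns the peak number observed in flight.
func runBounded(jobs []func(), limit int) int32 {
	sem := make(chan struct{}, limit) // buffered channel as counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int32
	for _, job := range jobs {
		wg.Add(1)
		go func(job func()) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt32(&inFlight, 1)
			for { // record a new peak if this is the highest count seen
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			job()
			atomic.AddInt32(&inFlight, -1)
		}(job)
	}
	wg.Wait()
	return peak
}

// demoImport runs n no-op "provision" jobs under the limit and reports how
// many ran and the peak concurrency.
func demoImport(n, limit int) (ran, peak int32) {
	jobs := make([]func(), n)
	for i := range jobs {
		jobs[i] = func() { atomic.AddInt32(&ran, 1) }
	}
	peak = runBounded(jobs, limit)
	return ran, peak
}
```

Calling `demoImport(39, 3)` runs all 39 jobs while never exceeding three in flight.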
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Delete handler acquired token revocation and schedule disable
queries but this test was never updated, causing sqlmock strict mode
to reject the unexpected ExecQuery calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add MemorySeed model and initial_memories support at three levels:
- POST /workspaces payload: seed memories on workspace creation
- org.yaml workspace config: per-workspace initial_memories with
defaults fallback
- org.yaml global_memories: org-wide GLOBAL scope memories seeded
on the first root workspace during import
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GET /admin/memories/export returns all agent memories with workspace
name mapping. POST /admin/memories/import accepts the same format,
resolves workspaces by name, and deduplicates on content+scope.
Both endpoints are AdminAuth-gated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The provisioner was unconditionally writing CLAUDE_CODE_OAUTH_TOKEN into
config.yaml's required_env for all claude-code workspaces. When the
baked token expired, preflight rejected every workspace — even those
with a valid token injected via the secrets API at runtime.
Changes:
- workspace_provision.go: remove hardcoded required_env for claude-code
and codex runtimes; tokens are injected at container start via secrets
- workspace_provision_test.go: flip assertion to reject hardcoded token
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a workspace is deleted (status set to 'removed'), its schedules
remained enabled, causing the scheduler to keep firing cron jobs for
non-existent containers. Add a cascade disable query alongside the
existing token revocation and canvas layout cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes to boost agent throughput:
1. Event-driven cron triggers (webhooks.go): GitHub issues/opened events
fire all "pick-up-work" schedules immediately. PR review/submitted
events fire "PR review" and "security review" schedules. Uses
next_run_at=now() so the scheduler picks them up on next tick.
2. Auto-push hook (executor_helpers.py): After every task completion,
agents automatically push unpushed commits and open a PR targeting
staging. Guards: only on non-protected branches with unpushed work.
Uses /usr/local/bin/git and /usr/local/bin/gh wrappers with baked-in
GH_TOKEN. Never crashes the agent — all errors logged and continued.
3. Integration (claude_sdk_executor.py): auto_push_hook() called in the
_execute_locked finally block after commit_memory.
Closes productivity gap where agents wrote code but never pushed,
and where work crons only fired on timers instead of reacting to events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #881 closed SAFE-T1201 (#838) on the HTTP path by wiring redactSecrets()
into MemoriesHandler.Commit — but the sibling code path on the MCP bridge
(MCPHandler.toolCommitMemory) was left with only the TODO comment. Agents
calling commit_memory via the MCP tool bridge are the PRIMARY attack vector
for #838 (confused / prompt-injected agent pipes raw tool-response text
containing plain-text credentials into agent_memories, leaking into shared
TEAM scope). The HTTP path is only exercised by canvas UI posts, so the MCP
gap was the hotter one.
Change:
workspace-server/internal/handlers/mcp.go:725
- TODO(#838): run _redactSecrets(content) before insert — plain-text
- API keys from tool responses must not land in the memories table.
+ SAFE-T1201 (#838): scrub known credential patterns before persistence…
+ content, _ = redactSecrets(workspaceID, content)
Reuses redactSecrets (same package) so there's no duplicated pattern list —
a future-added pattern in memories.go automatically covers the MCP path too.
Tests added in mcp_test.go:
- TestMCPHandler_CommitMemory_SecretInContent_IsRedactedBeforeInsert
Exercises three patterns (env-var assignment, Bearer token, sk-…)
and uses sqlmock's WithArgs to bind the exact REDACTED form — so a
regression (removing the redactSecrets call) fails with arg-mismatch
rather than silently persisting the secret.
- TestMCPHandler_CommitMemory_CleanContent_PassesThrough
Regression guard — benign content must NOT be altered by the redactor.
NOTE: unable to run `go test -race ./...` locally (this container has no Go
toolchain). The change is mechanical reuse of an already-shipped function in
the same package; CI must validate. The sqlmock patterns mirror the existing
TestMCPHandler_CommitMemory_LocalScope_Success test exactly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two findings from the pre-launch log-scrub audit:
1. handlers/workspace_provision.go:548 logged `token[:8]` — the exact
H1 pattern that panicked on short keys. Even with a length guard,
leaking 8 chars of an auth token into centralized logs shortens the
search space for anyone who gets log-read access. Now logs only
`len(token)` as a liveness signal.
2. provisioner/cp_provisioner.go:101 fell back to logging the raw
control-plane response body when the structured {"error":"..."}
field was absent. If the CP ever echoed request headers (Authorization)
or a portion of user-data back in an error path, the bearer token
would end up in our tenant-instance logs. Now logs the byte count
only; the structured error remains in place for the happy path.
Also caps the read at 64 KiB via io.LimitReader to prevent
log-flood DoS from a compromised upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two HIGH-severity DoS surfaces: both handlers read the entire HTTP
body with io.ReadAll(r.Body) and no upper bound, so a caller streaming
a multi-gigabyte request could exhaust memory on the tenant instance
before we even validated the JSON.
H3 (Discord webhook): wrap Body in io.LimitReader with a 1 MiB cap.
Discord Interactions payloads are well under 10 KiB in practice.
H4 (workspace config PATCH): wrap Body in http.MaxBytesReader with a
256 KiB cap. Real configs are <10 KiB; jsonb handles the cap
comfortably. Returns 413 Request Entity Too Large on overflow.
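The H4 pattern can be sketched with the standard library alone. `patchConfig`, `statusFor`, and the request path are hypothetical, and `*http.MaxBytesError` requires Go 1.19 or later; older toolchains would need to treat any read error as an overflow.

```go
package main

import (
	"bytes"
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
)

const maxConfigBytes = 256 << 10 // 256 KiB, as in H4

// patchConfig reads at most maxConfigBytes of body and answers 413 on overflow.
func patchConfig(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, maxConfigBytes)
	if _, err := io.ReadAll(r.Body); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor sends a body of n bytes through the handler and returns the status code.
func statusFor(n int) int {
	body := bytes.NewReader(make([]byte, n))
	req := httptest.NewRequest(http.MethodPatch, "/workspaces/example/config", body)
	rec := httptest.NewRecorder()
	patchConfig(rec, req)
	return rec.Code
}
```

A body of exactly 256 KiB is accepted; one byte more produces 413 before the rest of the stream is buffered.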
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>