Commit Graph

57 Commits

Author SHA1 Message Date
rabbitblood
1e30386aec fix(auth): accept admin token in WorkspaceAuth for canvas dashboard
The canvas sends NEXT_PUBLIC_ADMIN_TOKEN on all API calls but per-workspace
routes (/activity, /delegations, /traces) use WorkspaceAuth which only
accepts per-workspace bearer tokens. This made the canvas dashboard 401
on every workspace detail view.

Fix: WorkspaceAuth now accepts the admin token as a fallback after
workspace token validation fails. This lets the canvas read all workspace
data with a single admin credential.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:42:43 -07:00
rabbitblood
dd224b2ae4 fix: add ?purge=true hard-delete to DELETE /workspaces/:id (#1087)
Soft-delete (status='removed') leaves orphan DB rows and FK data forever.
When ?purge=true is passed, after container cleanup the handler cascade-
deletes all leaf FK tables and hard-removes the workspace row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:08:44 -07:00
molecule-ai[bot]
247c0d8dcf Merge pull request #1085 from Molecule-AI/fix/org-import-concurrency-1084
fix(org-import): limit concurrent Docker provisioning to 3 (#1084)
2026-04-20 10:38:26 -07:00
rabbitblood
762b38fa30 fix(org-import): limit concurrent Docker provisioning to 3 (#1084)
The org import fired all workspace provisioning goroutines concurrently,
overwhelming Docker when creating 39+ containers. Containers timed out,
leaving workspaces stuck in 'provisioning' with no schedules or hooks.

Fix:
- Add provisionConcurrency=3 semaphore limiting concurrent Docker ops
- Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings
- Pass semaphore through createWorkspaceTree recursion

With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead
of timing out. Each workspace gets its full template: schedules, hooks,
settings, hierarchy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:08:17 -07:00
Hongming Wang
0a06cb4fc9 fix(cp_provisioner): cap IsRunning body read at 64 KiB
IsRunning used an unbounded json.NewDecoder(resp.Body).Decode on
CP status responses. Start already caps its body read at 64 KiB
(cp_provisioner.go:137) to defend against a misconfigured or
compromised CP streaming a huge body and exhausting memory.

IsRunning is called reactively per-request from a2a_proxy and
periodically from healthsweep, so it's a hotter path than Start
and arguably deserves the same defense more.

Adds TestIsRunning_BoundedBodyRead that serves a body padded past
the cap and asserts the decode still succeeds on the JSON prefix.

Follow-up to code-review Nit-2 on #1073.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:06:20 -07:00
Hongming Wang
cfa901b89a fix(cp_provisioner): IsRunning returns (true, err) on transient failures
My #1071 made IsRunning return (false, err) on all error paths, but that
breaks a2a_proxy which depends on Docker provisioner's (true, err) contract.
Without this fix, any brief CP outage causes a2a_proxy to mark workspaces
offline and trigger restart cascades across every tenant.

Contract now matches Docker.IsRunning:
  transport error    → (true, err)  — alive, degraded signal
  non-2xx response   → (true, err)  — alive, degraded signal
  JSON decode error  → (true, err)  — alive, degraded signal
  2xx state!=running → (false, nil)
  2xx state==running → (true, nil)

healthsweep.go is also happy with this — it skips on err regardless.

Adds TestIsRunning_ContractCompat_A2AProxy as regression guard that
asserts each error path explicitly against the a2a_proxy expectations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:58:18 -07:00
Hongming Wang
724456b7be Merge pull request #1071 from Molecule-AI/fix/isrunning-surface-http-errors
fix(workspace-server): IsRunning surfaces non-2xx + JSON errors
2026-04-20 08:50:03 -07:00
Hongming Wang
e502003c74 fix(workspace-server): IsRunning surfaces non-2xx + JSON errors
Pre-existing silent-failure path: IsRunning decoded CP responses
regardless of HTTP status, so a CP 500 → empty body → State="" →
returned (false, nil). The sweeper couldn't distinguish "workspace
stopped" from "CP broken" and would leave a dead row in place.

## Fix

  - Non-2xx → wrapped error, does NOT echo body (CP 5xx bodies may
    contain echoed headers; leaking into logs would expose bearer)
  - JSON decode error → wrapped error
  - Transport error → now wrapped with "cp provisioner: status:"
    prefix for easier log grepping

## Tests

+7 cases (5-status table + malformed JSON + existing transport).
IsRunning coverage 100%; overall cp_provisioner at 98%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:47:55 -07:00
molecule-ai[bot]
fabb108cb4 Merge pull request #1017 from Molecule-AI/fix/rows-err-missing
fix(bundle/exporter): add rows.Err() check + MCP secret scrub
2026-04-20 08:47:49 -07:00
molecule-ai[bot]
eeb5552fba Merge pull request #1022 from Molecule-AI/fix/unchecked-exec-workspace-provision
fix(mcp): scrub secrets in commit_memory + MCP handler tests
2026-04-20 08:47:25 -07:00
Hongming Wang
2730a20194 Merge pull request #1067 from Molecule-AI/fix/tenant-workspace-auth
fix(workspace-server): send X-Molecule-Admin-Token on CP calls
2026-04-20 08:39:49 -07:00
Hongming Wang
6c4d1ae4db test(workspace-server): cover Stop/IsRunning/Close + auth-header + transport errors
Closes review gap: pre-PR coverage on CPProvisioner was 37%.
After this commit every exported method is exercised:

  - NewCPProvisioner            100%
  - authHeaders                  100%
  - Start                         91.7% (remainder: json.Marshal error
                                   path, unreachable with fixed-type
                                   request struct)
  - Stop                         100% (new — header + path + error)
  - IsRunning                    100% (new — 4-state matrix + auth)
  - Close                        100% (new — contract no-op)

New cases assert both auth headers (shared secret + admin_token) land
on every outbound request, transport failures surface clear errors
on Start/Stop, and IsRunning doesn't misreport on transport failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:37:39 -07:00
rabbitblood
b1bb5f838a fix: GitHub token refresh — add WorkspaceAuth path for credential helper (#1068)
PR #729 tightened AdminAuth to require ADMIN_TOKEN, breaking the
workspace credential helper which called /admin/github-installation-token
with a workspace bearer token. Tokens expired after 60 min with no refresh.

Fix: Add /workspaces/:id/github-installation-token under WorkspaceAuth
so any authenticated workspace can refresh its GitHub token. Keep the
admin path as backward-compatible alias.

Update molecule-git-token-helper.sh to use the workspace-scoped path
when WORKSPACE_ID is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 08:30:02 -07:00
Hongming Wang
d3386ad620 fix(workspace-server): send X-Molecule-Admin-Token on CP calls
controlplane #118 + #130 made /cp/workspaces/* require a per-tenant
admin_token header in addition to the platform-wide shared secret.
Without it, every workspace provision / deprovision / status call
now 401s.

ADMIN_TOKEN is already injected into the tenant container by the
controlplane's Secrets Manager bootstrap, so this is purely a
header-plumbing change — no new config required on the tenant side.

## Change

- CPProvisioner carries adminToken alongside sharedSecret
- New authHeaders method sets BOTH auth headers on every outbound
  request (old authHeader deleted — single call site was misleading
  once the semantics changed)
- Empty values on either header are no-ops so self-hosted / dev
  deployments without a real CP still work

## Tests

Renamed + expanded cp_provisioner_test cases:
- TestAuthHeaders_NoopWhenBothEmpty — self-hosted path
- TestAuthHeaders_SetsBothWhenBothProvided — prod happy path
- TestAuthHeaders_OnlyAdminTokenWhenSecretEmpty — transition window

Full workspace-server suite green.

## Rollout

Next tenant provision will ship an image with this commit merged.
Existing tenants (none in prod right now — hongming was the only
one and was purged earlier today) will auto-update via the 5-min
image-pull cron.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:17:50 -07:00
rabbitblood
ff7ac87b97 feat: seed initial memories from org template and create payload (#1050)
Add MemorySeed model and initial_memories support at three levels:
- POST /workspaces payload: seed memories on workspace creation
- org.yaml workspace config: per-workspace initial_memories with
  defaults fallback
- org.yaml global_memories: org-wide GLOBAL scope memories seeded
  on the first root workspace during import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 00:35:49 -07:00
Hongming Wang
e345aa832a Merge pull request #1033 from Molecule-AI/bugfixes/platform-handler-fixes
fix: platform handler bug fixes (a2a proxy, secrets, terminal, webhooks)
2026-04-19 22:24:39 -07:00
Hongming Wang
05e2132d92 Merge pull request #1031 from Molecule-AI/fix/remove-baked-oauth-token-1028
fix: remove hardcoded CLAUDE_CODE_OAUTH_TOKEN from provisioner (#1028)
2026-04-19 22:24:36 -07:00
Hongming Wang
f124e2f404 Merge pull request #1030 from Molecule-AI/fix/1027-disable-schedules-on-workspace-delete
fix: disable schedules on workspace delete (#1027)
2026-04-19 22:24:33 -07:00
Molecule AI Platform Engineer
32f23d26b0 fix: multiple platform handler bug fixes
- secrets.go: Log RowsAffected errors instead of silently discarding them
- a2a_proxy.go: Add 60s safety timeout to a2aClient HTTP client
- terminal.go: Fix defer ordering - always close WebSocket conn on error,
  only defer resp.Close() after successful exec attach
- webhooks.go: Add shortSHA() helper to safely handle empty HeadSHA

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 05:01:01 +00:00
rabbitblood
30fc869c13 test: add cascade schedule disable tests for #1027
- TestWorkspaceDelete_DisablesSchedules — leaf workspace delete disables its schedules
- TestWorkspaceDelete_CascadeDisablesDescendantSchedules — parent+child+grandchild cascade
- TestWorkspaceDelete_ScheduleDisableOnlyTargetsDeletedWorkspace — negative test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 22:00:50 -07:00
rabbitblood
639b4dbb9f fix: stop hardcoding CLAUDE_CODE_OAUTH_TOKEN in required_env (#1028)
The provisioner was unconditionally writing CLAUDE_CODE_OAUTH_TOKEN into
config.yaml's required_env for all claude-code workspaces.  When the
baked token expired, preflight rejected every workspace — even those
with a valid token injected via the secrets API at runtime.

Changes:
- workspace_provision.go: remove hardcoded required_env for claude-code
  and codex runtimes; tokens are injected at container start via secrets
- workspace_provision_test.go: flip assertion to reject hardcoded token

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 21:56:21 -07:00
rabbitblood
a139687071 fix: disable schedules when workspace is deleted (#1027)
When a workspace is deleted (status set to 'removed'), its schedules
remained enabled, causing the scheduler to keep firing cron jobs for
non-existent containers. Add a cascade disable query alongside the
existing token revocation and canvas layout cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 21:53:30 -07:00
Hongming Wang
19cae2986c Merge pull request #1023 from Molecule-AI/feat/productivity-boost-event-crons-autopush
feat: event-driven crons + auto-push hook for agent productivity
2026-04-19 20:34:06 -07:00
rabbitblood
46c20731e6 feat: event-driven cron triggers + auto-push hook for agent productivity
Three changes to boost agent throughput:

1. Event-driven cron triggers (webhooks.go): GitHub issues/opened events
   fire all "pick-up-work" schedules immediately. PR review/submitted
   events fire "PR review" and "security review" schedules. Uses
   next_run_at=now() so the scheduler picks them up on next tick.

2. Auto-push hook (executor_helpers.py): After every task completion,
   agents automatically push unpushed commits and open a PR targeting
   staging. Guards: only on non-protected branches with unpushed work.
   Uses /usr/local/bin/git and /usr/local/bin/gh wrappers with baked-in
   GH_TOKEN. Never crashes the agent — all errors logged and continued.

3. Integration (claude_sdk_executor.py): auto_push_hook() called in the
   _execute_locked finally block after commit_memory.

Closes productivity gap where agents wrote code but never pushed,
and where work crons only fired on timers instead of reacting to events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:26:35 -07:00
Hongming Wang
85f594d108 Merge pull request #1007 from Molecule-AI/fix/scheduler-defer-busy-969
fix(scheduler): defer cron fires when workspace busy instead of skipping (#969)
2026-04-19 20:21:16 -07:00
Molecule AI Backend Engineer
efa40774bb fix(bundle/exporter): add rows.Err() after child workspace enumeration
Silent data loss on mid-cursor DB errors — partial sub-workspace
bundles returned instead of surfacing the iteration error. Adds
rows.Err() check after the SELECT id FROM workspaces query in
Export(), mirroring the pattern already used in scheduler.go
and handlers with similar recursion patterns.

Closes: R1 MISSING-ROWS-ERR findings (bundle/exporter.go)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 21:46:36 +00:00
Molecule AI Backend Engineer
818d5cde91 fix(mcp): scrub secrets in commit_memory MCP tool path (#838 sibling)
PR #881 closed SAFE-T1201 (#838) on the HTTP path by wiring redactSecrets()
into MemoriesHandler.Commit — but the sibling code path on the MCP bridge
(MCPHandler.toolCommitMemory) was left with only the TODO comment. Agents
calling commit_memory via the MCP tool bridge are the PRIMARY attack vector
for #838 (confused / prompt-injected agent pipes raw tool-response text
containing plain-text credentials into agent_memories, leaking into shared
TEAM scope). The HTTP path is only exercised by canvas UI posts, so the MCP
gap was the hotter one.

Change:

  workspace-server/internal/handlers/mcp.go:725
    - TODO(#838): run _redactSecrets(content) before insert — plain-text
    - API keys from tool responses must not land in the memories table.
    + SAFE-T1201 (#838): scrub known credential patterns before persistence…
    + content, _ = redactSecrets(workspaceID, content)

Reuses redactSecrets (same package) so there's no duplicated pattern list —
a future-added pattern in memories.go automatically covers the MCP path too.

Tests added in mcp_test.go:

  - TestMCPHandler_CommitMemory_SecretInContent_IsRedactedBeforeInsert
      Exercises three patterns (env-var assignment, Bearer token, sk-…)
      and uses sqlmock's WithArgs to bind the exact REDACTED form — so a
      regression (removing the redactSecrets call) fails with arg-mismatch
      rather than silently persisting the secret.

  - TestMCPHandler_CommitMemory_CleanContent_PassesThrough
      Regression guard — benign content must NOT be altered by the redactor.

NOTE: unable to run `go test -race ./...` locally (this container has no Go
toolchain). The change is mechanical reuse of an already-shipped function in
the same package; CI must validate. The sqlmock patterns mirror the existing
TestMCPHandler_CommitMemory_LocalScope_Success test exactly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 17:52:52 +00:00
rabbitblood
a3d30c1ece fix(scheduler): defer cron fires when workspace busy instead of skipping (#969)
Previously, the scheduler skipped cron fires entirely when a workspace
had active_tasks > 0 (#115). This caused permanent cron misses for
workspaces kept perpetually busy by the 5-min Orchestrator pulse — work
crons (pick-up-work, PR review) were skipped every fire because the
agent was always processing a delegation.

Measured impact on Dev Lead: 17 context-deadline-exceeded timeouts in
2 hours, ~30% of inter-agent messages silently dropped.

Fix: when workspace is busy, poll every 10s for up to 2 minutes waiting
for idle. If idle within the window, fire normally. If still busy after
2 min, fall back to the original skip behavior.

This is a minimal, safe change:
- No new goroutines or channels
- Same fire path once idle
- Bounded wait (2 min max, won't block the scheduler pool)
- Falls back to skip if workspace never becomes idle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 08:38:14 -07:00
Hongming Wang
393ecc74e3 Merge pull request #991 from Molecule-AI/perf/scheduler-returning-clause
perf(scheduler): collapse empty-run bump to single RETURNING query
2026-04-19 03:48:42 -07:00
Hongming Wang
b749c558f3 perf(scheduler): collapse empty-run bump to single RETURNING query
The phantom-producer detector (#795) was doing UPDATE + SELECT in two
roundtrips — first incrementing consecutive_empty_runs, then re-
reading to check the stale threshold. Switch to UPDATE ... RETURNING
so the post-increment value comes back in one query.

Called once per schedule per cron tick. At 100 tenants × dozens of
schedules per tenant, the halved DB traffic on the empty-response
path is measurable, not just cosmetic.

Also now properly logs if the bump itself fails (previously it silent-
swallowed the ExecContext error and still ran the SELECT, which would
confuse debugging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 03:44:48 -07:00
Hongming Wang
296c52cb25 test(ws-server): cover CPProvisioner — auth, env fallback, error paths
Post-merge audit flagged cp_provisioner.go as the only new file from
the canary/C1 work without test coverage. Fills the gap:

- NewCPProvisioner_RequiresOrgID — self-hosted without MOLECULE_ORG_ID
  refuses to construct (avoids silent phone-home to prod CP).
- NewCPProvisioner_FallsBackToProvisionSharedSecret — the operator
  ergonomics of using one env-var name on both sides of the wire.
- AuthHeader noop + happy path — bearer only set when secret is set.
- Start_HappyPath — end-to-end POST to stubbed CP, bearer forwarded,
  instance_id parsed out of response.
- Start_Non201ReturnsStructuredError — when CP returns structured
  {"error":"…"}, that message surfaces to the caller.
- Start_NoStructuredErrorFallsBackToSize — regression gate for the
  anti-log-leak change from PR #980: raw upstream body must NOT
  appear in the error, only the byte count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 03:41:16 -07:00
Hongming Wang
80caba97ee feat(ws-server): pull env from CP on startup
Paired with molecule-controlplane PR #55 (GET /cp/tenants/config). Lets
existing tenants heal themselves when we rotate or add a CP-side env
var (e.g. MOLECULE_CP_SHARED_SECRET landing earlier today) without any
ssh or re-provision.

Flow: main() calls refreshEnvFromCP() before any other os.Getenv read.
The helper reads MOLECULE_ORG_ID + ADMIN_TOKEN from the baked-in
user-data env, GETs {MOLECULE_CP_URL}/cp/tenants/config with those
credentials, and applies the returned string map via os.Setenv so
downstream code (CPProvisioner, etc.) sees the fresh values.

Best-effort semantics:
- self-hosted / no MOLECULE_ORG_ID → no-op (return nil)
- CP unreachable / non-200 → log + return error (main keeps booting)
- oversized values (>4 KiB each) rejected to avoid env pollution
- body read capped at 64 KiB

Once this image hits GHCR, the 5-minute tenant auto-updater picks it
up, the container restarts, refresh runs, and every tenant has
MOLECULE_CP_SHARED_SECRET within ~5 minutes — no operator toil.

Also fixes workspace-server/.gitignore so `server` no longer matches
the cmd/server package dir — it only ignored the compiled binary but
pattern was too broad. Anchored to `/server`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 02:41:15 -07:00
Hongming Wang
896a34429a Merge pull request #981 from Molecule-AI/fix/security-tenant-cpprovisioner-bearer
fix(security): tenant CPProvisioner sends CP bearer on provision / stop / status
2026-04-19 01:55:20 -07:00
Hongming Wang
a79366a04a fix(security): tenant CPProvisioner attaches CP bearer on all calls
Completes the C1 integration (PR #50 on molecule-controlplane). The CP
now requires Authorization: Bearer <PROVISION_SHARED_SECRET> on all
three /cp/workspaces/* endpoints; without this change the tenant-side
Start/Stop/IsRunning calls would all 401 (or 404 when the CP's routes
refused to mount) and every workspace provision from a SaaS tenant
would silently fail.

Reads MOLECULE_CP_SHARED_SECRET, falling back to PROVISION_SHARED_SECRET
so operators can use one env-var name on both sides of the wire. Empty
value is a no-op: self-hosted deployments with no CP or a CP that
doesn't gate /cp/workspaces/* keep working as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 01:53:12 -07:00
Hongming Wang
c8c92ffe21 Merge pull request #980 from Molecule-AI/fix/security-log-scrubbing
fix(security): scrub workspace-server token + upstream error logs
2026-04-19 01:39:39 -07:00
Hongming Wang
365f13199e fix(security): scrub workspace-server token + upstream error logs
Two findings from the pre-launch log-scrub audit:

1. handlers/workspace_provision.go:548 logged `token[:8]` — the exact
   H1 pattern that panicked on short keys. Even with a length guard,
   leaking 8 chars of an auth token into centralized logs shortens the
   search space for anyone who gets log-read access. Now logs only
   `len(token)` as a liveness signal.

2. provisioner/cp_provisioner.go:101 fell back to logging the raw
   control-plane response body when the structured {"error":"..."}
   field was absent. If the CP ever echoed request headers (Authorization)
   or a portion of user-data back in an error path, the bearer token
   would end up in our tenant-instance logs. Now logs the byte count
   only; the structured error remains in place for the happy path.
   Also caps the read at 64 KiB via io.LimitReader to prevent
   log-flood DoS from a compromised upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 01:33:47 -07:00
Hongming Wang
a5d6e5319f Merge pull request #979 from Molecule-AI/fix/security-adminauth-c4
fix(security): C4 — close AdminAuth fail-open race on hosted-SaaS fresh install
2026-04-19 01:29:54 -07:00
Hongming Wang
481b5cfb1a fix(security): C4 — close AdminAuth fail-open race on hosted-SaaS fresh install
Pre-launch review blocker. AdminAuth's Tier-1 fail-open fired whenever
the workspace_auth_tokens table was empty — including the window between
a hosted tenant EC2 booting and the first workspace being created. In
that window, every admin-gated route (POST /org/import, POST /workspaces,
POST /bundles/import, etc.) was reachable without a bearer, letting an
attacker pre-empt the first real user by importing a hostile workspace
into a freshly provisioned instance.

Fix: fail-open is now ONLY applied when ADMIN_TOKEN is unset (self-
hosted dev with zero auth configured). Hosted SaaS always sets
ADMIN_TOKEN at provision time, so the branch never fires in prod and
requests with no bearer get 401 even before the first token is minted.

Tier-2 / Tier-3 paths unchanged.

The old TestAdminAuth_684_FailOpen_AdminTokenSet_NoGlobalTokens test
was codifying exactly this bug (asserting 200 on fresh install with
ADMIN_TOKEN set). Renamed and flipped to
TestAdminAuth_C4_AdminTokenSet_FreshInstall_FailsClosed asserting 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 01:28:13 -07:00
Hongming Wang
af9aae2c38 fix(security): cap webhook + config PATCH bodies (H3/H4)
Two HIGH-severity DoS surfaces: both handlers read the entire HTTP
body with io.ReadAll(r.Body) and no upper bound, so a caller streaming
a multi-gigabyte request could exhaust memory on the tenant instance
before we even validated the JSON.

H3 (Discord webhook): wrap Body in io.LimitReader with a 1 MiB cap.
Discord Interactions payloads are well under 10 KiB in practice.

H4 (workspace config PATCH): wrap Body in http.MaxBytesReader with a
256 KiB cap. Real configs are <10 KiB; jsonb handles the cap
comfortably. Returns 413 Request Entity Too Large on overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 01:23:03 -07:00
Hongming Wang
151e458c38 Merge pull request #975 from Molecule-AI/fix/hibernate-409-guard-active-tasks
feat(platform): 409 guard on /hibernate when active_tasks > 0 (closes #822)
2026-04-19 00:30:24 -07:00
Hongming Wang
4e7c4ceeb3 Merge pull request #976 from Molecule-AI/feat/last-outbound-at-817
feat(platform): track last_outbound_at for silent detection (closes #817)
2026-04-19 00:30:01 -07:00
Hongming Wang
c0233317b8 Merge pull request #968 from Molecule-AI/fix/security-memory-delimiter-npm-pin
fix(security): GLOBAL memory delimiter spoofing + pin MCP version (closes #807, #805)
2026-04-19 00:28:08 -07:00
Hongming Wang
d183f89b94 Merge pull request #964 from Molecule-AI/feat/schema-migrations-tracking
feat(db): schema_migrations tracking — run each migration only once
2026-04-19 00:27:27 -07:00
Hongming Wang
6fb8472c26 Merge pull request #966 from Molecule-AI/fix/strip-current-task-public-get
fix(security): strip current_task from public GET response (closes #955)
2026-04-19 00:26:27 -07:00
Hongming Wang
4e1a513160 feat(platform): track last_outbound_at for silent-workspace detection (closes #817)
Sub of #795 (phantom-busy post-mortem). Adds last_outbound_at TIMESTAMPTZ
column to workspaces. Bumped async on every successful outbound A2A call
from a real workspace (skip canvas + system callers). Exposed in
GET /workspaces/:id response as "last_outbound_at".

PM/Dev Lead orchestrators can now detect workspaces that have gone silent
despite being online (> 2h + active cron = phantom-busy warning).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 13:04:54 -07:00
Hongming Wang
37030c307d feat(platform): 409 guard on /hibernate when active_tasks > 0 (closes #822)
Phase 35.1 / #799 security condition C3 — prevents operator from
accidentally killing a mid-task agent.

Behavior:
- active_tasks == 0 → proceed as before
- active_tasks > 0 && ?force=true → log [WARN] + proceed
- active_tasks > 0 && no force → 409 with {error, active_tasks}

2 new tests: TestHibernateHandler_ActiveTasks_Returns409,
TestHibernateHandler_ActiveTasks_ForceTrue_Returns200.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 12:09:52 -07:00
Hongming Wang
4dfd7b969e test: GLOBAL memory delimiter spoofing escape + LOCAL scope untouched
- TestCommitMemory_GlobalScope_DelimiterSpoofingEscaped: verifies [MEMORY prefix
  is escaped to [_MEMORY before DB insert (SAFE-T1201, #807)
- TestCommitMemory_LocalScope_NoDelimiterEscape: LOCAL scope stored verbatim

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:54:52 -07:00
Hongming Wang
3612755590 test: verify current_task + last_sample_error + workspace_dir stripped from public GET
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:53:45 -07:00
Hongming Wang
ff45121a6c test: schema_migrations tracking — 4 cases (first boot, re-boot, mixed, down.sql filter)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:52:27 -07:00
Hongming Wang
5be8ba4b45 fix(security): GLOBAL memory delimiter spoofing + pin MCP npm version
SAFE-T1201 (#807): Escape [MEMORY prefix in GLOBAL memory content on
write to prevent delimiter-spoofing prompt injection. Content stored
as "[_MEMORY " so it renders as text, not structure, when wrapped with
the real delimiter on read.

SAFE-T1102 (#805): Pin @molecule-ai/mcp-server@1.0.0 in .mcp.json.example.
Prevents supply-chain attacks via unpinned npx -y.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:09:24 -07:00