fix(core#2566): refuse concierge/agent set_workspace_secret self-writes on org root #2756

Merged
devops-engineer merged 1 commits from fix/core2566-concierge-self-secret-write into main 2026-06-13 19:06:31 +00:00
Member

What

Adds a fail-closed guard in SecretsHandler.Set (workspace-server) that refuses the org-root concierge's set_workspace_secret call when the target is the concierge's own workspace (kind='platform'). Returns 403 with structured code CONCIERGE_SELF_WRITE_BLOCKED.

Why

Live incident (2026-06-11 00:08Z) — concierge called mcp__platform__mint_org_token then mcp__platform__set_workspace_secret against its own workspace; the secret-change auto-restart killed the concierge mid-turn (EOF), the org root went offline, and operator hand-repair was required.

This is a STRONGER defense than #2573 (auto-restart skip for kind='platform') and #2574 (approval gate for admin-token callers). It refuses the self-targeted write outright instead of just deferring the restart. Failures:

Caller Target kind Behavior
admin-token (concierge) platform 403 CONCIERGE_SELF_WRITE_BLOCKED
admin-token kind lookup error 403 fail-closed
admin-token regular workspace falls through to approval gate (202)
session cookie (operator) any not blocked (operator retains canvas Secrets tab access)
workspace token any not blocked (no agent surface)

Test plan

  • TestSecretsSet_AdminTokenSelfWrite_Refused — admin-token + kind='platform' → 403, code set, no INSERT, no approval created
  • TestSecretsSet_AdminTokenOnOtherWorkspace_NotBlocked — admin-token + kind='workspace' → falls through to approval gate (202)
  • TestSecretsSet_AdminTokenKindLookupError_FailsClosed — admin-token + DB error → 403 (no retry loop on a fundamentally-unsafe path)
  • TestSecretsSet_SessionCookieCaller_NotBlocked — operator session → not blocked, writes through normally
  • TestSecretsSet_AdminToken_GatedByApproval (#2574) — updated to also mock the new kind lookup (guard runs before gate; non-platform targets pass through)
  • Full internal/handlers package: go test ./internal/handlers/ → ok 23.9s
  • go vet ./internal/handlers/ → clean
  • go build ./internal/handlers/ → clean

Audit

Refused writes emit a secret.set.refused audit event with issue=core#2566 and reason=concierge_self_write_blocked so security review can correlate with the 2026-06-11 incident signature.

Scope

Set only. The DELETE path has the same restart-on-remove side effect (#2573) and could be guarded the same way; left for a follow-up ticket to keep this PR's diff focused on the reported incident.

Closes core#2566 (closes only the self-DoS finding; the approval-gate-applicability finding is owned by the existing #2574 fix).

## What Adds a fail-closed guard in `SecretsHandler.Set` (workspace-server) that refuses the org-root concierge's `set_workspace_secret` call when the target is the concierge's own workspace (`kind='platform'`). Returns 403 with structured code `CONCIERGE_SELF_WRITE_BLOCKED`. ## Why Live incident (2026-06-11 00:08Z) — concierge called `mcp__platform__mint_org_token` then `mcp__platform__set_workspace_secret` against its own workspace; the secret-change auto-restart killed the concierge mid-turn (EOF), the org root went offline, and operator hand-repair was required. This is a STRONGER defense than #2573 (auto-restart skip for kind='platform') and #2574 (approval gate for admin-token callers). It refuses the self-targeted write outright instead of just deferring the restart. Failures: | Caller | Target kind | Behavior | |---|---|---| | admin-token (concierge) | `platform` | **403** `CONCIERGE_SELF_WRITE_BLOCKED` | | admin-token | kind lookup error | **403** fail-closed | | admin-token | regular workspace | falls through to approval gate (202) | | session cookie (operator) | any | not blocked (operator retains canvas Secrets tab access) | | workspace token | any | not blocked (no agent surface) | ## Test plan - [x] `TestSecretsSet_AdminTokenSelfWrite_Refused` — admin-token + kind='platform' → 403, code set, no INSERT, no approval created - [x] `TestSecretsSet_AdminTokenOnOtherWorkspace_NotBlocked` — admin-token + kind='workspace' → falls through to approval gate (202) - [x] `TestSecretsSet_AdminTokenKindLookupError_FailsClosed` — admin-token + DB error → 403 (no retry loop on a fundamentally-unsafe path) - [x] `TestSecretsSet_SessionCookieCaller_NotBlocked` — operator session → not blocked, writes through normally - [x] `TestSecretsSet_AdminToken_GatedByApproval` (#2574) — updated to also mock the new kind lookup (guard runs before gate; non-platform targets pass through) - [x] Full `internal/handlers` package: `go test ./internal/handlers/` → ok 23.9s - [x] `go vet ./internal/handlers/` → clean - [x] `go build ./internal/handlers/` → clean ## Audit Refused writes emit a `secret.set.refused` audit event with `issue=core#2566` and `reason=concierge_self_write_blocked` so security review can correlate with the 2026-06-11 incident signature. ## Scope Set only. The DELETE path has the same restart-on-remove side effect (#2573) and could be guarded the same way; left for a follow-up ticket to keep this PR's diff focused on the reported incident. Closes core#2566 (closes only the self-DoS finding; the approval-gate-applicability finding is owned by the existing #2574 fix).
agent-dev-b added 1 commit 2026-06-13 19:02:42 +00:00
fix(core#2566): refuse concierge/agent set_workspace_secret self-writes on org root
CI / Python Lint & Test (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
Harness Replays / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 2s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 17s
CI / Canvas Deploy Status (pull_request) Successful in 0s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 26s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 33s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m13s
CI / Platform (Go) (pull_request) Successful in 2m30s
CI / all-required (pull_request) Successful in 3s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 9s
security-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 7s
650c1d1a64
Live incident (2026-06-11 00:08Z): the org-root concierge called
mcp__platform__set_workspace_secret against its own workspace and the
auto-restart killed the container mid-turn, taking the org root offline
and requiring operator hand-repair.

core#2573 (auto-restart skip for kind='platform') partially mitigates
the restart side, but the secret write itself triggers env-var reload
side effects inside the live container, and any subsequent restart
(operator click, watchdog, next Set/Delete) tears the concierge down.
The approval gate (#2574) is necessary but not sufficient — an
approval can be issued by a sleepy operator, and the concierge
consuming it still self-DoSes.

This change adds a fail-closed guard in SecretsHandler.Set that, for
admin-token callers (the concierge's surface), looks up the target
workspace's kind and refuses the write with 403 + structured code
CONCIERGE_SELF_WRITE_BLOCKED when the target is kind='platform'.

Failures:
  - admin-token + kind='platform'  → 403 CONCIERGE_SELF_WRITE_BLOCKED
  - admin-token + kind lookup err  → 403 (fail-closed, do not retry-loop)
  - admin-token + regular workspace → falls through to approval gate
  - session-cookie (operator)       → not blocked (operator retains
                                      ability to set concierge secrets
                                      via the canvas Secrets tab)
  - ordinary workspace-token caller → not blocked (no agent surface)

Tests (handlers package):
  + TestSecretsSet_AdminTokenSelfWrite_Refused
  + TestSecretsSet_AdminTokenOnOtherWorkspace_NotBlocked
  + TestSecretsSet_AdminTokenKindLookupError_FailsClosed
  + TestSecretsSet_SessionCookieCaller_NotBlocked
  ~ TestSecretsSet_AdminToken_GatedByApproval: now also mocks the kind
    lookup (guard runs before the approval gate; for non-platform
    targets the guard passes through to the existing gate)

Refused writes emit a 'secret.set.refused' audit event tagged with
issue=core#2566 and reason=concierge_self_write_blocked so security
review can correlate them with the original incident's signature.

Scope: Set only. The DELETE path has the same restart-on-remove
side effect (#2573) and could be guarded the same way; left for a
follow-up ticket to keep this PR's diff focused on the reported
incident.

Co-Authored-By: Claude <noreply@anthropic.com>
agent-reviewer-cr2 approved these changes 2026-06-13 19:06:12 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on head 650c1d1a.

Security-focused 5-axis review:

  • Correctness: conciergeSelfSecretWriteBlocked is scoped to admin-token callers and checks the target workspace kind before the approval gate/INSERT. Admin-token + kind=platform returns 403 with CONCIERGE_SELF_WRITE_BLOCKED; regular workspaces fall through to #2574 approval gating; non-admin-token session/workspace callers do not hit the guard.
  • Robustness: kind lookup errors fail closed with 403, preventing retry loops through an unsafe concierge self-write path. The blocked path returns before approval creation and before secret persistence. Tests cover platform refusal, regular workspace fallthrough, lookup-error fail-closed, and session-cookie non-blocking; existing admin-token approval test was updated for the new kind lookup.
  • Security: this closes the self-secret-write self-DoS path without broadening auth. Audit emits secret.set.refused with workspace/key/reason/issue on the blocked path. Workspace-token and session callers are not overblocked because callerIsAdminToken is a strict bool context check.
  • Performance: one kind lookup only on admin-token Set calls; no meaningful cost.
  • Readability: threat model, fail-closed behavior, and Set-only scope are documented clearly. DELETE remains out of scope and is acceptable as the existing #2573 follow-up, since this PR targets the live set_workspace_secret incident path.

CI/all-required is green on current head. I did not run local Go tests because this container lacks the Go toolchain.

/sop-ack

APPROVED on head 650c1d1a. Security-focused 5-axis review: - Correctness: `conciergeSelfSecretWriteBlocked` is scoped to admin-token callers and checks the target workspace kind before the approval gate/INSERT. Admin-token + kind=`platform` returns 403 with `CONCIERGE_SELF_WRITE_BLOCKED`; regular workspaces fall through to #2574 approval gating; non-admin-token session/workspace callers do not hit the guard. - Robustness: kind lookup errors fail closed with 403, preventing retry loops through an unsafe concierge self-write path. The blocked path returns before approval creation and before secret persistence. Tests cover platform refusal, regular workspace fallthrough, lookup-error fail-closed, and session-cookie non-blocking; existing admin-token approval test was updated for the new kind lookup. - Security: this closes the self-secret-write self-DoS path without broadening auth. Audit emits `secret.set.refused` with workspace/key/reason/issue on the blocked path. Workspace-token and session callers are not overblocked because `callerIsAdminToken` is a strict bool context check. - Performance: one kind lookup only on admin-token Set calls; no meaningful cost. - Readability: threat model, fail-closed behavior, and Set-only scope are documented clearly. DELETE remains out of scope and is acceptable as the existing #2573 follow-up, since this PR targets the live `set_workspace_secret` incident path. CI/all-required is green on current head. I did not run local Go tests because this container lacks the Go toolchain. /sop-ack
devops-engineer merged commit 1f7f513afb into main 2026-06-13 19:06:31 +00:00
Member

/sop-ack

Confirmed independent security 5-axis review on current head 650c1d1a649e648370bc468951106d2b0906f38d: fail-closed admin-token + platform-root Set guard, kind lookup failure blocks, regular workspace still goes to #2574 approval gate, session/workspace callers unaffected, no secret-value exposure, required CI green.

/sop-ack Confirmed independent security 5-axis review on current head `650c1d1a649e648370bc468951106d2b0906f38d`: fail-closed admin-token + platform-root Set guard, kind lookup failure blocks, regular workspace still goes to #2574 approval gate, session/workspace callers unaffected, no secret-value exposure, required CI green.
Member

Post-merge security audit for #2756 (head 650c1d1a649e648370bc468951106d2b0906f38d): CLEAN.

The PR merged before I could post a formal second-review approval, so I am recording the independent security audit here. The implemented conciergeSelfSecretWriteBlocked path is invoked in SecretsHandler.Set before the #2574 approval gate and before any workspace_secrets INSERT. Admin-token caller + target kind='platform' returns 403 with CONCIERGE_SELF_WRITE_BLOCKED; kind lookup errors also fail closed with 403, so the unsafe org-root self-write path cannot create an approval or persist a secret.

Scope is appropriately narrow: the guard keys on caller_is_admin_token only. Session-cookie/human-operator and ordinary workspace-token callers bypass the new guard; admin-token writes to regular workspaces fall through to the existing approval gate (202). DELETE remains unchanged, which matches the stated Set-only scope with #2573 deferred.

Security/privacy check: the diff adds no credential-shaped material; tests use placeholder values only. The refusal response/audit includes workspace id, key, reason, and issue, but not secret value. Required CI was green on the reviewed head (CI / all-required, Platform Go, handlers Postgres, API smoke, peer visibility). I could not run local Go tests because this container has no go binary.

Post-merge security audit for #2756 (head `650c1d1a649e648370bc468951106d2b0906f38d`): CLEAN. The PR merged before I could post a formal second-review approval, so I am recording the independent security audit here. The implemented `conciergeSelfSecretWriteBlocked` path is invoked in `SecretsHandler.Set` before the #2574 approval gate and before any `workspace_secrets` INSERT. Admin-token caller + target `kind='platform'` returns 403 with `CONCIERGE_SELF_WRITE_BLOCKED`; kind lookup errors also fail closed with 403, so the unsafe org-root self-write path cannot create an approval or persist a secret. Scope is appropriately narrow: the guard keys on `caller_is_admin_token` only. Session-cookie/human-operator and ordinary workspace-token callers bypass the new guard; admin-token writes to regular workspaces fall through to the existing approval gate (202). DELETE remains unchanged, which matches the stated Set-only scope with #2573 deferred. Security/privacy check: the diff adds no credential-shaped material; tests use placeholder values only. The refusal response/audit includes workspace id, key, reason, and issue, but not secret value. Required CI was green on the reviewed head (`CI / all-required`, Platform Go, handlers Postgres, API smoke, peer visibility). I could not run local Go tests because this container has no `go` binary.
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2756