fix(rfc523): scope forbidden-env check to global_secrets (allow user-set workspace_secrets) #1622
Reference in New Issue
Block a user
Delete Branch "fix/523-allow-user-set-workspace-secrets"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
RFC#523 Layer 1 was over-firing on the legitimate user flow of pasting a personal scoped GitHub PAT into the canvas Secrets tab under
GITHUB_TOKEN. The guardrail ran on the merged env-set fromloadWorkspaceSecretsand could not distinguish:global_secrets— the RFC#523 §"Threat model" attack vector) — must BLOCKworkspace_secretsvia authenticated canvas Secrets tab — the user's own scoped PAT for their own repos) — must ALLOWEmpirical: CTO 2026-05-20 hit
{"rfc":"523","error":"provision aborted: env var \"GITHUB_TOKEN\" is operator-scope and must not reach tenant workspaces — remove it from workspace_secrets / global_secrets and retry"}after legitimately adding their own PAT.RFC#523 intent (cited)
Issue molecule-ai/internal#523 §"Threat model":
User-scoped per-tenant PATs in
workspace_secretsare out of scope. The original commit that landed Layer 1 (4f45d37fix(secrets-seed): remove GITHUB_TOKEN from tenant seedAllowList) already eliminated the one upstream operator-bleed source —tenant_secrets_seed.go'sseedAllowListpropagating CP-envGITHUB_TOKENinto every fresh tenant'sglobal_secrets.Fix approach (Option C — schema-natural provenance split)
loadWorkspaceSecrets(workspace_provision.go:894) now returns a second valueglobalSecretKeys map[string]struct{}recording which keys originated fromglobal_secrets. Workspace_secrets writes override and clear the flag (user re-set wins).findForbiddenTenantEnvKeysFromGlobals(workspace_provision_forbidden_env.go) restricts the deny-set scan to that provenance subset. Old name-onlyfindForbiddenTenantEnvKeysis kept for the Layer 3 CI-lint surface + test-pin uses.prepareProvisionContext(workspace_provision_shared.go:149) calls the provenance-aware helper. Error surface addssource: "global_secrets"to the structured Extra payload.Why Option C over A/B
secret.sourcefield): no such field exists on the schema; would require a migration.Defense-in-depth NOT removed
provisioner.buildContainerEnvsilent-strip (forensic #145) — still strips SCM-write tokens at container-env-build time regardless of provenance.workspace/entrypoint.shLayer 2 in-container env-grep — unchanged (see Open Question)..gitea/workflows/lint-forbidden-env-keys.ymlLayer 3 — unchanged.Backwards-compat / migration
Behavior does change silently re-allow for any tenant who previously had
GITHUB_TOKEN(or similar) inworkspace_secretsand was being blocked. That is the desired outcome — they were victims of the over-fire. No migration step needed; the next provision attempt simply succeeds.Tenants whose
global_secretstable contains operator-scope tokens continue to be blocked. The follow-up audit step from issue internal#552 ("Audit existing tenants' global_secrets for stale GITHUB_TOKEN rows + DELETE") is unchanged in scope.Tests added
In
workspace_provision_forbidden_env_test.go:TestFindForbiddenTenantEnvKeysFromGlobals_UserSetAllowed— user-set workspace_secrets w/GITHUB_TOKEN→ allowedTestFindForbiddenTenantEnvKeysFromGlobals_OperatorLeakBlocked— global_secrets w/GITHUB_TOKEN→ blocked, names keyTestFindForbiddenTenantEnvKeysFromGlobals_UserOverrideOfGlobalAllowed— workspace row supersedes global → allowedTestFindForbiddenTenantEnvKeysFromGlobals_MultipleOperatorLeaks— sorted slice contractTestFindForbiddenTenantEnvKeysFromGlobals_EmptyInputs— nil/empty totalAll 10 existing forbidden-env tests still pass. Full
internal/handlerssuite green (17s).Open question for CTO
Should Layer 2 (
workspace/entrypoint.shFORBIDDEN_KEYSgrep) also become provenance-aware? Right now it does a blanket env-grep at container init and exits 1 ifGITHUB_TOKENis set, regardless of how the env got there. Two possible directions:GH_PAT_USERetc) or use the gh-token-refresh daemon path. Cleanest separation of concerns; but the lines 164-172 of entrypoint.sh contradictorily say "If GITHUB_TOKEN or GH_TOKEN is set, authenticate gh CLI" — code that can never execute under current Layer 2./configs/.molecule-secret-provenancemarker file; entrypoint.sh L2 skips keys flagged as user-origin. More moving parts.This PR fixes the producer-side block, which is enough for
workspace_secrets-sourcedGITHUB_TOKENto reach the container. Whether it then survives to molecule-runtime is a separate L2 question. Tracked as a follow-up only after CTO direction.Should the deny-set expand? The current list catches Gitea/GitHub/CP-admin/Infisical/Railway/Hetzner.
ANTHROPIC_API_KEYis intentionally NOT on the list (per-agent scope, used by every workspace). If we ever wire operator-fleetANTHROPIC_API_KEY(e.g. for fleet metering), it should NOT bleed into tenants and would need its own deny-set member with a different var name to disambiguate from per-agent. Out of scope here.Refs
seedAllowList(eliminated upstream operator-bleed source)feedback_upstream_docs_first_before_hypothesizing— pulled RFC body before fixingTest plan
go build ./...cleango vet ./...cleango test ./internal/handlers/ -run 'TestFindForbiddenTenantEnvKeysFromGlobals'— 5/5 PASSgo test ./internal/handlers/ -run 'TestIsForbiddenTenantEnvKey|TestFindForbiddenTenantEnvKeys|TestFormatForbiddenTenantEnvError'— 10/10 PASS (5 old + 5 new)go test ./internal/handlers/ -count=1— full handlers suite 17s, all greenworkspace_secrets.GITHUB_TOKENsucceeds (Layer 1 passes)GITHUB_TOKENstill aborts with RFC#523 messageCross-links:
core-qa lens — five-axis.
Correctness
Provenance check is binary by design:
findForbiddenTenantEnvKeysFromGlobalsiteratesglobalSecretKeysonly, so workspace_secrets-only keys never enter the candidate set. The shared.go loadWorkspaceSecrets diff explicitlydelete(globalKeys, k)on workspace_secrets read AFTER the global loop, so override-of-global silently demotes provenance. Loop order (globals first, then workspace) is the load-bearing invariant — order swap would break override. Pinned in code comment but not asserted by a test; minor.Test coverage
All 5 new tests exercise the production helper directly. UserSetAllowed (empty globals), OperatorLeakBlocked (globals contains key), UserOverrideOfGlobalAllowed (globals lacks key per caller contract), MultipleOperatorLeaks (sorted), EmptyInputs (4 nil/empty permutations incl. workspace-only). Defensive
presentbranch on line 1202 is not directly covered — contract says it never fires, fine to leave.Edge: both-sources
User-override semantics match CTO intent — security-correct because the user re-authenticated via canvas Secrets tab. Comment at provision_shared.go:147-150 documents this. No finding.
Backwards-compat
Re-allows previously-blocked tenants whose canvas-pasted GITHUB_TOKEN tripped the blanket check (CTO empirical 2026-05-20). loadWorkspaceSecrets signature changed (2→3 returns) but only one caller — prepareProvisionContext — and it is updated in the same PR. No drift surface.
RFC#523 alignment
Threat model literal-text says operator-scope tokens via operator-side stores; the implementation now exactly matches that. Defense-in-depth preserved: forensic #145 silent-strip in buildContainerEnv unchanged, Layer 2 entrypoint.sh + Layer 3 CI lint unchanged. The name-only
findForbiddenTenantEnvKeysis retained for the Layer 3 lint with a PROVENANCE NOTE doc-comment — no API churn.APPROVED.
core-devops lens — five-axis.
Layer-2 entrypoint.sh interaction (CRITICAL but NOT blocking THIS PR)
Verified at PR HEAD: workspace/entrypoint.sh:34 has
FORBIDDEN_KEYSincludingGITHUB_TOKENand at line 57-61exit 1if any present — NAME-based, NO provenance awareness. So a user-set GITHUB_TOKEN that passes Layer 1 (after this PR) will still kill the container at boot via Layer 2, making thegh auth logincodepath at line 164-172 genuinely unreachable. This PR is necessary-but-not-sufficient: it unblocks the legitimate provision flow at L1, but L2 needs a parallel fix (e.g. drop GITHUB_TOKEN from L2 FORBIDDEN_KEYS, or pass a provenance hint via env). Recommend FOLLOW-UP PR, NOT a hard block here — Layer 1 is the fail-closed-loud layer; Layer 2 is defense-in-depth that has always been more conservative. Tracked as task #381.Deny-set expansion
Staying narrow is the right call. The set covers operator-fleet creds (SCM-write/CP-admin/Infisical/Railway/Hetzner/MOLECULE_OPERATOR_*). ANTHROPIC_API_KEY etc are correctly EXCLUDED because they are per-user/per-workspace by design. Comment at forbidden_env.go:1079-1082 documents the value-vs-name boundary. No finding.
DB query hot-path
findForbiddenTenantEnvKeysFromGlobalsiteratesglobalSecretKeys(map keys only) — O(globals) not O(envVars). globalSecretKeys is built from one already-issuedSELECT key,encrypted_value FROM global_secretsin loadWorkspaceSecrets; no new query. Even at 100-1000 global rows the map iteration is sub-millisecond. No finding.Migration safety
Zero schema change. Pure handler-code edit. loadWorkspaceSecrets signature is package-private and has exactly one caller (verified via raw grep across workspace-server/internal/handlers/*.go). No backwards-compat surface.
Rollback
Clean single-PR revert: 3 source files + 1 test file all touched in this PR; no migrations, no DB writes, no infra changes. Reverting restores the original blanket check (over-fire returns but no new defect). Layer 2 entrypoint.sh + Layer 3 CI lint unchanged so the security floor is unaffected by either direction.
APPROVED.
063e1e2683to514341dd51