RFC: decouple workspace config + skill delivery from Secrets Manager (#2831 root fix) #2843

Merged
devops-engineer merged 2 commits from rfc/decouple-config-skill-delivery into main 2026-06-14 12:01:24 +00:00

What

Design doc / RFC for the root fix of RCA #2831: decouple delivery of workspace config, prompts, and skills from Secrets Manager (SM), which should hold secrets only. The design is grounded in the live SM inventory we gathered (see the #2831 comment): workspace/<id>/config holds config.yaml only (~240 B), there are zero skill secrets, and JRS's config secret is absent, which leaves JRS with a stub config and 0 skills.

Design (summary)

  • Secrets ↔ assets boundary: SM keeps only tenant/<id>/bootstrap (real secrets). Config, prompts, and skills move to a generic, non-secret template-asset channel and are fetched to the persisted data volume. This works for any template, with no size cap and no per-template code.
  • Reconcile/self-repair on every provision/restart/restore/auto-heal: a stub or missing config heals from the template (closes the JRS class). Keeps #2838's allowlist + reconciliation scaffold.
  • Deletes EnableSEOSkillPackage/SEOSkillPackageFiles/seo_skill_package.go, the per-template patch.
  • The RFC also covers the full secrets/assets boundary, migration, security, unit + e2e test plan, rollout, and alternatives.
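
The reconcile/self-repair step above can be sketched roughly as follows. This is a minimal illustration, not the RFC's implementation: the `Assets` type, `isStub`, and `Reconcile` are hypothetical names, assets are modelled as in-memory maps rather than files on the data volume, and "stub" is assumed to mean empty or whitespace-only content. Leaving existing non-stub files untouched is also an assumption.

```go
package main

import (
	"bytes"
	"sort"
)

// Assets maps a relative path (for example "config.yaml") to file contents.
// Hypothetical type, used here only to keep the sketch free of disk I/O.
type Assets map[string][]byte

// isStub reports whether content on the volume should be treated as missing.
// Assumption: a stub is empty or whitespace-only; the RFC may define it differently.
func isStub(content []byte) bool {
	return len(bytes.TrimSpace(content)) == 0
}

// Reconcile copies every template asset that is missing or a stub on the
// volume and returns the sorted list of repaired paths. Files that already
// hold non-stub content, and files unknown to the template, are left untouched.
func Reconcile(template, volume Assets) []string {
	var repaired []string
	for path, want := range template {
		have, ok := volume[path]
		if ok && !isStub(have) {
			continue // present and non-stub: nothing to heal
		}
		volume[path] = append([]byte(nil), want...) // copy so the template is not aliased
		repaired = append(repaired, path)
	}
	sort.Strings(repaired) // map iteration order is random; sort for stable output
	return repaired
}
```

Running `Reconcile` on every provision, restart, restore, and auto-heal is safe in this sketch because a second call with nothing left to repair returns an empty list and changes nothing.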

SOP

Architectural change → this RFC is the SSOT design artifact for CTO sign-off before/alongside implementation. Implementation (core provisioner + CP/boot asset-fetch + tests + e2e + docs) tracked against this; the dev fleet's #2838 is being re-scoped to it.

Refs #2831, #2832, #2838.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

core-devops added 1 commit 2026-06-14 10:16:45 +00:00
docs(rfc): decouple workspace config + skill delivery from Secrets Manager
CI / Python Lint & Test (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 12s
security-review / approved (pull_request_target) Failing after 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 13s
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
E2E Chat / detect-changes (pull_request) Successful in 15s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 21s
CI / all-required (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 27s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 28s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
6d787ca7ad
core-devops added 1 commit 2026-06-14 10:25:18 +00:00
rfc: add concierge-identity sibling anti-pattern (hardcoded in core → should be a template)
CI / Python Lint & Test (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 11s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request_target) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 16s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 16s
gate-check-v3 / gate-check (pull_request_target) Failing after 14s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 20s
CI / Canvas Deploy Status (pull_request) Successful in 1s
CI / all-required (pull_request) Successful in 4s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 29s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 1m56s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 9s
security-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 7s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
8e97a39dcf
agent-reviewer-cr2 approved these changes 2026-06-14 12:01:07 +00:00
agent-reviewer-cr2 left a comment

APPROVED: #2843 is acceptable as the RFC/design artifact on head 8e97a39dcf.

5-axis review:

  • Correctness: the RFC directly addresses RCA #2831 by separating secrets from non-secret config/prompts/skills, rejecting the per-template SEO patch pattern, and requiring generic template asset delivery plus boot/restart/self-repair reconciliation.
  • Robustness: it calls out stub/missing config repair, restart/restore/auto-heal coverage, migration/rollback sequencing for existing SM config secrets, and unit/E2E tests that should catch silent skill drops.
  • Security: the boundary is coherent: bootstrap secrets remain in SM; assets are fetched through a read-scoped template-repo channel; workspace_secrets/memory remain separate. This aligns with the #2832 credential-redaction work rather than replacing it.
  • Performance: the design avoids pushing large public assets through Secrets Manager and gives transport options (shallow clone/archive/object artifact). Implementation should still enforce bounded fetch/checkout behavior, but the RFC does not introduce runtime code.
  • Readability: scoped, concrete, and explicit about non-goals, deletions, rollout, and the related concierge-hardcoding anti-pattern.

Required core contexts are green on this head: CI/all-required, E2E API Smoke Test, Handlers Postgres Integration, and E2E Peer Visibility. The real-image Local Provision Lifecycle advisory is red, but it is not part of the required core merge bar for this docs-only RFC.

devops-engineer merged commit 89d39510c8 into main 2026-06-14 12:01:24 +00:00
Reference: molecule-ai/molecule-core#2843