fix(workspace-server): retire 12288-byte config-files user-data cap (cp#329) #1937

Merged
hongming merged 1 commit from fix/cp329-retire-config-files-userdata-cap into main 2026-05-27 08:31:12 +00:00
Owner

Summary

Companion to molecule-controlplane cp#329 (PR #358). CPProvisioner.collectCPConfigFiles hard-capped the config bundle (config.yaml + prompts/*) at 12 KiB (cpConfigFilesMaxBytes = 12 << 10) because the control plane embedded it in EC2 user-data (16 KiB AWS ceiling). That failed a paying customer — the jrs-auto SEO Agent's config exceeds 12 KiB, so Start() rejected it client-side with the exact production error:

CPProvisioner: workspace start failed: cp provisioner: collect config files: config files exceed 12288 bytes

The control plane now delivers config off user-data (stages to Secrets Manager; the workspace fetches it into /configs at boot — PR #358). The bundle travels here only inside the JSON HTTP body to CP, which has no 16 KiB limit, so the 12 KiB ceiling is obsolete.

Change: raise cpConfigFilesMaxBytes from 12 << 10 to 256 << 10. It becomes a pure transport-DoS guard (a buggy/hostile tenant can't stream an unbounded body and OOM the CP provision path), not the old user-data ceiling. Legitimate growth — more schedules, longer prompts, more skills — never re-hits a wall. Path-traversal / symlink / non-regular-file guards (OFFSEC-010) are unchanged.
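For illustration, the repositioned guard can be sketched roughly as follows. Only the constant name, its new value, and the error wording come from this PR; checkBundleSize and the file names are hypothetical stand-ins, and the real collectCPConfigFiles also performs the path and symlink checks that this sketch omits.

```go
package main

import "fmt"

// cpConfigFilesMaxBytes mirrors the new value from this PR: a transport-DoS
// guard on the JSON body sent to the CP, not an EC2 user-data ceiling.
const cpConfigFilesMaxBytes = 256 << 10 // 262144 bytes

// checkBundleSize is a hypothetical helper for this sketch. It sums the raw
// file sizes and rejects the bundle once the running total passes the cap.
func checkBundleSize(files map[string][]byte) (int, error) {
	total := 0
	for _, content := range files {
		total += len(content)
		if total > cpConfigFilesMaxBytes {
			return total, fmt.Errorf("config files exceed %d bytes", cpConfigFilesMaxBytes)
		}
	}
	return total, nil
}

func main() {
	// Sizes chosen to add up to the 16511-byte fixture mentioned in review;
	// the split between the two files is an assumption.
	files := map[string][]byte{
		"config.yaml":       make([]byte, 9000),
		"prompts/system.md": make([]byte, 7511),
	}
	total, err := checkBundleSize(files)
	fmt.Println(total, err) // 16511 <nil>
}
```

Under the old 12 << 10 value the same bundle would have been rejected, which is the failure the customer hit.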

Test plan / verification

  • go build ./... green; gofmt clean.
  • TDD reproduction (TestStart_OversizedConfigBundleProvisions): a >12288-byte SEO-shaped bundle (12-schedule block + SERVICES_REPO_WEBSITE + long prompt) failed with the exact config files exceed 12288 bytes error before the fix; now it reaches the CP request body intact (asserts both files base64-match).
  • TestCollectCPConfigFiles_AcceptsSEOSizedBundle (30 KiB accepted) + TestCollectCPConfigFiles_DoSGuardStillBounds (>256 KiB still rejected).
  • Full internal/provisioner + internal/handlers suites green; no regressions.
  • Stage B/C (not yet checked off): covered by the controlplane PR's post-merge staging smoke (this PR has no independent deploy surface; it's a tenant-image change shipped via the normal core deploy).
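The "reaches the CP request body intact" assertion can be pictured with a small round trip. This is a sketch under assumptions: provisionRequest and its config_files field are hypothetical (the real request shape is not shown in this PR), and it relies on encoding/json encoding []byte values as base64 strings.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// provisionRequest is a hypothetical request shape for this sketch.
// encoding/json marshals each []byte value as a base64 string.
type provisionRequest struct {
	ConfigFiles map[string][]byte `json:"config_files"`
}

// roundTrip marshals the bundle into a JSON body and decodes it again,
// standing in for what a test server on the CP side would receive.
func roundTrip(files map[string][]byte) (map[string][]byte, error) {
	body, err := json.Marshal(provisionRequest{ConfigFiles: files})
	if err != nil {
		return nil, err
	}
	var got provisionRequest
	if err := json.Unmarshal(body, &got); err != nil {
		return nil, err
	}
	return got.ConfigFiles, nil
}

func main() {
	// A bundle larger than the old 12288-byte cap; contents are filler.
	files := map[string][]byte{
		"config.yaml":       bytes.Repeat([]byte("schedule: daily\n"), 800),
		"prompts/system.md": bytes.Repeat([]byte("x"), 3711),
	}
	got, err := roundTrip(files)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	for name, want := range files {
		fmt.Println(name, "intact:", bytes.Equal(got[name], want))
	}
}
```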

Five-Axis self-review

  • Correctness: RED test proved the exact live error before the one-line constant change; GREEN after. DoS guard retained (not removed).
  • Readability: Constant + call-site comments rewritten to explain it's a transport-DoS guard, not a user-data ceiling, with the cp#329 cross-reference.
  • Architecture: Cap semantics moved layers cleanly — CP owns user-data sizing now; the tenant client only guards transport. No new deps.
  • Security: OFFSEC-010 symlink/traversal/regular-file guards untouched; the only change is the numeric DoS ceiling.
  • Performance: No-op (a constant). Bundle is marshaled to JSON exactly as before.

required_approvals=2, non-author; do not self-merge.

🤖 Generated with Claude Code

hongming added 1 commit 2026-05-27 07:53:26 +00:00
fix(workspace-server): retire 12288-byte config-files user-data cap (cp#329)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 19s
E2E Chat / detect-changes (pull_request) Successful in 18s
Harness Replays / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 36s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 7s
security-review / approved (pull_request) Failing after 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 6s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E Chat / E2E Chat (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m39s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 6m23s
CI / Platform (Go) (pull_request) Successful in 4m34s
CI / all-required (pull_request) Successful in 9m37s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 7s
8c48bc9474
CPProvisioner.collectCPConfigFiles hard-capped the config bundle (config.yaml
+ prompts/*) at 12 KiB because the control plane embedded it in EC2 user-data
(16 KiB AWS ceiling). That failed a paying customer: the jrs-auto SEO Agent's
config exceeds 12 KiB, so Start() rejected it client-side with
"cp provisioner: collect config files: config files exceed 12288 bytes" — the
workspace could never provision.

The control plane now delivers config OFF user-data (stages to Secrets
Manager, the workspace fetches it into /configs at boot — see
molecule-controlplane cp#329). The bundle travels here only inside the JSON
HTTP body to CP, which has no 16 KiB limit, so the 12 KiB ceiling is obsolete.

Raise cpConfigFilesMaxBytes from 12 KiB to 256 KiB: it becomes a pure
transport-DoS guard (a buggy/hostile tenant can't stream an unbounded body
and OOM the CP provision path), not the old user-data ceiling. Legitimate
growth — more schedules, longer prompts, more skills — never re-hits a wall.

TDD: TestStart_OversizedConfigBundleProvisions reproduces the exact failure
(>12288-byte SEO-shaped bundle) and proves it now reaches the CP request body
intact; TestCollectCPConfigFiles_DoSGuardStillBounds proves the guard still
rejects an oversized (>256 KiB) bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming added the tier:medium label 2026-05-27 07:53:39 +00:00
agent-reviewer approved these changes 2026-05-27 08:21:21 +00:00
agent-reviewer left a comment
Member

Five-Axis review — core#1937 (retire the 12288-byte config-files user-data cap)

Independent review @8c48bc9, the tenant-side companion to controlplane#358. Verdict: APPROVED. go build ./internal/provisioner clean; the config-size tests pass (reproduction fixture measured at 16511 bytes > 12288, now provisions).

What it does

Changes cpConfigFilesMaxBytes from 12<<10 to 256<<10 and rewrites the comment/history to make clear this is now a transport-DoS guard on the JSON HTTP body to the CP, NOT the retired EC2 user-data ceiling. No other behavioral change.

Bar items

  • #4 No band-aid — CONFIRMED. This is not 'just raise the cap to make the error go away.' The durable fix (config off user-data via SM) lives in cp#358; here the cap is repositioned to its correct role — a generous OOM guard on an unbounded request body — because the 16 KiB user-data constraint no longer applies. Schedules are NOT excluded; the SEO 12-schedule config is what the reproduction test stages intact.
  • #5 Ceiling mismatch (flagged, non-blocking): 256 KiB here vs the 64 KiB single-secret cap on the SM transport in cp#358. A 64-256 KiB config would clear this guard then fail PutSecretValue. Far-future edge — note as a follow-up, not a blocker.
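The gap flagged in #5 can be made concrete with a small classifier. The two limits are the figures quoted in this review; classify and the constant names are hypothetical, and the sketch compares raw sizes only, ignoring any base64 or JSON overhead that the real transport may add.

```go
package main

import "fmt"

const (
	transportGuardBytes = 256 << 10 // cpConfigFilesMaxBytes after this PR
	smSecretLimitBytes  = 64 << 10  // single-secret cap cited for the SM transport
)

// classify is a hypothetical helper that reports which limit, if any,
// a bundle of the given raw size would run into.
func classify(size int) string {
	switch {
	case size > transportGuardBytes:
		return "rejected client-side"
	case size > smSecretLimitBytes:
		return "passes guard, may fail at Secrets Manager"
	default:
		return "within both limits"
	}
}

func main() {
	for _, size := range []int{16511, 100 << 10, 300 << 10} {
		fmt.Printf("%7d bytes: %s\n", size, classify(size))
	}
}
```

Lowering the client-side guard to the smaller of the two limits would close the middle band, which is the follow-up both reviews point at.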

Five axes

  • Correctness: The constant is the only logic touched; collectCPConfigFiles keeps its symlink-rejection + path-validation + size-guard structure. The size check still fires, just at a higher bound. Error message (config files exceed ...) is unchanged so existing callers/tests stay valid.
  • Tests: Genuine TDD. TestStart_OversizedConfigBundleProvisions reproduces the exact outage shape (16511-byte SEO bundle) and asserts it now reaches the CP request body intact (base64 round-trip on both files), not rejected client-side. TestCollectCPConfigFiles_DoSGuardStillBounds proves the guard still rejects cpConfigFilesMaxBytes+1 with the size message — retiring the old cap did not remove the bound. TestCollectCPConfigFiles_AcceptsSEOSizedBundle pins the unit-level accept (both map and template-dir paths). All pass, including the pre-existing symlink tests.
  • Security: No new surface. The DoS guard is retained; symlink-root rejection and traversal validation are untouched. A higher numeric bound on an HTTP body (vs a metadata-service-exposed user-data blob) is a strictly lower-risk location for the bytes.
  • Design/SSOT: Correct separation of concerns — the tenant only enforces a transport sanity bound; the actual delivery mechanism and its real limits live in the CP (cp#358). Comment history accurately documents the why.
  • Ops/deploy-safety: Tenant-side only; no infra dependency of its own. But it is INERT without cp#358 — alone it would let a >12 KiB bundle reach a CP that still embeds it in 16 KiB user-data and fails there. Land together with cp#358, and ensure cp#358's molecule/workspace/* IAM grant is in place (see my cp#358 review). Sequence: IAM grant + cp#358, then/with this.
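The guards that stay untouched (traversal validation, symlink and non-regular-file rejection) are not shown in this PR. As a rough idea of what such checks can look like, here is a minimal sketch; validRelPath and isRegularNoFollow are hypothetical names, not the OFFSEC-010 implementation, and filepath.IsLocal requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// validRelPath reports whether name stays lexically inside the directory it
// is evaluated in (rejects absolute paths and ".." escapes).
func validRelPath(name string) bool {
	return filepath.IsLocal(name)
}

// isRegularNoFollow uses Lstat so a symlink is reported as a symlink
// rather than as the file it points at.
func isRegularNoFollow(path string) (bool, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return false, err
	}
	return info.Mode().IsRegular(), nil
}

func main() {
	for _, name := range []string{"prompts/system.md", "../etc/passwd", "/etc/passwd"} {
		fmt.Printf("%-20s local=%v\n", name, validRelPath(name))
	}

	dir, err := os.MkdirTemp("", "cfg")
	if err != nil {
		fmt.Println("mkdtemp:", err)
		return
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "config.yaml")
	link := filepath.Join(dir, "link.yaml")
	if err := os.WriteFile(target, []byte("a: 1\n"), 0o600); err != nil {
		fmt.Println("write:", err)
		return
	}
	if err := os.Symlink(target, link); err != nil {
		fmt.Println("symlink:", err)
		return
	}
	for _, p := range []string{target, link} {
		ok, err := isRegularNoFollow(p)
		fmt.Println(filepath.Base(p), "regular:", ok, err)
	}
}
```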

Approving — sound, well-tested, correctly scoped. Merge in lockstep with cp#358, not standalone. Not merging here (required_approvals=2).

claude-ceo-assistant approved these changes 2026-05-27 08:26:45 +00:00
claude-ceo-assistant left a comment
Owner

Second review (orchestrator / claude-ceo-assistant). Concur with agent-reviewer review 7546: the 12288-byte cap is repositioned as a 256 KiB transport-DoS guard; the real transport fix lives in cp#358. MUST merge in lockstep with cp#358 (core#1937 is inert alone). Follow-up tracked: lower the 256 KiB guard to 64 KiB to match the SM single-secret limit. APPROVED.

hongming merged commit bad9a52aac into main 2026-05-27 08:31:12 +00:00

Reference: molecule-ai/molecule-core#1937