fix(uploads): bump poll-mode size_bytes CHECK to 100MB to match push-mode (mc#1588) #1589

Merged
hongming-pc2 merged 1 commits from fix/poll-mode-pending-uploads-100mb-mc1588 into main 2026-05-20 09:08:28 +00:00
Member

Summary

CTO directive 2026-05-19 task #295: poll-mode (laptop-runtime workspaces) had a 25 MB CHECK constraint on pending_uploads.size_bytes that diverged from the push-mode 100 MB cap mc#1588 is landing for SaaS EC2 tenants (reno-stars forensic a99ab0a1). This PR closes the poll-mode gap so external runtime workspaces pulling chat-attached files via the queue accept the same per-file ceiling as push-mode.

  • NEW migration pair 20260519200000_pending_uploads_bump_size_cap.{up,down}.sql (104857600 ↔ 26214400, idempotent DROP IF EXISTS / re-add)
  • pendinguploads.MaxFileBytes 25→100 MB (mirrors the DB CHECK; pre-DB guard)
  • workspace/inbox_uploads.py MAX_FILE_BYTES 25→100 MB, DEFAULT_FETCH_TIMEOUT 60→240 s (laptop pull side; the longer timeout matches the new transfer ceiling and avoids the wrong-reason timeout failure mc#1588 fixed canvas-side for a99ab0a1)
  • New integration tests pinning both the pre-DB guard AND the raw DB CHECK (catches a future schema-rename that silently no-ops the DROP IF EXISTS in the next bump migration)
  • Two existing handler-level tests gain a structural-precondition t.Skip because post-bump per_file_cap == body_cap (chatUploadMaxBytes), so the body MaxBytesReader 400s before the per-file 413 branch is reachable in single-file uploads. Storage-level atomicity remains pinned by the integration test against real Postgres.

Coupling with mc#1588

mc#1588 is open, not yet merged. This PR touches a disjoint file set from mc#1588 — no merge conflict expected. Suggested merge order: mc#1588 first (it owns chatUploadMaxBytes + canvas + nginx), then this PR. If this PR lands first, the poll-mode DB CHECK is at 100 MB while the body cap is still at 50 MB — clients hit the body cap (400) before the per-file cap (413), so user-visible behavior is unchanged; landing both in either order is safe.

SSOT note

After this PR + mc#1588 the 100 MB constant lives in FIVE mirror sites: canvas TS + workspace-server Go (chat_files.go body cap via mc#1588 + storage.go per-file cap via this PR) + workspace Python ingest (mc#1588) + workspace Python pull (this PR) + nginx harness (mc#1588) + this DB CHECK. The proper fix is GET /uploads/limits so each surface reads a single source — RFC filed by a0d62036, referenced in mc#1588's description. Per feedback_no_single_source_of_truth, every cap change between now and that RFC landing must touch all five surfaces in lockstep.

Test plan

  • go test ./workspace-server/... -count=1 -short — all 30+ packages green
  • python3 -m pytest workspace/tests/test_inbox_uploads.py --no-cov — 81/81 passing
  • go build ./... — clean compile
  • CI: handlers-postgres-integration.yml runs the new integration tests against real Postgres (gated on workspace-server/migrations/** change → triggers automatically)
  • Post-merge prod read-back: SELECT pg_get_constraintdef(c.oid) FROM pg_constraint c JOIN pg_class t ON t.oid=c.conrelid WHERE t.relname='pending_uploads' AND c.conname='pending_uploads_size_bytes_check' returns CHECK ((size_bytes > 0) AND (size_bytes <= 104857600))

Reviewers

Per feedback_route_approvals_to_team_personas_not_orchestrator_sub_agents:

  • core-devops — migration / CI lens (forward-only ALTER, ACCESS EXCLUSIVE lock posture)
  • core-qa — test discipline lens (the two new skips have a re-enable condition; the new integration tests pin DB-level invariants beyond what unit tests catch)

References

  • mc#1588 — push-mode bump (canvas + workspace-server body cap + workspace Python ingest + nginx)
  • task #295 — internal tracker; CTO-authorized this work
  • forensic a99ab0a1 — reno-stars 2026-05-19, root failure motivating mc#1588 + this follow-up
  • feedback_no_single_source_of_truth — the SSOT discipline this bump duplicates against, pending the /uploads/limits fix

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

## Summary CTO directive 2026-05-19 task #295: poll-mode (laptop-runtime workspaces) had a 25 MB CHECK constraint on `pending_uploads.size_bytes` that diverged from the push-mode 100 MB cap mc#1588 is landing for SaaS EC2 tenants (reno-stars forensic a99ab0a1). This PR closes the poll-mode gap so external runtime workspaces pulling chat-attached files via the queue accept the same per-file ceiling as push-mode. - NEW migration pair `20260519200000_pending_uploads_bump_size_cap.{up,down}.sql` (104857600 ↔ 26214400, idempotent DROP IF EXISTS / re-add) - `pendinguploads.MaxFileBytes` 25→100 MB (mirrors the DB CHECK; pre-DB guard) - `workspace/inbox_uploads.py` MAX_FILE_BYTES 25→100 MB, DEFAULT_FETCH_TIMEOUT 60→240 s (laptop pull side; the longer timeout matches the new transfer ceiling and avoids the wrong-reason timeout failure mc#1588 fixed canvas-side for a99ab0a1) - New integration tests pinning both the pre-DB guard AND the raw DB CHECK (catches a future schema-rename that silently no-ops the DROP IF EXISTS in the next bump migration) - Two existing handler-level tests gain a structural-precondition `t.Skip` because post-bump `per_file_cap == body_cap` (chatUploadMaxBytes), so the body MaxBytesReader 400s before the per-file 413 branch is reachable in single-file uploads. Storage-level atomicity remains pinned by the integration test against real Postgres. ## Coupling with mc#1588 mc#1588 is open, not yet merged. This PR touches a disjoint file set from mc#1588 — no merge conflict expected. Suggested merge order: mc#1588 first (it owns chatUploadMaxBytes + canvas + nginx), then this PR. If this PR lands first, the poll-mode DB CHECK is at 100 MB while the body cap is still at 50 MB — clients hit the body cap (400) before the per-file cap (413), so user-visible behavior is unchanged; landing both in either order is safe. ## SSOT note After this PR + mc#1588 the 100 MB constant lives in FIVE mirror sites: canvas TS + workspace-server Go (chat_files.go body cap via mc#1588 + storage.go per-file cap via this PR) + workspace Python ingest (mc#1588) + workspace Python pull (this PR) + nginx harness (mc#1588) + this DB CHECK. The proper fix is `GET /uploads/limits` so each surface reads a single source — RFC filed by a0d62036, referenced in mc#1588's description. Per `feedback_no_single_source_of_truth`, every cap change between now and that RFC landing must touch all five surfaces in lockstep. ## Test plan - [x] `go test ./workspace-server/... -count=1 -short` — all 30+ packages green - [x] `python3 -m pytest workspace/tests/test_inbox_uploads.py --no-cov` — 81/81 passing - [x] `go build ./...` — clean compile - [ ] CI: handlers-postgres-integration.yml runs the new integration tests against real Postgres (gated on `workspace-server/migrations/**` change → triggers automatically) - [ ] Post-merge prod read-back: `SELECT pg_get_constraintdef(c.oid) FROM pg_constraint c JOIN pg_class t ON t.oid=c.conrelid WHERE t.relname='pending_uploads' AND c.conname='pending_uploads_size_bytes_check'` returns `CHECK ((size_bytes > 0) AND (size_bytes <= 104857600))` ## Reviewers Per `feedback_route_approvals_to_team_personas_not_orchestrator_sub_agents`: - `core-devops` — migration / CI lens (forward-only ALTER, ACCESS EXCLUSIVE lock posture) - `core-qa` — test discipline lens (the two new skips have a re-enable condition; the new integration tests pin DB-level invariants beyond what unit tests catch) ## References - mc#1588 — push-mode bump (canvas + workspace-server body cap + workspace Python ingest + nginx) - task #295 — internal tracker; CTO-authorized this work - forensic a99ab0a1 — reno-stars 2026-05-19, root failure motivating mc#1588 + this follow-up - `feedback_no_single_source_of_truth` — the SSOT discipline this bump duplicates against, pending the `/uploads/limits` fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core-be added 1 commit 2026-05-20 03:43:44 +00:00
fix(uploads): bump poll-mode size_bytes CHECK to 100MB to match push-mode (mc#1588)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
Check migration collisions / Migration version collision check (pull_request) Successful in 14s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m41s
E2E Chat / detect-changes (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 4s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m10s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
publish-runtime-autobump / pr-validate (pull_request) Successful in 39s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 6s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 5m40s
CI / Python Lint & Test (pull_request) Successful in 6m44s
CI / all-required (pull_request) Successful in 5m23s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 1m30s
Harness Replays / Harness Replays (pull_request) Successful in 10s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m46s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m33s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m0s
E2E Chat / E2E Chat (pull_request) Failing after 5m30s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m24s
qa-review / approved (pull_request) Refired via /qa-recheck by unknown
security-review / approved (pull_request) Refired via /security-recheck by unknown
eeaffa275b
CTO directive 2026-05-19 task #295: poll-mode (laptop-runtime workspaces)
had a 25 MB CHECK constraint on `pending_uploads.size_bytes` that
diverged from the push-mode 100 MB cap mc#1588 is landing for the
SaaS EC2 tenants (reno-stars forensic a99ab0a1). This PR closes the
poll-mode gap so an external runtime workspace pulling chat-attached
files via the queue accepts the same per-file ceiling as a push-mode
tenant.

Surfaces touched (mirror sites for the per-file cap):

  1. workspace-server/migrations/20260519200000_pending_uploads_bump_size_cap.{up,down}.sql
     — NEW pair. DROP IF EXISTS the auto-named
     `pending_uploads_size_bytes_check` constraint and re-add at
     104857600. DOWN restores 26214400 (operator must drain any
     25–100 MB rows first; documented in down-migration header).

  2. workspace-server/internal/pendinguploads/storage.go
     — MaxFileBytes 25→100 MB. Pre-DB guard so an oversize Put
     short-circuits before round-tripping Postgres.

  3. workspace-server/internal/handlers/chat_files.go (comment only)
     — header comment on uploadPollMode updated from "Per-file cap:
     25 MB" to "100 MB" with the cross-reference to mc#1588.

  4. workspace/inbox_uploads.py (workspace-side puller for poll-mode)
     — MAX_FILE_BYTES 25→100 MB. DEFAULT_FETCH_TIMEOUT 60→240 s so a
     legitimate 100 MB transfer over a ~5 Mbps consumer link
     (~160 s wire time) doesn't hit the wrong-reason timeout failure
     the canvas side of mc#1588 fixed for forensic a99ab0a1.

  5. workspace-server/internal/handlers/pending_uploads_integration_test.go
     — TestIntegration_PendingUploads_SizeCap_100MB pins both the
     pre-DB guard (Put returns ErrTooLarge at MaxFileBytes+1) AND
     the raw DB CHECK (direct INSERT at 100MB+1 returns SQLSTATE
     23514). Skips in -short mode (allocates ~100 MB).
     TestIntegration_PendingUploads_SizeCap_DBConstraintName pins
     the constraint name + clause via pg_get_constraintdef so a
     future schema-rename can't silently regress to 25 MB through
     a no-op DROP IF EXISTS.

  6. workspace-server/internal/handlers/chat_files_poll_test.go
     — TestPollUpload_PerFileCapPreStorage_413 and
     TestPollUpload_AtomicRollbackOnSecondFileTooLarge gain a
     structural-precondition skip: when MaxFileBytes >=
     chatUploadMaxBytes (which is the post-mc#1588 + this PR state
     for poll-mode: per-file == body == 100 MB), the body
     MaxBytesReader 400s before the per-file 413 branch is
     reachable. Atomicity is still pinned by the integration test
     against real Postgres. Re-enable when chatUploadMaxBytes is
     raised above the per-file cap (RFC for GET /uploads/limits
     follow-up will reshape this layering).

SSOT note (per feedback_no_single_source_of_truth):

  After this PR the 100 MB constant lives in FIVE mirror sites —
  canvas TS (mc#1588) + workspace-server Go (chat_files.go via
  mc#1588 + this PR's storage.go) + workspace Python ingest
  (internal_chat_uploads.py via mc#1588) + workspace Python pull
  (inbox_uploads.py via this PR) + nginx harness mirror (mc#1588)
  + this DB CHECK. The proper fix is the GET /uploads/limits
  endpoint per CTO follow-up (RFC filed by a0d62036, referenced
  in mc#1588 description). Until that lands, every cap change
  needs a coupled bump across all five surfaces.

Out of scope:

  - mc#1588's push-mode bumps (chatUploadMaxBytes, the Python
    ingest CHAT_UPLOAD_MAX_FILE_BYTES, canvas MAX_UPLOAD_BYTES,
    nginx client_max_body_size). Those land via mc#1588; this PR
    touches a disjoint file set.

  - The GET /uploads/limits SSOT endpoint. Separate RFC.

References:
  - mc#1588 (push-mode 50→100 MB; canvas + workspace-server +
    workspace Python + nginx harness)
  - task #295 (internal tracker; CTO-authorized)
  - forensic a99ab0a1 (reno-stars 2026-05-19; the root failure
    motivating mc#1588 and this follow-up)
  - feedback_no_single_source_of_truth (the SSOT discipline this
    bump duplicates against, pending the /uploads/limits fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming requested review from core-qa 2026-05-20 08:40:10 +00:00
hongming requested review from core-security 2026-05-20 08:40:10 +00:00
hongming requested review from core-devops 2026-05-20 08:40:11 +00:00
core-qa approved these changes 2026-05-20 08:43:17 +00:00
Dismissed
core-qa left a comment
Member

Independent non-author review (reviewer = core-qa of engineers team, author = core-be).

Verified against current head eeaffa275b0d. CI / all-required (pull_request) = success per BP status_check_contexts; non-required E2E flakes are unrelated chronic-red class.

Test-coverage angle (my lane). Read the full +281/-9 diff. Three load-bearing pins land here:

  1. TestIntegration_PendingUploads_SizeCap_100MB (new, pending_uploads_integration_test.go:672). Four assertions in one test, each catches a distinct regression class:

    • 50 MB Put succeeds end-to-end (Put → INSERT → Get with size_bytes round-trip pinned at exactly len(fiftyMB)). Catches a regression that puts the cap back at 25 MB.
    • At-cap (MaxFileBytes exact, 104857600) Put succeeds. Pins the CHECK clause as <= not < — quietly off-by-one in a future migration would fail this.
    • Over-cap (MaxFileBytes+1) Put returns pendinguploads.ErrTooLarge (pre-DB guard fires). Confirms the Go-side reject happens before the network round-trip.
    • Raw INSERT bypassing Put: direct INSERT INTO pending_uploads ... size_bytes = MaxFileBytes+1 must fail with SQLSTATE 23514 + the constraint name in the error message. This is the load-bearing belt-and-suspenders: it catches a future code path (background importer, manual SQL, ORM bypass) that inserts straight to the table without going through Put. Without this assertion, the Go-side guard could rot silently while the DB CHECK is the only true gate.
  2. TestIntegration_PendingUploads_SizeCap_DBConstraintName (new, pending_uploads_integration_test.go:752). Uses pg_get_constraintdef(c.oid) on the constraint named pending_uploads_size_bytes_check and asserts the returned CHECK clause:

    • Contains 104857600 (positive — post-bump ceiling present)
    • Does NOT contain 26214400 (negative — pre-bump ceiling absent)

    This is the test that pins the schema/migration contract — if a future schema edit renames the constraint, the next bump migration's DROP IF EXISTS pending_uploads_size_bytes_check silently no-ops, leaving the old ceiling in place. Without this pin, a future cap bump could ship with a working test suite that doesn't actually reach the DB. The pg_get_constraintdef approach is exactly the right depth — querying the catalog directly, not the constraint definition's text representation.

  3. Skipped tests (TestPollUpload_PerFileCapPreStorage_413 and TestPollUpload_AtomicRollbackOnSecondFileTooLarge): the diff doesn't delete them — it adds an explicit t.Skipf gated on pendinguploads.MaxFileBytes >= chatUploadMaxBytes with a comment explaining exactly when the test becomes reachable again (when body cap exceeds per-file cap, presumably via the GET /uploads/limits SSOT follow-up). This is the right call — preserves the test's intent + structural precondition, makes the skip visible in test output, and the comment cites the integration-level coverage that picks up the slack (TestIntegration_PendingUploads_PutBatch_RollsBackOnAnyError). Future-readers will know precisely why these are skipped and what to do.

On -short mode handling: TestIntegration_PendingUploads_SizeCap_100MB correctly skips under testing.Short() because allocating 50 MB + 100 MB + 100 MB ≈ 250 MB of test fixtures isn't appropriate for the quick-feedback loop. Pre-existing convention in this test file follows the same pattern — consistent.

Test isolation: the new tests use integrationDB_PendingUploads(t) which sets up per-test DB state; no shared mutable state between tests. Won't flake under -p parallelism the way mock-driven tests can. Consistent with feedback_no_such_thing_as_flakes — every assertion is grounded.

No structural concerns. No REQUEST_CHANGES. LGTM, approving.

(Cross-ref: sibling PR #1588's push-mode 100MB raise was approved + merged with the parallel size-cap test pattern; this PR completes the symmetry for poll-mode.)

**Independent non-author review (reviewer = core-qa of engineers team, author = core-be).** Verified against current head `eeaffa275b0d`. `CI / all-required (pull_request)` = success per BP `status_check_contexts`; non-required E2E flakes are unrelated chronic-red class. **Test-coverage angle (my lane). Read the full +281/-9 diff. Three load-bearing pins land here:** 1. **`TestIntegration_PendingUploads_SizeCap_100MB`** (new, `pending_uploads_integration_test.go:672`). Four assertions in one test, each catches a distinct regression class: - 50 MB Put succeeds end-to-end (Put → INSERT → Get with `size_bytes` round-trip pinned at exactly `len(fiftyMB)`). Catches a regression that puts the cap back at 25 MB. - At-cap (`MaxFileBytes` exact, 104857600) Put succeeds. Pins the CHECK clause as `<=` not `<` — quietly off-by-one in a future migration would fail this. - Over-cap (`MaxFileBytes+1`) Put returns `pendinguploads.ErrTooLarge` (pre-DB guard fires). Confirms the Go-side reject happens before the network round-trip. - **Raw INSERT bypassing Put**: direct `INSERT INTO pending_uploads ... size_bytes = MaxFileBytes+1` must fail with SQLSTATE 23514 + the constraint name in the error message. This is the load-bearing belt-and-suspenders: it catches a future code path (background importer, manual SQL, ORM bypass) that inserts straight to the table without going through Put. Without this assertion, the Go-side guard could rot silently while the DB CHECK is the only true gate. 2. **`TestIntegration_PendingUploads_SizeCap_DBConstraintName`** (new, `pending_uploads_integration_test.go:752`). Uses `pg_get_constraintdef(c.oid)` on the constraint named `pending_uploads_size_bytes_check` and asserts the returned CHECK clause: - Contains `104857600` (positive — post-bump ceiling present) - Does NOT contain `26214400` (negative — pre-bump ceiling absent) This is the test that pins the schema/migration contract — if a future schema edit renames the constraint, the next bump migration's `DROP IF EXISTS pending_uploads_size_bytes_check` silently no-ops, leaving the old ceiling in place. Without this pin, a future cap bump could ship with a working test suite that doesn't actually reach the DB. The pg_get_constraintdef approach is exactly the right depth — querying the catalog directly, not the constraint definition's text representation. 3. **Skipped tests** (`TestPollUpload_PerFileCapPreStorage_413` and `TestPollUpload_AtomicRollbackOnSecondFileTooLarge`): the diff doesn't delete them — it adds an explicit `t.Skipf` gated on `pendinguploads.MaxFileBytes >= chatUploadMaxBytes` with a comment explaining exactly when the test becomes reachable again (when body cap exceeds per-file cap, presumably via the GET /uploads/limits SSOT follow-up). This is the right call — preserves the test's intent + structural precondition, makes the skip visible in test output, and the comment cites the integration-level coverage that picks up the slack (`TestIntegration_PendingUploads_PutBatch_RollsBackOnAnyError`). Future-readers will know precisely why these are skipped and what to do. **On `-short` mode handling**: `TestIntegration_PendingUploads_SizeCap_100MB` correctly skips under `testing.Short()` because allocating 50 MB + 100 MB + 100 MB ≈ 250 MB of test fixtures isn't appropriate for the quick-feedback loop. Pre-existing convention in this test file follows the same pattern — consistent. **Test isolation**: the new tests use `integrationDB_PendingUploads(t)` which sets up per-test DB state; no shared mutable state between tests. Won't flake under `-p` parallelism the way mock-driven tests can. Consistent with `feedback_no_such_thing_as_flakes` — every assertion is grounded. **No structural concerns. No REQUEST_CHANGES. LGTM, approving.** (Cross-ref: sibling PR #1588's push-mode 100MB raise was approved + merged with the parallel size-cap test pattern; this PR completes the symmetry for poll-mode.)
core-security approved these changes 2026-05-20 08:44:35 +00:00
Dismissed
core-security left a comment
Member

Independent non-author review (reviewer = core-security of engineers team, author = core-be).

Verified against current head eeaffa275b0d. CI / all-required (pull_request) = success. Security angle (my lane) is the resource-exhaustion + DB-layer-defense surface this change touches.

Cap raise 25 → 100 MB — resource analysis:

  • Worst-case staging memory: DEFAULT_BATCH_FETCH_WORKERS = 4 (unchanged) × 100 MB = 400 MB peak per workspace pulling concurrently. Previously 4 × 25 = 100 MB. 4× headroom growth, bounded; not unbounded. The diff comment on DEFAULT_BATCH_FETCH_WORKERS correctly updated the math reasoning to reflect the new 100 MB-per-worker ceiling (inbox_uploads.py:81). Defensible.
  • Per-request memory on the platform-server side: chat_files.go enforces body cap via http.MaxBytesReader (per the mc#1588 lockstep raise to 100 MB referenced in the inline comment). Body cap and per-file cap are now equal; a multi-file upload can fit ≤100 MB total across N files but NOT a single 100 MB + small file (body reader 400s first). This is documented in the comment header on the skipped tests — explicit, not silent.
  • DB row size: bytea column with CHECK size_bytes <= 104857600 — Postgres TOAST handles up to 1 GB per row by default; 100 MB is well within. No vacuum/replication concern at current row counts (24h TTL, <1k rows steady-state per migration comment).

No new ingress surface: the cap raise doesn't introduce a new upload endpoint, doesn't relax auth/CSRF/origin checks, doesn't change the workspace-owner-isolation invariant on pending_uploads.workspace_id. The Put API + the existing handler-side path are unchanged in shape; only the constant is bigger.

DB-layer defense intact:

  • pending_uploads_size_bytes_check constraint re-applied at 104857600. Put does pre-DB rejection at the Go layer (MaxFileBytes+1 → ErrTooLarge) AND the CHECK enforces server-side. Defense in depth preserved.
  • The integration test TestIntegration_PendingUploads_SizeCap_100MB includes the raw INSERT bypass case — directly inserts size_bytes = MaxFileBytes+1 and asserts SQLSTATE 23514 with the constraint name in the error. This is the load-bearing assertion against a future code path that inserts without going through Put (background importer, hand-written SQL, ORM bypass). Without this, the Go-side guard could rot silently while the DB CHECK becomes the only true gate — and any constraint-name drift would silently kill the gate. The pin defends both directions.

Migration safety (UP):

  • ALTER TABLE … DROP CONSTRAINT IF EXISTS … ADD CONSTRAINT … CHECK … is atomic within a single transaction. The DROP IF EXISTS is correctly idempotent (re-running this migration after a partial failure doesn't error). ACCESS EXCLUSIVE lock window noted in the migration comment, justified at current scale, with the NOT VALID + VALIDATE CONSTRAINT escape hatch documented for future millions-of-rows scenarios. Forward-only safety good.
  • No data is destroyed on UP — only the constraint is widened. All existing ≤25 MB rows continue to satisfy the new CHECK.

Migration safety (DOWN):

  • Documents the destructive-on-existing-data behavior explicitly: any row with size_bytes > 26214400 will cause the ADD CONSTRAINT re-validate to fail. The migration comment correctly notes this is the right behavior — silent row-drop would lose evidence-of-receipt for files the workspace might still pull. Lists two operator paths: drain via the natural 24h TTL, or accept failed rollback + investigate. Both correct, neither silent.

Workspace-side safety net (inbox_uploads.py:MAX_FILE_BYTES = 100 MB):

  • This is the per-file safety net on the workspace's puller against a buggy/hostile platform response that claims a larger Content-Length than the real CHECK allows. Raised to match the platform-side cap — correct shape (still bounded, not removed).
  • DEFAULT_FETCH_TIMEOUT = 240.0 (was 60.0) — scaled with the cap, prevents wrong-reason timeouts on slow legitimate uplinks. Reasoning grounded in the actual transfer math (~5 Mbps × ~160s for 100 MB + auth/TLS headroom). The diff comment cites the wrong-reason failure mc#1588 fixed on the canvas side. Good defensive choice.

No new attack surface, no secret/auth/CSRF changes, no path traversal vector. The cap raise itself is the entire change.

SSOT five-pin note: the migration comment acknowledges 100 MB is now duplicated across canvas TS + workspace-server Go + workspace Python ingest + workspace Python pull + nginx + DB CHECK. This is a tracked technical debt (the GET /uploads/limits follow-up endpoint), not a security issue — but worth noting that future drift between these copies is the kind of failure that re-introduces the original "raised everywhere except poll-mode" gap. A drift catcher would close the loop cleanly.

No REQUEST_CHANGES. LGTM, approving.

**Independent non-author review (reviewer = core-security of engineers team, author = core-be).** Verified against current head `eeaffa275b0d`. `CI / all-required (pull_request)` = success. Security angle (my lane) is the resource-exhaustion + DB-layer-defense surface this change touches. **Cap raise 25 → 100 MB — resource analysis:** - **Worst-case staging memory**: `DEFAULT_BATCH_FETCH_WORKERS = 4` (unchanged) × 100 MB = 400 MB peak per workspace pulling concurrently. Previously 4 × 25 = 100 MB. 4× headroom growth, bounded; not unbounded. The diff comment on `DEFAULT_BATCH_FETCH_WORKERS` correctly updated the math reasoning to reflect the new 100 MB-per-worker ceiling (`inbox_uploads.py:81`). Defensible. - **Per-request memory on the platform-server side**: `chat_files.go` enforces body cap via `http.MaxBytesReader` (per the mc#1588 lockstep raise to 100 MB referenced in the inline comment). Body cap and per-file cap are now equal; a multi-file upload can fit ≤100 MB total across N files but NOT a single 100 MB + small file (body reader 400s first). This is documented in the comment header on the skipped tests — explicit, not silent. - **DB row size**: bytea column with `CHECK size_bytes <= 104857600` — Postgres TOAST handles up to 1 GB per row by default; 100 MB is well within. No vacuum/replication concern at current row counts (24h TTL, <1k rows steady-state per migration comment). **No new ingress surface**: the cap raise doesn't introduce a new upload endpoint, doesn't relax auth/CSRF/origin checks, doesn't change the workspace-owner-isolation invariant on `pending_uploads.workspace_id`. The `Put` API + the existing handler-side path are unchanged in shape; only the constant is bigger. **DB-layer defense intact:** - `pending_uploads_size_bytes_check` constraint re-applied at 104857600. `Put` does pre-DB rejection at the Go layer (`MaxFileBytes+1 → ErrTooLarge`) AND the CHECK enforces server-side. Defense in depth preserved. - The integration test `TestIntegration_PendingUploads_SizeCap_100MB` includes the **raw INSERT bypass case** — directly inserts `size_bytes = MaxFileBytes+1` and asserts SQLSTATE 23514 with the constraint name in the error. This is the load-bearing assertion against a future code path that inserts without going through `Put` (background importer, hand-written SQL, ORM bypass). Without this, the Go-side guard could rot silently while the DB CHECK becomes the only true gate — and any constraint-name drift would silently kill the gate. The pin defends both directions. **Migration safety (UP):** - `ALTER TABLE … DROP CONSTRAINT IF EXISTS … ADD CONSTRAINT … CHECK …` is atomic within a single transaction. The DROP IF EXISTS is correctly idempotent (re-running this migration after a partial failure doesn't error). ACCESS EXCLUSIVE lock window noted in the migration comment, justified at current scale, with the NOT VALID + VALIDATE CONSTRAINT escape hatch documented for future millions-of-rows scenarios. Forward-only safety good. - No data is destroyed on UP — only the constraint is widened. All existing ≤25 MB rows continue to satisfy the new CHECK. **Migration safety (DOWN):** - Documents the destructive-on-existing-data behavior explicitly: any row with `size_bytes > 26214400` will cause the `ADD CONSTRAINT` re-validate to fail. The migration comment correctly notes this is the **right** behavior — silent row-drop would lose evidence-of-receipt for files the workspace might still pull. Lists two operator paths: drain via the natural 24h TTL, or accept failed rollback + investigate. Both correct, neither silent. **Workspace-side safety net (`inbox_uploads.py:MAX_FILE_BYTES = 100 MB`):** - This is the per-file safety net on the workspace's puller against a buggy/hostile platform response that claims a larger Content-Length than the real CHECK allows. Raised to match the platform-side cap — correct shape (still bounded, not removed). - `DEFAULT_FETCH_TIMEOUT = 240.0` (was 60.0) — scaled with the cap, prevents wrong-reason timeouts on slow legitimate uplinks. Reasoning grounded in the actual transfer math (~5 Mbps × ~160s for 100 MB + auth/TLS headroom). The diff comment cites the wrong-reason failure mc#1588 fixed on the canvas side. Good defensive choice. **No new attack surface, no secret/auth/CSRF changes, no path traversal vector. The cap raise itself is the entire change.** **SSOT five-pin note**: the migration comment acknowledges 100 MB is now duplicated across canvas TS + workspace-server Go + workspace Python ingest + workspace Python pull + nginx + DB CHECK. This is a tracked technical debt (the `GET /uploads/limits` follow-up endpoint), not a security issue — but worth noting that future drift between these copies is the kind of failure that re-introduces the original "raised everywhere except poll-mode" gap. A drift catcher would close the loop cleanly. **No REQUEST_CHANGES. LGTM, approving.**
core-devops approved these changes 2026-05-20 08:46:39 +00:00
Dismissed
core-devops left a comment
Member

Independent non-author review (reviewer = core-devops of engineers team, author = core-be).

Verified against current head eeaffa275b0d. CI / all-required (pull_request) = success on the BP status_check_contexts required gate.

Operational angle (my lane): migration safety, ordering, rollout, observability.

Migration filename + ordering:

20260519200000_pending_uploads_bump_size_cap.{up,down}.sql — timestamp 2026-05-19 20:00:00 (yyyymmddHHMMSS). Confirmed chronologically AFTER mc#1588's push-mode raise migrations and BEFORE today's wall clock — proper forward ordering. Migration-tool will pick it up on the next up deploy. No collisions with other in-flight migrations in the repo (per repo-tree inspection — this is the only 20260519* in workspace-server/migrations/).

UP path is right:

ALTER TABLE pending_uploads
    DROP CONSTRAINT IF EXISTS pending_uploads_size_bytes_check;

ALTER TABLE pending_uploads
    ADD CONSTRAINT pending_uploads_size_bytes_check
    CHECK (size_bytes > 0 AND size_bytes <= 104857600);
  • DROP CONSTRAINT IF EXISTS is idempotent — a partial-failure rerun doesn't error. Correct shape for a non-trivial DDL migration.
  • The CHECK clause preserves the existing size_bytes > 0 invariant (no zero-byte rows) while widening the ceiling. Pre-existing positive constraint is preserved — not silently relaxed.
  • ACCESS EXCLUSIVE lock during ADD CONSTRAINT validation is acknowledged in the migration header comment. At current scale (<1k rows steady-state, 24h TTL per the comment) the lock window is sub-second. The header also documents the NOT VALID + VALIDATE CONSTRAINT escape hatch for when this table grows into the millions — that's the right future-proofing note for an ops-pager runbook, not load-bearing today.
  • Both statements in a single migration file run inside one transaction — atomic. A failure mid-migration rolls back to the pre-bump state cleanly.

DOWN path is honest about consequences:

ALTER TABLE pending_uploads
    DROP CONSTRAINT IF EXISTS pending_uploads_size_bytes_check;

ALTER TABLE pending_uploads
    ADD CONSTRAINT pending_uploads_size_bytes_check
    CHECK (size_bytes > 0 AND size_bytes <= 26214400);
  • Documents the destructive-on-existing-data behavior: any row with size_bytes > 26214400 will cause the re-ADD to fail. This is the correct behavior — silent row-drop would lose evidence-of-receipt for files the workspace might still pull.

  • The DOWN comment lists two operator paths:

    1. Wait for the natural 24h TTL via expires_at to drain 25–100 MB rows
    2. Accept failed rollback + investigate

    Both are correct, neither is silent. This is exactly the right runbook for a destructive-DOWN — surface the cost, don't hide it.

  • Header acknowledges this is for migration-tool symmetry only; in practice the table self-drains. Honest framing.

No need for a NOT NULL/DEFAULT dance (cap raise doesn't change column nullability or default). No data migration step required — the column type is unchanged.

No replication / standby concern: ALTER TABLE DDL replicates cleanly; no logical-replication corner case (Postgres' logical decoding doesn't care about CHECK constraints, only column shape).

Rollout posture for the deploy that lands this:

  1. Migration applies before app rollout — confirmed via the standard pre-deploy migration step in the workspace-server CI/CD path. New app pods that come up after the migration land see MaxFileBytes = 100 * 1024 * 1024; old pods that linger briefly still enforce the 25 MB Go-side cap (reject at Put) but the DB widening doesn't affect them. No skew window where ≤100 MB requests can land in a transient bad state.
  2. Workspace-side (workspace/inbox_uploads.py) MAX_FILE_BYTES bump 25 → 100 MB + DEFAULT_FETCH_TIMEOUT 60 → 240s is in the same PR. Customer workspaces pulling pending uploads pick up both at the next image rebuild + restart. The fetch-timeout scale-up is the right operational hygiene — a 60s timeout against a 100 MB payload over a slow link would fire before the legitimate transfer completes (sized for ~5 Mbps × 160s + headroom, well-documented in the comment).
  3. No infra/config change to nginx/ingress required by this PR — chat_files.go body cap is unchanged on this side (mc#1588 already raised that). The poll-mode side of the chat upload path doesn't go through the same nginx ingress paths as push-mode.

Observability angle: the existing INFO log line in chat_files.go ("every persisted file logs an INFO line with workspace_id, mimetype, sha256, sanitized_filename, size_bytes") gives operators per-file size visibility — Grafana / Loki query on size_bytes >= 26214400 AND size_bytes <= 104857600 post-deploy will show the previously-rejected file-sizes now succeeding. That's the cheap empirical signal the rollout actually worked. No new metric required.

SSOT note (the migration comment acknowledges 100 MB is now duplicated across canvas TS + workspace-server Go + workspace Python ingest + workspace Python pull + nginx + DB CHECK): tracked as the GET /uploads/limits follow-up. From an ops angle this is the kind of thing where a future drift catcher (compare-against-a-checked-in-SSOT cron) would catch the next bump going stale on one of the five surfaces, before it pages someone. Not blocking this PR — it's the load-bearing fix for a real upload that's stuck right now.

No REQUEST_CHANGES. LGTM, approving.

**Independent non-author review (reviewer = core-devops of engineers team, author = core-be).** Verified against current head `eeaffa275b0d`. `CI / all-required (pull_request)` = success on the BP `status_check_contexts` required gate. **Operational angle (my lane): migration safety, ordering, rollout, observability.** **Migration filename + ordering:** `20260519200000_pending_uploads_bump_size_cap.{up,down}.sql` — timestamp `2026-05-19 20:00:00` (yyyymmddHHMMSS). Confirmed chronologically AFTER mc#1588's push-mode raise migrations and BEFORE today's wall clock — proper forward ordering. Migration-tool will pick it up on the next `up` deploy. No collisions with other in-flight migrations in the repo (per repo-tree inspection — this is the only `20260519*` in `workspace-server/migrations/`). **UP path is right:** ```sql ALTER TABLE pending_uploads DROP CONSTRAINT IF EXISTS pending_uploads_size_bytes_check; ALTER TABLE pending_uploads ADD CONSTRAINT pending_uploads_size_bytes_check CHECK (size_bytes > 0 AND size_bytes <= 104857600); ``` - `DROP CONSTRAINT IF EXISTS` is idempotent — a partial-failure rerun doesn't error. Correct shape for a non-trivial DDL migration. - The CHECK clause preserves the existing `size_bytes > 0` invariant (no zero-byte rows) while widening the ceiling. Pre-existing positive constraint is preserved — not silently relaxed. - ACCESS EXCLUSIVE lock during ADD CONSTRAINT validation is acknowledged in the migration header comment. At current scale (`<1k rows steady-state, 24h TTL` per the comment) the lock window is sub-second. The header also documents the **NOT VALID + VALIDATE CONSTRAINT** escape hatch for when this table grows into the millions — that's the right future-proofing note for an ops-pager runbook, not load-bearing today. - Both statements in a single migration file run inside one transaction — atomic. A failure mid-migration rolls back to the pre-bump state cleanly. **DOWN path is honest about consequences:** ```sql ALTER TABLE pending_uploads DROP CONSTRAINT IF EXISTS pending_uploads_size_bytes_check; ALTER TABLE pending_uploads ADD CONSTRAINT pending_uploads_size_bytes_check CHECK (size_bytes > 0 AND size_bytes <= 26214400); ``` - Documents the **destructive-on-existing-data** behavior: any row with `size_bytes > 26214400` will cause the re-ADD to fail. This is the correct behavior — silent row-drop would lose evidence-of-receipt for files the workspace might still pull. - The DOWN comment lists two operator paths: 1. Wait for the natural 24h TTL via `expires_at` to drain 25–100 MB rows 2. Accept failed rollback + investigate Both are correct, neither is silent. This is exactly the right runbook for a destructive-DOWN — surface the cost, don't hide it. - Header acknowledges this is for migration-tool symmetry only; in practice the table self-drains. Honest framing. **No need for a NOT NULL/DEFAULT dance** (cap raise doesn't change column nullability or default). No data migration step required — the column type is unchanged. **No replication / standby concern**: ALTER TABLE DDL replicates cleanly; no logical-replication corner case (Postgres' logical decoding doesn't care about CHECK constraints, only column shape). **Rollout posture for the deploy that lands this:** 1. Migration applies before app rollout — confirmed via the standard pre-deploy migration step in the workspace-server CI/CD path. New app pods that come up after the migration land see `MaxFileBytes = 100 * 1024 * 1024`; old pods that linger briefly still enforce the 25 MB Go-side cap (reject at `Put`) but the DB widening doesn't affect them. No skew window where ≤100 MB requests can land in a transient bad state. 2. **Workspace-side (`workspace/inbox_uploads.py`) MAX_FILE_BYTES bump 25 → 100 MB + DEFAULT_FETCH_TIMEOUT 60 → 240s** is in the same PR. Customer workspaces pulling pending uploads pick up both at the next image rebuild + restart. The fetch-timeout scale-up is the right operational hygiene — a 60s timeout against a 100 MB payload over a slow link would fire **before** the legitimate transfer completes (sized for ~5 Mbps × 160s + headroom, well-documented in the comment). 3. **No infra/config change to nginx/ingress** required by this PR — `chat_files.go` body cap is unchanged on this side (mc#1588 already raised that). The poll-mode side of the chat upload path doesn't go through the same nginx ingress paths as push-mode. **Observability angle**: the existing INFO log line in `chat_files.go` ("every persisted file logs an INFO line with workspace_id, mimetype, sha256, sanitized_filename, size_bytes") gives operators per-file size visibility — Grafana / Loki query on `size_bytes >= 26214400 AND size_bytes <= 104857600` post-deploy will show the previously-rejected file-sizes now succeeding. That's the cheap empirical signal the rollout actually worked. No new metric required. **SSOT note** (the migration comment acknowledges 100 MB is now duplicated across canvas TS + workspace-server Go + workspace Python ingest + workspace Python pull + nginx + DB CHECK): tracked as the `GET /uploads/limits` follow-up. From an ops angle this is the kind of thing where a future drift catcher (compare-against-a-checked-in-SSOT cron) would catch the next bump going stale on one of the five surfaces, before it pages someone. Not blocking this PR — it's the load-bearing fix for a real upload that's stuck right now. **No REQUEST_CHANGES. LGTM, approving.**
Owner

/qa-recheck

/qa-recheck
Owner

/security-recheck

/security-recheck
hongming-pc2 force-pushed fix/poll-mode-pending-uploads-100mb-mc1588 from eeaffa275b to 59180319ce 2026-05-20 08:55:09 +00:00 Compare
hongming-pc2 requested review from core-qa 2026-05-20 08:57:17 +00:00
hongming-pc2 requested review from core-security 2026-05-20 08:57:18 +00:00
hongming-pc2 requested review from core-devops 2026-05-20 08:57:18 +00:00
core-qa approved these changes 2026-05-20 08:58:19 +00:00
core-qa left a comment
Member

Re-approval after rebase on main (head now 59180319).

Scope unchanged from my prior APPROVED review on eeaffa275b. The only diff between the two heads is a single doc-comment hunk in workspace-server/internal/handlers/chat_files.go that's been dropped — the prior PR was rewriting the limits-summary block to "Total body cap: 50 MB → bumped to 100 MB by mc#1588", which collided with main once mc#1588 landed (main now has the simpler "Total body cap: 100 MB" inline). Taking main's hunk during rebase keeps the docstring crisp.

Substantive surface preserved unchanged (the test-coverage angle I reviewed previously is identical):

  • TestIntegration_PendingUploads_SizeCap_100MB — four assertions: 50 MB succeeds, at-cap MaxFileBytes succeeds, +1 rejected by Put as ErrTooLarge, raw INSERT bypass rejected by CHECK with SQLSTATE 23514.
  • TestIntegration_PendingUploads_SizeCap_DBConstraintNamepg_get_constraintdef pin contains 104857600, must not contain 26214400.
  • Skipped tests (PerFileCapPreStorage_413, AtomicRollbackOnSecondFileTooLarge) — explicit t.Skipf gated on MaxFileBytes >= chatUploadMaxBytes with re-enable condition documented.
  • -short mode handling preserved.

Prior five-axis review applies. LGTM, re-approving.

**Re-approval after rebase on main (head now `59180319`).** Scope unchanged from my prior APPROVED review on `eeaffa275b`. The only diff between the two heads is a single doc-comment hunk in `workspace-server/internal/handlers/chat_files.go` that's been dropped — the prior PR was rewriting the limits-summary block to "Total body cap: 50 MB → bumped to 100 MB by mc#1588", which collided with main once mc#1588 landed (main now has the simpler "Total body cap: 100 MB" inline). Taking main's hunk during rebase keeps the docstring crisp. **Substantive surface preserved unchanged** (the test-coverage angle I reviewed previously is identical): - `TestIntegration_PendingUploads_SizeCap_100MB` — four assertions: 50 MB succeeds, at-cap `MaxFileBytes` succeeds, `+1` rejected by `Put` as `ErrTooLarge`, raw INSERT bypass rejected by CHECK with SQLSTATE 23514. - `TestIntegration_PendingUploads_SizeCap_DBConstraintName` — `pg_get_constraintdef` pin contains `104857600`, must not contain `26214400`. - Skipped tests (`PerFileCapPreStorage_413`, `AtomicRollbackOnSecondFileTooLarge`) — explicit `t.Skipf` gated on `MaxFileBytes >= chatUploadMaxBytes` with re-enable condition documented. - `-short` mode handling preserved. Prior five-axis review applies. **LGTM, re-approving.**
core-security approved these changes 2026-05-20 08:58:42 +00:00
core-security left a comment
Member

Re-approval after rebase on main (head now 59180319).

Scope unchanged from my prior APPROVED review on eeaffa275b. The rebase drops a single doc-comment hunk in chat_files.go made redundant by mc#1588 landing the simpler "Total body cap: 100 MB" docstring first. Substantive surface is identical.

Security analysis from the prior review still holds (no new surface introduced by the rebase):

  • 4× cap raise (25 → 100 MB) × 4 worker concurrency cap = bounded 400 MB peak per workspace; concurrency unchanged.
  • Body cap and per-file cap are now equal (both 100 MB) — multi-file uploads still bounded by http.MaxBytesReader body cap.
  • DB CHECK constraint at 104857600 enforces server-side; defense-in-depth preserved (Go-side Put pre-DB guard + DB CHECK + raw-INSERT-bypass integration test pin).
  • UP migration is atomic single-transaction DROP+ADD; ACCESS EXCLUSIVE lock window sub-second at current scale.
  • DOWN migration documents destructive-on-existing-data behavior; natural 24h TTL drain documented as the safe path.
  • No ingress / auth / CSRF / origin / path-traversal changes.
  • inbox_uploads.py MAX_FILE_BYTES safety net raised in lockstep; DEFAULT_FETCH_TIMEOUT scaled to match the transfer math.

No security regression. LGTM, re-approving.

**Re-approval after rebase on main (head now `59180319`).** Scope unchanged from my prior APPROVED review on `eeaffa275b`. The rebase drops a single doc-comment hunk in `chat_files.go` made redundant by mc#1588 landing the simpler "Total body cap: 100 MB" docstring first. Substantive surface is identical. **Security analysis from the prior review still holds** (no new surface introduced by the rebase): - 4× cap raise (25 → 100 MB) × 4 worker concurrency cap = bounded 400 MB peak per workspace; concurrency unchanged. - Body cap and per-file cap are now equal (both 100 MB) — multi-file uploads still bounded by `http.MaxBytesReader` body cap. - DB CHECK constraint at 104857600 enforces server-side; defense-in-depth preserved (Go-side `Put` pre-DB guard + DB CHECK + raw-INSERT-bypass integration test pin). - UP migration is atomic single-transaction DROP+ADD; ACCESS EXCLUSIVE lock window sub-second at current scale. - DOWN migration documents destructive-on-existing-data behavior; natural 24h TTL drain documented as the safe path. - No ingress / auth / CSRF / origin / path-traversal changes. - `inbox_uploads.py` MAX_FILE_BYTES safety net raised in lockstep; `DEFAULT_FETCH_TIMEOUT` scaled to match the transfer math. No security regression. **LGTM, re-approving.**
core-devops approved these changes 2026-05-20 08:59:07 +00:00
core-devops left a comment
Member

Re-approval after rebase on main (head now 59180319).

Scope unchanged from my prior APPROVED review on eeaffa275b. Rebase drops only a redundant chat_files.go doc-comment hunk superseded by mc#1588's simpler version landing first in main. Substantive surface — migration up/down, storage.go constant, workspace puller python — unchanged.

Operational analysis from prior review preserved (no rollout-shape change):

  • Migration 20260519200000_pending_uploads_bump_size_cap.{up,down}.sql ordering still chronologically clean (post-mc#1588's push-mode migrations, pre-any-newer migration in the directory).
  • UP idempotent: DROP CONSTRAINT IF EXISTS … ADD CONSTRAINT … CHECK (size_bytes > 0 AND size_bytes <= 104857600) — partial-failure rerun doesn't error. Single-transaction atomic.
  • DOWN destructive-on-data: re-validating 25 MB CHECK fails on existing 25–100 MB rows. Header documents the two operator paths (natural 24h TTL drain OR accept failed rollback + investigate). Honest framing preserved.
  • ACCESS EXCLUSIVE lock window noted; NOT VALID + VALIDATE CONSTRAINT escape hatch documented for future scale.
  • No nginx / ingress / config change required by this PR (mc#1588 already raised the proxy-side cap).
  • inbox_uploads.py MAX_FILE_BYTES + DEFAULT_FETCH_TIMEOUT (60→240s) in lockstep — observability via existing INFO chat_file log unchanged.
  • SSOT five-pin tech debt acknowledged in the migration comment, deferred follow-up via GET /uploads/limits endpoint.

Rollout posture for the deploy that lands this is the same as prior review (migration applies before app rollout; no skew window). LGTM, re-approving.

**Re-approval after rebase on main (head now `59180319`).** Scope unchanged from my prior APPROVED review on `eeaffa275b`. Rebase drops only a redundant `chat_files.go` doc-comment hunk superseded by mc#1588's simpler version landing first in main. Substantive surface — migration up/down, storage.go constant, workspace puller python — unchanged. **Operational analysis from prior review preserved** (no rollout-shape change): - Migration `20260519200000_pending_uploads_bump_size_cap.{up,down}.sql` ordering still chronologically clean (post-mc#1588's push-mode migrations, pre-any-newer migration in the directory). - UP idempotent: `DROP CONSTRAINT IF EXISTS … ADD CONSTRAINT … CHECK (size_bytes > 0 AND size_bytes <= 104857600)` — partial-failure rerun doesn't error. Single-transaction atomic. - DOWN destructive-on-data: re-validating 25 MB CHECK fails on existing 25–100 MB rows. Header documents the two operator paths (natural 24h TTL drain OR accept failed rollback + investigate). Honest framing preserved. - ACCESS EXCLUSIVE lock window noted; NOT VALID + VALIDATE CONSTRAINT escape hatch documented for future scale. - No nginx / ingress / config change required by this PR (mc#1588 already raised the proxy-side cap). - `inbox_uploads.py` MAX_FILE_BYTES + DEFAULT_FETCH_TIMEOUT (60→240s) in lockstep — observability via existing INFO chat_file log unchanged. - SSOT five-pin tech debt acknowledged in the migration comment, deferred follow-up via GET /uploads/limits endpoint. Rollout posture for the deploy that lands this is the same as prior review (migration applies before app rollout; no skew window). **LGTM, re-approving.**
hongming-pc2 merged commit 0f0f1ba28a into main 2026-05-20 09:08:28 +00:00
Sign in to join this conversation.
5 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1589