feat(uploads): bump cap to 100MB + correct-reason error messages (no more "timeout" for file-size) #1588
Summary
Bumps the chat-file upload cap from 50 MB to 100 MB AND fixes the wrong-reason error surface flagged by the CTO on forensic a99ab0a1 (reno-stars uploading a >50 MB file saw "Upload failed: signal timed out" when the actual cause was file-size + the client's fixed 60 s AbortSignal.timeout firing before the slow uplink finished streaming; the server eventually returned 400 body too large but the client had already aborted itself, so the user-visible reason was the wrong one).
CTO directive (verbatim): "if its file size issue, should have error that instead saying timeout which is wrong".
Bundled into one PR because the two changes are coupled — bumping the cap alone would still leak the fixed-60 s timeout for legitimate slow uploads; fixing the client alone would 413 every >50 MB attempt.
Failure-reason contract (the load-bearing UX change)
- File over the cap (FileTooLargeError, thrown pre-flight): "File too large (got X.XMB) — limit is 100MB. Please use a smaller file."
- AbortSignal.timeout fires during fetch: "Upload timed out — your connection is too slow for this file. Try again, or reduce file size." (no mention of file-size; pre-flight already excluded it)
- Server error: "Upload failed: <status> <body>"
- Fallback (any other error): "Upload failed: <e.message>"
No conflation. Each cause maps to ITS OWN message.
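The contract above can be sketched in TypeScript roughly as follows. This is an illustrative sketch, not the PR's code: MAX_UPLOAD_BYTES, FileTooLargeError and mapUploadErrorToReason are named in the PR, while formatMb and assertUnderCap are hypothetical helpers. It also assumes that a file exactly at the cap is accepted and that a server failure reaches the catch path as an Error whose message is already "<status> <body>".

```typescript
// Illustrative sketch only; helper names and edge behaviour are assumptions.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024; // 100 MB cap from the PR description

// Hypothetical helper: bytes -> "X.X" megabytes for the user-facing message.
function formatMb(bytes: number): string {
  return (bytes / (1024 * 1024)).toFixed(1);
}

class FileTooLargeError extends Error {
  readonly sizeBytes: number;
  constructor(sizeBytes: number) {
    super(`File too large (got ${formatMb(sizeBytes)}MB) — limit is 100MB. Please use a smaller file.`);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, FileTooLargeError.prototype);
    this.name = "FileTooLargeError";
    this.sizeBytes = sizeBytes;
  }
}

// Hypothetical pre-flight gate: runs before any fetch(), so an oversize file
// can never surface later as a timeout. Assumes size === cap is allowed.
function assertUnderCap(files: ReadonlyArray<{ size: number }>): void {
  for (const f of files) {
    if (f.size > MAX_UPLOAD_BYTES) {
      throw new FileTooLargeError(f.size);
    }
  }
}

// Routes each cause to its own message without string-matching on messages.
function mapUploadErrorToReason(e: unknown): string {
  if (e instanceof FileTooLargeError) {
    return e.message;
  }
  const name = typeof e === "object" && e !== null ? (e as { name?: unknown }).name : undefined;
  if (name === "TimeoutError") {
    // Oversize files were already rejected pre-flight, so this is a slow uplink.
    return "Upload timed out — your connection is too slow for this file. Try again, or reduce file size.";
  }
  if (e instanceof Error) {
    // Covers both the assumed server-error shape and the generic fallback.
    return `Upload failed: ${e.message}`;
  }
  return `Upload failed: ${String(e)}`;
}
```

Routing on the error class and on err.name, rather than on message text, keeps the file-size and slow-connection branches from being confused if either message is reworded later.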
Server-side (push-mode, EC2 workspace)
- workspace-server/internal/handlers/chat_files.go:
    chatUploadMaxBytes 50 → 100 MB
    httpClient.Timeout 120 → 1200 s (matches the new slow-uplink budget at 100 KB/s)
- workspace/internal_chat_uploads.py:
    CHAT_UPLOAD_MAX_BYTES 50 → 100 MB
    CHAT_UPLOAD_MAX_FILE_BYTES 25 → 100 MB (aligned with the total so a single legitimate large file, e.g. Ryan's PDF, succeeds end-to-end)
Canvas
- canvas/src/components/tabs/chat/uploads.ts:
    MAX_UPLOAD_BYTES exported constant (100 MB)
    FileTooLargeError class: distinct name so the catch path can route correctly without string-matching
    Pre-flight gate: a file-size violation throws before any fetch(), so the file-size case can never surface as a downstream timeout
    computeUploadTimeoutMs(bytes): 60 s floor + 100 KB/s scaled deadline (~1000 s at the 100 MB cap)
- canvas/src/components/tabs/chat/hooks/useChatSend.ts:
    mapUploadErrorToReason(e) routes each cause to its own message; exported for unit testing
Tests
- workspace-server/internal/handlers/chat_files_test.go (2 new):
    TestChatUpload_BodyUnderCap_Forwards pins chatUploadMaxBytes == 100 MB and confirms a sub-cap upload forwards
    TestChatUpload_BodyOverCap_NotOK verifies an over-cap body does NOT silently succeed
- canvas/src/components/tabs/chat/__tests__/uploads.cap.test.ts (10 cases, NEW): cap constant, pre-flight gate, exact-cap edge, scaled-timeout curve, server-413 propagation, AbortSignal shape, explicit negative pinning TimeoutError ≠ FileTooLargeError
- canvas/src/components/tabs/chat/hooks/__tests__/useChatSend.errorReason.test.ts (5 cases, NEW): per-cause message contract, explicit negatives guarding against wrong-reason conflation
Local run: Go handlers suite green (16.5 s); canvas chat suite green 283/283 (11.4 s).
Test harness mirror
- tests/harness/cf-proxy/nginx.conf: client_max_body_size 50m → 100m. This is the harness mirror; the production CF / nginx tier is out-of-repo. If prod still caps at 50m, the mirror will pass while prod 413s. This is surfaced explicitly in the inline comment so on-call sees it.
Follow-up (NOT in this PR)
The 100 MB constant now lives in THREE mirror sites (canvas TS + workspace Python + platform Go). Per feedback_no_single_source_of_truth, the proper fix is exposing the cap via GET /uploads/limits so the client fetches the live value. Filing as a separate issue in molecule-ai/internal.
References
- feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17)
Test plan
api.moleculesai.app (workspace-server deploy) and app.moleculesai.app (canvas Vercel deploy)

CTO 2026-05-19 directive on forensic a99ab0a1 (reno-stars >50MB upload that surfaced "signal timed out" when the real cause was file-size + a fixed 60s client timeout):
"if its file size issue, should have error that instead saying timeout which is wrong"

Bundles the cap raise + the wrong-reason fix in ONE PR because the two are coupled: bumping the server alone would still leak the fixed-60s timeout for legitimate slow uploads; fixing the client alone would 413 every >50MB attempt.

Server (push-mode, EC2 workspace):
- workspace-server/internal/handlers/chat_files.go:
    chatUploadMaxBytes 50→100 MB
    httpClient.Timeout 120→1200 s (matches the new slow-uplink budget)
- workspace/internal_chat_uploads.py:
    CHAT_UPLOAD_MAX_BYTES 50→100 MB
    CHAT_UPLOAD_MAX_FILE_BYTES 25→100 MB (aligned with total so a single legitimate large file succeeds end-to-end)

Canvas:
- canvas/src/components/tabs/chat/uploads.ts:
    MAX_UPLOAD_BYTES 100 MB constant + FileTooLargeError class
    pre-flight gate: file-size violation throws BEFORE any fetch, with the actionable "File too large (got X MB) — limit is 100MB"
    computeUploadTimeoutMs: 60s floor + 100 KB/s scaled deadline (was a fixed 60s, the root cause of the forensic)
- canvas/src/components/tabs/chat/hooks/useChatSend.ts:
    mapUploadErrorToReason: routes each cause to ITS OWN message (FileTooLargeError | TimeoutError | server-Error | fallback)
    no conflation between file-size and connection-too-slow

Tests:
- workspace-server chat_files_test.go: pins 100 MB constant, asserts sub-cap forwards + over-cap non-2xx
- canvas uploads.cap.test.ts (10 cases): pre-flight gate, exact-cap edge, scaled-timeout curve, server-413 propagation, AbortSignal shape; explicit negative on "TimeoutError ≠ FileTooLargeError"
- canvas useChatSend.errorReason.test.ts (5 cases): per-cause message contract, explicit negatives that guard against the wrong-reason conflation

Test harness mirror:
- tests/harness/cf-proxy/nginx.conf: client_max_body_size 50m→100m (this is the harness mirror; the production CF / nginx tier is out-of-repo. If prod still caps at 50m, this mirror passes while prod 413s; surface to ops.)

Follow-up (SSOT, NOT in this PR):
The 100 MB constant now lives in THREE mirror sites (canvas TS + workspace Python + platform Go). Per feedback_no_single_source_of_truth, the proper fix is exposing the cap via GET /uploads/limits so the client fetches the live value. Filing as a separate issue.

References:
- task #295 (internal tracker; CTO-authorized this work)
- forensic a99ab0a1 (reno-stars 2026-05-19)
- feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

core-devops review — Go workspace-server + nginx + Python lens.
Verified:
All three server-side mirror sites at 100MB. SSOT drift risk acknowledged — author has TODO comment in canvas/uploads.ts pointing at GET /uploads/limits follow-up; I'll file the issue post-merge per the dispatch.
LGTM.
core-qa review — canvas TS + test coverage lens.
Verified the bug-fix contract (CTO forensic a99ab0a1 — 'file-size should not surface as timeout'):
1. Pre-flight gate (canvas/uploads.ts:80-94). Iterates files BEFORE any fetch(); throws FileTooLargeError with the offending size + cap in the message. No network round-trip on the violating case. Catch in useChatSend.ts:75 routes by instanceof and uses err.message verbatim. PASS: 101MB rejection is instant + actionable.
2. Timeout-vs-size disambiguation (useChatSend.ts:80-87). Discriminated by err.name === 'TimeoutError' (DOMException from AbortSignal). Because gate #1 already excluded oversize files, a TimeoutError reaching this branch CANNOT mean file-size; it is necessarily a slow uplink. Message says exactly that: 'Upload timed out — your connection is too slow for this file. Try again, or reduce file size.' No conflation. PASS.
3. Scaled timeout curve (uploads.ts:50, computeUploadTimeoutMs). 60s floor for small files (so a typo'd host surfaces fast); above the floor, the deadline is totalBytes/100 ms, i.e. an assumed 100 KB/s uplink. At the 100MB cap → 1049s, comfortably above any realistic slow-but-real connection (mobile tether @ 200 KB/s needs 524s, which fits). PASS.
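The curve in item 3 can be reproduced with a few lines of TypeScript. This is a sketch under the assumptions the review states (60 s floor, totalBytes/100 ms above it); the constant names are hypothetical, and rounding up with Math.ceil is a choice made here, not something the PR confirms.

```typescript
// Illustrative sketch only; constant names are assumptions.
const UPLOAD_TIMEOUT_FLOOR_MS = 60000; // 60 s floor so small uploads and bad hosts fail fast
const ASSUMED_UPLINK_BYTES_PER_MS = 100; // 100 KB/s, taking 1 KB = 1000 bytes

function computeUploadTimeoutMs(totalBytes: number): number {
  // Time the upload would take at the assumed uplink speed, rounded up.
  const scaledMs = Math.ceil(totalBytes / ASSUMED_UPLINK_BYTES_PER_MS);
  // Never go below the floor; above it, the deadline grows with payload size.
  return Math.max(UPLOAD_TIMEOUT_FLOOR_MS, scaledMs);
}
```

With these numbers the floor applies up to 6,000,000 bytes, and a payload at the 100 MB cap (104,857,600 bytes) gets 1,048,576 ms, which is the 1049 s quoted above. The result would typically be passed to AbortSignal.timeout() when building the fetch request.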
Test coverage:
17 new tests, all locally green per dispatch state. Failure-reason contract honored end-to-end.
LGTM.
core-security review — security-surface lens.
Threat-model walk on the PR diff:
Body-size DoS surface. Server-side cap enforced via http.MaxBytesReader at workspace-server/internal/handlers/chat_files.go:304 BEFORE ParseMultipartForm. This is the correct order — ParseMultipartForm without an upstream MaxBytesReader would allocate up to disk before checking, exposing a slow-loris-style upload-DoS. PASS.
Memory-exhaustion via per-file streaming. workspace/internal_chat_uploads.py reads CHAT_UPLOAD_MAX_FILE_BYTES+1 (line 227) to bound the upload.read() call — a hostile client claiming small Content-Length cannot OOM the python worker. PASS.
Cap raised 50→100MB. Risk delta: 2× per-request memory/disk burst. Nginx harness (tests/harness/cf-proxy/nginx.conf) bumped to 100m to match, so prod-shaped reverse-proxy paths won't 413-clip pre-server. The prod CF/nginx edge itself is not covered by this PR; flagging that for separate verification (which is in dispatch step 4). NOT A BLOCKER for this PR.
Error-message info-disclosure. New error messages embed file size (MB-rounded) and the 100MB cap — both safe (size is client-supplied; cap is the public contract). Server-side 'upload failed: ' propagates server body verbatim; existing pattern, no new leakage.
No new auth / no new env / no new secret. Diff is purely size/timeout/error-message logic. credentials: 'include' and platformAuthHeaders() are unchanged.
Pre-flight gate cannot be bypassed by client trickery. Even if a malicious frontend skips the JS pre-flight, the Go server enforces MaxBytesReader → 413. Defense-in-depth holds. PASS.
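The bounded-read idea behind the body-size and memory-exhaustion items can be illustrated independently of the Go and Python code. The sketch below is a TypeScript analogue only, not a port of http.MaxBytesReader or of the Python handler; readBounded and BodyTooLargeError are hypothetical names. The point it shows is that the reader counts bytes as they arrive and stops at the first chunk that crosses the cap, instead of trusting a declared length.

```typescript
// Illustrative analogue only; names are hypothetical.
class BodyTooLargeError extends Error {
  constructor(maxBytes: number) {
    super(`body exceeds ${maxBytes} bytes`);
    Object.setPrototypeOf(this, BodyTooLargeError.prototype);
    this.name = "BodyTooLargeError";
  }
}

// Consumes chunks lazily and aborts as soon as the running total passes
// maxBytes, so memory held is bounded by roughly maxBytes plus one chunk.
function readBounded(chunks: Iterable<Uint8Array>, maxBytes: number): Uint8Array {
  const parts: Uint8Array[] = [];
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      throw new BodyTooLargeError(maxBytes);
    }
    parts.push(chunk);
  }
  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.byteLength;
  }
  return out;
}
```

Because the check happens per chunk, a client that understates its Content-Length gains nothing: the read stops once the real byte count exceeds the cap.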
No security regressions. LGTM.
/security-recheck