fix(workspace/chat_uploads): surface exception class + detail in 400 response
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
E2E Chat / detect-changes (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
publish-runtime-autobump / pr-validate (pull_request) Successful in 40s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 2m35s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m22s
CI / Canvas (Next.js) (pull_request) Successful in 5m11s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m56s
CI / all-required (pull_request) Successful in 6m57s
audit-force-merge / audit (pull_request) Successful in 14s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
E2E Chat / detect-changes (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
publish-runtime-autobump / pr-validate (pull_request) Successful in 40s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 2m35s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m22s
CI / Canvas (Next.js) (pull_request) Successful in 5m11s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m56s
CI / all-required (pull_request) Successful in 6m57s
audit-force-merge / audit (pull_request) Successful in 14s
Hermes workspace PDF upload returned opaque 400 'failed to parse multipart form' (forensic a78762a0 2026-05-19). Triage took ~25 min because the response carried no information about WHICH exception class or WHY the parser bailed — the underlying cause was a missing python-multipart dep in the PyPI runtime (fixed separately in molecule-ai-workspace-runtime#TBD). Per feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17): user-facing failures MUST tell the user WHY. This patch surfaces exception class + str(exc) in the 400 JSON body, keeping the top-level 'error' key unchanged so existing canvas / alert rules keep matching. Salvage note on mc#1524 (the wrong-RCA PR, closed): mc#1524 attributed the 400 to Starlette's max_part_size limit and proposed bumping it. That diagnosis was incorrect — Starlette only enforces max_part_size on form FIELDS (text values), not on file PARTS, so a 5 MB PDF would not trip that limit regardless of the value. The useful idea from mc#1524 — surfacing the failure reason to the caller — is salvaged here as a separate, narrowly-scoped change. Adds unit test test_malformed_multipart_returns_exception_class_and_detail which sends a boundary-mismatched body, asserts 400, and pins the response shape (error/exception/detail keys present). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -149,8 +149,28 @@ async def ingest_handler(request: Request) -> JSONResponse:
|
||||
try:
|
||||
form = await request.form(max_files=64, max_fields=32)
|
||||
except Exception as exc: # multipart parse error
|
||||
logger.warning("internal_chat_uploads: multipart parse failed: %s", exc)
|
||||
return JSONResponse({"error": "failed to parse multipart form"}, status_code=400)
|
||||
# Surface exc.class + str(exc) to the caller. Prior behavior returned
|
||||
# only the opaque {"error": "failed to parse multipart form"}, which
|
||||
# took ~25 min to root-cause in forensic a78762a0 (Hermes workspace
|
||||
# PDF upload, 2026-05-19) — the underlying cause was a MISSING
|
||||
# python-multipart dep, surfaced as an AssertionError from Starlette's
|
||||
# parser. Surfacing exception class + detail in the 400 body would
|
||||
# have cut that to ~10 min. Per feedback_surface_actionable_failure_
|
||||
# reason_to_user (CTO 2026-05-17): user-facing failures MUST tell the
|
||||
# user WHY. Top-level "error" key is preserved for backwards-compat
|
||||
# with existing canvas / alert rules.
|
||||
logger.warning(
|
||||
"internal_chat_uploads: multipart parse failed: %s: %s",
|
||||
type(exc).__name__, exc,
|
||||
)
|
||||
return JSONResponse(
|
||||
{
|
||||
"error": "failed to parse multipart form",
|
||||
"exception": type(exc).__name__,
|
||||
"detail": str(exc),
|
||||
},
|
||||
status_code=400,
|
||||
)
|
||||
|
||||
# Starlette's FormData allows multiple values per key — `files` may
|
||||
# appear multiple times for batched uploads. getlist returns them
|
||||
|
||||
@@ -299,3 +299,46 @@ def test_symlink_at_target_is_refused(client: TestClient, chat_uploads_dir: Path
|
||||
assert r.status_code == 500, r.text
|
||||
# Sentinel content unchanged — the symlink wasn't followed.
|
||||
assert sentinel.read_bytes() == b"original"
|
||||
|
||||
|
||||
# Pins the diagnostic shape of the 400 returned when multipart parsing
|
||||
# fails. Prior to forensic a78762a0 (Hermes workspace PDF upload 2026-05-19),
|
||||
# the response was {"error": "failed to parse multipart form"} only — opaque
|
||||
# to the caller, requiring ~25 min of triage to root-cause a missing
|
||||
# python-multipart dep. Surfacing exception class + str(exc) makes the
|
||||
# failure self-diagnosing (would've shortened that to ~10 min). Per
|
||||
# feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17):
|
||||
# user-facing failures MUST tell the user WHY.
|
||||
def test_malformed_multipart_returns_exception_class_and_detail(
|
||||
client: TestClient,
|
||||
):
|
||||
"""Send a multipart-shaped body whose boundary in the header does
|
||||
NOT match the boundary in the body — Starlette's parser raises a
|
||||
MultiPartException, which our handler must surface as exception
|
||||
class + detail in the 400 JSON response.
|
||||
"""
|
||||
# Header claims boundary "outer" but body uses "different".
|
||||
bad_body = (
|
||||
b"--different\r\n"
|
||||
b'Content-Disposition: form-data; name="files"; filename="a.txt"\r\n'
|
||||
b"Content-Type: text/plain\r\n\r\n"
|
||||
b"hello\r\n"
|
||||
b"--different--\r\n"
|
||||
)
|
||||
r = client.post(
|
||||
"/internal/chat/uploads/ingest",
|
||||
data=bad_body,
|
||||
headers={
|
||||
"Authorization": "Bearer test-secret",
|
||||
"Content-Type": "multipart/form-data; boundary=outer",
|
||||
},
|
||||
)
|
||||
assert r.status_code == 400, r.text
|
||||
body = r.json()
|
||||
# Backwards-compatible top-level error keeps existing canvas /
|
||||
# alert rules matching.
|
||||
assert body.get("error") == "failed to parse multipart form"
|
||||
# New diagnostic fields — caller can now see the exception class +
|
||||
# detail without SSM access to the workspace stderr.
|
||||
assert "exception" in body and isinstance(body["exception"], str) and body["exception"]
|
||||
assert "detail" in body and isinstance(body["detail"], str)
|
||||
|
||||
Reference in New Issue
Block a user