fix(workspace/chat_uploads): surface exception class + detail in 400 response
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
E2E Chat / detect-changes (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
publish-runtime-autobump / pr-validate (pull_request) Successful in 40s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 2m35s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m22s
CI / Canvas (Next.js) (pull_request) Successful in 5m11s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m56s
CI / all-required (pull_request) Successful in 6m57s
audit-force-merge / audit (pull_request) Successful in 14s

Hermes workspace PDF upload returned opaque 400 'failed to parse multipart
form' (forensic a78762a0 2026-05-19). Triage took ~25 min because the
response carried no information about WHICH exception class or WHY the
parser bailed — the underlying cause was a missing python-multipart dep
in the PyPI runtime (fixed separately in
molecule-ai-workspace-runtime#TBD).

Per feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17):
user-facing failures MUST tell the user WHY. This patch surfaces
exception class + str(exc) in the 400 JSON body, keeping the top-level
'error' key unchanged so existing canvas / alert rules keep matching.

Salvage note on mc#1524 (the wrong-RCA PR, closed):
mc#1524 attributed the 400 to Starlette's max_part_size limit and
proposed bumping it. That diagnosis was incorrect — Starlette only
enforces max_part_size on form FIELDS (text values), not on file PARTS,
so a 5 MB PDF would not trip that limit regardless of the value. The
useful idea from mc#1524 — surfacing the failure reason to the
caller — is salvaged here as a separate, narrowly-scoped change.

Adds unit test test_malformed_multipart_returns_exception_class_and_detail
which sends a boundary-mismatched body, asserts 400, and pins the
response shape (error/exception/detail keys present).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
core-be
2026-05-19 13:40:29 -07:00
parent 01226cfc73
commit 5f6aa3da69
2 changed files with 65 additions and 2 deletions
+22 -2
View File
@@ -149,8 +149,28 @@ async def ingest_handler(request: Request) -> JSONResponse:
try:
form = await request.form(max_files=64, max_fields=32)
except Exception as exc: # multipart parse error
logger.warning("internal_chat_uploads: multipart parse failed: %s", exc)
return JSONResponse({"error": "failed to parse multipart form"}, status_code=400)
# Surface exc.class + str(exc) to the caller. Prior behavior returned
# only the opaque {"error": "failed to parse multipart form"}, which
# took ~25 min to root-cause in forensic a78762a0 (Hermes workspace
# PDF upload, 2026-05-19) — the underlying cause was a MISSING
# python-multipart dep, surfaced as an AssertionError from Starlette's
# parser. Surfacing exception class + detail in the 400 body would
# have cut that to ~10 min. Per feedback_surface_actionable_failure_
# reason_to_user (CTO 2026-05-17): user-facing failures MUST tell the
# user WHY. Top-level "error" key is preserved for backwards-compat
# with existing canvas / alert rules.
logger.warning(
"internal_chat_uploads: multipart parse failed: %s: %s",
type(exc).__name__, exc,
)
return JSONResponse(
{
"error": "failed to parse multipart form",
"exception": type(exc).__name__,
"detail": str(exc),
},
status_code=400,
)
# Starlette's FormData allows multiple values per key — `files` may
# appear multiple times for batched uploads. getlist returns them
@@ -299,3 +299,46 @@ def test_symlink_at_target_is_refused(client: TestClient, chat_uploads_dir: Path
assert r.status_code == 500, r.text
# Sentinel content unchanged — the symlink wasn't followed.
assert sentinel.read_bytes() == b"original"
# Pins the diagnostic shape of the 400 returned when multipart parsing
# fails. Prior to forensic a78762a0 (Hermes workspace PDF upload 2026-05-19),
# the response was {"error": "failed to parse multipart form"} only — opaque
# to the caller, requiring ~25 min of triage to root-cause a missing
# python-multipart dep. Surfacing exception class + str(exc) makes the
# failure self-diagnosing (would've shortened that to ~10 min). Per
# feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17):
# user-facing failures MUST tell the user WHY.
def test_malformed_multipart_returns_exception_class_and_detail(
client: TestClient,
):
"""Send a multipart-shaped body whose boundary in the header does
NOT match the boundary in the body — Starlette's parser raises a
MultiPartException, which our handler must surface as exception
class + detail in the 400 JSON response.
"""
# Header claims boundary "outer" but body uses "different".
bad_body = (
b"--different\r\n"
b'Content-Disposition: form-data; name="files"; filename="a.txt"\r\n'
b"Content-Type: text/plain\r\n\r\n"
b"hello\r\n"
b"--different--\r\n"
)
r = client.post(
"/internal/chat/uploads/ingest",
data=bad_body,
headers={
"Authorization": "Bearer test-secret",
"Content-Type": "multipart/form-data; boundary=outer",
},
)
assert r.status_code == 400, r.text
body = r.json()
# Backwards-compatible top-level error keeps existing canvas /
# alert rules matching.
assert body.get("error") == "failed to parse multipart form"
# New diagnostic fields — caller can now see the exception class +
# detail without SSM access to the workspace stderr.
assert "exception" in body and isinstance(body["exception"], str) and body["exception"]
assert "detail" in body and isinstance(body["detail"], str)