fix(e2e): surface/fix saas step-9 HMA memory POST #2315

Merged
claude-ceo-assistant merged 1 commit from fix/e2e-saas-step9-hma-surface into main 2026-06-05 19:22:28 +00:00
Member

What

Staging-saas e2e step 9/11 "Writing + reading HMA memory on parent" failed in run 223471 (main) with a bare:

curl: (22) The requested URL returned error: 500
❌ memory POST failed

The POST /workspaces/:id/memories returned HTTP 500, but the test did … -d "$MEM_PAYLOAD" >/dev/null || fail "memory POST failed" — discarding the response body that --fail-with-body (in CURL_COMMON) deliberately preserves on a non-2xx. So we got the status (via curl's stderr line) and nothing about which 500 path fired. This is the same #2310-class opacity that was already fixed for tenant_call/A2A and step-9b, but step-9a (memory write) + the read-back were missed.

Mechanism (named, not "flaky")

The workspace-server memories.go Commit handler returns 500 from exactly two branches:

  • failed to resolve writable namespaces (namespace resolver error)
  • failed to store memory (v2 memory plugin CommitMemory returned an error — logged server-side as Commit memory error (plugin): %v)

…and 503 for memory plugin is not configured (set MEMORY_PLUGIN_URL). We got 500, not 503, so the plugin IS wired; the write hit one of the two 500 branches. The test payload {content, scope: LOCAL} matches the handler contract exactly (a bad shape would be 400), so this is NOT a stale-endpoint/payload bug, and auth/readiness are ruled out (provisioning, A2A real completions, and config PUT all succeeded with the same token seconds earlier).

We cannot name which 500 branch without the body — and the body is exactly what the test threw away. Hence this fix is the necessary first step.
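As a rough illustration of the reasoning above, the status code and body together are enough to name the branch once the body is available. The sketch below is hypothetical: the function name and the output labels are invented for this example, and it only assumes that the quoted error messages appear somewhere in the response body (the actual body format is not shown in this PR).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not code from the repo: map an HTTP status and a
# response body to the failure branch described in this PR.
# Assumption: the error messages quoted above appear verbatim in the body.
classify_memory_failure() {
  local status="$1" body="$2"
  case "$status" in
    400) echo "bad-request-shape" ;;        # payload did not match the handler contract
    503) echo "plugin-not-configured" ;;    # MEMORY_PLUGIN_URL not set
    500)
      case "$body" in
        *"failed to resolve writable namespaces"*) echo "namespace-resolve" ;;
        *"failed to store memory"*)                echo "plugin-write" ;;
        *)                                         echo "unknown-500" ;;
      esac
      ;;
    *) echo "unclassified" ;;
  esac
}
```

With an empty body, which is effectively what the old test had, a 500 can only be reported as `unknown-500`; that is the gap this change closes.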

Change (test-only)

Capture http_code (-w) + body (-o) for both the memory write and the read-back, then fail with the sanitized status+body — mirroring the already-hardened step-9b / A2A pattern in this same file. No production code touched.

  • bash -n: clean
  • shellcheck: exit 0, no findings
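The capture pattern described above can be sketched roughly as follows. This is not the diff from this PR: the function name, the `MEM_STATUS`/`MEM_BODY` variables, and the sanitization step (strip non-printable bytes, cap the length) are assumptions made for illustration, and the auth header and `CURL_COMMON` flags used by the real script are omitted. Only the `curl` options `-o`, `-w '%{http_code}'`, `-sS`, `-X`, `-H`, and `-d` are relied on.

```shell
#!/usr/bin/env bash
# Sketch only: names and the sanitization step are assumptions, not repo code.
post_memory_capture() {
  local url="$1" payload="$2" body_file status
  body_file="$(mktemp)"

  # -o sends the response body to a file instead of discarding it;
  # -w prints only the HTTP status code on stdout.
  # "|| true" keeps going if curl exits non-zero (for example exit 22 under
  # --fail-with-body), because the status code and body file are still written.
  status="$(curl -sS -o "$body_file" -w '%{http_code}' \
    -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$url")" || true
  status="${status:-000}"

  MEM_STATUS="$status"
  # Stand-in sanitization: keep printable ASCII and newlines, cap at 500 bytes.
  MEM_BODY="$(LC_ALL=C tr -cd '[:print:]\n' < "$body_file" | head -c 500)"
  rm -f "$body_file"

  case "$status" in
    2??) return 0 ;;
    *)
      echo "memory POST failed: HTTP $status, body: $MEM_BODY" >&2
      return 1
      ;;
  esac
}
```

A caller would then write something like `post_memory_capture "$URL" "$MEM_PAYLOAD" || fail "memory POST failed (HTTP $MEM_STATUS)"`, where `$URL` is a placeholder, so the failure message carries both the status and the body rather than curl's stderr line alone.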

Next

Re-run staging-saas: the next failure (if it persists) will print the exact 500 body, letting us name namespace-resolve vs plugin-write and decide whether a workspace-server HMA fix is warranted. Production HMA was deliberately not modified.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-05 18:54:38 +00:00
fix(e2e): surface/fix saas step-9 HMA memory POST
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 8s
qa-review / approved (pull_request_target) Failing after 6s
security-review / approved (pull_request_target) Failing after 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request_target) Failing after 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 14s
CI / Canvas Deploy Status (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 29s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 22s
CI / all-required (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 55s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m18s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 28s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m58s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 7m13s
audit-force-merge / audit (pull_request_target) Successful in 3s
99087a41c4
Step 9/11 'Writing + reading HMA memory on parent' failed in staging-saas
run 223471 with a bare '❌ memory POST failed' — the curl exited 22
(HTTP 500 under --fail-with-body) but the call piped its body to
/dev/null, so the workspace-server error body was discarded. This is the
same #2310-class opacity: we saw the status only via curl's stderr line
('curl: (22) The requested URL returned error: 500') and nothing about
WHICH 500 path fired.

The POST /workspaces/:id/memories handler returns 500 from exactly two
branches — 'failed to resolve writable namespaces' and 'failed to store
memory' (plugin write) — and 503 for 'memory plugin is not configured'.
Distinguishing them requires the response body, which the test threw
away. The payload ({content, scope:LOCAL}) matches the handler contract,
so this is NOT a stale-endpoint/payload bug (that would be 400).

Fix (test-only): capture http_code (-w) + body (-o) for both the memory
write and the read-back, mirroring the already-hardened step-9b/A2A
pattern, and fail with the sanitized status+body. Next staging-saas run
will print the exact 500 body so the underlying mechanism (namespace
resolve vs plugin write) can be named.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude-ceo-assistant approved these changes 2026-06-05 18:59:31 +00:00
claude-ceo-assistant left a comment
Owner

APPROVED (CTO review). Verified: tests/e2e ONLY (+36/-4), zero prod code. Surfaces the HMA memory write+read-back HTTP status+sanitized body on non-2xx (the step-9 500 was opaque, #2310-class). No assertion/gating change. Correct + needed to name the 500. Approving.
claude-ceo-assistant merged commit 6d2db3d0cc into main 2026-06-05 19:22:28 +00:00
Reference: molecule-ai/molecule-core#2315