molecule-core/workspace-server
Hongming Wang 5e36c6638c feat(platform,canvas): classify "datastore unavailable" as 503 + dedicated UI
User reported the canvas threw a generic "API GET /workspaces: 500
{auth check failed}" error when local Postgres + Redis were both
down. Two problems:

1. The error code (500) and message ("auth check failed") said
   nothing useful. The actual condition was "platform can't reach
   its datastore to validate your token" — a Service Unavailable
   class, not Internal Server Error.

2. The canvas had no way to distinguish infra-down from a real
   auth bug, so it rendered the raw API string in the same
   generic-error overlay it uses for everything.

Fix in two layers:

Server (wsauth_middleware.go):
  - New abortAuthLookupError helper centralises the error handling at
    all three sites that previously returned
    `500 {"error":"auth check failed"}` whenever
    HasAnyLiveTokenGlobal or orgtoken.Validate hit a DB error.
  - Those sites now return 503 + a structured body
    `{"error": "...", "code": "platform_unavailable"}`. 503 is
    the correct semantic ("retry shortly, infra is unavailable")
    and the code field is the contract the canvas reads.
  - Body deliberately excludes the underlying DB error string —
    production hostnames / connection-string fragments must not
    leak into a user-visible error toast.

Canvas (api.ts):
  - New PlatformUnavailableError class. api.ts inspects 503
    responses for the platform_unavailable code and throws the
    typed error instead of the generic "API GET /…: 503 …"
    message. Generic 503s (upstream-busy, etc.) keep the legacy
    path so existing busy-retry UX isn't disrupted.

Canvas (page.tsx):
  - New PlatformDownDiagnostic component renders when the
    initial hydration catches PlatformUnavailableError.
    It surfaces the actual condition with operator-actionable
    copy ("brew services start postgresql@14 / redis"), a
    pointer to the platform log, and a Reload button.

Tests:
  - Go: TestAdminAuth_DatastoreError_Returns503PlatformUnavailable
    pins the response shape (status, code field, no DB-error leak)
  - Canvas: 5 tests for PlatformUnavailableError classification —
    typed throw on 503+code match, generic-Error fallback for
    503-without-code (upstream busy), 500 stays generic, non-JSON
    body falls back to generic.

1015 canvas tests + full Go middleware suite pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:01:56 -07:00
Entry | Last commit | Date
cmd/server | fix(platform): stop leaking workspace containers on delete | 2026-04-25 12:36:22 -07:00
internal | feat(platform,canvas): classify "datastore unavailable" as 503 + dedicated UI | 2026-04-26 00:01:56 -07:00
migrations | chore: second-pass review polish — symmetry + clearer test fixtures | 2026-04-25 08:48:30 -07:00
pkg/provisionhook | feat(#1957): wire gh-identity plugin into workspace-server | 2026-04-24 15:01:41 +00:00
.ci-force | chore: force Platform(Go) CI run on main — validate go vet clean | 2026-04-21 15:43:19 +00:00
.gitignore | feat(ws-server): pull env from CP on startup | 2026-04-19 02:41:15 -07:00
.golangci.yaml | chore(workspace-server): add golangci.yaml disabling errcheck | 2026-04-24 07:16:54 +00:00
Dockerfile | chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint | 2026-04-22 20:00:16 -07:00
Dockerfile.tenant | feat(terminal): remote path via aws ec2-instance-connect + pty | 2026-04-21 18:13:29 -07:00
entrypoint-tenant.sh | fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155) | 2026-04-20 23:51:33 +00:00
go.mod | feat(#1957): wire gh-identity plugin into workspace-server | 2026-04-24 15:01:41 +00:00
go.sum | feat(#1957): wire gh-identity plugin into workspace-server | 2026-04-24 18:28:18 +00:00