molecule-core/workspace-server
Hongming Wang 284511f02e feat(external): default external runtime to poll-mode + awaiting_agent
Paired molecule-core change for the molecule-cli `molecule connect`
RFC (https://github.com/Molecule-AI/molecule-cli/issues/10).

After this PR an `external`-runtime workspace's full lifecycle
matches the operator-driven model: it boots in awaiting_agent, the
CLI connects in poll mode without operator-side flag tuning, the
heartbeat-loss path lands back on awaiting_agent (re-registrable)
instead of the terminal-feeling 'offline'.

Two changes in workspace-server:

1) `resolveDeliveryMode` (registry.go) now reads `runtime` alongside
   `delivery_mode`. Resolution order:
     a. payload.delivery_mode if non-empty (operator override)
     b. row's existing delivery_mode if non-empty (preserves prior
        registration)
     c. **NEW:** "poll" if row.runtime = "external" — external
        operators run on laptops without public HTTPS; push-mode
        would hard-fail at validateAgentURL anyway. (`molecule connect`
        registers without --mode and expects this default.)
     d. "push" otherwise (historical default for platform-managed
        runtimes — langgraph, hermes, claude-code, etc.)

2) Heartbeat-loss for external workspaces lands them in
   `awaiting_agent` instead of `offline`. Two code paths:
   - `liveness.go` — Redis TTL expiration. Uses a CASE expression
     so the conditional is one UPDATE (no extra round-trip for
     non-external runtimes, no TOCTOU between runtime read and
     status write).
   - `healthsweep.go::sweepStaleRemoteWorkspaces` — DB-side
     last_heartbeat_at age scan. This sweep is already external-
     only by query filter, so the UPDATE just hard-codes the new
     status.

   The Docker-side `sweepOnlineWorkspaces` keeps `offline` —
   recovery there is "restart the container", not "re-register from
   the operator's box".

Why awaiting_agent over offline for external:
- Matches the status the workspace was created in (workspace.go:333).
- The CLI re-registers on every invocation; awaiting_agent → online
  is the natural transition. offline is a terminal-feeling status
  that implies operator intervention is needed.
- An operator who closed their laptop overnight should see
  awaiting_agent in canvas, not 'offline (something is wrong)'.

Test plan:
- Existing: 9 `resolveDeliveryMode` test sites updated to the new
  query shape. Sqlmock now reads `delivery_mode, runtime` columns.
- New: TestRegister_ExternalRuntime_DefaultsToPoll asserts the
  external→poll branch. TestRegister_NonExternalRuntime_StillDefaultsToPush
  guards against the new branch overshooting (langgraph keeps push).
- Liveness: regex updated to match the CASE expression.
- Healthsweep: `TestSweepStaleRemoteWorkspaces_MarksStaleAwaitingAgent`
  (renamed for grep-ability), Docker-side sweepOnlineWorkspaces test
  unchanged (verified to still match `'offline'`).
- Full handlers + registry suite green under -race (12.873s + 2.264s).

No migration needed — `status` is a free-form text column; both
'offline' and 'awaiting_agent' are existing values used elsewhere
(workspace.go uses awaiting_agent on initial external creation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:39:57 -07:00
..
cmd/server feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
internal feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
migrations feat(workspaces): delivery_mode column + poll-mode register flow (#2339 PR 1) 2026-04-29 21:47:14 -07:00
pkg/provisionhook feat(#1957): wire gh-identity plugin into workspace-server 2026-04-24 15:01:41 +00:00
.ci-force chore: force Platform(Go) CI run on main — validate go vet clean 2026-04-21 15:43:19 +00:00
.gitignore feat(ws-server): pull env from CP on startup 2026-04-19 02:41:15 -07:00
.golangci.yaml chore(workspace-server): add golangci.yaml disabling errcheck 2026-04-24 07:16:54 +00:00
Dockerfile chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint 2026-04-22 20:00:16 -07:00
Dockerfile.tenant feat(terminal): remote path via aws ec2-instance-connect + pty 2026-04-21 18:13:29 -07:00
entrypoint-tenant.sh fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155) 2026-04-20 23:51:33 +00:00
go.mod chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave 2026-04-28 16:25:46 -07:00
go.sum chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave 2026-04-28 16:25:46 -07:00