molecule-core/workspace-server/internal/registry
Hongming Wang 284511f02e feat(external): default external runtime to poll-mode + awaiting_agent
Paired molecule-core change for the molecule-cli `molecule connect`
RFC (https://github.com/Molecule-AI/molecule-cli/issues/10).

After this PR an `external`-runtime workspace's full lifecycle
matches the operator-driven model: it boots in awaiting_agent, the
CLI connects in poll mode without operator-side flag tuning, the
heartbeat-loss path lands back on awaiting_agent (re-registrable)
instead of the terminal-feeling 'offline'.

Two changes in workspace-server:

1) `resolveDeliveryMode` (registry.go) now reads `runtime` alongside
   `delivery_mode`. Resolution order:
     a. payload.delivery_mode if non-empty (operator override)
     b. row's existing delivery_mode if non-empty (preserves prior
        registration)
     c. **NEW:** "poll" if row.runtime = "external" — external
        operators run on laptops without public HTTPS; push-mode
        would hard-fail at validateAgentURL anyway. (`molecule connect`
        registers without --mode and expects this default.)
     d. "push" otherwise (historical default for platform-managed
        runtimes — langgraph, hermes, claude-code, etc.)

2) Heartbeat-loss for external workspaces lands them in
   `awaiting_agent` instead of `offline`. Two code paths:
   - `liveness.go` — Redis TTL expiration. Uses a CASE expression
     so the conditional is one UPDATE (no extra round-trip for
     non-external runtimes, no TOCTOU between runtime read and
     status write).
   - `healthsweep.go::sweepStaleRemoteWorkspaces` — DB-side
     last_heartbeat_at age scan. This sweep is already external-
     only by query filter, so the UPDATE just hard-codes the new
     status.

   The Docker-side `sweepOnlineWorkspaces` keeps `offline` —
   recovery there is "restart the container", not "re-register from
   the operator's box".

Why awaiting_agent over offline for external:
- Matches the status the workspace was created in (workspace.go:333).
- The CLI re-registers on every invocation; awaiting_agent → online
  is the natural transition. offline is a terminal-feeling status
  that implies operator intervention is needed.
- An operator who closed their laptop overnight should see
  awaiting_agent in canvas, not 'offline (something is wrong)'.

Test plan:
- Existing: 9 `resolveDeliveryMode` test sites updated to the new
  query shape. Sqlmock now reads `delivery_mode, runtime` columns.
- New: TestRegister_ExternalRuntime_DefaultsToPoll asserts the
  external→poll branch. TestRegister_NonExternalRuntime_StillDefaultsToPush
  guards against the new branch overshooting (langgraph keeps push).
- Liveness: regex updated to match the CASE expression.
- Healthsweep: `TestSweepStaleRemoteWorkspaces_MarksStaleAwaitingAgent`
  (renamed for grep-ability), Docker-side sweepOnlineWorkspaces test
  unchanged (verified to still match `'offline'`).
- Full handlers + registry suite green under -race (12.873s + 2.264s).

No migration needed — `status` is a free-form text column; both
'offline' and 'awaiting_agent' are existing values used elsewhere
(workspace.go uses awaiting_agent on initial external creation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:39:57 -07:00
..
access_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
access.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
healthsweep_test.go feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
healthsweep.go feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
hibernation_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
hibernation.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
liveness_test.go feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
liveness.go feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
orphan_sweeper_test.go fix(orphan-sweeper): close TOCTOU race with issueAndInjectToken on restart 2026-04-27 17:28:50 -07:00
orphan_sweeper.go fix(orphan-sweeper): close TOCTOU race with issueAndInjectToken on restart 2026-04-27 17:28:50 -07:00
provisiontimeout_test.go fix(registry): runtime-aware provision-timeout sweep — give hermes 30 min 2026-04-26 01:44:09 -07:00
provisiontimeout.go fix(registry): runtime-aware provision-timeout sweep — give hermes 30 min 2026-04-26 01:44:09 -07:00