forked from molecule-ai/molecule-core
Same SSOT-divergence shape as #10 / fixed in #12, but on the a2a-proxy code path.

The plugin handler was routed through `provisioner.RunningContainerName`; a2a-proxy was forwarding optimistically and only catching missing containers reactively, via `maybeMarkContainerDead`, after the network call timed out. Result on tenants whose agent containers had been recycled (e.g. post-EC2 replace from molecule-controlplane#20): canvas waits 2-30s for the network forward to fail before getting a 503, and the workspace-server logs only "ProxyA2A forward error" without the "container is dead" signal.

This PR adds a proactive `Provisioner.IsRunning` check in `proxyA2ARequest` between `resolveAgentURL` and `dispatchA2A`, gated on the conditions where we know we're talking to a sibling Docker container we own (`h.provisioner != nil` AND `platformInDocker` AND the URL was rewritten to Docker-DNS form).

Three outcomes via the SSOT helper:

* (true, nil) → forward as today.
* (false, nil) → fast-503 with `error="workspace container not running — restart triggered"`, `restarting=true`, `preflight=true`, plus the same offline-flip + WORKSPACE_OFFLINE broadcast + async restart that `maybeMarkContainerDead` produces.
* (true, err) → fall through to optimistic forward (matches IsRunning's "fail-soft as alive" contract: a flaky daemon must not trigger a restart cascade).

The `preflight=true` flag in the response distinguishes the proactive short-circuit from the reactive `maybeMarkContainerDead` path so canvas or downstream callers can render distinct messages later.

* `internal/handlers/a2a_proxy.go`: preflight call site between `resolveAgentURL` and `dispatchA2A`; gated on `h.provisioner != nil && platformInDocker && url == http://<ContainerName(id)>:port`.
* `internal/handlers/a2a_proxy_helpers.go`: `preflightContainerHealth` helper. Routes through `h.provisioner.IsRunning` (which itself wraps `RunningContainerName`). Identical offline-flip side-effects as `maybeMarkContainerDead` for the dead-container case.
* `internal/handlers/a2a_proxy_preflight_test.go`: 4 tests:
  * running → nil;
  * not-running → structured 503 + sqlmock expectations on the offline-flip + structure_events insert;
  * transient error → nil (fail-soft);
  * AST gate pinning the SSOT routing (mirror of #12's gate).

Mutation-tested: removing the `if running { return nil }` guard makes the production code fail to compile (unused var). A subtler mutation (replacing the `!running` branch with `return nil`) would make `TestPreflight_ContainerNotRunning_StructuredFastFail` fail at runtime with sqlmock's "expected DB call did not occur."

Refs: molecule-core#36. Companion to #12 (issue #10).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Name | ||
|---|---|---|
| artifacts | ||
| buildinfo | ||
| bundle | ||
| channels | ||
| crypto | ||
| db | ||
| envx | ||
| events | ||
| handlers | ||
| imagewatch | ||
| memory | ||
| messagestore | ||
| metrics | ||
| middleware | ||
| models | ||
| orgtoken | ||
| pendinguploads | ||
| plugins | ||
| provisioner | ||
| provlog | ||
| registry | ||
| router | ||
| scheduler | ||
| supervised | ||
| textutil | ||
| ws | ||
| wsauth | ||