forked from molecule-ai/molecule-core
Same SSOT-divergence shape as #10 / fixed in #12, but on the a2a-proxy code path. The plugin handler was routed through `provisioner.RunningContainerName`; a2a-proxy was forwarding optimistically and only catching missing containers REACTIVELY via `maybeMarkContainerDead` after the network call timed out. Result on tenants whose agent containers had been recycled (e.g. post-EC2 replace from molecule-controlplane#20): canvas waits 2-30s for the network forward to fail before getting a 503, and the workspace-server logs only "ProxyA2A forward error" without the "container is dead" signal. This PR adds a proactive `Provisioner.IsRunning` check in `proxyA2ARequest` between `resolveAgentURL` and `dispatchA2A`, gated on the conditions where we know we're talking to a sibling Docker container we own (`h.provisioner != nil` AND `platformInDocker` AND the URL was rewritten to Docker-DNS form). Three outcomes via the SSOT helper: (true, nil) → forward as today (false, nil) → fast-503 with `error="workspace container not running — restart triggered"`, `restarting=true`, `preflight=true`, plus the same offline-flip + WORKSPACE_OFFLINE broadcast + async restart that `maybeMarkContainerDead` produces (true, err) → fall through to optimistic forward (matches IsRunning's "fail-soft as alive" contract — flaky daemon must not trigger a restart cascade) The `preflight=true` flag in the response distinguishes the proactive short-circuit from the reactive `maybeMarkContainerDead` path so canvas or downstream callers can render distinct messages later. * `internal/handlers/a2a_proxy.go` — preflight call site between resolveAgentURL and dispatchA2A; gated on `h.provisioner != nil && platformInDocker && url == http://<ContainerName(id)>:port`. * `internal/handlers/a2a_proxy_helpers.go` — `preflightContainerHealth` helper. Routes through `h.provisioner.IsRunning` (which itself wraps `RunningContainerName`). Identical offline-flip side-effects as `maybeMarkContainerDead` for the dead-container case. * `internal/handlers/a2a_proxy_preflight_test.go` — 4 tests: running → nil; not-running → structured 503 + sqlmock expectations on the offline-flip + structure_events insert; transient error → nil (fail-soft); AST gate pinning the SSOT routing (mirror of #12's gate). Mutation-tested: removing the `if running { return nil }` guard makes the production code fail to compile (unused var). A subtler mutation (replacing the !running branch with `return nil`) would make TestPreflight_ContainerNotRunning_StructuredFastFail fail at runtime with sqlmock's "expected DB call did not occur." Refs: molecule-core#36. Companion to #12 (issue #10). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| cmd | ||
| internal | ||
| migrations | ||
| pkg/provisionhook | ||
| .ci-force | ||
| .gitignore | ||
| .golangci.yaml | ||
| Dockerfile | ||
| Dockerfile.tenant | ||
| entrypoint-tenant.sh | ||
| go.mod | ||
| go.sum | ||