fix: override stale 'stopped' state when health probe confirms gateway alive

When the gateway responds to the health probe but the local
gateway_state.json has a stale 'stopped' state (common in cross-container
setups where the file was written before the gateway restarted), the
dashboard would show 'Running (remote)' but with a 'Stopped' badge.

Now if the HTTP probe succeeded (remote_health_body is not None) and
gateway_state is 'stopped' or None, override it to 'running'. Also
handles the no-shared-volume case where runtime is None entirely.
This commit is contained in:
Hermes Agent 2026-04-14 22:01:02 +00:00 committed by Teknium
parent 6ed682f111
commit 673acf22ae

View File

@ -418,6 +418,17 @@ async def get_status():
if not gateway_running:
gateway_state = gateway_state if gateway_state in ("stopped", "startup_failed") else "stopped"
gateway_platforms = {}
elif gateway_running and remote_health_body is not None:
# The health probe confirmed the gateway is alive, but the local
# runtime status file may be stale (cross-container). Override
# stopped/None state so the dashboard shows the correct badge.
if gateway_state in (None, "stopped"):
gateway_state = "running"
# If there was no runtime info at all but the health probe confirmed alive,
# ensure we still report the gateway as running (no shared volume scenario).
if gateway_running and gateway_state is None and remote_health_body is not None:
gateway_state = "running"
active_sessions = 0
try: