goose donated to Linux Foundation AAIF (alongside MCP + AGENTS.md) — AGENTS.md
standard could become workspace-template interop requirement (GH #733).
awesome-copilot (30k★) is a direct terminology-collision risk: Skills/Plugins/
Agents/Hooks all overlap with Molecule vocab at different meanings (GH #734).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The hibernation PR (5c1a9d0) accidentally removed the delivery_confirmed
fix that was introduced for issue #689. When io.ReadAll fails after the
target has already responded with headers (200-399), the message WAS
delivered — stripping delivery_confirmed from the error response caused
callers to treat a successful send as a hard failure.
Restore the full original body-read error block:
- deliveryConfirmed flag (true when status 200-399)
- log line with status/bytes_read context
- logA2ASuccess call when deliveryConfirmed (audit trail accuracy)
- proxyA2AError.Response includes "delivery_confirmed" field so callers
can distinguish "not delivered" from "delivered, body lost"
The hibernation auto-wake feature (resolveAgentURL status='hibernated'
check) is orthogonal and untouched.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The hibernation PR (7f5f74d) accidentally removed the delivery_confirmed
fix that was introduced for issue #689. When io.ReadAll fails after the
target has already responded with headers (200-399), the message WAS
delivered — stripping delivery_confirmed from the error response caused
callers to treat a successful send as a hard failure.
Restore the full original body-read error block:
- deliveryConfirmed flag (true when status 200-399)
- log line with status/bytes_read context
- logA2ASuccess call when deliveryConfirmed (audit trail accuracy)
- proxyA2AError.Response includes "delivery_confirmed" field so callers
can distinguish "not delivered" from "delivered, body lost"
The hibernation auto-wake feature (resolveAgentURL status='hibernated'
check) is orthogonal and untouched.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blast-radius isolation gap: AdminAuth called ValidateAnyToken which
accepted any live workspace bearer token. A compromised workspace agent
could present its own token to GET /admin/github-installation-token and
steal the platform's GitHub App credential, or hit /approvals/pending to
enumerate cross-workspace approvals.
Fix: introduce a dedicated admin credential tier via ADMIN_TOKEN env var.
When set, AdminAuth verifies the bearer against that secret exclusively
(crypto/subtle constant-time comparison). Workspace tokens are rejected
outright — no DB lookup occurs. When ADMIN_TOKEN is not set the previous
behaviour is preserved as a deprecated backward-compat fallback (tier 3)
so existing deployments without the env var don't break immediately.
Credential tiers (evaluated in order):
1. Fail-open — no live tokens globally (fresh install / pre-Phase-30)
2. ADMIN_TOKEN match — env var set, bearer must equal it exactly
3. Fallback (deprecated) — any valid workspace token (ADMIN_TOKEN unset)
Operators should set ADMIN_TOKEN=<openssl rand -base64 32> to fully close
the blast-radius gap. Tier 3 will be removed in a future release.
Fixes#684.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blast-radius isolation gap: AdminAuth called ValidateAnyToken which
accepted any live workspace bearer token. A compromised workspace agent
could present its own token to GET /admin/github-installation-token and
steal the platform's GitHub App credential, or hit /approvals/pending to
enumerate cross-workspace approvals.
Fix: introduce a dedicated admin credential tier via ADMIN_TOKEN env var.
When set, AdminAuth verifies the bearer against that secret exclusively
(crypto/subtle constant-time comparison). Workspace tokens are rejected
outright — no DB lookup occurs. When ADMIN_TOKEN is not set the previous
behaviour is preserved as a deprecated backward-compat fallback (tier 3)
so existing deployments without the env var don't break immediately.
Credential tiers (evaluated in order):
1. Fail-open — no live tokens globally (fresh install / pre-Phase-30)
2. ADMIN_TOKEN match — env var set, bearer must equal it exactly
3. Fallback (deprecated) — any valid workspace token (ADMIN_TOKEN unset)
Operators should set ADMIN_TOKEN=<openssl rand -base64 32> to fully close
the blast-radius gap. Tier 3 will be removed in a future release.
Fixes#684.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs caused enabled schedules to silently disappear from the fire query
(which requires next_run_at IS NOT NULL AND next_run_at <= now()):
Bug 1 - fireSchedule() and recordSkipped(): when ComputeNextRun returned an
error, nextRunPtr stayed nil and UPDATE SET next_run_at = $2 wrote NULL.
Fix: change to COALESCE($2, next_run_at) so the existing DB value is preserved
when $2 is NULL, and log the error explicitly.
Bug 2 - org importer (handlers/org.go): nextRun, _ := ComputeNextRun(...)
silently discarded the error. A bad cron expression would pass time.Time{}
(zero value) to the INSERT. Fix: surface the error, log it, and skip the
schedule INSERT via continue.
Bug 3 - no startup repair: schedules already NULL'd by the pre-fix binary
would never recover. Fix: Start() now calls repairNullNextRunAt() once on
boot, recomputing next_run_at for every enabled schedule with a NULL value.
Tests: TestFireSchedule_ComputeNextRunError, TestRecordSkipped_ComputeNextRunError,
TestRepairNullNextRunAt_RepairsRows, TestRepairNullNextRunAt_DBError_NoPanic,
TestOrgImport_ScheduleComputeError (all pass).
Fixes#722
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs caused enabled schedules to silently disappear from the fire query
(which requires next_run_at IS NOT NULL AND next_run_at <= now()):
Bug 1 - fireSchedule() and recordSkipped(): when ComputeNextRun returned an
error, nextRunPtr stayed nil and UPDATE SET next_run_at = $2 wrote NULL.
Fix: change to COALESCE($2, next_run_at) so the existing DB value is preserved
when $2 is NULL, and log the error explicitly.
Bug 2 - org importer (handlers/org.go): nextRun, _ := ComputeNextRun(...)
silently discarded the error. A bad cron expression would pass time.Time{}
(zero value) to the INSERT. Fix: surface the error, log it, and skip the
schedule INSERT via continue.
Bug 3 - no startup repair: schedules already NULL'd by the pre-fix binary
would never recover. Fix: Start() now calls repairNullNextRunAt() once on
boot, recomputing next_run_at for every enabled schedule with a NULL value.
Tests: TestFireSchedule_ComputeNextRunError, TestRecordSkipped_ComputeNextRunError,
TestRepairNullNextRunAt_RepairsRows, TestRepairNullNextRunAt_DBError_NoPanic,
TestOrgImport_ScheduleComputeError (all pass).
Fixes#722
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements automatic workspace hibernation for workspaces that have been idle
longer than their configured hibernation_idle_minutes threshold.
Changes:
- migrations/029: Add hibernation_idle_minutes INT DEFAULT NULL column +
partial index on workspaces table
- registry/hibernation.go: New StartHibernationMonitor goroutine that ticks
every 2 min and calls hibernateIdleWorkspaces via the HibernateHandler
callback (same import-cycle-prevention pattern as OfflineHandler)
- registry/hibernation_test.go: 5 unit tests covering handler calls, no-rows,
DB error, tick behaviour, and context-cancel shutdown
- handlers/workspace_restart.go: New Hibernate() HTTP handler (POST
/workspaces/:id/hibernate) + HibernateWorkspace(ctx, id) method — stops
container, sets status='hibernated', clears Redis keys, broadcasts event
- handlers/a2a_proxy.go: Auto-wake in resolveAgentURL — when status='hibernated'
and URL is empty, triggers async RestartByID and returns 503 + Retry-After: 15
so callers can retry transparently
- registry/liveness.go: Exclude 'hibernated' workspaces from offline detection
- router.go: Register POST /workspaces/:id/hibernate under wsAuth group
- cmd/server/main.go: Wire hibernation monitor via supervised.RunWithRecover
Closes#711
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements automatic workspace hibernation for workspaces that have been idle
longer than their configured hibernation_idle_minutes threshold.
Changes:
- migrations/029: Add hibernation_idle_minutes INT DEFAULT NULL column +
partial index on workspaces table
- registry/hibernation.go: New StartHibernationMonitor goroutine that ticks
every 2 min and calls hibernateIdleWorkspaces via the HibernateHandler
callback (same import-cycle-prevention pattern as OfflineHandler)
- registry/hibernation_test.go: 5 unit tests covering handler calls, no-rows,
DB error, tick behaviour, and context-cancel shutdown
- handlers/workspace_restart.go: New Hibernate() HTTP handler (POST
/workspaces/:id/hibernate) + HibernateWorkspace(ctx, id) method — stops
container, sets status='hibernated', clears Redis keys, broadcasts event
- handlers/a2a_proxy.go: Auto-wake in resolveAgentURL — when status='hibernated'
and URL is empty, triggers async RestartByID and returns 503 + Retry-After: 15
so callers can retry transparently
- registry/liveness.go: Exclude 'hibernated' workspaces from offline detection
- router.go: Register POST /workspaces/:id/hibernate under wsAuth group
- cmd/server/main.go: Wire hibernation monitor via supervised.RunWithRecover
Closes#711
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defense-in-depth: workspace-scoped ValidateToken now rejects tokens
belonging to workspaces with status='removed' at the DB layer, even
when revoked_at IS NULL. Mirrors the same guard added to ValidateAnyToken
in #696. Updated all test mock patterns (workspace_test, a2a_proxy_test,
secrets_test, admin_test_token_test, middleware) to match the new JOIN query.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defense-in-depth: workspace-scoped ValidateToken now rejects tokens
belonging to workspaces with status='removed' at the DB layer, even
when revoked_at IS NULL. Mirrors the same guard added to ValidateAnyToken
in #696. Updated all test mock patterns (workspace_test, a2a_proxy_test,
secrets_test, admin_test_token_test, middleware) to match the new JOIN query.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ValidateAnyToken: add JOIN on workspaces with AND w.status != 'removed'
so tokens belonging to deleted workspaces cannot be replayed against
admin endpoints even before the token row is explicitly revoked.
- tokens_test.go: update ValidateAnyToken regexp patterns to match new
JOIN query; add TestValidateAnyToken_RemovedWorkspaceRejected.
- wsauth_middleware_test.go: update validateAnyTokenSelectQuery constant
to match JOIN query; add TestAdminAuth_RemovedWorkspaceToken_Returns401
to pin the AdminAuth removed-workspace rejection at the middleware layer.
- ADR-001: restore full blast-radius endpoint table (15 affected admin
routes), explicit risk statement ("full platform takeover"), current
mitigations, and Phase-H remediation plan (schema, middleware, bootstrap
flow, migration path). Tracking issue: #710.
- ValidateAnyToken: add JOIN on workspaces with AND w.status != 'removed'
so tokens belonging to deleted workspaces cannot be replayed against
admin endpoints even before the token row is explicitly revoked.
- tokens_test.go: update ValidateAnyToken regexp patterns to match new
JOIN query; add TestValidateAnyToken_RemovedWorkspaceRejected.
- wsauth_middleware_test.go: update validateAnyTokenSelectQuery constant
to match JOIN query; add TestAdminAuth_RemovedWorkspaceToken_Returns401
to pin the AdminAuth removed-workspace rejection at the middleware layer.
- ADR-001: restore full blast-radius endpoint table (15 affected admin
routes), explicit risk statement ("full platform takeover"), current
mitigations, and Phase-H remediation plan (schema, middleware, bootstrap
flow, migration path). Tracking issue: #710.
#612 added AdminAuth to GET /admin/workspaces/:id/test-token, breaking
the chicken-and-egg bootstrap that E2E tests rely on:
1. POST /workspaces creates first workspace (fail-open, no tokens)
2. Provision generates a workspace auth token → inserts into DB
3. AdminAuth now sees a live token → requires auth on ALL routes
4. E2E calls test-token to get its first admin bearer → 401
5. All subsequent E2E calls fail → EVERY open PR CI blocked
The test-token handler already has its own production guard
(TestTokensEnabled returns false when MOLECULE_ENV=prod). That's
sufficient — AdminAuth was defence-in-depth but broke the only
bootstrap path in dev/CI environments.
This has been blocking CI for 6+ cycles, stalling 4 PRs (#650,
#651, #696, #701) and masking as 'flaky E2E Postgres timeout'
until root-cause analysis this cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#612 added AdminAuth to GET /admin/workspaces/:id/test-token, breaking
the chicken-and-egg bootstrap that E2E tests rely on:
1. POST /workspaces creates first workspace (fail-open, no tokens)
2. Provision generates a workspace auth token → inserts into DB
3. AdminAuth now sees a live token → requires auth on ALL routes
4. E2E calls test-token to get its first admin bearer → 401
5. All subsequent E2E calls fail → EVERY open PR CI blocked
The test-token handler already has its own production guard
(TestTokensEnabled returns false when MOLECULE_ENV=prod). That's
sufficient — AdminAuth was defence-in-depth but broke the only
bootstrap path in dev/CI environments.
This has been blocking CI for 6+ cycles, stalling 4 PRs (#650,
#651, #696, #701) and masking as 'flaky E2E Postgres timeout'
until root-cause analysis this cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two targeted fixes for the A2A false-negative (delivery succeeded but caller
receives A2A_ERROR):
Body-read failure: when Do() succeeds (target sent 2xx headers — delivery
confirmed) but io.ReadAll(resp.Body) fails, proxy now returns
{"delivery_confirmed": true} in the 502 body and logs the activity as
successful. Audit trail records true delivery, not a false failed entry.
isTransientProxyError fix: delegation retry loop now only retries 503s with
{restarting: true} (container died, message NOT delivered). 503 {busy: true}
signals the agent IS processing the delivered message — retrying causes
double-delivery. Fix prevents the double-delivery race.
All 16 packages pass: go test ./...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two targeted fixes for the A2A false-negative (delivery succeeded but caller
receives A2A_ERROR):
Body-read failure: when Do() succeeds (target sent 2xx headers — delivery
confirmed) but io.ReadAll(resp.Body) fails, proxy now returns
{"delivery_confirmed": true} in the 502 body and logs the activity as
successful. Audit trail records true delivery, not a false failed entry.
isTransientProxyError fix: delegation retry loop now only retries 503s with
{restarting: true} (container died, message NOT delivered). 503 {busy: true}
signals the agent IS processing the delivered message — retrying causes
double-delivery. Fix prevents the double-delivery race.
All 16 packages pass: go test ./...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>