feat(registry): admin endpoint to revoke a workspace's auth tokens (cross-cloud migration fix) #2738
Problem (verified root cause)
Cross-cloud workspace migration (CP `migrate-provider` + CP#672) leaves a stale `workspace_auth_tokens` row, so the migrated container's `/registry/register` 401s forever on SaaS tenants. The workspace serves its agent-card but never re-registers, so its advertised URL never flips to the new box. The migration's health-check only checks "card serves 200", so it falsely reports `completed` and retires the source → self-heal re-provisions on the original cloud.

Chain: source registers → live token in tenant DB. Migration provisions a fresh container with empty `/configs` (CP#672 persists only `/workspace` + `/home/agent/.claude`, not `/configs/.auth_token`). Migrated container registers with no bearer → `requireWorkspaceToken` sees the source's still-live token → 401 (C18 ownership guard, registry.go:413). Nothing revokes it: `sweepStaleTokensWithoutContainer` only runs in single-tenant Docker mode (orphan_sweeper.go safety filter #1), and the CP migrator bypasses the restart pipeline that would revoke (`workspace_restart.go` → `issueAndInjectToken` → `wsauth.RevokeAllForWorkspace`).

(Explains why single-tenant molecules-prod migrations work, since the sweeper runs there, while SaaS-tenant migrations wedge.)
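The chain above can be reproduced with a small in-memory model. This is only a sketch: `tokenStore`, `registerStatus`, and `revokeAll` are hypothetical names standing in for the `workspace_auth_tokens` table, the ownership guard, and a revoke step. The real `requireWorkspaceToken` logic is not shown in this PR description and may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// tokenStore is a hypothetical stand-in for the workspace_auth_tokens table:
// workspace id -> set of live (unrevoked) tokens.
type tokenStore map[string]map[string]bool

// registerStatus models the ownership guard as described in the chain above
// (an assumption, not the real requireWorkspaceToken): a workspace with no
// live token may bootstrap, otherwise the bearer must match a live token.
func (s tokenStore) registerStatus(workspaceID, bearer string) int {
	live := s[workspaceID]
	if len(live) == 0 {
		return http.StatusOK // no live token: registration can bootstrap
	}
	if live[bearer] {
		return http.StatusOK // caller presents a live token
	}
	return http.StatusUnauthorized // a live token exists and the caller lacks it
}

// revokeAll drops every live token for the workspace and reports how many
// were removed. Calling it again is a no-op that returns 0.
func (s tokenStore) revokeAll(workspaceID string) int {
	n := len(s[workspaceID])
	delete(s, workspaceID)
	return n
}

func main() {
	// The source container registered earlier, so one token is live.
	store := tokenStore{"ws-1": {"source-token": true}}

	// The migrated container has an empty /configs, so it sends no bearer.
	fmt.Println("migrated, before revoke:", store.registerStatus("ws-1", ""))

	// Once something revokes the stale token, the same request can bootstrap.
	fmt.Println("tokens revoked:", store.revokeAll("ws-1"))
	fmt.Println("migrated, after revoke:", store.registerStatus("ws-1", ""))
}
```

In this model the migrated container stays wedged until a revoke runs, which matches the observation that only deployments where the sweeper runs recover on their own.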
Change
`POST /admin/workspaces/:id/revoke-auth-tokens` (AdminAuth-gated, in the `wsAdmin` group) → `wsauth.RevokeAllForWorkspace`. Exposes the same revoke the restart pipeline already does, so the CP migrator (which provisions the target out-of-band) can trigger it during cutover. Idempotent: no live tokens → `200` no-op, so the migrator calls it unconditionally.

Tests
4 unit tests (happy path, idempotent no-op, empty-id 400, db-error 500).
`go build ./...`, `go vet`, `gofmt` clean.

Pairs with
CP-side PR: migrator calls this endpoint during cutover + hardens the migration health-gate to require the real URL-flip before retiring the source.
🤖 Generated with Claude Code
APPROVED: reviewed #2738 at head `3bbc846e`.

Correctness/robustness: the new AdminAuth-gated endpoint is wired under the existing wsAdmin group and calls the same wsauth.RevokeAllForWorkspace primitive used by restart token issuance. It is idempotent for zero live rows, returns 400 for an empty id, and surfaces DB failure as 500 so the CP migrator can fail cutover instead of retiring the source into a 401-wedged target.
Security: the route is admin-only and does not expose token material; responses/logs include only the workspace id and generic revoke state/error. SQL is exercised through the existing parameterized revoke helper. Performance/readability: single UPDATE path, small handler, clear tests for happy/no-op/400/500. Required CI is green. /sop-ack
/sop-ack
APPROVED (post-merge verification; PR was already merged when I fetched it). Head `3bbc846e64f729b56e8d536ba79a1a7914dd1284`.

5-axis review: scope is tight: one new admin handler, route wiring under the existing `wsAdmin` group, and focused tests. Behavior is correct for the cross-cloud migration wedge: it calls the existing `wsauth.RevokeAllForWorkspace` helper so the next `/registry/register` can bootstrap when no live token remains, while keeping the operation idempotent for already-revoked/never-registered workspaces. Security boundary is appropriate: the route is admin-auth gated, not exposed through workspace bearer auth; failures return 500 so migrators do not silently retire a source against a still-wedged target. Tests cover happy path, no-live-token idempotency, empty id 400, and DB error 500. No unrelated production surface changed.

Fresh status on the merged head showed only stale gate/SOP contexts; the PR itself is already merged.
/sop-ack