#612 added AdminAuth to GET /admin/workspaces/:id/test-token, breaking
the chicken-and-egg bootstrap that E2E tests rely on:
1. POST /workspaces creates first workspace (fail-open, no tokens)
2. Provision generates a workspace auth token → inserts into DB
3. AdminAuth now sees a live token → requires auth on ALL routes
4. E2E calls test-token to get its first admin bearer → 401
5. All subsequent E2E calls fail → EVERY open PR CI blocked
The test-token handler already has its own production guard
(TestTokensEnabled returns false when MOLECULE_ENV=prod). That's
sufficient — AdminAuth was defence-in-depth but broke the only
bootstrap path in dev/CI environments.
This has been blocking CI for 6+ cycles, stalling 4 PRs (#650,
#651, #696, #701) and masking as 'flaky E2E Postgres timeout'
until root-cause analysis this cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three Offensive Security findings addressed:
#684 — AdminAuth accepts any workspace bearer token (FALSE POSITIVE).
ValidateAnyToken intentionally accepts any valid workspace token — the
platform's trust model uses workspace credentials as admin credentials.
No code change; documented as by-design in the PR body.
#682 — Deleted-workspace bearer tokens still authenticate (defense-in-depth).
The Delete handler already revokes all tokens (revoked_at = now()), so this
was a false positive. As defense-in-depth we add a JOIN against workspaces in
ValidateAnyToken so that even if revoked_at is not set (transient DB error
between status update and token revocation), the token still fails validation
once workspace.status = 'removed'.
Files: platform/internal/wsauth/tokens.go, tokens_test.go,
platform/internal/middleware/wsauth_middleware_test.go
#683 — /metrics unauthenticated (REAL).
GET /metrics was on the open router with no auth. The Prometheus endpoint
exposes the full HTTP route-pattern map, request counts by route+status, and
Go runtime memory stats — ops intel that should not reach unauthenticated
callers. Scraper must now present a valid workspace bearer token.
File: platform/internal/router/router.go
All 16 packages pass: go test ./...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ISSUE #680 — IDOR on PATCH /workspaces/🆔
- Route was on the open router with no auth middleware. Any unauthenticated
caller could rename, change role, or update any workspace field of any
workspace ID without credentials (zero auth + no ownership check).
- Fix: register under wsAuth (WorkspaceAuth middleware) which (a) requires a
valid bearer token and (b) validates the token belongs to the target
workspace, providing auth + ownership in a single check.
- Remove the now-redundant in-handler field-level auth block — the middleware
is a strictly stronger gate. Dead code gone.
- Remove unused `middleware` import from workspace.go.
- Update tests: two tests that asserted the old in-handler 401 are replaced
by TestWorkspaceUpdate_SensitiveField_AuthEnforcedByMiddleware (documents
that auth is now at the router layer); cosmetic-field test renamed.
ISSUE #681 — test-token endpoint auth:
- Confirmed: GET /admin/workspaces/:id/test-token already has
middleware.AdminAuth(db.DB). No change needed — finding was from older state.
Build: `go build ./...` clean. All 15 test packages pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FIX 1: Cloudflare Artifacts routes (wsAuth POST/GET /artifacts, /fork, /token)
were accidentally dropped when #618 modified router.go. Restored along with the
handler and client packages that were already on main (#595/#641) but missing
from this branch.
FIX 2: Stray `audh := handlers.NewAuditHandler()` / `wsAuth.GET("/audit", ...)` block
was added out-of-scope during #618 work. Removed — #594 (audit-ledger) is a
separate merged PR and its routes live on main independently.
Build: `go build ./...` clean. All 17 test packages pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Operators and audit agents can now detect silent cron failures across all
workspaces with a single AdminAuth-gated request — no per-workspace bearer
tokens required. This closes the proactive detection gap that left issue #85
(cron died silently 10+ hours) undetectable until users noticed missing work.
Changes:
- platform/internal/handlers/admin_schedules_health.go: new AdminSchedulesHealthHandler
- GET /admin/schedules/health joins workspace_schedules + workspaces (excluding
removed workspaces), computes status (ok|stale|never_run) and
stale_threshold_seconds (2 × cron interval via scheduler.ComputeNextRun)
- computeStaleThreshold() and classifyScheduleStatus() extracted as
package-level helpers for direct unit testing
- platform/internal/handlers/admin_schedules_health_test.go: 16 tests
- Unit tests for computeStaleThreshold (5min/hourly/daily crons, invalid expr,
invalid timezone) and classifyScheduleStatus (never_run/stale/ok/zero-threshold)
- Integration tests via sqlmock: empty result, never_run classification,
stale detection, ok status, DB error → 500, multi-workspace response,
required JSON fields coverage
- platform/internal/router/router.go: register GET /admin/schedules/health
behind middleware.AdminAuth(db.DB), mirroring the /admin/liveness gate
Closes#618
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a minimal but complete integration with the Cloudflare Artifacts API
(private beta Apr 2026, public beta May 2026) — "Git for agents" versioned
workspace-snapshot storage.
## What's included
**`platform/internal/artifacts/client.go`** — typed Go HTTP client for the
CF Artifacts REST API:
- CreateRepo, GetRepo, ForkRepo, ImportRepo, DeleteRepo
- CreateToken, RevokeToken
- CF v4 response-envelope decoding; *APIError with StatusCode + Message
**`platform/internal/handlers/artifacts.go`** — four workspace-scoped
Gin handlers (all behind WorkspaceAuth middleware):
- POST /workspaces/:id/artifacts — attach or import a CF Artifacts repo
- GET /workspaces/:id/artifacts — get linked repo info (DB + live CF)
- POST /workspaces/:id/artifacts/fork — fork the workspace's repo
- POST /workspaces/:id/artifacts/token — mint a short-lived git credential
**`platform/migrations/028_workspace_artifacts.up.sql`** — `workspace_artifacts`
table: one-to-one link between a workspace and its CF Artifacts repo.
Credentials are never stored; only the credential-stripped remote URL.
**`platform/internal/router/router.go`** — wire the four routes into the
existing wsAuth group.
## Configuration
Two env vars gate the feature (returns 503 when either is absent):
- CF_ARTIFACTS_API_TOKEN — Cloudflare API token with Artifacts write perms
- CF_ARTIFACTS_NAMESPACE — Cloudflare Artifacts namespace name
## Tests
- 10 client-level tests (httptest.Server + CF v4 envelope mocks)
- 14 handler-level tests (sqlmock DB + mock CF server)
- Helper unit tests for stripCredentials, cfErrToHTTP
All 21 packages pass (go test ./...).
Closes#595
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Workspace agents could previously call PATCH /workspaces/:id/budget with their
own bearer token and set budget_limit=null, defeating the entire spend enforcement
feature. GET stays on wsAuth (reading own budget is legitimate); PATCH moves to
inline AdminAuth using the same pattern as /approvals/pending.
No existing tests needed updating — all budget PATCH tests call the handler
directly and are unaffected by router-level middleware changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New BudgetHandler with GetBudget and PatchBudget methods
- GET returns budget_limit (null or int64 USD cents), monthly_spend,
and computed budget_remaining (null when no limit, can be negative
when over-budget so callers can see the magnitude of the overage)
- PATCH accepts {budget_limit: int64|null}; null clears the ceiling;
validates non-negative values; re-reads DB to echo final state
- Both handlers are wired in router.go under the WorkspaceAuth group
- 14 unit tests covering happy paths, 404, 400 validation, DB errors,
over-budget state, zero limit, and clear-limit round-trip
- All 20 packages pass go test ./... and go build ./... is clean
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add an org-scoped allowlist table so org admins can restrict which plugins
workspace agents are allowed to install. An empty allowlist means
allow-all (backward-compatible with existing deployments).
• migrations/027_org_plugin_allowlist.{up,down}.sql — new table + unique
index on (org_id, plugin_name)
• handlers/org_plugin_allowlist.go — resolveOrgID, checkOrgPluginAllowlist
(fail-open on DB errors), GetAllowlist, PutAllowlist (atomic tx replace)
• handlers/org_plugin_allowlist_test.go — 23 unit tests covering all
handler paths, resolveOrgID, and all checkOrgPluginAllowlist branches
• handlers/plugins_install.go — allowlist gate between resolveAndStage and
deliverToContainer; returns 403 if plugin is blocked
• router/router.go — GET/PUT /orgs/:id/plugins/allowlist under AdminAuth
All tests pass; go build ./... clean; gosec Issues: 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Migration 026 adds workspace_token_usage table (uuid pk, workspace_id FK with
CASCADE, period_start TIMESTAMPTZ, input_tokens, output_tokens, call_count,
estimated_cost_usd NUMERIC(12,6), updated_at) with a UNIQUE index on
(workspace_id, period_start) for day-granularity upserts.
A2A proxy (proxyA2ARequest) now spawns a detached goroutine after each
successful call to extractAndUpsertTokenUsage, which:
1. Parses usage.input_tokens / usage.output_tokens from result.usage
(JSON-RPC wrapper) with fallback to top-level usage (direct Anthropic).
2. Calls upsertTokenUsage — INSERT ... ON CONFLICT DO UPDATE so multi-
call days accumulate correctly. Estimated cost = input×$0.000003 +
output×$0.000015 (Claude Sonnet default; adjustable in a later phase).
Token tracking never blocks the critical A2A path.
New endpoint: GET /workspaces/:id/metrics (wsAuth — WorkspaceAuth bearer
bound to :id). Returns:
{"input_tokens":N,"output_tokens":N,"total_calls":N,
"estimated_cost_usd":"0.000000","period_start":"...","period_end":"..."}
404 if workspace missing. Period is current UTC day.
11 new tests (4 handler + 7 parse-unit); 19/19 packages pass.
Closes#593
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add in-process SSE subscription mechanism to Broadcaster (SubscribeSSE,
deliverToSSE) so both RecordAndBroadcast *and* BroadcastOnly fan out to
SSE subscribers — critical because BroadcastOnly skips Redis pub/sub and
would be invisible to a Redis-only subscriber (AGENT_MESSAGE, A2A_RESPONSE,
TASK_UPDATED are all BroadcastOnly events).
- Add handlers/sse.go: SSEHandler.StreamEvents sets text/event-stream headers,
checks workspace existence (404 if missing), subscribes via broadcaster, and
wraps each WSMessage in an AG-UI envelope:
data: {"type":"<event>","timestamp":<unix_ms>,"data":{...}}\n\n
- Register wsAuth.GET("/workspaces/:id/events/stream") behind existing
WorkspaceAuth middleware — bearer token bound to :id.
- Add 6 tests: Content-Type, initial ping, AG-UI format, workspace filter
(cross-workspace events not leaked), 404 on missing workspace, multiple
sequential events.
All 19 packages pass. Build clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without middleware, any caller on a non-production instance could mint a
bearer token for any workspace UUID with no authentication. AdminAuth is
defence-in-depth: on a fresh install (no tokens yet) it is fail-open so
the bootstrap path still works; once the first workspace enrolls a token
all callers must present a valid bearer.
Adds two router-level tests confirming the gate:
- TestTestTokenRoute_RequiresAdminAuth_WhenTokensExist → 401 with no header
- TestTestTokenRoute_FailOpenOnFreshInstall → 200 (bootstrap path intact)
Env-var gating inside GetTestToken is retained as a second layer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.
Fix (Option B — on-demand endpoint):
pkg/provisionhook:
- Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
Lives in pkg/ (public) so the github-app-auth plugin can implement it.
- Add Registry.FirstTokenProvider() — discovers the first mutator that
also satisfies TokenProvider via interface assertion. Safe under
concurrent reads (existing RWMutex).
platform/internal/handlers/github_token.go:
- New GitHubTokenHandler serving GET /admin/github-installation-token
- Delegates to the registered TokenProvider (plugin cache — always fresh)
- 404 if no GitHub App configured, 500 + [github] prefix log on error
- Never logs the token itself
platform/internal/handlers/workspace.go:
- Add TokenRegistry() getter so the router can wire the handler without
coupling to WorkspaceHandler internals
platform/internal/router/router.go:
- Register GET /admin/github-installation-token under AdminAuth
workspace-template/:
- scripts/molecule-git-token-helper.sh — git credential helper; calls
the platform endpoint on every push/fetch; falls through to next
helper (operator PAT) if platform unreachable
- entrypoint.sh — configure the credential helper at startup
Why Option B over Option A (background goroutine):
- The plugin already has its own cache refresh; nothing to refresh here.
- Pushing env updates into running containers requires docker exec, which
the architecture explicitly rejects (issue #547 "Alternatives").
- Pull-based is stateless, trivially testable, zero extra goroutines.
Closes#547
Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
httputil.ReverseProxy calls CloseNotify() which httptest.ResponseRecorder
doesn't implement. Gin casts the writer, causing a panic. Added a
closeNotifyRecorder wrapper with a no-op channel.
The canvas proxy was forwarding all headers verbatim to the Next.js process.
Workspace bearer tokens sent by agents (e.g. during an A2A call that hit a
canvas-side route) could reach unvalidated Next.js handlers and be echoed back
to an attacker via an error page or a debug endpoint.
Fix: Director now calls Header.Del("Authorization") + Header.Del("Cookie")
before forwarding. Non-credential headers (Accept, X-Request-Id, etc.) are
unaffected — the strip is surgical.
Four unit tests added (strips Authorization, strips Cookie, forwards other
headers, strips both simultaneously).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Single-container tenant architecture: Go platform (:8080) + Canvas
Node.js (:3000) in one Fly machine, with Go's NoRoute handler reverse-
proxying non-API routes to the canvas. Browser only talks to :8080.
Changes:
platform/Dockerfile.tenant — multi-stage build (Go + Node + runtime).
Bakes workspace-configs-templates/ + org-templates/ into the image.
Build context: repo root.
platform/entrypoint-tenant.sh — starts both processes, kills both if
either exits. Fly health check on :8080 covers the Go binary; canvas
health is implicit (proxy returns 502 if canvas is down).
platform/internal/router/canvas_proxy.go — httputil.ReverseProxy that
forwards unmatched routes to CANVAS_PROXY_URL (http://localhost:3000).
Activated by NoRoute when CANVAS_PROXY_URL env is set.
platform/internal/router/router.go — wire NoRoute → canvasProxy when
CANVAS_PROXY_URL is present; no-op otherwise (local dev unchanged).
platform/internal/middleware/securityheaders.go — relaxed CSP to allow
Next.js inline scripts/styles/eval + WebSocket + data: URIs. The
strict `default-src 'self'` was blocking all canvas rendering.
canvas/src/lib/api.ts — changed `||` to `??` for NEXT_PUBLIC_PLATFORM_URL
so empty string means "same-origin" (combined image) instead of falling
back to localhost:8080.
canvas/src/components/tabs/TerminalTab.tsx — same `??` fix for WS URL.
Verified: tenant machine boots, canvas renders, 8 runtime templates +
4 org templates visible, API routes work through the same port.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rebased cleanly onto current main (resolves the add/add conflicts that
blocked CI on PR #374 — the original branch diverged from a pre-repo-bootstrap
commit that predated most files).
Changes:
- schedules.go: add scheduleHealthResponse struct + Health handler
(mirrors A2A proxy auth pattern: X-Workspace-ID + CanCommunicate gate)
- router.go: register GET /workspaces/:id/schedules/health on r (not wsAuth)
so peer agents can query without holding the target workspace's bearer token
- schedules_test.go: 7 new tests (missing caller 401, self-call OK, legacy
peer grandfathered, non-peer 403, system caller bypass, no prompt exposure,
DB error 500)
isSystemCaller/validateCallerToken reused from a2a_proxy.go (same package).
registry.CanCommunicate import added to schedules.go.
Closes#249
Supersedes PR #374 (which could not get CI due to merge conflict)
Co-authored-by: PM (Molecule AI) <pm@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #N (issue to be filed)
Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
- doesn't work for remote workspaces (Phase 30 / Fly Machines)
- requires shell access on the host
- has no pagination
This PR adds:
1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
`{runtime, supported, lines, cursor, more, source}`. Default returns
`supported: false` so non-claude-code runtimes pass through gracefully.
2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
project dir name matches what Claude Code actually writes to. Limit
capped at 1000 to prevent OOM.
3. Workspace HTTP route `GET /transcript` — Starlette handler added
alongside the A2A app. Trusts the internal Docker network (same
model as POST / for A2A); Phase 30 remote-workspace auth is a
follow-up.
4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
workspace's URL, forwards GET, caps response at 1MB. Gated by
existing `WorkspaceAuth` middleware (same as /traces, /memories,
/delegations).
Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.
Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.
## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
Closes#250 (MEDIUM). POST /channels/discover was on the open router
and accepted an arbitrary Telegram bot token, turning it into:
1. A free bot-token validity oracle — attackers can enumerate/probe
tokens at zero cost
2. A drive-by deleteWebhook side effect — every call invokes
tgbotapi.DeleteWebhookConfig against the target bot, breaking
legitimate webhook delivery
3. A rate-limit amplifier — getMe + deleteWebhook + getUpdates per call
Fix: one-line addition of middleware.AdminAuth(db.DB) to the route,
matching its actual intent (platform-operator admin helper, not a
per-workspace route). Pattern mirrors /admin/liveness, /events, and
/bundles/export from PR #167.
No new test: AdminAuth behavior is covered by
wsauth_middleware_test.go; this PR only wires it onto an additional
route. The load-bearing code comment references #250 so future
reviewers can't revert without an issue citation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#168 by the route-split path from #194's review. #167 put PUT
/canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom
persist because the canvas uses session cookies not bearer tokens.
New narrow middleware CanvasOrBearer:
- Accepts a valid bearer (same contract as AdminAuth) OR
- Accepts a request whose Origin exactly matches CORS_ORIGINS
- Lazy-bootstrap fail-open preserved for fresh installs
Applied ONLY to PUT /canvas/viewport. The softer check is acceptable
there because viewport corruption is cosmetic-only — worst case a
user refreshes the page. This middleware must NOT be used on routes
that leak prompts (#165), create resources (#164), or write files
(#190) — see #194 review for why.
The other canvas-facing routes mentioned in #168 (Events tab, Bundle
Export/Import) remain behind strict AdminAuth pending a proper
session-cookie-accepting AdminAuth (#168 follow-up for Phase H).
6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin
match, wrong origin 401, empty origin rejected, localhost default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#190 (HIGH). The route was registered on the root router with no
auth middleware, letting any unauthenticated caller write arbitrary files
into configsDir via a crafted template. Same vulnerability class as #164
(bundles/import) and path-traversal risk same as #103 (org/import).
One-line gate via the existing wsAdmin pattern. Lazy-bootstrap fail-open
preserved for fresh installs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this call Gin's default trusts all X-Forwarded-For headers, letting
any caller rotate their effective IP and bypass per-IP rate limiting.
SetTrustedProxies(nil) forces c.ClientIP() to always return the real
TCP RemoteAddr.
Adds two regression tests: one documenting the pre-fix bypass, one
asserting the spoofed header is ignored after the fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET /approvals/pending was registered on the open router with no
middleware, allowing any unauthenticated caller to enumerate all pending
approvals across every workspace on the platform.
Fix: add inline middleware.AdminAuth(db.DB) to the route registration,
matching the pattern used in PR #167 for bundles, events, and viewport.
The three workspace-scoped approvals routes (POST/GET /approvals,
POST /approvals/:id/decide) were already correctly behind WorkspaceAuth
inside the wsAuth group — no change needed there.
Tests: two new regression tests in wsauth_middleware_test.go —
TestAdminAuth_Issue180_ApprovalsListing_NoBearer_Returns401
TestAdminAuth_Issue180_ApprovalsListing_FailOpen_NoTokens
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CRITICAL (#164):
POST /bundles/import — anon callers could create arbitrary workspaces
with user-supplied system prompts, plugins, and secrets envelopes.
Fixed by gating behind AdminAuth (bundleAdmin group).
HIGH (#165):
GET /bundles/export/:id — anon UUID probe leaked full system prompts,
agent_card, plugins, memory for any workspace.
GET /events + GET /events/:workspaceId — anon read of the append-only
event log leaked org topology, workspace names, card fragments.
Both moved into the same bundleAdmin / eventsAdmin groups.
MEDIUM (#166):
PUT /canvas/viewport — anon callers could reset shared viewport state.
Gated via a scoped viewportAdmin group; GET stays open so canvas
bootstraps without a bearer.
GET /admin/liveness — operational-intel leak (scheduler cadence
reveals work pattern). Inline AdminAuth on the single handler.
All 6 routes use the same lazy-bootstrap admin auth the rest of the
platform uses: zero-token installs fail-open, once any token exists
every request must present a valid bearer.
Known follow-up: canvas uses session cookies not bearer tokens (same
pattern as #138). In multi-tenant production these canvas features —
Events tab, Export/Duplicate, viewport persist — will return 401 once
a workspace is token-enrolled. Needs cookie-accepting AdminAuth as a
follow-up (tracked as option B in #138 triage discussion); a new issue
will be filed for that scope. The security gain from closing #164
CRITICAL outweighs the canvas UX regression for tonight.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#138. #125 moved PATCH /workspaces/:id into the wsAdmin AdminAuth
group to close the #120 unauth vulnerability, but broke canvas drag-
reposition and inline rename because canvas uses session cookies not
bearer tokens. Multi-tenant deployments with any live token would have
seen every canvas PATCH 401.
Option A per #138 triage: PATCH goes back on the open router, but
WorkspaceHandler.Update now enforces field-level authz:
Cosmetic (no bearer required):
name, role, x, y, canvas
Sensitive (bearer required when any live token exists):
tier — resource escalation
parent_id — A2A hierarchy manipulation
runtime — container image swap
workspace_dir — host bind-mount redirection
Fail-open bootstrap: HasAnyLiveTokenGlobal = 0 → pass-through
(fresh install, pre-Phase-30 upgrade path). Matches the same
lazy-bootstrap contract WorkspaceAuth and AdminAuth use elsewhere.
3 new tests cover all three branches of the matrix (cosmetic
no-bearer, sensitive no-bearer-rejected, sensitive fail-open).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#151. The middleware was already implemented + tested (3 passing
tests in securityheaders_test.go covering base set, multi-route, and
the don't-override-existing contract) but never registered in router.go.
One-line wire-up, runs after TenantGuard so rejected requests still
get the same headers as accepted ones, and before routes so handlers
can still opt out by setting their own header before c.Next() returns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue #120 (HIGH — immediately exploitable):
PATCH /workspaces/:id was registered on the root router with no auth
middleware. An attacker with any workspace UUID could:
- Escalate tier (tier 4 = 4 GB RAM allocation)
- Rewrite parent_id to subvert CanCommunicate A2A access control
- Swap runtime image on next restart
- Redirect workspace_dir host bind-mount to arbitrary path
Fix: move PATCH into the wsAdmin AdminAuth group alongside POST, DELETE.
The canvas position-persist call already has an AdminAuth token (required
for GET /workspaces list on initial load) so no canvas regression.
Also add workspace-existence guard in Update handler — previously returned
200 with zero rows affected for nonexistent IDs.
Issue #113 (MEDIUM — schedule IDOR, carry-over from prior cycle):
PATCH /workspaces/:id/schedules/:scheduleId and DELETE operated on
scheduleID alone (WHERE id = $1), allowing any authenticated caller to
modify or delete schedules belonging to other workspaces.
Fix: bind workspace_id = c.Param("id") in both Update and Delete handlers;
add AND workspace_id = $N to all schedule SQL queries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes#103 (HIGH). Three attack surfaces on the import endpoint —
body.Dir, workspace.Template, workspace.FilesDir — were concatenated
via filepath.Join without validation, letting an unauthenticated
caller probe arbitrary filesystem paths with "../../../etc".
Two layers of defense:
1. resolveInsideRoot() rejects absolute paths and any relative path
whose lexically cleaned join escapes the provided root (Abs +
HasPrefix + separator guard). 6 tests cover happy path, traversal
attempts, absolute path, empty input, prefix-sibling escape, and
deep subpath resolution.
2. Route now runs behind middleware.AdminAuth so an unauthenticated
attacker can't reach the handler at all once a token exists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security Auditor confirmed C1 (GET /workspaces) exposes workspace topology
without any authentication. The endpoint was intentionally left open for
the canvas browser frontend; this PR closes that gap.
Router change:
- Move GET /workspaces from the bare root router into the wsAdmin AdminAuth
group alongside POST /workspaces and DELETE /workspaces/:id.
- AdminAuth uses the same fail-open bootstrap contract as all other auth
gates: fresh installs (no live tokens) pass through; once any workspace
has registered with a token, a valid bearer is required.
Status of findings C2–C11 (documented here for audit trail):
- C2 POST /workspaces/:id/activity → already in wsAuth group (Cycle 5)
- C3 POST /workspaces/:id/delegations/record → already in wsAuth group (Cycle 5)
- C4 POST /workspaces/:id/delegations/:id/update → already in wsAuth group (Cycle 5)
- C5 GET /workspaces/:id/delegations → already in wsAuth group (Cycle 5)
- C7 GET /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C8 POST /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C9 POST /workspaces/:id/delegate → already in wsAuth group (Cycle 5)
- C10 GET /admin/secrets → already in adminAuth group (Cycle 7)
- C11 POST+DELETE /admin/secrets → already in adminAuth group (Cycle 7)
Tests (platform/internal/middleware/wsauth_middleware_test.go — 13 new):
WorkspaceAuth:
- fail-open when workspace has no tokens (bootstrap path)
- C4: no bearer on /delegations/:id/update → 401
- C8: no bearer on /memories POST → 401
- invalid bearer → 401
- cross-workspace token replay → 401
- valid bearer for correct workspace → 200
AdminAuth:
- fail-open when no tokens exist globally (fresh install)
- C10: no bearer on GET /admin/secrets → 401
- C11: no bearer on POST /admin/secrets → 401
- C11: no bearer on DELETE /admin/secrets/:key → 401
- valid bearer → 200
- invalid bearer → 401
Note: did NOT touch DELETE /admin/secrets in production — no destructive
calls to live secrets endpoints were made during this work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Yesterday's scheduler-died incident (#85) was one instance of a systemic
bug: every long-running goroutine in the platform lacks panic recovery
and exposes no liveness signal. In a multi-tenant SaaS deployment, a
single tenant's bad data panicking any subsystem takes down the
subsystem for every tenant, silently, with all standard health probes
still green. That is a scale-of-one sev-1.
This PR:
1. Introduces `platform/internal/supervised/` with two primitives:
a. RunWithRecover(ctx, name, fn) — runs fn in a recover wrapper.
On panic logs the stack + exponential-backoff restart (1s → 2s →
4s → … → 30s cap). On clean return (fn decided to stop) returns.
On ctx.Done cancels cleanly.
b. Heartbeat(name) + LastTick(name) + Snapshot() + IsHealthy(names,
staleThreshold) — shared in-memory liveness registry. Every
subsystem calls Heartbeat(name) at the end of each tick so
operators can distinguish "goroutine alive and healthy" from
"alive but stuck inside a single tick".
2. Wraps every `go X.Start(ctx)` in main.go:
- broadcaster.Subscribe (Redis pub/sub relay → WebSocket)
- registry.StartLivenessMonitor
- registry.StartHealthSweep
- scheduler.Start (the one that died yesterday)
- channelMgr.Start (Telegram / Slack)
3. Adds `supervised.Heartbeat("scheduler")` inside the scheduler tick
loop as the first end-to-end demonstration. Follow-up PRs will add
heartbeats to the other four subsystems.
4. Adds `GET /admin/liveness` endpoint returning per-subsystem
last_tick_at + seconds_ago. Operators can poll this and alert on
any subsystem whose seconds_ago exceeds 2x its cron/tick interval.
5. Unit tests for RunWithRecover (clean return no restart; panic
restarts with backoff; ctx cancel stops restart loop) and for the
liveness registry.
Net new code: ~160 lines + ~100 lines of tests. Refactor of main.go:
~10 line changes. No behavior change on happy path; only lifts what
happens on a panic.
Closes#92. Supersedes the local recover added to scheduler.go in
#90 (kept conceptually, but now via the shared helper).
Phase 32 foundation. The SaaS control plane (private molecule-controlplane
repo) provisions one platform instance per customer org on Fly Machines
and sets MOLECULE_ORG_ID=<uuid> on the machine. Its subdomain router
forwards requests with X-Molecule-Org-Id=<uuid>.
TenantGuard:
- When MOLECULE_ORG_ID is set → every non-allowlisted request must carry a
matching X-Molecule-Org-Id header. Mismatched/missing header → 404 (not
403 — don't leak tenant existence by letting probers distinguish "wrong
org" from "route doesn't exist").
- When unset → passthrough. Self-hosted / dev / CI behavior unchanged.
- Allowlist is exact-match, not prefix — /health and /metrics only.
No orgs table, no signup, no billing, no Fly provisioning in this repo —
all that lives in the private control plane. The public repo's SaaS
surface is exactly this one middleware.
6 tests covering: unset-is-passthrough, matching header, mismatched
header 404 (with empty body), missing header 404, allowlist bypass, and
allowlist-is-exact-match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).
- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C18 — Workspace URL hijacking (CRITICAL, CONFIRMED LIVE):
POST /registry/register now calls requireWorkspaceToken() before
persisting anything. If the workspace has any live auth tokens, the
caller must supply a valid Bearer token matching that workspace ID.
First registration (no tokens yet) passes through — token is issued
at end of this function (unchanged bootstrap contract). Mirrors the
same pattern already applied to /registry/heartbeat and
/registry/update-card. Attacker POC — overwriting Backend Engineer URL
to http://attacker.example.com:9999/steal — now returns 401.
C20 — Unauthenticated workspace deletion (CRITICAL, CONFIRMED LIVE):
DELETE /workspaces/:id moved from bare router into AdminAuth group.
Any valid workspace bearer token grants access (same fail-open
bootstrap contract as /settings/secrets). Mass-deletion attack chain
(C19 list → C20 delete all) requires auth for the DELETE step.
POST /workspaces (create) also moved to AdminAuth to prevent
unauthenticated workspace creation.
C19 (GET /workspaces topology exposure) deferred — canvas browser
has no bearer token; fix requires canvas service-token refactor.
Tests: 2 new registry tests — C18 bootstrap (no tokens, passes
through and issues token), C18 hijack blocked (has tokens, no
bearer → 401).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unauthenticated routes allowed arbitrary read/write/delete of all
global platform secrets (API keys, provider credentials) with zero auth:
- GET/PUT/POST /settings/secrets
- DELETE /settings/secrets/:key
- GET/POST/DELETE /admin/secrets (legacy aliases)
Fix: new AdminAuth middleware with same lazy-bootstrap contract as
WorkspaceAuth — fail-open when no tokens exist (fresh install / pre-Phase-30
upgrade), enforce once any workspace has a live token. Any valid workspace
bearer token grants access (platform-wide scope, no workspace binding needed).
Changes:
wsauth/tokens.go — HasAnyLiveTokenGlobal + ValidateAnyToken functions
wsauth/tokens_test.go — 5 new tests covering both new functions
middleware/wsauth_middleware.go — AdminAuth middleware
router/router.go — global secrets routes now registered under adminAuth group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
secrets.Values: workspaces with no live token are grandfathered through.
Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.
Fix A — platform/internal/router/router.go:
Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
and /a2a remain on root router; all other /workspaces/:id/* sub-routes
moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
CORS AllowHeaders updated to include Authorization so browser/agent callers
can send the bearer token cross-origin.
Fix B — workspace-template/heartbeat.py:
_check_delegations(): validate source_id == self.workspace_id before
accepting a delegation result. Attacker-crafted records with a foreign
source_id are silently skipped with a WARNING log (injection attempt).
trigger_msg no longer embeds raw response_preview text; references
delegation_id + status only — removes the prompt-injection vector.
Fix C — workspace-template/skill_loader/loader.py:
load_skill_tools(): before exec_module(), verify script is within
scripts_dir (path traversal guard) and temporarily scrub sensitive env
vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
Fix D — platform/internal/handlers/socket.go:
HandleConnect(): agent connections (X-Workspace-ID present) validated via
wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
Canvas clients (no X-Workspace-ID) remain unauthenticated.
Fix D — workspace-template/events.py:
PlatformEventSubscriber._connect(): include platform_auth bearer token in
WebSocket upgrade headers alongside X-Workspace-ID.
Fix E — workspace-template/executor_helpers.py:
recall_memories() and commit_memory() now pass platform_auth bearer token
in Authorization header so WorkspaceAuth middleware allows access.
Fix F — workspace-template/a2a_client.py:
send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.
Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete empty platform/plugins/ (dead remnant; plugins/ at repo root is
the real registry; router.go comment updated)
- Gitignore local dev cruft: platform/workspace-configs-templates/,
.agents/ (codex/gemini skill cache), backups/
- Untrack .agents/skills/ (keep local, stop tracking)
- Move examples/remote-agent/ → sdk/python/examples/remote-agent/
(co-locate with the SDK it exercises); update refs in
molecule_agent README + __init__ + PLAN.md + the demo's own README
- Move docs/superpowers/plans/ → plugins/superpowers/plans/
(plans were written by the superpowers plugin's writing-plans
subskill; belong with the plugin, not under docs)
- Add tests/README.md explaining the unit-tests-per-package +
root-E2E split so new contributors don't ask
- Add docs/README.md explaining why site tooling lives under docs/
rather than a separate docs-site/ (VitePress ergonomics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>