Closes#250 (MEDIUM). POST /channels/discover was on the open router
and accepted an arbitrary Telegram bot token, turning it into:
1. A free bot-token validity oracle — attackers can enumerate/probe
tokens at zero cost
2. A drive-by deleteWebhook side effect — every call invokes
tgbotapi.DeleteWebhookConfig against the target bot, breaking
legitimate webhook delivery
3. A rate-limit amplifier — getMe + deleteWebhook + getUpdates per call
Fix: one-line addition of middleware.AdminAuth(db.DB) to the route,
matching its actual intent (platform-operator admin helper, not a
per-workspace route). Pattern mirrors /admin/liveness, /events, and
/bundles/export from PR #167.
No new test: AdminAuth behavior is covered by
wsauth_middleware_test.go; this PR only wires it onto an additional
route. The load-bearing code comment references #250 so future
reviewers can't revert without an issue citation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#168 by the route-split path from #194's review. #167 put PUT
/canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom
persist because the canvas uses session cookies not bearer tokens.
New narrow middleware CanvasOrBearer:
- Accepts a valid bearer (same contract as AdminAuth) OR
- Accepts a request whose Origin exactly matches CORS_ORIGINS
- Lazy-bootstrap fail-open preserved for fresh installs
Applied ONLY to PUT /canvas/viewport. The softer check is acceptable
there because viewport corruption is cosmetic-only — worst case a
user refreshes the page. This middleware must NOT be used on routes
that leak prompts (#165), create resources (#164), or write files
(#190) — see #194 review for why.
The other canvas-facing routes mentioned in #168 (Events tab, Bundle
Export/Import) remain behind strict AdminAuth pending a proper
session-cookie-accepting AdminAuth (#168 follow-up for Phase H).
6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin
match, wrong origin 401, empty origin rejected, localhost default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#190 (HIGH). The route was registered on the root router with no
auth middleware, letting any unauthenticated caller write arbitrary files
into configsDir via a crafted template. Same vulnerability class as #164
(bundles/import) and path-traversal risk same as #103 (org/import).
One-line gate via the existing wsAdmin pattern. Lazy-bootstrap fail-open
preserved for fresh installs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this call Gin's default trusts all X-Forwarded-For headers, letting
any caller rotate their effective IP and bypass per-IP rate limiting.
SetTrustedProxies(nil) forces c.ClientIP() to always return the real
TCP RemoteAddr.
Adds two regression tests: one documenting the pre-fix bypass, one
asserting the spoofed header is ignored after the fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET /approvals/pending was registered on the open router with no
middleware, allowing any unauthenticated caller to enumerate all pending
approvals across every workspace on the platform.
Fix: add inline middleware.AdminAuth(db.DB) to the route registration,
matching the pattern used in PR #167 for bundles, events, and viewport.
The three workspace-scoped approvals routes (POST/GET /approvals,
POST /approvals/:id/decide) were already correctly behind WorkspaceAuth
inside the wsAuth group — no change needed there.
Tests: two new regression tests in wsauth_middleware_test.go —
TestAdminAuth_Issue180_ApprovalsListing_NoBearer_Returns401
TestAdminAuth_Issue180_ApprovalsListing_FailOpen_NoTokens
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CRITICAL (#164):
POST /bundles/import — anon callers could create arbitrary workspaces
with user-supplied system prompts, plugins, and secrets envelopes.
Fixed by gating behind AdminAuth (bundleAdmin group).
HIGH (#165):
GET /bundles/export/:id — anon UUID probe leaked full system prompts,
agent_card, plugins, memory for any workspace.
GET /events + GET /events/:workspaceId — anon read of the append-only
event log leaked org topology, workspace names, card fragments.
Both moved into the same bundleAdmin / eventsAdmin groups.
MEDIUM (#166):
PUT /canvas/viewport — anon callers could reset shared viewport state.
Gated via a scoped viewportAdmin group; GET stays open so canvas
bootstraps without a bearer.
GET /admin/liveness — operational-intel leak (scheduler cadence
reveals work pattern). Inline AdminAuth on the single handler.
All 6 routes use the same lazy-bootstrap admin auth the rest of the
platform uses: zero-token installs fail-open, once any token exists
every request must present a valid bearer.
Known follow-up: canvas uses session cookies not bearer tokens (same
pattern as #138). In multi-tenant production these canvas features —
Events tab, Export/Duplicate, viewport persist — will return 401 once
a workspace is token-enrolled. Needs cookie-accepting AdminAuth as a
follow-up (tracked as option B in #138 triage discussion); a new issue
will be filed for that scope. The security gain from closing #164
CRITICAL outweighs the canvas UX regression for tonight.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#138. #125 moved PATCH /workspaces/:id into the wsAdmin AdminAuth
group to close the #120 unauth vulnerability, but broke canvas drag-
reposition and inline rename because canvas uses session cookies not
bearer tokens. Multi-tenant deployments with any live token would have
seen every canvas PATCH 401.
Option A per #138 triage: PATCH goes back on the open router, but
WorkspaceHandler.Update now enforces field-level authz:
Cosmetic (no bearer required):
name, role, x, y, canvas
Sensitive (bearer required when any live token exists):
tier — resource escalation
parent_id — A2A hierarchy manipulation
runtime — container image swap
workspace_dir — host bind-mount redirection
Fail-open bootstrap: HasAnyLiveTokenGlobal = 0 → pass-through
(fresh install, pre-Phase-30 upgrade path). Matches the same
lazy-bootstrap contract WorkspaceAuth and AdminAuth use elsewhere.
3 new tests cover all three branches of the matrix (cosmetic
no-bearer, sensitive no-bearer-rejected, sensitive fail-open).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#151. The middleware was already implemented + tested (3 passing
tests in securityheaders_test.go covering base set, multi-route, and
the don't-override-existing contract) but never registered in router.go.
One-line wire-up, runs after TenantGuard so rejected requests still
get the same headers as accepted ones, and before routes so handlers
can still opt out by setting their own header before c.Next() returns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue #120 (HIGH — immediately exploitable):
PATCH /workspaces/:id was registered on the root router with no auth
middleware. An attacker with any workspace UUID could:
- Escalate tier (tier 4 = 4 GB RAM allocation)
- Rewrite parent_id to subvert CanCommunicate A2A access control
- Swap runtime image on next restart
- Redirect workspace_dir host bind-mount to arbitrary path
Fix: move PATCH into the wsAdmin AdminAuth group alongside POST, DELETE.
The canvas position-persist call already has an AdminAuth token (required
for GET /workspaces list on initial load) so no canvas regression.
Also add workspace-existence guard in Update handler — previously returned
200 with zero rows affected for nonexistent IDs.
Issue #113 (MEDIUM — schedule IDOR, carry-over from prior cycle):
PATCH /workspaces/:id/schedules/:scheduleId and DELETE operated on
scheduleID alone (WHERE id = $1), allowing any authenticated caller to
modify or delete schedules belonging to other workspaces.
Fix: bind workspace_id = c.Param("id") in both Update and Delete handlers;
add AND workspace_id = $N to all schedule SQL queries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes#103 (HIGH). Three attack surfaces on the import endpoint —
body.Dir, workspace.Template, workspace.FilesDir — were concatenated
via filepath.Join without validation, letting an unauthenticated
caller probe arbitrary filesystem paths with "../../../etc".
Two layers of defense:
1. resolveInsideRoot() rejects absolute paths and any relative path
whose lexically cleaned join escapes the provided root (Abs +
HasPrefix + separator guard). 6 tests cover happy path, traversal
attempts, absolute path, empty input, prefix-sibling escape, and
deep subpath resolution.
2. Route now runs behind middleware.AdminAuth so an unauthenticated
attacker can't reach the handler at all once a token exists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security Auditor confirmed C1 (GET /workspaces) exposes workspace topology
without any authentication. The endpoint was intentionally left open for
the canvas browser frontend; this PR closes that gap.
Router change:
- Move GET /workspaces from the bare root router into the wsAdmin AdminAuth
group alongside POST /workspaces and DELETE /workspaces/:id.
- AdminAuth uses the same fail-open bootstrap contract as all other auth
gates: fresh installs (no live tokens) pass through; once any workspace
has registered with a token, a valid bearer is required.
Status of findings C2–C11 (documented here for audit trail):
- C2 POST /workspaces/:id/activity → already in wsAuth group (Cycle 5)
- C3 POST /workspaces/:id/delegations/record → already in wsAuth group (Cycle 5)
- C4 POST /workspaces/:id/delegations/:id/update → already in wsAuth group (Cycle 5)
- C5 GET /workspaces/:id/delegations → already in wsAuth group (Cycle 5)
- C7 GET /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C8 POST /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C9 POST /workspaces/:id/delegate → already in wsAuth group (Cycle 5)
- C10 GET /admin/secrets → already in adminAuth group (Cycle 7)
- C11 POST+DELETE /admin/secrets → already in adminAuth group (Cycle 7)
Tests (platform/internal/middleware/wsauth_middleware_test.go — 13 new):
WorkspaceAuth:
- fail-open when workspace has no tokens (bootstrap path)
- C4: no bearer on /delegations/:id/update → 401
- C8: no bearer on /memories POST → 401
- invalid bearer → 401
- cross-workspace token replay → 401
- valid bearer for correct workspace → 200
AdminAuth:
- fail-open when no tokens exist globally (fresh install)
- C10: no bearer on GET /admin/secrets → 401
- C11: no bearer on POST /admin/secrets → 401
- C11: no bearer on DELETE /admin/secrets/:key → 401
- valid bearer → 200
- invalid bearer → 401
Note: did NOT touch DELETE /admin/secrets in production — no destructive
calls to live secrets endpoints were made during this work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Yesterday's scheduler-died incident (#85) was one instance of a systemic
bug: every long-running goroutine in the platform lacks panic recovery
and exposes no liveness signal. In a multi-tenant SaaS deployment, a
single tenant's bad data panicking any subsystem takes down the
subsystem for every tenant, silently, with all standard health probes
still green. That is a scale-of-one sev-1.
This PR:
1. Introduces `platform/internal/supervised/` with two primitives:
a. RunWithRecover(ctx, name, fn) — runs fn in a recover wrapper.
On panic logs the stack + exponential-backoff restart (1s → 2s →
4s → … → 30s cap). On clean return (fn decided to stop) returns.
On ctx.Done cancels cleanly.
b. Heartbeat(name) + LastTick(name) + Snapshot() + IsHealthy(names,
staleThreshold) — shared in-memory liveness registry. Every
subsystem calls Heartbeat(name) at the end of each tick so
operators can distinguish "goroutine alive and healthy" from
"alive but stuck inside a single tick".
2. Wraps every `go X.Start(ctx)` in main.go:
- broadcaster.Subscribe (Redis pub/sub relay → WebSocket)
- registry.StartLivenessMonitor
- registry.StartHealthSweep
- scheduler.Start (the one that died yesterday)
- channelMgr.Start (Telegram / Slack)
3. Adds `supervised.Heartbeat("scheduler")` inside the scheduler tick
loop as the first end-to-end demonstration. Follow-up PRs will add
heartbeats to the other four subsystems.
4. Adds `GET /admin/liveness` endpoint returning per-subsystem
last_tick_at + seconds_ago. Operators can poll this and alert on
any subsystem whose seconds_ago exceeds 2x its cron/tick interval.
5. Unit tests for RunWithRecover (clean return no restart; panic
restarts with backoff; ctx cancel stops restart loop) and for the
liveness registry.
Net new code: ~160 lines + ~100 lines of tests. Refactor of main.go:
~10 line changes. No behavior change on happy path; only lifts what
happens on a panic.
Closes#92. Supersedes the local recover added to scheduler.go in
#90 (kept conceptually, but now via the shared helper).
Phase 32 foundation. The SaaS control plane (private molecule-controlplane
repo) provisions one platform instance per customer org on Fly Machines
and sets MOLECULE_ORG_ID=<uuid> on the machine. Its subdomain router
forwards requests with X-Molecule-Org-Id=<uuid>.
TenantGuard:
- When MOLECULE_ORG_ID is set → every non-allowlisted request must carry a
matching X-Molecule-Org-Id header. Mismatched/missing header → 404 (not
403 — don't leak tenant existence by letting probers distinguish "wrong
org" from "route doesn't exist").
- When unset → passthrough. Self-hosted / dev / CI behavior unchanged.
- Allowlist is exact-match, not prefix — /health and /metrics only.
No orgs table, no signup, no billing, no Fly provisioning in this repo —
all that lives in the private control plane. The public repo's SaaS
surface is exactly this one middleware.
6 tests covering: unset-is-passthrough, matching header, mismatched
header 404 (with empty body), missing header 404, allowlist bypass, and
allowlist-is-exact-match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).
- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C18 — Workspace URL hijacking (CRITICAL, CONFIRMED LIVE):
POST /registry/register now calls requireWorkspaceToken() before
persisting anything. If the workspace has any live auth tokens, the
caller must supply a valid Bearer token matching that workspace ID.
First registration (no tokens yet) passes through — token is issued
at end of this function (unchanged bootstrap contract). Mirrors the
same pattern already applied to /registry/heartbeat and
/registry/update-card. Attacker POC — overwriting Backend Engineer URL
to http://attacker.example.com:9999/steal — now returns 401.
C20 — Unauthenticated workspace deletion (CRITICAL, CONFIRMED LIVE):
DELETE /workspaces/:id moved from bare router into AdminAuth group.
Any valid workspace bearer token grants access (same fail-open
bootstrap contract as /settings/secrets). Mass-deletion attack chain
(C19 list → C20 delete all) requires auth for the DELETE step.
POST /workspaces (create) also moved to AdminAuth to prevent
unauthenticated workspace creation.
C19 (GET /workspaces topology exposure) deferred — canvas browser
has no bearer token; fix requires canvas service-token refactor.
Tests: 2 new registry tests — C18 bootstrap (no tokens, passes
through and issues token), C18 hijack blocked (has tokens, no
bearer → 401).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unauthenticated routes allowed arbitrary read/write/delete of all
global platform secrets (API keys, provider credentials) with zero auth:
- GET/PUT/POST /settings/secrets
- DELETE /settings/secrets/:key
- GET/POST/DELETE /admin/secrets (legacy aliases)
Fix: new AdminAuth middleware with same lazy-bootstrap contract as
WorkspaceAuth — fail-open when no tokens exist (fresh install / pre-Phase-30
upgrade), enforce once any workspace has a live token. Any valid workspace
bearer token grants access (platform-wide scope, no workspace binding needed).
Changes:
wsauth/tokens.go — HasAnyLiveTokenGlobal + ValidateAnyToken functions
wsauth/tokens_test.go — 5 new tests covering both new functions
middleware/wsauth_middleware.go — AdminAuth middleware
router/router.go — global secrets routes now registered under adminAuth group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
secrets.Values: workspaces with no live token are grandfathered through.
Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.
Fix A — platform/internal/router/router.go:
Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
and /a2a remain on root router; all other /workspaces/:id/* sub-routes
moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
CORS AllowHeaders updated to include Authorization so browser/agent callers
can send the bearer token cross-origin.
Fix B — workspace-template/heartbeat.py:
_check_delegations(): validate source_id == self.workspace_id before
accepting a delegation result. Attacker-crafted records with a foreign
source_id are silently skipped with a WARNING log (injection attempt).
trigger_msg no longer embeds raw response_preview text; references
delegation_id + status only — removes the prompt-injection vector.
Fix C — workspace-template/skill_loader/loader.py:
load_skill_tools(): before exec_module(), verify script is within
scripts_dir (path traversal guard) and temporarily scrub sensitive env
vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
Fix D — platform/internal/handlers/socket.go:
HandleConnect(): agent connections (X-Workspace-ID present) validated via
wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
Canvas clients (no X-Workspace-ID) remain unauthenticated.
Fix D — workspace-template/events.py:
PlatformEventSubscriber._connect(): include platform_auth bearer token in
WebSocket upgrade headers alongside X-Workspace-ID.
Fix E — workspace-template/executor_helpers.py:
recall_memories() and commit_memory() now pass platform_auth bearer token
in Authorization header so WorkspaceAuth middleware allows access.
Fix F — workspace-template/a2a_client.py:
send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.
Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete empty platform/plugins/ (dead remnant; plugins/ at repo root is
the real registry; router.go comment updated)
- Gitignore local dev cruft: platform/workspace-configs-templates/,
.agents/ (codex/gemini skill cache), backups/
- Untrack .agents/skills/ (keep local, stop tracking)
- Move examples/remote-agent/ → sdk/python/examples/remote-agent/
(co-locate with the SDK it exercises); update refs in
molecule_agent README + __init__ + PLAN.md + the demo's own README
- Move docs/superpowers/plans/ → plugins/superpowers/plans/
(plans were written by the superpowers plugin's writing-plans
subskill; belong with the plugin, not under docs)
- Add tests/README.md explaining the unit-tests-per-package +
root-E2E split so new contributors don't ask
- Add docs/README.md explaining why site tooling lives under docs/
rather than a separate docs-site/ (VitePress ergonomics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>