After a workspace restart (HTTP /restart or programmatic RestartByID) and
re-registration, the platform sends a synthetic A2A message/send to the
workspace containing:
- restart timestamp
- previous session end timestamp + human duration
- env-var keys now available (keys only — never values)
The message is rendered in the format proposed in #19 and marked with
metadata.kind=restart_context so agents can detect and handle it
specifically if they choose.
Skip path: if the workspace doesn't re-register within 30s, log and drop.
The Restart HTTP response is unaffected by delivery success.
Layer 2 (user-defined restart_prompt via config.yaml / org.yaml) is
deferred — tracked as a separate follow-up issue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Global secrets (e.g. CLAUDE_CODE_OAUTH_TOKEN) are injected as container env
vars at Start() time. Until now, rotating one only propagated to a workspace
on the next full restart-from-zero, which manual ops had to drive via a
`POST /workspaces/:id/restart` loop. Tier-3 Claude Code agents hit the
stale-token path first and surfaced as 401s inside the SDK.
Restart-time re-read of global_secrets + workspace_secrets was already
correct in `provisionWorkspaceOpts` — the missing piece was the trigger.
SetGlobal / DeleteGlobal now enqueue RestartByID for every non-paused,
non-removed, non-external workspace that does NOT shadow the key with a
workspace-level override. Matches the existing behaviour of workspace-scoped
`Set` / `Delete`.
Adds two sqlmock-backed tests exercising both branches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves#14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and
TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults
agreed in the issue:
- T2: 512 MiB / 1024 shares (1 CPU) — unchanged baseline
- T3: 2048 MiB / 2048 shares (2 CPU) — new cap (previously uncapped)
- T4: 4096 MiB / 4096 shares (4 CPU) — new cap (previously uncapped)
CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the
value is translated to NanoCPUs for a hard allocation so behaviour
remains deterministic across hosts. Malformed or non-positive env
values silently fall back to the default.
Behaviour change note: T3 and T4 previously had no explicit cap.
Operators who relied on unlimited can set very large TIERn_MEMORY_MB /
TIERn_CPU_SHARES values; a follow-up can add unset-means-unlimited
semantics if required.
Tests:
- TestGetTierMemoryMB_DefaultsMatchLegacy
- TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback)
- TestGetTierCPUShares_EnvOverride
- TestApplyTierConfig_T3_UsesEnvOverride (wiring)
- TestApplyTierConfig_T3_DefaultCap (documents the new cap)
Docs: .env.example section + CLAUDE.md platform env-vars list updated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves#12. The claude-code SDK stores conversations in
/root/.claude/sessions/ and Postgres tracks current_session_id, but the
container filesystem was recreated on every restart — next agent message
failed with "No conversation found with session ID: <uuid>".
Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted
read-write at /root/.claude/sessions. Gated by runtime=claude-code so
other runtimes don't pay for a path they don't use. Volume is cleaned up
in RemoveVolume alongside the config volume.
Two opt-outs discard the volume before restart for a fresh session:
- env WORKSPACE_RESET_SESSION=1 on the container
- POST /workspaces/:id/restart?reset=true (or {"reset": true} body)
Plumbed via new ResetClaudeSession field on WorkspaceConfig +
provisionWorkspaceOpts helper so the flag stays request-scoped (not
persisted on CreateWorkspacePayload).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).
- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves#17.
Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id
or name starts with known test placeholder prefixes (aaaaaaaa-, etc.)
and force-removes the paired Docker container. Documented in
tests/README.md.
Part B: add a pre-flight check in provisionWorkspace() — when neither a
template path nor in-memory configFiles supplies config.yaml, probe the
existing named volume via a throwaway alpine container. If the volume
lacks config.yaml, mark the workspace status='failed' with a clear
last_sample_error instead of handing it to Docker's unless-stopped
restart policy (which otherwise loops forever on FileNotFoundError).
New pure helper provisioner.ValidateConfigSource + unit tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C18 — Workspace URL hijacking (CRITICAL, CONFIRMED LIVE):
POST /registry/register now calls requireWorkspaceToken() before
persisting anything. If the workspace has any live auth tokens, the
caller must supply a valid Bearer token matching that workspace ID.
First registration (no tokens yet) passes through — token is issued
at end of this function (unchanged bootstrap contract). Mirrors the
same pattern already applied to /registry/heartbeat and
/registry/update-card. Attacker POC — overwriting Backend Engineer URL
to http://attacker.example.com:9999/steal — now returns 401.
C20 — Unauthenticated workspace deletion (CRITICAL, CONFIRMED LIVE):
DELETE /workspaces/:id moved from bare router into AdminAuth group.
Any valid workspace bearer token grants access (same fail-open
bootstrap contract as /settings/secrets). Mass-deletion attack chain
(C19 list → C20 delete all) requires auth for the DELETE step.
POST /workspaces (create) also moved to AdminAuth to prevent
unauthenticated workspace creation.
C19 (GET /workspaces topology exposure) deferred — canvas browser
has no bearer token; fix requires canvas service-token refactor.
Tests: 2 new registry tests — C18 bootstrap (no tokens, passes
through and issues token), C18 hijack blocked (has tokens, no
bearer → 401).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /registry/register accepted any URL string and persisted it as
the workspace's A2A endpoint — an attacker could register a workspace
with url=http://169.254.169.254/latest/meta-data/ and cause the platform
to proxy requests to the cloud metadata service when proxying A2A traffic.
Fix: validateAgentURL() helper rejects:
- empty URL
- non-http/https schemes (file://, ftp://, etc.)
- 169.254.0.0/16 link-local IPs (AWS/GCP/Azure IMDS endpoints)
Allows RFC-1918 private ranges (Docker networking uses 172.16-31.x.x).
Adds 12 unit tests covering valid Docker-internal URLs and all SSRF vectors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unauthenticated routes allowed arbitrary read/write/delete of all
global platform secrets (API keys, provider credentials) with zero auth:
- GET/PUT/POST /settings/secrets
- DELETE /settings/secrets/:key
- GET/POST/DELETE /admin/secrets (legacy aliases)
Fix: new AdminAuth middleware with same lazy-bootstrap contract as
WorkspaceAuth — fail-open when no tokens exist (fresh install / pre-Phase-30
upgrade), enforce once any workspace has a live token. Any valid workspace
bearer token grants access (platform-wide scope, no workspace binding needed).
Changes:
wsauth/tokens.go — HasAnyLiveTokenGlobal + ValidateAnyToken functions
wsauth/tokens_test.go — 5 new tests covering both new functions
middleware/wsauth_middleware.go — AdminAuth middleware
router/router.go — global secrets routes now registered under adminAuth group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
secrets.Values: workspaces with no live token are grandfathered through.
Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.
Fix A — platform/internal/router/router.go:
Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
and /a2a remain on root router; all other /workspaces/:id/* sub-routes
moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
CORS AllowHeaders updated to include Authorization so browser/agent callers
can send the bearer token cross-origin.
Fix B — workspace-template/heartbeat.py:
_check_delegations(): validate source_id == self.workspace_id before
accepting a delegation result. Attacker-crafted records with a foreign
source_id are silently skipped with a WARNING log (injection attempt).
trigger_msg no longer embeds raw response_preview text; references
delegation_id + status only — removes the prompt-injection vector.
Fix C — workspace-template/skill_loader/loader.py:
load_skill_tools(): before exec_module(), verify script is within
scripts_dir (path traversal guard) and temporarily scrub sensitive env
vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
Fix D — platform/internal/handlers/socket.go:
HandleConnect(): agent connections (X-Workspace-ID present) validated via
wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
Canvas clients (no X-Workspace-ID) remain unauthenticated.
Fix D — workspace-template/events.py:
PlatformEventSubscriber._connect(): include platform_auth bearer token in
WebSocket upgrade headers alongside X-Workspace-ID.
Fix E — workspace-template/executor_helpers.py:
recall_memories() and commit_memory() now pass platform_auth bearer token
in Authorization header so WorkspaceAuth middleware allows access.
Fix F — workspace-template/a2a_client.py:
send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.
Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up to the quality-fixes-pass2 code review.
## Go: direct unit tests for PR #5 extracted helpers (~47 new tests)
a2a_proxy_test.go:
- resolveAgentURL: cache hit, cache-miss DB hit, not-found, null-URL,
docker-rewrite guard
- dispatchA2A: build error, canvas timeout, agent timeout, success
- handleA2ADispatchError: context deadline, generic error, build error
- maybeMarkContainerDead: nil-provisioner, runtime=external short-circuits
- logA2AFailure, logA2ASuccess: activity_logs row content + status
delegation_test.go:
- bindDelegateRequest: valid / malformed / bad-UUID
- lookupIdempotentDelegation: no-key / no-match / failed-row-deleted / existing-pending
- insertDelegationRow: insertOK / insertHandledByIdempotent /
insertTrackingUnavailable
- insertDelegationOutcome: zero-value is insertOutcomeUnknown sentinel
discovery_test.go:
- discoverWorkspacePeer: online / not-found / access-denied + 2 edges
- writeExternalWorkspaceURL: 3 cases
- discoverHostPeer: smoke test documents the unreachable-by-design path
activity_test.go:
- parseSessionSearchParams: defaults + custom limit/offset/q
- buildSessionSearchQuery: no-filters + with-query shapes
- scanSessionSearchRows: empty / single / multiple rows
Package coverage: 56.1% → 57.6%. Every helper extracted in PR #5 is
now at or near 100% line coverage (see PR notes for the 4 remaining
gaps, all blocked on provisioner interface mockability).
## Defensive enum zero-value fix
insertDelegationOutcome now starts with insertOutcomeUnknown=0 as a
sentinel so an un-initialized variable can't silently read as
"success". insertOK, insertHandledByIdempotent, insertTrackingUnavailable
shift to 1/2/3. No caller changes needed.
## Canvas: ConfirmDialog.singleButton test (5 cases)
canvas/src/components/__tests__/ConfirmDialog.test.tsx covers:
- default render (both buttons)
- singleButton hides Cancel
- singleButton: Escape still fires onCancel
- singleButton: backdrop-click still fires onCancel
- singleButton: onConfirm fires on click
vitest total: 352 → 357, all passing.
## Docstring clarity
ConfirmDialog.tsx: expanded singleButton prop comment to explicitly
instruct callers to pass the same handler for onConfirm/onCancel when
using it as an info toast (matches TemplatePalette usage).
## ErrorBoundary clipboard observability
.catch(() => {}) silently swallowed rejections. Now:
.catch((e) => console.warn("clipboard write failed:", e))
so permission-denied / insecure-context failures surface in the console.
## Verification
- go build ./... clean
- go vet ./... clean
- go test -race ./internal/... — all pass
- canvas npm run build — clean
- canvas npm test -- --run — 357/357 pass
- tests/e2e/test_api.sh — 46/62 pass; all 16 failures are pre-existing
(token-auth enforcement + stale test workspaces + missing Docker
network). None involve handlers touched in PR #5.
- Manual: platform + canvas running locally, title=Molecule AI,
/workspaces returns [], /health returns ok. Identified + killed a
stale Next.js server from the old Starfire-AgentTeam repo that was
serving the old brand on IPv4 port 3000.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Post-review fixes on top of the quality-pass-2 branch.
1. delegation.go: replaced insertDelegationRow's (bool, bool) return
with a typed insertDelegationOutcome enum (insertOK /
insertHandledByIdempotent / insertTrackingUnavailable). Eliminates
the positional-boolean decoding the caller had to do. Internal, no
behavior change.
2. ConfirmDialog.tsx: added singleButton prop. When true, hides the
Cancel button for single-action info toasts (Esc still dismisses
via onCancel). TemplatePalette's import notice uses it.
3. ErrorBoundary.tsx: fixed the floating clipboard promise. Added
.catch(() => {}) so a rejected writeText (permission denied,
insecure context) doesn't surface as unhandled rejection.
4. a2a_proxy_test.go: added 5 direct unit tests for
normalizeA2APayload (invalid JSON, wraps-bare, preserves-existing-
id, preserves-existing-messageId, missing-method). Fills the unit-
test gap for the helper extracted in the last pass.
Verification:
- go test -race ./internal/handlers/... passes (incl. 5 new tests)
- go build ./... clean
- canvas npm run build clean
- canvas npm test -- --run -> 352/352
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete empty platform/plugins/ (dead remnant; plugins/ at repo root is
the real registry; router.go comment updated)
- Gitignore local dev cruft: platform/workspace-configs-templates/,
.agents/ (codex/gemini skill cache), backups/
- Untrack .agents/skills/ (keep local, stop tracking)
- Move examples/remote-agent/ → sdk/python/examples/remote-agent/
(co-locate with the SDK it exercises); update refs in
molecule_agent README + __init__ + PLAN.md + the demo's own README
- Move docs/superpowers/plans/ → plugins/superpowers/plans/
(plans were written by the superpowers plugin's writing-plans
subskill; belong with the plugin, not under docs)
- Add tests/README.md explaining the unit-tests-per-package +
root-E2E split so new contributors don't ask
- Add docs/README.md explaining why site tooling lives under docs/
rather than a separate docs-site/ (VitePress ergonomics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>