Resolves#12. The claude-code SDK stores conversations in
/root/.claude/sessions/ and Postgres tracks current_session_id, but the
container filesystem was recreated on every restart — next agent message
failed with "No conversation found with session ID: <uuid>".
Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted
read-write at /root/.claude/sessions. Gated by runtime=claude-code so
other runtimes don't pay for a path they don't use. Volume is cleaned up
in RemoveVolume alongside the config volume.
Two opt-outs discard the volume before restart for a fresh session:
- env WORKSPACE_RESET_SESSION=1 on the container
- POST /workspaces/:id/restart?reset=true (or {"reset": true} body)
Plumbed via new ResetClaudeSession field on WorkspaceConfig +
provisionWorkspaceOpts helper so the flag stays request-scoped (not
persisted on CreateWorkspacePayload).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docs/edit-history/2026-04-14.md: append tick-3 section covering the
admin test-token route (#53), the prior-tick doc-sync PR (#54), and
the hermes required_env alignment (#55). Record measured test counts
(Go +4 for the TestAdminTestToken_* quartet).
- CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the
new quartet. Route-table row and env-var mentions for the admin
route already landed with #53; verified on main.
- .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about
the prod-hidden default. Closes the code-review doc-sync flag from
#53 (var was in CLAUDE.md but missing from .env.example).
No PLAN.md / README.md / README.zh-CN.md update needed — none of the
three merges expose a user-visible surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hermes config required NOUS_API_KEY but the executor
(workspace-template/adapters/hermes/executor.py from PR #49) checks
HERMES_API_KEY and OPENROUTER_API_KEY. A workspace created from this
template would have the provisioner block on a missing NOUS_API_KEY
even when HERMES_API_KEY was set, or pass provisioning but fail at
executor init. .env.example already documents HERMES_API_KEY.
Fix: rename the required_env entry to HERMES_API_KEY and update the
comments to match the executor's actual fallback order (HERMES_API_KEY
first, OPENROUTER_API_KEY second).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).
- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UIUX Designer figured out at runtime (Run 6, 2026-04-14) how to get
Playwright working without a Dockerfile change:
LD_LIBRARY_PATH="/home/agent/.cache/ms-playwright/firefox-1509/firefox"
node script.cjs
Using @sparticuz/chromium + puppeteer-core, and borrowing the NSS/NSPR
libs bundled with Playwright's Firefox binary. This resolves every missing
lib on the container without needing apt-get or image rebuild.
Agent memory persists the trick across restarts, but a fresh org-template
import (new user) would have to rediscover it. Baking the recipe into the
cron prompt so every clone inherits day-one screenshot capability.
Evidence it works (from Run 6 memory):
- 14 screenshots captured and vision-analysed
- Found 2 new criticals (C4 onboarding-guide a11y, C5 settings panel white
refresh button confirmed in production) that only surface via live DOM
- Full user-flow coverage: home → create → settings → help → templates →
mobile 375 → responsive 1280
Replaces the previous "best-effort + fall back to HTML" wording with a
specific, proven command path. Falls back on HTML only if the browser
genuinely won't launch (e.g. host.docker.internal:3000 down).
Template-level fix; the general platform-level path would be to ship
these libs in the workspace-template image directly (future Dockerfile
change — out of scope here).
Observed 2026-04-14 morning: audit crons (Security, UIUX, QA) were flowing
messages into PM per the PR #26 contract, but PM stopped sub-delegating to
Dev Lead ~10 hours ago. Meanwhile audits started opening PRs directly
(bypassing Dev Lead), and Dev Lead / BE / FE / DevOps / QA sat idle for
17+ maintenance cycles despite PRs continuing to land.
Root cause: PM's system prompt defined delegation behavior for "tasks from
CEO" but didn't explicitly treat audit summaries as tasks. PM was reading
"audit of SHA X, filed issue #N, top recommendation: fix Y" as a status
report and committing it to memory without triggering the dispatch chain.
Adds a dedicated "Audit Routing" section to PM's prompt that:
- Treats every audit summary with open issue numbers as a dispatch trigger
- Specifies routing by category (security→BE, ui→FE, infra→DevOps, qa→QA)
- Requires parallel `delegate_task_async` when issues span categories
- Makes clean-cycle acks the only no-op case
This turns PM from a receptionist into a dispatcher — which was the
original intent of the audit-routing contract in #26.
Aligns with the north-star goal (keep the team running 24/7): dead idle
windows when audits had live issue numbers is a defect in orchestration,
not a quiet period.
Adds HERMES_API_KEY to .env.example with a cross-reference to the
OPENROUTER_API_KEY fallback, and adds the hermes runtime row to the
CLAUDE.md runtime table so the new adapter is discoverable alongside
its siblings (langgraph, claude-code, openclaw, crewai, autogen,
deepagents).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves#17.
Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id
or name starts with known test placeholder prefixes (aaaaaaaa-, etc.)
and force-removes the paired Docker container. Documented in
tests/README.md.
Part B: add a pre-flight check in provisionWorkspace() — when neither a
template path nor in-memory configFiles supplies config.yaml, probe the
existing named volume via a throwaway alpine container. If the volume
lacks config.yaml, mark the workspace status='failed' with a clear
last_sample_error instead of handing it to Docker's unless-stopped
restart policy (which otherwise loops forever on FileNotFoundError).
New pure helper provisioner.ValidateConfigSource + unit tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Docker Desktop (macOS/Windows), host-path bind mounts often appear
root-owned inside the container. The previous entrypoint only chowned
/workspace top-level, so agents (uid 1000) still couldn't write to
/workspace/repo/* — git clone, pip install, and file edits failed with
EACCES and fell back to /tmp. Detect the root-owned-contents case by
sampling the first entry; if it's root-owned, recursively chown the
tree. On normal Linux Docker with matching uids this is a no-op, so the
fast-startup path is preserved for the common case.
Part B of the issue (private-repo initial_prompt clone) was addressed
by PR #20.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.
discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.
Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).
Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Z as a keyboard equivalent for the existing double-click zoom-to-team
gesture (WCAG 2.1.1). When a team node is selected, pressing Z dispatches
molecule:zoom-to-team, which fitBounds to the parent and all children.
Input elements are guarded so Z still types normally in text fields.
Adds a 6th help panel entry documenting the Dbl-click / Z gesture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
globals.css: append @media (prefers-reduced-motion: reduce) block that zeroes
animation/transition durations, disables .animate-in/.slide-in-from-* entry
animations (Toaster, ApprovalBanner, SidePanel slide), strips dashdraw and
node-appear keyframes from React Flow elements.
Components: replace all bare animate-pulse (13 occurrences across WorkspaceNode,
StatusDot, Toolbar, SidePanel, Legend, SearchDialog, TerminalTab, TemplatePalette)
with motion-safe:animate-pulse so status indicator pulsing stops for users with
vestibular disorders. Replace 3 animate-bounce occurrences in ChatTab typing
indicator with motion-safe:animate-bounce.
Tests: new canvas/src/__tests__/reduced-motion.test.ts (12 tests) verifies the
@media block is present in globals.css and that every component file uses the
motion-safe: variant rather than bare animation classes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
H3 (compliance.py): GitHub fine-grained PATs use the github_pat_ prefix
with an 82-character alphanumeric+underscore suffix — different from
classic tokens (36 chars). Add the missing pattern to _PII_PATTERNS so
fine-grained PATs are redacted in compliance logs alongside classic tokens.
M4 (platform_auth.py): Replace write_text()+chmod() in save_token() with
os.open(O_WRONLY|O_CREAT|O_TRUNC, 0o600) + os.write(). The old approach
had a TOCTOU window where a concurrent reader could access the token file
before chmod restricted permissions. os.open with explicit mode creates the
file with 0600 permissions atomically in a single syscall.
H2 (a2a_client.py): Already fixed in commit 6c78962 (Cycle 5); no-op.
Tests: 1136 passed, 2 skipped (workspace-template pytest suite)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IMPACT WITHOUT THIS FIX: deploying PR #31 (WorkspaceAuth middleware on
/workspaces/*) without this patch causes EVERY delegation cycle to silently
break — the heartbeat poll returns 401, the self-message A2A POST returns
401, agents never wake up after task completion, and memory consolidation
stops. The entire multi-agent coordination system degrades to single-shot
interactions with no result delivery.
Changes (all using the existing platform_auth.auth_headers() pattern
already used for POST /registry/heartbeat):
heartbeat.py — 5 calls fixed:
- GET /workspaces/:id/delegations (delegation poll)
- GET /workspaces/:id (self workspace info for parent lookup)
- GET /workspaces/{parent_id} (parent workspace name lookup)
- POST /workspaces/:id/a2a (self-message to wake agent on results)
- POST /workspaces/:id/notify (canvas delegation result notification)
Also moved `from platform_auth import auth_headers` from inline (per-call)
to module-level import so _check_delegations() can use it without re-importing.
consolidation.py — 4 calls fixed:
- GET /workspaces/:id/memories (fetch memories for consolidation)
- POST /workspaces/:id/memories (write consolidated summary — agent path)
- DELETE /workspaces/:id/memories/:id (delete original memories post-consolidation)
- POST /workspaces/:id/memories (write consolidated summary — fallback path)
a2a_client.py — 1 call fixed:
- GET /workspaces/:id (get_workspace_info())
⚠️ DEPLOYMENT NOTE: This PR MUST be merged and deployed at the same time as
PR #31 (WorkspaceAuth middleware). Deploying #31 without this fix will
immediately break all delegation result delivery.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a `canvas-deploy-reminder` job to ci.yml that fires on every
push to main once `canvas-build` passes. It posts a commit comment via
the built-in GITHUB_TOKEN (no new secrets needed) reminding whoever
monitors CI to run:
cd /g/personal_programs/molecule-monorepo
git pull origin main
docker compose build canvas && docker compose up -d canvas
The comment includes the commit SHA and a direct link to the build log.
Rationale: 5 consecutive merge cycles (PRs #21, #25, #30, #32, #34)
went undeployed because there is no auto-deploy hook and the manual
step was silently forgotten. A commit comment on the merge commit is
the lowest-friction reminder that requires no external secrets or infra.
Does NOT run on PRs — only on direct pushes to main (i.e. post-merge).
Uses `needs: canvas-build` so the reminder only fires after build+tests
pass; a failing build produces no comment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gap 1 — WS_URL now derives from NEXT_PUBLIC_PLATFORM_URL when
NEXT_PUBLIC_WS_URL is not set (http→ws, appends /ws; https→wss).
Operators need only one env var. NEXT_PUBLIC_WS_URL remains an explicit
override escape hatch.
Gap 2 — Add canvas/.env.example documenting NEXT_PUBLIC_PLATFORM_URL
(required) and NEXT_PUBLIC_WS_URL (optional override, commented out).
Gap 3 — Toolbar fires showToast("Live updates restored", "success")
when wsStatus transitions connecting→connected. mountedRef (set after
2 s) suppresses the toast on the very first page-load connection so
only genuine reconnects notify the user.
Gap 4 — New canvas/src/store/__tests__/socket.url.test.ts (6 tests):
· fallback to ws://localhost:8080/ws when no env set
· http→ws derivation from NEXT_PUBLIC_PLATFORM_URL
· https→wss derivation
· NEXT_PUBLIC_WS_URL override takes precedence
· api.ts PLATFORM_URL fallback
· api.ts reads NEXT_PUBLIC_PLATFORM_URL
375/375 tests passing, production build clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR gates DELETE /workspaces/:id behind AdminAuth. The E2E smoke
test's three DELETE calls (cleanup of echo, summarizer, re-imported
bundle) need to send Authorization: Bearer <token>. Any valid live
token is accepted — use the token issued to each workspace at
/registry/register.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a live/reconnecting/offline pill to the Toolbar so users can see
at a glance whether the canvas is receiving real-time updates.
Changes:
- canvas/src/store/canvas.ts: add wsStatus ('connected'|'connecting'|
'disconnected') field + setWsStatus action to CanvasState (initial:
'connecting')
- canvas/src/store/socket.ts: wire setWsStatus into ReconnectingSocket —
'connecting' on connect() call, 'connected' in onopen, 'connecting'
in onclose (will reconnect), 'disconnected' in disconnect()
- canvas/src/components/Toolbar.tsx: subscribe to wsStatus; render
WsStatusPill (green "Live" / amber pulsing "Reconnecting" / red
"Offline") after the workspace count section
- canvas/src/store/__tests__/socket.test.ts: add setWsStatus: vi.fn()
to the canvas store mock (global factory, beforeEach reset, and the
mid-test override in the onmessage test)
369/369 canvas tests passing, production build clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.
Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(rel_path → md5 hex → sha256 hex).
Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.
Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C18 — Workspace URL hijacking (CRITICAL, CONFIRMED LIVE):
POST /registry/register now calls requireWorkspaceToken() before
persisting anything. If the workspace has any live auth tokens, the
caller must supply a valid Bearer token matching that workspace ID.
First registration (no tokens yet) passes through — token is issued
at end of this function (unchanged bootstrap contract). Mirrors the
same pattern already applied to /registry/heartbeat and
/registry/update-card. Attacker POC — overwriting Backend Engineer URL
to http://attacker.example.com:9999/steal — now returns 401.
C20 — Unauthenticated workspace deletion (CRITICAL, CONFIRMED LIVE):
DELETE /workspaces/:id moved from bare router into AdminAuth group.
Any valid workspace bearer token grants access (same fail-open
bootstrap contract as /settings/secrets). Mass-deletion attack chain
(C19 list → C20 delete all) requires auth for the DELETE step.
POST /workspaces (create) also moved to AdminAuth to prevent
unauthenticated workspace creation.
C19 (GET /workspaces topology exposure) deferred — canvas browser
has no bearer token; fix requires canvas service-token refactor.
Tests: 2 new registry tests — C18 bootstrap (no tokens, passes
through and issues token), C18 hijack blocked (has tokens, no
bearer → 401).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace raw Parent Workspace ID text input with a <select> populated
from GET /workspaces (T{tier} · {name} format, graceful fallback on
fetch error). Raise all interactive button text from text-[8px]/[9px]
to text-[11px] across SkillsTab, ScheduleTab, secrets-section,
ActivityTab, SidePanel, ChatTab; non-interactive labels/badges to
text-[10px]. Adds 7 CreateWorkspaceDialog unit tests (372/372 passing).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /registry/register accepted any URL string and persisted it as
the workspace's A2A endpoint — an attacker could register a workspace
with url=http://169.254.169.254/latest/meta-data/ and cause the platform
to proxy requests to the cloud metadata service when proxying A2A traffic.
Fix: validateAgentURL() helper rejects:
- empty URL
- non-http/https schemes (file://, ftp://, etc.)
- 169.254.0.0/16 link-local IPs (AWS/GCP/Azure IMDS endpoints)
Allows RFC-1918 private ranges (Docker networking uses 172.16-31.x.x).
Adds 12 unit tests covering valid Docker-internal URLs and all SSRF vectors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unauthenticated routes allowed arbitrary read/write/delete of all
global platform secrets (API keys, provider credentials) with zero auth:
- GET/PUT/POST /settings/secrets
- DELETE /settings/secrets/:key
- GET/POST/DELETE /admin/secrets (legacy aliases)
Fix: new AdminAuth middleware with same lazy-bootstrap contract as
WorkspaceAuth — fail-open when no tokens exist (fresh install / pre-Phase-30
upgrade), enforce once any workspace has a live token. Any valid workspace
bearer token grants access (platform-wide scope, no workspace binding needed).
Changes:
wsauth/tokens.go — HasAnyLiveTokenGlobal + ValidateAnyToken functions
wsauth/tokens_test.go — 5 new tests covering both new functions
middleware/wsauth_middleware.go — AdminAuth middleware
router/router.go — global secrets routes now registered under adminAuth group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>