# Edit History — 2026-04-06

## Summary

Merged PR from `HongmingWang-Rabbit/molecule-monorepo#1` (Claude Code workspace runtime + A2A delegation + canvas improvements — 46 commits, 2,548 additions). Then performed comprehensive code review across all 3 layers (Python, Go, TypeScript) and fixed 18 issues (5 critical, 10 warnings, 3 suggestions).

## Merged PR: Claude Code Workspace Runtime

- **CLI-based workspace runtimes** — unified executor for Claude Code, Codex, Ollama, or custom CLI agents
- **A2A delegation via MCP + CLI** — `delegate_task`, `delegate_task_async`, `check_task_status`, `list_peers`
- **Canvas improvements** — legend panel, communication overlay, chat persistence with session sidebar, confirmation dialogs, enhanced thinking indicator
- **Platform fixes** — offline→online heartbeat recovery, file API writes to correct config dir, restart uses workspace's own config, configurable rate limiter, Docker-in-Docker mount resolution
- **Security** — unique temp files, shlex.quote for tokens, subprocess kill on timeout, path traversal prevention

## Code Review Fixes (18 issues)

### Critical (5 fixed)

1. **ChatTab.tsx** — Elapsed time calculation was `Date.now() - Date.now() + thinkingStartTime` (always equals `thinkingStartTime`). Fixed to `Date.now() - thinkingStartTime`.
2. **Canvas.tsx** — `saveTimerRef` debounce timer never cleared on component unmount. Added `useEffect` cleanup.
3. **workspace.go Update handler** — All 5 `ExecContext` calls in `Update()` silently discarded errors. Added `log.Printf` on each.
4. **workspace.go Delete handler** — All 4 cascade delete `ExecContext` calls ignored errors. Added `log.Printf` on each.
5. **cli_executor.py** — Temp files leaked if exception occurred between `mkstemp` and `_temp_files.append()`. Moved `append()` immediately after creation.

### Warnings (10 fixed)

6. **a2a_cli.py** — `resp.json()` could crash on malformed JSON response. Wrapped in try/except.
7. **a2a_mcp_server.py** — `chunk.decode()` could crash on invalid UTF-8. Added `errors="replace"`.
8. **a2a_cli.py** — Async mode timeout returned misleading `"submitted_timeout"` status. Changed to `"uncertain"` on stderr.
9. **templates.go** — Config files written with 0644 (world-readable). Changed all 4 occurrences to 0600.
10. **CommunicationOverlay.tsx** — `fetchComms` callback recreated on every `nodes` change, causing interval reset. Stabilized with `useRef`.
11. **ContextMenu.tsx** — Delete confirmation dialog orphaned when context menu closed externally. Added `useEffect` cleanup.
12. **ContextMenu.tsx** — No loading guard on export/duplicate async actions. Added `actionLoading` state to prevent double clicks.
13. **cli_executor.py** — `config.args` appended after prompt, breaking CLI flag parsing. Moved before prompt.
14. **main.py** — Any non-`langgraph` runtime silently treated as CLI. Added validation warning for unknown values.
15. **provisioner.go** — Created container not cleaned up if `ContainerStart` failed. Added `ContainerRemove` on failure.

### Suggestions (3 fixed)

16. **router.go** — CORS origins hardcoded to localhost. Now configurable via `CORS_ORIGINS` env var (comma-separated).
17. **config.py** — `int()` conversion on tier crashed on non-numeric YAML. Added `.isdigit()` guard with default 1.
18. **ChatTab.tsx** — `loadSessions()` called twice during mount. Consolidated to single call shared between state initializers.

## Provisioner Auto-Setup (URL Resolution)

Fixed the core issue preventing workspace chat from working after creation without manual intervention:

- **provisioner.go** — Now inspects container after start to resolve the actual host-mapped ephemeral port (`127.0.0.1:`), instead of returning the Docker-internal URL. The host URL is stored in DB and Redis, preserved by the registry's `ON CONFLICT` clause when the agent self-registers.
- **workspace.go** — `provisionWorkspace` now also caches the Docker-internal URL (`ws-:8000`) for inter-container discovery.
- **discovery.go** — When a workspace discovers another workspace (via `X-Workspace-ID` header), constructs the Docker-internal URL from the container name convention (`ws-:8000`) when the Redis cache is empty. This enables inter-agent A2A delegation.

Before: create workspace → agent registers with Docker hostname → proxy gets 502 → manual re-registration needed.

After: create workspace → provisioner stores host URL → proxy works immediately.

## Grid Layout for Embedded Team Members

- **WorkspaceNode.tsx** — Departments render in a 3-column grid at depth 0 (was single column). Sub-teams use 2-column grid at depth 1+. Root nodes wider (720-960px) to accommodate side-by-side layout. Company org chart now fits in one screen without scrolling.

## Chat UX Improvements

- **ChatTab.tsx** — 502/503/timeout errors show user-friendly messages ("CEO is not responding. The agent container may not be running. Try restarting the workspace.") instead of raw API error dumps. Input disables after failure. Agent unreachable state shown in empty chat and placeholder.
- **ChatTab.tsx** — Agent and system messages now render markdown (bold, lists, code blocks, headers, tables) via `react-markdown` + `remark-gfm` + `@tailwindcss/typography`. User messages stay as plain text.

## Workspace Config Cleanup

- **`.gitignore`** — Added `workspace-configs-templates/ws-*` to exclude auto-generated provisioner instance configs (not templates, shouldn't be committed).
- Removed 15 stale `ws-*` instance directories from the templates folder.

## Test Infrastructure

- **test_api.sh** — Fixed degraded status test to re-register before high error rate heartbeat (avoids Redis TTL expiry race).
- **test_activity_e2e.sh** — Fixed assertion to match actual Go binding error field name (`ActivityType` not `activity_type`).
- Full clean-slate E2E verified: nuke → setup → create 11 workspaces → all online with HOST URLs → 21/21 tests pass (peer discovery, access control, chat, delegation, activity logs, current task, URL auto-resolution).

## Code Review Round 2 (7 fixes)

### Critical (2 fixed)

1. **workspace.go** — `workspaceID[:12]` panics on IDs shorter than 12 chars. Added length guard matching `containerName()` pattern.
2. **discovery.go** — Fallback URL synthesis returned a Docker-internal URL even for non-existent or offline workspaces. Now checks workspace status (online/degraded) before constructing URL.

### Warnings (3 fixed)

3. **discovery.go** — `CacheInternalURL` error silently discarded (inconsistent with workspace.go). Added `log.Printf`.
4. **ChatTab.tsx** — `ReactMarkdown` rendered for both agent and system messages. System error messages (containing `*`, `#`, etc.) could produce unexpected formatting. Now only renders markdown for `role === "agent"`.
5. **ChatTab.tsx** — `thinkingStartTime` state used in `setInterval` closure was stale (captured before `setThinkingStartTime` applied). Replaced with ref + local variable captured at effect creation time.

### Suggestions (2 fixed)

6. **tailwind.config.ts** — `require("@tailwindcss/typography")` replaced with ESM `import typography` for consistency with TypeScript config.
7. **ci.yml** — CI Node.js bumped from 20 to 22 (LTS). Lock file (lockfileVersion 3, npm 11) had `@emnapi` resolution differences with Node 20's npm 10, causing `npm ci` to fail.

## Code Review Round 3 (DRY + hardening)

### Refactor: Exported `provisioner.ContainerName()` / `provisioner.InternalURL()`

The `ws-:8000` URL construction was duplicated in discovery.go, workspace.go, and terminal.go. Exported the provisioner's existing helpers and replaced all inline duplications. Prevents drift if naming convention changes.
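As a rough illustration of the refactor described above, the two exported helpers could look something like the following. This is a minimal sketch, not the repository's actual code: the 12-character truncation and the `ws-` prefix come from the `workspaceID[:12]` note in Round 2, while the `http://` scheme, the function bodies, and the sample IDs are assumptions made for the example.

```go
package main

import "fmt"

// ContainerName builds the container name from a workspace ID.
// The length guard avoids the slice-bounds panic that a bare
// workspaceID[:12] triggers on IDs shorter than 12 characters.
func ContainerName(workspaceID string) string {
	id := workspaceID
	if len(id) > 12 {
		id = id[:12]
	}
	return "ws-" + id
}

// InternalURL builds the Docker-internal URL for a workspace.
// Assumption: plain HTTP on port 8000, as the changelog's URL pattern suggests.
func InternalURL(workspaceID string) string {
	return fmt.Sprintf("http://%s:8000", ContainerName(workspaceID))
}

func main() {
	// Hypothetical IDs, used only to exercise both branches of the guard.
	fmt.Println(ContainerName("abc"))
	fmt.Println(InternalURL("0123456789abcdef"))
}
```

Keeping the truncation and port in one place means callers such as discovery, provisioning, and terminal lookup cannot disagree about the naming convention.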
### Fix: Discovery fall-through returned host URLs to container callers

When a workspace-to-workspace discovery request hit a workspace that was offline/provisioning/failed, the code fell through to the external URL path and returned `http://127.0.0.1:` — unreachable from inside Docker. Now returns `503 workspace not available` (with status) or `404 workspace not found`.

### Fix: Dead `thinkingStartRef` removed (ChatTab.tsx)

Round 2 replaced `thinkingStartTime` state with a ref + local variable. The ref was written but never read — only the local `startTime` in the closure was used. Removed the dead ref entirely.

### Fix: Terminal.go container name lookup

Replaced inline `"ws-"+workspaceID[:12]` with `provisioner.ContainerName()`. Cached the result in a local `name` variable to avoid calling the function twice.

### Hardening: `.gitignore` comprehensiveness

Added 12 missing patterns: `.awareness/`, `**/.next/`, `mcp-server/dist/`, `dist/`, `.pytest_cache/`, `coverage/`, `.nyc_output/`, `*.db`/`*.sqlite*`, `postgres_data/`/`redis_data/`, `.env.production`, `*.bundle.json`.

## CLI Executor Fixes

### Fix: Claude Code exit code 1 with valid output

Claude Code sometimes exits with code 1 but still produces valid output on stdout (e.g. MCP tool failures that don't prevent a response). The executor now accepts stdout output regardless of exit code (`if proc.returncode == 0 or stdout_text`). Also added detailed stderr/stdout logging on non-zero exit.

### Fix: Empty description crashes AgentCard (main.py)

Pydantic's `AgentCard` requires a non-null string for `description`. Auto-generated configs had `description: ""`. Fixed with `config.description or config.name`.

### Fix: No timeout on A2A proxy and CLI executor

Removed all artificial timeouts from the A2A proxy (`http.Client{}`), CLI executor (`timeout: 0` → `await proc.communicate()` without `wait_for`), and MCP delegation client (`httpx timeout=None`).
Delegation chains (PM → Lead → Agent) can take arbitrarily long — agent liveness is monitored via heartbeat, not proxy deadlines. Proxy uses `context.WithoutCancel(ctx)` to survive client disconnect while still canceling on server shutdown.

## Restart Handler Fixes

### Fix: Template resolution by config.yaml name field

`findTemplateByName("PM")` normalized to `"pm"` but the template dir is `org-pm`. Added a second pass that reads `config.yaml` files in template dirs and matches by the `name:` field.

### Fix: Stale ws-* config dirs take precedence on restart

A previous restart's `ensureDefaultConfig` created a `ws-/` dir with only `config.yaml` (wrong runtime, empty description). On next restart, the ownDir check found it and used it. Fixed: only use ownDir if it contains more than just `config.yaml` (meaning files were uploaded via the Files API).

## Live Activity Feed (ChatTab)

Replaced the fake rotating status messages ("Analyzing your request...", "Almost there...") with a **real-time activity feed** powered by WebSocket events:

- Opens a dedicated WebSocket while `sending=true`
- Listens for `ACTIVITY_LOGGED` events across all workspaces
- Shows color-coded delegation progress: `→ Delegating to Marketing Lead...` (blue), `← Marketing Lead responded (42s)` (green), `⚠ error` (red)
- MCP server now reports `a2a_send` activity before each delegation call

## WebSocket Health Check (socket.ts)

Added periodic rehydration to the canvas WebSocket — if no events arrive for 30s, automatically re-fetches workspace state from the API. Prevents the canvas from showing stale offline status when agents recover between heartbeat cycles without a WebSocket event.

## Shared Workspace Mount (WORKSPACE_DIR)

Added `WORKSPACE_DIR` env var for the platform. When set, all provisioned workspace containers bind-mount the host directory as `/workspace` instead of using isolated Docker named volumes. This gives all agents read/write access to the same codebase.
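The choice between a shared bind mount and an isolated named volume can be sketched as below. This is only an illustration under stated assumptions: `workspaceMount` is a hypothetical helper, the `-data` volume-name suffix is invented for the example, and the returned string uses Docker's `source:target` bind syntax rather than whatever structure the provisioner actually builds.

```go
package main

import (
	"fmt"
	"os"
)

// workspaceMount returns a Docker "source:target" mount spec for /workspace.
// If workspaceDir (the value of WORKSPACE_DIR) is non-empty, the host
// directory is bind-mounted so every agent shares one codebase. Otherwise a
// per-container named volume keeps workspaces isolated.
// Assumption: the "<container>-data" volume name is hypothetical.
func workspaceMount(workspaceDir, containerName string) string {
	if workspaceDir != "" {
		return workspaceDir + ":/workspace"
	}
	return containerName + "-data:/workspace"
}

func main() {
	// Reads the env var named in the changelog; the container name is a sample.
	fmt.Println(workspaceMount(os.Getenv("WORKSPACE_DIR"), "ws-0123456789ab"))
}
```

Passing the environment value in as a parameter, instead of reading it inside the helper, keeps the decision easy to test without mutating process state.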
## Default Org Setup (setup-org.sh)

Created `setup-org.sh` — reproducible script that creates the full 15-agent org hierarchy:

- PM → Marketing Lead (Content Writer, SEO Specialist, Social Media Manager)
- PM → Research Lead (Market Analyst, Technical Researcher, Competitive Intelligence)
- PM → Dev Lead (Frontend Engineer, Backend Engineer, DevOps Engineer, Security Auditor, QA Engineer)

All agents use Claude Code runtime with shared OAuth token. Script also extracts the token from macOS keychain and distributes to all `org-*` templates.

## Canvas Agent Task Visibility

### Live current_task on workspace cards

CLI executor now reports `current_task` via immediate heartbeat push when starting/finishing a request. The MCP server also pushes `current_task` when delegating. Each workspace card on the canvas shows an amber task banner with what the agent is currently working on — visible across the entire org chart in real time.

- `heartbeat.py` — added `current_task` field to heartbeat payload
- `cli_executor.py` — calls `_set_current_task(summary)` on execute start, clears on finish via try/finally
- `a2a_mcp_server.py` — pushes `current_task` heartbeat alongside `report_activity` on delegation

### Session continuity (Claude Code --resume)

CLI executor now maintains conversation state across messages using Claude Code's `--resume` flag:

- First message: runs with `--output-format json` to capture `session_id`
- Subsequent messages: runs with `--resume ` to continue the conversation
- System prompt only injected on first message (resumed sessions already have it)

### Chat input textarea

Replaced single-line `` with auto-growing `