Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat; will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md: internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md: one-time internal session docs
- .claude/: gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/: ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md: build, test, branch conventions
- CODE_OF_CONDUCT.md: Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 Session
Summary
Fixed ChatTab agent reachability, added conversation history to all A2A adapters, added current_task heartbeat reporting, fixed WORKSPACE_PROVISIONING for restarts, fixed Config tab runtime dropdown, and improved config save/restart UX.
Changes
ChatTab — Agent Reachability Fix
- Problem: ChatTab called `GET /registry/discover/:id` without the `X-Workspace-ID` header → 400 error → "Agent not available" even though the agent was online
- Fix: Derived reachability from `data.status` (online/degraded) instead of a network call. Messages are proxied through `POST /workspaces/:id/a2a`, so the browser never needs the agent's internal URL.
- Files: `canvas/src/components/tabs/ChatTab.tsx`
Conversation History
- ChatTab now sends the last 20 messages via `params.metadata.history` in A2A `message/send`
- `a2a_executor.py`: New `_extract_history()` function extracts history from request metadata
- LangGraph/DeepAgents: History prepended as `("human"/"ai", text)` tuples
- CrewAI/AutoGen: History prepended as text prefix in task description
- Files: `ChatTab.tsx`, `a2a_executor.py`, all adapter files
Current Task Heartbeat
- New shared `set_current_task(heartbeat, task)` function in `a2a_executor.py`
- All 5 adapters now set `current_task` during execution (truncated to 60 chars)
- Task cleared in `finally` block after execution completes
- Heartbeat passed from `AdapterConfig` through `create_executor()` in all adapters
- Files: `a2a_executor.py`, `langgraph/adapter.py`, `deepagents/adapter.py`, `crewai/adapter.py`, `autogen/adapter.py`, `openclaw/adapter.py`
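A minimal sketch of the pattern described above, assuming the heartbeat is an object with a `current_task` attribute. The `Heartbeat` class and the `execute` wrapper are stand-ins invented for illustration; only the `set_current_task(heartbeat, task)` signature, the 60-character truncation, the None-heartbeat case, and the `finally` cleanup come from the notes.

```python
from typing import Optional

MAX_TASK_CHARS = 60  # truncation length stated in the notes


class Heartbeat:
    """Stand-in for the real heartbeat object; only the field used here."""

    def __init__(self) -> None:
        self.current_task = ""


def set_current_task(heartbeat: Optional[Heartbeat], task: str) -> None:
    """Record the task on the heartbeat; a missing heartbeat is a no-op."""
    if heartbeat is None:
        return
    heartbeat.current_task = (task or "")[:MAX_TASK_CHARS]


def execute(heartbeat: Optional[Heartbeat], prompt: str) -> str:
    """Set the task before running and always clear it afterwards."""
    set_current_task(heartbeat, prompt)
    try:
        return prompt.upper()  # placeholder for the adapter's real work
    finally:
        set_current_task(heartbeat, "")  # runs on success and on error
```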
WORKSPACE_PROVISIONING for Restarts
- Problem: `applyEvent` WORKSPACE_PROVISIONING only created new nodes and silently ignored restarts of existing nodes → UI didn't show "starting" state
- Fix: Added `else` branch that sets the existing node to `status: "provisioning"` and clears `needsRestart` and `currentTask`
- Files: `canvas/src/store/canvas.ts`
Config Tab Improvements
- Runtime dropdown: Removed invalid options (Codex, Ollama). Now shows only available adapters: LangGraph, Claude Code, CrewAI, AutoGen, DeepAgents, OpenClaw
- Save & Restart: Config save now auto-restarts workspace so changes take effect immediately. "Save" button also available for save-only (sets needsRestart banner)
- Secrets: Removed `needsRestart: true` from secrets save/delete since platform already auto-restarts
- Retry→Restart: Chat error banner button changed from no-op "Retry" to functional "Restart" with confirmation dialog
- Files: `canvas/src/components/tabs/ConfigTab.tsx`, `ChatTab.tsx`
Tests
- 8 new Python tests (15 total in `test_a2a_executor.py`, 80 total):
  - `_extract_history`: 5 tests (basic, empty, None, malformed, non-list)
  - History prepend in executor: 1 test
  - `set_current_task`: 2 tests (update + None heartbeat)
- 1 updated Canvas test: WORKSPACE_PROVISIONING updates existing node status on restart
- All existing tests updated (`"user"` → `"human"` role format, metadata in mock context)
Code Review Fixes
- PEP 8 spacing in all `set_current_task()` calls
- OpenClaw `set_current_task("")` moved into `finally` block
- `_extract_history` guards against non-dict entries in history list
Merged PR #1: Workspace Awareness Integration
- Platform assigns deterministic `awareness_namespace` (`workspace:<id>`) per workspace
- `AWARENESS_URL` and `AWARENESS_NAMESPACE` injected into containers during provisioning (only when `AWARENESS_URL` env var is set on the platform)
- `commit_memory`/`search_memory` tools route through awareness when configured, fall back to platform memory API
- New migration `010_workspace_awareness.sql` adds `awareness_namespace` column to workspaces
- `agent.py`: Anthropic/OpenAI base URL support via `ANTHROPIC_BASE_URL`/`OPENAI_BASE_URL` env vars
- `test_sandbox.py`: `asyncio.get_event_loop()` → `asyncio.run()` for Python 3.13 compat
- New files: `workspace/tools/awareness_client.py`, `workspace/tests/test_memory.py`, `workspace/tests/test_agent_base_urls.py`
- Files: `workspace-server/internal/handlers/workspace.go`, `workspace-server/internal/models/workspace.go`, `workspace-server/internal/provisioner/provisioner.go`, `workspace-server/migrations/010_workspace_awareness.sql`, `workspace/agent.py`, `workspace/main.py`, `workspace/tools/memory.py`, `workspace/tools/awareness_client.py`
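The namespace and injection rules above could look roughly like this. The real provisioner is Go; this Python sketch only illustrates the logic, and the function names and the example URL are hypothetical. What comes from the notes is the `workspace:<id>` format and the rule that nothing is injected unless `AWARENESS_URL` is set on the platform.

```python
import os
from typing import Mapping, Optional


def awareness_namespace(workspace_id: str) -> str:
    """Deterministic namespace: the same workspace always maps to the same value."""
    return f"workspace:{workspace_id}"


def awareness_env(workspace_id: str, platform_env: Optional[Mapping[str, str]] = None) -> dict:
    """Env vars to inject into the container, or {} when awareness is not configured."""
    env = os.environ if platform_env is None else platform_env
    url = env.get("AWARENESS_URL", "")
    if not url:
        return {}  # tools fall back to the platform memory API
    return {
        "AWARENESS_URL": url,
        "AWARENESS_NAMESPACE": awareness_namespace(workspace_id),
    }
```

Passing `platform_env` explicitly keeps the function testable without touching the process environment.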
Restart Runtime Detection + Template Fallback
- Problem: Changing runtime via Config tab (e.g. langgraph → claude-code) didn't take effect on restart — provisioner used the old image because it only read runtime from the template dir, not the container's config volume
- Fix: Restart handler reads runtime from the running container via `ExecRead` (`docker exec cat`) BEFORE stopping it. Falls back to this value when no template provides a runtime.
- Template auto-apply: When a runtime has a default template (e.g. `claude-code-default/`), it's automatically applied on restart, copying CLAUDE.md, `.claude/settings.json`, etc. into the container
- Replaced `ReadFileFromVolume` (temp Alpine container, slow) with `ExecRead` (exec in existing container, instant)
- Files: `workspace-server/internal/handlers/workspace.go`, `workspace-server/internal/provisioner/provisioner.go`
MCP Memory Tools for CLI Runtimes
- Added `commit_memory` and `recall_memory` to `a2a_mcp_server.py`: now ALL runtimes (including Claude Code) can persist and recall memories via platform API
- Updated `workspace-configs-templates/claude-code-default/CLAUDE.md` with memory usage guidelines (recall at conversation start, commit after interactions)
- 7 unit tests in `test_mcp_memory.py` + 16 new E2E checks for memory CRUD, scope filtering, cross-workspace isolation
Comprehensive Test Suite
- `registry/access_test.go`: 10 tests for CanCommunicate (siblings, parent-child, root, denied, grandchild)
- `handlers_extended_test.go`: 14 tests for Delete, Update, Restart, Secrets, Discover, Peers, CheckAccess, Bundle, Config
- `test_cli_executor.py`: 14 tests for CLI command building, session resume, model flags, timeout
- `test_plugins.py`: 9 tests for plugin loading (rules, skills, prompts)
- `test_comprehensive_e2e.sh`: 68 checks covering ALL platform endpoints including runtime assignment and memory
UI Cleanup
- Removed 3 redundant task notifications from SidePanel/ChatTab (kept only the amber banner below tabs)
- PM system prompt updated for fully autonomous delegation (no more "Shall I delegate?")
Runtime Persisted in Database (migration 011)
- Root cause: runtime was only in config.yaml inside Docker volumes — fragile detection via ExecRead/ReadFromVolume failed when containers were dead
- Fix: Added `runtime` column to workspaces table. Stored at creation, read on restart with simple SELECT
- Fixed 6 broken paths: Restart, RestartByID, Create, Update (PATCH), Bundle import, ConfigTab
- Removed ExecRead/ReadFromVolume workarounds entirely
Auto-Memory for CLI Agents
- `cli_executor.py`: auto-recalls memories on first message (no session), auto-commits summary after each response
- Memories persist via platform API, survive container restarts
- Fixed memory pollution: saves original input, not memory-injected version
MCP Memory Tools
- Added `commit_memory` and `recall_memory` to `a2a_mcp_server.py`: all runtimes can persist/recall memories
- Updated `claude-code-default/CLAUDE.md` with memory guidelines
Real-Time Task Status on Canvas
- `set_current_task` pushes heartbeat immediately when setting a task (not just on 30s loop)
- Clearing deferred to next heartbeat cycle, which keeps the task visible for quick A2A responses
- Team leads now show task banners during delegation
Auth & Session Fixes
- CLI executor clears session_id on auth errors (prevents poisoned session resume)
- FilesTab: deduplicated tree keys with `path:type` (`.claude` dir + file collision)
UX Improvements
- Chat tab is now first and default tab (was Details)
- Rate limit increased from 100 to 600 req/min (15 workspaces overwhelmed the default)
- Merged PR #3: Awareness memory dashboard embedded as iframe in Memory tab
CI Fixes
- Updated handler tests for runtime column (INSERT 7 args, SELECT includes runtime)
Build Fixes
- `workspace/Dockerfile`: Added `COPY policies/ ./policies/`
- `workspace/requirements.txt`: Added `langchain-core` to base deps
- `adapters/crewai/adapter.py`: Fixed `_langchain_to_crewai` docstring
Container Health Detection & Auto-Restart
- Problem: When Docker Desktop crashes, containers die but platform still thinks workspaces are "online" for up to 60s (Redis TTL). A2A proxy returns errors, terminal fails, discovery returns stale URLs.
- Three-layer fix:
  - Reactive: A2A proxy checks `provisioner.IsRunning()` on connection error → marks offline, clears Redis, triggers restart. Returns 503 with `"restarting": true` (or 502 if container is running but unresponsive)
  - Proactive: New `registry.StartHealthSweep` polls Docker API every 15s for all online workspaces → catches dead containers before users notice
  - Auto-restart: Both liveness monitor and health sweep trigger `RestartByID()` on offline detection. Per-workspace mutex deduplicates concurrent restart attempts.
- `WorkspaceHandler` moved from `router.Setup` to `main.go` creation so `RestartByID` is accessible in offline callbacks
- New `db.ClearWorkspaceKeys()` shared helper replaces 3x duplicated Redis cleanup
- New files: `workspace-server/internal/registry/healthsweep.go`, `healthsweep_test.go` (3 tests)
- Files: `workspace-server/cmd/server/main.go`, `workspace-server/internal/handlers/workspace.go`, `workspace-server/internal/router/router.go`, `workspace-server/internal/db/redis.go`, `workspace-server/internal/registry/healthsweep.go`
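The per-workspace mutex that deduplicates restarts can be illustrated as below. The platform code is Go, so this Python version is only a sketch of the idea under that caveat; `RestartDeduper` and its method names are hypothetical. The key move is a non-blocking acquire: if a restart for the same workspace is already in flight, the second caller gives up instead of queueing a second restart.

```python
import threading
from collections import defaultdict
from typing import Callable


class RestartDeduper:
    """Allow at most one in-flight restart per workspace ID."""

    def __init__(self, restart_fn: Callable[[str], None]) -> None:
        self._restart_fn = restart_fn
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def restart(self, workspace_id: str) -> bool:
        """Run the restart; return False if one is already running for this ID."""
        with self._guard:
            lock = self._locks[workspace_id]
        if not lock.acquire(blocking=False):
            return False  # liveness monitor or health sweep got here first
        try:
            self._restart_fn(workspace_id)
            return True
        finally:
            lock.release()
```

Restarts of different workspaces still proceed in parallel, because each ID has its own lock.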
Template Fallback for Missing Templates
- Root cause of auth error: `setup-org.sh` referenced non-existent `org-*` templates → containers got empty `/configs` → fell back to `langgraph` runtime with `anthropic:claude-sonnet-4-6` but no `ANTHROPIC_API_KEY`
- Fix: Create handler now validates template exists via `os.Stat`, falls back to `{runtime}-default` template, then `ensureDefaultConfig()`
- `runtime` column added to List/Get API response (`scanWorkspaceRow`, `workspaceListQuery`, Get query)
- Files: `workspace-server/internal/handlers/workspace.go`, `workspace-server/internal/handlers/handlers_test.go`
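The fallback order described above (requested template, then `{runtime}-default`, then a generated config) might be sketched like this. The handler itself is Go and uses `os.Stat`; the Python function below is a hypothetical equivalent, and `resolve_template` and `templates_root` are names chosen only for this example.

```python
import os
from typing import Optional


def resolve_template(templates_root: str, requested: str, runtime: str) -> Optional[str]:
    """Return the template directory to apply, or None if none exists.

    None means the caller should generate a default config instead
    (the role ensureDefaultConfig() plays in the notes).
    """
    candidates = []
    if requested:
        candidates.append(requested)
    candidates.append(f"{runtime}-default")
    for name in candidates:
        path = os.path.join(templates_root, name)
        if os.path.isdir(path):  # validate that the template actually exists
            return path
    return None
```

With this ordering, a request for a missing `org-*` template no longer produces an empty `/configs`; it lands on the runtime's default template when one exists.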
Graceful Delegation Error Handling
- Problem: When child workspace fails (auth error, offline), PM forwarded raw error message to user instead of handling gracefully
- Fix (3 layers):
  - `a2a_mcp_server.py`: `delegate_task` detects errors via `[A2A_ERROR]` sentinel prefix, wraps as `DELEGATION FAILED` with instructions to try another peer or handle itself
  - `coordinator.py`: Strengthened coordination rule 5: "do NOT forward raw errors to user"
  - `cli_executor.py`: Added `IMPORTANT` block in A2A instructions for delegation failure handling
- Auth errors in CLI executor now retry with exponential backoff (same as rate limits)
- Claude Code adapter: Fixed `dict.get("command", "claude")` → `.get("command") or "claude"` for empty string handling
- Files: `workspace/a2a_mcp_server.py`, `workspace/coordinator.py`, `workspace/cli_executor.py`, `workspace/adapters/claude_code/adapter.py`
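Two of the fixes above are easy to show in isolation. The sketch below assumes the delegation result is a plain string; the function names and the exact wording of the wrapped message are hypothetical, while the `[A2A_ERROR]` prefix, the `DELEGATION FAILED` label, and the `.get("command") or "claude"` idiom come from the notes.

```python
A2A_ERROR_PREFIX = "[A2A_ERROR]"


def wrap_delegation_result(peer: str, result: str) -> str:
    """Pass successful results through; rewrite sentinel-prefixed errors."""
    if not result.startswith(A2A_ERROR_PREFIX):
        return result
    detail = result[len(A2A_ERROR_PREFIX):].strip()
    return (
        f"DELEGATION FAILED to {peer}: {detail}\n"
        "Do NOT forward this raw error to the user. "
        "Try another peer or handle the task yourself."
    )


def resolve_command(config: dict) -> str:
    """Pick the CLI command, treating an empty string like a missing key."""
    # config.get("command", "claude") returns "" when the key exists but is
    # empty, because the default only applies to missing keys.
    return config.get("command") or "claude"
```

The `or` form also covers a `None` value, which a YAML config with a blank `command:` field would typically produce.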
Agent Push Messaging (send_message_to_user)
- Feature: Agents can now push messages to the user's canvas chat at any time — not just as A2A responses
- Use case: Agent says "Got it, delegating now...", continues working, then sends results when done
- Platform: New `POST /workspaces/:id/notify` endpoint → broadcasts `AGENT_MESSAGE` via WebSocket (BroadcastOnly)
- MCP tool: `send_message_to_user` in `a2a_mcp_server.py` calls the notify endpoint
- Canvas: `AGENT_MESSAGE` handled in global `applyEvent` → stored in `agentMessages` map → ChatTab consumes via store subscription (no extra WS connection)
- Prompts: Updated A2A instructions + CLAUDE.md with "RESPOND FAST, FOLLOW UP LATER" rule
- Files: `workspace-server/internal/handlers/activity.go`, `workspace-server/internal/router/router.go`, `workspace/a2a_mcp_server.py`, `canvas/src/store/canvas.ts`, `canvas/src/components/tabs/ChatTab.tsx`, `workspace/cli_executor.py`, `workspace-configs-templates/claude-code-default/CLAUDE.md`
Remove Default Agent Timeout
- Changed default timeout from 300s to 0 (no timeout) — delegation chains can take arbitrarily long
- Files: `workspace-configs-templates/claude-code-default/config.yaml`, `workspace/config.py`, `workspace-server/internal/handlers/workspace.go`
WebSocket Error Suppression
- Suppressed noisy `WebSocket error: {}` console.error in `socket.ts`: `onerror` fires before `onclose` and the Event object has no useful info
- Files: `canvas/src/store/socket.ts`
Setup Script Fix
- Removed dead code copying auth tokens to non-existent `org-*` template dirs
- Auth token now auto-propagated via `claude-code-default` template fallback
- Files: `setup-org.sh`
Remove Default Agent Timeout
- Problem: PM timed out after 300s during delegation chains. Long-running tasks (multi-agent coordination, research) are expected to exceed 5 minutes.
- Fix: Changed default timeout from 300s to 0 (no timeout) in three places:
  - `workspace-configs-templates/claude-code-default/config.yaml`: template default
  - `workspace/config.py`: `RuntimeConfig.timeout` dataclass default + YAML parser default
  - `workspace-server/internal/handlers/workspace.go`: `ensureDefaultConfig` generated config
- `timeout: 0` → `self.config.timeout or None` → `None` → `proc.communicate()` waits indefinitely
- Files: `workspace-configs-templates/claude-code-default/config.yaml`, `workspace/config.py`, `workspace-server/internal/handlers/workspace.go`
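The `timeout: 0` → `None` chain above relies on `0` being falsy in Python. A minimal sketch, assuming a dataclass shaped like the `RuntimeConfig` mentioned in the notes (the `run_cli` helper is hypothetical and much simpler than the real CLI executor):

```python
import subprocess
from dataclasses import dataclass


@dataclass
class RuntimeConfig:
    timeout: int = 0  # 0 means "no timeout"


def run_cli(cmd: list[str], config: RuntimeConfig) -> str:
    """Run a command and return its stdout, honoring the configured timeout."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    # 0 is falsy, so `or None` converts it to None; communicate(timeout=None)
    # waits until the process exits instead of raising TimeoutExpired.
    stdout, _ = proc.communicate(timeout=config.timeout or None)
    return stdout
```

Passing `timeout=0` straight through would do the opposite of what is intended: `communicate` would raise `TimeoutExpired` almost immediately for any process that has not already finished.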
Build Script for Runtime Images
- Problem: Each runtime has its own Dockerfile extending `workspace-template:base` with pre-installed deps. Manually running `docker build` for each is error-prone: we shipped with 5-hour-old images and didn't notice.
- Fix: New `workspace/build-all.sh` builds base first, then all 6 runtime images in order. Supports selective builds (`build-all.sh claude-code langgraph`). Handles underscore/hyphen naming mismatch (dir `claude_code` → tag `claude-code`). No `:latest` tag; each runtime uses its own explicit tag.
- Added missing error logging in `activity.go` List handler (was returning 500 "query failed" without logging the actual SQL error)
- Files: `workspace/build-all.sh` (new), `workspace-server/internal/provisioner/provisioner.go`, `workspace-server/internal/handlers/activity.go`, `CLAUDE.md`
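The build ordering and the underscore/hyphen mapping can be sketched as a small planner. The actual script is shell; this Python version only illustrates the logic. The tag format `workspace-template:<runtime>`, the `adapters/<dir>/Dockerfile` path, and the order of the runtime list are assumptions, not taken from `build-all.sh`.

```python
from typing import Optional

BASE_IMAGE = "workspace-template:base"
# Directory names use underscores; the build order here is an assumption.
RUNTIME_DIRS = ["langgraph", "claude_code", "crewai", "autogen", "deepagents", "openclaw"]


def tag_for(runtime_dir: str) -> str:
    """Map a directory name to its image tag (claude_code -> claude-code)."""
    return f"workspace-template:{runtime_dir.replace('_', '-')}"


def build_plan(selected: Optional[list[str]] = None) -> list[list[str]]:
    """Return docker build commands: base first, then the chosen runtimes."""
    wanted = {name.replace("-", "_") for name in (selected or RUNTIME_DIRS)}
    unknown = wanted - set(RUNTIME_DIRS)
    if unknown:
        raise ValueError(f"unknown runtimes: {sorted(unknown)}")
    plan = [["docker", "build", "-t", BASE_IMAGE, "."]]
    for runtime_dir in RUNTIME_DIRS:
        if runtime_dir in wanted:
            dockerfile = f"adapters/{runtime_dir}/Dockerfile"  # assumed layout
            plan.append(["docker", "build", "-t", tag_for(runtime_dir), "-f", dockerfile, "."])
    return plan
```

Accepting either spelling on the command line and normalizing to the directory form keeps selective builds such as `claude-code langgraph` working regardless of which name the caller remembers.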
Codebase Modularization (Major Refactoring)
Split 6 large files (~4,200 lines total) into 22 focused modules. Pure structural — no behavior changes. All tests pass.
Platform handlers:
- `workspace.go` (978→377 lines) → split out `workspace_provision.go` (217), `workspace_restart.go` (173), `a2a_proxy.go` (251)
- `templates.go` (814→371 lines) → split out `container_files.go` (168), `template_import.go` (175)
Workspace template:
- `a2a_mcp_server.py` (572→293 lines) → split out `a2a_client.py` (97), `a2a_tools.py` (275)
Canvas:
- `ConfigTab.tsx` (738→310 lines) → split out `config/form-inputs.tsx`, `config/secrets-section.tsx`, `config/yaml-utils.ts`
- `ChatTab.tsx` (635→340 lines) → split out `chat/types.ts`, `chat/storage.ts`, `chat/message-parser.ts`
- `canvas.ts` (449→215 lines) → split out `canvas-events.ts`, `canvas-topology.ts`, `canvas-capabilities.ts`
Tier System Simplified (T1/T2/T3, removed T4)
- T1 Sandboxed: No `/workspace` mount, config only (unchanged)
- T2 Standard: Normal Docker + `/workspace` mount (unchanged, was identical to T3 before)
- T3 Full Access: `--privileged` + `--pid=host`, full machine access for dev team
- T4 removed: EC2 VMs were unimplemented; privileged Docker achieves the same goal
- Updated provisioner switch statement, CreateWorkspaceDialog (3-col grid, no T4), docs/architecture/workspace-tiers.md (full rewrite)
- Files: `workspace-server/internal/provisioner/provisioner.go`, `canvas/src/components/CreateWorkspaceDialog.tsx`, `docs/architecture/workspace-tiers.md`
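The tier switch above can be pictured as a mapping from tier number to extra `docker run` arguments. The provisioner is Go and its real options are not shown in these notes, so this Python sketch is an assumption throughout: in particular, it assumes T3 keeps the `/workspace` mount that T2 has, and it omits the config volume that every tier presumably mounts.

```python
def docker_args_for_tier(tier: int, workspace_dir: str) -> list[str]:
    """Return tier-specific docker run arguments (config mount omitted)."""
    mount = ["-v", f"{workspace_dir}:/workspace"]
    if tier == 1:
        return []  # T1 Sandboxed: config only, no /workspace mount
    if tier == 2:
        return mount  # T2 Standard: normal Docker plus /workspace
    if tier == 3:
        # T3 Full Access: privileged container sharing the host PID namespace
        return mount + ["--privileged", "--pid=host"]
    raise ValueError(f"unsupported tier: {tier}")  # T4 no longer exists
```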
Config Volume Persistence (Restart no longer overwrites)
- Problem: Restart re-applied `claude-code-default` template, overwriting user config changes (e.g. model: opus → sonnet)
- Fix: Restart handler skips templates by default. New `"apply_template": true` flag in restart body for explicit re-application (used when runtime changes).
- `RestartByID` (auto-restart) also skips templates: passes empty template path
- Files: `workspace-server/internal/handlers/workspace_restart.go`
Skills Self-Improvement System
- Documented how agents can create persistent skills in `/configs/skills/<name>/SKILL.md`
- Skills are auto-loaded into system prompt via `skills/loader.py`
- Skills persist on Docker named volume and survive restarts
- Updated `workspace-configs-templates/claude-code-default/CLAUDE.md` with skills creation guide
- Trained PM agent to convert operating procedures into skills
Agent Code Fixes (from agent-written code)
- Fixed `pytest.ini`: removed `--cov-fail-under=100` that broke test runner
- Fixed 6 test files: replaced hardcoded `/workspace/workspace/` paths with `os.path.dirname(__file__)` relative paths
- Fixed `aes_test.go`: test key that wasn't 32 bytes after base64 decode
- Fixed `agent_test.go`: SQL mock arg count mismatch (2 args for 1-param query)
- Fixed `liveness_test.go`: unused variable
- Cleaned up `.coverage`, `.coveragerc`, `__pycache__`, `index_minimal.ts`
Agent Training via A2A
- Sent feedback to PM, Dev Lead, QA Engineer about test-writing rules, path handling, config discipline
- All 3 agents committed rules to persistent memory
- PM + dev team upgraded to Opus 4.6 model, T3 tier
- Marketing/Research teams remain Sonnet, T2
Misc
- `.gitignore`: Added `.claude/worktrees/` to prevent stale worktrees showing as submodule changes
Workspace Pause/Resume (PR #4)
- New `POST /workspaces/:id/pause`: stops container, sets status='paused', clears Redis keys
- New `POST /workspaces/:id/resume`: re-provisions from existing config volume
- Health sweep, liveness monitor, and auto-restart all skip paused workspaces
- Canvas: indigo "Paused" status dot, Legend entry, context menu Pause/Resume toggle
- `WORKSPACE_PAUSED` WebSocket event handled in canvas-events.ts
- Cascade: pausing a parent pauses all descendants (recursive CTE), resuming does the reverse
- Guard: children cannot restart or resume while any ancestor is paused (409 Conflict)
- `isParentPaused()` recursive helper checks ancestor chain
- Context menu: right-click nested team members now opens correct child menu (not parent's)
- Context menu closes immediately on pause/resume click (before API call, not after)
- Files: `workspace-server/internal/handlers/workspace_restart.go`, `workspace-server/internal/router/router.go`, `workspace-server/internal/registry/liveness.go`, `canvas/src/store/canvas-events.ts`, `canvas/src/components/StatusDot.tsx`, `canvas/src/components/WorkspaceNode.tsx`, `canvas/src/components/Legend.tsx`, `canvas/src/components/ContextMenu.tsx`
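The cascade and the ancestor guard above can be sketched over an in-memory tree. The platform does this in Go with a recursive CTE; the Python below is a simplified illustration in which the `parent_id` and `status` field names, the `"provisioning"` status on resume, and the integer return codes are assumptions.

```python
def is_parent_paused(workspaces: dict, workspace_id: str) -> bool:
    """True if any ancestor of workspace_id is paused (walks up the chain)."""
    parent_id = workspaces[workspace_id].get("parent_id")
    if parent_id is None:
        return False
    if workspaces[parent_id]["status"] == "paused":
        return True
    return is_parent_paused(workspaces, parent_id)


def descendants(workspaces: dict, root_id: str) -> list:
    """All workspaces below root_id, found by repeated child lookups."""
    found, frontier = [], [root_id]
    while frontier:
        current = frontier.pop()
        children = [wid for wid, ws in workspaces.items() if ws.get("parent_id") == current]
        found.extend(children)
        frontier.extend(children)
    return found


def pause(workspaces: dict, workspace_id: str) -> None:
    """Pause a workspace and cascade to every descendant."""
    for wid in [workspace_id, *descendants(workspaces, workspace_id)]:
        workspaces[wid]["status"] = "paused"


def resume(workspaces: dict, workspace_id: str) -> int:
    """Resume a subtree; return 409 while any ancestor is still paused."""
    if is_parent_paused(workspaces, workspace_id):
        return 409
    for wid in [workspace_id, *descendants(workspaces, workspace_id)]:
        workspaces[wid]["status"] = "provisioning"
    return 200
```

Checking ancestors before resuming is what prevents a child from coming back online underneath a team lead that is still paused.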
Files Changed
- `canvas/src/components/tabs/ChatTab.tsx`
- `canvas/src/components/tabs/ConfigTab.tsx`
- `canvas/src/store/canvas.ts`
- `canvas/src/store/__tests__/canvas.test.ts`
- `workspace/a2a_executor.py`
- `workspace/adapters/langgraph/adapter.py`
- `workspace/adapters/deepagents/adapter.py`
- `workspace/adapters/crewai/adapter.py`
- `workspace/adapters/autogen/adapter.py`
- `workspace/adapters/openclaw/adapter.py`
- `workspace/tests/test_a2a_executor.py`
- `workspace-server/cmd/server/main.go`
- `workspace-server/internal/db/redis.go`
- `workspace-server/internal/handlers/workspace.go`
- `workspace-server/internal/handlers/handlers_test.go`
- `workspace-server/internal/router/router.go`
- `workspace-server/internal/registry/healthsweep.go` (new)
- `workspace-server/internal/registry/healthsweep_test.go` (new)
- `workspace/a2a_mcp_server.py`
- `workspace/adapters/claude_code/adapter.py`
- `workspace/cli_executor.py`
- `workspace/coordinator.py`
- `setup-org.sh`
- `CLAUDE.md`
- `docs/architecture/provisioner.md`
- `workspace/config.py`
- `workspace-configs-templates/claude-code-default/config.yaml`
- `workspace-configs-templates/claude-code-default/CLAUDE.md`
- `workspace-server/internal/handlers/activity.go`
- `canvas/src/store/socket.ts`
- `workspace-server/internal/provisioner/provisioner.go`
- `workspace/build-all.sh` (new)
- `docs/agent-runtime/cli-runtime.md`
- `docs/agent-runtime/config-format.md`
- `workspace-server/internal/handlers/workspace_provision.go` (new, extracted from workspace.go)
- `workspace-server/internal/handlers/workspace_restart.go` (new, extracted from workspace.go)
- `workspace-server/internal/handlers/a2a_proxy.go` (new, extracted from workspace.go)
- `workspace-server/internal/handlers/container_files.go` (new, extracted from templates.go)
- `workspace-server/internal/handlers/template_import.go` (new, extracted from templates.go)
- `workspace/a2a_client.py` (new, extracted from a2a_mcp_server.py)
- `workspace/a2a_tools.py` (new, extracted from a2a_mcp_server.py)
- `workspace/tests/test_mcp_memory.py`
- `canvas/src/store/canvas-events.ts` (new, extracted from canvas.ts)
- `canvas/src/store/canvas-topology.ts` (new, extracted from canvas.ts)
- `canvas/src/store/canvas-capabilities.ts` (new, extracted from canvas.ts)
- `canvas/src/components/tabs/chat/types.ts` (new)
- `canvas/src/components/tabs/chat/storage.ts` (new)
- `canvas/src/components/tabs/chat/message-parser.ts` (new)
- `canvas/src/components/tabs/chat/index.ts` (new)
- `canvas/src/components/tabs/config/form-inputs.tsx` (new)
- `canvas/src/components/tabs/config/secrets-section.tsx` (new)
- `canvas/src/components/tabs/config/yaml-utils.ts` (new)
- `canvas/src/components/tabs/config/index.ts` (new)