molecule-core/docs/edit-history/2026-04-05.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00


Edit History — 2026-04-05

Summary

This session focused on recursive sub-workspace rendering, eject/extract UX, and bug fixes for embedded nesting. Child nodes now hide when nested and reappear when un-nested, the eject button replaces the old close icon with a distinct sky-blue arrow, and sub-workspaces render recursively up to 3 levels deep with full status detail on each chip. The context menu gains an "Extract from Team" action. Six code review fixes were applied, and the register endpoint API test was fixed to use the correct field name.

Embedded Sub-Workspace Fixes (canvas/src/store/canvas.ts)

  • nestNode visibility: now sets hidden: !!targetId so child nodes disappear from the canvas when nested into a parent and reappear when un-nested (dragged to empty canvas)
  • removeNode fix: was incorrectly reading n.parentId (React Flow's layout field) instead of n.data.parentId (the actual hierarchy field). Fixed to use n.data.parentId. Also properly sets hidden on re-parented children and simplified edge cleanup logic.
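
The two store fixes above can be illustrated with a small sketch. The real code is TypeScript in canvas.ts; this Python version only mirrors the logic described, and the `Node` class and function names are hypothetical stand-ins (`parent_id` plays the role of `n.data.parentId`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    id: str
    parent_id: Optional[str] = None  # stands in for n.data.parentId (hierarchy field)
    hidden: bool = False


def nest_node(nodes: List[Node], node_id: str, target_id: Optional[str]) -> List[Node]:
    """Nest node_id under target_id, or un-nest it when target_id is None."""
    for n in nodes:
        if n.id == node_id:
            n.parent_id = target_id
            n.hidden = bool(target_id)  # mirrors `hidden: !!targetId`
    return nodes


def remove_node(nodes: List[Node], node_id: str) -> List[Node]:
    """Drop node_id and re-parent its direct children to its former parent."""
    removed = next((n for n in nodes if n.id == node_id), None)
    if removed is None:
        return nodes
    remaining = [n for n in nodes if n.id != node_id]
    for n in remaining:
        if n.parent_id == node_id:
            n.parent_id = removed.parent_id
            # A child stays hidden only if it still has a parent to live inside.
            n.hidden = bool(n.parent_id)
    return remaining
```

Removing a top-level parent therefore makes its children visible on the canvas again, while removing a mid-level node keeps its children embedded in the grandparent.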

Eject/Extract Button (canvas/src/components/WorkspaceNode.tsx)

  • Replaced the generic close icon on embedded child chips with a new EjectIcon SVG (arrow pointing up-right) — visually distinct from delete
  • Hover color changed from red to sky-blue to reinforce "extract" (not "delete") semantics
  • Each embedded child chip shows the eject button on hover to extract from team

Recursive Sub-Workspaces (canvas/src/components/WorkspaceNode.tsx)

New TeamMemberChip component that recursively renders children as mini-cards inside parent nodes:

  • Each sub-card mirrors the parent card layout: status dot + gradient bar, name, tier badge, skills pills, status label, active tasks count, descendant count badge
  • Sub-cards can contain their own "Team" section with further nested sub-cards
  • MAX_NESTING_DEPTH = 3 constant caps recursion to prevent runaway rendering
  • countDescendants() helper counts all descendants recursively (memoized via useMemo)
  • Parent node dynamically sizes based on nesting depth:
    • No children: 210-280px
    • With children: 320-450px
    • With grandchildren: 400-560px
  • Badge shows total descendant count, not just direct children
  • Callbacks passed as props (onSelect, onExtract) instead of individual store subscriptions per chip — avoids N+1 Zustand subscriptions

Context Menu Updates (canvas/src/components/ContextMenu.tsx)

  • Added nestNode store access
  • New "Extract from Team" menu item with up-arrow icon for child nodes
  • handleRemoveFromTeam with try/catch error handling
  • Toast notification says "Extracted from team" (consistent wording with eject button)

Code Review Fixes Applied

  1. countDescendants memoized via useMemo to prevent recalculation on every render
  2. Stable handleExtract callback via useCallback to prevent unnecessary re-renders
  3. Invalid Tailwind class bg-zinc-750/70 changed to valid bg-zinc-700/70
  4. Sub-children layout changed from 2-column grid to space-y-1 (single column) at all depths to prevent content overflow
  5. Removed fragile col-span-2 class that caused layout issues with odd numbers of children

Code Review Rounds 18–21 (canvas components + store)

Comprehensive review across WorkspaceNode.tsx, canvas.ts, ContextMenu.tsx, Toolbar.tsx. All issues resolved:

Critical fixes

  • countDescendants cycle protection: added visited Set parameter to prevent infinite recursion on circular parentId references
  • WORKSPACE_REMOVED re-parents children: event handler now re-parents orphaned children to the removed node's parent and clears stale selectedNodeId — matching removeNode behavior

Performance fixes

  • useHierarchyInfo consolidated hook: replaced separate useChildNodes + allNodes subscriptions with a single stable selector that returns children, hasGrandchildren, and descendantCount — prevents redundant re-renders on every node drag
  • EmbeddedTeam wrapper component: isolates the allNodes store subscription to only mount when children exist, so leaf nodes don't subscribe at all
  • Toolbar single-pass counts: replaced 6 .filter() passes with a single useMemo reduce loop
  • ContextMenu reactive selector: replaced stale getState() during render with proper useCanvasStore() selector for hasChildren, moved above early return for hooks compliance
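
As a rough illustration of the Toolbar change, a single pass can tally every status at once instead of filtering the node list once per status. The status names and the dictionary shape of a node are assumptions made for this Python sketch; the actual code is a useMemo reduce in Toolbar.tsx.

```python
from collections import Counter
from typing import Dict, Iterable


def count_statuses(nodes: Iterable[Dict[str, str]]) -> Counter:
    """Tally nodes by status in one pass instead of one filter per status."""
    counts: Counter = Counter()
    for node in nodes:
        counts[node.get("status", "unknown")] += 1
    return counts


counts = count_statuses(
    [{"status": "online"}, {"status": "online"}, {"status": "offline"}]
)
online, offline = counts["online"], counts["offline"]
```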

Type safety / cleanup

  • Removed unsafe data as unknown as WorkspaceNodeData double cast in openContextMenu call
  • Removed redundant as Record<string, unknown> | null casts on data.agentCard
  • Added runtime typeof guard for agent_card in AGENT_CARD_UPDATED event handler
  • Renamed children prop to members in EmbeddedTeam to avoid React reserved prop name
  • Removed console.error in savePosition — silent catch like other non-critical handlers
  • Consistent selectedNodeId destructuring at top of applyEvent instead of separate get() call

Dual URL Routing for Agent-to-Agent Communication

Agents running in Docker containers can't reach the host at 127.0.0.1:PORT, because inside a container 127.0.0.1 is the container's own loopback interface. The discovery endpoint now returns a different URL depending on the caller:

  • Workspace caller (X-Workspace-ID header present) → Docker-internal URL (http://<container-hostname>:8000)
  • Canvas/proxy (no header) → Host-mapped URL (http://127.0.0.1:<ephemeral-port>)

Implementation:

  • CacheInternalURL / GetCachedInternalURL in db/redis.go — separate Redis key (ws:{id}:internal_url)
  • Register endpoint caches agent-reported URL as internal URL
  • Discovery checks internal URL first when X-Workspace-ID is present, falls back to host URL
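
The lookup order described above might look like the sketch below. The real implementation is Go backed by Redis; here a plain dict stands in for Redis, and the function names and example URLs are hypothetical. Only the key format `ws:{id}:internal_url` and the `X-Workspace-ID` header come from the notes above.

```python
from typing import Dict, Mapping


def cache_internal_url(cache: Dict[str, str], workspace_id: str, url: str) -> None:
    """Store the agent-reported (Docker-internal) URL under its own key."""
    cache[f"ws:{workspace_id}:internal_url"] = url


def resolve_discovery_url(
    cache: Mapping[str, str],
    target_id: str,
    host_url: str,
    headers: Mapping[str, str],
) -> str:
    """Pick the URL a caller should use to reach workspace target_id."""
    # Workspace callers identify themselves with X-Workspace-ID and run inside
    # Docker, so they get the internal URL when one has been cached.
    if headers.get("X-Workspace-ID"):
        internal = cache.get(f"ws:{target_id}:internal_url")
        if internal:
            return internal
    # Canvas/proxy callers, or a cache miss, fall back to the host-mapped URL.
    return host_url
```

A real HTTP stack would match the header name case-insensitively; the dict lookup here is exact for brevity.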

Verified both directions: Echo Agent delegated to SEO Agent (got SEO advice back), SEO Agent delegated to Echo Agent (got echo back).

A2A End-to-End Pipeline (8e) — Fully Working

Verified the full pipeline: Canvas → Platform proxy (POST /workspaces/:id/a2a) → Docker agent container → OpenRouter API → LLM response.

Infrastructure fixes to make it work

  1. findConfigsDir validation (main.go): auto-discovery was finding a stale empty workspace-server/workspace-configs-templates/ dir before the real one at ../workspace-configs-templates/. Fixed by requiring at least one template with config.yaml inside the dir.
  2. PLATFORM_URL for Docker containers (main.go): was hardcoded to http://localhost:PORT. Containers can't reach host's localhost. Changed to http://host.docker.internal:PORT. Now configurable via PLATFORM_URL env var.
  3. Host port mapping (provisioner.go): platform runs on host but agents run in Docker. Added ephemeral host port binding (127.0.0.1:0→8000/tcp) and resolved actual port via ContainerInspect after start.
  4. Provisioner URL preservation (workspace.go + registry.go): provisioner returns http://127.0.0.1:PORT URL, but agent self-registration overwrites it with Docker-internal hostname. Fixed: pre-store provisioner URL in DB+Redis; register endpoint preserves URLs starting with http://127.0.0.1.
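
Fix 4 boils down to a small decision at registration time, sketched here in Python under stated assumptions: the function name and return shape are hypothetical, and the actual logic lives in the Go handlers (workspace.go and registry.go).

```python
from typing import Optional, Tuple

HOST_MAPPED_PREFIX = "http://127.0.0.1"


def resolve_registration(
    stored_url: Optional[str], reported_url: str
) -> Tuple[str, str]:
    """Return (host_url, internal_url) to persist when an agent self-registers.

    stored_url is whatever the provisioner pre-stored (if anything);
    reported_url is the Docker-internal URL the agent reports about itself.
    """
    if stored_url and stored_url.startswith(HOST_MAPPED_PREFIX):
        # Keep the provisioner's host-mapped URL; do not let the agent overwrite it.
        host_url = stored_url
    else:
        host_url = reported_url
    # The agent-reported URL is still kept, as the internal URL for peers.
    return host_url, reported_url
```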

Code review fixes (round 22)

  • Provisioner URL storage errors now logged (were silently ignored)
  • Registration reads URL from DB instead of Redis (avoids TTL race condition)
  • Test timeout configurable via A2A_TIMEOUT env var

OpenRouter max_tokens fix (workspace/agent.py)

  • LangChain ChatOpenAI defaults to 64000 max_tokens which exceeds free-tier credits
  • Added MAX_TOKENS env var (default 2048) for OpenRouter provider
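
A minimal sketch of how the MAX_TOKENS override could be read is shown below. The helper name is hypothetical, and falling back to the default on a non-numeric or non-positive value is an assumption of this sketch rather than documented behavior of agent.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_TOKENS = 2048


def resolve_max_tokens(env: Optional[Mapping[str, str]] = None) -> int:
    """Read MAX_TOKENS from the environment, defaulting to 2048."""
    env = os.environ if env is None else env
    raw = env.get("MAX_TOKENS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_TOKENS  # unset or non-numeric (assumed fallback)
    return value if value > 0 else DEFAULT_MAX_TOKENS
```

The resulting integer would then be passed as the `max_tokens` argument when constructing the chat model for the OpenRouter provider.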

Bundle Round-Trip Test (12j)

Added to test_api.sh: export → delete → import → verify name/tier/agent_card match with new ID. 9 new assertions, all passing.

Comprehensive A2A E2E Test Suite (test_a2a_e2e.sh)

New test script with 22 assertions across 12 test scenarios using free google/gemini-2.5-flash via OpenRouter:

  1. Basic message/send — Echo Agent
  2. Basic message/send — SEO Agent
  3. Auto JSON-RPC envelope wrapping (bare request)
  4. Full JSON-RPC 2.0 envelope with custom ID preserved
  5. Invalid method returns -32601 error
  6. Offline workspace returns error
  7. Nonexistent workspace returns 404
  8. Multi-turn conversation
  9. Long input handling (50 sentences)
  10. Peer discovery (agents see each other)
  11. Agent cards reflect skills
  12. Heartbeat updates uptime
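
Scenarios 3 to 5 exercise JSON-RPC 2.0 envelope handling. The Python sketch below shows one plausible shape for it; how the platform proxy actually wraps a bare request (default method, generated id) is an assumption here. The error code -32601 is the standard JSON-RPC 2.0 "Method not found" code.

```python
import uuid
from typing import Any, Dict


def wrap_jsonrpc(body: Dict[str, Any], default_method: str = "message/send") -> Dict[str, Any]:
    """Pass a full JSON-RPC 2.0 envelope through; wrap a bare request in one."""
    if body.get("jsonrpc") == "2.0" and "method" in body:
        return body  # already an envelope: the caller's custom id is preserved
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),  # assumed: a generated id for bare requests
        "method": default_method,
        "params": body,
    }


def method_not_found(request_id: Any, method: str) -> Dict[str, Any]:
    """Build the standard JSON-RPC 2.0 error response for an unknown method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": f"Method not found: {method}"},
    }
```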

Activity Logging, A2A Communication Tracking, and Current Task Visibility

Full-stack feature for comprehensive workspace activity logging, inter-agent communication visibility, and real-time current task display.

Backend (Go Platform)

  • Migration 009 (workspace-server/migrations/009_activity_logs.sql): new activity_logs table (workspace_id, activity_type, source/target, method, summary, request/response JSONB, duration_ms, status, error_detail) with composite index. Added current_task TEXT to workspaces table.
  • Activity handler (workspace-server/internal/handlers/activity.go): GET /workspaces/:id/activity (list with type filter + limit cap at 500), POST /workspaces/:id/activity (agent self-report with type validation)
  • A2A proxy logging (workspace.go): ProxyA2A now logs every request/response to activity_logs with method, duration, status. Uses context.WithoutCancel for async goroutine.
  • Heartbeat current_task (registry.go): HeartbeatPayload extended with current_task. Reads prev value before UPDATE, only broadcasts TASK_UPDATED on change.
  • BroadcastOnly (broadcaster.go): WebSocket-only broadcast (no structure_events insert) for high-frequency events.
  • Activity retention: Background goroutine in main.go with configurable retention via ACTIVITY_RETENTION_DAYS (default 7) and ACTIVITY_CLEANUP_INTERVAL_HOURS (default 6) env vars.
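
Three of the backend rules above are small enough to sketch directly: the list limit cap, the retention cutoff, and the change check that gates TASK_UPDATED. This is an illustrative Python version of logic that is implemented in Go; the function names and the default list limit of 100 are assumptions, while the cap of 500 and the 7-day retention default come from the notes above.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

MAX_ACTIVITY_LIMIT = 500
DEFAULT_ACTIVITY_LIMIT = 100  # assumption: the notes only state the cap


def clamp_limit(requested: Optional[int]) -> int:
    """Apply the default when no valid limit is given, then cap at 500."""
    if requested is None or requested < 1:
        return DEFAULT_ACTIVITY_LIMIT
    return min(requested, MAX_ACTIVITY_LIMIT)


def retention_cutoff(now: datetime, env: Mapping[str, str]) -> datetime:
    """Rows older than this timestamp are eligible for cleanup."""
    days = int(env.get("ACTIVITY_RETENTION_DAYS", "7"))
    return now - timedelta(days=days)


def task_changed(previous: Optional[str], current: Optional[str]) -> bool:
    """Broadcast TASK_UPDATED only when the heartbeat's current_task differs."""
    return (previous or "") != (current or "")


cutoff = retention_cutoff(datetime.now(timezone.utc), {"ACTIVITY_RETENTION_DAYS": "14"})
```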

Frontend (Canvas)

  • ActivityTab (canvas/src/components/tabs/ActivityTab.tsx): Comprehensive activity log viewer with type filters (All, A2A In/Out, Tasks, Logs, Errors), color-coded entries, A2A flow visualization (source→target), expandable request/response JSON, 5s auto-refresh with live/paused toggle.
  • Current task display: Amber pulsing banner in WorkspaceNode cards and SidePanel header when agent has active task.
  • Store updates: currentTask field in WorkspaceNodeData, TASK_UPDATED event handler, "activity" panel tab.

MCP Server

  • Added list_activity tool with type/limit filters.

Tests (36 new tests)

  • Go: 25 total (was 14). Added: TaskChanged/Unchanged/Cleared heartbeat, Activity List/ListByType/ListEmpty/ListCustomLimit/ListMaxLimit, Report/ReportAllValidTypes/ReportMissingBody/Report_InvalidType, WorkspaceGet_CurrentTask.
  • Canvas Vitest: 58 total (was 52). Added: TASK_UPDATED set/clear/unknown/edge cases, ACTIVITY_LOGGED no-op, hydrate currentTask, setPanelTab activity.
  • Integration (test_api.sh): ~62 checks (was ~43). Added 19 activity + current_task checks.
  • E2E (test_activity_e2e.sh): New script with 25 tests requiring 1 online agent — A2A logging verification, self-report, filtering, task lifecycle, cross-workspace isolation.

API Test Fix

  • Register endpoint test updated to use id field instead of workspace_id — discovered during E2E testing that the platform expects the field named id

CI Pipeline & Test Infrastructure (PM Review Session)

PM review identified 7 action items, including zero test coverage, no CI, no branch protection, stale tasks, no release tags, and an incomplete bundle round-trip test. All were addressed in this session.

GitHub Actions CI (.github/workflows/ci.yml)

  • 4 parallel jobs: Go build+vet+test, Canvas build+vitest, MCP Server build, Python pytest
  • Triggers on push to main and PRs targeting main
  • Caching: npm for Canvas/MCP, pip for Python, Go modules via setup-go
  • Go version set to stable (go.mod says 1.25 which doesn't exist in Actions yet)
  • Test steps fail on real failures (no || true swallowing)

Canvas Store Tests (47 tests) — canvas/src/store/__tests__/canvas.test.ts

  • Vitest setup with vitest.config.ts (node environment, @/ path alias)
  • Tests: selectNode, hydrate (3), applyEvent (11 covering 6 event types), removeNode (5), isDescendant (6), updateNodeData (2), context menu (2), setPanelTab (2), getSelectedNode (3), savePosition (1), saveViewport (1), nestNode (4 including API revert), misc setters (3)
  • Global fetch mock with per-test override for API-calling actions

Go Handler Tests (9 tests) — workspace-server/internal/handlers/handlers_test.go

  • Uses go-sqlmock for DB, miniredis for Redis, real Broadcaster with no-op Hub
  • Tests: Register (upsert+event), Heartbeat normal/degraded/recovery (status transitions), WorkspaceCreate (201+provisioning), WorkspaceList (multi-row scan), ProxyA2A wrapping/404/503
  • Each test isolates globals via t.Cleanup

Python Runtime Tests (45 tests) — workspace/tests/

  • pytest with conftest.py mocking a2a SDK modules (heavy external dep)
  • test_config.py (12): load_config, defaults, env overrides, nested configs, FileNotFoundError
  • test_heartbeat.py (9): init, record_success/error, error_rate, async HTTP POST, stop
  • test_prompt.py (9): prompt files, fallback, plugins, skills, peers, JSON agent_card
  • test_skills_loader.py (7): frontmatter parsing, defaults, load_skills, missing SKILL.md
  • test_a2a_executor.py (7): text extraction, empty parts, errors, content blocks

Stale Task Cleanup

  • Closed 4 awareness tasks from April 1 that were already completed: A2A endpoint, templates endpoint, ANTHROPIC_API_KEY (now uses OpenRouter), garbage task

PLAN.md

  • Marked 12j (bundle round-trip test) as done — test already existed in test_api.sh

Parent Context Inheritance Feature

Implements automatic context file sharing from parent workspaces to direct children, closing the gap between the HMA docs (L2 Team Memory as "Department Drive") and the actual implementation.

How It Works

  1. Parent declares shared_context: [architecture.md, conventions.md] in config.yaml
  2. Platform injects PARENT_ID env var when provisioning children during Expand
  3. Child calls GET /workspaces/{parent_id}/shared-context at startup
  4. Parent's shared files injected into child's system prompt as ## Parent Context
  5. Grandchildren only see their direct parent's context (1-level inheritance)
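
The child-side flow in steps 2 to 4 can be sketched as below. This is a simplified Python illustration, not the code in coordinator.py or prompt.py: the real signatures of get_parent_context() and build_system_prompt() are not shown in these notes, and the mapping of filename to content used for the parent context is an assumed shape. The HTTP fetch itself is left out.

```python
from typing import Mapping, Optional


def parent_context_url(platform_url: str, env: Mapping[str, str]) -> Optional[str]:
    """Build the shared-context URL, or None for workspaces without a parent."""
    parent_id = env.get("PARENT_ID")
    if not parent_id:
        return None
    return f"{platform_url.rstrip('/')}/workspaces/{parent_id}/shared-context"


def build_system_prompt(
    base_prompt: str, parent_context: Optional[Mapping[str, str]] = None
) -> str:
    """Append the parent's shared files under a '## Parent Context' section."""
    if not parent_context:
        return base_prompt
    lines = [base_prompt.rstrip(), "", "## Parent Context"]
    for filename, content in parent_context.items():
        lines += ["", f"### {filename}", content.strip()]
    return "\n".join(lines) + "\n"
```

Because a child only ever fetches from the workspace named in its own PARENT_ID, a grandchild never sees the grandparent's files, which matches the 1-level inheritance in step 5.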

Files Changed

  • workspace/config.py — Added shared_context field
  • workspace-server/internal/handlers/team.go — Inject PARENT_ID env var during Expand
  • workspace-server/internal/handlers/templates.go — New SharedContext endpoint
  • workspace-server/internal/router/router.go — Register new route
  • workspace/coordinator.py — New get_parent_context() function
  • workspace/prompt.py — Added parent_context param to build_system_prompt()
  • workspace/main.py — Wire parent context into startup