molecule-core/docs/edit-history/2026-04-10.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00


2026-04-10 Session

Summary

Documentation maintenance for the new long-form Molecule AI product and technical narratives: moved both repository-root drafts into the VitePress docs tree, added sidebar and homepage entry points so they are discoverable from the docs site, and linked them from the product overview so they can be maintained inside docs/ going forward.

Also brought the landing-page messaging report under docs maintenance by tracking docs/product/landing-messaging-report.md in git and adding it to the product navigation surface.

Changes

New Long-Form Docs Added To docs/

  • Moved MOLECULE_PRODUCT_DOC.md into docs/product/molecule-product-doc.md
  • Moved MOLECULE_TECHNICAL_DOC.md into docs/architecture/molecule-technical-doc.md
  • Kept the full source content intact while relocating it into the maintained docs structure

VitePress Navigation Updated

  • docs/.vitepress/config.ts
  • Added Product Narrative under the Product sidebar group
  • Added Landing Messaging Report under the Product sidebar group
  • Added Technical Documentation under the Architecture sidebar group

Docs Entry Points Updated

  • docs/index.md
  • Added homepage recommended-reading links for the new product and technical documents
  • docs/product/overview.md
  • Added direct links to the product narrative, landing messaging report, and comprehensive technical documentation

Additional Product Doc Tracked

  • Added docs/product/landing-messaging-report.md to version control under the Product docs section

Files Changed

  • docs/.vitepress/config.ts
  • docs/index.md
  • docs/product/overview.md
  • docs/product/landing-messaging-report.md
  • docs/product/molecule-product-doc.md
  • docs/architecture/molecule-technical-doc.md
  • docs/edit-history/2026-04-10.md (new)

CEO Session — Infrastructure Audit + Chain Break Fix

Infra Audit (fix/infra-audit-critical — PR #5)

Comprehensive codebase audit identified 19 issues across 4 priority levels. Critical fixes:

  1. Race condition in crypto/aes.go — encryptionKey global accessed without sync. Fixed with sync.Once. Added ResetForTesting() for tests.
  2. Missing DB indexes — Migration 014: workspaces(parent_id), workspaces(status), canvas_layouts(workspace_id). Speeds up hierarchy queries, cascade deletes, list/get joins.
  3. N+1 cascade delete — Replaced per-child UPDATE+DELETE loop with recursive CTE batch query. Docker stops still per-child.
  4. CI linting — Added golangci-lint step (continue-on-error until codebase clean).

Chain Break Root Cause + Fix

Problem: Delegation chain died after first result. PM delegated to Dev Lead + QA, results completed, heartbeat wrote results file — but PM was never woken again.

Root cause: Self-message cooldown was 5 minutes. First delegation triggered a self-message within the window. All subsequent completions were blocked by cooldown. PM never woke up to report.

Fix: Reduced SELF_MESSAGE_COOLDOWN from 300s to 60s. With 30s heartbeat cycles, new results trigger a self-message within 1-2 cycles. Results file dedup prevents double-processing.
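The cooldown gate described above can be sketched in a few lines. This is an illustrative Python sketch, not the actual workspace/heartbeat.py code; the class and method names are hypothetical.

```python
import time

# Hypothetical sketch of the heartbeat's self-message cooldown gate.
# SELF_MESSAGE_COOLDOWN was reduced from 300s to 60s so that, with 30s
# heartbeat cycles, new results wake the agent within 1-2 cycles.
SELF_MESSAGE_COOLDOWN = 60

class CooldownGate:
    def __init__(self, cooldown_s: float, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock                # injectable for testing
        self.last_sent = float("-inf")    # never sent yet

    def try_send(self) -> bool:
        """Return True (and record the send) if the cooldown has elapsed."""
        now = self.clock()
        if now - self.last_sent < self.cooldown_s:
            return False                  # suppressed: fired too recently
        self.last_sent = now
        return True
```

With the old 300s window, every completion after the first delegation fell inside the cooldown and was suppressed; at 60s the gate reopens within two heartbeat cycles.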

Agent-Authored PRs Received

Agents autonomously created PRs while CEO did infra work:

  • PR #3 — Settings Panel (Frontend Engineer): 34 files, 279 tests, full UX spec implementation
  • PR #4 — Onboarding Interception (Frontend Engineer): 10 files, 1362 additions, deploy preflight + missing keys modal

Monitoring

  • 13/13 workspaces online throughout session
  • Heartbeats active (Redis TTL refreshing)
  • Frontend Engineer + QA Engineer were actively processing tasks
  • No container crashes, no degraded workspaces

Files Changed (CEO Session)

  • workspace-server/internal/crypto/aes.go (sync.Once)
  • workspace-server/internal/crypto/aes_test.go (ResetForTesting)
  • workspace-server/internal/handlers/workspace.go (recursive CTE delete)
  • workspace-server/internal/handlers/workspace_test.go (updated mocks)
  • workspace-server/migrations/014_indexes.sql (new — 3 indexes)
  • .github/workflows/ci.yml (golangci-lint)
  • workspace/heartbeat.py (60s cooldown, parent reporting, cached lookup)
  • workspace-server/internal/handlers/plugins_test.go (new — 16 tests)
  • CLAUDE.md (test counts: Go 365+, Python 869, migration 14)
  • docs/api-protocol/registry-and-heartbeat.md (delegation checking section)

Delegation Chain — Last Mile Fix

Problem: PM received delegation results but never reported to CEO. The heartbeat self-message said "report back to them" without specifying who.

Fix: Heartbeat looks up parent workspace name (cached after first call) and includes explicit instruction: "Report these results back to your parent 'CEO'." This closes the full chain: CEO → PM → team → results → PM wakes → reports to CEO.
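The cached parent lookup can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the real heartbeat code; the function and cache names are made up.

```python
# Hypothetical sketch: resolve the parent workspace name once via an API
# lookup, cache it, and embed it in the wake-up instruction so the agent
# knows exactly who to report to.
_parent_name_cache: dict[str, str] = {}

def parent_instruction(workspace_id: str, lookup) -> str:
    if workspace_id not in _parent_name_cache:
        # one API call on first use, then served from the cache
        _parent_name_cache[workspace_id] = lookup(workspace_id)
    parent = _parent_name_cache[workspace_id]
    return f"Report these results back to your parent '{parent}'."
```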

Plugins Handler Tests (16 new)

Covered: ListRegistry (empty/nonexistent/with plugins), Install validation (missing name, path traversal, not found), Uninstall validation, validatePluginName (valid/slash/dotdot/backslash/empty), parseManifestYAML (valid/invalid/minimal).
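The validatePluginName cases above translate to a simple rejection list. The real implementation is Go in workspace-server; this is a hedged Python sketch of the same contract.

```python
# Sketch of the plugin-name validation contract covered by the Go tests:
# reject empty names, path traversal (".."), and slash/backslash separators.
def is_valid_plugin_name(name: str) -> bool:
    if not name:
        return False                      # empty
    if ".." in name:
        return False                      # path traversal
    if "/" in name or "\\" in name:
        return False                      # slash / backslash
    return True
```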

Agent PRs Completed

Team autonomously completed test plan checklists:

  • PR #3 (Settings Panel): 9/9 tasks
  • PR #4 (Onboarding): 10/10 tasks

Chain worked: CEO → PM → Dev Lead → FE + QA → PRs updated → all checklists done → PM reported back.

Root Scripts Cleanup

Deleted 4 dead scripts replaced by platform features:

  • setup-org.sh, setup_reno_stars.sh → POST /org/import
  • import-ecc.sh → plugin system
  • scripts/setup-default-org.sh → POST /org/import

Moved utility scripts to scripts/: import-agent.sh, bundle-compile.sh

Moved 5 E2E test scripts to tests/e2e/: test_api.sh (62 tests), test_a2a_e2e.sh (22), test_activity_e2e.sh (25), test_claude_code_e2e.sh, test_comprehensive_e2e.sh (68). Updated CLAUDE.md paths.

PR #3 + #4 Code Review Delegated

CEO reviewed both PRs and found 6 critical bugs + 9 warnings. Delegated fixes through PM → Dev Lead → FE. Both PRs updated at 4:50 with fixes in progress.

Provisioner Stale Image Fix

Root cause: Docker's unless-stopped restart policy races with provisioner's Stop → Start sequence. Old container restarts before ContainerRemove completes, blocking ContainerCreate. Result: old image keeps running after rebuild.

Fix: Pre-emptive ContainerRemove(force: true) before ContainerCreate — kills any stale container from restart policy. Added image ID logging on create and start for immediate visibility of stale-image issues.

PRs #3 + #4 Reverted

Agent-authored PRs had too many integration bugs (infinite re-renders, wrong API format, white theme on dark canvas). Reverted both via cherry-pick rebuild of main.

Template Runtime Detection Bug

Problem: Deploying "Claude Code Agent" from the template palette started a langgraph container instead of claude-code. The agent error was [Errno 2] No such file or directory: '/claude'.

Root cause: workspace.go:Create defaulted payload.Runtime to "langgraph" (line 50-52) before reading the template's config.yaml. The later detection block (line 142) checked if payload.Runtime == "" but it was already set, so the template's runtime: claude-code was never used.

Fix: Moved the template config.yaml runtime detection to before the DB insert and before the default fallback. Removed the now-dead duplicate detection block in the provisioning section. Added a debug log when the config.yaml read fails.

Branding + License

  • Replaced gradient "S" square in toolbar with actual Molecule AI flame icon (/molecule-icon.png)
  • Added Molecule AI favicon (canvas/src/app/icon.png)
  • Added BSL 1.1 LICENSE file — personal/non-commercial use OK, no competing SaaS, converts to Apache 2.0 on 2029-01-01
  • Updated README badge and license section

AutoGen Adapter 'kwargs' Fix

Problem: Deploying AutoGen Agent from template palette resulted in AutoGen error: 'kwargs' on every message.

Root cause: _langchain_to_autogen() wrapped LangChain tools as async def wrapper(**kwargs). AutoGen 0.7.5's FunctionTool introspects function signatures with type hints — **kwargs has no type annotation, causing KeyError: 'kwargs' in _function_utils.py.

Fix: Replaced **kwargs wrapper with typed async def _invoke(input: str) -> str and used autogen_core.tools.FunctionTool directly. JSON parsing bridges structured input for tools that expect dicts.
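The typed-wrapper idea can be sketched without AutoGen itself. This is an illustrative sketch of the signature shape the fix describes, not the adapter's actual code; make_typed_invoke is a hypothetical helper, and no autogen imports are shown.

```python
import asyncio
import json

# Sketch: a wrapper with real type hints (input: str -> str) instead of
# **kwargs, since FunctionTool introspects the signature. JSON parsing
# bridges structured input for tools that expect dicts.
def make_typed_invoke(tool_fn):
    async def _invoke(input: str) -> str:
        try:
            args = json.loads(input)      # structured input arrives as JSON...
        except json.JSONDecodeError:
            args = input                  # ...or as a plain string
        result = tool_fn(args)
        return result if isinstance(result, str) else json.dumps(result)
    return _invoke
```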

Chat Duplicate Messages Fix

Problem: Sending a message showed the agent response twice in the chat.

Root cause: Two paths both added the response: (1) WebSocket A2A_RESPONSE handler in ChatTab, and (2) Zustand store's pendingA2AResponse effect. Both fired from the same event.

Fix: Removed the duplicate WebSocket handler in ChatTab — the store effect is the canonical path.

Canvas Pan-to-Node on Deploy

New workspaces now appear near center and the canvas smoothly pans to them on deploy instead of placing them all at (0,0).

Docs Cleanup

Deleted 6 UX spec files for reverted Phase 20 features (settings panel, onboarding interception, deploy interception) — no longer in codebase.

Initial Prompt System

New feature: agents can auto-execute a configurable prompt on startup — before any user interaction.

Architecture:

  • config.py: new initial_prompt field (string or initial_prompt_file reference)
  • main.py: after server ready, sends initial_prompt as A2A message/send to self
  • org.go: InitialPrompt on OrgDefaults and OrgWorkspace structs with JSON+YAML tags; injected into config.yaml as YAML block scalar during org import
  • Org template: per-agent initial prompts instruct dev agents to clone repo, read CLAUDE.md, study codebase, and report ready

Manual E2E verified: 12 agents deployed, 11/11 non-PM agents cloned repo to /workspace/repo/, PM has repo at /workspace (bind-mounted). All 12 have codebase access.

Runtime Change on Restart Fix

Problem: Comprehensive E2E test "Runtime change langgraph→deepagents on restart" failed — container kept using old image.

Root cause: workspace_restart.go read runtime from DB (COALESCE(runtime, 'langgraph')) but when the user changes config.yaml runtime, the DB is never updated. Also, ExecRead was called after Stop() (container already stopped).

Fix: Read config.yaml runtime from running container before stopping it. If runtime differs from DB, update DB. Use configDirName(id) for container name (not raw workspace ID).

QA System Prompt Overhaul

Comprehensive rewrite: never trust self-reported results, must clone repo independently, run ALL test suites to 100% green, E2E tests required, visual style verification against dark zinc theme, red flags checklist.

Org Struct JSON Tags

Added json tags to OrgTemplate, OrgDefaults, and OrgWorkspace structs — without them, JSON POST bodies couldn't populate initial_prompt and other snake_case fields.

Files Changed

  • workspace-server/internal/handlers/workspace.go — runtime detection before DB insert
  • workspace-server/internal/handlers/workspace_restart.go — read runtime from container config before stop
  • workspace-server/internal/handlers/org.go — InitialPrompt field, JSON tags, config.yaml injection
  • workspace-server/internal/handlers/org_test.go — 5 new tests (YAML parsing, injection, special chars)
  • workspace/config.py — initial_prompt field + file reference
  • workspace/main.py — auto-send initial_prompt after server ready
  • workspace/tests/test_config.py — 5 new tests (inline, file, precedence, default, missing)
  • workspace/cli_executor.py — del getattr guard
  • workspace/adapters/autogen/adapter.py — FunctionTool wrapper
  • workspace/tests/test_common_setup.py — autogen skipif + FunctionTool assertions
  • org-templates/molecule-dev/org.yaml — per-agent initial prompts
  • org-templates/molecule-dev/qa-engineer/system-prompt.md — comprehensive QA rewrite
  • canvas/src/components/Canvas.tsx — pan-to-node on deploy
  • canvas/src/components/Toolbar.tsx — Molecule AI icon
  • canvas/src/components/tabs/ChatTab.tsx — remove duplicate A2A_RESPONSE handler
  • canvas/src/store/canvas-events.ts — node position offset + pan event + window guard
  • canvas/src/store/__tests__/canvas.test.ts — relaxed position assertion
  • canvas/src/lib/api/__tests__/secrets.test.ts — match actual API format
  • canvas/src/app/icon.png — favicon
  • tests/e2e/test_comprehensive_e2e.sh — fix secrets test assumption
  • .gitignore — test-results/, playwright-report/
  • LICENSE — BSL 1.1
  • README.md — license badge + section
  • CLAUDE.md — template resolution docs, initial prompt section, test counts
  • Deleted: docs/ux-specs/*, docs/onboarding-interception.md

Initial Prompt Cascade Loop Fix

Problem: 12 agents all executed initial prompts simultaneously on first boot. Each prompt ended with "report ready to parent" — sending A2A messages while other agents were still booting. Under load, containers died → ProxyA2A detected dead containers → triggered auto-restart → new container → initial prompt fired again → cascade loop.

Root cause: Two issues: (1) initial prompts instructed agents to send A2A messages during boot, (2) initial prompt re-executed on every restart (no idempotency guard).

Fixes:

  • main.py: writes .initial_prompt_done marker file after first execution. Skips on restart.
  • org.yaml: rewrote all 12 agent prompts — no outbound A2A, no test suite runs during boot. Agents clone repo, read docs, save to commit_memory, then wait for tasks.
  • workspace_restart.go: fixed misleading "after secret change" log in RestartByID (called by multiple paths, not just secrets).
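The marker-file idempotency guard can be sketched in two small functions. This is an illustrative sketch of the pattern main.py uses, with hypothetical helper names.

```python
from pathlib import Path

# Sketch of the .initial_prompt_done guard: run the initial prompt only if
# no marker exists, and write the marker after the first successful run so
# container restarts (including auto-restarts) skip it.
def should_run_initial_prompt(workspace_dir: Path) -> bool:
    return not (workspace_dir / ".initial_prompt_done").exists()

def mark_initial_prompt_done(workspace_dir: Path) -> None:
    (workspace_dir / ".initial_prompt_done").write_text("done\n")
```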

Chat Separation: My Chat + Agent Comms

Refactored ChatTab into two sub-tabs:

  • My Chat: user↔agent conversation only (source=canvas filter)
  • Agent Comms: agent↔agent A2A traffic (source=agent filter), read-only, live WebSocket updates

Backend: Added source query param to GET /workspaces/:id/activity — canvas filters source_id IS NULL, agent filters source_id IS NOT NULL. Invalid values return 400.

Initial prompt fix: Routes through platform A2A proxy instead of self-send, so the prompt appears as a proper user message in chat history (logged with source_id=NULL). Removed /notify push code — proxy's A2A_RESPONSE broadcast handles delivery.

Shared helper: Extracted extractRequestText() into message-parser.ts — used by both ChatTab and AgentCommsPanel.

Files Changed (Chat Separation)

  • workspace-server/internal/handlers/activity.go — source query param + validation
  • workspace/main.py — route initial prompt through proxy, remove /notify
  • canvas/src/components/tabs/ChatTab.tsx — sub-tab container + MyChatPanel
  • canvas/src/components/tabs/chat/AgentCommsPanel.tsx — new agent comms view
  • canvas/src/components/tabs/chat/message-parser.ts — shared extractRequestText()

Claude Code Adapter: CLI Subprocess → Claude Agent SDK Migration

Replaced the claude-code runtime's subprocess-based CLIAgentExecutor with a new ClaudeSDKExecutor that uses the official claude-agent-sdk Python package. The SDK wraps the same Claude Code engine, so plugins/skills/CLAUDE.md still work — but eliminates subprocess fragility (stdout buffering, zombie processes, session-ID parsing, ~500ms startup overhead).

New files:

  • workspace/claude_sdk_executor.py — ClaudeSDKExecutor with asyncio.Lock serialization, cooperative cancel, QueryResult dataclass, session resume via SDK
  • workspace/executor_helpers.py — shared helpers extracted from cli_executor.py: memory recall/commit, delegation results, heartbeat, system prompt, error sanitization (sanitize_agent_error + classify_subprocess_error), markdown-aware brief_summary, extract_message_text
  • workspace/tests/test_claude_sdk_executor.py — 30 tests including concurrency (timestamp-ordered), cancel (GeneratorExit via async generator), session resume, error sanitization
  • workspace/tests/test_executor_helpers.py — 73 tests for all shared helpers

Modified files:

  • workspace/adapters/claude_code/adapter.py — create_executor() returns ClaudeSDKExecutor; removed shutil.which CLI check
  • workspace/adapters/claude_code/Dockerfile — pre-installs SDK via pip install -r requirements.txt
  • workspace/adapters/claude_code/requirements.txt — added claude-agent-sdk>=0.1.58
  • workspace/cli_executor.py — removed claude-code from RUNTIME_PRESETS, deleted all self.runtime == "claude-code" branches (JSON parsing, --resume, --output-format json, _session_id), calls shared helpers directly (no more one-line wrapper methods), uses sys.executable for MCP server, regex word-boundary error classification
  • workspace/tests/conftest.py — session-wide claude_agent_sdk stub for test imports
  • .gitignore — .initial_prompt_done, .coverage*

Architecture decisions:

  • asyncio.Lock on the SDK executor serializes concurrent turns (matches old CLI behavior, keeps session_id race-free)
  • ResultMessage.result preferred over concatenated AssistantMessage chunks (avoids doubled pre/post-tool text)
  • Error sanitization unified: sanitize_agent_error(exc=..., category=...) serves both SDK exceptions and CLI subprocess stderr
  • classify_subprocess_error() uses regex word boundaries to avoid false positives (matching \brate\b rather than a bare "rate" in substring check)
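The word-boundary idea can be shown with a minimal classifier. The patterns below are illustrative examples, not the real classify_subprocess_error() table.

```python
import re

# Sketch of word-boundary error classification: \b anchors prevent e.g.
# "migrate" from matching the rate-limit pattern, which a bare substring
# check ("rate" in stderr) would.
PATTERNS = [
    ("rate_limit", re.compile(r"\brate\b.*\blimit", re.IGNORECASE)),
    ("timeout", re.compile(r"\btimed?\s*out\b", re.IGNORECASE)),
]

def classify(stderr: str) -> str:
    for category, pattern in PATTERNS:
        if pattern.search(stderr):
            return category
    return "unknown"
```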

Coverage: 100% on claude_sdk_executor.py (110 stmts), cli_executor.py (179 stmts), executor_helpers.py (154 stmts). Total: 443 stmts, 0 misses.

Live verification: 12 workspaces restarted on new image. Echo, session resume, Bash tool, TodoWrite, PM→QA MCP delegation, and concurrent requests all verified. Rate-limited on quota (not a code bug).

5 iterative code review passes caught and fixed: the _active_stream race, dead claude-code branches, duplicated A2A instructions, raw-stderr leaks, deprecated typing.AsyncIterator, the _install_fake_sdk teardown leak, inconsistent error patterns, missing encoding args, and 7 other issues across successive rounds.

Agent Quality Enforcement Stack

Built three layers of quality enforcement after observing that agents (same Claude Opus model) missed bugs like 'use client' directives because they lacked institutional memory and system-level enforcement.

Layer 1: Git pre-commit hook (.githooks/pre-commit)

  • Rejects commits missing 'use client' on hook-using .tsx files
  • Rejects light theme colors in canvas components
  • Rejects SQL injection patterns in Go (fmt.Sprintf with SQL)
  • Rejects leaked secrets (sk-ant-, ghp_, AKIA)
  • System-enforced — agents cannot bypass
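The 'use client' check from the hook can be sketched as a small predicate. The actual hook is a shell script in .githooks/pre-commit; this Python sketch of the same rule uses a hypothetical hook-call regex.

```python
import re

# Sketch of the pre-commit rule: a .tsx file that calls React hooks must
# begin with a 'use client' directive. The hook-name list is illustrative.
HOOK_CALL = re.compile(r"\buse(State|Effect|Ref|Callback|Memo)\s*\(")

def missing_use_client(tsx_source: str) -> bool:
    uses_hooks = bool(HOOK_CALL.search(tsx_source))
    has_directive = re.match(r"\s*['\"]use client['\"]", tsx_source) is not None
    return uses_hooks and not has_directive
```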

Layer 2: Molecule AI-dev plugin (plugins/molecule-dev/)

  • rules/codebase-conventions.md — injected into every agent's CLAUDE.md with past bugs, patterns, self-check scripts
  • skills/review-loop/SKILL.md — multi-round FE→QA→fix→re-verify workflow for Dev Lead

Layer 3: Awareness memory via initial_prompt

  • Key conventions saved to commit_memory on first boot
  • Agents recall them on every future task via memory system
  • Builds institutional knowledge across sessions

Also shipped:

  • SDK executor retry logic (exponential backoff: 5s→10s→20s for rate limits)
  • Force-remove in provisioner.Stop() to prevent restart-policy zombie containers
  • All 12 agent system prompts rewritten from checklists to senior-engineer expectations
  • Dev Lead prompt requires UIUX + Security involvement for UI/credential work
  • Repo made public — removed GITHUB_TOKEN from initial_prompt
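The 5s→10s→20s retry schedule mentioned above is a doubling series. This one-liner is a sketch of the schedule only, not the SDK executor's actual retry code; the function name is made up.

```python
# Sketch of the rate-limit backoff schedule: delay doubles per attempt,
# bounded by max_retries. Defaults reproduce the 5s -> 10s -> 20s series.
def backoff_delays(base_s: float = 5.0, max_retries: int = 3) -> list[float]:
    return [base_s * (2 ** attempt) for attempt in range(max_retries)]
```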

Cron Scheduling System (Phase 22)

New feature: users can set up recurring tasks that fire A2A messages to agents on a cron schedule.

Backend:

  • workspace-server/migrations/015_workspace_schedules.sql — new table with cron_expr, timezone, prompt, enabled, last_run_at, next_run_at, run_count, last_status
  • workspace-server/internal/scheduler/scheduler.go — goroutine polls every 30s, fires due schedules via proxyA2ARequest with system:scheduler caller, WaitGroup for completion, semaphore (max 10 concurrent)
  • workspace-server/internal/handlers/schedules.go — 6 REST endpoints: list, create, update (COALESCE-based), delete, run-now, history
  • robfig/cron/v3 for cron expression parsing + next-run computation
  • proxyA2ARequest exposed as public method for internal callers
  • Dedicated cron_run activity log entries with schedule metadata for history queries
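The poll-and-fire loop with a concurrency cap can be sketched conceptually. The real implementation is a Go goroutine in scheduler.go; this asyncio sketch shows one polling tick under assumed field names (next_run_at as a plain timestamp).

```python
import asyncio

# Conceptual sketch of one scheduler tick: find due schedules, fire each
# via the A2A path, cap in-flight fires with a semaphore (max 10 in the
# real scheduler), and wait for completion (WaitGroup-style) before
# returning the number fired.
async def run_due(schedules, fire, now, max_concurrent=10):
    sem = asyncio.Semaphore(max_concurrent)

    async def _one(s):
        async with sem:                 # bounded concurrency
            await fire(s)

    due = [s for s in schedules if s["next_run_at"] <= now]
    await asyncio.gather(*[_one(s) for s in due])
    return len(due)
```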

Frontend:

  • canvas/src/components/tabs/ScheduleTab.tsx — CRUD UI with create/edit form, cron-to-English helper, status indicators, Run Now button, delete confirmation
  • Wired into SidePanel as new "Schedule" tab (⏲ icon)

Org template:

  • OrgSchedule struct in org.go, inserted during org import
  • Example: Security Auditor daily scan in org-templates/molecule-dev/org.yaml

E2E verified: Created every-minute schedule, scheduler fired at next minute boundary, agent received and responded, schedule updated with status=ok + run_count=1.

Volume Ownership: Root → Gosu Agent Pattern

Docker creates volume contents as root, but workspace containers run as UID 1000 (agent). This caused PermissionError when the adapter tried to write CLAUDE.md with plugin rules. Initially fixed with scattered chown hacks in the provisioner and plugin handler, then properly fixed with the standard Docker pattern:

  • Dockerfile: installs gosu, removes USER agent (entrypoint handles privilege drop)
  • entrypoint.sh: starts as root → chown -R agent:agent /configs /workspace → exec gosu agent python3 main.py
  • Removed all band-aid chown calls from provisioner and plugin handler
  • Verified: 12/12 containers, CLAUDE.md owned by agent:agent, plugin rules injected

Comprehensive Code Review — 13 Issues Fixed + Test Coverage

Two-pass code review across the entire repo identified 24 issues. All 13 critical/warning items fixed:

Critical (8):

  • a2a_proxy.go: add access control via CanCommunicate for agent-to-agent proxy requests (closes a previously open security boundary). Canvas requests (no X-Workspace-ID), self-calls, and system callers (webhook:*, system:*, test:*) bypass via explicit isSystemCaller() helper.
  • org.go, delegation.go: replace db.DB.Exec() with ExecContext + error checks. Errors no longer silently dropped on inserts/updates.
  • activity.go, workspace.go: add rows.Err() checks after iteration loops to catch DB iteration failures (was returning partial results).
  • ws/hub.go: add safeSend with recover for race between Broadcast and Unregister (defensive fix for closed channel send).
  • workspace.go: improve canvas_layouts insert error log (non-fatal).
  • ChatTab.tsx, AgentCommsPanel.tsx: add WebSocket onerror handlers (orphaned connections on failure).
  • app/page.tsx: log hydration errors instead of silent catch.
  • cli_executor.py: guarantee proc.wait() after kill on timeout to prevent zombie processes; bounded 5s wait timeout.
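The zombie-prevention pattern from the cli_executor.py fix is small enough to show directly. This is a hedged sketch of the pattern, not the executor's actual code; kill_and_reap is a hypothetical name.

```python
import subprocess

# Sketch: after kill(), always wait() so the child is reaped and never
# becomes a zombie; bound the wait so a stuck reap degrades instead of
# hanging the executor.
def kill_and_reap(proc: subprocess.Popen, wait_timeout: float = 5.0) -> bool:
    proc.kill()
    try:
        proc.wait(timeout=wait_timeout)   # reaps the process
        return True
    except subprocess.TimeoutExpired:
        return False                      # degraded path: could not reap in time
```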

Warning (5):

  • a2a_proxy.go: cap LogActivity context with 30s timeout (was WithoutCancel = unbounded lifetime).
  • activity.go: log JSON marshal failures in LogActivity instead of silently corrupting activity logs with nil bodies.
  • org.go: replace 500ms time.Sleep with workspaceCreatePacingMs = 50 constant (org of 12 was 6s+).
  • main.py: stop heartbeat if adapter.setup() raises (resource leak).
  • Canvas.tsx: document intentional getState() pattern in imperative event handlers.

Test coverage added:

  • a2a_proxy_test.go: mockCanCommunicate helper + 4 access control tests (denied, self-exempt, system caller, canvas) + table-driven TestIsSystemCaller (7 cases)
  • test_cli_executor.py: 2 zombie reap tests (verify proc.wait() called after kill; degraded path when wait() also times out)

Verification:

  • Go: 6 packages, all tests pass
  • Canvas Vitest: 344 tests pass
  • Python pytest: 874 tests pass (was 872, +2 new)
  • Playwright E2E: 13/13 pass (incl. 3 data-flow tests verifying real browser content)
  • Comprehensive bash E2E: 68/68 pass
  • Manual verification: 12-agent org deployed, initial prompts complete, chat shows messages