molecule-core/docs/edit-history/2026-04-11.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00


# 2026-04-11 Session
## Summary
Restored 6 changes lost during a PR squash merge, then ran a comprehensive code review and fixed all findings. Brought test coverage of the DeepAgents adapter and the model fallback logic to 100% across Go and Python. Deleted the stale `feat/cron-scheduler` branch.
## Changes
### Squash Merge Restoration (PR #50)
- `workspace-server/internal/handlers/org.go` — Added `OrgDefaults.Model` field + model fallback propagation so org templates correctly pass model to workspaces
- `workspace-server/internal/handlers/workspace_provision.go` — Model always at top level in generated `config.yaml` (config.py reads `raw["model"]` for all runtimes); deepagents excluded from `runtime_config` block
- `workspace/agent.py` — Added Cerebras provider support (`cerebras:model` format)
- `workspace/adapters/deepagents/adapter.py` — Full SDK utilization: FilesystemBackend, MemorySaver checkpointer, FilesystemPermission, memory files, InMemoryCache, native skills, plus cerebras/google_genai/ollama providers
- `workspace/adapters/deepagents/requirements.txt` — Added `langchain-google-genai` + `langchain-anthropic` deps
- `workspace/adapters/langgraph/requirements.txt` — Added `langchain-google-genai` dep for Gemini support
### Code Review Fixes
- `adapter.py` — Removed unused `Path` import
- `adapter.py` — Changed default provider from `"openai"` to `"anthropic"` (aligned with `agent.py`)
- `adapter.py` — Replaced silent OpenAI fallback with `ValueError` for unknown providers (fail fast)
- `adapter.py` — Added guard on `self.agent` in `create_executor()` (RuntimeError if setup not called)
- `org.go` — Added third-level model fallback: ws.Model → defaults.Model → runtime-specific default (matches runtime/tier pattern)
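
The three-level model fallback above can be sketched as follows. This is a Python illustration of the ordering only, not the Go code in `org.go`; the function name `resolve_model`, the table `RUNTIME_DEFAULT_MODELS`, and every model string in it are hypothetical placeholders.

```python
# Hypothetical per-runtime defaults. The real values live in org.go
# and are not shown in this changelog.
RUNTIME_DEFAULT_MODELS = {
    "langgraph": "example-provider:langgraph-default",
    "deepagents": "example-provider:deepagents-default",
    "claude-code": "example-provider:claude-code-default",
}
GENERIC_DEFAULT_MODEL = "example-provider:generic-default"  # assumption


def resolve_model(ws_model: str, defaults_model: str, runtime: str) -> str:
    """Pick the model: workspace value, then org defaults, then runtime default."""
    if ws_model:
        return ws_model
    if defaults_model:
        return defaults_model
    return RUNTIME_DEFAULT_MODELS.get(runtime, GENERIC_DEFAULT_MODEL)
```

An empty string counts as "not set" here, which mirrors the "empty model" and "defaults.Model used when ws empty" cases named in the tests.
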
### Test Coverage (100% on changed files)
- `org_test.go` — 8 new tests: Model YAML parsing, empty model, workspace overrides default, fallback for claude-code/deepagents/langgraph, defaults.Model used when ws empty
- `workspace_provision_test.go` — 6 new tests: deepagents runtime, openclaw/crewai get runtime_config, empty runtime defaults to langgraph, empty name/role, model-always-top-level (3 sub-tests)
- `test_adapters.py` — 18 new/updated tests: cerebras/google_genai/ollama providers, unknown provider raises ValueError, default provider is anthropic, create_executor guard, multiple colons in model string, openrouter fallback chain, empty API keys, base URL presence/absence, MAX_TOKENS env var
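
The provider cases listed for `test_adapters.py` suggest parsing along these lines. This is a minimal sketch, not the code in `adapter.py`: the helper name `parse_model`, the exact provider set, and the choice to split on the first colon only are assumptions.

```python
from typing import Tuple

# Assumed provider set, collected from the providers this changelog mentions.
KNOWN_PROVIDERS = {
    "anthropic", "openai", "cerebras", "google_genai", "ollama", "openrouter",
}
DEFAULT_PROVIDER = "anthropic"


def parse_model(spec: str) -> Tuple[str, str]:
    """Split a 'provider:model' string into (provider, model)."""
    provider, sep, name = spec.partition(":")
    if not sep:
        # No prefix at all: fall back to the default provider.
        return DEFAULT_PROVIDER, spec
    if provider not in KNOWN_PROVIDERS:
        # Fail fast instead of silently falling back to another provider.
        raise ValueError(f"unknown provider: {provider!r}")
    # Anything after the first colon belongs to the model name.
    return provider, name
```

With this shape, `parse_model("ollama:llama3:8b")` keeps `llama3:8b` as the model name, which is one way to handle multiple colons in a model string.
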
### Async Delegation Merge (PR #41)
- Rebased `feat/async-delegation` onto main, resolved edit-history conflict
- `tools/delegation.py` — non-blocking `delegate_to_workspace` + `check_delegation_status` polling
- `adapters/base.py` — registered `check_delegation_status` as 6th core tool
- `coordinator.py` — `route_task_to_team` uses async delegation
- 13 delegation tests rewritten for async model
### Delegation Lint Fixes (PR #52)
- `test_delegation.py` — moved `import os` after module docstring
- `base.py` — fixed stale comment "5 core" → "6 core" tools
- `delegation.py` — log notify failures at debug level instead of silent `pass`
### Social Channel System (PR #54)
- `workspace-server/internal/channels/adapter.go` — `ChannelAdapter` interface + `InboundMessage` + `MessageHandler`
- `workspace-server/internal/channels/registry.go` — adapter registry (Telegram registered)
- `workspace-server/internal/channels/telegram.go` — Telegram adapter (webhook + long-polling)
- `workspace-server/internal/channels/manager.go` — orchestrator with hot reload, conversation history (Redis), allowlist, A2A proxy, typing indicator
- `workspace-server/internal/handlers/channels.go` — REST API (CRUD, send, test, webhook, discover)
- `workspace-server/migrations/016_workspace_channels.sql` — workspace_channels table
- `workspace-server/internal/handlers/a2a_proxy.go` — added `"channel:"` to system caller prefixes
- `canvas/src/components/tabs/ChannelsTab.tsx` — Canvas UI for connecting/managing social channels
- `mcp-server/src/index.ts` — 7 new MCP tools (list_channel_adapters, list_channels, add_channel, update_channel, remove_channel, send_channel_message, test_channel)
- 41 unit tests (channels package) + 13 handler tests (sqlmock) + 23 E2E API checks
- Go test count: 406 → 448, MCP tools: 54 → 61
#### UX iterations (during PR #54)
- **Multi-chat IDs per channel** — `chat_id` field accepts comma-separated list. One Telegram bot can serve multiple groups from a single channel entry.
- **Auto-detect chats** — `POST /channels/discover` calls Telegram getUpdates, returns groups/DMs the bot has seen. Canvas "Detect Chats" button auto-populates the chat_id field.
- **`/start` welcome reply** — bot replies immediately with chat ID so users get instant feedback that it works.
- **`PausePollersForToken`** — discovery pauses any active poller for the same bot token to avoid Telegram's 409 "only one getUpdates" conflict.
- **Hidden manual input** — after Detect Chats, the redundant text input is hidden behind an "edit manually" toggle.
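
A comma-separated `chat_id` field needs to be split into whole IDs before matching, since a substring test would let `123` match `1234`. A minimal Python sketch of that idea follows; the helper names are hypothetical and the real matching lives in the Go channels package.

```python
from typing import List


def parse_chat_ids(raw: str) -> List[str]:
    """Split a comma-separated chat_id field, dropping blanks and whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def chat_allowed(configured: str, incoming_chat_id: str) -> bool:
    """True only when the incoming ID equals one configured ID exactly."""
    return incoming_chat_id in parse_chat_ids(configured)
```

For example, `chat_allowed("1234, -100987", "123")` is false while `chat_allowed("1234, -100987", "-100987")` is true.
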
#### Telegram Bot API audit fixes (PR #54 follow-up)
**Critical bugs:**
- SQL `LIKE '%id%'` substring match → exact match in code (chat_id "123" was matching "1234").
- Webhook secret_token verification (X-Telegram-Bot-Api-Secret-Token).
- 4096-char message splitting at paragraph/line/word boundaries.
- Group privacy mode warning surfaced in Discover (`can_read_all_group_messages` field).
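
Splitting at paragraph, line, then word boundaries can be sketched as below. This is an illustrative Python version, not the Go implementation; the names `split_message` and `_find_cut` are hypothetical, and the hard split when no boundary exists is an assumption.

```python
from typing import List, Tuple

TELEGRAM_MAX_LEN = 4096  # message length limit cited in this changelog


def _find_cut(window: str) -> Tuple[int, int]:
    """Return (cut index, separator length) for the best boundary in window."""
    for sep in ("\n\n", "\n", " "):  # paragraph, then line, then word
        idx = window.rfind(sep)
        if idx > 0:
            return idx, len(sep)
    return len(window), 0  # no boundary found: hard split at the limit


def split_message(text: str, limit: int = TELEGRAM_MAX_LEN) -> List[str]:
    """Break text into chunks of at most `limit` characters."""
    chunks = []
    remaining = text
    while len(remaining) > limit:
        cut, skip = _find_cut(remaining[:limit])
        chunks.append(remaining[:cut])
        remaining = remaining[cut + skip:]  # drop the separator itself
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own message, in order.
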
**Reliability:**
- Bot instance cache (sync.RWMutex) — eliminates `getMe` API call on every send.
- Typed Telegram error handling: 401→invalidate token, 403→forbidden, 429→honor RetryAfter and retry once.
- DisableWebPagePreview by default.
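
The "honor RetryAfter and retry once" rule for 429 responses can be sketched like this. The exception class `TelegramAPIError` and the `send` callable are hypothetical stand-ins; the real handling is in the Go Telegram adapter and may differ in detail.

```python
import time
from typing import Callable


class TelegramAPIError(Exception):
    """Hypothetical error type carrying the HTTP status and retry hint."""

    def __init__(self, status: int, retry_after: float = 0.0):
        super().__init__(f"telegram error {status}")
        self.status = status
        self.retry_after = retry_after


def send_with_retry(send: Callable[[], str],
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send(); on a 429, wait for retry_after seconds and try once more."""
    try:
        return send()
    except TelegramAPIError as err:
        if err.status != 429:
            raise  # 401, 403 and others are handled elsewhere, not retried
        sleep(err.retry_after)
        return send()  # a second failure propagates to the caller
```

Retrying only once keeps a rate-limited chat from stalling the caller indefinitely.
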
**UX:**
- `sendChatAction("typing")` goroutine during agent calls — re-sends every 4s.
- Bot commands registered via `setMyCommands` — `/start`, `/help`, `/reset`, `/cancel` autocomplete.
- `/help`, `/reset` (clears Redis history), `/cancel` handled inline.
- `my_chat_member` event handling: bot auto-greets when added to a group.
- `channel_post` support (Telegram channels in addition to groups/DMs).
- Token format regex validation rejects malformed tokens before API call.
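
Token format validation before any API call might look like the sketch below. The pattern is an assumption based on the usual `<bot id>:<secret>` shape of Telegram bot tokens; the regex actually used in the repository is not shown in this changelog.

```python
import re

# Assumed shape: numeric bot id, a colon, then a long URL-safe secret.
_TOKEN_PATTERN = re.compile(r"\d+:[A-Za-z0-9_-]{30,}")


def looks_like_bot_token(token: str) -> bool:
    """Cheap local check that rejects obviously malformed tokens."""
    return _TOKEN_PATTERN.fullmatch(token) is not None
```

A check like this only filters typos and pasted garbage. A token that passes can still be rejected by Telegram.
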
### auth_token_file → required_env (PR #55)
- `workspace/config.py` — added `required_env: list[str]` to `RuntimeConfig`. Deprecated `auth_token_file` / `auth_token_env` (backward compat retained).
- `workspace/preflight.py` — checks `required_env` vars exist; legacy `auth_token_file` still works.
- `workspace/cli_executor.py` — `_resolve_auth_token()` checks `required_env` first.
- `workspace/adapters/claude_code/adapter.py` — schema declares `required_env: ["CLAUDE_CODE_OAUTH_TOKEN"]`.
- `workspace-server/internal/handlers/workspace_provision.go` — generates `required_env` per runtime, removed `.auth-token` file copying.
- `claude-code-default/config.yaml`, `molecule-dev/org.yaml`, `reno-stars/org.yaml` — `required_env` replaces `auth_token_file`.
- `canvas/src/components/tabs/ConfigTab.tsx` — `TagList` for `required_env` replaces `TextInput` for `auth_token_file`.
- New `reno-stars` org template added (15-agent team with full system prompts, knowledge bases, skills).
- 17 Python preflight tests, 10 Go provisioner tests updated.
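
A preflight check for `required_env` can be as small as the sketch below. The function name `missing_required_env` is hypothetical, and treating an empty value as missing is an assumption; the real check is in `workspace/preflight.py`.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_required_env(required: Iterable[str],
                         environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names from `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

For a runtime that declares `required_env: ["CLAUDE_CODE_OAUTH_TOKEN"]`, an empty result means preflight can proceed, and a non-empty result names exactly what the operator must set.
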
### E2E Flaky Test Fixes
- `tests/e2e/test_comprehensive_e2e.sh` — Runtime image checks now poll up to 30s for container readiness instead of fixed 10s sleep. Eliminates intermittent FAILs on cold-start container provisioning.
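
Polling with a deadline instead of sleeping a fixed time follows the pattern below. The actual test is a shell script; this is a Python sketch of the same idea, with a hypothetical `wait_until` helper whose clock and sleep functions can be swapped out in tests.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True  # ready: stop waiting immediately
        if clock() >= deadline:
            return False  # gave up after the timeout
        sleep(interval)
```

A fast container passes as soon as it is ready, and a slow cold start gets the full 30 seconds rather than failing at a fixed 10.
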
### Restart Pending UX + Poller Lifetime Fix (PR #56)
**Critical bug fix:**
- Channel pollers were dying ~50ms after channel creation because `Reload()` used `c.Request.Context()` from the HTTP handler — when the handler returned, the request ctx was cancelled, killing the polling goroutine.
- **Fix:** Manager now stores a long-lived `bgCtx` set by `Start()` via `sync.Once`. All pollers spawn from `bgCtx`, not request ctx.
- 2 regression tests: `TestManager_PollerSurvivesRequestContext`, `TestManager_BgCtxFallback`.
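
The lifetime bug above comes down to which cancellation scope the poller is tied to. The sketch below is a Python analogy using `threading.Event` in place of Go contexts; `Manager`, `spawn_poller`, `handle_create_channel`, and the `buggy` switch are hypothetical names for illustration, not the Go API.

```python
import threading


class Manager:
    def __init__(self) -> None:
        # Long-lived stop signal, playing the role of bgCtx.
        self.bg_stop = threading.Event()

    def spawn_poller(self, stop: threading.Event) -> threading.Thread:
        def loop() -> None:
            while not stop.wait(0.01):
                pass  # a real poller would fetch updates here

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread


def handle_create_channel(manager: Manager, buggy: bool = False) -> threading.Thread:
    """Simulate an HTTP handler that starts a poller and then returns."""
    request_done = threading.Event()  # cancelled when the handler returns
    stop = request_done if buggy else manager.bg_stop
    poller = manager.spawn_poller(stop)
    request_done.set()  # the handler returns here
    return poller
```

With `buggy=True` the poller stops as soon as the handler returns. With the default it keeps running until `manager.bg_stop` is set.
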
**UX improvements:**
- `canvas/src/components/Toolbar.tsx` — "Restart Pending (N)" button replaces always-visible "Restart All". Only shows workspaces flagged `needsRestart`; auto-clears flag and disappears after successful restart. Toast feedback for partial failures.
- Global secret CRUD (both legacy `secrets-section.tsx` + new `secrets-store.ts`) marks all workspaces as `needsRestart`. Workspace-scoped secrets only mark the affected workspace.
- `ConfirmDialog.tsx` — uses React Portal (escapes parent transform/filter containing blocks); added `"warning"` amber variant; callbacks via refs to avoid keydown handler churn on parent re-renders.
- New shared helper `canvas/src/lib/canvas-actions.ts` — `markAllWorkspacesNeedRestart` / `markWorkspaceNeedsRestart` (was duplicated across 2 files).
**Org template `channels:` field**
- New `channels:` section in org.yaml auto-creates social channel rows on import. Config values support `${VAR}` expansion from `.env` files (workspace `.env` > org root `.env` > platform process env).
- New `OrgChannel` struct, `expandWithEnv()` helper using `os.Expand`, regex-based `hasUnresolvedVarRef()` (literal `$5` no longer false-flagged).
- Adapter validation upfront via `channels.GetAdapter()` + `ValidateConfig()` — fails fast for unknown types or invalid config.
- Idempotent insert: `ON CONFLICT (workspace_id, channel_type) DO UPDATE` — re-importing the same org doesn't fail.
- `channelMgr.Reload()` called once at end of Import (not per-workspace).
- Skipped channels surfaced in import response (`channels_skipped` field with reason).
- Extracted `loadWorkspaceEnv()` helper used by both secret injection and channel config expansion.
- `org-templates/molecule-dev/pm/.env.example` documents required vars (real `.env` gitignored).
- `org-templates/molecule-dev/org.yaml` PM block references vars in `channels: telegram` — talk to PM directly from Telegram immediately after deploy.
- 10 new tests: OrgChannel YAML parsing, expandWithEnv (4 paths), hasUnresolvedVarRef (5 cases).
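
The layered expansion and the unresolved-reference check can be sketched in Python as below. This is a regex-based stand-in, not Go's `os.Expand`; the snake_case names, the support for both `${VAR}` and `$VAR`, and the choice to leave unknown references intact are assumptions.

```python
import re
from typing import Mapping

# A variable name must start with a letter or underscore, so "$5" never matches.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_with_env(value: str, *layers: Mapping[str, str]) -> str:
    """Expand variable references using layers in precedence order, highest first."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(2)
        for layer in layers:
            if name in layer:
                return layer[name]
        return match.group(0)  # unknown variable: leave the reference as-is

    return _VAR_REF.sub(lookup, value)


def has_unresolved_var_ref(value: str) -> bool:
    """True when a variable reference is still present after expansion."""
    return _VAR_REF.search(value) is not None
```

Passing the workspace `.env`, then the org root `.env`, then the process environment as `layers` gives the precedence described above. A value that still has an unresolved reference is a candidate for the `channels_skipped` list.
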
**Verified live:** Org import created PM workspace, telegram channel auto-linked, poller started polling `@molecule_team_bot` for the configured chat — no manual setup needed.
### Gemini Org + Chat UX Fixes (post-merge)
- `org-templates/molecule-worker-gemini/org.yaml` — `gemini-2.0-flash` → `gemini-2.5-flash` (the older model was decommissioned).
- `workspace/a2a_executor.py` — added `recursion_limit` to LangGraph run_config (default 100, configurable via `LANGGRAPH_RECURSION_LIMIT`). Library default of 25 wasn't enough for DeepAgents planning + delegation cycles.
- `canvas/src/components/tabs/ChatTab.tsx` — three fixes:
  1. **Hardcoded "Processing with Claude..."** → uses `runtimeDisplayName(data.runtime)` so DeepAgents/LangGraph/CrewAI workspaces show their actual runtime.
  2. **Stuck "Processing..." indicator after agent finishes** → HTTP `.then()` handler now extracts the reply from the synchronous response and clears the spinner, in addition to the existing WebSocket path.
  3. **Race condition** between WS event and HTTP response → both paths now check `sendingFromAPIRef` and the first-to-fire wins (no duplicate agent messages).
- `canvas/src/lib/runtime-names.ts` — extracted shared `runtimeDisplayName()` for reuse.
- `A2AResponse` type alias + `extractReplyText()` helper extracted in ChatTab (mirrors Go-side `extractReplyText` in `manager.go`).
- `.env.example` — documented `LANGGRAPH_RECURSION_LIMIT`.
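
Reading the recursion limit from the environment can be sketched as follows. The helper name `build_run_config` is hypothetical, and falling back to the default on an invalid or non-positive value is an assumption; only the `recursion_limit` key, the default of 100, and the `LANGGRAPH_RECURSION_LIMIT` variable come from the notes above.

```python
import os
from typing import Mapping, Optional

DEFAULT_RECURSION_LIMIT = 100


def build_run_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build the LangGraph run config with a configurable recursion limit."""
    env = os.environ if environ is None else environ
    raw = env.get("LANGGRAPH_RECURSION_LIMIT", "")
    try:
        limit = int(raw)
    except ValueError:
        limit = DEFAULT_RECURSION_LIMIT  # unset or not a number
    if limit <= 0:
        limit = DEFAULT_RECURSION_LIMIT  # reject nonsensical limits
    return {"recursion_limit": limit}
```

The resulting dict is what would be passed as the run config when invoking the graph.
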
### Documentation
- Created `docs/edit-history/2026-04-11.md` (this file)
- Updated `CLAUDE.md` — test counts, API routes, MCP tool count, migration count
- Updated `PLAN.md` — Phase 25 (Social Channels), test coverage table
- Updated `.env.example` — added GROQ_API_KEY, CEREBRAS_API_KEY, GOOGLE_API_KEY, MAX_TOKENS, TELEGRAM_BOT_TOKEN