2026-06-02 18:14:07 +00:00
16 changed files with 1 additions and 2417 deletions
@@ -1,96 +0,0 @@
---
-title: "Hermes Adapter — Shell Design Spec"
-description: "Design spec for the Hermes runtime adapter — the BaseAdapter shell, provider map, and integration points."
---
-# Hermes Adapter — Shell Design Spec
-
-**Perspective:** DevOps Engineer + Backend Engineer  
-**Status:** Draft — pre-implementation  
-**Hermes source:** `NousResearch/hermes-agent` (~61k ⭐)  
-**Adapter runtime key:** `hermes`
-
---
-
-## 1. Files Under `workspace/adapters/hermes/`
-
-| File | Purpose |
-|------|---------|
-| `Dockerfile` | Extends `workspace-template:base`; installs `hermes-agent` Python SDK and its deps via pip at image build time |
-| `requirements.txt` | Python package list — at minimum `hermes-agent`; pin to a specific release tag for reproducibility |
-| `adapter.py` | `HermesAdapter(BaseAdapter)` — implements `name()`, `display_name()`, `description()`, `get_config_schema()`, `setup()`, `create_executor()`; delegates to `_common_setup()` for plugins/skills/tools |
-| `__init__.py` | Exports `Adapter = HermesAdapter` — required by the adapter autodiscovery loader in `workspace/adapters/__init__.py` |
-
-### `Dockerfile` sketch (no implementation — shape only)
-
-```dockerfile
-FROM workspace-template:base
-COPY adapters/hermes/requirements.txt /tmp/hermes-requirements.txt
-RUN pip install --no-cache-dir -r /tmp/hermes-requirements.txt
-```
-
-### `adapter.py` shape
-
-```python
-class HermesAdapter(BaseAdapter):
-    @staticmethod
-    def name() -> str:
-        return "hermes"
-
-    async def setup(self, config: AdapterConfig) -> None:
-        # validate NOUS_API_KEY or OPENROUTER_API_KEY is set
-        # call self._common_setup(config) for plugins/skills/tools
-        ...
-
-    async def create_executor(self, config: AdapterConfig) -> AgentExecutor:
-        # wrap Hermes SDK session as an A2A AgentExecutor
-        ...
-```
-
---
-
-## 2. Platform-Side Changes
-
-### `workspace-server/internal/provisioner/provisioner.go` — `RuntimeImages` map
-
-Add one entry to the existing map:
-
-```go
-var RuntimeImages = map[string]string{
-    // ... existing entries ...
-    "hermes": "workspace-template:hermes",   // ← ADD THIS
-}
-```
-
-No other platform Go changes are required for the minimal adapter shell. The `runtime` column in the `workspaces` table is a free-form string; no enum migration needed.
-
-### `workspace/build-all.sh`
-
-Add `hermes` to the adapter build loop so `build-all.sh` (and the `build-all.sh claude-code`-style single-runtime path) includes it:
-
-```bash
-ADAPTERS=(langgraph claude_code openclaw autogen hermes codex google-adk)
-```
-
---
-
-## 3. Required Environment Variables
-
-| Name | Required | Description |
-|------|----------|-------------|
-| `NOUS_API_KEY` | Required (unless `OPENROUTER_API_KEY` set) | Nous Research Portal API key — primary model provider for Hermes; obtain from `nousresearch.com` |
-| `OPENROUTER_API_KEY` | Optional | Fallback provider; lets operators use any Hermes-supported model via OpenRouter instead of Nous Portal |
-| `HERMES_MODEL` | Optional | Model identifier (e.g. `nous-hermes-3`, `openrouter:anthropic/claude-sonnet-4-5`); adapter defaults to `nous-hermes-3` if unset |
-| `HERMES_SKILLS_DIR` | Optional | Path inside the container where Hermes looks for skills; defaults to `/configs/skills` — consistent with the Claude Code and LangGraph adapters |
-
-**Note:** `NOUS_API_KEY` and `OPENROUTER_API_KEY` must be set as workspace secrets via `POST /workspaces/:id/secrets`, not baked into the image. At least one of the two must be present at container start; `setup()` should `raise RuntimeError` early with a clear message if both are absent.
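The early check described in the note could look roughly like the sketch below. It is a minimal illustration, not the adapter's code: the helper name `require_provider_key` and the exact error text are assumptions, and only the two variable names come from the table above.

```python
import os
from typing import Mapping, Optional

# Provider keys in priority order, as listed in the table above.
PROVIDER_KEYS = ("NOUS_API_KEY", "OPENROUTER_API_KEY")


def require_provider_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the name of the first provider key that is set and non-empty.

    Hypothetical helper: raises RuntimeError when neither key is present,
    which is the behaviour the note asks setup() to have.
    """
    env = os.environ if env is None else env
    for name in PROVIDER_KEYS:
        if env.get(name, "").strip():
            return name
    raise RuntimeError(
        "Hermes adapter: set NOUS_API_KEY or OPENROUTER_API_KEY as a workspace secret"
    )
```

Taking the environment as a parameter keeps the check testable without mutating `os.environ`.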
-
---
-
-## 4. Smallest Viable Adapter — Scope Constraints
-
-This spec covers the **shell only** — the minimum to make a Hermes workspace provision, boot, and accept A2A messages:
-
-- No Hermes learning loop (skill self-improvement) in v1: that requires persistent storage writes outside `/configs`; defer to a follow-up PR.
-- No multi-messenger gateway integration: Hermes's Telegram/Discord/Slack channels are separate from Molecule AI's `/channels` feature; map these later via the channels adapter.
-- No FTS5 memory backend: use Molecule AI's existing `commit_memory` / `search_memory` built-in tools for v1; Hermes-native memory can be layered in a subsequent PR.
-- The executor wraps one Hermes agent session per workspace, matching the 1:1 workspace→agent model used by all other adapters.
@@ -1,78 +0,0 @@
---
-title: "Hermes Adapter — Implementation Plan"
-description: "Implementation plan for the Hermes runtime adapter, from SDK import path to adapter.py build steps."
---
-# Hermes Adapter — Implementation Plan
-
-**Author:** Dev Lead  
-**Date:** 2026-04-13  
-**Branch convention:** `feat/hermes-adapter-<step>` for each PR below  
-**Target:** Ship a minimal but functional Hermes workspace adapter in 4 PRs, each ≤200 lines changed.
-
---
-
-## PR Sequence
-
-### PR 1 — Docker image shell
-
-**Title:** `feat(hermes): add workspace-template:hermes Docker image`
-
-**Files touched:**
-- `workspace/adapters/hermes/Dockerfile` (new)
-- `workspace/adapters/hermes/requirements.txt` (new)
-- `workspace/adapters/hermes/__init__.py` (new)
-- `workspace/build-all.sh` (1-line addition)
-
-**Description:** Adds the Hermes Docker image layer. `Dockerfile` extends `workspace-template:base` and installs `hermes-agent` (and declared deps) via pip at build time. `build-all.sh` gains `hermes` in the adapter list so `bash build-all.sh` and `bash build-all.sh hermes` both work. No Python adapter logic yet; this PR only proves that the image builds and that the SDK imports inside the container (`import hermes`, pending confirmation of the import path in Open Question 1). CI: add `hermes` to the docker-build matrix.
-
---
-
-### PR 2 — Python adapter + A2A executor
-
-**Title:** `feat(hermes): implement HermesAdapter and A2A executor`
-
-**Files touched:**
-- `workspace/adapters/hermes/adapter.py` (new, ~80 lines)
-- `workspace/tests/test_adapters.py` (extend existing test file, ~30 lines)
-
-**Description:** Implements `HermesAdapter(BaseAdapter)` with `name()`, `display_name()`, `description()`, `get_config_schema()`, `setup()`, and `create_executor()`. `setup()` calls `_common_setup()` to load plugins/skills/tools identically to other adapters, then validates that `NOUS_API_KEY` or `OPENROUTER_API_KEY` is present and initialises a Hermes SDK session. `create_executor()` wraps the session as an `AgentExecutor`. Tests cover: adapter name/display_name contract, `setup()` raises `RuntimeError` when both API keys are absent, executor is returned after valid setup.
-
---
-
-### PR 3 — Platform RuntimeImages entry
-
-**Title:** `fix(provisioner): add hermes to RuntimeImages map`
-
-**Files touched:**
-- `workspace-server/internal/provisioner/provisioner.go` (1-line addition)
-- `workspace-server/internal/provisioner/provisioner_test.go` (1-line addition in RuntimeImages coverage test)
-
-**Description:** Adds `"hermes": "workspace-template:hermes"` to the `RuntimeImages` map. Without this entry the platform falls back to `workspace-template:langgraph` (wrong deps, agent fails to start). Test: extend the existing table-driven test that asserts every declared runtime resolves to a non-empty image tag.
-
---
-
-### PR 4 — Integration docs + org template entry
-
-**Title:** `docs(hermes): adapter usage guide and org template example`
-
-**Files touched:**
-- `docs/adapters/hermes-adapter-design.md` (update status from Draft → Implemented)
-- `workspace-configs-templates/hermes/config.yaml` (new, ~20 lines; minimal config template)
-- `org-templates/molecule-worker-gemini/org.yaml` or a new `molecule-hermes/` org template (optional, ~30 lines)
-
-**Description:** Marks the design doc as implemented, adds a `workspace-configs-templates/hermes/config.yaml` so operators can create a Hermes workspace from the UI template picker, and optionally adds a minimal org template showing a Hermes-runtime team. Documents the three env vars (`NOUS_API_KEY`, `OPENROUTER_API_KEY`, `HERMES_MODEL`) in the config template comments.
-
---
-
-## Sequencing Notes
-
-- PRs 1 and 2 can overlap in development but PR 2 must merge after PR 1 (image must exist before adapter tests run in CI).
-- PR 3 is a single-line change and can merge any time after PR 1 lands.
-- PR 4 has no code risk; it can be drafted alongside PR 2 and merged last.
-- Total estimated diff: ~180 lines of new code across all 4 PRs; well within the ≤200 lines/PR budget.
-
-## Open Questions (resolve before PR 2)
-
-1. **Hermes SDK import path** — confirm the pip package name and the Python import path (`import hermes`? `from hermes_agent import ...`?). Check `NousResearch/hermes-agent` README before writing adapter.py.
-2. **Session persistence** — Hermes has a learning loop that writes skill files. Decide at PR 2 time whether to mount `/workspace` as the Hermes skills root or suppress auto-write in v1.
-3. **Model default** — confirm the correct model identifier string for Nous Portal (e.g. `nous-hermes-3-70b` vs `hermes-3`); hardcode a safe default in `get_config_schema()`.
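Open Question 1 can be checked mechanically inside the built image. The sketch below probes a list of candidate module names with the standard library and reports the first one that resolves. The candidate names are guesses taken from this plan and the recon notes (`hermes`, `hermes_agent`, `hermes_cli`), not confirmed import paths, and the helper name is hypothetical.

```python
import importlib.util
from typing import Iterable, Optional


def first_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first module name that can be located, or None.

    Uses find_spec so nothing is actually imported or executed.
    """
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # A missing parent package or a malformed module spec: try the next name.
            continue
    return None


if __name__ == "__main__":
    # Candidate names are assumptions; confirm against the upstream README.
    print(first_importable(["hermes", "hermes_agent", "hermes_cli"]))
```

Running this as a one-off step in the PR 1 image build would settle the import path before `adapter.py` is written.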
@@ -1,264 +0,0 @@
---
-title: "Hermes Agent — Adapter Reconnaissance"
-description: "Reconnaissance of the NousResearch hermes-agent project as a candidate Molecule AI runtime adapter."
---
-# Hermes Agent — Adapter Reconnaissance
-
-Reconnaissance of [NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent) (v0.8.0, 68,713 ⭐, MIT) for potential Molecule AI adapter integration.
-
-> **Status:** Design-only recon — no implementation.
-
---
-
-## a) CLI Invocation
-
-**Install** (curl-to-bash, targets Linux/macOS/WSL2/Termux):
-
-```bash
-curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
-```
-
-The `hermes` binary in the repo root is a Python script (`#!/usr/bin/env python3`) that imports and calls `hermes_cli.main.main()`. After install it lands on `$PATH`.
-
-**Minimal interactive session:**
-
-```bash
-hermes                      # launches TUI, auto-detects provider from env
-hermes chat                 # explicit; same as bare `hermes`
-hermes setup                # one-time wizard: sets model, provider, API keys
-```
-
-**Key runtime flags:**
-
-```bash
-hermes chat \
-  --model anthropic/claude-opus-4.6 \
-  --provider openrouter \
-  --toolsets terminal,file,web \
-  --max-turns 60 \
-  --query "build me a FastAPI app" \
-  --resume                  # continue most recent session
-  --worktree                # git-worktree isolation per session
-  --profile myprofile       # load alternate HERMES_HOME profile
-```
-
-**One-shot (non-interactive):**
-
-```bash
-hermes chat --query "summarise this repo" --quiet
-```
-
-**Gateway (messaging platforms) start:**
-
-```bash
-hermes gateway start        # daemonises; reads gateway config from config.yaml
-hermes gateway status
-hermes gateway stop
-```
-
-**OpenClaw migration:**
-
-```bash
-hermes claw migrate --dry-run   # preview; drop --dry-run to execute
-```
-
---
-
-## b) Config Format
-
-**Format:** YAML  
-**Primary path:** `~/.hermes/config.yaml` (default), overrideable via `HERMES_HOME` env var.  
-**Reference file in repo:** `cli-config.yaml.example`
-
-**Minimal working config** (provider = OpenRouter, local terminal backend):
-
-```yaml
-# ~/.hermes/config.yaml
-
-model:
-  default: "anthropic/claude-opus-4.6"
-  provider: "openrouter"          # required; "auto" if you want env-var detection
-  base_url: "https://openrouter.ai/api/v1"
-
-terminal:
-  backend: "local"                # required; options: local | ssh | docker | singularity | modal | daytona
-  cwd: "."
-  timeout: 180
-  lifetime_seconds: 300
-
-memory:
-  memory_enabled: true
-  user_profile_enabled: true
-  memory_char_limit: 2200
-  user_char_limit: 1375
-  nudge_interval: 10
-
-agent:
-  max_turns: 60
-  reasoning_effort: "medium"      # xhigh | high | medium | low | minimal | none
-```
-
-**Required fields:** `model.default`, `model.provider`, `terminal.backend`.  
-Everything else has a hardcoded default.
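A pre-flight check for the three required fields might look like the following sketch. It operates on an already-parsed config dictionary so it needs no YAML dependency; the function name and messages are illustrative assumptions, while the field names and backend options come from the config above.

```python
from typing import Any, Dict, List

# Required dotted keys and backend options as stated in the recon notes.
REQUIRED_KEYS = ("model.default", "model.provider", "terminal.backend")
BACKENDS = {"local", "ssh", "docker", "singularity", "modal", "daytona"}


def _lookup(config: Dict[str, Any], dotted: str) -> Any:
    """Resolve a two-level 'section.key' path, returning None when absent."""
    section, key = dotted.split(".")
    section_value = config.get(section)
    return section_value.get(key) if isinstance(section_value, dict) else None


def missing_or_invalid(config: Dict[str, Any]) -> List[str]:
    """List problems with the required fields; an empty list means the config passes."""
    problems = [f"missing {d}" for d in REQUIRED_KEYS if not _lookup(config, d)]
    backend = _lookup(config, "terminal.backend")
    if backend and backend not in BACKENDS:
        problems.append(f"unknown terminal.backend: {backend}")
    return problems
```

An adapter could run this before launching Hermes so that a bad template fails at provision time rather than inside the agent.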
-
-**Credentials** go in `~/.hermes/.env` (separate from config.yaml):
-
-```bash
-OPENROUTER_API_KEY=sk-or-...
-ANTHROPIC_API_KEY=sk-ant-...
-HERMES_HOME=~/.hermes           # optional override
-```
-
-**Skills config** (in `config.yaml`):
-
-```yaml
-skills:
-  creation_nudge_interval: 15   # remind agent to persist a skill every N tool iterations
-  external_dirs:
-    - ~/.agents/shared-skills   # read-only external skill dirs
-```
-
-**Compression config** (in `config.yaml`):
-
-```yaml
-compression:
-  enabled: true
-  threshold: 0.50
-  summary_model: "google/gemini-3-flash-preview"
-```
-
---
-
-## c) Runtime Dependencies
-
-**Python version:** 3.13 (Dockerfile base: `ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie`)  
-**Package manager:** [uv](https://github.com/astral-sh/uv) (not pip directly; `uv pip install .`)  
-**Package version:** `hermes-agent==0.8.0`
-
-**Top core pip dependencies** (from `pyproject.toml`):
-
-| Package | Version constraint | Purpose |
-|---|---|---|
-| `openai` | `>=2.21.0,<3` | Primary LLM client (all providers via OpenAI-compat API) |
-| `anthropic` | `>=0.39.0,<1` | Direct Anthropic API adapter |
-| `python-dotenv` | `>=1.2.1,<2` | `.env` loading |
-| `fire` | `>=0.7.1,<1` | CLI argument dispatch |
-| `httpx[socks]` | `>=0.28.1,<1` | Async HTTP (gateway, webhooks) |
-| `rich` | `>=14.3.3,<15` | TUI rendering |
-| `pyyaml` | `>=6.0.2,<7` | Config file parsing |
-| `pydantic` | `>=2.12.5,<3` | Data validation |
-| `prompt_toolkit` | `>=3.0.52,<4` | Interactive TUI / multiline input |
-| `tenacity` | `>=9.1.4,<10` | Retry logic |
-
-**Key optional extras:**
-
-```bash
-pip install "hermes-agent[modal]"     # modal>=1.0.0 — serverless backend
-pip install "hermes-agent[daytona]"   # daytona>=0.148.0 — cloud sandbox backend
-pip install "hermes-agent[mcp]"       # mcp>=1.2.0 — MCP server/client
-pip install "hermes-agent[honcho]"    # honcho-ai — cross-session user modeling
-pip install "hermes-agent[messaging]" # telegram, discord.py, aiohttp, slack
-pip install "hermes-agent[voice]"     # faster-whisper, sounddevice, numpy
-pip install "hermes-agent[rl]"        # atroposlib, fastapi, uvicorn, wandb
-```
-
-**System binaries** (from Dockerfile `apt-get install`):
-
-```
-nodejs  npm  ripgrep  ffmpeg  gcc  python3-dev  libffi-dev  procps  build-essential
-```
-
-`ripgrep` is used by the `file` toolset for fast codebase search. `ffmpeg` is used for voice transcription pre-processing.
-
---
-
-## d) Session State
-
-**All persistent state lives under `HERMES_HOME`** (default: `~/.hermes/`, overrideable via env var).
-
-**Primary state store: SQLite**
-
-```
-~/.hermes/state.db          ← DEFAULT_DB_PATH = get_hermes_home() / "state.db"
-```
-
-- Schema version: **6** (`SCHEMA_VERSION = 6` in `hermes_state.py`)
-- WAL mode (`PRAGMA journal_mode=WAL`), which supports concurrent gateway + CLI writers
-- Three core tables: `schema_version`, `sessions`, `messages`
-- **FTS5 virtual table** `messages_fts` with auto-sync triggers on INSERT/UPDATE/DELETE; this backs the `session_search` toolset (full-text search across all past conversation content)
-- Compression-triggered session splitting tracked via `parent_session_id` chain in `sessions` table
-- Session source tagged as `'cli'`, `'telegram'`, `'discord'`, etc. for per-platform filtering
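To make the FTS5 arrangement concrete, here is a small self-contained sketch of an external-content FTS5 table kept in sync by triggers, which is the general pattern the bullet describes. The column set is an assumption for illustration and is not Hermes's actual schema; it also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Illustrative schema only: a content table plus an FTS5 index synced by triggers.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER messages_ad AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;
CREATE TRIGGER messages_au AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return ids of messages whose content matches the FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]
```

Because the triggers mirror every write, callers only ever touch `messages`; the index follows along without a separate rebuild step.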
-
-**Full directory layout:**
-
-```
-~/.hermes/
-├── config.yaml          ← get_config_path()
-├── .env                 ← get_env_path()
-├── state.db             ← SQLite WAL, FTS5
-├── skills/              ← get_skills_dir() — user-created skill SKILL.md files
-├── logs/                ← get_logs_dir() — trajectory JSONs
-│   └── session_YYYYMMDD_HHMMSS_<uuid>.json
-├── MEMORY.md            ← agent's curated notes (injected into system prompt)
-├── USER.md              ← user profile (injected into system prompt)
-└── skins/               ← optional custom theme YAMLs
-```
-
-**State is persistent by default.** Session history, memories (`MEMORY.md`/`USER.md`), and skills survive restarts. The `session_reset` config controls when gateway sessions are cleared (default: `mode: both`, idle after 1440 min or at 4 AM daily). Before any reset, Hermes is given one flush turn to write important context to `MEMORY.md`.
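One plausible reading of the `mode: both` reset rule (idle for 1440 minutes, or a 4 AM boundary crossed since the last activity) is sketched below. This is an interpretation for planning purposes, not Hermes's implementation; the function name and parameters are hypothetical and it uses naive local datetimes.

```python
from datetime import datetime, time, timedelta


def should_reset(
    last_activity: datetime,
    now: datetime,
    idle_minutes: int = 1440,
    daily_hour: int = 4,
) -> bool:
    """Return True if either reset condition holds (assumed 'both' semantics)."""
    # Condition 1: the session has been idle for at least idle_minutes.
    if now - last_activity >= timedelta(minutes=idle_minutes):
        return True
    # Condition 2: the most recent daily boundary falls after the last activity.
    boundary = datetime.combine(now.date(), time(hour=daily_hour))
    if boundary > now:
        boundary -= timedelta(days=1)
    return last_activity < boundary
```

If Molecule AI keeps one Hermes session per workspace, this rule determines how long conversational context survives between A2A messages.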
-
-Container backend state is controlled separately by `container_persistent: true/false` in the `terminal:` block.
-
---
-
-## e) Execution Backends
-
-**Six backends configured via a single `terminal.backend` key in `config.yaml`:**
-
-| Backend | Where commands run | Key extra config |
-|---|---|---|
-| `local` | Host machine, current dir | — |
-| `ssh` | Remote server | `ssh_host`, `ssh_user`, `ssh_key` |
-| `docker` | Inside a Docker container | `docker_image`, `docker_mount_cwd_to_workspace` |
-| `singularity` | Singularity/Apptainer container (HPC) | `singularity_image` |
-| `modal` | Modal cloud sandbox (serverless) | `modal_image`, `pip install hermes-agent[modal]` |
-| `daytona` | Daytona cloud sandbox | `daytona_image`, `container_disk`, `pip install hermes-agent[daytona]` |
-
-**Architecture clarification:** Hermes's Python process **always runs locally** (or wherever you launched it). The `backend` setting controls only where the **`terminal` tool** executes shell commands. For `docker`, Hermes calls the Docker API to spawn/reuse a container and routes `terminal` tool calls into it via exec — Hermes itself is **not** containerised by this setting.
-
-**Docker backend minimal config:**
-
-```yaml
-terminal:
-  backend: "docker"
-  cwd: "/workspace"                              # path inside the container
-  timeout: 180
-  lifetime_seconds: 300
-  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
-  docker_mount_cwd_to_workspace: false           # default: false (launch dir is not mounted, the safer setting). Set true to bind-mount launch dir into /workspace
-  docker_forward_env:
-    - "GITHUB_TOKEN"
-    - "NPM_TOKEN"
-  container_cpu: 1
-  container_memory: 5120                         # MB
-  container_disk: 51200                          # MB
-  container_persistent: true                     # false = ephemeral container, wiped after session
-```
-
-**The Dockerfile** (for running *all of Hermes* inside Docker, distinct from the backend setting) uses:
-
-```dockerfile
-FROM debian:13.4
-ENV HERMES_HOME=/opt/data
-ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
-VOLUME /opt/data
-ENTRYPOINT ["/opt/hermes/docker/entrypoint.sh"]
-# Runs as non-root user hermes (UID 10000), home /opt/data
-```
-
-**Serverless hibernation** (Modal + Daytona): `container_persistent: false` produces fully ephemeral sandboxes that are destroyed after `lifetime_seconds`; `true` persists the container filesystem between sessions (warm-resume, no re-install overhead).
-
---
-
-## f) Value Proposition
-
-Integrating Hermes adds one capability that none of the existing adapters (LangGraph, Claude Code, AutoGen, OpenClaw, Codex, Google ADK) deliver end-to-end: **a closed learning loop that compounds across sessions at the skill, memory, and user-model layers simultaneously.**
-
-Concretely: after a complex task, Hermes autonomously creates a `SKILL.md` file in `~/.hermes/skills/` (prompted every `creation_nudge_interval=15` tool iterations), and those skills are re-injected as context in future sessions, so agents get better at tasks they have done before without any human curation step. The `session_search` toolset adds FTS5 + Gemini Flash summarization over `state.db`, so the agent can recall specific conversations from months ago with semantic-quality results. Layered on top is **Honcho dialectic user modeling** (`plastic-labs/honcho`), a cross-session profile that tracks user communication style, preferences, and expectations, shared across any Honcho-integrated tool (not just Hermes).
-
-The **Modal and Daytona serverless backends with `container_persistent`** give Molecule AI a path to hibernating, pay-per-use sandboxes that no existing adapter exposes, which is directly relevant to Molecule AI's multi-workspace billing model.
-
-Two further items are worth tracking. The `hermes claw migrate` command (backed by `optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py`) suggests Molecule AI could offer equivalent migration tooling to attract OpenClaw's existing ~247k-user base. The **`agentskills.io` skill-manifest spec** (referenced in `optional-skills/`) should be reviewed before Molecule AI finalises its own plugin manifest schema, to ensure interoperability with what is rapidly becoming the de-facto file-based skill standard.
@@ -1,177 +0,0 @@
---
-title: "MeDo Integration Design — Molecule AI Hackathon (May 20 2026)"
-description: "Design for integrating the Baidu MeDo / Miaoda App Builder as an OpenClaw-runtime workspace, with A2A delegation and open questions."
---
-# MeDo Integration Design — Molecule AI Hackathon (May 20 2026)
-
-**Status:** Design — implementation pending operator sign-off on open questions (§5).  
-**Scope:** How the molecule-dev team builds MeDo apps for the "Build with MeDo" hackathon.  
-**Key constraint:** MeDo App Builder is an OpenClaw skill on ClawHub (`seiriosPlus/miaoda-app-builder`),
-not a REST API. All interactions go through natural-language messages to an OpenClaw workspace.
-
---
-
-## 1. Architecture Overview
-
-```
-CEO / Canvas
-    │  A2A task
-    ▼
-  PM (claude-code)
-    │  delegate_task_async → workspace: medo-builder
-    ▼
-  MeDo Builder workspace  [runtime: openclaw, skill: miaoda-app-builder]
-    │  OpenClaw CLI → skill → api.miaoda.cn
-    ▼
-  MeDo platform (app created / published → URL returned)
-    │  result relayed via A2A event_queue
-    ▼
-  PM → CEO
-```
-
-The MeDo Builder workspace is a **dedicated OpenClaw-runtime workspace** inside the
-molecule-dev org with the Miaoda App Builder skill pre-installed. PM delegates natural-language
-app-build requests to it via `delegate_task_async` and polls for the result (5–8 min latency).
-
---
-
-## 2. Installing the Miaoda App Builder Skill
-
-### 2.1 API Key
-
-The skill requires `MIAODA_API_KEY` (not `MEDO_API_KEY`).
-
-> ⚠️ **Credential name mismatch**: the global platform secret is currently named `MEDO_API_KEY`.
-> The skill's frontmatter declares `primaryEnv: MIAODA_API_KEY`. The MeDo Builder workspace must
-> set `MIAODA_API_KEY` — either rename the global secret or add a workspace-level alias.
-> See open question §5-A.
-
-Obtain the key from: **MeDo website → Settings → API Keys**. Keys do not expire, but generating
-a new one immediately invalidates the previous one.
-
-### 2.2 Installation Query
-
-OpenClaw installs skills by sending a natural-language install message to the agent.
-No CLI command is documented on ClawHub — send this message to the OpenClaw workspace on first boot:
-
-```
-Install the Miaoda App Builder skill from ClawHub: seiriosPlus/miaoda-app-builder
-```
-
-OpenClaw auto-downloads the skill, installs Python runtime deps (`requests`), and makes the skill
-available for subsequent messages.
-
-### 2.3 Workspace Config Sketch (`org-templates/medo-builder/workspace.yaml`)
-
-```yaml
-name: MeDo Builder
-role: Builds and publishes MeDo applications via the Miaoda App Builder OpenClaw skill
-runtime: openclaw
-tier: 2
-required_env:
-  - MIAODA_API_KEY          # TODO: resolve name vs platform secret MEDO_API_KEY (§5-A)
-  - OPENROUTER_API_KEY      # OpenClaw needs an LLM provider
-initial_prompt: |
-  You are a MeDo App Builder. On startup:
-  1. Install the Miaoda App Builder skill:
-     "Install the Miaoda App Builder skill from ClawHub: seiriosPlus/miaoda-app-builder"
-  2. Confirm installation succeeded.
-  3. Wait for build tasks from PM via A2A.
-  When you receive a build task, use natural language to instruct the skill:
-  "Create a [description] app and publish it when done."
-  App generation takes 5–8 minutes — poll the skill or wait for confirmation before reporting done.
-```
-
---
-
-## 3. A2A Delegation Pattern (5–8 Min Latency)
-
-App generation is asynchronous and slow. PM **must** use `delegate_task_async` + `check_task_status`
-rather than `delegate_task` (which has a shorter timeout and will return before the app is ready).
-
-### 3.1 PM Delegation Flow
-
-```python
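-import asyncio
-
-# Step 1: start the build without blocking on the result
-task = await delegate_task_async(
-    workspace_id="medo-builder-workspace-id",
-    task="Build a restaurant reservation tool with online booking, menu display, "
-         "and contact form. Publish when done and return the URL."
-)
-
-# Step 2: poll every 60s (app takes 5–8 min).
-# The 15-attempt cap is an assumed safety limit so a stuck build cannot hang PM.
-status = {"status": "pending"}
-for _ in range(15):
-    status = await check_task_status(task_id=task["task_id"])
-    if status["status"] in ("completed", "failed"):
-        break
-    await asyncio.sleep(60)
-
-# MeDo app URL on success; None if the build failed or never finished
-result_url = status.get("result") if status["status"] == "completed" else None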
-```
-
-### 3.2 Invocation Patterns (verified from Baidu doc)
-
-Natural-language messages the MeDo Builder workspace should accept from PM:
-
-| Intent | Message to send to MeDo Builder workspace |
-|--------|-------------------------------------------|
-| List existing apps | `"Show me my apps"` |
-| Create + auto-publish | `"Create a [description] and publish it when done"` |
-| Create only | `"Create a [description]"` |
-| Modify existing | `"Add a search function to app [name/ID]"` |
-| Publish draft | `"Publish this app"` |
-| Status check | `"Is the app generation done yet?"` |
-
---
-
-## 4. Proposed Org Template — `org-templates/medo-builder/`
-
-```
-org-templates/medo-builder/
-├── org.yaml                    ← minimal single-workspace org (not full team)
-├── medo-builder/
-│   ├── system-prompt.md        ← MeDo Builder agent persona + delegation rules
-│   └── workspace.yaml          ← runtime: openclaw, skill install, env
-```
-
-**org.yaml sketch:**
-
-```yaml
-name: MeDo Builder
-description: Single-workspace org for building MeDo apps (hackathon)
-defaults:
-  runtime: openclaw
-  tier: 2
-  required_env: [MIAODA_API_KEY, OPENROUTER_API_KEY]
-
-workspaces:
-  - name: MeDo Builder
-    role: Builds and publishes MeDo applications via Miaoda App Builder skill
-    files_dir: medo-builder
-    canvas: { x: 400, y: 300 }
-```
-
-The medo-builder workspace is deployed **as a child of the molecule-dev PM** in the hackathon org,
-not as a standalone org. Full `org-templates/medo-builder/` implementation is Week 2 scope.
-
---
-
-## 5. Open Questions (Operator Resolution Required)
-
-| # | Question | Why it blocks |
-|---|----------|---------------|
-| 5-A | **Credential name**: platform secret is `MEDO_API_KEY`; skill expects `MIAODA_API_KEY`. Rename global secret or add workspace alias? | Workspace boot will fail with "MIAODA_API_KEY not set" |
-| 5-B | **Credit cost per app**: Baidu doc mentions a Credit System but content was not rendered. How many credits does create+generate+publish consume? Do we have enough for hackathon testing? | Budget planning |
-| 5-C | **Rate limits**: no rate-limit info in docs or ClawHub page. What's the max concurrent app generations per API key? | Parallelism planning |
-| 5-D | **Failure recovery**: what happens if the OpenClaw skill process crashes mid-generation (after Confirm & Generate, before Publish)? Is there a way to resume or check status by app ID? | Reliability design |
-| 5-E | **Submission format**: does the hackathon judge the published MeDo app URL, the Molecule AI org config, or both? | Determines whether we need a polished demo org or just a working app |
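If 5-A is resolved with a workspace-level alias rather than a rename, the mapping itself is small. The sketch below shows one way to derive `MIAODA_API_KEY` from `MEDO_API_KEY` when only the latter is set; the helper name is hypothetical and where such a hook would live in the platform is not decided here.

```python
from typing import Dict, Mapping


def with_miaoda_alias(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env where MIAODA_API_KEY falls back to MEDO_API_KEY.

    An explicitly set MIAODA_API_KEY always wins; the input is not mutated.
    """
    resolved = dict(env)
    if not resolved.get("MIAODA_API_KEY") and resolved.get("MEDO_API_KEY"):
        resolved["MIAODA_API_KEY"] = resolved["MEDO_API_KEY"]
    return resolved
```

The alias keeps the existing global secret untouched, at the cost of two names referring to one credential.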
-
---
-
-## 6. Implementation Checklist (Weeks 1–3)
-
-- [x] Week 1: This design doc (`docs/adapters/medo-integration.md`)
-- [ ] Week 1: Resolve §5-A (credential name) + obtain API key credits estimate
-- [ ] Week 2: `org-templates/medo-builder/`: full system-prompt + workspace.yaml
-- [ ] Week 2: Integration test: PM delegates one real app build end-to-end
-- [ ] Week 3: Polish demo org; rehearse submission flow; publish hackathon entry
@@ -1,117 +0,0 @@
---
-title: "MeDo Smoke Test Log — 2026-04-13 (Run 4)"
-description: "Smoke-test run log for the MeDo / Miaoda App Builder OpenClaw integration."
---
-# MeDo Smoke Test Log — 2026-04-13 (Run 4)
-
-**Tester:** PM (direct execution)  
-**Goal:** Install Miaoda App Builder skill → build "Hello Molecule AI" landing page → publish → URL.  
-**Credits spent:** 0 across all four runs.
-
---
-
-## Run Summary
-
-| Run | Blocker | Resolution |
-|-----|---------|------------|
-| 1 | `workspace-template:openclaw` image not built | ✅ Operator rebuilt image |
-| 2 | Adapter key lookup ignores `AISTUDIO_API_KEY` / `QIANFAN_API_KEY` | ✅ Code fix committed (d779e16) |
-| 3 | Executor creates fresh OpenClaw session per A2A message | ✅ Code fix committed (9466943) |
-| 4 | `payloads: []` on every response — agent never returns text via `--json` mode | ❌ Root cause below |
-
---
-
-## Run 4 — Detailed Findings
-
-### Environment — all green
-| Check | Result |
-|-------|--------|
-| Platform health | ✅ |
-| `workspace-template:openclaw` image | ✅ boots in 31s |
-| AISTUDIO_API_KEY + gemini-2.0-flash | ✅ confirmed in every response meta |
-| Stable session ID (workspace ID) | ✅ `sessionKey: agent:main:explicit:a507780d-...` consistent across all calls |
-
-### Messages Sent and Responses
-
-| Message | Response | Duration |
-|---------|----------|----------|
-| Install skill | `payloads: [], livenessState: working` | 1.7s |
-| Build Hello Molecule AI | `payloads: [], livenessState: working` | 0.8s |
-| Check status (sessions_list) | `LLM request failed: provider rejected request schema/payload` | — |
-| Reply with exactly: STATUS_OK | `payloads: [], livenessState: working` (after restart) | 1.8s |
-
-The "Reply with exactly: STATUS_OK" response is decisive. A vanilla LLM call with no tool use should produce a text payload. It didn't. This rules out skill complexity or message ambiguity as the cause.
-
-### Root Cause — `openclaw agent --json` Does Not Surface Agent Text in `payloads`
-
-The OpenClaw agent processes messages using background session dispatch (`sessions_spawn` / `sessions_yield`). In this mode:
-1. Main session receives message → immediately spawns background session → calls `sessions_yield`
-2. `openclaw agent --json` exits with `payloads: [], livenessState: 'working'`
-3. Background session processes the actual work and produces text — but only visible in interactive/streaming mode, not in the `--json` subprocess call
-
-**Evidence:** Even "Reply with exactly: STATUS_OK" returns `payloads: []`. The agent is using background sessions for everything, including trivial echo requests.
-
-**Likely cause:** OpenClaw's default `SOUL.md` / `BOOTSTRAP.md` workspace config instructs the agent to always use async session patterns. In a terminal session these background responses appear naturally; via subprocess `--json`, only the main session's synchronous output is captured.
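Detecting the empty-payload case is more robust when the `--json` output is parsed rather than matched as a string. The sketch below assumes the top-level `payloads` list seen in the logged responses; the shape of each payload item (a dict with a `text` field) is an assumption, since no non-empty payload was observed in these runs.

```python
import json
from typing import Optional


def extract_reply(raw_stdout: str) -> Optional[str]:
    """Return the agent's text from `--json` output, or None if it yielded without text."""
    try:
        data = json.loads(raw_stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    texts = []
    for payload in data.get("payloads") or []:
        # Assumed item shape: {"text": "..."}; anything else is skipped.
        if isinstance(payload, dict) and isinstance(payload.get("text"), str):
            texts.append(payload["text"])
    return "\n".join(texts) or None
```

A `None` result is the signal for the executor to look elsewhere (for example, session history) for the agent's answer.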
-
-### Transient issue: LLM request failed
-After 3+ rapid A2A calls (install → build → status check), the Gemini AI Studio API returned a schema/payload rejection. Restarting the workspace (`POST /workspaces/:id/restart`) cleared it: the restart took 30s and the next call behaved normally. The likely cause is a rate-limit or context-size rejection from Gemini.
-
---
-
-## 4. Required Fix — OpenClawA2AExecutor Response Capture
-
-The executor must retrieve the agent's text response from session history **after** the main session yields. The `sessions_history` CLI command (exposed as `session_history` tool) retrieves past messages.
-
-**Proposed change** to `workspace/adapters/openclaw/adapter.py` (`execute()` method):
-
-```python
-# After proc.communicate() returns with payloads=[]:
-if not reply or reply.startswith("{'payloads': []"):
-    # Agent yielded without responding, so fetch the last message from session history
-    await asyncio.sleep(2)  # brief wait for background session to complete short tasks
-    hist_proc = await asyncio.create_subprocess_exec(
-        "openclaw", "sessions", "history",
-        "--session-id", self._session_id,
-        "--limit", "1", "--json",
-        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
-        env={**os.environ, "PATH": f"{os.path.expanduser('~/.local/bin')}:{os.environ.get('PATH', '')}"}
-    )
-    try:
-        hist_stdout, _ = await asyncio.wait_for(hist_proc.communicate(), timeout=15)
-        hist_data = json.loads(hist_stdout.decode().strip() or "{}")
-    except asyncio.TimeoutError:
-        hist_proc.kill()  # do not leave the history subprocess running
-        hist_data = {}
-    except json.JSONDecodeError:
-        hist_data = {}  # unparsable history: keep the original reply
-    last_msg = (hist_data.get("messages") or [{}])[-1]
-    reply = last_msg.get("content") or reply  # fall back to original if no history
-```
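The string-prefix test in the proposed change depends on how the reply happens to be stringified. A minimal sketch of a structural check instead, assuming the `--json` output is a single JSON object with a `payloads` list (as in the `payloads: []` output quoted earlier). `is_empty_yield` is a hypothetical helper name, not part of the adapter:

```python
import json


def is_empty_yield(raw: str) -> bool:
    """Return True when the agent output carries no payloads.

    Assumption: `raw` is the stdout of the `--json` subprocess call.
    Non-JSON text is treated as a real reply and left alone.
    """
    text = raw.strip()
    if not text:
        # Nothing came back at all, so fall through to session history.
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Plain text, not the JSON envelope: keep it as the reply.
        return False
    if not isinstance(data, dict):
        return False
    # A missing or empty `payloads` list means the main session yielded.
    return not data.get("payloads")
```

Under these assumptions, the history fallback would run only when `is_empty_yield(stdout_text)` is true, regardless of quoting style in the serialized output.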
-
-**Note on long tasks (5–8 min builds):** Session history won't have the build result until it completes. For Miaoda App Builder, PM must poll: send a follow-up "What is the status of the Hello Molecule AI app build?" message every 60s until the response contains a URL or error.
-
---
-
-## 5. Open Questions Status
-
-### 5-C — Rate limits
-**UNKNOWN.** Never reached skill invocation.  
-*New data:* Gemini AI Studio hit a schema/payload rejection after 3 rapid calls. This may be a Gemini-specific issue with large tool schemas (OpenClaw's `cron` schema is 6311 chars). Worth filing separately.
-
-### 5-D — Failure recovery
-**UNKNOWN.** Never reached app generation.
-
---
-
-## 6. Issues to File
-
-| # | Issue | Status | Location |
-|---|-------|--------|----------|
-| A | `fix(openclaw): use stable workspace session ID` | ✅ fixed in 9466943 | adapter.py |
-| B | `fix(openclaw): extend key lookup for AISTUDIO/QIANFAN` | ✅ fixed in d779e16 | adapter.py |
-| C | `fix(provisioner): surface Docker errors in last_sample_error` | ❌ open | provisioner.go |
-| **D** | **`fix(openclaw): capture agent response via session history when payloads=[]`** | ❌ open — see §4 | adapter.py |
-| **E** | **`fix(openclaw): Gemini rejects request after N rapid calls with large tool schema`** | ❌ open — investigate cron schema size | adapter.py |
-
---
-
-## 7. Next Steps (before Run 5)
-
-- [ ] **Dev Lead:** Implement §4 session-history fallback in `OpenClawA2AExecutor.execute()`
-- [ ] **Dev Lead (optional):** Trim `cron` tool schema to reduce Gemini schema-size rejection risk
-- [ ] **Operator:** Rebuild image: `bash workspace/build-all.sh openclaw`
-- [ ] **PM (Run 5):** Re-run smoke test — expected to finally reach skill install confirmation
@@ -1,112 +0,0 @@
---
-title: "ADR-001: Admin endpoints accept any workspace bearer token"
-description: "ADR-001: why admin endpoints validate any workspace bearer token, and the AdminAuth lockdown that followed."
---
-# ADR-001: Admin endpoints accept any workspace bearer token
-
-**Status:** Accepted — known risk, Phase-H remediation planned
-**Date:** 2026-04-17
-**Issue:** #684
-**Tracking:** Phase-H — #710
-
-## Context
-
-The `AdminAuth` middleware validates callers by calling `ValidateAnyToken`, which
-accepts any live workspace bearer token regardless of which workspace issued it.
-There is no separation between workspace-scoped tokens (issued to individual
-agents) and admin-scoped tokens (intended for platform operators).
-
-This means any workspace agent that has been issued a token can reach every
-admin-gated route on the platform.
-
-## Decision
-
-Proper token-tier separation (workspace vs. admin scope) is deferred to Phase-H.
-The known risk is explicitly accepted. Mitigation controls are documented below.
-
-## Blast radius — affected admin endpoints
-
-A single compromised workspace token grants full access to every endpoint
-below, as if the admin gate were not there:
-
-| Endpoint | Impact |
-|----------|--------|
-| `GET /admin/workspaces/:id/test-token` | Mint a fresh bearer token for any workspace |
-| `DELETE /workspaces/:id` | Delete any workspace and auto-revoke its tokens |
-| `PUT /settings/secrets` / `POST /admin/secrets` | Overwrite any global secret (env-poisons every agent on restart) |
-| `DELETE /settings/secrets/:key` / `DELETE /admin/secrets/:key` | Delete any global secret; same fan-out restart |
-| `GET /settings/secrets` / `GET /admin/secrets` | Read all global secret keys (values masked, but key enumeration enables targeted attacks) |
-| `GET /workspaces/:id/budget` + `PATCH /workspaces/:id/budget` | Read or clear any workspace's token budget |
-| `GET /events` / `GET /events/:workspaceId` | Read the full structural event log across all workspaces |
-| `POST /bundles/import` | Import an arbitrary workspace bundle — creates workspaces, injects secrets, overwrites configs |
-| `GET /bundles/export/:id` | Exfiltrate full workspace bundle including config, secrets references, and files |
-| `POST /org/import` | Instantiate an entire org template — creates multiple workspaces with arbitrary roles and secrets |
-| `GET /org/templates` | Enumerate all org template names and their configured roles/system prompts |
-| `POST /templates/import` | Write arbitrary files into `configsDir` (workspace template injection) |
-| `GET /templates` | Enumerate all template names and metadata |
-| `GET /admin/liveness` | Read platform subsystem health (ops intel) |
-| `GET /admin/schedules/health` | Read cron scheduler health across all workspaces |
-
-## Risk statement
-
-**A single compromised workspace agent can achieve full platform takeover via
-admin endpoints.**
-
-Attack chain example:
-1. Agent A's token is exfiltrated (e.g. via a prompt-injection in a delegated task).
-2. Attacker calls `PUT /settings/secrets` to overwrite `CLAUDE_API_KEY` with a
-   controlled value.
-3. Every non-paused workspace restarts and loads the poisoned key.
-4. Attacker now controls the LLM backend for the entire platform.
-
-Alternatively: call `POST /bundles/import` with a crafted bundle to inject a
-malicious workspace with a pre-configured `initial_prompt` and elevated secrets.
-
-## Current mitigations
-
-- **Workspace isolation** — `CanCommunicate()` in the A2A proxy limits which
-  workspaces can send tasks to which, reducing the blast radius of a single
-  compromised agent during normal operation.
-- **Audit logging** — PR #651 writes all admin-route calls to `structure_events`.
-  Forensic recovery is possible after the fact.
-- **`ValidateAnyToken` removed-workspace JOIN** — tokens belonging to deleted
-  workspaces are filtered at the DB layer (PR #682 defense-in-depth) so
-  post-deletion token replay is blocked.
-- **`MOLECULE_ENV=production` gate** — hides the `/admin/workspaces/:id/test-token`
-  endpoint in production deployments unless `MOLECULE_ENABLE_TEST_TOKENS=1`.
-
-## Phase-H remediation plan
-
-Tracked in GitHub issue **#710**.
-
-### Schema change
-
-Add a `token_type` column to `workspace_auth_tokens`:
-
-```sql
-ALTER TABLE workspace_auth_tokens
-  ADD COLUMN IF NOT EXISTS token_type TEXT NOT NULL DEFAULT 'workspace'
-  CHECK (token_type IN ('workspace', 'admin'));
-```
-
-Admin tokens are minted only via a dedicated privileged endpoint that itself
-requires an existing admin token or a one-time bootstrap secret.
-
-### Middleware update
-
-- `WorkspaceAuth` — continue accepting `token_type = 'workspace'` only.
-- `AdminAuth` — require `token_type = 'admin'`. Workspace tokens rejected.
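The tier rule can be illustrated with a short sketch. This is not the server's middleware (which is Go); `TokenType` and `is_allowed` are hypothetical names, and only the two `token_type` values and the two middleware names come from the plan above:

```python
from enum import Enum


class TokenType(str, Enum):
    """The two values permitted by the proposed CHECK constraint."""
    WORKSPACE = "workspace"
    ADMIN = "admin"


# Each middleware accepts exactly one token tier under the Phase-H plan.
REQUIRED_TIER = {
    "WorkspaceAuth": TokenType.WORKSPACE,
    "AdminAuth": TokenType.ADMIN,
}


def is_allowed(middleware: str, token_type: TokenType) -> bool:
    """Return True when the token's tier matches what the middleware requires."""
    return REQUIRED_TIER[middleware] is token_type
```

With this rule a workspace token presented to an `AdminAuth` route is rejected, which closes the path described in the risk statement.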
-
-### Bootstrap flow
-
-On first boot (no tokens exist), a single-use bootstrap secret is printed to
-the server log. The operator uses it to mint the first admin token. Subsequent
-admin tokens are minted by existing admin token holders. The fail-open path in
-`HasAnyLiveTokenGlobal` is retired once Phase-H ships.
-
-### Migration path
-
-Phase-H is a breaking change for any automation that currently uses workspace
-tokens against admin endpoints. A migration guide and a `MOLECULE_PHASE_H=1`
-feature flag will be provided so operators can opt in before the strict
-enforcement date.
@@ -1,125 +0,0 @@
---
-title: API Reference
-description: Full REST API reference for the Molecule AI workspace server — workspace management, A2A communication, file operations, secrets, tokens, and more.
---
-
-# API Reference
-
-This document describes the REST API exposed by the Molecule AI workspace server (Go/Gin, default port `:8080`). Clients include the Canvas frontend, workspace agents communicating over A2A, and external tooling such as the MCP server and CLI.
-
-**Base URL:** `http://localhost:8080` (development default)
-**Rate limit:** 600 req/min (configurable via `RATE_LIMIT`)
-**CORS origins:** `http://localhost:3000,http://localhost:3001` by default (configurable via `CORS_ORIGINS`)
-
---
-
-## Authentication
-
-Three middleware classes gate server-side routes:
-
-- **`AdminAuth`** — strict bearer-only. Required for any route that can leak prompts/memory, create/mutate workspaces, or expose ops intel. Lazy-bootstrap fail-open when no live tokens exist globally.
-- **`WorkspaceAuth`** — binds a bearer token to a specific workspace `:id`. A token for workspace A cannot be used against workspace B's sub-routes.
-- **`CanvasOrBearer`** — accepts a bearer token OR a request Origin matching `CORS_ORIGINS`. Used only for cosmetic routes with zero data/security impact (currently `PUT /canvas/viewport` only). Do not extend to routes that leak data or create resources.
-
-Full contract: `docs/runbooks/admin-auth.md`.
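As an illustration of the `CanvasOrBearer` rule only, here is a minimal sketch. It assumes `CORS_ORIGINS` is a comma-separated list, as the default value shown earlier suggests; `canvas_or_bearer` is a hypothetical function, not the Go middleware itself:

```python
from typing import Optional


def canvas_or_bearer(bearer_valid: bool, origin: Optional[str], cors_origins: str) -> bool:
    """Accept the request if the bearer is valid OR the Origin is allow-listed."""
    # Split the comma-separated setting and drop empty entries.
    allowed = {o.strip() for o in cors_origins.split(",") if o.strip()}
    origin_ok = origin is not None and origin in allowed
    return bearer_valid or origin_ok
```

Because an `Origin` header is trivially forgeable outside a browser, this check is only suitable for the cosmetic routes the section describes.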
-
---
-
-## Routes
-
-| Method | Path | Handler |
-|--------|------|---------|
-| GET | /health | inline |
-| GET | /metrics | metrics.Handler() — Prometheus text format; no auth, scrape-safe |
-| POST/GET/PATCH/DELETE | /workspaces[/:id] | workspace.go — `GET /workspaces`, `POST /workspaces`, and `DELETE /workspaces/:id` require `AdminAuth`. `PATCH /workspaces/:id` enforces field-level authz: cosmetic fields (name, role, x, y, canvas) pass through; sensitive fields (tier, parent_id, runtime, workspace_dir) require a valid bearer token when any live token exists. |
-| GET/PATCH | /workspaces/:id/config | workspace.go |
-| GET/POST | /workspaces/:id/memory | workspace.go |
-| DELETE | /workspaces/:id/memory/:key | workspace.go |
-| POST/PATCH/DELETE | /workspaces/:id/agent | agent.go |
-| POST | /workspaces/:id/agent/move | agent.go |
-| GET/POST/PUT | /workspaces/:id/secrets | secrets.go (POST/PUT auto-restarts workspace) |
-| DELETE | /workspaces/:id/secrets/:key | secrets.go (DELETE auto-restarts workspace) |
-| GET | /workspaces/:id/model | secrets.go |
-| GET | /settings/secrets | secrets.go — list global secrets (keys only, values masked) |
-| PUT/POST | /settings/secrets | secrets.go — set a global secret `{key, value}`; auto-restarts every non-paused/non-removed/non-external workspace that does not shadow the key with a workspace-level override |
-| DELETE | /settings/secrets/:key | secrets.go — delete a global secret; same auto-restart fan-out as PUT/POST |
-| GET | /admin/workspaces/:id/test-token | admin_test_token.go — mint a fresh bearer token for E2E scripts; returns 404 unless `MOLECULE_ENV != production` or `MOLECULE_ENABLE_TEST_TOKENS=1` |
-| GET/POST/DELETE | /admin/secrets[/:key] | secrets.go — legacy aliases for /settings/secrets |
-| WS | /workspaces/:id/terminal | terminal.go |
-| POST | /workspaces/:id/expand | team.go |
-| POST | /workspaces/:id/collapse | team.go |
-| POST/GET | /workspaces/:id/approvals | approvals.go |
-| POST | /workspaces/:id/approvals/:id/decide | approvals.go |
-| GET | /approvals/pending | approvals.go |
-| POST/GET | /workspaces/:id/memories | memories.go |
-| DELETE | /workspaces/:id/memories/:id | memories.go |
-| GET | /workspaces/:id/traces | traces.go |
-| GET/POST | /workspaces/:id/activity | activity.go |
-| POST | /workspaces/:id/notify | activity.go (agent→user push message via WebSocket) |
-| POST | /workspaces/:id/restart | workspace.go |
-| POST | /workspaces/:id/pause | workspace.go (stops container, status→paused) |
-| POST | /workspaces/:id/resume | workspace.go (re-provisions paused workspace) |
-| POST | /workspaces/:id/a2a | workspace.go |
-| POST | /workspaces/:id/delegate | delegation.go (async fire-and-forget) |
-| GET | /workspaces/:id/delegations | delegation.go (list delegation status) |
-| GET/POST | /workspaces/:id/schedules | schedules.go (cron CRUD) |
-| PATCH/DELETE | /workspaces/:id/schedules/:scheduleId | schedules.go |
-| POST | /workspaces/:id/schedules/:scheduleId/run | schedules.go (manual trigger) |
-| GET | /workspaces/:id/schedules/:scheduleId/history | schedules.go (past runs) |
-| GET/POST | /workspaces/:id/channels | channels.go (social channel CRUD) |
-| PATCH/DELETE | /workspaces/:id/channels/:channelId | channels.go |
-| POST | /workspaces/:id/channels/:channelId/send | channels.go (outbound message) |
-| POST | /workspaces/:id/channels/:channelId/test | channels.go (test connection) |
-| GET | /channels/adapters | channels.go (list available platforms) |
-| POST | /channels/discover | channels.go (auto-detect chats for a bot token) |
-| POST | /webhooks/:type | channels.go (incoming social webhook) |
-| GET | /workspaces/:id/shared-context | templates.go |
-| GET/PUT/DELETE | /workspaces/:id/files[/*path] | templates.go |
-| GET | /canvas/viewport | viewport.go — open, no auth required (cosmetic, bootstrap-friendly) |
-| PUT | /canvas/viewport | viewport.go — `CanvasOrBearer` middleware; accepts bearer OR Origin matching `CORS_ORIGINS`. Cosmetic-only route — worst case viewport corruption, recovered by page refresh. |
-| GET | /templates | templates.go |
-| POST | /templates/import | templates.go — `AdminAuth` required |
-| POST | /registry/register | registry.go |
-| POST | /registry/heartbeat | registry.go — requires `Authorization: Bearer <token>` once a workspace has any live token on file (legacy workspaces grandfathered) |
-| POST | /registry/update-card | registry.go — requires `Authorization: Bearer <token>` once a workspace has any live token on file |
-| GET | /registry/discover/:id | discovery.go — requires `X-Workspace-ID` + bearer token on the caller side |
-| GET | /registry/:id/peers | discovery.go — requires `X-Workspace-ID` + bearer token on the caller side |
-| POST | /registry/check-access | discovery.go |
-| GET | /plugins | plugins.go (list registry; supports `?runtime=` filter) |
-| GET | /plugins/sources | plugins.go (list registered install-source schemes) |
-| GET/POST/DELETE | /workspaces/:id/plugins[/:name] | plugins.go — list, install (`{"source":"scheme://spec"}`), uninstall per-workspace |
-| GET | /workspaces/:id/plugins/available | plugins.go (filtered by workspace runtime) |
-| GET | /workspaces/:id/plugins/compatibility?runtime=X | plugins.go (preflight runtime-change check) |
-| GET/POST | /workspaces/:id/tokens | tokens.go — list active tokens (prefix + metadata), create new token (plaintext returned once). Max 50 per workspace. |
-| DELETE | /workspaces/:id/tokens/:tokenId | tokens.go — revoke specific token by ID |
-| GET | /bundles/export/:id | bundle.go — `AdminAuth` required |
-| POST | /bundles/import | bundle.go — `AdminAuth` required |
-| GET | /org/templates | org.go (list available org templates) |
-| POST | /org/import | org.go — `AdminAuth` required; applies `resolveInsideRoot` path sanitiser on template paths |
-| GET | /events | events.go — `AdminAuth` required |
-| GET | /events/:workspaceId | events.go — `AdminAuth` required |
-| GET | /admin/liveness | inline — `AdminAuth` required. Returns per-subsystem `supervised.Snapshot()` ages; use to check health of scheduler/heartbeat goroutines |
-| GET | /ws | socket.go |
-
---
-
-## Database
-
-Migration files live in `workspace-server/migrations/` (latest: `022_workspace_schedules_source`). Each migration ships as a `.up.sql`/`.down.sql` pair. The migration runner globs `*.sql`, filters out `.down.sql` files, sorts alphabetically, and executes each file on boot. All `.up.sql` files must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`) because the runner re-applies every migration on every boot.
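The file-selection step described above can be sketched as follows. This illustrates the described behavior, not the runner's actual code (which lives in the Go server); `select_migrations` is a hypothetical name:

```python
from typing import List


def select_migrations(filenames: List[str]) -> List[str]:
    """Pick the files the runner would execute, in execution order.

    Keeps `*.sql`, drops `*.down.sql`, then sorts by name. Zero-padded
    numeric prefixes keep the name order equal to the migration order.
    """
    candidates = [
        name for name in filenames
        if name.endswith(".sql") and not name.endswith(".down.sql")
    ]
    return sorted(candidates)
```

Since every selected file runs on every boot, a non-idempotent statement in any `.up.sql` would fail on the second start, which is why the idempotency rule matters.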
-
-### Key Tables
-
-| Table | Description |
-|-------|-------------|
-| `workspaces` | Core entity — status, runtime, `agent_card` JSONB, heartbeat columns, `current_task`, `awareness_namespace`, `workspace_dir` |
-| `canvas_layouts` | Per-workspace x/y canvas position |
-| `structure_events` | Append-only event log (workspace lifecycle, agent, approval events) |
-| `activity_logs` | A2A communications, task updates, agent logs, errors. `error_detail` is populated by the scheduler so cron run history can surface failure reasons. |
-| `workspace_schedules` | Cron tasks — expression, timezone, prompt, run history, `source` (`'template'` for org/import-seeded, `'runtime'` for Canvas/API-created), `last_status` (includes `'skipped'` when the scheduler concurrency-skips a busy workspace) |
-| `workspace_channels` | Social channel integrations (Telegram, Slack, etc.) with JSONB config and allowlist |
-| `agents` | Agent records |
-| `workspace_secrets` | Per-workspace encrypted secrets |
-| `global_secrets` | Platform-wide encrypted secrets |
-| `workspace_auth_tokens` | Bearer tokens; auto-revoked on workspace delete |
-| `agent_memories` | HMA scoped memory (LOCAL / TEAM / GLOBAL) |
-| `approvals` | Human-in-the-loop approval requests |
@@ -1,83 +0,0 @@
---
-title: "Canary release pipeline"
-description: "The canary release pipeline that ships workspace-server changes to the prod tenant fleet, and how to halt it."
---
-# Canary release pipeline
-
-How a workspace-server code change reaches the prod tenant fleet — and how to stop it if something's wrong.
-
-## The loop
-
-```
-PR merged to staging → main
-      │
-      ▼
-publish-workspace-server-image.yml   ← pushes :staging-<sha> ONLY
-      │                                (NOT :latest — prod is untouched)
-      ▼
-Canary tenants auto-update to :staging-<sha>
-      │   (5-min auto-updater cycle on each canary EC2)
-      ▼
-canary-verify.yml waits 6 min, runs scripts/canary-smoke.sh
-      │
-      ├─► GREEN → crane tag :staging-<sha> → :latest
-      │                                       │
-      │                                       ▼
-      │                           Prod tenants auto-update within 5 min
-      │
-      └─► RED   → :latest stays on prior good digest
-                  GitHub Step Summary flags the rejected sha
-                  Ops fixes forward OR rolls back manually
-```
-
-## Canary fleet
-
-Lives in a separate AWS account (`molecule-canary`, `004947743811`) via an assumed role (`MoleculeStagingProvisioner`). The CP's `is_canary` org flag routes provisioning there; every other org goes to the default staging account. See `docs/architecture/saas-prod-migration-2026-04-19.md` for the account bootstrap.
-
-Canary tenants are configured to pull `:staging-<sha>` (not `:latest`) via `TENANT_IMAGE` on their provisioner, so they ingest each new build before prod does.
-
-## Smoke suite
-
-`scripts/canary-smoke.sh` hits each canary tenant (URL + ADMIN_TOKEN pair) and asserts:
-
-- `/admin/liveness` returns a subsystems map (tenant booted, AdminAuth reachable)
-- `/workspaces` returns a JSON array (wsAuth + DB healthy)
-- `/memories/commit` + `/memories/search` round-trip (encryption + scrubber)
-- `/events` admin read (C4 fail-closed proof)
-- `/admin/liveness` without bearer → 401 (C4 regression gate)
-
-Expand by editing the script — each `check "name" "expected" "$response"` call is one line.
-
-## Adding a canary tenant
-
-1. `POST /cp/orgs` — create the org normally (is_canary defaults to false)
-2. `POST /cp/admin/orgs/<slug>/canary` with `{"is_canary": true}` — admin only, refuses to flip if already provisioned
-3. Re-trigger provision (or delete + recreate if the org was already provisioned into staging) — the fresh EC2 lands in account `004947743811`
-
-Then set repo secrets:
-- `CANARY_TENANT_URLS` — append the new tenant's URL
-- `CANARY_ADMIN_TOKENS` — append its ADMIN_TOKEN in the same position
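Because the two secrets are matched by position, a length mismatch silently pairs a tenant with the wrong token. A minimal sketch of a pairing check, assuming both secrets are comma-separated (the delimiter is not stated here); `pair_canaries` is a hypothetical helper:

```python
from typing import List, Tuple


def pair_canaries(urls: str, tokens: str) -> List[Tuple[str, str]]:
    """Pair each tenant URL with the token at the same position.

    Assumption: both values are comma-separated lists.
    Raises ValueError when the two lists differ in length.
    """
    url_list = [u.strip() for u in urls.split(",") if u.strip()]
    token_list = [t.strip() for t in tokens.split(",") if t.strip()]
    if len(url_list) != len(token_list):
        raise ValueError(
            f"{len(url_list)} URLs but {len(token_list)} tokens; lists must align"
        )
    return list(zip(url_list, token_list))
```

Failing loudly on a mismatch is preferable to letting `zip` truncate, since a truncated list would skip a tenant without any signal.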
-
-## Rolling back `:latest`
-
-When canary was green but something surfaces post-promotion, retag `:latest` to a prior digest:
-
-```bash
-export GITHUB_TOKEN=ghp_...    # write:packages
-scripts/rollback-latest.sh 4c1d56e  # retags both platform + tenant images
-```
-
-`scripts/rollback-latest.sh` pre-checks that `:staging-<sha>` exists before moving `:latest`, and verifies the digest after the move. Prod tenants pick up the rolled-back image on their next 5-min auto-update.
-
-A post-mortem should always include:
-- the commit sha that broke
-- why canary didn't catch it (new code path the smoke suite doesn't exercise?)
-- whether the smoke suite should grow a new check to prevent the same class of bug
-
-## What this gate doesn't catch
-
-- Bugs that only surface under prod-only data (customer workloads with scale or shape canary doesn't produce). Canary uses real traffic shapes but can't simulate weeks of accumulated state.
-- Config drift between canary and prod (different env-var values, different feature flags). Keep canary's config deltas minimal and documented.
-- Cross-tenant interactions — canary tenants run in their own AWS account, so a bug that only appears when two tenants compete for a shared resource won't reproduce here.
-
-When these miss, `rollback-latest.sh` is the escape hatch.
@@ -1,76 +0,0 @@
---
-title: "SaaS prod migration — 2026-04-19"
-description: "Prod cutover notes for the 2026-04-19 staging→main promotion of molecule-controlplane and molecule-core."
---
-# SaaS prod migration — 2026-04-19
-
-Promoted staging → main on both `Molecule-AI/molecule-controlplane` and `Molecule-AI/molecule-core`. This note captures the prod cutover deltas so ops can cross-check against the running system.
-
-## What changed
-
-Eleven PRs landed, split across the two repos:
-
-**Control plane (`molecule-controlplane`)**
-- PR #50 — C1/C2/C3: bearer auth on `/cp/workspaces/*`, shell-escape tenant user-data, per-tenant security group
-- PR #51 — H1/H2: crash-safe `SECRETS_ENCRYPTION_KEY` log, dropped `admin_token` from `/instance` SELECT
-- PR #52 — SSRF guard on `platform_url`
-- PR #53 — CP injects `MOLECULE_CP_SHARED_SECRET` + `MOLECULE_CP_URL` into tenant env
-- PR #54 — Stripe webhook body capped at 1 MiB
-
-**Core (`molecule-core` / this repo)**
-- PR #978 — H3/H4: LimitReader on Discord webhook + workspace config PATCH
-- PR #979 — C4: `AdminAuth` fail-closed on fresh install when `ADMIN_TOKEN` is set
-- PR #980 — log-scrub: dropped token prefix logging, stopped logging raw upstream response bodies
-- PR #981 — tenant `CPProvisioner` attaches the CP bearer on every outbound `/cp/workspaces/*` call
-- PR #982 — Canvas API fetch timeout (15s)
-- PR #984 — E2E smoke test sync for #966 (public GET no longer exposes `current_task`)
-
-## New prod env vars (Railway, project `molecule-platform`, env `production`)
-
-Set before the CP merge landed:
-
-| Variable | Value shape | Purpose |
-|---|---|---|
-| `PROVISION_SHARED_SECRET` | 32-byte hex | Gates `/cp/workspaces/*` on CP. Routes refuse to mount when unset — C1 fail-closed. |
-| `EC2_VPC_ID` | `vpc-…` | Enables per-tenant SG creation (C3). Shared-SG fallback emits a startup warning. |
-| `CP_BASE_URL` | `https://api.moleculesai.app` | Injected into newly-provisioned tenant containers as `MOLECULE_CP_URL`. |
-
-The live prod `PROVISION_SHARED_SECRET` value is held only in Railway; not committed anywhere. Rotate by `railway variables --set` + redeploy.
-
-## Existing-tenant migration (the sharp edge)
-
-Tenants provisioned **before** this cutover are still running the previous workspace-server image. When they pull the new image on their next boot or auto-update cycle, their `CPProvisioner` will start expecting `MOLECULE_CP_SHARED_SECRET` in the container env — but the existing tenant EC2s don't have that variable in their user-data (the CP only started injecting it from PR #53 onward).
-
-**Symptom**: a pre-cutover tenant can still serve its users' existing workspaces, but any attempt to **provision a new workspace** from inside the tenant UI will hit the CP's new bearer gate and get `401` or `404` back, surfacing as "workspace provision failed" with a generic error.
-
-**Fix per existing tenant (pick one)**:
-
-1. **SSH in + add the env var**
-   - Copy `PROVISION_SHARED_SECRET` from Railway prod env.
-   - `ssh ubuntu@<tenant-ip>` and append to the running container's env (`docker stop && docker run … -e MOLECULE_CP_SHARED_SECRET='…' -e MOLECULE_CP_URL=https://api.moleculesai.app …`). Rolling this into an auto-update hook is follow-up work.
-
-2. **Re-provision the tenant**
-   - `DELETE /cp/orgs/:slug` → re-create via normal signup flow. Tenant-level data survives only if the tenant's own Postgres volume is preserved; workspace_id values change. This is the heavy hammer — only for tenants where existing data can be recreated easily.
-
-3. **Wait for the auto-update + user-data refresh cycle**
-   - Tenant auto-updater (cron, 5-minute cadence) pulls the new container image but **does not refresh env vars** — those are frozen from the initial user-data. So option 3 alone doesn't fix this; it still needs option 1 or 2.
-
-Script at `scripts/migrate-tenant-cp-secret.sh` (follow-up) will automate option 1 across all running tenants in the prod AWS account.
-
-## Post-deploy verification checklist
-
-- [ ] Railway prod deploy for `controlplane` lands on the new commit (check `https://railway.com/project/7ccc…/service/ae76…`)
-- [ ] `curl https://api.moleculesai.app/health` → 200 `{service: molecule-cp, status: ok}`
-- [ ] `curl -X POST https://api.moleculesai.app/cp/workspaces/provision` (no bearer) → 401 (**not** 404 — proves the env var is live and routes mounted)
-- [ ] GHCR publishes new `workspace-server` image for the core main commit
-- [ ] Vercel canvas prod deploy lands
-
-## Rollback
-
-If prod is on fire:
-
-1. `gh pr revert 46 -R Molecule-AI/molecule-controlplane` — reverts all the CP PRs together.
-2. `gh pr revert 983 -R Molecule-AI/molecule-core` — reverts the core bundle.
-3. Both reverts auto-deploy via Railway / GHCR / Vercel.
-
-Existing tenants aren't affected by a rollback — they're running whichever tenant image tag they booted with. Only newly-provisioned tenants pick up the reverted control plane code.
@@ -1,218 +0,0 @@
---
-title: "Staging Environment Design"
-description: "The staging environment design on Railway, mirroring prod for safe pre-release validation."
---
-# Staging Environment Design
-
-> **Status:** Planned — gates all future infra changes (Tunnel migration,
-> security fixes, etc.)
->
-> **Problem:** We merge directly to main and auto-deploy to production.
-> Today's session broke CI twice and caused hours of Cloudflare edge cache
-> issues because there was no staging to test infra changes first.
->
-> **Goal:** Full staging environment that mirrors production. Every change
-> ships to staging first, gets verified, then promotes to production.
-
---
-
-## Architecture
-
-```
-                    staging                         production
-                    ───────                         ──────────
-Git branch:         main (auto-deploy)              main (manual promote)
-                    or staging branch               
-
-CP (Railway):       staging service                 production service
-                    staging.api.moleculesai.app     api.moleculesai.app
-
-Tenant EC2s:        staging EC2 instances            production EC2 instances
-                    *.staging.moleculesai.app        *.moleculesai.app
-
-App (Vercel):       staging.app.moleculesai.app     app.moleculesai.app
-                    (Vercel preview)                 (Vercel production)
-
-DB (Neon):          staging branch                   main branch
-                    (or separate project)            
-
-Docker images:      platform-tenant:staging          platform-tenant:latest
-                    (GHCR)                           (GHCR)
-
-Cloudflare:         *.staging.moleculesai.app        *.moleculesai.app
-                    (separate tunnel/worker)         (tunnel per tenant)
-```
-
-## Deploy flow
-
-```
-Developer pushes to PR branch
-  → CI runs (tests, build, lint)
-  → PR merged to main
-  → Auto-deploy to STAGING
-  → Staging smoke tests (automated)
-  → Manual verification if needed
-  → Promote to PRODUCTION (manual trigger or approval)
-```
-
-## Components
-
-### 1. Railway: two environments
-
-Railway supports multiple environments per project. Create a `staging`
-environment alongside `production`:
-
-```bash
-railway environment create staging
-railway variables --environment staging --set "DATABASE_URL=<staging-neon>"
-railway variables --environment staging --set "MOLECULE_ENV=staging"
-# ... all other vars with staging-specific values
-```
-
-**Deploy trigger:**
-- `staging`: auto-deploy on push to main
-- `production`: manual promote via `railway up --environment production`
-  or GitHub Actions `workflow_dispatch`
-
-**Domains:**
-- staging: `staging.api.moleculesai.app` (Railway custom domain)
-- production: `api.moleculesai.app` (unchanged)
-
-### 2. Neon: branch per environment
-
-Neon supports database branches (like git branches):
-
-```bash
-# Create staging branch from main
-neon branch create --project-id <id> --name staging --parent main
-```
-
-- Staging DB has same schema, separate data
-- Can reset staging by re-branching from main
-- Production data never touched by staging tests
-
-### 3. Vercel: preview deployments
-
-Vercel already supports this natively:
-- Push to main → deploys to `app.moleculesai.app` (production)
-- Push to `staging` branch → deploys to preview URL
-
-**Or** use Vercel environments:
-- `staging.app.moleculesai.app` → staging deployment
-- `app.moleculesai.app` → production deployment
-
-### 4. GHCR: tagged images
-
-```
-platform-tenant:staging    — built on every push to main
-platform-tenant:latest     — promoted from staging after verification
-platform-tenant:sha-xxxxx  — immutable, pinned to specific commit
-```
-
-**Publish workflow change:**
-```yaml
-# Current: pushes :latest on every main merge
-# New: pushes :staging on every main merge
-#       pushes :latest only on manual promote
-```
-
-### 5. Cloudflare: staging subdomain
-
-Option A (simple): `*.staging.moleculesai.app` with its own tunnel/worker
-Option B (full): separate Cloudflare zone for staging (overkill)
-
-Recommend Option A:
-- Add `staging.moleculesai.app` DNS records
-- Staging tenants get `slug.staging.moleculesai.app` subdomains
-- Production tenants get `slug.moleculesai.app` (unchanged)
-
-### 6. EC2: staging tag
-
-Staging EC2 instances tagged with `Environment=staging`:
-- Separate from production instances in AWS console
-- Can use different AMI, instance type, security group
-- Easy to identify and clean up
-
-## Environment variables
-
-| Variable | Staging | Production |
-|----------|---------|------------|
-| `MOLECULE_ENV` | `staging` | `production` |
-| `DATABASE_URL` | Neon staging branch | Neon main branch |
-| `TENANT_IMAGE` | `platform-tenant:staging` | `platform-tenant:latest` |
-| `APP_DOMAIN` | `staging.moleculesai.app` | `moleculesai.app` |
-| `CORS_ORIGINS` | `https://staging.app.moleculesai.app` | `https://app.moleculesai.app` |
-| `ADMIN_TOKEN` | per-tenant (same mechanism) | per-tenant |
-
-## Promotion workflow
-
-### Automated (CI/CD)
-
-```yaml
-# .github/workflows/promote-to-production.yml
-name: Promote to Production
-on:
-  workflow_dispatch:
-    inputs:
-      confirm:
-        description: 'Type "promote" to confirm'
-        required: true
-
-jobs:
-  promote:
-    if: github.event.inputs.confirm == 'promote'
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      # 1. Run staging smoke tests one more time
-      - run: bash tests/e2e/test_saas_tenant.sh
-        env:
-          TENANT_SLUG: smoke-test
-          BASE_URL: https://staging.api.moleculesai.app
-
-      # 2. Tag Docker image
-      - run: |
-          docker pull ghcr.io/molecule-ai/platform-tenant:staging
-          docker tag ghcr.io/molecule-ai/platform-tenant:staging \
-                     ghcr.io/molecule-ai/platform-tenant:latest
-          docker push ghcr.io/molecule-ai/platform-tenant:latest
-
-      # 3. Deploy CP to production
-      - run: railway up --environment production
-
-      # 4. Production tenants auto-update within 5 min (Option B cron)
-```
-
-### Manual (for now)
-
-Until the automated workflow is built:
-1. Verify on staging (`staging.api.moleculesai.app`)
-2. `docker tag platform-tenant:staging platform-tenant:latest && docker push platform-tenant:latest`
-3. `railway up --environment production`
-4. Monitor production health
-
-## What this prevents
-
- CI breakage from untested path filters (today's dorny/paths-filter issue)
- Cloudflare edge cache poisoning (test DNS changes on staging subdomain)
- Workspace boot script regressions (test on staging EC2 first)
- DB migration failures (test on Neon staging branch)
- Auth/security regressions (staging has same auth stack)
-
-## Implementation order
-
-1. **Railway staging environment** — create + configure vars (~30 min)
-2. **Neon staging branch** — create from main (~5 min)
-3. **Staging DNS** — `staging.api.moleculesai.app` CNAME to Railway (~5 min)
-4. **Publish workflow** — push `:staging` tag instead of `:latest` (~15 min)
-5. **Promotion workflow** — manual trigger to promote staging → production (~30 min)
-6. **Vercel staging** — configure preview deployment URL (~15 min)
-7. **Staging smoke test** — automated test after staging deploy (~30 min)
-
-**Total:** ~2.5 hours for full staging pipeline.
-
-## Cost
-
- Railway staging: ~$5/mo (same as production, but can be smaller)
- Neon staging branch: free (included in plan)
- EC2 staging instances: only when testing (terminate after)
- Vercel: free (preview deployments included)
- Cloudflare: free (same zone, additional records)
@@ -1,154 +0,0 @@
---
-title: "Tenant Image Upgrade Strategies"
-description: "Strategies for rolling a new platform-tenant image out to existing EC2 tenants, with trade-offs."
---
-# Tenant Image Upgrade Strategies
-
-> **Status:** Option B (sidecar auto-updater) implemented. Options A and C
-> documented for future use.
-
-## Problem
-
-When we push a new `platform-tenant:latest` to GHCR, existing EC2 tenant
-instances keep running the old image. New orgs get the latest image at boot,
-but existing tenants fall behind — missing bug fixes, security patches, and
-new features.
-
-## Option A: Rolling restart on publish (coordinated)
-
-The publish workflow calls a CP admin endpoint after pushing the image.
-The CP iterates all running tenants and restarts them one by one.
-
-```
-publish-platform-image succeeds
-  → POST https://api.moleculesai.app/cp/admin/rolling-upgrade
-    → CP queries org_instances WHERE status = 'running'
-    → For each tenant (staggered, 30s apart):
-      1. AWS SSM Run Command: docker pull + docker restart
-      2. Wait for /health 200
-      3. Update org_instances.updated_at
-      4. If health fails after 60s, rollback (docker run old image)
-    → Return summary: {upgraded: N, failed: M, skipped: K}
-```
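
The flow above can be sketched as a small loop. This is a minimal illustration, not the CP handler: `rolling_upgrade`, `UpgradeSummary`, and the injected callables are hypothetical names, and the `stop_after_failures` guard is an added assumption (it is not part of the flow as written) that limits how far a bad image can spread.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class UpgradeSummary:
    upgraded: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def rolling_upgrade(
    tenants: Iterable[str],
    upgrade: Callable[[str], None],      # e.g. pull + restart on the tenant host
    is_healthy: Callable[[str], bool],   # e.g. poll /health until 200 or timeout
    rollback: Callable[[str], None],     # e.g. start the previous image again
    pause: Callable[[], None] = lambda: None,  # e.g. sleep 30s between tenants
    stop_after_failures: int = 1,        # assumption: halt the rollout early
) -> UpgradeSummary:
    """Upgrade tenants one at a time and report what happened to each."""
    summary = UpgradeSummary()
    for slug in tenants:
        if len(summary.failed) >= stop_after_failures:
            # Too many failures already: leave the remaining tenants untouched.
            summary.skipped.append(slug)
            continue
        upgrade(slug)
        if is_healthy(slug):
            summary.upgraded.append(slug)
        else:
            rollback(slug)
            summary.failed.append(slug)
        pause()
    return summary
```

Putting the first tenant in the list to use as the canary falls out of this shape: with `stop_after_failures=1`, a failed first upgrade skips everyone else.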
-
-### Pros
- Immediate, coordinated upgrades across all tenants
- CP has full visibility into upgrade status
- Can implement canary (upgrade 1 tenant first, verify, then rest)
- Rollback capability per tenant
-
-### Cons
- Requires AWS SSM agent on EC2 instances (not installed yet)
- Alternatively requires SSH access from Railway → EC2 (network/key management)
- Brief downtime per tenant during restart (~10-30s)
- Blast radius: a bad image can take down all tenants before canary catches it
-
-### Implementation effort
- Add SSM agent to EC2 user-data script
- Add `POST /cp/admin/rolling-upgrade` handler
- Add upgrade step to publish workflow
- Add rollback logic
- ~2-3 days
-
-### When to use
- Urgent security patches that can't wait 5 min
- Breaking changes that need coordinated rollout
- When you want canary/staged deployment
-
---
-
-## Option B: Sidecar auto-updater (implemented)
-
-A cron job on each EC2 checks GHCR for a new image digest every 5 minutes.
-If the digest changed, it pulls the new image and restarts the container.
-
-```bash
-# Runs every 5 min on each EC2 (added to user-data)
-*/5 * * * * /usr/local/bin/molecule-auto-update.sh
-```
-
-The update script:
-1. `docker pull platform-tenant:latest`
-2. Compare digest with running container's image digest
-3. If different: `docker stop molecule-tenant && docker rm molecule-tenant && docker run ...`
-4. Wait for `/health` 200
-5. Log result to `/var/log/molecule-auto-update.log`
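
The decision in steps 2 and 3 can be sketched as a pure function that returns the docker commands to run. This is an illustration only: the real updater is a shell script, `plan_update` is a hypothetical name, and because the `docker run` arguments are elided above the caller has to supply them.

```python
from typing import Optional


def plan_update(
    running_digest: Optional[str],
    latest_digest: str,
    run_args: list[str],
    container: str = "molecule-tenant",
) -> list[list[str]]:
    """Return the docker commands to execute, or [] when already current."""
    if running_digest == latest_digest:
        return []  # same image: nothing to do
    plan: list[list[str]] = []
    if running_digest is not None:
        # A container from the old image exists: stop and remove it first.
        plan.append(["docker", "stop", container])
        plan.append(["docker", "rm", container])
    # run_args stands in for the elided `docker run ...` arguments.
    plan.append(["docker", "run", *run_args])
    return plan
```

Keeping the decision separate from execution makes the no-op case (digests equal) easy to check without a Docker daemon.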
-
-### Pros
- Zero CP involvement — fully autonomous per tenant
- Tenants upgrade within 5 min of any publish
- No SSH/SSM infrastructure needed
- Each tenant upgrades independently (natural canary)
- Simple to implement (2 lines in user-data + a small script)
-
-### Cons
- Up to 5 min delay between publish and tenant upgrade
- Brief downtime during restart (~10-30s)
- No centralized visibility into upgrade status
- Can't selectively hold back specific tenants
- All tenants track `latest` — no pinned versions
-
-### When to use
- Default for all tenants
- Works well for early-stage SaaS with frequent deploys
-
---
-
-## Option C: Blue-green via Worker (zero downtime)
-
-Each EC2 runs two container slots: `blue` (current) and `green` (new).
-The Cloudflare Worker routes traffic to whichever is healthy.
-
-```
-EC2 instance:
-  molecule-tenant-blue  → :8080 (current, serving traffic)
-  molecule-tenant-green → :8081 (new, starting up)
-
-Upgrade flow:
-  1. Pull new image
-  2. Start green on :8081
-  3. Health check green: GET :8081/health
-  4. If healthy: update Worker routing (KV: slug → port 8081)
-  5. Stop blue
-  6. Next upgrade: blue becomes the new slot
-
-Worker routing:
-  KV key: "example-org" → {"ip": "<EC2_IP>", "port": 8081}
-  (port defaults to 8080 when not in KV)
-```
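
A minimal sketch of the slot swap, assuming the port assignments shown above (blue on 8080, green on 8081). `idle_slot` and `swap_slots` are hypothetical names, and the health check is injected so the sketch makes no claim about how the real check is performed.

```python
from typing import Callable

SLOT_PORTS = {"blue": 8080, "green": 8081}


def idle_slot(active: str) -> str:
    """The slot that is not currently serving traffic."""
    return "green" if active == "blue" else "blue"


def swap_slots(
    active: str,
    ip: str,
    is_healthy: Callable[[int], bool],  # e.g. GET :<port>/health returns 200
) -> tuple[str, dict]:
    """Return (new active slot, routing value to store in KV for the tenant)."""
    candidate = idle_slot(active)
    port = SLOT_PORTS[candidate]
    if not is_healthy(port):
        # New container failed its check: keep routing to the current slot.
        return active, {"ip": ip, "port": SLOT_PORTS[active]}
    return candidate, {"ip": ip, "port": port}
```

Because routing only changes after the health check passes, a failed start leaves the KV entry pointing at the slot that is already serving.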
-
-### Pros
- Zero downtime — traffic switches atomically after health check
- Instant rollback — just switch back to the old slot
- Worker already exists — just add port to the routing lookup
- Health-verified before any traffic switches
-
-### Cons
- Double memory usage during transition (~512MB extra per tenant)
- More complex user-data script (manage two containers)
- Worker needs port-aware routing (KV schema change)
- Need to track which slot is active per tenant
-
-### Implementation effort
- Update user-data to manage blue/green containers
- Update Worker to read port from KV
- Add blue/green state tracking to CP (org_instances.active_slot)
- Update auto-updater script for blue-green swap
- ~3-5 days
-
-### When to use
- When tenants have SLAs requiring zero downtime
- Production deployments with paying customers
- After Option B proves the auto-update pattern works
-
---
-
-## Migration path
-
-```
-Now:     Option B (auto-updater, 5 min delay, brief downtime)
-         ↓
-Growth:  Option A (add SSM for urgent patches, keep B as default)
-         ↓
-Scale:   Option C (zero-downtime for premium/enterprise tenants)
-```
@@ -1,592 +0,0 @@
---
-title: "Incident Log — molecule-core"
-description: "Chronological incident log for molecule-core — summaries, resolutions, and references."
---
-# Incident Log — molecule-core
-
-> This file documents security incidents, outages, and degraded states.
-> Active incidents are listed first. Resolved incidents remain for historical record.
-
---
-
-*Last updated: 2026-04-21T07:45Z by Core Platform Lead — Incident log rebuilt after linter reset*
-
---
-
-## Security Audit Cycle 6 — ALL CLEAR (2026-04-21 ~07:15Z)
-
-**SHA range:** e69cb26 → 674384b on main (~5 commits + ~10 merged PRs)
-**Verdict:** ✅ No critical/high findings
-
-### Commits Reviewed — All CLEAN
-
-| Commit | Description |
-|--------|-------------|
-| `dc9c64e` / PR #1258 | F1097 org_id context — eliminates redundant 2nd SELECT in AdminAuth |
-| `33f1d1a` | Canvas cascade-delete UX — `pendingDelete.hasChildren`, warning dialog |
-| `0790d57` | Canvas metrics guard — null coalescing |
-| `781c217` | CI YAML fix |
-| `169120d` / PR #1310 | CWE-78/CWE-22 — exec form + path traversal guards |
-| `e431fc4` / PR #1302 | CWE-918 SSRF — `isSafeURL` in `a2a_proxy.go` |
-| `a66f889` / PR #1261 | CWE path-injection — `resolveInsideRoot` for template paths |
-
-Full audit saved to TEAM memory id `abc58b47`.
-
---
-
-## F1100 — workspace_restart.go Path Traversal (RESOLVED)
-
-**Severity:** Medium | **Finding ID:** F1100
-**Status:** Resolved — fix applied via `a66f889` (PR #1261) on both main and staging
-
-### Summary
-
-`workspace_restart.go:127-133` accepted `body.Template` (attacker-controlled) via raw `filepath.Join(h.configsDir, template)`, allowing path traversal (e.g. `../../../etc`) to escape `configsDir`. **Issue #1043 triage missed this — legitimate gap, not false positive.**
-
-Authenticated callers could pass a crafted `body.Template` value to escape the configs directory.
-
-### Fix Applied
-
-PR #1260 (intended) closed without merge. Fix landed via **PR #1261 (`a66f889`)** on both main and staging:
-
-```go
-// Fixed (a66f889):
-candidatePath, resolveErr := resolveInsideRoot(h.configsDir, template)
-if resolveErr != nil {
-    template = ""  // fallback fires safely
-}
-```
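
The containment idea can be illustrated with a Python analogue. This is not the Go helper from `a66f889`, whose body is not shown here; `resolve_inside_root` is a hypothetical function that resolves the joined path and rejects anything that lands outside the root.

```python
import os


def resolve_inside_root(root: str, candidate: str) -> str:
    """Join candidate onto root and return it, or raise if it escapes root."""
    root_abs = os.path.realpath(root)
    # os.path.join discards root_abs when candidate is absolute, so absolute
    # inputs resolve outside the root and are rejected by the check below.
    target = os.path.realpath(os.path.join(root_abs, candidate))
    if os.path.commonpath([root_abs, target]) != root_abs:
        raise ValueError(f"path escapes root: {candidate!r}")
    return target
```

A raw join, by contrast, would accept `../../../etc` and resolve it relative to the configs directory, which is the behaviour F1100 describes.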
-
-### References
-
- PR #1260: closed without merge — superseded by PR #1261
- PR #1261 (`a66f889`): merged ✅
- Closes: #1043
-
---
-
-## F1088 Credential Exposure — CLOSED
-
-**All prior F1088 entries below remain valid. Summary of current state:**
-
- Credentials: MiniMax revoked (⚠️), GitHub PAT revoked (✅), Admin token — treat as potentially exposed
- BFG git-history scrub: NOT REQUIRED — incident management closure, 0 public forks confirmed
- Git history still contains values — admin token rotation recommended as precaution
- PR #1179 (`b89f3fd`) merged — active code is clean
- Branch `origin/fix/credential-history-cleanup-f1088` exists but is 38 commits behind main — superseded by incident management closure
-
-**Required remaining action:** Rotate `ADMIN_TOKEN` (`HlgeMb8...ShARE=`) as precaution. All other actions complete.
-
---
-
-### Summary
-
-Commit `d513a0ced549ef2be8903a7b4794256110ba1805` on staging (merged to main via PR #1098) contains three production credentials as hardcoded default values in `scripts/post-rebuild-setup.sh`. The credentials appeared in the git diff and were permanently visible in the public commit history.
-
-### Credentials Status
-
-| # | Credential | Value | Status |
-|---|------------|-------|--------|
-| 1 | ANTHROPIC_AUTH_TOKEN | `sk-cp-...KVw` | ⚠️ Revoked or inactive (404 on API call) |
-| 2 | GITHUB_TOKEN | `github_pat_...hsIJLIL` | ✅ Revoked (confirmed 401) |
-| 3 | ADMIN_TOKEN | `HlgeMb8...ShARE=` | Needs confirmation — treated as active until proven otherwise |
-
-### Resolution
-
-PR #1179 (`b89f3fd`: "ci: retry — trigger fresh runner allocation") closed this finding. The incident was closed at the finding-management level. Git history scrub via BFG was discussed but deemed not required by security team (no active public forks confirmed, credentials were already revoked/inactive).
-
-Active code is clean (`d513a0c` replaced hardcoded defaults with env-var reads).
-
-### Summary
-
-Commit `d513a0ced549ef2be8903a7b4794256110ba1805` on staging (merged to main via PR #1098) contains three production credentials as hardcoded default values in `scripts/post-rebuild-setup.sh`. The credentials appear in the git diff and are permanently visible in the public commit history.
-
-The commit itself fixed the problem by replacing hardcoded defaults with env-var reads (MINIMAX_API_KEY, GITHUB_PAT). However, git history still shows the original values.
-
-### Credentials Exposed
-
-> **Token values redacted from this table 2026-04-26** to reduce public-search surface (the docs repo is publicly indexed). Short-suffix references match the convention in the Blast Radius table below (lines 134-137). Full values remain in `molecule-core` git history per the F1088 closure decision (no BFG scrub).
-
-| # | Credential | Value (short suffix) | Service |
-|---|------------|----------------------|---------|
-| 1 | ANTHROPIC_AUTH_TOKEN | `sk-cp-...KVw` | MiniMax API (api.minimax.io/anthropic) |
-| 2 | GITHUB_TOKEN | `github_pat_...hsIJLIL` | GitHub (fine-grained PAT, scope unknown) |
-| 3 | ADMIN_TOKEN | `HlgeMb8...ShARE=` | Platform admin authentication |
-
-### Affected Files
-
- `scripts/post-rebuild-setup.sh` (commit d513a0c, PR #1098 → merged to staging → merged to main)
-
-### Timeline
-
- **~2026-04-20T13:02Z**: Commit `d513a0c` pushed by `rabbitblood`. GitGuardian flagged the credentials in the diff; the same commit replaced the hardcoded values with env-var reads.
- **~2026-04-20T**: Credentials removed from active code, but git history still contains them.
- **2026-04-20T22:32Z**: Incident discovered and escalated.
-
-### Actions Taken
-
-1. Dev Lead notified (delegation failed — Dev Lead unreachable)
-2. All child workspaces notified (delegation failed — all unreachable)
-3. Incident documented in this file
-4. Branch `origin/fix/credential-history-cleanup-f1088` exists but is 38 commits behind `origin/main`
-5. **Incident CLOSED** — PR #1179 merged, finding management closure, BFG scrub deemed not required (no active public forks confirmed)
-
-### Blast Radius (Confirmed by Core-Security)
-
-| Credential | Test Result | Status |
-|------------|-------------|--------|
-| MiniMax API key (`sk-cp-...KVw`) | `404 Not Found` on real API call | ⚠️ **REVOKED** (or endpoint inactive) |
-| GitHub PAT (`github_pat_...hsIJLIL`) | `401 Bad credentials` | ✅ **REVOKED** |
-| Admin token (`HlgeMb8...ShARE=`) | Base64 — cannot test directly | ⚠️ **Treated as active** — recommend rotation as precaution |
-
-**Public forks:** 0 confirmed (GH API `/forks` returns none) — low fork blast radius.
-
-**Git history scope:** Credentials exist in both `main` and `staging` in commits `f787873`..`d513a0c`. They were introduced in `f787873` ("feat: nuke-and-rebuild.sh") and removed from active code in `d513a0c`. Both branches would need BFG cleanup if a history scrub were ever carried out.
-
-### Required Actions (RESOLVED)
-
- [x] Credentials revoked (MiniMax ⚠️, GitHub PAT ✅)
- [x] BFG git history cleanup **NOT REQUIRED** — incident management closure, no active public forks, credentials confirmed revoked/inactive
- [x] Team notification — documented in this log
- [ ] **Admin token rotation** — recommended as precaution (value still in git history, treat as potentially exposed)
-
-### BFG Repo-Cleaner Procedure
-
-**NOT REQUIRED** — F1088 closed without BFG scrub per security team decision. Retained for reference only.
-
-**Step 1 — Create credentials manifest (`creds.txt`) [NOT NEEDED]:**
-```
-<ADMIN_TOKEN value>
-<MiniMax sk-cp-... value>
-<GitHub fine-grained PAT value>
-```
-Full token values redacted from this doc 2026-04-26 (see note in the
-Credentials Exposed table above). Pull from the Core-Security incident
-ticket if a future revival of this BFG procedure is needed.
-
-**Step 2 — Clean origin/main:**
-```bash
-git clone --mirror https://git.moleculesai.app/molecule-ai/molecule-core /tmp/molecule-main-mirror
-# --no-blob-protection lets BFG rewrite files in the HEAD commit as well
-java -jar bfg.jar --replace-text creds.txt --no-blob-protection /tmp/molecule-main-mirror
-cd /tmp/molecule-main-mirror
-git reflog expire --expire=now --all && git gc --prune=now --aggressive  # drop the old blobs
-git push --mirror
-```
-
-**Step 3 — Clean origin/staging:**
-```bash
-git clone --mirror https://git.moleculesai.app/molecule-ai/molecule-core /tmp/molecule-staging-mirror
-# --no-blob-protection lets BFG rewrite files in the HEAD commit as well
-java -jar bfg.jar --replace-text creds.txt --no-blob-protection /tmp/molecule-staging-mirror
-cd /tmp/molecule-staging-mirror
-git reflog expire --expire=now --all && git gc --prune=now --aggressive  # drop the old blobs
-git push --mirror
-```
-
-**Step 4 — Notify team to re-clone both branches if cloned before ~13:02 UTC 2026-04-20.**
-
-### References
-
- Commit: `d513a0ced549ef2be8903a7b4794256110ba1805`
- PR: #1098 (staging → main merge)
- Cleanup branch: `origin/fix/credential-history-cleanup-f1088` (behind main by 38 commits)
- Scanners triggered: GitGuardian
- Security investigation: Core-Security (confirmed credentials revoked via API tests)
- GitHub issue: #1282 (filed by Core-OffSec)
- **Closed by:** PR #1179 (`b89f3fd`) — incident management closure, BFG scrub deemed not required
-
-### Known Issue — PR #1230 Incomplete (QA Round 16, 2026-04-21)
-
-PR #1230 / commit `524e3c6` ("fix(security): replace err.Error() leaks") failed to carry the `mcp.go` fixes into main's tree. All 3 MCP error leaks, plus one in `org_plugin_allowlist.go`, remain on main:
- `mcp.go:259`: `"parse error: " + err.Error()`
- `mcp.go:347`: `"invalid params: " + err.Error()`
- `mcp.go:352`: `err.Error()`
- `org_plugin_allowlist.go:260`: `"detail": err.Error()`
-
-Fix is covered by PR #1226 (rebased, MERGEABLE). Gap should close after #1226 merges.
-
---
-
-## CWE-918 SSRF — Backport to Main (RESOLVED)
-
-**Severity:** High
-**Status:** Resolved — PR #1302 merged to main
-
-### Summary
-
-SSRF defence (`isSafeURL` in `a2a_proxy.go`) was backported to main to address CWE-918 (Server-Side Request Forgery). The fix prevents the A2A proxy from forwarding requests to internal network addresses (localhost, private ranges, etc.).
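
The kind of check described above can be sketched in Python. This is an analogue for illustration, not the Go `isSafeURL` from `a2a_proxy.go`; `is_safe_url` is a hypothetical name, and the sketch deliberately does not resolve DNS names, which a production check would also need to do.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_url(url: str) -> bool:
    """Reject URLs that point at loopback, private, or link-local hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 bracket
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name. Assumption: resolved addresses are validated elsewhere,
        # otherwise a name pointing at 10.0.0.0/8 would slip through.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

The link-local test matters on cloud hosts because metadata services commonly live at `169.254.169.254`.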
-
-### References
-
- Commit: `e431fc4` (fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302))
-
---
-
-## CWE-22 + CWE-78 Security Fixes — Merged (RESOLVED)
-
-**Severity:** Critical
-**Status:** Resolved — proper fixes merged to staging and main
-
-### Summary
-
-The `fix/cwe78-delete-via-ephemeral-shell-injection` branch was the right diagnosis but wrong implementation (removed `safeName` from `copyFilesToContainer`). The correct fixes were merged separately:
-
-| Location | Commit | Fix |
-|----------|--------|-----|
-| staging | `ce2491e` | CWE-22: `copyFilesToContainer` safeName + `deleteViaEphemeral` validateRelPath + exec form |
-| main | `169120d` | CWE-78/CWE-22: block shell injection in `deleteViaEphemeral` |
-
-Both CWEs are fully resolved on both branches. The regression branch is superseded and must not be merged as-is.
-
-### Verification (staging `ce2491e`)
-
-`copyFilesToContainer` (container_files.go:73-99):
-```go
-clean := filepath.Clean(name)
-if filepath.IsAbs(clean) || strings.Contains(clean, "..") {
-    return fmt.Errorf("path traversal blocked: %s", name)
-}
-safeName := filepath.Join(destPath, clean)
-header := &tar.Header{Name: safeName, ...}  ✅
-```
-
-`deleteViaEphemeral` (container_files.go:152-168):
-```go
-validateRelPath(filePath)  ✅
-Cmd: []string{"rm", "-rf", "/configs", filePath}  ✅ exec form, no shell interpolation
-```
-
---
-
-## CI Runner Saturation (RESOLVED)
-
-**Severity:** High
-**Period:** ~2026-04-20T22:00Z – 2026-04-21T03:30Z
-**Finding IDs:** N/A (infra incident)
-**Status:** Resolved
-
-### Summary
-
-All self-hosted macOS arm64 runners were saturated: 27 runs queued, 0 in progress, 0 completed. Only cancellations were being processed. PRs #1053 and #1036 had zero CI runs.
-
-### Root Causes (multiple)
-
-1. `changes` job ran on `[self-hosted, macos, arm64]` despite having zero macOS dependencies (plain `git diff`) — wasted runner slots
-2. YAML corruption in `ci.yml` (JSON-escaped `\n` sequences from commits `12c52d4`/`5831b4e`) caused "workflow file issue" failures before any job could start
-3. `cancel-in-progress: false` at workflow level caused stale runs to queue instead of being cancelled
-4. Workflow-level concurrency not set — multiple in-flight runs queued on same ref
-
---
-
-## CI Stall — molecule-core/staging (RESOLVED 2026-04-21 ~07:05Z)
-
-**Severity:** High
-**Period:** ~2026-04-21T02:47Z – ~2026-04-21T07:00Z
-**Status:** Resolved — CI progressing normally, no config problems remain
-
-### Resolution
-
-All prior runner-saturation and YAML-corruption fixes were correct. The stall resolved naturally once stale queued runs drained. Current CI state (2026-04-21 ~07:07Z):
-
- Staging run #24708961892: **success** (SHA `5d32373`)
- Staging run #24708976467: **success** (changes job, SHA `72d825f`)
- Main run #24708984339: queued (normal — healthy queue, not stalled)
- Runner agent healthy — no dead slots
-
-### Root Causes (all resolved)
-
-1. `changes` job on `[self-hosted, macos, arm64]` — fixed by moving to `ubuntu-latest` (`9601545`)
-2. YAML corruption in `ci.yml` — fixed by PR #1264 / `b61692c` ✅
-3. `cancel-in-progress: false` at workflow level — reverted to `true` on staging ✅
-4. `cancel-in-progress: false` on main — correct for single-runner env, aligned via PR #1248 ✅
-
-### Staging CI Config (confirmed healthy)
-
- `ci.yml`: `cancel-in-progress: true`, `changes` job on `ubuntu-latest` ✅
- `codeql.yml`: `cancel-in-progress: false` ✅
- `e2e-api.yml`: `cancel-in-progress: false` ✅
-
-### Infra Recommendations (for long-term stability)
-
-1. Provision org-wide GitHub App installation token for CI automation (PATs rotate too frequently)
-2. Update remote URLs on controlplane and tenant-proxy repos
-3. Monitor runner agent health on mac mini — restart agent if future stalls recur
-
---
-
-## PR #1242 YAML Corruption — RESOLVED (PR never merged)
-
-**Severity:** Critical
-**Status:** Resolved — PR #1242 closed without merge, staging unaffected
-
-### Summary
-
-PR #1242 (`fix/ci-runner-queue-contention`) branch contained a YAML corruption in `ci.yml` — the `concurrency` block was replaced with a commit-SHA string literal:
-
-```yaml
-e4a62e1 (ci: add workflow-level concurrency to ci.yml and codeql.yml)
-```
-
-However, PR #1242 was **closed without merging**. Staging received `cancel-in-progress: true` via PR #1264 (commit `b61692c`) instead, which is the correct clean version.
-
-### Current State (updated 2026-04-21 ~04:30Z)
-
- **main:** `cancel-in-progress: false` ✅ (from PR #1248 / `2ffd11c` or similar clean commit)
- **staging:** `cancel-in-progress: true` (via `0b30465` tick restore after corruption)
- **PR #1248** (`2ffd11c`): open, sets staging `cancel-in-progress: false` — aligns staging with main ✅
- **Main has moved to `false`** — staging should follow to stay consistent
-
-### PR #1248 — URGENT MERGE
-
-PR #1248 (`fix/ci: restore corrupted ci.yml concurrency block`) by Dev Lead:
- Fixes the corruption pattern (same as prior incident)
- Sets `cancel-in-progress: false` — correct for single-runner environment
- Aligns staging CI config with main (which already has `false`)
- Must merge before any further CI runs on staging
-
-### References
-
- PR: #1242 (`fix/ci-runner-queue-contention`) — closed, not merged
- Staging corruption restored via: PR #1264 / `b61692c`
- PR #1248 (`2ffd11c`): open, Dev Lead fix, `cancel-in-progress: false`
- Main: `cancel-in-progress: false` ✅
-
---
-
-## PR #1036 QA Audit (STALE)
-
-**Severity:** Low
-**Date:** 2026-04-20 (QA audit performed)
-**Status:** Stale — CI infrastructure has been fixed since audit
-
-### Summary
-
-QA audit (2026-04-20) flagged CI as failing on PR #1036. However, CI was failing due to infrastructure issues (runner saturation, YAML corruption) that have since been resolved. The audit should be re-run now that staging CI is healthy.
-
---
-
-## PR #1246 / #1247 — Sed Regression Fix — RESOLVED (PR #1247 merged)
-
-**Severity:** Critical
-**Status:** Resolved — PR #1247 merged to main (2026-04-21 ~03:18Z)
-
-### Summary
-
-PR #1246 (`364712d`) was closed without merging. However, **PR #1247** (`04be218`) achieved the same fix cleanly and merged to main:
-
-```
-fix(go): replace $1 literal with resp.Body.Close() in 7 files (#1247)
-```
-
-Commit `04be218` (merged by molecule-ai[bot]) applied:
-```
-sed -i 's/defer func() { _ = \$1 }()/defer func() { _ = resp.Body.Close() }()/g'
-```
-
-### Affected Files (all fixed on main)
-
- `workspace-server/cmd/server/cp_config.go`
- `workspace-server/internal/handlers/a2a_proxy.go`
- `workspace-server/internal/handlers/github_token.go`
- `workspace-server/internal/handlers/traces.go`
- `workspace-server/internal/handlers/transcript.go`
- `workspace-server/internal/middleware/session_auth.go`
- `workspace-server/internal/provisioner/cp_provisioner.go` (3 occurrences)
-
-**Staging:** Fix present via prior commits. `cp_config.go` on staging has SHA `d1021c2` (correct form).
-
-**PR #1246:** Closed without merging — superseded by PR #1247. No further action needed.
-
---
-
-## CWE-78/CWE-22 Branch — RESOLVED (proper fixes merged separately)
-
-**Severity:** Critical
-**Status:** Resolved — proper fixes merged via `ce2491e` (staging) and `169120d` (main)
-
-### Summary
-
-The `fix/cwe78-delete-via-ephemeral-shell-injection` branch (commit `17419dd`) was **correct** for CWE-78 (`deleteViaEphemeral` exec form + `validateRelPath`) but **regressed** `copyFilesToContainer` by removing the `safeName` path-traversal guard.
-
-**Resolution: the proper fixes were merged to staging and main separately:**
-
-| Branch | Commit | Status |
-|--------|--------|--------|
-| staging | `ce2491e` — fix(security): CWE-22 in copyFilesToContainer and deleteViaEphemeral | ✅ merged |
-| main | `169120d` — fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral | ✅ merged |
-
-### What was fixed (staging `ce2491e`)
-
- `copyFilesToContainer`: `filepath.Clean` + `IsAbs` + `strings.Contains("..")` validation, `safeName` in tar header ✅
- `deleteViaEphemeral`: `validateRelPath(filePath)` check before rm command ✅
- Both CWE-22 and CWE-78 addressed correctly
-
-### `fix/cwe78-delete-via-ephemeral-shell-injection` branch status
-
-**Do NOT merge** — it's now superseded by `ce2491e`/`169120d`. The regression it introduced (removing `safeName` from `copyFilesToContainer`) was never the right approach. If this branch is revived, it must be rebased on top of `ce2491e` to preserve existing CWE-22 protections while adding the CWE-78 exec-form fix.
-
---
-
-## F1085 Regression Branch (`fix/f1085-regression-1283`) — IS a Regression
-
-**Severity:** High
-**Status:** Active — branch removes the confirmed-good F1085 fix (confirmed 2026-04-21 ~07:10Z)
-
-### Summary
-
-Branch `origin/fix/f1085-regression-1283` (commit `3b244e6`) removes `redactSecrets(workspaceID, content)` from `seedInitialMemories` in `workspace_provision.go:249`:
-
-```diff
-`, workspaceID, redactSecrets(workspaceID, content), scope, awarenessNamespace); err != nil {
-+`, workspaceID, content, scope, awarenessNamespace); err != nil {
-```
-
-**Staging still has the correct fix** (`workspace_provision.go:253` on origin/staging confirms `redactSecrets` is present). This branch is behind staging and would regress it if merged.
-
-### Required Fix
-
-Close or revert this branch. `redactSecrets` must remain in `seedInitialMemories`. If there is a legitimate reason to change this (e.g., a different redaction strategy), document it clearly in the PR before merging.
-
---
-
-## F1097 — org_id Context Fix — RESOLVED
-
-**Severity:** Medium
-**Status:** Resolved — PR #1258 merged to main (`dc9c64e`)
-
-### Summary
-
-`orgToken.Validate` refactored to return `org_id` directly, eliminating the redundant 2nd SELECT in `AdminAuth`. All SQL parameterized correctly.
-
-### References
-
- PR #1258 (`dc9c64e`): fix(F1097): set org_id in Gin context for org-token callers
-
---
-
-## PR #1226 — err.Error() Leaks (STALE — closed without merge)
-
-**Severity:** Medium
-**Status:** Open — PR closed without merging, leaks still present on main
-
-### Summary
-
-PR #1226 (`fix(security): sanitize remaining err.Error() leaks + errcheck artifacts/client.go`) was **closed without merging**. The following leaks remain on main:
-
-| File | Line | Code | Fix |
-|------|------|------|-----|
-| `mcp.go` | 259 | `"parse error: " + err.Error()` | → `"parse error: invalid JSON request body"` |
-| `mcp.go` | 347 | `"invalid params: " + err.Error()` | → `"invalid params: malformed JSON"` |
-| `mcp.go` | 352 | `err.Error()` | → `"dispatch error"` |
-| `org_plugin_allowlist.go` | 260 | `"detail": err.Error()` | → `"detail": "plugin name validation failed"` |
-| `admin_memories.go` | 99 | `"invalid JSON: " + err.Error()` | → `"invalid JSON request body"` |
-
-**Already fixed:** `artifacts/client.go:175` — `defer func() { _ = resp.Body.Close() }()` confirmed correct (via PR #1247).
-
-### Action Required
-
-Reopen PR #1226 and fast-track merge. Alternatively, cherry-pick the 4 commits from that PR onto a fresh branch.
-
---
-
-## QA Round 18 — orgs-page Test Regression (FIXED on main, pending staging port)
-
-**Severity:** Medium
-**SHA tested:** `ce33da5` (PR #1257 branch merge with staging)
-**Status:** Regression identified in PR #1255, fixed on main, not yet on staging
-
-### Findings
-
-| Finding | Status |
-|---------|--------|
-| Canvas tests: 53 passed, **1 FAILED** | orgs-page.test.tsx line 133 — `vi.useRealTimers()` + raw `setTimeout(50)` without `act()` |
-| PR #1257 conflict | MERGEABLE, approved — closed without merge; fix is on main/staging via `a66f889` |
-| PR #1255 regression | Introduced orgs-page test flakiness — +18/-2 in orgs-page.test.tsx |
-
-### orgs-page Test Regression — Root Cause
-
-PR #1255 (`e885fa1`) regressed the timer fix from PR #1235. It replaced `waitFor()` with `vi.useRealTimers()` + raw `setTimeout(50)` without `act()` — causing microtask flush issues.
-
-### Resolution
-
-**Main:** Fixed in `674384b` (PR #1313) — wraps all 10 affected `vi.advanceTimersByTimeAsync(50)` calls in `act(async () => { ... })`. All 813 canvas tests pass on main.
-**Staging:** Regression NOT yet fixed — `origin/staging` is 13 commits behind main.
-
-### Action needed
-
-Cherry-pick or port the orgs-page test fix from `674384b` to staging.
-
---
-
-## Issue #1124 — Orchestrator GET /workspaces 404: Env Var Misconfiguration (OPEN)
-
-**Severity:** Medium
-**Status:** Active — root cause confirmed, fix pending, delegated to Core-BE
-
-### Summary
-
-Orchestrator (workspace agent, `workspace/` directory) GET /workspaces/{WORKSPACE_ID} returns 404 due to missing or empty `WORKSPACE_ID` env var. Confirmed via code review (2026-04-21 ~07:10Z).
-
-### Root Causes
-
-**Platform-side (provisioner.go:375-377) is CORRECT:**
-```go
-env := []string{
-    fmt.Sprintf("WORKSPACE_ID=%s", cfg.WorkspaceID),  // ✅ correctly injected
-    "WORKSPACE_CONFIG_PATH=/configs",
-    fmt.Sprintf("PLATFORM_URL=%s", cfg.PlatformURL),
-}
-```
-The platform injects `WORKSPACE_ID` at container provision time. **The bug is in the Python orchestrator modules** that default to empty string instead of validating the injected value.
-
-**Buggy Python module-level defaults (empty string → broken API calls):**
-| File | Line | Code |
-|------|------|------|
-| `workspace/a2a_cli.py` | 24 | `WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")` |
-| `workspace/a2a_client.py` | 17 | `WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")` |
-| `workspace/coordinator.py` | 26 | `WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")` |
-| `workspace/consolidation.py` | 22 | `WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")` |
-| `workspace/molecule_ai_status.py` | 25 | `WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")` |
-
-When `WORKSPACE_ID` is empty, API calls produce URLs like `/workspaces//heartbeat` or `/registry/discover/` — platform returns 404 or wrong routing.
-
-**Note — main.py is already correct:**
-```python
-workspace_id = os.environ.get("WORKSPACE_ID", "workspace-default")  # main.py:55 ✅
-```
-However, `main.py` uses a local variable — it doesn't export `WORKSPACE_ID` as a module constant. The other modules that import `WORKSPACE_ID` from `a2a_client` etc. still get the empty-string default.
-
-### Fix Required (Quick Win for Core-BE)
-
-**Option A — Fail fast at module import (recommended):**
-```python
-WORKSPACE_ID = os.environ.get("WORKSPACE_ID")
-if not WORKSPACE_ID:
-    raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
-```
-Apply to all 5 affected modules. This surfaces the misconfiguration immediately instead of producing silent 404s downstream.
-
-**Option B — Align with main.py's approach (more lenient):**
-```python
-WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "workspace-default")
-```
-But this masks real misconfigurations. Option A is better.
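
Since the same check is needed in five modules, Option A could live in one shared helper. A minimal sketch, assuming a hypothetical `require_env` function (the name and the optional `env` parameter are illustrative and not part of the current codebase):

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required environment variable, failing fast if unset or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} environment variable is required but not set")
    return value
```

Each module would then use `WORKSPACE_ID = require_env("WORKSPACE_ID")` at import time, so a missing value raises immediately instead of producing URLs such as `/workspaces//heartbeat`.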
-
-### Modules Requiring Fix
-
- `workspace/a2a_cli.py` — line 24
- `workspace/a2a_client.py` — line 17
- `workspace/coordinator.py` — line 26
- `workspace/consolidation.py` — line 22
- `workspace/molecule_ai_status.py` — line 25
-
-### PLATFORM_URL Note
-
-All modules default to `http://platform:8080` (container mesh hostname). This is correct for in-container use but fails outside Docker. No action needed for in-container orchestrators — the platform injects `PLATFORM_URL` at provision time which overrides this default.
-
-### Owner
-
-Core-BE — delegated to Dev Lead (A2A failed). Core-BE sub-team: please pick up.
-
-### Fix PR
-
-[PR #1336](https://git.moleculesai.app/molecule-ai/molecule-core/pull/1336) filed — `fix(orchestrator): fail-fast if WORKSPACE_ID env var is unset/empty`. Targets staging. Labels: bug, needs-work, area:backend-engineer, area:dev-lead.
-
---
-
-*Last updated: 2026-04-21T07:10Z by Core Platform Lead (post-restart session — all findings re-verified)*
@@ -1,214 +0,0 @@
---
-title: "a2a-sdk v0 → v1 migration"
-description: "Cheat sheet for migrating workspace runtime code (and forks) from a2a-sdk 0.3.x to 1.x — renamed/removed symbols, common error shapes, before/after diffs."
---
-
-import { Callout } from 'fumadocs-ui/components/callout';
-
-The `a2a-sdk` Python package released v1.0 in late April 2026. The
-Molecule workspace runtime migrated under tracking ID **KI-009** and
-shipped in `molecule-ai-workspace-runtime` **v0.1.11** (commit
-`d5cf872`, PR #39). The platform now runs exclusively on v1.
-
-If you're consuming the platform's published wheel, bumping
-`molecule-ai-workspace-runtime>=0.1.11` handles the migration for
-you. If you maintain a fork of the runtime, an external agent talking
-A2A directly, or your own adapter that imports from `a2a.*`, this page
-is your checklist.
-
-## Why migrate
-
- **Upstream**: `a2a-sdk` 1.0 reorganised the import surface, flattened
-  `Part`, removed deprecated capability flags, and replaced the
-  `A2AStarletteApplication` wrapper with explicit Starlette route
-  factories.
- **Platform**: as of 2026-04-24 the platform sends/receives via v1
-  shapes natively. The SDK ships a v0_3 compat layer (enabled in the
-  runtime via `enable_v0_3_compat=True` on `create_jsonrpc_routes`) so
-  in-flight 0.x callers don't break, but new code should target v1.
- **Forks/external runtimes**: v0 code throws on `import a2a.utils`
-  and `from a2a.server.apps import A2AStarletteApplication` once you
-  install v1, so the migration is a hard cutover at install time, not
-  a soft deprecation.
-
-## Cheat sheet — renamed and removed symbols
-
-Four breaking changes hit the Molecule runtime during KI-009.
-All four are confirmed against
-`molecule-core/workspace/` source.
-
-### 1. `new_agent_text_message` renamed to `new_text_message`
-
- **v0 location**: `a2a.utils.new_agent_text_message`
- **v1 location**: `a2a.helpers.new_text_message`
-
-Both the module path and the symbol name changed.
-
-### 2. `Part` API flattened — `TextPart` removed
-
- **v0**: `Part(root=TextPart(text="..."))` — `Part` wrapped a `root`
-  union of `TextPart` / `FilePart` / `DataPart`.
- **v1**: `Part(text="...")` — `Part` accepts the text payload
-  directly. `TextPart` no longer exists as a public symbol.
-
-`FilePart` / `DataPart` are similarly flattened (`Part(file=...)`,
-`Part(data=...)`); the Molecule runtime only emits text parts so the
-file/data shapes weren't exercised in KI-009 and aren't covered by
-this guide.
-
-### 3. `A2AStarletteApplication` removed — use route factories
-
- **v0**: `from a2a.server.apps import A2AStarletteApplication` then
-  `A2AStarletteApplication(agent_card, request_handler).build()`.
- **v1**: `from a2a.server.routes import create_agent_card_routes,
-  create_jsonrpc_routes` then build a Starlette app from the returned
-  route lists.
-
-The factories also let you mount the JSON-RPC endpoint at any path
-(the runtime mounts at `/` because the platform POSTs to root, see
-`workspace/main.py:279`).
-
-### 4. `state_transition_history` capability flag removed
-
- **v0**: `AgentCapabilities(streaming=..., push_notifications=...,
-  state_transition_history=True)` was a per-agent opt-in.
- **v1**: the field is gone from `AgentCapabilities`. Per the SDK's own
-  `a2a/compat/v0_3/conversions.py`: *"No longer supported in v1.0"*.
-  The capability is now universal — `Task.history` is always available
-  and `tasks/get` accepts `historyLength` via `apply_history_length()`.
-
-If you pass `state_transition_history=...` as a kwarg to
-`AgentCapabilities` under v1, Pydantic will reject it. Drop the kwarg.
-See [`workspace/main.py`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/workspace/main.py)
-for the explanatory comment that prevents future accidental re-adds.
-
-## Common error shapes
-
-When v0 code runs against the v1 SDK, the failure modes look like this:
-
-| Error | Cause |
-|---|---|
-| `ModuleNotFoundError: No module named 'a2a.utils'` | v0 import path; module renamed to `a2a.helpers`. |
-| `ImportError: cannot import name 'A2AStarletteApplication' from 'a2a.server.apps'` | The whole `a2a.server.apps` module is gone in v1. Switch to `a2a.server.routes` factories. |
-| `ImportError: cannot import name 'TextPart' from 'a2a.types'` | Flattened `Part` API; use `Part(text=...)`. |
-| `ValueError: Protocol message AgentCapabilities has no "state_transition_history" field` | Removed capability flag passed as kwarg; drop it. |
-| `ValueError: Protocol message Part has no "root" field` | v0 `Part(root=TextPart(...))` shape against v1 schema; flatten to `Part(text=...)`. |
-
-The protobuf-style `ValueError` messages always follow the pattern
-`Protocol message <Type> has no "<field>" field` — that's the
-fingerprint of "v0 shape against v1 schema." Treat it as a v0→v1 hint
-even if the field name isn't on the cheat sheet above.
-
-## Migration checklist
-
-1. **Bump the dep** — `a2a-sdk[http-server]>=0.3.25` is the floor; remove
-   any `<1.0` upper bound. The Molecule wheel uses
-   `a2a-sdk[http-server]>=0.3.25` with no upper bound (see
-   [`molecule-ai-workspace-runtime/pyproject.toml`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/src/branch/main/pyproject.toml)).
-2. **Fix imports** — sweep the four renamed/removed symbols above. A
-   safe grep is `grep -rn "from a2a\\|import a2a"` across your tree.
-3. **Fix removed-field reads/writes** — search for
-   `state_transition_history` usage and delete the kwarg/field access.
-4. **Flatten `Part` constructors** — search for `Part(root=` and
-   convert to `Part(text=...)` / `Part(file=...)` / `Part(data=...)`.
-5. **Replace the app factory** — search for `A2AStarletteApplication`
-   and rewrite the bootstrap using `create_agent_card_routes` +
-   `create_jsonrpc_routes`. Pass `enable_v0_3_compat=True` to
-   `create_jsonrpc_routes` if your peers may still be on v0.
-6. **Re-run tests** — fixture-level mocks of `a2a.helpers` /
-   `a2a.utils` need to mock both names so tests still pass during the
-   rename rollout (see
-   [`workspace/tests/conftest.py`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/workspace/tests/conftest.py)
-   for the dual-name pattern).
-
-## Before / after diffs
-
-### `new_agent_text_message` → `new_text_message`
-
-```diff
-from a2a.utils import new_agent_text_message
-+from a2a.helpers import new_text_message
-
- async def execute(self, context, event_queue):
-    await event_queue.enqueue_event(new_agent_text_message("hello"))
-+    await event_queue.enqueue_event(new_text_message("hello"))
-```
-
-### Flat `Part` API
-
-```diff
-from a2a.types import Part, TextPart
-+from a2a.types import Part
-
-msg_parts = [Part(root=TextPart(text=final_text))]
-+msg_parts = [Part(text=final_text)]
-```
-
-### `AgentCapabilities` — drop `state_transition_history`
-
-```diff
- capabilities=AgentCapabilities(
-     streaming=config.a2a.streaming,
-     push_notifications=config.a2a.push_notifications,
-    state_transition_history=True,
- ),
-```
-
-### `A2AStarletteApplication` → route factories
-
-```diff
-from a2a.server.apps import A2AStarletteApplication
-+from a2a.server.routes import create_agent_card_routes, create_jsonrpc_routes
-
-app = A2AStarletteApplication(
-    agent_card=agent_card,
-    http_handler=request_handler,
-).build()
-+routes = []
-+routes.extend(create_agent_card_routes(agent_card))
-+routes.extend(create_jsonrpc_routes(
-+    request_handler=request_handler,
-+    rpc_url="/",
-+    enable_v0_3_compat=True,
-+))
-+app = Starlette(routes=routes)
-```
-
-The `enable_v0_3_compat=True` flag on `create_jsonrpc_routes` is what
-keeps in-flight v0 callers (peers that haven't migrated yet) from
-breaking — it accepts the old method names and translates them. The
-Molecule runtime ships with this flag on (see
-[`workspace/main.py`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/workspace/main.py));
-strip it once your entire fleet is on v1.
-
-## For downstream consumers
-
- **Using the published wheel** (`pip install
-  molecule-ai-workspace-runtime>=0.1.11`): the migration is in the
-  wheel — no code changes needed in your adapter or workspace template
-  beyond bumping the pin.
- **Running a fork of the runtime**: cherry-pick or rebase against
-  commit `d5cf872` ("feat: migrate a2a-sdk 1.x (KI-009) (#39)") in
-  `molecule-ai-workspace-runtime`. The diff is the canonical reference
-  for what KI-009 actually changed.
- **Standalone external agent** (talking A2A without the wheel): apply
-  the [Migration checklist](#migration-checklist) directly to your
-  source. The four cheat-sheet items are the entire surface that
-  changed for the typical agent role; only `Part` flattening and the
-  `state_transition_history` removal affect on-the-wire shapes — the
-  other two are import-only.
-
-<Callout type="info">
-The wheel keeps `enable_v0_3_compat=True` on `create_jsonrpc_routes`,
-so a v0 peer can still hit a v1 wheel and vice versa during the
-migration window. You don't need to coordinate a fleet-wide cutover —
-migrate at your own pace.
-</Callout>
-
-## See also
-
- [`molecule-ai-workspace-runtime` v0.1.11 release](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/releases/tag/v0.1.11) — first wheel containing KI-009
- PR #39 (feat: migrate a2a-sdk 1.x / KI-009) — closed without merge; PR content is historical
- PR #48 (feat(a2a): dual-compat for a2a-sdk 0.3.x and 1.x) — closed without merge; PR content is historical
- [Bring Your Own Runtime (MCP)](/docs/runtime-mcp) — universal wheel install path
- [External Agents](/docs/external-agents) — manual A2A path for non-MCP runtimes
@@ -1,69 +0,0 @@
---
-title: "Cognee Architecture Deep-Dive — Workspace Isolation"
-description: "Deep-dive into Cognee's isolation primitives versus Molecule AI's per-workspace memory requirements."
---
-# Cognee Architecture Deep-Dive — Workspace Isolation
-
-**Date:** 2026-04-20
-**Issue:** Molecule-AI/molecule-core#1146
-**Research by:** Research Lead
-**Status:** Complete
-
---
-
-## Executive Summary
-
-Cognee has **dataset-level isolation primitives** but **no storage-layer enforcement** and **no native `workspace_id` support** in its MCP tool interface. Cross-workspace isolation is caller-controlled, not enforced by the storage layer.
-
---
-
-## Isolation Layer Analysis
-
-| Layer | Mechanism | Enforced? | Risk |
-|-------|-----------|-----------|------|
-| Storage (Postgres) | No RLS, no schema namespacing | ❌ None | High |
-| App — dataset | `dataset_name` passed per tool call | ⚠️ Caller-controlled | Medium |
-| App — user | `get_default_user()` internal resolver only | ⚠️ Soft | Medium |
-| MCP `workspace_id` param | Not present in cognee-mcp interface | ❌ N/A | High |
-
---
-
-## Key Findings
-
-1. **Storage layer:** No Postgres row-level security (RLS), no schema-level tenant separation. Any admin with DB access can read any tenant's data.
-
-2. **Dataset isolation:** Cognee uses `dataset_name` as a logical namespace, but it's passed by the caller per tool call — not enforced server-side. A misconfigured or malicious caller could read/write across datasets.
-
-3. **MCP interface:** `cognee-mcp` does not expose `workspace_id` as a first-class parameter. Workspaces would need to be mapped to dataset names externally.
-
-4. **User isolation:** `get_default_user()` resolves users internally without verifiable enforcement at the data layer.
-
---
-
-## Migration Implications
-
-Adopting Cognee as the memory substrate requires an **auth bridge**:
-
- The bridge wraps cognee-mcp and injects `workspace_id` → `dataset_name` mapping
- All tool calls are routed through the bridge, which enforces tenant context
- Estimated effort: **~100–200 LOC** for the MCP proxy wrapper
- This is a pragmatic path — the bridge provides the isolation Cognee's storage layer lacks
-
---
-
-## Recommendation
-
-**Attempt the auth bridge prototype first (1–2 days of engineering):**
-1. Build an MCP proxy that maps `workspace_id` to `dataset_name` on each call
-2. Validate that cross-workspace calls are correctly rejected
-3. If clean → adopt Cognee for Phase 9
-4. If complex → build native with storage-layer enforcement
-
-**Do not proceed with Phase 9 proprietary memory investment until the bridge prototype has been evaluated.**
-
---
-
-## Sources
-
- Cognee GitHub: https://github.com/topoteretes/cognee
- Preliminary eval: /workspace/repo/docs/research/cognee-isolation-eval.md
@@ -1,41 +0,0 @@
---
-title: "Cognee Workspace Isolation Evaluation"
-description: "Evaluating Cognee, an open-source AI memory engine, against Molecule AI's hierarchical memory isolation needs."
---
-# Cognee Workspace Isolation Evaluation
-
-**Date:** 2026-04-20
-**Issue:** Molecule-AI/molecule-core#1146
-**Status:** Preliminary — needs deeper architecture review
-
-## Summary
-
-Cognee (Apache-2.0, by Topoteretes UG) is an open-source AI memory engine with a shipped MCP component. It has direct overlap with Molecule AI's Phase 9 hierarchical memory architecture.
-
-## Workspace Isolation Assessment
-
-**Signal: Partial/Positive**
-
-Cognee's GitHub README explicitly lists "agentic user/tenant isolation, traceability, OTEL collector, audit traits" as a core architectural feature.
-
-This is a positive signal. However:
- The README mention does not specify the technical mechanism (namespace-level separation? separate vector DB instances per tenant? row-level security in a shared DB?)
- How the cognee-mcp component handles multi-workspace contexts is not documented in the top-level README
-
-**Verdict:** Cognee claims tenant isolation. Further due diligence required before treating this as confirmed.
-
-## Next Steps
-
-1. **Deep-dive into cognee architecture docs** — check if isolation is enforced at the storage layer (separate DB/collection per workspace), application layer (row-level), or both
-2. **Test cognee-mcp with a multi-workspace scenario** — the MCP tool interface should reveal whether workspace_id is a first-class parameter
-3. **Check cognee's GitHub issues/discussions** — any community reports of cross-tenant data leakage?
-4. **Evaluate migration path** — if Cognee is adopted, what's involved in migrating existing Phase 9 work?
-
-## Recommendation
-
-Proceed with Phase 9 build-vs-buy review. Cognee is a credible candidate — isolation is claimed but mechanism needs verification. The Phase 9 halt stands until this is resolved.
-
-## Sources
-
- https://github.com/topoteretes/cognee (README, 2026-04-20)
- /workspace/repo/research/cognee-memo.md
@@ -239,7 +239,7 @@ This terminates all EC2 instances, drops the Neon branch, and removes the org re
 - **Scoped roles**: give different team members read-only vs admin access within a tenant org (roadmap: Phase 34)
 - **Usage-based billing**: Meter workspace runtime and forward events to Stripe for custom billing tiers

-For runbook-level details on the provisioning flow, see the architecture docs at [`docs/architecture/saas-prod-migration-2026-04-19`](/docs/architecture/saas-prod-migration-2026-04-19).
+For the provisioning flow internals, see the [Provisioner](/docs/architecture/provisioner) and [Workspace Tiers](/docs/architecture/workspace-tiers) reference.

 For the API reference, see [`docs/api-reference`](/docs/api-reference) — the `/cp/orgs/*` endpoints are documented there.