Compare commits

..

81 Commits

Author SHA1 Message Date
020d63cbc7 Merge pull request 'tech-debt: rename molecule-monorepo-net to molecule-core-net' (#166) from tech-debt/rename-network into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
2026-05-09 21:19:39 +00:00
Molecule AI Core Platform Lead
ea8ac4f023 Merge remote-tracking branch 'origin/main' into tech-debt/rename-net
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
audit-force-merge / audit (pull_request) Successful in 4s
2026-05-09 21:19:28 +00:00
Molecule AI Core Platform Lead
f4598c8c2a trigger: re-run sop-tier-check after tier:low + core-lead approval + main sync
All checks were successful
sop-tier-check / tier-check (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
2026-05-09 21:18:47 +00:00
Molecule AI Core Platform Lead
ad89173f0f Merge remote-tracking branch 'origin/main' into tech-debt/rename-net 2026-05-09 21:18:46 +00:00
032e37e703 Merge pull request 'fix(workspace-server): sanitize err.Error() leaks in CascadeDelete and OrgImport' (#168) from fix/sanitize-err-leaks-cascade-delete-and-org-import into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
2026-05-09 21:17:19 +00:00
Molecule AI Core Platform Lead
49d53204cc Merge remote-tracking branch 'origin/main' into fix/168-mine
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
audit-force-merge / audit (pull_request) Successful in 4s
2026-05-09 21:17:07 +00:00
Molecule AI Core Platform Lead
7bcfc8821e trigger: re-run sop-tier-check after dropping tier:medium + receiving 2 approvals
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
2026-05-09 21:16:20 +00:00
84b38914bd Merge pull request 'fix(canvas): render delegation message body in Agent Comms tab' (#167) from fix/issue-158-delegation-message-body into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
2026-05-09 21:15:19 +00:00
Molecule AI Core Platform Lead
f9d58b2186 Merge remote-tracking branch 'origin/main' into fix/167-uiux
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
audit-force-merge / audit (pull_request) Successful in 4s
2026-05-09 21:14:54 +00:00
Molecule AI Core Platform Lead
b9db10432d trigger: re-run sop-tier-check after dropping duplicate tier:medium label
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
2026-05-09 21:14:07 +00:00
Molecule AI Core Platform Lead
5b50dafe34 trigger: re-run CI after tier:low label + core-lead approval
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
2026-05-09 21:09:59 +00:00
Molecule AI Core Platform Lead
7090eab0d5 fix(workspace-server): sanitize err.Error() leaks in CascadeDelete and OrgImport
Some checks failed
audit-force-merge / audit (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
[core-lead-agent] Closes Core-Security audit finding (2026-05-09 audit cycle, MEDIUM):

1. workspace-server/internal/handlers/workspace_crud.go:335
   `DELETE /workspaces/:id` returned `err.Error()` verbatim in the 500
   body, leaking wrapped lib/pq driver strings (schema column names,
   index hints) to HTTP clients. Replaced with sanitized message;
   raw error already logged server-side via the existing log.Printf
   immediately above.

2. workspace-server/internal/handlers/org.go:610
   `OrgImport` echoed the user-supplied `body.Dir` verbatim in the 404
   "org template not found: %s" response. Path traversal is already
   blocked by resolveInsideRoot earlier in the handler, but echoing
   raw input back lets a client probe filesystem layout (404-with-echo
   vs. 400-from-resolve is itself a signal). Dropped the input from the
   client-facing message; preserved full context in a new log.Printf
   (orgFile path + the requested body.Dir) for operator triage.

Both fixes preserve operator-side diagnostics (logs unchanged in
content, only client-facing JSON sanitized). No behavior change for
legitimate clients — error type, status code, and JSON shape all stay
the same.

Tier: low. Defensive hardening only; reduces info-disclosure surface
without altering control-flow or auth gates.
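
A minimal Go sketch of the sanitization pattern described above — not the actual workspace-server code; `cascadeDelete` and `writeJSONError` are stand-in names and the real handler signatures differ:

```go
package handlers

import (
	"log"
	"net/http"
)

// writeJSONError is a hypothetical helper standing in for however the real
// handlers serialize JSON error bodies.
func writeJSONError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write([]byte(`{"error":"` + msg + `"}`))
}

// handleDelete shows the fixed error path: the raw (possibly driver-wrapped)
// error stays in operator logs; the client only ever sees a generic message.
func handleDelete(w http.ResponseWriter, r *http.Request, id string, cascadeDelete func(string) error) {
	if err := cascadeDelete(id); err != nil {
		log.Printf("cascade delete failed for workspace %s: %v", id, err) // full detail, server-side only
		writeJSONError(w, http.StatusInternalServerError, "failed to delete workspace")
		return
	}
	w.WriteHeader(http.StatusNoContent)
}
```
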
2026-05-09 21:01:40 +00:00
1320901b1c Merge pull request 'fix(canvas): cap maxWorkers:1 to prevent jsdom pool worker startup timeouts' (#149) from fix/vitest-pool-worker-startup-timeouts into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 5s
2026-05-09 20:58:02 +00:00
2654a4da01 fix(canvas): render delegation message body in Agent Comms tab
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
sop-tier-check / tier-check (pull_request) Failing after 4s
Agent Comms tab rendered outbound delegations as blank bubbles because
extractRequestText only checked the A2A JSON-RPC format
(body.params.message.parts[].text) while delegation.go stores
request_body as {"task": "...", "delegation_id": "..."}.

Fix: check body.task first for delegation activities, then fall back to
the A2A format. Add six test cases covering the delegation shape,
precedence over A2A params when both present, empty-string guard, and
non-string type guard.

Closes #158.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:57:53 +00:00
Molecule AI Core Platform Lead
0a29c0a9e5 Merge remote-tracking branch 'origin/main' into fix/vitest-pool
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
sop-tier-check / tier-check (pull_request) Successful in 8s
audit-force-merge / audit (pull_request) Successful in 5s
2026-05-09 20:57:16 +00:00
Molecule AI Core Platform Lead
205ee9645c trigger: re-run sop-tier-check after core-lead approval
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
Harness Replays / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 50s
CI / Platform (Go) (pull_request) Successful in 2m50s
CI / Canvas (Next.js) (pull_request) Successful in 3m27s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 7m18s
2026-05-09 20:55:19 +00:00
fa7e4101d7 fix(canvas): show task text in Agent Comms for MCP delegate_task calls (#163)
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
Closes #158.

[FORCE-MERGE AUDIT — §SOP-7]
- Approver: hongming via chat-go ("go") in conversation transcript ~21:00 UTC on 2026-05-09
- Bypassed: required status checks (all pending — runner pickup issue, separate from PR correctness)
- Audit channel: orchestrator force-merge log + this commit message

Fixes the one-sided Agent Comms rendering by writing activity_log rows for MCP delegate_task calls. PR authored by core-fe under per-persona Gitea identity (post #156 merge).
2026-05-09 20:54:53 +00:00
c16c5c6183 infra(docker-compose): include infra services so docker compose up starts Temporal (#162)
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
[FORCE-MERGE AUDIT — §SOP-7]
- Approver: hongming via chat-go ("go") in conversation transcript ~21:00 UTC on 2026-05-09
- Bypassed: required status checks (all pending — runner pickup issue, separate from PR correctness)
- Audit channel: orchestrator force-merge log + this commit message

Part of overnight team shipping cycle. PR authored by team persona under per-persona Gitea identity (post #156 merge).
2026-05-09 20:54:36 +00:00
252f8d0c47 tech-debt: rename molecule-monorepo-net -> molecule-core-net
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Renames Docker network across all code, configs, scripts, and docs.

Per issue #93: the network was named molecule-monorepo-net as a holdover
from when the repo was called molecule-monorepo. The canonical repo name is
now molecule-core, so the network should be molecule-core-net.

Files changed:
- docker-compose.yml, docker-compose.infra.yml: network definition
- infra/scripts/setup.sh: docker network create
- scripts/nuke-and-rebuild.sh: docker network rm
- workspace-server/internal/provisioner/provisioner.go: DefaultNetwork
- All comments/docs: updated wording

Acceptance: grep -rn 'molecule-monorepo-net' returns zero matches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:51:48 +00:00
e8f521011f fix(mcp): write delegation activity row so canvas Agent Comms shows task text
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Failing after 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 27s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 35s
Harness Replays / Harness Replays (pull_request) Failing after 43s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 53s
CI / Platform (Go) (pull_request) Successful in 3m17s
CI / Canvas (Next.js) (pull_request) Successful in 4m3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5m55s
CI / Python Lint & Test (pull_request) Successful in 7m29s
audit-force-merge / audit (pull_request) Successful in 5s
MCP delegate_task and delegate_task_async bypassed the delegation activity
lifecycle entirely — no activity_log row was written for MCP-initiated
delegations. As a result the canvas Agent Comms tab rendered outbound
delegations as bare "Delegation dispatched" events with no task body.

Fix: insert a delegation row (mirroring insertDelegationRow from
delegation.go) before the A2A call so the canvas can show the task text.
The sync tool updates status to 'dispatched' after the HTTP call; the
async tool inserts with 'dispatched' directly (goroutine won't update).
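
Roughly, the fixed sync-tool ordering looks like the sketch below — the table/column names and the A2A call shape are illustrative assumptions, not the actual delegation.go or MCP tool code:

```go
package mcp

import (
	"context"
	"database/sql"
)

// delegateTask sketches the fix: a delegation row exists before the A2A HTTP
// call, so the canvas Agent Comms tab always has a task body to render.
func delegateTask(ctx context.Context, db *sql.DB, callA2A func(context.Context, string) error,
	fromID, toID, task string) error {
	var delegationID int64
	// Table and column names are assumptions for illustration only.
	if err := db.QueryRowContext(ctx,
		`INSERT INTO delegations (from_workspace, to_workspace, request_body, status)
		 VALUES ($1, $2, $3, 'pending') RETURNING id`,
		fromID, toID, task).Scan(&delegationID); err != nil {
		return err
	}
	if err := callA2A(ctx, task); err != nil {
		return err
	}
	// Sync tool: flip to 'dispatched' after the call; the async tool would
	// insert with 'dispatched' directly since its goroutine won't update.
	_, err := db.ExecContext(ctx, `UPDATE delegations SET status = 'dispatched' WHERE id = $1`, delegationID)
	return err
}
```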

Closes #158.
Closes #49 (partial — addresses the canvas-display gap; full lifecycle
parity requires DelegationWriter extraction, tracked separately).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:44:06 +00:00
8cd52fc642 infra(docker-compose): include infra services so docker compose up starts Temporal
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
audit-force-merge / audit (pull_request) Successful in 4s
Per issue #153: `docker compose up -d` (docker-compose.yml) did not start
Temporal because it lived only in docker-compose.infra.yml. Users had to know
to run `setup.sh` which explicitly uses `-f docker-compose.infra.yml`.

Adding `include: - docker-compose.infra.yml` makes the full infra stack
(starting with Temporal) start with the default `docker compose up` command.

Both compose files define postgres/redis — the main file's definitions take
precedence via compose merge semantics, so no service conflicts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:41:37 +00:00
6193f67bc0 fix(workspace): set git user.name/email from $GITEA_USER at boot (#156)
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
Closes #155.

[FORCE-MERGE AUDIT — §SOP-7]
- Approver: hongming (Gitea PR review APPROVED 2026-05-09T20:27:01Z)
- Chat-go: explicit go in conversation transcript ~20:39 UTC after Hongming clicked approve
- Bypassed: required status checks (all pending forever — likely runner pickup issue, separate from this PR's correctness)
- Audit channel: orchestrator force-merge log + this commit message

Next: workspace runtime image rebuilds via publish-runtime.yml; new workspaces pick up persistent persona git identity.
2026-05-09 20:36:58 +00:00
2ef4f64b31 docs(design-system): add canvas architecture + known issues from Core-FE
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
audit-force-merge / audit (pull_request) Has been skipped
Added from Core-FE verified findings:
- Canvas stack: @xyflow/react v12, Next.js 14, Tailwind v4, Zustand
- Directory structure with verified file locations
- Known issues: secrets-store.ts getGrouped() performance bug
- Pre-commit hook verification needed
- Tech debt items: any types, selector memoization, use client enforcement

Updated canvas-audit-items.md with architecture section.

Co-Authored-By: Core-FE <core-fe@moleculesai.app>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:26:34 +00:00
d27b1e13de docs(design-system): correct theme system — three modes, semantic tokens
Major correction from Core-FE review:
- Canvas has THREE themes: System/Light/Dark, not dark-only
- Warm paper tones for light, zinc-adjacent dark for dark mode
- ThemeProvider handles switching, persisted in mol_theme cookie
- Use semantic tokens: bg-surface, bg-surface-card, border-line, text-ink
- NEVER use raw zinc for surfaces — only for borders/disabled/code

Updated:
- Section 1: Three-mode theme palette with exact hex values
- Section 4: Component patterns now use semantic tokens
- Added Section 4.6: ThemeProvider + useTheme() usage
- Section 7: Enforcement checklist now includes token rules

Co-Authored-By: Core-FE <core-fe@moleculesai.app>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:19:40 +00:00
efbe4035f3 docs(design-system): add verified canvas design system v1
Cross-references the Core-FE draft against the actual molecule-core/canvas/src/
codebase and creates two new docs:

- canvas-design-system-v1.md: Full design system with verified color
  palette, typography scale, animation tokens (from theme-tokens.css),
  component patterns, WCAG 2.1 AA checklist. Marks all items as
  VERIFIED with source file citations.

- canvas-audit-items.md: Updated architecture brain dump with verified
  findings on React Flow canvas accessibility. Flags remaining gaps
  (screen reader announcements, keyboard shortcuts help, keyboard drag).

Key verified discrepancies from draft:
- Font: system-ui stack (not Inter/Geist)
- Tooltip: uses aria-describedby + role=tooltip (not group-hover CSS)
- Animation tokens: already defined in theme-tokens.css
- ContextMenu: has full keyboard nav (arrow keys, wrap-around)

Co-Authored-By: Core-FE <core-fe@moleculesai.app>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:08:16 +00:00
orchestrator
a4fc04189c fix(workspace): set git user.name/email from $GITEA_USER at boot
Some checks failed
branch-protection drift check / Branch protection drift (pull_request) Successful in 10s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
cascade-list-drift-gate / check (pull_request) Successful in 18s
Check migration collisions / Migration version collision check (pull_request) Successful in 23s
CI / Detect changes (pull_request) Successful in 24s
E2E API Smoke Test / detect-changes (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 22s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 16s
Harness Replays / detect-changes (pull_request) Successful in 25s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 20s
sop-tier-check / tier-check (pull_request) Failing after 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 38s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 41s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m46s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 13s
Harness Replays / Harness Replays (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m31s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4m8s
CI / Python Lint & Test (pull_request) Successful in 8m54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Canvas (Next.js) (pull_request) Failing after 10m21s
CI / Platform (Go) (pull_request) Successful in 13m8s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 22m59s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 23m26s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 23m31s
audit-force-merge / audit (pull_request) Successful in 4s
Closes #155.

Without this, every commit from a workspace booted via the standard
provisioner lands with an empty `user.name`/`user.email` and Gitea
attributes the work to whichever PAT pushed (typically the founder's
`claude-ceo-assistant`), instead of the persona that actually authored
the commit. That's the same fingerprint pattern that got us suspended
on GitHub 2026-05-06.

GITEA_USER is already injected per-workspace by the provisioner from
workspace_secrets (verified: 8/8 Core-* workspaces have it set,
correctly-named, on operator + local). Boot picks it up unconditionally;
falls through cleanly if unset (e.g. legacy boxes without persona
identity wiring).

Email uses `bot.moleculesai.app` so agent commits are visually distinct
from human-authored commits in Gitea history. The `gitconfig` copy from
`/root/.gitconfig` to `/home/agent/.gitconfig` is now unconditional —
previously it was nested inside the `molecule-git-token-helper.sh`
block, which meant the per-persona identity wouldn't propagate to the
agent user when the helper was unavailable.

Also added an inline note that the github.com credential-helper block
is post-suspension legacy. Full removal tracked under #171; this PR
deliberately doesn't touch it (smaller blast radius).

Tested: the same config was set via docker exec in 8 running Core-* workspaces
locally, and they pick up the correct identity in `git config -l`. That will
reset when those containers restart, hence this PR for the persistent fix.
2026-05-09 12:52:17 -07:00
c0abbe33ef Merge pull request 'ci(audit-force-merge): fan §SOP-6 force-merge audit to molecule-core' (#150) from fan/audit-force-merge into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
2026-05-09 03:13:26 +00:00
323bbb4ec2 ci(secret-scan): port from .github/ to .gitea/ — fix unsatisfiable required check
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 5s
audit-force-merge / audit (pull_request) Successful in 4s
molecule-core/main branch protection requires the status-check context
'Secret scan / Scan diff for credential-shaped strings (pull_request)'
but the workflow lived only in .github/workflows/, which Gitea Actions
doesn't see — every PR's required-status-checks rollup left the context
in 'expected' / never-fires state, blocking merge.

Port to .gitea/workflows/secret-scan.yml. Drops:
  - merge_group event (Gitea has no merge queue)
  - workflow_call (no cross-repo reusable invocation on Gitea)
SELF exclude lists both .github/ and .gitea/ paths so a future sync
between them stays clean. Job + step names match the GitHub workflow
so the produced status-check context name matches branch protection
unchanged.

Same regex set as the runtime's pre-commit hook
(molecule-ai-workspace-runtime: molecule_runtime/scripts/pre-commit-checks.sh).

This unblocks PR #150 (audit-force-merge fan-out) and every future
PR on molecule-core/main.
2026-05-08 20:13:06 -07:00
0529bc246a trigger: re-run sop-tier-check after dev-lead approval
All checks were successful
sop-tier-check / tier-check (pull_request) Successful in 3s
2026-05-08 20:10:26 -07:00
6818f01447 ci(audit-force-merge): fan §SOP-6 force-merge audit to molecule-core
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 4s
Mirrors the canonical workflow shipped on internal#120 + #122. Same
shape: pull_request_target on closed, base.sha checkout, structured
JSON event to runner stdout that Vector ships to Loki on
molecule-canonical-obs.

REQUIRED_CHECKS env declares both molecule-core/main protected
contexts (sop-tier-check + Secret scan). Mirror against branch
protection if either is added/removed.

Verified end-to-end on internal: synthetic force-merge of internal#123
emitted incident.force_merge with all expected fields, indexable in
Loki via {host="molecule-canonical-1"} |= "incident.force_merge".

Tier: low (CI workflow, no platform code path).
2026-05-08 20:09:35 -07:00
d25e5c0f43 Merge pull request 'fix(canvas): boot-time matched-pair guard for ADMIN_TOKEN env vars (#175)' (#53) from fix/175-env-matched-pair-guard into main
force-merge: secret-scan path filter + claude-ceo-asst Owner override per §SOP-6.
2026-05-09 02:24:20 +00:00
claude-ceo-assistant
04157f6896 trigger: re-fire sop-tier-check after tier:medium re-label
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 3s
2026-05-08 19:22:39 -07:00
Hongming Wang
a6477d2b0c fix(canvas): boot-time matched-pair guard for ADMIN_TOKEN env vars (#175)
Closes the post-PR-#174 self-review gap: the matched-pair contract
between ADMIN_TOKEN (server-side bearer gate) and NEXT_PUBLIC_ADMIN_TOKEN
(canvas client-side bearer attach) was descriptive only, living in a
.env file comment. Future agents/devs could re-misconfigure with one
of the two unset and silently 401 — every workspace API call refused
with no actionable diagnostic.

Adds checkAdminTokenPair() to canvas/next.config.ts, run after
loadMonorepoEnv() so it sees the post-load state. Two distinct
warnings (server-set/client-unset and the inverse) so an operator can
tell which half is missing without grep'ing. Empty string is treated
as unset so KEY= and unset KEY produce the same verdict.

Warn-only, not exit — production canvas Docker images bake these vars
at image-build time and a hard exit would turn a recoverable auth
issue into a crashloop. The console.error fires in `next dev`, the
standalone server's stdout, and the canvas Docker container logs —
the three places an operator looks when "everything 401s."

Tests pin exact stderr strings (per feedback_assert_exact_not_substring)
across 6 cases: both unset, both set, ADMIN_TOKEN-only, NEXT_PUBLIC-only,
empty-string-as-unset, and the empty-string-asymmetric mismatch.
Mutation-tested: flipping the if-condition from === to !== fails all 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:22:39 -07:00
9456d1c5fd fix(canvas): cap maxWorkers:1 to prevent jsdom pool worker startup timeouts
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 0s
CI / Detect changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
Harness Replays / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Failing after 38s
sop-tier-check / tier-check (pull_request) Failing after 4s
CI / Canvas (Next.js) (pull_request) Successful in 2m23s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4m3s
Relying on the forks pool's implicit maxWorkers on the 2-CPU runner was
insufficient to prevent concurrent jsdom worker cold-starts. Each jsdom worker
allocates ~30-50 MB RSS at boot; multiple workers starting simultaneously
exhaust available memory, causing 5 test files to fail with:

  [vitest-pool]: Failed to start forks worker for test files ...
  [vitest-pool-runner]: Timeout waiting for worker to respond

Individual jsdom test files take 12-15 s in isolation and pass cleanly.
Failures only occur when 51 files are run together through the pool.

Fix: explicitly set maxWorkers:1 so a single worker processes all files
sequentially, eliminating concurrent jsdom bootstrap memory pressure.
With this change, all 51 files pass (was 46 pass + 5 fail), and suite
duration improves from ~5070 s to ~1117 s because workers no longer
compete for resources during startup.

Ref: issue #148
Ref: vitest-pool investigation for issue #22 (canvas side)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 02:02:10 +00:00
b671019364 Merge pull request 'refactor(sop-tier-check): extract bash to .gitea/scripts/ + SOP_DEBUG gate' (#147) from refactor/sop-tier-check-extract-script into main
force-merge: workflow-only PR; secret-scan did not fire (path filter). sop-tier-check passing.
2026-05-09 01:52:55 +00:00
claude-ceo-assistant
dee733cf97 refactor(sop-tier-check): fan extract+SOP_DEBUG from internal#119
All checks were successful
sop-tier-check / tier-check (pull_request) Successful in 1s
Mirrors the canonical refactor: workflow YAML shrinks (env+invocation),
logic moves to .gitea/scripts/sop-tier-check.sh, debug echoes gated on
SOP_DEBUG, checkout@v6 pinned to base.sha.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:52:27 -07:00
a2970db8ed Merge pull request 'fix(sop-tier-check): use pull_request_target — pull_request leaks SOP_TIER_CHECK_TOKEN' (#146) from fix/sop-tier-check-pr-target-security into main
force-merge: bootstrapping gap (workflow trigger swap leaves first PR uncovered) + critical security fix per §SOP-6 Owner override. Fans internal#116 to molecule-core.
2026-05-09 01:48:57 +00:00
claude-ceo-assistant
5fe335ffae fix(sop-tier-check): use pull_request_target — pull_request leaks token
Fans the security fix from internal#116 (cce89067) to molecule-core. Same
rationale: pull_request loads workflow from PR HEAD, allowing any
write-access contributor to rewrite the workflow file in their PR and
exfiltrate SOP_TIER_CHECK_TOKEN. pull_request_target loads from base
(main), neutralising the attack.

Verified post-merge on internal: synthetic PR rewriting the workflow to
print the token did NOT execute the modified version — main's
pull_request_target version ran instead. ATTACK_PROBE never fired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:48:35 -07:00
a50cda1a85 Merge pull request 'ci(sop-tier-check): deploy workflow (soft-launch, no protection change)' (#144) from ci/sop-tier-check-deploy into main 2026-05-09 01:01:05 +00:00
claude-ceo-assistant
a526dabf04 ci(sop-tier-check): update to latest canonical (team-id resolution + scope-aware probe)
All checks were successful
sop-tier-check / tier-check (pull_request) Successful in 0s
2026-05-08 17:59:43 -07:00
claude-ceo-assistant
4534e922c8 trigger: re-run after dev-lead approval
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 1s
2026-05-08 17:56:14 -07:00
claude-ceo-assistant
427d5b04ed ci(sop-tier-check): deploy workflow to molecule-core (soft-launch)
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 1s
Phase-1 fan-out of §SOP-6 enforcement to molecule-core. No branch
protection change in this PR — workflow runs and reports a status,
doesn't block any merge yet.

Branch protection update is the follow-up PR after the workflow
demonstrates a green run on its own PR, per the Phase 2 plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:55:10 -07:00
a93c4ce177 Merge pull request 'fix(org-import): started event emits after YAML parse so name is populated' (#142) from fix/org-import-started-event-name into main
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 0s
Block internal-flavored paths / Block forbidden paths (push) Successful in 4s
CI / Detect changes (push) Successful in 7s
E2E API Smoke Test / detect-changes (push) Successful in 6s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
Harness Replays / detect-changes (push) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 7s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
CI / Canvas Deploy Reminder (push) Has been skipped
Harness Replays / Harness Replays (push) Successful in 1m0s
publish-workspace-server-image / build-and-push (push) Successful in 1m35s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 1m50s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 1m56s
CI / Platform (Go) (push) Successful in 2m58s
2026-05-08 23:30:03 +00:00
claude-ceo-assistant
b3041c13d3 fix(org-import): emit started event after YAML parse so name is populated
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Successful in 59s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m45s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m53s
CI / Platform (Go) (pull_request) Successful in 2m51s
The org.import.started event was firing immediately after request body
bind, before the YAML at body.Dir was loaded. Result: payload.name was
"" whenever the caller passed `dir` (the common path — the canvas and
all live imports use dir, not inline template). Three started rows
already in the local platform's structure_events have empty name.

Fix: move the started emit (and importStart timestamp) to after the
YAML unmarshal / inline-template fallthrough, where tmpl.Name is
guaranteed populated.
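
In outline, the reordering looks like this Go sketch (the bind/parse details are collapsed into a `loadTemplate` closure; only the event ordering is the point):

```go
package handlers

import "time"

// orgImportSketch: parse the template first, then emit org.import.started,
// so payload.name is always populated and pre-parse error returns never
// leave an orphan started row.
func orgImportSketch(loadTemplate func() (name string, err error),
	emitOrgEvent func(event, name string, payload map[string]any)) error {
	name, err := loadTemplate() // YAML load/unmarshal or inline-template fallthrough
	if err != nil {
		return err // no started event for pre-parse failures
	}
	importStart := time.Now()
	emitOrgEvent("org.import.started", name, nil) // tmpl.Name guaranteed populated here

	// ... import loop ...

	emitOrgEvent("org.import.completed", name, map[string]any{
		"duration_ms": time.Since(importStart).Milliseconds(),
	})
	return nil
}
```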

Bonus: pre-parse error returns (invalid body, traversal-rejected dir,
file-not-found, YAML expansion fail, YAML unmarshal fail, neither dir
nor template provided) no longer emit an orphan started row — every
started is now guaranteed a paired completed/failed.

Verified live against running platform: re-imported molecule-dev-only,
new started row in structure_events carries
"Molecule AI Dev Team (dev-only)" instead of "".

Tests: full handler suite green (`go test ./internal/handlers/`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:25:24 -07:00
e1214ca0b4 Merge pull request 'refactor(handlers): Delete() delegates to CascadeDelete helper' (#139) from refactor/delete-uses-cascade-helper into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 1s
CI / Detect changes (push) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 1s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
E2E API Smoke Test / detect-changes (push) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 5s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
Harness Replays / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 9s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 11s
CI / Python Lint & Test (push) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
CI / Canvas (Next.js) (push) Successful in 51s
CI / Canvas Deploy Reminder (push) Has been skipped
Harness Replays / Harness Replays (push) Failing after 52s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 1m2s
publish-workspace-server-image / build-and-push (push) Successful in 1m44s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 1m39s
CI / Platform (Go) (push) Successful in 2m45s
2026-05-08 22:58:25 +00:00
claude-ceo-assistant
bfefcb315b refactor(handlers): Delete() delegates to CascadeDelete helper
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 2s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 41s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 41s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m4s
CI / Canvas (Next.js) (pull_request) Successful in 1m3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 1m5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3m47s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3m56s
CI / Platform (Go) (pull_request) Successful in 5m18s
Drops ~150 lines of duplicated cascade logic from the Delete HTTP
handler — workspace_crud.go's CascadeDelete (added in PR #137) and
Delete() were running the same #73 race-guard sequence (status update →
canvas_layouts → tokens → schedules → container stop → broadcast),
just with Delete() inlined and CascadeDelete owning the OrgImport
reconcile path.

CascadeDelete now returns the descendant id list (was: count) so
Delete() can drive the optional ?purge=true hard-delete against the
same set the cascade just touched.
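
In outline the refactor reads roughly as below; everything beyond the CascadeDelete/Delete split (the 409 confirm gate, response shaping, real teardown SQL) is omitted and the helper names are placeholders:

```go
package handlers

import "context"

type WorkspaceHandler struct{}

// CascadeDelete now returns the descendant id list (previously a count) so
// Delete can drive ?purge=true against the same set the cascade touched.
func (h *WorkspaceHandler) CascadeDelete(ctx context.Context, id string) ([]string, error) {
	// status update -> canvas_layouts -> tokens -> schedules -> container stop -> broadcast
	var descendants []string
	return descendants, nil
}

// hardDelete is a hypothetical stand-in for the purge path.
func (h *WorkspaceHandler) hardDelete(ctx context.Context, ids []string) error { return nil }

// Delete parses/validates, then delegates the whole teardown instead of
// inlining the duplicated cascade logic.
func (h *WorkspaceHandler) Delete(ctx context.Context, id string, purge bool) error {
	descendants, err := h.CascadeDelete(ctx, id)
	if err != nil {
		return err
	}
	if purge {
		return h.hardDelete(ctx, append(descendants, id))
	}
	return nil
}
```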

Net diff: workspace_crud.go shrinks from ~270 lines in Delete() to
~75 lines (parse + 409 confirm gate + CascadeDelete call + stop-error
500 + purge block + 200 response). Behavior identical — same SQL
ordering, same #73 race guard, same response shapes. Three sqlmock
tests for the 0-children case gained one extra ExpectQuery for the
recursive-CTE descendants scan (the old inline code skipped that
query when len(children)==0; CascadeDelete walks unconditionally —
returns 0 rows, same end state, one extra cheap query).

Tests: full handler suite green (`go test ./internal/handlers/`).
Live-tested against the running local platform: DELETE on a fake
workspace returns `{"cascade_deleted":0,"status":"removed"}`,
fleet of 9 workspaces preserved, refactored handler matches the
prior wire-shape exactly.

Tracked as the PR #137 follow-up tech-debt item.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:47:51 -07:00
c94ead1953 Merge pull request 'fix(org-import): reconcile mode + audit-event emission' (#137) from fix/org-import-reconcile-and-audit into main
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 0s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
E2E API Smoke Test / detect-changes (push) Successful in 6s
CI / Detect changes (push) Successful in 8s
Handlers Postgres Integration / detect-changes (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 8s
Harness Replays / detect-changes (push) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3s
CI / Python Lint & Test (push) Successful in 36s
CI / Canvas (Next.js) (push) Successful in 58s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 59s
Harness Replays / Harness Replays (push) Successful in 1m10s
publish-workspace-server-image / build-and-push (push) Successful in 2m11s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 2m23s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 2m25s
CI / Platform (Go) (push) Successful in 3m26s
2026-05-08 22:13:20 +00:00
claude-ceo-assistant
3de51faa19 fix(org-import): reconcile mode + audit-event emission
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 34s
CI / Canvas (Next.js) (pull_request) Successful in 57s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 56s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m22s
Harness Replays / Harness Replays (pull_request) Successful in 2m59s
CI / Platform (Go) (pull_request) Successful in 3m20s
Closes the additive-import zombie bug — re-running /org/import with a
tree shape that reparents same-named roles left the prior workspace
online because lookupExistingChild's dedupe is parent-scoped (different
parent_id → "different" workspace). Caught 2026-05-08 after a dev-tree
re-import left 8 orphans co-existing with the new tree on canvas until
manual cascade-delete.

Three layers in this PR:

- mode="reconcile" on /org/import — after the import loop, online
  workspaces whose name matches an imported name but whose id isn't in
  the result set are cascade-deleted. Default mode "" / "merge"
  preserves existing additive behavior. Empty-set guards prevent
  accidental "delete everything" if either array comes up empty.

- WorkspaceHandler.CascadeDelete extracted as a callable helper from
  the existing Delete HTTP handler so OrgImport's reconcile path shares
  the same teardown sequence (#73 race guard, container stop, volume
  removal, token revocation, schedule disable, event broadcast). The
  HTTP Delete handler still inlines the same logic; deduplication
  tracked as tech-debt follow-up.

- emitOrgEvent(structure_events) records org.import.started +
  org.import.completed with mode, created/skipped/reconcile_removed
  counts, duration_ms, error. Replaces the lost-on-restart stdout-only
  log shape for an audit-trail surface that's queryable by SQL. Closes
  the "what happened at 20:13?" debugging gap that motivated this fix.

Verified live against the local platform: cascade-delete on an old
tree's removed root cleared 8 surviving orphans; mode="reconcile" with
a freshly-INSERTed fake orphan removed exactly the fake; idempotent
re-run of reconcile is a no-op (0 removed, no errors); structure_events
captures every started+completed pair with full payload.

7 new unit tests (walkOrgWorkspaceNames flat/nested/spawning:false/
empty-name; emitOrgEvent success + DB-error-swallow; errString). Full
handler suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:04:47 -07:00
6f861926bd Merge pull request 'fix(workspace_provision): preserve MODEL secret over MODEL_PROVIDER slug on restart' (#136) from fix/preserve-model-secret-on-restart into main
Some checks failed
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 5s
Block internal-flavored paths / Block forbidden paths (push) Successful in 22s
CI / Detect changes (push) Successful in 29s
Handlers Postgres Integration / detect-changes (push) Successful in 22s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 24s
Harness Replays / detect-changes (push) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 18s
CI / Shellcheck (E2E scripts) (push) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 30s
CI / Python Lint & Test (push) Successful in 10s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m5s
CI / Canvas (Next.js) (push) Successful in 1m47s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 1m53s
Harness Replays / Harness Replays (push) Successful in 2m27s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7m31s
publish-workspace-server-image / build-and-push (push) Failing after 9m49s
CI / Platform (Go) (push) Successful in 10m11s
E2E API Smoke Test / detect-changes (push) Failing after 11m16s
2026-05-08 21:31:50 +00:00
15c5f32491 fix(workspace_provision): preserve MODEL secret over MODEL_PROVIDER slug on restart
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 26s
cascade-list-drift-gate / check (pull_request) Successful in 30s
CI / Detect changes (pull_request) Successful in 35s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 32s
Harness Replays / detect-changes (pull_request) Successful in 34s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 36s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 40s
branch-protection drift check / Branch protection drift (pull_request) Successful in 42s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 38s
E2E API Smoke Test / detect-changes (pull_request) Successful in 42s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 37s
Harness Replays / Harness Replays (pull_request) Failing after 40s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m46s
CI / Python Lint & Test (pull_request) Successful in 1m10s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m39s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7m39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7m51s
CI / Canvas (Next.js) (pull_request) Successful in 9m16s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 10m17s
Phase 4 follow-up to template-claude-code PR #9 (2026-05-08 dev-tree wedge).

Pre-fix: applyRuntimeModelEnv unconditionally overwrote envVars["MODEL"]
with the MODEL_PROVIDER slug whenever payload.Model was empty (the restart
path). This silently wiped the operator's explicit per-persona MODEL
secret on every restart.

Symptom: dev-tree workspaces booted correctly on first /org/import (the
envVars map was populated directly from the persona env file with both
MODEL=MiniMax-M2.7-highspeed and MODEL_PROVIDER=minimax), then on the
next Restart the MODEL secret got clobbered to literal "minimax" — a
provider slug, not a valid model id — and the workspace template's
adapter failed to match any registry prefix, fell through to providers[0]
(anthropic-oauth), and wedged at SDK initialize.

Fix: resolution order in applyRuntimeModelEnv is now:
  1. payload.Model (caller passed the canvas-picked model id verbatim)
  2. envVars["MODEL"] (workspace_secret persisted from persona env)
  3. envVars["MODEL_PROVIDER"] (legacy canvas Save+Restart shape)

Tests
-----
TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved — locks in
the new resolution order with four cases:
  - MODEL secret wins over MODEL_PROVIDER slug (persona-env shape)
  - MODEL secret wins even when same as MODEL_PROVIDER
  - MODEL absent → fall back to MODEL_PROVIDER (legacy shape)
  - Both absent → no MODEL set (no-op)

Existing TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes
continues to pass — fix is strictly additive on the precedence chain.
2026-05-08 14:31:14 -07:00
9b5e89bb42 Merge pull request 'feat(org-import): add spawning:false field to skip workspace + descendants' (#135) from feat/org-import-spawning-false into main
Some checks are pending
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
publish-workspace-server-image / build-and-push (push) Waiting to run
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 21s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 23s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 21s
CI / Detect changes (push) Successful in 28s
Block internal-flavored paths / Block forbidden paths (push) Successful in 35s
Handlers Postgres Integration / detect-changes (push) Successful in 29s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 33s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 31s
E2E API Smoke Test / detect-changes (push) Successful in 1m5s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m1s
Harness Replays / detect-changes (push) Successful in 1m4s
CI / Shellcheck (E2E scripts) (push) Successful in 11s
CI / Canvas (Next.js) (push) Successful in 17s
CI / Canvas Deploy Reminder (push) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m15s
CI / Python Lint & Test (push) Successful in 1m56s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 2m27s
Harness Replays / Harness Replays (push) Successful in 3m0s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5m46s
CI / Platform (Go) (push) Successful in 8m23s
2026-05-08 21:20:56 +00:00
claude-ceo-assistant
b91da1ab77 feat(org-import): add spawning:false field to skip workspace + descendants
Some checks failed
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 11s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 11s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 11s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 24s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 36s
cascade-list-drift-gate / check (pull_request) Successful in 35s
E2E API Smoke Test / detect-changes (pull_request) Successful in 36s
CI / Detect changes (pull_request) Successful in 39s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 27s
branch-protection drift check / Branch protection drift (pull_request) Successful in 45s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 47s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 37s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 58s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 57s
Harness Replays / detect-changes (pull_request) Successful in 50s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 29s
CI / Python Lint & Test (pull_request) Successful in 33s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 56s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 30s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 2m5s
Harness Replays / Harness Replays (pull_request) Failing after 1m37s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4m54s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6m49s
CI / Platform (Go) (pull_request) Successful in 9m13s
CI / Canvas (Next.js) (pull_request) Failing after 11m30s
Lets a workspace declare that it (and its entire subtree) should be skipped
during /org/import. Pointer-typed `*bool` so we distinguish "explicitly
false" from "unset" (default = spawn).

## Use case

The dev-tree org template ships the full role taxonomy (Dev Lead with
Core Platform / Controlplane / App & Docs / Infra / SDK Leads, each with
their own engineering / QA / security / UI-UX children — 27 personas
total in a single import). Some setups need a smaller set:

- Local dev on a memory-constrained machine
- Demo / smoke runs that don't need the full org breathing
- Customer trials starting with leadership-only before fan-out

Pre-fix the only options were:
- Edit the canonical template (mutates shared state)
- Author a parallel slimmer template (duplicates structure)
- Manual workspace deprovision after full import (wasteful — already paid
  the docker pull / build cost)

`spawning: false` is the per-workspace knob that solves this without
touching the canonical template structure.

## Semantics

- Unset: workspace spawns (current behaviour, no migration)
- `spawning: true`: explicitly spawns (same as unset)
- `spawning: false`: workspace is skipped AND every descendant is
  skipped. The guard sits BEFORE any side effect in
  createWorkspaceTree — no DB row, no docker provision, no children
  recursion. A false-spawning subtree is genuinely a no-op except for
  the log line. countWorkspaces still counts the subtree (so /org/templates
  numbers reflect the full structure).
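
A minimal Go sketch of the shape described above — the `Spawning *bool` field
and the early guard in the recursive import. The type, field, and function
names (`WorkspaceSpec`, the `provision` callback) are illustrative assumptions,
not the actual handler code:

```go
package orgimport

import "log"

// WorkspaceSpec is a simplified stand-in for an org-template workspace node.
type WorkspaceSpec struct {
	Name     string           `yaml:"name"`
	FilesDir string           `yaml:"files_dir"`
	Spawning *bool            `yaml:"spawning,omitempty"` // nil or true => spawn; false => skip subtree
	Children []*WorkspaceSpec `yaml:"children,omitempty"`
}

func shouldSpawn(ws *WorkspaceSpec) bool {
	return ws.Spawning == nil || *ws.Spawning // unset and explicit true behave the same
}

// createWorkspaceTree recursively provisions a workspace and its children.
// The spawning guard sits before any side effect: no DB row, no docker
// provision, no recursion into children when the subtree is skipped.
func createWorkspaceTree(ws *WorkspaceSpec, provision func(*WorkspaceSpec) error) error {
	if !shouldSpawn(ws) {
		log.Printf("org-import: skipping %q and %d descendants (spawning: false)",
			ws.Name, countWorkspaces(ws)-1)
		return nil
	}
	if err := provision(ws); err != nil { // DB row + docker provision
		return err
	}
	for _, child := range ws.Children {
		if err := createWorkspaceTree(child, provision); err != nil {
			return err
		}
	}
	return nil
}

// countWorkspaces deliberately ignores the spawning flag so /org/templates
// still reports the full structure.
func countWorkspaces(ws *WorkspaceSpec) int {
	n := 1
	for _, child := range ws.Children {
		n += countWorkspaces(child)
	}
	return n
}
```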

## Stage A — verified

Verified with a local dev-only template that wraps teams/dev.yaml (Dev Lead)
with `children: []` cleared on the 5 sub-team yaml files, plus 3 floater
personas (Release Manager / Integration Tester / Fullstack Engineer).
/org/import returned 9 workspaces. Going forward, the same result is a
drop-in via `spawning: false` on each sub-tree root.

## Stage B — N/A

Pure additive feature on the org-template handler. No SaaS deploy chain
implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:20:14 -07:00
aea6109602 Merge pull request 'fix(org-import): use ws.FilesDir as persona-dir lookup + docker-cli-buildx in dev image' (#134) from fix/org-import-persona-env-files-dir into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 4s
Block internal-flavored paths / Block forbidden paths (push) Successful in 29s
E2E API Smoke Test / detect-changes (push) Successful in 21s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 22s
Handlers Postgres Integration / detect-changes (push) Successful in 22s
Harness Replays / detect-changes (push) Successful in 27s
CI / Detect changes (push) Successful in 48s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 24s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 27s
CI / Canvas (Next.js) (push) Successful in 15s
CI / Shellcheck (E2E scripts) (push) Successful in 12s
CI / Canvas Deploy Reminder (push) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (push) Failing after 1m12s
Harness Replays / Harness Replays (push) Failing after 36s
CI / Platform (Go) (push) Failing after 1m3s
CI / Python Lint & Test (push) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 1m24s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
publish-workspace-server-image / build-and-push (push) Successful in 4m22s
2026-05-08 20:51:47 +00:00
claude-ceo-assistant
c3596d6271 fix(org-import): use ws.FilesDir as persona-dir lookup, add docker-cli-buildx to dev image
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 20s
branch-protection drift check / Branch protection drift (pull_request) Successful in 23s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 23s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 28s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 28s
E2E API Smoke Test / detect-changes (pull_request) Successful in 30s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 24s
Harness Replays / detect-changes (pull_request) Successful in 25s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 27s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
Harness Replays / Harness Replays (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 52s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 13s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 2m5s
CI / Platform (Go) (pull_request) Failing after 1m46s
CI / Canvas (Next.js) (pull_request) Failing after 1m49s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m16s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
## org_import.go — persona env injection root-cause fix

The Phase-3 fix from earlier today (`feedback/per-agent-gitea-identity-default`)
introduced loadPersonaEnvFile to inject persona-specific creds into
workspace_secrets on /org/import. It passed `ws.Role` as the persona-dir
lookup key, but in our dev-tree org.yaml shape `role:` carries the
multi-line descriptive text the agent reads from its prompt
("Engineering planning and team coordination — leads Core Platform,
Controlplane, ..."), while `files_dir:` holds the short slug
(`core-lead`, `dev-lead`, etc.) matching
`~/.molecule-ai/personas/<files_dir>/env`.

isSafeRoleName silently rejected the multi-word role text → no persona
env loaded → every imported workspace booted with zero
workspace_secrets rows → no ANTHROPIC / CLAUDE_CODE / MINIMAX auth in
the container env → claude_agent_sdk wedged on `query.initialize()`
with a 60s control-request timeout.

After the fix, /org/import on the dev tree (27 personas) populates
8 workspace_secrets per workspace (Gitea identity + MODEL/MODEL_PROVIDER
+ provider-specific token), 5 of 6 leads boot online, and the
remaining wedges trace to a separate runtime-template-repo bug
(workspace-template-claude-code's claude_sdk_executor.py doesn't
dispatch on MODEL_PROVIDER=minimax — filed separately).
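
A hedged Go sketch of the fix's shape — keying the persona-dir lookup on the
`files_dir` slug instead of the free-form `role:` text. `loadPersonaEnvFile`
and `isSafeRoleName` come from the description above; the slug regex, path
handling, and error strings are assumptions, not the actual org_import.go code:

```go
package orgimport

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// isSafeRoleName accepts only short slugs like "core-lead" or "dev-lead";
// multi-word descriptive role text is rejected, which is exactly why keying
// the lookup on ws.Role loaded nothing.
var safeSlug = regexp.MustCompile(`^[a-z0-9][a-z0-9_-]*$`)

func isSafeRoleName(name string) bool {
	return safeSlug.MatchString(name)
}

// loadPersonaEnvFile reads <personaRoot>/<filesDir>/env. The caller now
// passes ws.FilesDir (the slug), not ws.Role (the prompt text).
func loadPersonaEnvFile(personaRoot, filesDir string) ([]byte, error) {
	if !isSafeRoleName(filesDir) {
		return nil, fmt.Errorf("unsafe persona dir %q", filesDir)
	}
	return os.ReadFile(filepath.Join(personaRoot, filesDir, "env"))
}
```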

## Dockerfile.dev — docker-cli + docker-cli-buildx

Without these, every claude-code/tier-2 workspace POST fails fast:
- without docker-cli, the exec fails with `exec: "docker": executable file not found`
- with docker-cli alone (no buildx), `docker build` fails with
  `ERROR: BuildKit is enabled but the buildx component is missing or broken`

Both packages are now installed in the dev image; verified with
`docker exec molecule-core-platform-1 docker buildx version`.

## Stage A verified

Local /org/import dev-only path: 27 workspaces created, all 27 receive
persona env injection (8 secrets each — Gitea identity + provider creds).
Lead workspaces (claude-code-OAuth tier) boot online.

## Stage B — N/A

Local-dev-only path (docker-compose.dev.yml + dev image). Tenant EC2
provisioning uses Dockerfile.tenant (untouched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:50:46 -07:00
2fa79ea462 Merge pull request 'chore(ci): document #192 root cause — workspace-template repos public per OSS-first' (#133) from chore/192-retrigger-harness-replays-after-public-flip into main
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 1s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 1s
CI / Detect changes (push) Successful in 9s
E2E API Smoke Test / detect-changes (push) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 10s
Handlers Postgres Integration / detect-changes (push) Successful in 10s
Harness Replays / detect-changes (push) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 10s
CI / Platform (Go) (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 4s
CI / Shellcheck (E2E scripts) (push) Successful in 8s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 36s
Harness Replays / Harness Replays (push) Successful in 52s
publish-workspace-server-image / build-and-push (push) Successful in 2m2s
2026-05-08 19:12:54 +00:00
claude-ceo-assistant
15935143c8 chore(manifest): drop reno-stars + 5 org-templates flipped public; document OSS-surface contract
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
cascade-list-drift-gate / check (pull_request) Successful in 7s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 7s
branch-protection drift check / Branch protection drift (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 36s
Harness Replays / Harness Replays (pull_request) Successful in 49s
CI / Canvas (Next.js) (pull_request) Successful in 1m31s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Follow-up to the workspace-template visibility flip in 558e4fee. After
flipping the 5 private workspace-templates public (#192 root cause),
the harness-replays clone failure moved one step deeper, to the
org-templates list, where 6 of 7 were also private. Hongming-confirmed
flip plan:

- 5 of 6 (molecule-dev, free-beats-all, medo-smoke, molecule-worker-gemini,
  ux-ab-lab) — flipped public per `feedback_oss_first_repo_visibility_default`.
  These are unambiguously OSS-template-shape: generic README, no
  customer-shaped names, no creds in content.
- 1 of 6 (reno-stars) — name itself is customer-shaped (would expose
  customer/tenant identity). Kept private; removed from manifest.json
  per Hongming. Will be handled at provision-time via the per-tenant
  credential resolver designed in internal#102 (Layer-3 RFC).

Documents the OSS-surface contract in two places:
- manifest.json _comment: every entry MUST be public; Layer-3 lives elsewhere
- clone-manifest.sh comment block: rationale + the explicit ci-readonly
  team-grant escape hatch (review-gated, not default).

Closes the second clone-fail layer of #192. Combined with 558e4fee +
the workspace-template visibility flips, the Pre-clone manifest deps
step should now succeed anonymously for the full registered set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:58:09 -07:00
claude-ceo-assistant
558e4fee48 chore(ci): document #192 root cause — workspace-template repos public per OSS-first
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
branch-protection drift check / Branch protection drift (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 5s
Harness Replays / Harness Replays (pull_request) Failing after 7s
CI / Canvas (Next.js) (pull_request) Successful in 1m37s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
5 of 9 workspace-template repos (openclaw, codex, crewai, deepagents,
gemini-cli) had been marked private with no team grant for the AUTO_SYNC_TOKEN
bearer (devops-engineer persona). The Pre-clone manifest deps step 404'd on
the first private repo it encountered, failing every Harness Replays run.

Resolution path taken:
1. Flipped the 5 to public per `feedback_oss_first_repo_visibility_default`
   — runtime/template/plugin repos default public; that's what makes them
   OSS surface.
2. Scoped existing `ci-readonly` org team to legitimately-internal repos
   only (compliance docs, RFCs-in-flight). Workspace templates removed
   from it.
3. Filed internal#102 RFC for Layer-3 (customer-owned + marketplace
   third-party private repos) — that's a different shape entirely;
   needs per-tenant credential-resolver, not org-team grants.

This commit is a documentation-only touch on the workflow file to (a)
record the root cause inline next to the existing pre-clone-fail
narrative, and (b) trigger a fresh Harness Replays run that should now
pass the clone step.

Closes #192.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:50:55 -07:00
8e4169cfac Merge pull request 'feat(local-dev): containerize platform + canvas stack via docker-compose' (#131) from feat/126-containerize-local-platform-stack into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 3s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 2s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
CI / Detect changes (push) Successful in 8s
E2E API Smoke Test / detect-changes (push) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
publish-workspace-server-image / build-and-push (push) Failing after 8s
Handlers Postgres Integration / detect-changes (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
Harness Replays / detect-changes (push) Successful in 9s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3s
CI / Python Lint & Test (push) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 4s
Harness Replays / Harness Replays (push) Failing after 7s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 1m35s
CI / Canvas (Next.js) (push) Successful in 2m21s
CI / Canvas Deploy Reminder (push) Failing after 1s
CI / Platform (Go) (push) Successful in 2m50s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 4m24s
2026-05-08 18:38:32 +00:00
bce60f1b22 Merge pull request 'fix(canvas): consolidate platform-auth headers via shared helper (#178)' (#54) from fix/178-canvas-shared-auth-headers into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 0s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
CI / Detect changes (push) Successful in 8s
E2E API Smoke Test / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 8s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
publish-workspace-server-image / build-and-push (push) Failing after 7s
Harness Replays / detect-changes (push) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
CI / Platform (Go) (push) Successful in 4s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 2s
Harness Replays / Harness Replays (push) Failing after 5s
CI / Canvas (Next.js) (push) Successful in 1m32s
CI / Canvas Deploy Reminder (push) Failing after 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Has been cancelled
2026-05-08 18:35:58 +00:00
c6f41198f7 Merge pull request 'chore(canary): workflow_dispatch input keep_on_failure for log capture' (#132) from chore/canary-keep-on-failure-input into main
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 1s
Block internal-flavored paths / Block forbidden paths (push) Successful in 4s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 4s
CI / Detect changes (push) Successful in 7s
Handlers Postgres Integration / detect-changes (push) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
E2E API Smoke Test / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 7s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
CI / Platform (Go) (push) Successful in 3s
CI / Python Lint & Test (push) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 3s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Canvas Deploy Reminder (push) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 4s
2026-05-08 17:59:10 +00:00
dev-lead
5c0c15eb4f chore(canary): workflow_dispatch input keep_on_failure for log capture
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 4s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 10s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 2s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 11s
branch-protection drift check / Branch protection drift (pull_request) Successful in 14s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Investigating molecule-core#129 failure mode #1 (claude-code "Agent
error (Exception)") requires the workspace's docker logs to find the
actual exception. The canary tears down the tenant on every failure,
so the workspace container is destroyed before anyone can SSM in.

Add a workflow_dispatch input `keep_on_failure: bool` (default false).
When true, sets `E2E_KEEP_ORG=1` for the canary script — its existing
debug path skips teardown, leaving the tenant + EC2 + CF tunnel + DNS
alive. Operator can then SSM into the workspace EC2 (via the same
flow as recover-tunnels.py) and capture `docker logs` from the
claude-code container.

Cron-triggered runs never set the input (it only exists on dispatch),
so scheduled canaries always tear down — no risk of an unattended cost
leak.

Operator workflow:
  1. Dispatch canary-staging.yml with keep_on_failure=true
  2. Watch CI; on failure (likely, given the 38h chronic red),
     note the SLUG / TENANT_URL printed at step 1/11
  3. SSM exec into the workspace EC2 (us-east-2) and run
     `docker logs <claude-code-container>` to find the actual
     exception traceback
  4. Manually delete via DELETE /cp/admin/tenants/<slug> when done
     (the script logs this reminder on E2E_KEEP_ORG=1 path)

Refs: molecule-core#129 (canary investigation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:58:19 -07:00
claude-ceo-assistant
7eda8f510f feat(local-dev): containerize platform + canvas stack via docker-compose (closes #126)
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 0s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
Harness Replays / Harness Replays (pull_request) Failing after 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 51s
CI / Canvas (Next.js) (pull_request) Successful in 2m5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 2m31s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4m22s
Replaces the legacy nohup `go run ./cmd/server` setup with a fully
containerized local stack: postgres + redis + platform + canvas, all
with `restart: unless-stopped` so they survive Mac sleep/wake and
Docker Desktop daemon restarts.

## Changes

- **docker-compose.yml**
  - `restart: unless-stopped` on platform/postgres/redis
  - `BIND_ADDR=0.0.0.0` for platform — the dev-mode-fail-open default
    of 127.0.0.1 (PR #7) made the host unable to reach the container
    even with port mapping. Container netns is already isolated, so
    binding all interfaces inside is safe.
  - Healthchecks switched from `wget --spider` (HEAD → 404 forever
    because /health is GET-only) to `wget -qO /dev/null` (GET). The same
    regression existed on canvas; both are fixed (see the sketch after
    this list).

- **workspace-server/Dockerfile.dev**
  - `CGO_ENABLED=1` → `0` to match prod Dockerfile + Dockerfile.tenant.
    Without this, the alpine dev image fails with "gcc: not found"
    because workspace-server has no actual cgo deps but the env was
    forcing the cgo build path. Closes a divergence introduced in
    9d50a6da (today's air hot-reload PR).

- **canvas/Dockerfile**
  - `npm install` → `npm ci --include=optional` for lockfile-exact
    installs that include platform-specific @tailwindcss/oxide native
    binaries. Without these, `next build` fails with "Cannot read
    properties of undefined (reading 'All')" on the
    `@import "tailwindcss"` directive.

- **canvas/.dockerignore** (new)
  - Excludes `node_modules` and `.next` so the Dockerfile's
    `COPY . .` step doesn't clobber the freshly-installed container
    node_modules with the host's (potentially stale or wrong-arch)
    copy. This was the actual root cause of the canvas build break.

- **workspace-server/.gitignore**
  - Adds `/tmp/` for air's live-reload build cache.
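
On the healthcheck point above: the description says /health is GET-only, so
a `wget --spider` probe (which sends HEAD) never succeeds. A minimal Go sketch
of a handler with that behaviour, assuming the route rejects non-GET methods
with 404 as the description implies (illustrative only, not the actual
workspace-server route):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		// GET-only: a wget --spider probe sends HEAD and gets 404 forever,
		// so the container never reports healthy; a plain GET succeeds.
		if r.Method != http.MethodGet {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```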

## Stage A verified

```
container          status                    restart
postgres-1         Up (healthy)              unless-stopped
redis-1            Up (healthy)              unless-stopped
platform-1         Up (healthy, air-mode)    unless-stopped
canvas-1           Up (healthy)              unless-stopped

GET :8080/health  → 200
GET :3000/        → 200
DB preserved:     407 workspace rows + 5 named personas
Persona mount:    28 dirs at /etc/molecule-bootstrap/personas
```

## Stage B — N/A

This is local-dev infrastructure only. None of these files ship to
SaaS tenants — production EC2s use `Dockerfile.tenant` + `ec2.go`
user-data, not docker-compose.

## Out of scope

- The decorative-but-broken `wget --spider` healthcheck has presumably
  also been silently 404'ing on prod tenants. Ship a follow-up to
  audit + fix the prod path; not done here to keep the PR scoped.
- Docker Desktop "Start at login" is a per-machine GUI setting that
  must be toggled manually (Settings → General).
- The legacy heartbeat-all.sh that pinged 5 persona workspaces from
  the host has been deleted (~/.molecule-ai/heartbeat-all.sh).
  Per Hongming: each workspace is responsible for its own heartbeat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:53:39 -07:00
44bb35f2a8 Merge pull request 'fix(ci): canary alerting — drop Gitea-incompatible actions API call' (#130) from fix/canary-staging-gitea-compat-alerting into main
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 0s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 2s
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 5s
CI / Detect changes (push) Successful in 7s
E2E API Smoke Test / detect-changes (push) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 7s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
CI / Platform (Go) (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 3s
CI / Python Lint & Test (push) Successful in 3s
CI / Canvas Deploy Reminder (push) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 26s
2026-05-08 17:52:48 +00:00
dev-lead
42ff6be15c fix(ci): canary alerting — drop Gitea-incompatible actions API call
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
branch-protection drift check / Branch protection drift (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
The "Open issue on failure" step was failing on every canary run
because Gitea 1.22.6 doesn't expose /api/v1/actions endpoints
(per memory reference_gitea_actions_log_fetch). The threshold check
called github.rest.actions.listWorkflowRuns() to count consecutive
prior failures and gate issue creation behind 3 reds — that call
ALWAYS 404'd on Gitea, breaking the entire alerting step.

Net effect: the canary's own self-alerting was broken, so the
underlying staging regression went unflagged for 38h+
(2026-05-07 02:30 UTC → 2026-05-08 17:34 UTC, every cron tick red,
zero issues filed).

Fix: drop the consecutive-failures threshold entirely. File a
sticky issue on the FIRST failure; comment-on-existing handles
deduplication for subsequent failures. The auto-close-on-success
step is unchanged.

Why not a Gitea-compatible threshold (e.g., walk recent commit
statuses): comment-on-existing already gives ops a single
accumulating issue per regression streak. The threshold's purpose
was to avoid spamming on transient flakes — but with sticky issue
+ auto-close-on-green, transient flakes get one issue + one quick
close, which is fine signal. Filing on first failure is also
better UX: catches the regression in 30 min instead of 90 min.

Also: rewrote runURL from hardcoded https://github.com/... to
context.serverUrl so the link actually points at Gitea
(https://git.moleculesai.app) — was always broken on Gitea but
nobody noticed because the issue-filing step itself was broken.

Net: 21 insertions, 40 deletions. Removes WORKFLOW_PATH +
CONSECUTIVE_THRESHOLD env vars (no longer needed).

Tracked in: molecule-core#129 (failure mode 3 of 3)
Verification: yaml syntax-valid; no remaining github.rest.actions.*
calls; only github.rest.issues.* (all Gitea-supported per
memory feedback_persona_token_v2_scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:52:09 -07:00
32773fd566 Merge pull request 'feat(local-dev): bind-mount ~/.molecule-ai/personas into platform container' (#127) from feat/persona-bind-mount-local-dev into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 3s
Block internal-flavored paths / Block forbidden paths (push) Successful in 10s
CI / Detect changes (push) Successful in 12s
E2E API Smoke Test / detect-changes (push) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 10s
Handlers Postgres Integration / detect-changes (push) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 12s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
CI / Canvas (Next.js) (push) Successful in 5s
CI / Platform (Go) (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 4s
CI / Canvas Deploy Reminder (push) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 6s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 4m49s
2026-05-08 16:53:05 +00:00
claude-ceo-assistant
d72f21da09 feat(local-dev): bind-mount ~/.molecule-ai/personas into platform container
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Closes core#242 LOCAL surface. The PROD surface (CP user-data fetching
persona env files into tenant EC2's /etc/molecule-bootstrap/personas
via Secrets Manager) is filed as a follow-up.

WHAT THIS ADDS
  Bind-mount on the platform service in docker-compose.yml:
    ${MOLECULE_PERSONA_ROOT_HOST:-${HOME}/.molecule-ai/personas}
      → /etc/molecule-bootstrap/personas (read-only)

  Default source = ${HOME}/.molecule-ai/personas (the operator-host-mirrored
  local dir populated by today's persona rotation work). Override via
  MOLECULE_PERSONA_ROOT_HOST when running on a machine with a different
  layout (CI runners, etc.).

WHY READ-ONLY
  workspace-server only reads persona env files; never writes back. The
  read-only mount enforces that contract — a hostile plugin install path
  can't tamper with the persona credentials it's about to consume.

WHY THIS PATH MATCHES PROD
  /etc/molecule-bootstrap/personas is the same in-container path the
  prod tenant EC2 will use. Same code path (org_import.go::loadPersonaEnvFile)
  reads the same file regardless of mode — local-dev parity with prod
  per feedback_local_must_mimic_production.

STAGE A VERIFICATION
  - docker compose config: resolves to /Users/hongming/.molecule-ai/personas
    correctly (28 persona dirs visible at source path)
  - Persona env file shape verified: dev-lead's env contains GITEA_USER,
    GITEA_USER_EMAIL, GITEA_TOKEN_SCOPES, GITEA_SSH_KEY_PATH,
    MODEL_PROVIDER=claude-code, MODEL=opus (lead tier matches Hongming's
    2026-05-08 mapping)
  - Full handler test suite green (TestLoadPersonaEnvFile_HappyPath +
    7 sibling tests pass; rejection tests still catch path traversal)
  - Build clean

STAGE B SKIPPED (with justification per § Skip conditions)
  This change is config-only (docker-compose.yml volume addition). The
  prod tenant EC2s do NOT use docker-compose.yml — they use CP user-data
  + ec2.go's docker run script. So this PR has no prod blast radius.
  Stage B (staging tenant probe) would be checking 'is the platform
  using the new compose mount' on a SaaS tenant — and SaaS tenants
  don't run docker compose. The actual prod-surface change is the
  follow-up issue.

PROD SURFACE — FOLLOW-UP FILED
  Tenant EC2 user-data needs to fetch persona env files from operator
  host (or AWS Secrets Manager per the established
  feedback_unified_credentials_file pattern) and stage them at
  /etc/molecule-bootstrap/personas inside the workspace-server container.
  Touches molecule-controlplane/internal/provisioner/ec2.go user-data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:52:43 -07:00
cc28cc6607 Merge pull request 'feat(workspaces): update_tier column for canary vs production fan-out' (#124) from feat/canary-tier-filter into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 19s
Block internal-flavored paths / Block forbidden paths (push) Successful in 35s
CI / Detect changes (push) Successful in 34s
E2E API Smoke Test / detect-changes (push) Successful in 17s
Handlers Postgres Integration / detect-changes (push) Successful in 25s
publish-workspace-server-image / build-and-push (push) Failing after 25s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 26s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 29s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 28s
Harness Replays / detect-changes (push) Successful in 29s
CI / Canvas (Next.js) (push) Successful in 10s
CI / Shellcheck (E2E scripts) (push) Successful in 8s
CI / Python Lint & Test (push) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 8s
Harness Replays / Harness Replays (push) Failing after 22s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 2m35s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7m31s
CI / Platform (Go) (push) Successful in 13m59s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 3m56s
2026-05-08 15:55:42 +00:00
claude-ceo-assistant
120b3a25aa feat(workspaces): update_tier column for canary vs production fan-out
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 4s
Check migration collisions / Migration version collision check (pull_request) Successful in 29s
CI / Detect changes (pull_request) Successful in 1m3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 39s
E2E API Smoke Test / detect-changes (pull_request) Successful in 47s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 34s
Harness Replays / detect-changes (pull_request) Successful in 38s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 38s
CI / Canvas (Next.js) (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 15s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 35s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 21s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6m49s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7m55s
CI / Platform (Go) (pull_request) Successful in 14m6s
Closes core#115 partial. Schema-only change; the apply-endpoint filter
logic that reads this column lands with core#123 (drift detector +
queue + apply endpoint, the deferred follow-up of core#113).

Default 'production' so existing customers (Reno-Stars + any future
tenant) are default-safe. Synthetic dogfooding workspaces opt INTO
'canary' explicitly.

CHECK constraint pins the closed value set ('canary' | 'production') —
the apply endpoint's filter relies on the database to reject anything
else, so a future operator typo in PATCH /workspaces/:id ({update_tier:
'canery'}) returns a constraint violation, not silent fan-out to
nobody.

Partial index on canary rows since the apply-endpoint query path
('apply this update only to canary tier first') hits canary much more
often than production, and the production set is the much larger
default.
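
A hedged Go sketch of how the deferred apply-endpoint code (core#123) might
consult the column, and how a typo'd tier value in a PATCH surfaces as a
database error rather than silent fan-out to nobody. None of this handler
code exists yet; the function names, SQL, and Postgres-style placeholders
are illustrative assumptions:

```go
package workspaces

import (
	"context"
	"database/sql"
)

// listWorkspacesByTier is the kind of query the future apply endpoint could
// run; the partial index on update_tier = 'canary' serves exactly this
// access pattern.
func listWorkspacesByTier(ctx context.Context, db *sql.DB, tier string) ([]string, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT id FROM workspaces WHERE update_tier = $1`, tier)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}

// setUpdateTier shows the PATCH path: a typo like "canery" violates the
// CHECK constraint and returns an error here instead of silently matching
// zero rows at fan-out time.
func setUpdateTier(ctx context.Context, db *sql.DB, workspaceID, tier string) error {
	_, err := db.ExecContext(ctx,
		`UPDATE workspaces SET update_tier = $1 WHERE id = $2`, tier, workspaceID)
	return err
}
```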

WHAT THIS DOES NOT DO (lands with core#123)
  - PATCH endpoint to flip a workspace to canary
  - The apply endpoint that consults the column
  - Tests that exercise canary-vs-production fan-out

Schema-only foundation; same pattern as core#113 (workspace_plugins).

PHASE 4 SELF-REVIEW
  Correctness: No finding — IF NOT EXISTS guards, DEFAULT clause means
    existing rows get 'production' on migration apply.
  Readability: No finding — comment block documents the tier semantics
    + the deferral to core#123.
  Architecture: No finding — additive ALTER, partial index for the
    expected access pattern.
  Security: No finding — no code path; column constraint reduces blast
    radius of bad PATCH input.
  Performance: No finding — partial index minimizes write amplification
    on the production-default rows.

REFS
  core#115 — this issue
  core#123 — apply endpoint follow-up (will exercise this column)
  core#113 — version subscription DB foundation (sibling pattern)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:55:19 -07:00
b7f3b270a3 Merge pull request 'feat(plugins): workspace_plugins tracking table (version-subscription foundation)' (#122) from feat/plugin-version-subscription into main
Some checks failed
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Harness Replays / Harness Replays (push) Blocked by required conditions
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 8s
Block internal-flavored paths / Block forbidden paths (push) Successful in 28s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 15s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 16s
CI / Detect changes (push) Successful in 27s
E2E API Smoke Test / detect-changes (push) Successful in 20s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 20s
Handlers Postgres Integration / detect-changes (push) Successful in 19s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 31s
publish-workspace-server-image / build-and-push (push) Failing after 31s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 33s
Harness Replays / detect-changes (push) Successful in 33s
CI / Python Lint & Test (push) Successful in 12s
CI / Canvas (Next.js) (push) Has been cancelled
CI / Shellcheck (E2E scripts) (push) Has been cancelled
CI / Canvas Deploy Reminder (push) Has been skipped
CI / Platform (Go) (push) Failing after 14m35s
2026-05-08 15:53:42 +00:00
claude-ceo-assistant
72b0d4b1ab feat(plugins): workspace_plugins tracking table — version-subscription foundation
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 14s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 35s
CI / Detect changes (pull_request) Successful in 43s
Check migration collisions / Migration version collision check (pull_request) Successful in 44s
E2E API Smoke Test / detect-changes (pull_request) Successful in 31s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 28s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 27s
Harness Replays / detect-changes (pull_request) Successful in 33s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 30s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 12s
CI / Python Lint & Test (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 29s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m20s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7m1s
CI / Platform (Go) (pull_request) Successful in 14m52s
Closes core#113 partial. Adds the DB foundation for the
version-subscription model. Drift detection + queue + admin apply
endpoint are follow-up scope (separate PR; filed as a new issue).

WHY THIS PR ONLY GETS US PART-WAY
  Plugin install state today is filesystem-only — '/configs/plugins/<name>/'
  inside the container. There's no DB record of 'plugin X installed at
  workspace W from source S, tracking ref T'. That makes drift detection
  impossible: nothing to compare upstream tags against.

  This PR adds the table + the install-endpoint hook that writes to it.
  With baseline tags now on every plugin (post internal#92), the table
  starts collecting tracked-ref values immediately on the next install.
  The actual drift-check job + queue + apply endpoint layer on top.

WHAT THIS ADDS
  workspace_plugins table:
    workspace_id   FK → workspaces(id) ON DELETE CASCADE
    plugin_name    canonical name from plugin.yaml
    source_raw     full source URL the install used
    tracked_ref    'none' | 'tag:vX.Y.Z' | 'tag:latest' | 'sha:<full>'
    installed_at, updated_at

  installRequest gains an optional 'track' field (defaults to 'none').
  The install handler upserts the workspace_plugins row after delivery
  succeeds. A DB write failure is logged but doesn't fail the install
  (the plugin IS in the container; surfacing a 500 would mislead the caller).

  validateTrackedRef enforces the closed set of accepted shapes:
    'none' | 'tag:<non-empty>' | 'sha:<non-empty>'
  Bare values like 'latest' / 'main' / version-strings without
  prefix are rejected — the drift detector keys on prefix to know
  what kind of resolution to do.
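
A minimal Go sketch of the closed-set validation described above. The
function name comes from the commit message; the exact error wording is
illustrative:

```go
package plugins

import (
	"fmt"
	"strings"
)

// validateTrackedRef accepts only 'none', 'tag:<non-empty>' or 'sha:<non-empty>'.
// Bare values like "latest" or "main" are rejected so the drift detector can
// key on the prefix to decide how to resolve the ref upstream.
func validateTrackedRef(ref string) error {
	switch {
	case ref == "none":
		return nil
	case strings.HasPrefix(ref, "tag:") && len(ref) > len("tag:"):
		return nil
	case strings.HasPrefix(ref, "sha:") && len(ref) > len("sha:"):
		return nil
	default:
		return fmt.Errorf("invalid tracked_ref %q: want 'none', 'tag:<ref>' or 'sha:<ref>'", ref)
	}
}
```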

WHAT THIS DOES NOT ADD (filed separately)
  - Drift detector job (cron / on-demand) that scans
    'WHERE tracked_ref != none' rows and queues updates on upstream drift
  - plugin_update_queue table (separate migration once detector lands)
  - GET /admin/plugin-updates-pending and POST .../apply endpoints
  - Tier-aware apply (core#115 — composes here)

PHASE 4 SELF-REVIEW (FIVE-AXIS)
  Correctness: No finding — install endpoint behavior unchanged for
    callers that don't pass 'track'. DB write is best-effort + logged
    on failure. validateTrackedRef rejects ambiguous bare strings.
  Readability: No finding — separate file plugins_tracking.go isolates
    the new concern; install handler delta is a single 4-line block.
  Architecture: No finding — additive table; existing schema untouched.
    Migration 20260508160000_* uses the timestamp-prefixed convention.
  Security: No finding — INSERT params via query placeholders (no string
    interpolation). validateTrackedRef rejects unexpected shapes before
    the column constraint would.
  Performance: No finding — one extra ExecContext per install. Install
    is already seconds-scale (network fetch + tar + docker exec); rounds
    to noise.

TESTS (1 new, all green)
  TestValidateTrackedRef — pin closed set + structural validators

REFS
  core#113 — this issue (foundation only; drift+queue+apply = follow-up)
  internal#92, internal#93 — plugin/template baseline tags (now exists for tracking)
  core#114 — atomic install (this PR composes — no atomicity regression)
  core#115 — canary tier filter (will key off the same DB foundation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:52:35 -07:00
f78d844960 Merge pull request 'feat(plugins): hot-reload classifier — skip restart on SKILL-content-only updates' (#121) from feat/plugin-hot-reload-classifier into main
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 3s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 3s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 5s
Block internal-flavored paths / Block forbidden paths (push) Successful in 18s
CI / Detect changes (push) Successful in 21s
E2E API Smoke Test / detect-changes (push) Successful in 15s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 15s
Handlers Postgres Integration / detect-changes (push) Successful in 17s
Harness Replays / detect-changes (push) Successful in 18s
publish-workspace-server-image / build-and-push (push) Failing after 20s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 19s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 21s
CI / Shellcheck (E2E scripts) (push) Successful in 9s
CI / Canvas (Next.js) (push) Successful in 10s
CI / Canvas Deploy Reminder (push) Has been skipped
CI / Python Lint & Test (push) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
Harness Replays / Harness Replays (push) Failing after 19s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 1m15s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m20s
CI / Platform (Go) (push) Failing after 4m59s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 3m52s
2026-05-08 15:26:32 +00:00
3a4b62a52a Merge pull request 'chore(workflows): delete obsolete promote/sync workflows (Phase 3C of internal#81)' (#119) from chore/trunk-based-delete-obsolete-workflows into main
Some checks are pending
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 6s
Block internal-flavored paths / Block forbidden paths (push) Successful in 21s
E2E API Smoke Test / detect-changes (push) Successful in 18s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 20s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 14s
CI / Detect changes (push) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 17s
Handlers Postgres Integration / detect-changes (push) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 16s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
CI / Platform (Go) (push) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7s
CI / Python Lint & Test (push) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 11s
2026-05-08 15:26:00 +00:00
b4eab9cef2 Merge branch 'main' into chore/trunk-based-delete-obsolete-workflows
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 7s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 24s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 24s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 18s
branch-protection drift check / Branch protection drift (pull_request) Successful in 25s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 23s
E2E API Smoke Test / detect-changes (pull_request) Successful in 29s
CI / Detect changes (pull_request) Successful in 35s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 24s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 22s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 12s
CI / Canvas (Next.js) (pull_request) Successful in 12s
CI / Platform (Go) (pull_request) Successful in 13s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
2026-05-08 15:24:55 +00:00
48a24e6b3e Merge branch 'main' into chore/trunk-based-delete-obsolete-workflows
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 9s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 4s
branch-protection drift check / Branch protection drift (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m4s
2026-05-08 15:23:05 +00:00
08e8d325e2 chore(workflows): delete obsolete promote/sync workflows (Phase 3C of internal#81)
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 3s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 4s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 9s
branch-protection drift check / Branch protection drift (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 14s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 25s
CI / Platform (Go) (pull_request) Successful in 7m2s
Trunk-based migration final cleanup for molecule-core. The 6 workflows
deleted here all existed to manage the staging↔main branch dance that
trunk-based makes obsolete:

  - auto-promote-staging.yml         fast-forward staging→main on green
  - auto-promote-on-e2e.yml          alt promote path on E2E green
  - auto-promote-stale-alarm.yml     alarm if staging promotion stalls
  - auto-sync-main-to-staging.yml    sync main→staging after UI merges
  - auto-sync-canary.yml             dry-run probe of the auto-sync
                                     token+push path
  - retarget-main-to-staging.yml     rebase open PRs onto staging

After Phase 3A (PR #108 promoted 5 staging-only feature PRs to main)
and Phase 3B (PR #109 dropped staging-branch triggers from the 4 e2e
workflows), main is the only branch the CI cares about. None of the
above workflows have anything left to do; they're 1977 lines of dead
code.

Rollback: `git revert` this commit to restore the workflows. They still
work mechanically; trunk-based just doesn't need them.

The `staging` branch on the remote is deleted in a follow-up step
(`git push origin --delete staging`) after this PR merges, so reviewers
can confirm CI runs cleanly on the new shape before the ref disappears.
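
A quick spot-check before deleting the ref (illustrative; run from the
repo root) lists any workflow lines that still mention the staging
branch, so reviewers can see nothing load-bearing is left behind:

  grep -rn "staging" .gitea/workflows/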

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:18:35 +00:00
b398667fce Merge branch 'main' into fix/178-canvas-shared-auth-headers
All checks were successful
Harness Replays / Harness Replays (pull_request) Successful in 2m8s
CI / Canvas (Next.js) (pull_request) Successful in 5m49s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6m19s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
CI / Detect changes (pull_request) Successful in 16s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 18s
Harness Replays / detect-changes (pull_request) Successful in 19s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
2026-05-08 02:46:41 +00:00
5c62f172f0 Merge branch 'main' into fix/178-canvas-shared-auth-headers
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 9s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 9s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 9s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 7s
CI / Detect changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Harness Replays / detect-changes (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Platform (Go) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
Harness Replays / Harness Replays (pull_request) Failing after 1m59s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5m54s
CI / Canvas (Next.js) (pull_request) Failing after 8m34s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-08 01:27:46 +00:00
7f86a245bf Merge branch 'main' into fix/178-canvas-shared-auth-headers
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 17s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 15s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 16s
Harness Replays / Harness Replays (pull_request) Failing after 1m13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6m40s
CI / Canvas (Next.js) (pull_request) Failing after 8m13s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-08 00:54:16 +00:00
9c82b2a61c Merge branch 'main' into fix/178-canvas-shared-auth-headers
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 8s
CI / Detect changes (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Harness Replays / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 9s
Harness Replays / Harness Replays (pull_request) Failing after 1m20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6m26s
CI / Canvas (Next.js) (pull_request) Failing after 8m35s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-08 00:20:46 +00:00
e4b1248f47 Merge branch 'main' into fix/178-canvas-shared-auth-headers
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 4s
CI / Detect changes (pull_request) Successful in 12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Harness Replays / Harness Replays (pull_request) Failing after 37s
CI / Canvas (Next.js) (pull_request) Failing after 2m54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4m25s
2026-05-07 22:24:44 +00:00
Hongming Wang
501d07b0f2 fix(canvas): consolidate platform-auth headers via shared helper (#178)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
Harness Replays / Harness Replays (pull_request) Failing after 36s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m22s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m22s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m23s
CI / Canvas (Next.js) (pull_request) Failing after 1m39s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3m55s
Closes the post-Task-#176 self-review gap: the bearer-token + tenant-
slug header construction was duplicated across 7 raw-fetch callsites
in the canvas (lib/api.ts request(), uploads.ts × 2, and 5 Attachment*
components). Each callsite read NEXT_PUBLIC_ADMIN_TOKEN, attached
Authorization: Bearer manually, computed getTenantSlug locally
(three of them inline-redefined it from /lib/tenant!), and attached
X-Molecule-Org-Slug. A new poller / raw-fetch added without going
through this exact recipe silently 401s against workspace-server when
ADMIN_TOKEN is set on the server side — the bug shape called out in
the original task.

Adds platformAuthHeaders() to lib/api.ts as the single source of truth
and routes all 7 raw-fetch callsites through it. Removes 4 duplicate
local getTenantSlug() copies (Image, Video, Audio, PDF, TextPreview)
that were inline-redefining what /lib/tenant.ts already exports.

Also preserves the AttachmentTextPreview off-platform branch — when
isPlatformAttachment() is false, headers is {} (no bearer leakage to
third-party URLs).

Tests:
- 6 unit tests in platform-auth-headers.test.ts covering: empty,
  bearer-only, slug-only, both, empty-string-as-unset, fresh-object-
  per-call. Mutation-tested: removing the bearer attach inside the
  helper fails 2 of 6 tests immediately.
- All 1389 existing canvas vitest tests pass unchanged.
- npx tsc --noEmit clean.
- npm run build succeeds (canvas Next.js build).

Per feedback_assert_exact_not_substring: tests use exact toEqual()
equality, not substring/contains, so an extra-header bug also fails
the assertion. Per feedback_oss_design_philosophy: this is the
"plugin/abstract/modular/SSOT" move applied to the auth-header
construction surface — one helper, six call sites, no duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:36:02 -07:00
69 changed files with 2662 additions and 2360 deletions

View File

@ -0,0 +1,118 @@
#!/usr/bin/env bash
# audit-force-merge — detect a §SOP-6 force-merge after PR close, emit
# `incident.force_merge` to stdout as structured JSON.
#
# Vector's docker_logs source picks up runner stdout; the JSON gets
# shipped to Loki on molecule-canonical-obs, indexable by event_type.
# Query example:
#
# {host="operator"} |= "event_type" |= "incident.force_merge" | json
#
# A force-merge is detected when a PR closed-with-merged=true had at
# least one of the repo's required-status-check contexts in a state
# other than "success" at the merge commit's SHA. That's exactly what
# the Gitea force_merge:true API call lets through, so it's a faithful
# detector of the override path.
#
# Triggers on `pull_request_target: closed` (loaded from base branch
# per §SOP-6 security model). No-op when merged=false.
#
# Required env (set by the workflow):
# GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER, REQUIRED_CHECKS
#
# REQUIRED_CHECKS is a newline-separated list of status-check context
# names that branch protection requires. Declared in the workflow YAML
# rather than fetched from /branch_protections (which needs admin
# scope — sop-tier-bot has read-only). Trade dynamism for simplicity:
# when the required-check set changes, update both branch protection
# AND this env. Keeping them in sync is less complexity than granting
# the audit bot admin perms on every repo.
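#
# Illustrative example of the emitted event (placeholder values, shown
# wrapped here for readability — the real output is a single JSON line):
#
#   {"event_type":"incident.force_merge","ts":"2026-05-09T21:00:00Z",
#    "repo":"owner/repo","pr":123,"title":"example PR title",
#    "base_branch":"main","merged_by":"some-user","merge_sha":"<sha>",
#    "failed_checks":["sop-tier-check / tier-check (pull_request)=failure"]}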
set -euo pipefail
: "${GITEA_TOKEN:?required}"
: "${GITEA_HOST:?required}"
: "${REPO:?required}"
: "${PR_NUMBER:?required}"
: "${REQUIRED_CHECKS:?required (newline-separated context names)}"
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
AUTH="Authorization: token ${GITEA_TOKEN}"
# 1. Fetch the PR. If not merged, no-op.
PR=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}")
MERGED=$(echo "$PR" | jq -r '.merged // false')
if [ "$MERGED" != "true" ]; then
echo "::notice::PR #${PR_NUMBER} closed without merge — no audit emission."
exit 0
fi
MERGE_SHA=$(echo "$PR" | jq -r '.merge_commit_sha // empty')
MERGED_BY=$(echo "$PR" | jq -r '.merged_by.login // "unknown"')
TITLE=$(echo "$PR" | jq -r '.title // ""')
BASE_BRANCH=$(echo "$PR" | jq -r '.base.ref // "main"')
HEAD_SHA=$(echo "$PR" | jq -r '.head.sha // empty')
if [ -z "$MERGE_SHA" ]; then
echo "::warning::PR #${PR_NUMBER} merged=true but no merge_commit_sha — cannot evaluate force-merge."
exit 0
fi
# 2. Required status checks declared in the workflow env.
REQUIRED="$REQUIRED_CHECKS"
if [ -z "${REQUIRED//[[:space:]]/}" ]; then
echo "::notice::REQUIRED_CHECKS empty — force-merge not applicable."
exit 0
fi
# 3. Status-check state at the PR HEAD (where checks ran). The merge
# commit doesn't get its own checks; we evaluate the PR's last
# commit, which is what branch protection compared against.
STATUS=$(curl -sS -H "$AUTH" \
"${API}/repos/${OWNER}/${NAME}/commits/${HEAD_SHA}/status")
declare -A CHECK_STATE
while IFS=$'\t' read -r ctx state; do
[ -n "$ctx" ] && CHECK_STATE[$ctx]="$state"
done < <(echo "$STATUS" | jq -r '.statuses // [] | .[] | "\(.context)\t\(.status)"')
# 4. For each required check, was it green at merge? YAML block scalars
# (`|`) leave a trailing newline; skip blank/whitespace-only lines.
FAILED_CHECKS=()
while IFS= read -r req; do
trimmed="${req#"${req%%[![:space:]]*}"}" # ltrim
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}" # rtrim
[ -z "$trimmed" ] && continue
state="${CHECK_STATE[$trimmed]:-missing}"
if [ "$state" != "success" ]; then
FAILED_CHECKS+=("${trimmed}=${state}")
fi
done <<< "$REQUIRED"
if [ "${#FAILED_CHECKS[@]}" -eq 0 ]; then
echo "::notice::PR #${PR_NUMBER} merged with all required checks green — not a force-merge."
exit 0
fi
# 5. Emit structured audit event.
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
FAILED_JSON=$(printf '%s\n' "${FAILED_CHECKS[@]}" | jq -R . | jq -s .)
# Print as a single-line JSON so Vector's parse_json transform can pick
# it up cleanly from docker_logs.
jq -nc \
--arg event_type "incident.force_merge" \
--arg ts "$NOW" \
--arg repo "$REPO" \
--argjson pr "$PR_NUMBER" \
--arg title "$TITLE" \
--arg base "$BASE_BRANCH" \
--arg merged_by "$MERGED_BY" \
--arg merge_sha "$MERGE_SHA" \
--argjson failed_checks "$FAILED_JSON" \
'{event_type: $event_type, ts: $ts, repo: $repo, pr: $pr, title: $title,
base_branch: $base, merged_by: $merged_by, merge_sha: $merge_sha,
failed_checks: $failed_checks}'
echo "::warning::FORCE-MERGE detected on PR #${PR_NUMBER} by ${MERGED_BY}: ${#FAILED_CHECKS[@]} required check(s) not green at merge time."

149
.gitea/scripts/sop-tier-check.sh Executable file
View File

@ -0,0 +1,149 @@
#!/usr/bin/env bash
# sop-tier-check — verify a Gitea PR satisfies the §SOP-6 approval gate.
#
# Reads the PR's tier label, walks approving reviewers, and checks each
# approver's Gitea team membership against the tier's eligible-team set.
# Marks pass only when at least one non-author approver is in an eligible
# team.
#
# Invoked from `.gitea/workflows/sop-tier-check.yml`. The workflow sets
# the env vars below; this script does no IO outside of stdout/stderr +
# the Gitea API.
#
# Required env:
# GITEA_TOKEN — bot PAT with read:organization,read:user,
# read:issue,read:repository scopes
# GITEA_HOST — e.g. git.moleculesai.app
# REPO — owner/name (from github.repository)
# PR_NUMBER — int (from github.event.pull_request.number)
# PR_AUTHOR — login (from github.event.pull_request.user.login)
#
# Optional:
# SOP_DEBUG=1 — print per-API-call diagnostic lines (HTTP codes,
# raw response bodies). Default: off.
#
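# Example local invocation (illustrative placeholder values — substitute
# a real bot PAT, repo, and PR number):
#   GITEA_TOKEN=... GITEA_HOST=git.moleculesai.app REPO=owner/repo \
#   PR_NUMBER=123 PR_AUTHOR=some-author SOP_DEBUG=1 \
#   bash .gitea/scripts/sop-tier-check.sh
#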
# Stale-status caveat: Gitea Actions does not always re-fire workflows
# on `labeled` / `pull_request_review:submitted` events. If the
# sop-tier-check status is stale (e.g. red after labels/approvals were
# added), push an empty commit to the PR branch to force a synchronize
# event, OR re-request reviews. Tracked: internal#46.
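#
# Example of that workaround (run on the PR branch; the commit message
# text is just illustrative):
#   git commit --allow-empty -m "trigger: re-run sop-tier-check"
#   git push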
set -euo pipefail
debug() {
if [ "${SOP_DEBUG:-}" = "1" ]; then
echo " [debug] $*" >&2
fi
}
# Validate env
: "${GITEA_TOKEN:?GITEA_TOKEN required}"
: "${GITEA_HOST:?GITEA_HOST required}"
: "${REPO:?REPO required (owner/name)}"
: "${PR_NUMBER:?PR_NUMBER required}"
: "${PR_AUTHOR:?PR_AUTHOR required}"
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
AUTH="Authorization: token ${GITEA_TOKEN}"
echo "::notice::tier-check start: repo=$OWNER/$NAME pr=$PR_NUMBER author=$PR_AUTHOR"
# Sanity: token resolves to a user
WHOAMI=$(curl -sS -H "$AUTH" "${API}/user" | jq -r '.login // ""')
if [ -z "$WHOAMI" ]; then
echo "::error::GITEA_TOKEN cannot resolve a user via /api/v1/user — check the token scope and that the secret is wired correctly."
exit 1
fi
echo "::notice::token resolves to user: $WHOAMI"
# 1. Read tier label
LABELS=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/issues/${PR_NUMBER}/labels" | jq -r '.[].name')
TIER=""
for L in $LABELS; do
case "$L" in
tier:low|tier:medium|tier:high)
if [ -n "$TIER" ]; then
echo "::error::Multiple tier labels: $TIER + $L. Apply exactly one."
exit 1
fi
TIER="$L"
;;
esac
done
if [ -z "$TIER" ]; then
echo "::error::PR has no tier:low|tier:medium|tier:high label. Apply one before merge."
exit 1
fi
debug "tier=$TIER"
# 2. Tier → eligible teams
case "$TIER" in
tier:low) ELIGIBLE="engineers managers ceo" ;;
tier:medium) ELIGIBLE="managers ceo" ;;
tier:high) ELIGIBLE="ceo" ;;
esac
debug "eligible_teams=$ELIGIBLE"
# Resolve team-name → team-id once. /orgs/{org}/teams/{slug}/... endpoints
# don't exist on Gitea 1.22; we have to use /teams/{id}.
ORG_TEAMS_FILE=$(mktemp)
trap 'rm -f "$ORG_TEAMS_FILE"' EXIT
HTTP_CODE=$(curl -sS -o "$ORG_TEAMS_FILE" -w '%{http_code}' -H "$AUTH" \
"${API}/orgs/${OWNER}/teams")
debug "teams-list HTTP=$HTTP_CODE size=$(wc -c <"$ORG_TEAMS_FILE")"
if [ "${SOP_DEBUG:-}" = "1" ]; then
echo " [debug] teams-list body (first 300 chars):" >&2
head -c 300 "$ORG_TEAMS_FILE" >&2; echo >&2
fi
if [ "$HTTP_CODE" != "200" ]; then
echo "::error::GET /orgs/${OWNER}/teams returned HTTP $HTTP_CODE — token likely lacks read:org scope. Add a SOP_TIER_CHECK_TOKEN secret with read:organization scope at the org level."
exit 1
fi
declare -A TEAM_ID
for T in $ELIGIBLE; do
ID=$(jq -r --arg t "$T" '.[] | select(.name==$t) | .id' <"$ORG_TEAMS_FILE" | head -1)
if [ -z "$ID" ] || [ "$ID" = "null" ]; then
VISIBLE=$(jq -r '.[]?.name? // empty' <"$ORG_TEAMS_FILE" 2>/dev/null | tr '\n' ' ')
echo "::error::Team \"$T\" not found in org $OWNER. Teams visible: $VISIBLE"
exit 1
fi
TEAM_ID[$T]="$ID"
debug "team-id: $T$ID"
done
# 3. Read approving reviewers
REVIEWS=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}/reviews")
APPROVERS=$(echo "$REVIEWS" | jq -r '[.[] | select(.state=="APPROVED") | .user.login] | unique | .[]')
if [ -z "$APPROVERS" ]; then
echo "::error::No approving reviews. Tier $TIER requires approval from {$ELIGIBLE} (non-author)."
exit 1
fi
debug "approvers: $(echo "$APPROVERS" | tr '\n' ' ')"
# 4. For each approver: check non-author + team membership (by id)
OK=""
for U in $APPROVERS; do
if [ "$U" = "$PR_AUTHOR" ]; then
debug "skip self-review by $U"
continue
fi
for T in $ELIGIBLE; do
ID="${TEAM_ID[$T]}"
CODE=$(curl -sS -o /dev/null -w '%{http_code}' -H "$AUTH" \
"${API}/teams/${ID}/members/${U}")
debug "probe: $U in team $T (id=$ID) → HTTP $CODE"
if [ "$CODE" = "200" ] || [ "$CODE" = "204" ]; then
echo "::notice::approver $U is in team $T (eligible for $TIER)"
OK="yes"
break
fi
done
[ -n "$OK" ] && break
done
if [ -z "$OK" ]; then
echo "::error::Tier $TIER requires approval from a non-author member of {$ELIGIBLE}. Got approvers: $APPROVERS — none of them satisfied team membership. Set SOP_DEBUG=1 to see per-probe HTTP codes."
exit 1
fi
echo "::notice::sop-tier-check passed: $TIER, approver in {$ELIGIBLE}"

View File

@ -0,0 +1,58 @@
# audit-force-merge — emit `incident.force_merge` to runner stdout when
# a PR is merged with required-status-checks not green. Vector picks
# the JSON line off docker_logs and ships to Loki on
# molecule-canonical-obs (per `reference_obs_stack_phase1`); query as:
#
# {host="operator"} |= "event_type" |= "incident.force_merge" | json
#
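# The same query via Grafana's Loki CLI (illustrative; assumes `logcli`
# is installed and pointed at the molecule-canonical-obs Loki endpoint):
#
#   logcli query --since=24h '{host="operator"} |= "incident.force_merge"'
#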
# Closes the §SOP-6 audit gap (the doc says force-merges write to
# `structure_events`, but that table lives in the platform DB, not
# Gitea-side; Loki is the practical equivalent for Gitea Actions
# events). When the credential / observability stack converges later,
# this can sync into structure_events from Loki via a backfill job —
# the structured JSON shape is forward-compatible.
#
# Logic in `.gitea/scripts/audit-force-merge.sh` per the same script-
# extract pattern as sop-tier-check.
name: audit-force-merge
# pull_request_target loads from the base branch — same security model
# as sop-tier-check. Without this, an attacker could rewrite the
# workflow on a PR and skip the audit emission for their own
# force-merge. See `.gitea/workflows/sop-tier-check.yml` for the full
# rationale.
on:
  pull_request_target:
    types: [closed]
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
    # Skip when PR is closed without merge — saves a runner.
    if: github.event.pull_request.merged == true
    steps:
      - name: Check out base branch (for the script)
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          ref: ${{ github.event.pull_request.base.sha }}
      - name: Detect force-merge + emit audit event
        env:
          # Same org-level secret the sop-tier-check workflow uses.
          GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
          GITEA_HOST: git.moleculesai.app
          REPO: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          # Required-status-check contexts to evaluate at merge time.
          # Newline-separated. Mirror this against branch protection
          # (settings → branches → protected branch → required checks).
          # Declared here rather than fetched from /branch_protections
          # because that endpoint requires admin write — sop-tier-bot is
          # read-only by design (least-privilege).
          REQUIRED_CHECKS: |
            sop-tier-check / tier-check (pull_request)
            Secret scan / Scan diff for credential-shaped strings (pull_request)
        run: bash .gitea/scripts/audit-force-merge.sh

View File

@ -0,0 +1,191 @@
name: Secret scan
# Hard CI gate. Refuses any PR / push whose diff additions contain a
# recognisable credential. Defense-in-depth for the #2090-class incident
# (2026-04-24): GitHub's hosted Copilot Coding Agent leaked a ghs_*
# installation token into tenant-proxy/package.json via `npm init`
# slurping the URL from a token-embedded origin remote. We can't fix
# upstream's clone hygiene, so we gate here.
#
# Same regex set as the runtime's bundled pre-commit hook
# (molecule-ai-workspace-runtime: molecule_runtime/scripts/pre-commit-checks.sh).
# Keep the two sides aligned when adding patterns.
#
# Ported from .github/workflows/secret-scan.yml so the gate actually
# fires on Gitea Actions. Differences from the GitHub version:
# - drops `merge_group` event (Gitea has no merge queue)
# - drops `workflow_call` (no cross-repo reusable invocation on Gitea)
# - SELF path updated to .gitea/workflows/secret-scan.yml
# The job name + step name are identical to the GitHub workflow so the
# status-check context (`Secret scan / Scan diff for credential-shaped
# strings (pull_request)`) matches branch protection on molecule-core/main.
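#
# A local spot-check before pushing (illustrative, not part of the gate)
# can replay the same added-lines grep this job runs below against one
# of the patterns, e.g. the classic GitHub PAT prefix:
#
#   git diff --unified=0 origin/main...HEAD | grep -E '^\+[^+]' \
#     | grep -qE 'ghp_[A-Za-z0-9]{36,}' && echo "credential-shaped string in diff additions"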
on:
pull_request:
types: [opened, synchronize, reopened]
push:
branches: [main, staging]
jobs:
scan:
name: Scan diff for credential-shaped strings
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 2 # need previous commit to diff against on push events
# For pull_request events the diff base may be many commits behind
# HEAD and absent from the shallow clone. Fetch it explicitly.
- name: Fetch PR base SHA (pull_request events only)
if: github.event_name == 'pull_request'
run: git fetch --depth=1 origin ${{ github.event.pull_request.base.sha }}
- name: Refuse if credential-shaped strings appear in diff additions
env:
# Plumb event-specific SHAs through env so the script doesn't
# need conditional `${{ ... }}` interpolation per event type.
# github.event.before/after only exist on push events;
# pull_request has pull_request.base.sha / pull_request.head.sha.
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
PUSH_BEFORE: ${{ github.event.before }}
PUSH_AFTER: ${{ github.event.after }}
run: |
# Pattern set covers GitHub family (the actual #2090 vector),
# Anthropic / OpenAI / Slack / AWS. Anchored on prefixes with low
# false-positive rates against agent-generated content. Mirror of
# molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh
# — keep aligned.
SECRET_PATTERNS=(
'ghp_[A-Za-z0-9]{36,}' # GitHub PAT (classic)
'ghs_[A-Za-z0-9]{36,}' # GitHub App installation token
'gho_[A-Za-z0-9]{36,}' # GitHub OAuth user-to-server
'ghu_[A-Za-z0-9]{36,}' # GitHub OAuth user
'ghr_[A-Za-z0-9]{36,}' # GitHub OAuth refresh
'github_pat_[A-Za-z0-9_]{82,}' # GitHub fine-grained PAT
'sk-ant-[A-Za-z0-9_-]{40,}' # Anthropic API key
'sk-proj-[A-Za-z0-9_-]{40,}' # OpenAI project key
'sk-svcacct-[A-Za-z0-9_-]{40,}' # OpenAI service-account key
'sk-cp-[A-Za-z0-9_-]{60,}' # MiniMax API key (F1088 vector — caught only after the fact)
'xox[baprs]-[A-Za-z0-9-]{20,}' # Slack tokens
'AKIA[0-9A-Z]{16}' # AWS access key ID
'ASIA[0-9A-Z]{16}' # AWS STS temp access key ID
)
# Determine the diff base. Each event type stores its SHAs in
# a different place — see the env block above.
case "${{ github.event_name }}" in
pull_request)
BASE="$PR_BASE_SHA"
HEAD="$PR_HEAD_SHA"
;;
*)
BASE="$PUSH_BEFORE"
HEAD="$PUSH_AFTER"
;;
esac
# On push events with shallow clones, BASE may be present in
# the event payload but absent from the local object DB
# (fetch-depth=2 doesn't always reach the previous commit
# across true merges). Try fetching it on demand. If the
# fetch fails — e.g. the SHA was force-overwritten — we fall
# through to the empty-BASE branch below, which scans the
# entire tree as if every file were new. Correct, just slow.
if [ -n "$BASE" ] && ! echo "$BASE" | grep -qE '^0+$'; then
if ! git cat-file -e "$BASE" 2>/dev/null; then
git fetch --depth=1 origin "$BASE" 2>/dev/null || true
fi
fi
# Files added or modified in this change.
if [ -z "$BASE" ] || echo "$BASE" | grep -qE '^0+$' || ! git cat-file -e "$BASE" 2>/dev/null; then
# New branch / no previous SHA / BASE unreachable — check the
# entire tree as added content. Slower, but correct on first
# push.
CHANGED=$(git ls-tree -r --name-only HEAD)
DIFF_RANGE=""
else
CHANGED=$(git diff --name-only --diff-filter=AM "$BASE" "$HEAD")
DIFF_RANGE="$BASE $HEAD"
fi
if [ -z "$CHANGED" ]; then
echo "No changed files to inspect."
exit 0
fi
# Self-exclude: this workflow file legitimately contains the
# pattern strings as regex literals. Without an exclude it would
# block its own merge. Both the .github/ original and this
# .gitea/ port are excluded so a sync between them stays clean.
SELF_GITHUB=".github/workflows/secret-scan.yml"
SELF_GITEA=".gitea/workflows/secret-scan.yml"
OFFENDING=""
# `while IFS= read -r` (not `for f in $CHANGED`) so filenames
# containing whitespace don't word-split silently — a path
# with a space would otherwise produce two iterations on
# tokens that aren't real filenames, breaking the
# self-exclude + diff lookup.
while IFS= read -r f; do
[ -z "$f" ] && continue
[ "$f" = "$SELF_GITHUB" ] && continue
[ "$f" = "$SELF_GITEA" ] && continue
if [ -n "$DIFF_RANGE" ]; then
ADDED=$(git diff --no-color --unified=0 "$BASE" "$HEAD" -- "$f" 2>/dev/null | grep -E '^\+[^+]' || true)
else
# No diff range (new branch first push) — scan the full file
# contents as if every line were new.
ADDED=$(cat "$f" 2>/dev/null || true)
fi
[ -z "$ADDED" ] && continue
for pattern in "${SECRET_PATTERNS[@]}"; do
if echo "$ADDED" | grep -qE "$pattern"; then
OFFENDING="${OFFENDING}${f} (matched: ${pattern})\n"
break
fi
done
done <<< "$CHANGED"
if [ -n "$OFFENDING" ]; then
echo "::error::Credential-shaped strings detected in diff additions:"
# `printf '%b' "$OFFENDING"` interprets backslash escapes
# (the literal `\n` we appended above becomes a newline)
# WITHOUT treating OFFENDING as a format string. Plain
# `printf "$OFFENDING"` is a format-string sink: a filename
# containing `%` would be interpreted as a conversion
# specifier, corrupting the error message (or printing
# `%(missing)` artifacts).
printf '%b' "$OFFENDING"
echo ""
echo "The actual matched values are NOT echoed here, deliberately —"
echo "round-tripping a leaked credential into CI logs widens the blast"
echo "radius (logs are searchable + retained)."
echo ""
echo "Recovery:"
echo " 1. Remove the secret from the file. Replace with an env var"
echo " reference (e.g. \${{ secrets.GITHUB_TOKEN }} in workflows,"
echo " process.env.X in code)."
echo " 2. If the credential was already pushed (this PR's commit"
echo " history reaches a public ref), treat it as compromised —"
echo " ROTATE it immediately, do not just remove it. The token"
echo " remains valid in git history forever and may be in any"
echo " log/cache that consumed this branch."
echo " 3. Force-push the cleaned commit (or stack a revert) and"
echo " re-run CI."
echo ""
echo "If the match is a false positive (test fixture, docs example,"
echo "or this workflow's own regex literals): use a clearly-fake"
echo "placeholder like ghs_EXAMPLE_DO_NOT_USE that doesn't satisfy"
echo "the length suffix, OR add the file path to the SELF exclude"
echo "list in this workflow with a short reason."
echo ""
echo "Mirror of the regex set lives in the runtime's bundled"
echo "pre-commit hook (molecule-ai-workspace-runtime:"
echo "molecule_runtime/scripts/pre-commit-checks.sh) — keep aligned."
exit 1
fi
echo "✓ No credential-shaped strings in this change."

View File

@ -0,0 +1,81 @@
# sop-tier-check — canonical Gitea Actions workflow for §SOP-6 enforcement.
#
# Logic lives in `.gitea/scripts/sop-tier-check.sh` (extracted 2026-05-09
# from the previous inline-bash version). The script is the single source
# of truth; this workflow file just sets env + invokes it.
#
# Copy BOTH files (`.gitea/workflows/sop-tier-check.yml` +
# `.gitea/scripts/sop-tier-check.sh`) into any repo that wants the
# §SOP-6 PR gate enforced. Pair with branch protection on the protected
# branch:
# required_status_checks: ["sop-tier-check / tier-check (pull_request)"]
# required_approving_reviews: 1
# approving_review_teams: ["ceo", "managers", "engineers"]
#
# Tier → eligible-team mapping (mirror of dev-sop §SOP-6):
# tier:low → engineers, managers, ceo
# tier:medium → managers, ceo
# tier:high → ceo
#
# Force-merge: Owners-team override remains available out-of-band via
# the Gitea merge API; force-merge writes `incident.force_merge` to
# `structure_events` per §Persistent structured logging gate (Phase 3).
#
# Set `SOP_DEBUG: '1'` in the env block to enable per-API-call diagnostic
# lines — useful when diagnosing token-scope or team-id-resolution
# issues. Default off.
name: sop-tier-check
# SECURITY: triggers MUST use `pull_request_target`, not `pull_request`.
# `pull_request_target` loads the workflow definition from the BASE
# branch (i.e. `main`), not the PR's HEAD. With `pull_request`, anyone
# with write access to a feature branch could rewrite this file in
# their PR to dump SOP_TIER_CHECK_TOKEN (org-read scope) to logs and
# exfiltrate it. Verified 2026-05-09 against Gitea 1.22.6 —
# `pull_request_target` (added in Gitea 1.21 via go-gitea/gitea#25229)
# is the documented mitigation.
#
# This workflow does NOT call `actions/checkout` of PR HEAD code, so no
# untrusted code is ever executed in the runner — we only HTTP-call the
# Gitea API. If a future change adds a checkout step, it MUST pin to
# `${{ github.event.pull_request.base.sha }}` (NOT `head.sha`) to keep
# the trust boundary.
on:
  pull_request_target:
    types: [opened, edited, synchronize, reopened, labeled, unlabeled]
  pull_request_review:
    types: [submitted, dismissed, edited]
jobs:
  tier-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
    steps:
      - name: Check out base branch (for the script)
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          # Pin to base.sha — pull_request_target's protection only
          # works if we never check out PR HEAD. Same SHA the workflow
          # itself was loaded from.
          ref: ${{ github.event.pull_request.base.sha }}
      - name: Verify tier label + reviewer team membership
        env:
          # SOP_TIER_CHECK_TOKEN is the org-level secret for the
          # sop-tier-bot PAT (read:organization,read:user,read:issue,
          # read:repository). Stored at the org level
          # (/api/v1/orgs/molecule-ai/actions/secrets) so per-repo
          # configuration is unnecessary — every repo in the org
          # picks it up automatically.
          # Falls back to GITHUB_TOKEN with a clear error if missing.
          GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
          GITEA_HOST: git.moleculesai.app
          REPO: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
          # Set to '1' for diagnostic per-API-call output. Off by default
          # so production logs aren't noisy.
          SOP_DEBUG: '0'
        run: bash .gitea/scripts/sop-tier-check.sh

View File

@ -1,467 +0,0 @@
name: Auto-promote :latest after main image build
# Retags `ghcr.io/molecule-ai/{platform,platform-tenant}:staging-<sha>`
# → `:latest` after either the image build or E2E completes on a `main`
# push, gated on E2E Staging SaaS not being red for that SHA.
#
# Why two triggers:
#
# `publish-workspace-server-image` and `e2e-staging-saas` are both
# paths-filtered, but with DIFFERENT path sets:
#
# publish-workspace-server-image:
# workspace-server/**, canvas/**, manifest.json
#
# e2e-staging-saas (full lifecycle):
# workspace-server/internal/handlers/{registry,workspace_provision,
# a2a_proxy}.go, workspace-server/internal/middleware/**,
# workspace-server/internal/provisioner/**, tests/e2e/test_staging_full_saas.sh
#
# The E2E set is a strict SUBSET of the publish set. So:
# - canvas/** changes → publish fires, E2E does not
# - workspace-server/cmd/** changes → publish fires, E2E does not
# - workspace-server/internal/sweep/** → publish fires, E2E does not
#
# The previous version triggered ONLY on E2E completion, which meant
# non-E2E-path changes (canvas, cmd, sweep, etc.) rebuilt the image
# but never advanced `:latest`. Result: as of 2026-04-28 this workflow
# had run zero times since merge despite eight main pushes — `:latest`
# was ~7 hours / 9 PRs behind main with no human realising. See
# `molecule-core` Slack discussion 2026-04-28.
#
# Adding `publish-workspace-server-image` as a second trigger closes
# the gap: any image rebuild on main eligibly advances `:latest`.
#
# Why E2E remains a kill-switch (not the trigger):
#
# When E2E DID run for this SHA and ended red, we abort — `:latest`
# stays on the prior known-good digest. When E2E didn't run (paths
# filtered out), we proceed: pre-merge gates already validated this
# SHA on staging via auto-promote-staging requiring CI + E2E Canvas +
# E2E API + CodeQL all green. Image content for non-E2E-paths
# (canvas, cmd, sweep) is exercised by those staging gates.
#
# Why `main` only:
#
# `:latest` is what prod tenants pull. We only want SHAs that have
# reached main (via auto-promote-staging) to advance `:latest`.
# Triggering on staging would let a staging-only revert advance
# `:latest` to a SHA that never reaches main, breaking the "production
# runs what's on main" invariant.
#
# Idempotency:
#
# When a SHA touches paths that match BOTH publish and E2E, both
# workflows fire and complete. Both trigger this workflow on
# completion → two runs race. Both retag `:staging-<sha>` →
# `:latest`. crane tag is idempotent (re-tagging the same digest is a
# no-op), so the second run is harmless. concurrency group serializes
# them anyway.
on:
workflow_run:
workflows:
- 'E2E Staging SaaS (full lifecycle)'
- 'publish-workspace-server-image'
types: [completed]
branches: [main]
workflow_dispatch:
inputs:
sha:
description: 'Short sha to promote (override; defaults to upstream workflow_run head_sha)'
required: false
type: string
permissions:
contents: read
packages: write
concurrency:
# Serialize promotes per-SHA so the publish+E2E both-fired race lands
# cleanly. Different SHAs can promote in parallel.
group: auto-promote-latest-${{ github.event.workflow_run.head_sha || github.event.inputs.sha || github.sha }}
cancel-in-progress: false
env:
IMAGE_NAME: ghcr.io/molecule-ai/platform
TENANT_IMAGE_NAME: ghcr.io/molecule-ai/platform-tenant
jobs:
promote:
# Proceed if upstream succeeded OR manual dispatch. Upstream-failure
# paths are filtered here; the E2E-was-red kill-switch lives in the
# gate-check step below (covers the case where upstream is publish
# success but E2E for the same SHA failed).
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')
runs-on: ubuntu-latest
steps:
- name: Compute short sha
id: sha
run: |
set -euo pipefail
if [ -n "${{ github.event.inputs.sha }}" ]; then
FULL="${{ github.event.inputs.sha }}"
else
FULL="${{ github.event.workflow_run.head_sha }}"
fi
echo "short=${FULL:0:7}" >> "$GITHUB_OUTPUT"
echo "full=${FULL}" >> "$GITHUB_OUTPUT"
- name: Gate — E2E Staging SaaS state for this SHA
# When upstream IS E2E success, we know it's green (filtered by
# the job-level `if` already). When upstream is publish, look up
# E2E state for the same SHA. Four buckets:
#
# - completed/success: E2E confirmed safe → proceed
# - completed/failure|cancelled|timed_out: E2E found a
# regression → ABORT (exit 1), `:latest` stays put
# - in_progress|queued|requested: E2E is RACING with publish
# for a runtime-touching SHA. publish typically completes
# ~5-10min before E2E (~10-15min). If we promote on the
# publish signal here, a later E2E failure can't roll back
# `:latest` — it'd already be wrongly advanced. So we DEFER:
# skip subsequent steps (proceed=false) and let E2E's own
# completion event re-fire this workflow, which then takes
# the upstream-is-E2E path. exit 0 so the run shows as
# success rather than a noisy fake-failure.
# - none/none: E2E was paths-filtered out for this SHA (the
# change touched canvas/cmd/sweep/etc. — paths covered by
# publish but not by E2E). pre-merge gates on staging
# already validated this SHA → proceed.
#
# Manual dispatch skips this check — operator override.
id: gate
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
SHA: ${{ steps.sha.outputs.full }}
UPSTREAM_NAME: ${{ github.event.workflow_run.name }}
EVENT_NAME: ${{ github.event_name }}
run: |
set -euo pipefail
if [ "$EVENT_NAME" = "workflow_dispatch" ]; then
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::Manual dispatch — skipping E2E gate (operator override)"
exit 0
fi
if [ "$UPSTREAM_NAME" = "E2E Staging SaaS (full lifecycle)" ]; then
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::Upstream is E2E itself (success per job-level if) — gate trivially satisfied"
exit 0
fi
# Upstream is publish-workspace-server-image. Check E2E state
# for the same SHA via Gitea's commit-status API.
#
# GitHub-era this was `gh run list --workflow=X --commit=SHA
# --json status,conclusion` returning either `[]` (no run on
# this SHA) or `[{status, conclusion}]` (the run's state).
# Gitea has NO workflow-runs API at all — `/api/v1/repos/.../
# actions/runs` returns 404 (verified 2026-05-07, issue #75).
# However Gitea Actions DOES emit a commit status per workflow
# job, with `context = "<Workflow Name> / <Job Name> (<event>)"`,
# which is exactly what we need: each E2E run leg becomes one
# status row on the SHA, and the aggregate state encodes the
# run's outcome.
#
# Mapping:
#   0 matched contexts          → "none/none"         (E2E paths-filtered out — same semantic as before)
#   any context = pending       → "in_progress/none"  (defer)
#   any context = error|failure → "completed/failure" (abort)
#   all contexts = success      → "completed/success" (proceed)
#
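# Illustrative shape of one status row returned by the commit-status
# API (the job-name segment is a placeholder):
#   { "context": "E2E Staging SaaS (full lifecycle) / <job> (push)",
#     "status": "success", ... }
#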
# The "completed/cancelled" and "completed/timed_out" buckets
# don't have direct Gitea analogs (Gitea statuses are
# success / failure / error / pending / warning). Per-SHA
# concurrency cancellation surfaces as `error` on Gitea, which
# we map to "completed/failure" rather than "completed/cancelled"
# — losing the soft-defer semantic of the cancelled bucket on
# this fleet. Tradeoff: the staleness alarm (auto-promote-stale-
# alarm.yml) still catches a stuck :latest within 4h, and a
# legitimate cancel is rare enough that aborting + manual
# re-dispatch is acceptable. If we measure cancel frequency
# > 1/week, revisit by reading the run-step-summary text via
# a follow-up script.
#
# Network or auth blips collapse to "none/none" via the curl
# `|| true` fallback, matching the pre-Gitea behaviour where
# an empty list also degenerated to none/none.
GITEA_API_URL="${GITHUB_SERVER_URL:-https://git.moleculesai.app}/api/v1"
STATUSES_JSON=$(curl --fail-with-body -sS \
-H "Authorization: token ${GH_TOKEN}" \
-H "Accept: application/json" \
"${GITEA_API_URL}/repos/${REPO}/commits/${SHA}/statuses?limit=100" \
2>/dev/null || echo "[]")
RESULT=$(printf '%s' "$STATUSES_JSON" | jq -r '
# Filter to E2E Staging SaaS (full lifecycle) statuses.
# Match by leading workflow-name prefix so the "<job>
# (<event>)" tail is irrelevant. Gitea emits the workflow
# name verbatim from the YAML `name:` field.
[.[] | select(.context | startswith("E2E Staging SaaS (full lifecycle) /"))] as $rows
| if ($rows | length) == 0 then
"none/none"
elif any($rows[]; .status == "pending") then
"in_progress/none"
elif any($rows[]; .status == "failure" or .status == "error") then
"completed/failure"
elif all($rows[]; .status == "success") then
"completed/success"
else
# Mixed / unknown — fall through to *) bucket below.
"completed/" + ($rows[0].status // "unknown")
end
' 2>/dev/null || echo "none/none")
echo "E2E Staging SaaS for ${SHA:0:7}: $RESULT"
case "$RESULT" in
completed/success)
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::E2E green for this SHA — proceeding with promote"
;;
completed/failure|completed/timed_out)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote aborted — E2E Staging SaaS failed"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
echo "\`:latest\` stays on the prior known-good digest."
echo
echo "If the failure was a flake, manually dispatch this workflow with the same sha to override."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
completed/cancelled)
# GitHub-era only: cancelled ≠ failure. Gitea statuses
# don't expose a "cancelled" state — a per-SHA concurrency
# cancellation surfaces as `failure` or `error` on Gitea
# and is now handled by the failure branch above. This
# arm is kept for backwards compatibility / dual-host
# operation (if we ever add a non-Gitea fallback) but
# under the post-#75 flow it's unreachable.
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ⏭ Auto-promote deferred — E2E Staging SaaS was cancelled"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
echo "Likely per-SHA concurrency (newer push superseded this E2E run)."
echo "The newer SHA's E2E will fire its own promote when it lands."
echo "If you need this specific SHA promoted, manually dispatch."
} >> "$GITHUB_STEP_SUMMARY"
;;
in_progress/*|queued/*|requested/*|waiting/*|pending/*)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ⏳ Auto-promote deferred — E2E Staging SaaS still running"
echo
echo "Publish completed before E2E for \`${SHA:0:7}\` (state: \`$RESULT\`)."
echo "Skipping retag here — E2E's own completion event will re-fire this workflow."
echo "If E2E ends green, that run promotes \`:latest\`. If red, it aborts."
} >> "$GITHUB_STEP_SUMMARY"
;;
none/none)
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::E2E paths-filtered out for this SHA — pre-merge staging gates carry"
;;
*)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ❓ Auto-promote aborted — unexpected E2E state"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\` (unhandled)"
echo "Manual investigation needed; re-dispatch with the same sha once resolved."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
esac
- if: steps.gate.outputs.proceed == 'true'
uses: imjasonh/setup-crane@6da1ae018866400525525ce74ff892880c099987 # v0.5
- name: GHCR login
if: steps.gate.outputs.proceed == 'true'
run: |
echo "${{ secrets.GITHUB_TOKEN }}" | \
crane auth login ghcr.io -u "${{ github.actor }}" --password-stdin
- name: Verify :staging-<sha> exists for both images
# Better to fail fast with a clear message than to half-tag
# (platform retagged but platform-tenant missing → tenants pull
# a stale image).
if: steps.gate.outputs.proceed == 'true'
run: |
set -euo pipefail
for img in "${IMAGE_NAME}" "${TENANT_IMAGE_NAME}"; do
tag="${img}:staging-${{ steps.sha.outputs.short }}"
if ! crane manifest "$tag" >/dev/null 2>&1; then
echo "::error::Missing tag: $tag"
echo "::error::publish-workspace-server-image must complete on this SHA before auto-promote can retag :latest."
exit 1
fi
echo " ok: $tag exists"
done
- name: Ancestry check — refuse to promote :latest backwards
# #2244: workflow_run completions arrive in arbitrary order. If
# SHA-A and SHA-B both reach main within ~10 min and SHA-B's E2E
# completes before SHA-A's, this workflow can fire for SHA-A
# AFTER it already promoted SHA-B → :latest goes backwards. The
# orphan-reconciler "next run corrects it" doesn't apply: there's
# no auto-corrective re-promote, :latest stays wrong until the
# next main push lands.
#
# Detection: read current :latest's `org.opencontainers.image.revision`
# label (set by publish-workspace-server-image.yml at build time)
# and ask the GitHub compare API whether the candidate SHA is
# ahead-of / identical-to / behind / diverged-from current.
# Hard-fail on `behind` and `diverged` per the approved design —
# silent-bypass is the class we're moving away from. Workflow
# goes red, oncall sees it, operator decides how to recover
# (manual dispatch with the right SHA, force-promote, etc.).
#
# Manual dispatch skips this check — operator override semantics
# match the gate-check step above.
#
# Backward-compat: when current :latest carries no revision
# label (legacy image pre-publish-with-label), skip-with-warning.
# All :latest images on main are post-label as of 2026-04-29, so
# this branch will be dead within 90 days; remove then.
if: steps.gate.outputs.proceed == 'true' && github.event_name != 'workflow_dispatch'
id: ancestry
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
TARGET_SHA: ${{ steps.sha.outputs.full }}
run: |
set -euo pipefail
# Read the current :latest config and pull the revision label.
# `crane config` returns the OCI image config blob (not the manifest);
# labels live under `.config.Labels`. `// empty` makes jq return ""
# rather than the literal "null" so the test below works.
CURRENT_REVISION=$(crane config "${IMAGE_NAME}:latest" 2>/dev/null \
| jq -r '.config.Labels["org.opencontainers.image.revision"] // empty' \
|| true)
if [ -z "$CURRENT_REVISION" ]; then
echo "decision=skip-no-label" >> "$GITHUB_OUTPUT"
{
echo "## ⚠ Ancestry check skipped — current :latest has no revision label"
echo
echo "Likely a legacy image built before \`org.opencontainers.image.revision\` was set."
echo "Falling through to retag. After all \`:latest\` images are post-label (TODO 90 days), this branch is dead and should be removed."
} >> "$GITHUB_STEP_SUMMARY"
echo "::warning::Current :latest carries no revision label — skipping ancestry check (legacy image)"
exit 0
fi
if [ "$CURRENT_REVISION" = "$TARGET_SHA" ]; then
echo "decision=identical" >> "$GITHUB_OUTPUT"
echo "::notice:::latest already at ${TARGET_SHA:0:7} — retag will be a no-op"
exit 0
fi
# Ask GitHub which side of the merge graph TARGET_SHA sits on
# relative to CURRENT_REVISION. Returns one of: ahead | identical
# | behind | diverged. Network or auth errors collapse to "error"
# via the explicit fallback so the case below always matches.
STATUS=$(gh api \
"repos/${REPO}/compare/${CURRENT_REVISION}...${TARGET_SHA}" \
--jq '.status' 2>/dev/null || echo "error")
echo "ancestry compare ${CURRENT_REVISION:0:7} → ${TARGET_SHA:0:7}: $STATUS"
case "$STATUS" in
ahead)
echo "decision=ahead" >> "$GITHUB_OUTPUT"
echo "::notice::Target ${TARGET_SHA:0:7} is ahead of current :latest (${CURRENT_REVISION:0:7}) — proceeding with retag"
;;
identical)
echo "decision=identical" >> "$GITHUB_OUTPUT"
echo "::notice::Target identical to :latest — retag will be a no-op"
;;
behind)
echo "decision=behind" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote refused — target is BEHIND current :latest"
echo
echo "| Field | Value |"
echo "|---|---|"
echo "| Target SHA | \`$TARGET_SHA\` |"
echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
echo "| GitHub compare status | \`behind\` |"
echo
echo "This guard catches the workflow_run-completion-order race (#2244):"
echo "two rapid main pushes whose E2Es complete out-of-order can otherwise"
echo "promote \`:latest\` backwards. \`:latest\` stays on \`${CURRENT_REVISION:0:7}\`."
echo
echo "**Recovery:** if this is a legitimate revert that should land on \`:latest\`,"
echo "manually dispatch this workflow with the target sha as input — the manual-dispatch"
echo "path skips the ancestry check (operator override)."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
diverged)
echo "decision=diverged" >> "$GITHUB_OUTPUT"
{
echo "## ❓ Auto-promote refused — history diverged"
echo
echo "| Field | Value |"
echo "|---|---|"
echo "| Target SHA | \`$TARGET_SHA\` |"
echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
echo "| GitHub compare status | \`diverged\` |"
echo
echo "Likely cause: force-push rewrote main's history, leaving the previous"
echo "\`:latest\` revision orphaned. Needs human review before \`:latest\` advances."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
error|*)
echo "decision=error" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote aborted — ancestry-check API error"
echo
echo "\`gh api repos/${REPO}/compare/${CURRENT_REVISION}...${TARGET_SHA}\` returned unexpected status: \`$STATUS\`"
echo
echo "Manual dispatch with the target sha bypasses this check."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
esac
- name: Retag platform :staging-<sha> → :latest
if: steps.gate.outputs.proceed == 'true'
run: |
crane tag "${IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest
- name: Retag tenant :staging-<sha> → :latest
if: steps.gate.outputs.proceed == 'true'
run: |
crane tag "${TENANT_IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest
- name: Summary
if: steps.gate.outputs.proceed == 'true'
run: |
{
echo "## :latest promoted to ${{ steps.sha.outputs.short }}"
echo
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "- Trigger: manual dispatch"
else
echo "- Upstream: \`${{ github.event.workflow_run.name }}\` ([run](${{ github.event.workflow_run.html_url }}))"
fi
echo "- platform:staging-${{ steps.sha.outputs.short }} → :latest"
echo "- platform-tenant:staging-${{ steps.sha.outputs.short }} → :latest"
echo
echo "Tenant fleet auto-pulls within 5 min via IMAGE_AUTO_REFRESH=true."
echo "Force immediate fanout: dispatch redeploy-tenants-on-main.yml."
} >> "$GITHUB_STEP_SUMMARY"


@@ -1,492 +0,0 @@
name: Auto-promote staging → main
# Fires after any of the staging-branch quality gates complete. When ALL
# required gates are green on the same staging SHA, opens (or re-uses)
# a PR `staging → main` and schedules Gitea auto-merge so the PR lands
# automatically once approval + status checks are satisfied.
#
# ============================================================
# What this workflow does
# ============================================================
#
# 1. On a workflow_run completion event for one of the staging gate
# workflows (CI, E2E Staging Canvas, E2E API Smoke, CodeQL),
# checks if the combined status on the staging head SHA is green.
# 2. If green, opens (or re-uses) a PR `head: staging → base: main`
# via Gitea REST `POST /api/v1/repos/.../pulls`.
# 3. Schedules auto-merge via `POST /api/v1/repos/.../pulls/{index}/merge`
# with `merge_when_checks_succeed: true`. Gitea waits for the
# approval requirement on `main` (`required_approvals: 1`) and
# the status-check gates, then merges.
# 4. The merge commit lands on `main` and fires
# `publish-workspace-server-image.yml` naturally via its
# `on: push: branches: [main]` trigger — no explicit dispatch
# needed (see "Why no workflow_dispatch tail" below).
#
# `auto-sync-main-to-staging.yml` is the reverse-direction
# counterpart (main → staging, fast-forward push). Together they
# keep the staging-superset-of-main invariant tight.
#
# ============================================================
# Why Gitea REST (and not `gh pr create`)
# ============================================================
#
# Pre-2026-05-06 this workflow used `gh pr create`, `gh pr merge --auto`,
# `gh run list`, and `gh workflow run` against GitHub. After the
# GitHub→Gitea cutover those calls fail because:
#
# - `gh pr create / merge / view / list` route to GitHub GraphQL
# (`/api/graphql`). Gitea does not expose a GraphQL endpoint;
# every call returns `HTTP 405 Method Not Allowed` — same root
# cause as #65 (auto-sync) which PR #66 fixed by dropping `gh`
# entirely.
# - `gh run list --workflow=...` is GitHub-shaped; Gitea has the
# simpler `GET /repos/.../commits/{ref}/status` combined-status
# endpoint instead.
# - `gh workflow run X.yml` calls `POST /repos/.../actions/workflows/{id}/dispatches`,
# which does NOT exist on Gitea 1.22.6 (verified via swagger.v1.json).
#
# So this workflow uses direct `curl` calls to Gitea REST. No `gh`
# CLI dependency, no GraphQL, no missing-endpoint footgun.
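#
# For example, the combined-status probe used below reduces to a single
# authenticated GET (a sketch; owner/repo/sha are placeholders):
# ```
# curl -sS -H "Authorization: token $GITEA_TOKEN" \
#   "https://git.moleculesai.app/api/v1/repos/OWNER/REPO/commits/SHA/status" \
#   | jq '{state, total_count}'
# ```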
#
# ============================================================
# Why no workflow_dispatch tail (was load-bearing on GitHub, dead on Gitea)
# ============================================================
#
# The GitHub-era version had a 60-line polling step that waited for
# the promote PR to merge, then explicitly dispatched
# `publish-workspace-server-image.yml` on `--ref main`. That step
# existed because GitHub's GITHUB_TOKEN-initiated merges suppress
# downstream `on: push` workflows (the documented "no recursion" rule
# — https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
# The explicit dispatch was the workaround.
#
# Gitea Actions does NOT have this no-recursion rule. PR #66's auto-
# sync merge to main fired `auto-promote-staging` on the next push
# trigger naturally. So the cascade fires on the natural push event;
# the explicit dispatch is dead code. (And even if we wanted to
# preserve it, Gitea has no `workflow_dispatch` REST endpoint.)
#
# Removed in this rewrite. If we ever observe the cascade misfire,
# operator can push an empty commit to `main` to wake it.
#
# ============================================================
# Why open a PR (and not direct push)
# ============================================================
#
# `main` branch protection has `enable_push: false` with NO
# `push_whitelist_usernames`. Direct push is impossible for any
# persona, including admins. PR-mediated merge is the only path,
# which is intentional: prod state mutations (and staging→main IS a
# prod mutation, since the next deploy fans out to tenants) require
# Hongming's approval per `feedback_prod_apply_needs_hongming_chat_go`.
#
# The auto-merge schedule preserves this gate: `merge_when_checks_succeed`
# does NOT bypass `required_approvals: 1`. Gitea waits for BOTH
# approval AND green checks before merging. Hongming reviews via the
# canvas/chat-handle of the PR notification, approves, and Gitea
# auto-merges within seconds.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# This workflow uses `secrets.AUTO_SYNC_TOKEN` — a personal access
# token issued to the `devops-engineer` Gitea persona. NOT the
# founder PAT. The bot-ring fingerprint that triggered the GitHub
# org suspension on 2026-05-06 was characterised by founder PAT
# acting as CI at machine speed.
#
# Token scope: `push: true` (read+write) on this repo. The persona
# can: open PRs, comment on PRs, schedule auto-merge. The persona
# CANNOT bypass main's branch protection (`required_approvals: 1`
# still applies — only Hongming's review unblocks merge).
#
# Authorship: the PR is opened by `devops-engineer`; the merge
# commit credits Hongming-as-approver and `devops-engineer` as
# the merger.
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — staging gates not all green at trigger time:
# - The combined-status check returns `state: pending|failure`.
# Workflow exits 0 with a step-summary "not all green; staying
# on current main". Re-fires on the next gate completion.
#
# B — Gitea PR-create returns non-201 (e.g. 422 already-exists):
# - Idempotent: the workflow first GETs the existing open
# staging→main PR. If found, reuse it; if not, POST a new one.
# 422 should never surface; if it does (race), step summary
# captures the body and the next workflow_run picks up.
#
# C — `merge_when_checks_succeed` schedule fails:
# - 422 with "Pull request is not mergeable" if there are
# conflicts or stale base. Step summary surfaces it; operator
# (or `auto-sync-main-to-staging`) needs to bring staging up
# to date with main first. Workflow exits 1 to surface red.
#
# D — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - 401/403 on first REST call. Step summary surfaces it.
# Re-issue the token from `~/.molecule-ai/personas/` on the
# operator host and update the repo Actions secret.
#
# ============================================================
# Loop safety
# ============================================================
#
# When the promote PR merges to main, `auto-sync-main-to-staging.yml`
# fires (on:push:main) and pushes the merge commit back to staging.
# That push to staging is by `devops-engineer`, NOT this workflow's
# token, and triggers the staging gate workflows. When they all
# complete, we end up back here — but the tree-diff guard catches
# it: staging tree == main tree (the merge commit changes nothing),
# so we skip and the cycle terminates.
on:
workflow_run:
workflows:
- CI
- E2E Staging Canvas (Playwright)
- E2E API Smoke Test
- CodeQL
types: [completed]
workflow_dispatch:
inputs:
force:
description: "Force promote even when AUTO_PROMOTE_ENABLED is unset (manual override)"
required: false
default: "false"
permissions:
contents: read
pull-requests: write
# Serialize auto-promote runs. Multiple staging gate completions can land
# in quick succession (CI + E2E + CodeQL all finish within seconds of
# each other on a green PR) — without this, two parallel runs both:
# 1. Would race the GET-or-POST PR step.
# 2. Would both call merge-schedule (idempotent — fine on Gitea).
# cancel-in-progress: false because the second run on a fresh staging
# tip should NOT kill the first which has already opened the PR.
concurrency:
group: auto-promote-staging
cancel-in-progress: false
jobs:
check-all-gates-green:
# Only consider staging pushes. PRs into staging don't promote.
if: >
(github.event_name == 'workflow_run' &&
github.event.workflow_run.head_branch == 'staging' &&
github.event.workflow_run.event == 'push')
|| github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
outputs:
all_green: ${{ steps.gates.outputs.all_green }}
head_sha: ${{ steps.gates.outputs.head_sha }}
steps:
# Skip empty-tree promotes (the perpetual auto-promote↔auto-sync
# cycle observed pre-cutover on GitHub). On Gitea the cycle shape
# is different (auto-sync uses fast-forward, no merge commit),
# but the tree-diff guard is cheap insurance and protects against
# any future merge-style regression.
- name: Checkout for tree-diff check
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
ref: staging
- name: Skip if staging tree == main tree (cycle-break safety)
id: tree-diff
env:
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
run: |
set -eu
git fetch origin main --depth=50 || { echo "::warning::git fetch main failed — proceeding (fail-open)"; exit 0; }
if git diff --quiet origin/main "$HEAD_SHA" -- 2>/dev/null; then
{
echo "## Skipped — no code to promote"
echo
echo "staging tip (\`${HEAD_SHA:0:8}\`) and \`main\` have identical trees."
echo "Skipping to avoid opening an empty promote PR."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote: staging tree == main tree — no code to promote, skipping"
echo "skip=true" >> "$GITHUB_OUTPUT"
else
echo "skip=false" >> "$GITHUB_OUTPUT"
fi
- name: Check combined status on staging head
if: steps.tree-diff.outputs.skip != 'true'
id: gates
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
REPO: ${{ github.repository }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
run: |
set -euo pipefail
# Gitea-native combined-status endpoint aggregates every
# check context attached to a SHA. This is structurally
# cleaner than the GitHub-era per-workflow `gh run list`
# loop because:
#
# 1. There's no risk of "workflow name collision" (the
# GitHub-era code had to switch from `--workflow=NAME`
# to `--workflow=FILE.YML` to disambiguate "CodeQL"
# between the explicit workflow and GitHub's UI-
# configured default setup; Gitea has no such
# duplicate-name surface).
# 2. Gitea's combined state already encodes the AND
# across all contexts: success only if EVERY context
# is success. Pending or failure on any context
# produces non-success state.
#
# See https://docs.gitea.com/api/1.22 for the schema —
# `state` is one of: success, pending, failure, error.
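# Illustrative response shape (fields beyond the two parsed below are
# assumed from the Gitea schema, not captured output):
#   { "state": "success", "total_count": 4, "statuses": [ ... ], "sha": "..." }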
echo "head_sha=${HEAD_SHA}" >> "$GITHUB_OUTPUT"
echo "Checking combined status on SHA ${HEAD_SHA}"
# `set +e` bracket for the http-code capture pattern; restore `set -e`
# immediately after. Pattern hardened per `feedback_curl_status_capture_pollution`.
BODY_FILE=$(mktemp)
set +e
STATUS=$(curl -sS \
-H "Authorization: token ${GITEA_TOKEN}" \
-H "Accept: application/json" \
-o "${BODY_FILE}" \
-w "%{http_code}" \
"${GITEA_HOST}/api/v1/repos/${REPO}/commits/${HEAD_SHA}/status")
CURL_RC=$?
set -e
if [ "${CURL_RC}" -ne 0 ] || [ "${STATUS}" != "200" ]; then
echo "::error::combined-status fetch failed: curl=${CURL_RC} http=${STATUS}"
cat "${BODY_FILE}" | head -c 500 || true
rm -f "${BODY_FILE}"
echo "all_green=false" >> "$GITHUB_OUTPUT"
exit 0
fi
STATE=$(jq -r '.state // "missing"' < "${BODY_FILE}")
TOTAL=$(jq -r '.total_count // 0' < "${BODY_FILE}")
rm -f "${BODY_FILE}"
echo "Combined status: state=${STATE} total_count=${TOTAL}"
if [ "${STATE}" = "success" ] && [ "${TOTAL}" -gt 0 ]; then
echo "all_green=true" >> "$GITHUB_OUTPUT"
echo "::notice::All gates green on ${HEAD_SHA} (${TOTAL} contexts)"
else
echo "all_green=false" >> "$GITHUB_OUTPUT"
{
echo "## Not promoting — combined status not green"
echo
echo "- SHA: \`${HEAD_SHA:0:8}\`"
echo "- Combined state: \`${STATE}\`"
echo "- Context count: ${TOTAL}"
echo
echo "Will re-fire on the next gate completion. Investigate any red gate via the Actions UI."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote: combined status is ${STATE} on ${HEAD_SHA} — staying on current main"
fi
promote:
needs: check-all-gates-green
if: needs.check-all-gates-green.outputs.all_green == 'true'
runs-on: ubuntu-latest
steps:
- name: Check rollout gate
env:
AUTO_PROMOTE_ENABLED: ${{ vars.AUTO_PROMOTE_ENABLED }}
FORCE_INPUT: ${{ github.event.inputs.force }}
run: |
set -eu
# Repo variable AUTO_PROMOTE_ENABLED=true flips this on. While
# it's unset, the workflow dry-runs (logs what it would have
# done) but doesn't open the promote PR. Set the variable in
# Settings → Actions → Variables.
if [ "${AUTO_PROMOTE_ENABLED:-}" != "true" ] && [ "${FORCE_INPUT:-false}" != "true" ]; then
{
echo "## Auto-promote disabled"
echo
echo "Repo variable \`AUTO_PROMOTE_ENABLED\` is not set to \`true\`."
echo "All gates are green on staging; would have opened a promote PR to \`main\`."
echo
echo "To enable: Settings → Actions → Variables → \`AUTO_PROMOTE_ENABLED=true\`."
echo "To test once manually: workflow_dispatch with \`force=true\`."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote disabled — dry run only"
exit 0
fi
- name: Open or reuse promote PR + schedule auto-merge
if: ${{ vars.AUTO_PROMOTE_ENABLED == 'true' || github.event.inputs.force == 'true' }}
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
REPO: ${{ github.repository }}
TARGET_SHA: ${{ needs.check-all-gates-green.outputs.head_sha }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
# http_get BODY_FILE URL / http_post_json BODY_FILE JSON URL
# Write the response body to BODY_FILE and echo the HTTP status code.
# Curl status capture pattern per `feedback_curl_status_capture_pollution`:
# http_code goes to stdout via -w, body to its own tempfile via -o, and
# the set +e/-e bracket keeps a curl failure from aborting the step.
http_get() {
local body_file="$1"; shift
local url="$1"; shift
set +e
local code
code=$(curl -sS "${AUTH[@]}" -o "${body_file}" -w "%{http_code}" "${url}")
local rc=$?
set -e
if [ "${rc}" -ne 0 ]; then
echo "::error::curl GET failed (rc=${rc}) on ${url}"
return 99
fi
echo "${code}"
}
http_post_json() {
local body_file="$1"; shift
local data="$1"; shift
local url="$1"; shift
set +e
local code
code=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${data}" -o "${body_file}" -w "%{http_code}" "${url}")
local rc=$?
set -e
if [ "${rc}" -ne 0 ]; then
echo "::error::curl POST failed (rc=${rc}) on ${url}"
return 99
fi
echo "${code}"
}
# Step 1: look for an existing open staging→main promote PR
# (idempotent on workflow re-run). Gitea doesn't have a
# head/base filter on the list endpoint that's as ergonomic
# as gh's, but the dedicated `/pulls/{base}/{head}` lookup
# works.
BODY=$(mktemp)
STATUS=$(http_get "${BODY}" "${API}/pulls/main/staging") || true
PR_NUM=""
if [ "${STATUS}" = "200" ]; then
STATE=$(jq -r '.state // "missing"' < "${BODY}")
if [ "${STATE}" = "open" ]; then
PR_NUM=$(jq -r '.number // ""' < "${BODY}")
echo "::notice::Re-using existing open promote PR #${PR_NUM}"
fi
fi
rm -f "${BODY}"
# Step 2: if no open PR, create one.
if [ -z "${PR_NUM}" ]; then
TITLE="staging → main: auto-promote ${TARGET_SHA:0:7}"
BODY_TEXT=$(cat <<EOFBODY
Automated promotion of \`staging\` (\`${TARGET_SHA:0:8}\`) to \`main\`. All required staging gates are green at this SHA (combined status reported success).
This PR is auto-generated by \`.github/workflows/auto-promote-staging.yml\` whenever every required gate completes green on the same staging SHA.
**Approval gate:** \`main\` branch protection requires 1 approval before this can land. Once approved, Gitea will auto-merge (the workflow scheduled \`merge_when_checks_succeed: true\` immediately after open).
The reverse-direction sync (the merge commit on \`main\` → \`staging\`) is handled automatically by \`auto-sync-main-to-staging.yml\` after this PR lands.
---
- Source: staging at \`${TARGET_SHA}\`
- Opened by: \`devops-engineer\` persona (anti-bot-ring; never founder PAT)
- Refs: #65, #73, #195
EOFBODY
)
REQ=$(jq -n \
--arg title "${TITLE}" \
--arg body "${BODY_TEXT}" \
--arg base "main" \
--arg head "staging" \
'{title:$title, body:$body, base:$base, head:$head}')
BODY=$(mktemp)
STATUS=$(http_post_json "${BODY}" "${REQ}" "${API}/pulls")
if [ "${STATUS}" = "201" ]; then
PR_NUM=$(jq -r '.number // ""' < "${BODY}")
echo "::notice::Opened promote PR #${PR_NUM}"
else
echo "::error::Failed to create promote PR: HTTP ${STATUS}"
jq -r '.message // .' < "${BODY}" | head -c 500
rm -f "${BODY}"
exit 1
fi
rm -f "${BODY}"
fi
# Step 3: schedule auto-merge. merge_when_checks_succeed
# tells Gitea to wait for both:
# - all required status checks to pass
# - the required-approvals gate (1 approval on main)
# before merging. On approval+green, Gitea merges within
# seconds. On any check failing or approval being denied,
# the schedule stays armed but doesn't fire.
#
# Idempotent: re-arming on an already-armed PR is a no-op.
REQ=$(jq -n '{Do:"merge", merge_when_checks_succeed:true}')
BODY=$(mktemp)
STATUS=$(http_post_json "${BODY}" "${REQ}" "${API}/pulls/${PR_NUM}/merge")
# Gitea returns:
# - 200/204 on successful immediate merge (gates already green AND approved)
# - 405 "Please try again later" when scheduled successfully but waiting
# - 422 on "Pull request is not mergeable" (conflict, stale base, etc.)
#
# 405 here is benign — Gitea's way of saying "scheduled, not merging now".
# We treat 200/204/405 as success, anything else as failure.
case "${STATUS}" in
200|204)
MERGE_OUTCOME="merged-immediately"
echo "::notice::Promote PR #${PR_NUM} merged immediately (gates+approval already green)"
;;
405)
MERGE_OUTCOME="auto-merge-scheduled"
echo "::notice::Promote PR #${PR_NUM}: auto-merge scheduled (Gitea will land on approval+green)"
;;
422)
MERGE_OUTCOME="not-mergeable"
echo "::warning::Promote PR #${PR_NUM}: not mergeable (conflict, stale base, or already merging)."
jq -r '.message // .' < "${BODY}" | head -c 500
;;
*)
echo "::error::Unexpected status ${STATUS} on merge schedule"
jq -r '.message // .' < "${BODY}" | head -c 500
rm -f "${BODY}"
exit 1
;;
esac
rm -f "${BODY}"
{
echo "## Auto-promote PR opened"
echo
echo "- Source: staging at \`${TARGET_SHA:0:8}\`"
echo "- PR: #${PR_NUM}"
echo "- Outcome: \`${MERGE_OUTCOME}\`"
echo
if [ "${MERGE_OUTCOME}" = "auto-merge-scheduled" ]; then
echo "Gitea will auto-merge once Hongming approves and all checks are green. No human action needed beyond approval."
elif [ "${MERGE_OUTCOME}" = "merged-immediately" ]; then
echo "Merged immediately. \`publish-workspace-server-image.yml\` will fire naturally on the resulting \`main\` push."
else
echo "PR is not auto-merging. Operator may need to bring staging up to date with main, then re-trigger this workflow via workflow_dispatch."
fi
} >> "$GITHUB_STEP_SUMMARY"


@@ -1,83 +0,0 @@
name: auto-promote-stale-alarm
# Hourly cron + on-demand alarm for the silent-block failure mode that
# motivated issue #2975:
# - The auto-promote-staging.yml workflow opened a PR + armed
# auto-merge, but main's branch protection requires a human review
# (reviewDecision=REVIEW_REQUIRED). The PR sat BLOCKED with no
# surface-up-the-stack for 12+ hours, holding 25 commits hostage
# including the Memory v2 redesign and a reno-stars data-loss fix.
#
# This workflow runs `scripts/check-stale-promote-pr.sh` against the
# repo's open auto-promote PRs (base=main head=staging). When a PR has
# been BLOCKED on REVIEW_REQUIRED for >4h, it:
# 1. Emits a workflow-level warning (visible in run summary + the
# Actions UI feed).
# 2. Posts a comment on the PR (idempotent — one alarm per PR).
#
# The detection logic lives in scripts/check-stale-promote-pr.sh so
# it's unit-testable with stubbed `gh` (see test-check-stale-promote-pr.sh).
# This file is the schedule + invocation surface only; the script is the
# SSOT for the detector itself.
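#
# The script is not reproduced here; purely for orientation, a minimal
# sketch of the detector's shape under the documented contract (gh CLI,
# STALE_HOURS / POST_COMMENT env vars, exit code = stale-PR count). The
# field choices, the age-since-creation simplification, and the missing
# one-comment-per-PR dedup are assumptions, not a copy of the real script:
# ```
# now=$(date -u +%s); stale=0
# while read -r pr; do
#   num=$(jq -r '.number' <<<"$pr")
#   created=$(date -u -d "$(jq -r '.createdAt' <<<"$pr")" +%s)
#   age_h=$(( (now - created) / 3600 ))
#   if [ "$(jq -r '.reviewDecision' <<<"$pr")" = "REVIEW_REQUIRED" ] \
#      && [ "$age_h" -ge "${STALE_HOURS:-4}" ]; then
#     echo "::warning::Promote PR #${num} blocked on review for ${age_h}h"
#     [ "${POST_COMMENT:-true}" = "true" ] && \
#       gh pr comment "$num" --body "stale-promote-alarm: blocked ${age_h}h"
#     stale=$((stale + 1))
#   fi
# done < <(gh pr list --base main --head staging --state open \
#            --json number,createdAt,reviewDecision | jq -c '.[]')
# exit "$stale"
# ```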
on:
schedule:
# Hourly. Cheap (one `gh pr list` + jq), and 1h granularity is
# plenty for a 4h staleness threshold — operators see the alarm
# within at most 1h of crossing the threshold.
- cron: "27 * * * *" # at :27 to dodge the cron herd at :00
workflow_dispatch:
inputs:
stale_hours:
description: "Hours after which a BLOCKED+REVIEW_REQUIRED PR is stale (default 4)"
required: false
default: "4"
post_comment:
description: "Post a comment on stale PRs (default true)"
required: false
default: "true"
permissions:
contents: read
pull-requests: write # post comments on stale PRs
# Serialize so the on-demand and scheduled runs don't double-comment
# the same PR. cancel-in-progress=false because the script is idempotent
# (existing comment marker prevents dupes), but a scheduled run firing
# while a manual one runs would just re-list the same PR set.
concurrency:
group: auto-promote-stale-alarm
cancel-in-progress: false
jobs:
scan:
runs-on: ubuntu-latest
steps:
- name: Checkout (need scripts/ only)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
sparse-checkout: |
scripts/check-stale-promote-pr.sh
sparse-checkout-cone-mode: false
- name: Run stale-PR detector
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY: ${{ github.repository }}
STALE_HOURS: ${{ inputs.stale_hours || '4' }}
POST_COMMENT: ${{ inputs.post_comment || 'true' }}
run: |
# The script's exit code reflects the count of stale PRs.
# We don't want a stale finding to fail the workflow run —
# the warning + comment are the signal, the green/red is
# noise. So convert any non-zero exit to a workflow notice
# and exit 0.
set +e
bash scripts/check-stale-promote-pr.sh
rc=$?
set -e
if [ "$rc" -ne 0 ]; then
echo "::notice::Stale PR detector found $rc PR(s) needing attention. See warnings above + comments on the PRs."
fi
# Always succeed — operator-facing surface is the warning,
# not the workflow status.
exit 0


@@ -1,404 +0,0 @@
name: Auto-sync canary — AUTO_SYNC_TOKEN rotation drift
# Synthetic health check for the AUTO_SYNC_TOKEN secret consumed by
# auto-sync-main-to-staging.yml (PR #66) and publish-workspace-server-image.yml.
#
# ============================================================
# Why this workflow exists
# ============================================================
#
# PR #66 fixed auto-sync (replaced GitHub-era `gh pr create` — which
# 405s on Gitea's GraphQL endpoint — with a direct git push from the
# `devops-engineer` persona's `AUTO_SYNC_TOKEN`). Hostile self-review
# weakest spot #3 of that PR:
#
# "Token rotation silently breaks auto-sync. If AUTO_SYNC_TOKEN is
# rotated without updating the repo secret, every push to main
# fails red on the auto-sync push step. The workflow surfaces the
# failure mode in the step summary (failure mode B in the header),
# but there's no proactive monitoring."
#
# Detection latency under the status quo: rotation is only caught on
# the next push to `main`. During quiet periods (no main push for
# many hours) the staging-superset-of-main invariant silently breaks.
#
# This workflow closes the gap: every 6 hours, it fires the auth
# surface that auto-sync depends on and emits a red workflow status
# if AUTO_SYNC_TOKEN has drifted out of validity.
#
# ============================================================
# What this checks (Option B — read-only verify)
# ============================================================
#
# 1. `GET /api/v1/user` against Gitea with the token → validates the
# token authenticates AND resolves to `devops-engineer` (catches
# the case where the token was regenerated under a different
# persona by mistake).
# 2. `GET /api/v1/repos/molecule-ai/molecule-core` with the token →
# validates the token has `read:repository` scope on this repo
# (the v2 scope contract — see saved memory
# `reference_persona_token_v2_scope`).
# 3. `git push --dry-run` of the current staging SHA back to
# `refs/heads/staging` via `https://oauth2:<token>@<gitea>/...`
# → validates the EXACT HTTPS basic-auth path that
# `actions/checkout` + `git push origin staging` use inside
# auto-sync-main-to-staging.yml. NOP by construction (push the
# current tip to itself = "Everything up-to-date"); auth is
# checked at the smart-protocol handshake BEFORE the empty-diff
# computation, so bad token → exit 128 with "Authentication
# failed". `git ls-remote` is NOT used here because Gitea
# falls back to anonymous read on public repos and would
# silently green-light a rotated token.
#
# Each step exits non-zero with an actionable error message if it
# fails. The workflow status itself is the operator-facing surface.
#
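# Probe 3 can be reproduced from any shell in a few commands (a sketch;
# the real step below adds timeouts, a temp-repo cleanup trap, and URL
# redaction on failure):
# ```
# url="https://oauth2:${AUTO_SYNC_TOKEN}@git.moleculesai.app/molecule-ai/molecule-core"
# sha=$(git ls-remote --refs "$url" refs/heads/staging | awk '{print $1}')
# tmp=$(mktemp -d); git -C "$tmp" init -q
# git -C "$tmp" push --dry-run "$url" "${sha}:refs/heads/staging"
# # "Everything up-to-date" with exit 0 = auth OK; exit 128 = bad token
# ```
#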
# ============================================================
# What this does NOT check (intentional)
# ============================================================
#
# - **Branch-protection authz** (failure mode C in auto-sync header):
# would require an actual write to staging. Already monitored by
# `branch-protection-drift.yml` daily. Don't duplicate.
# - **Conflict resolution** (failure mode A): a real conflict is data-
# driven, not auth-driven; can't synthesise it without polluting
# staging. Already surfaces immediately on the next main push.
# - **Concurrency** (failure mode D): handled by workflow concurrency
# group on auto-sync, not a credential issue.
#
# ============================================================
# Why Option B (read-only) and not the alternatives
# ============================================================
#
# Considered + rejected (see issue #72 for full write-up):
#
# - **Option A — full auto-sync on schedule**: every run creates a
# no-op merge commit on staging when main hasn't advanced. 4 noise
# commits/day. And races the real `push:` trigger when main has
# advanced. Rejected.
#
# - **Option C — push to dedicated `auto-sync-canary` branch**: would
# exercise authz too, but adds branch noise on Gitea AND requires
# maintaining a second branch protection (or expanding staging's
# whitelist to a junk branch). Authz already covered by
# `branch-protection-drift.yml`. Rejected.
#
# Prior art for the chosen Option B shape:
# - Cloudflare's `/user/tokens/verify` endpoint (read-only auth
# probe explicitly designed for credential canaries).
# - AWS Secrets Manager rotation Lambda's `testSecret` step (auth
# probe before promoting AWSPENDING → AWSCURRENT).
# - HashiCorp Vault's `vault token lookup` for renewal canaries.
#
# ============================================================
# Operator runbook — what to do when this workflow goes RED
# ============================================================
#
# 1. **Identify which step failed**:
# - Step "Verify token authenticates as devops-engineer" red →
# token is invalid OR resolves to wrong persona.
# - Step "Verify token has repo read scope" red → token valid but
# stripped of `read:repository` scope (or repo perms changed).
# - Step "Verify git HTTPS auth path via no-op dry-run push to
# staging" red → token rotated/revoked OR Gitea git-HTTPS
# surface is broken (rare). Auth check happens on the
# smart-protocol handshake, separate from the API path.
#
# 2. **Re-issue the token** on the operator host:
# ```
# ssh root@5.78.80.188 'docker exec --user git molecule-gitea-1 \
# gitea admin user generate-access-token \
# --username devops-engineer \
# --token-name persona-devops-engineer-vN \
# --scopes "read:repository,write:repository,read:user,read:organization,read:issue,write:issue,read:notification,read:misc"'
# ```
# Update `/etc/molecule-bootstrap/agent-secrets.env` in place
# (per `feedback_unified_credentials_file`). The previous token
# file lands at `.bak.<date>`.
#
# 3. **Update the repo Actions secret** at:
# Settings → Secrets and variables → Actions → AUTO_SYNC_TOKEN
# Paste the new token. (Don't echo it in chat — but per
# `feedback_passwords_in_chat_are_burned`, a paste in a 1:1
# Claude session is within trust boundary.)
#
# 4. **Re-run this canary** via workflow_dispatch. Confirm GREEN.
#
# 5. **Backfill any missed main → staging syncs** by re-running
# `auto-sync-main-to-staging.yml` from its workflow_dispatch
# surface, OR by pushing an empty commit to main (if you'd
# rather force a real trigger).
#
# ============================================================
# Security notes
# ============================================================
#
# - Token usage: read-only in effect (`GET /api/v1/user`, `GET /api/v1/repos/...`,
# `git ls-remote`, and a `git push --dry-run` that transfers and writes
# nothing). No real write paths. Same blast-radius profile as
# `actions/checkout` on a public repo.
# - The token NEVER appears in logs: every `curl` uses a header
# variable, never inline; the git probes build the
# `oauth2:$TOKEN@host` URL into a single shell variable that's never
# echoed. GitHub Actions secret-masking covers anything that does
# slip through.
# - No new token introduced — same `AUTO_SYNC_TOKEN` the workflow
# under monitor uses. Per least-privilege we deliberately do NOT
# broaden scope for the canary.
on:
schedule:
# Every 6 hours at :17 (offsets the cron herd at :00). Justification
# from issue #72: cheap to run (~5s wall-clock, no quota), 3h average
# detection latency, 6h max. 1h would be 24× the runs for marginal
# benefit; daily would be 6× longer latency and worse than status
# quo on a quiet-main day.
- cron: '17 */6 * * *'
workflow_dispatch:
# No concurrency group needed — the canary is read-only and idempotent.
# Two parallel runs (e.g. operator dispatch during a scheduled tick) are
# harmless: same result, doubled HTTPS calls, no shared state.
permissions:
contents: read
jobs:
verify-token:
name: Verify AUTO_SYNC_TOKEN validity
runs-on: ubuntu-latest
# 2 min surfaces hangs (Gitea API stall, DNS issue) within one
# cron interval. Realistic worst case is ~10s: 2 curls plus 2 git calls
# (ls-remote + dry-run push), each capped by the explicit timeouts below.
timeout-minutes: 2
env:
# Pinned in env so individual steps can read it without
# repeating the secret reference. GitHub masks the value in
# logs automatically.
AUTO_SYNC_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
# MUST stay in sync with auto-sync-main-to-staging.yml's
# `git config user.name "devops-engineer"` line. Renaming the
# devops-engineer persona requires updating both files (and
# the staging branch protection's `push_whitelist_usernames`).
EXPECTED_PERSONA: devops-engineer
GITEA_HOST: git.moleculesai.app
REPO_PATH: molecule-ai/molecule-core
steps:
- name: Verify AUTO_SYNC_TOKEN secret is configured
# Schedule-vs-dispatch behaviour split, per
# `feedback_schedule_vs_dispatch_secrets_hardening`:
#
# - schedule: hard-fail when the secret is missing. The
# whole point of the canary is to surface drift; soft-
# skipping on missing-secret would make the canary
# itself drift-invisible (sweep-cf-orphans #2088 lesson).
# - workflow_dispatch: hard-fail too — there's no scenario
# where an operator wants this canary to silently no-op.
# The workflow has no other ad-hoc utility; if you ran
# it, you wanted the answer.
run: |
if [ -z "${AUTO_SYNC_TOKEN}" ]; then
echo "::error::AUTO_SYNC_TOKEN secret is not set on this repo." >&2
echo "::error::Set it at Settings → Secrets and variables → Actions." >&2
echo "::error::Without it, auto-sync-main-to-staging.yml will fail every push to main." >&2
exit 1
fi
echo "AUTO_SYNC_TOKEN is configured (value masked)."
- name: Verify token authenticates as ${{ env.EXPECTED_PERSONA }}
# Calls Gitea's `/api/v1/user` — the canonical
# auth-probe-with-no-side-effects endpoint (mirrors
# Cloudflare's /user/tokens/verify).
#
# Failure surfaces:
# - HTTP 401: token invalid (rotated, revoked, or never
# correctly registered).
# - HTTP 200 but username != devops-engineer: token was
# regenerated under the wrong persona — this would let
# auth pass but commit attribution would be wrong, and
# branch-protection authz would fail because only
# `devops-engineer` is whitelisted.
run: |
set -euo pipefail
response_file="$(mktemp)"
code_file="$(mktemp)"
# `--max-time 30`: full call ceiling. `--connect-timeout 10`:
# DNS + TCP. `-w "%{http_code}"` routed to a tempfile so curl's
# exit code can't pollute the captured status — see
# feedback_curl_status_capture_pollution + the
# `lint-curl-status-capture.yml` gate that rejects the unsafe
# `$(curl ... || echo "000")` shape.
set +e
curl -sS -o "$response_file" \
--max-time 30 --connect-timeout 10 \
-w "%{http_code}" \
-H "Authorization: token ${AUTO_SYNC_TOKEN}" \
-H "Accept: application/json" \
"https://${GITEA_HOST}/api/v1/user" >"$code_file" 2>/dev/null
set -e
status=$(cat "$code_file" 2>/dev/null || true)
[ -z "$status" ] && status="000"
if [ "$status" != "200" ]; then
echo "::error::Token rotation suspected: GET /api/v1/user returned HTTP $status (expected 200)." >&2
echo "::error::Likely cause: AUTO_SYNC_TOKEN has been rotated/revoked on Gitea but the repo Actions secret was not updated." >&2
echo "::error::Runbook: see header comment of this workflow file." >&2
# Print response body but redact anything that looks like a token.
sed -E 's/[A-Fa-f0-9]{32,}/<redacted>/g' "$response_file" >&2 || true
exit 1
fi
username=$(python3 -c "import json,sys; print(json.load(open(sys.argv[1])).get('login',''))" "$response_file")
if [ "$username" != "${EXPECTED_PERSONA}" ]; then
echo "::error::Token resolves to user '$username', expected '${EXPECTED_PERSONA}'." >&2
echo "::error::AUTO_SYNC_TOKEN must be the devops-engineer persona PAT (not founder PAT, not another persona)." >&2
echo "::error::Auto-sync push will fail because only 'devops-engineer' is whitelisted on staging branch protection." >&2
exit 1
fi
echo "Token authenticates as: $username ✓"
- name: Verify token has repo read scope
# `GET /api/v1/repos/<owner>/<repo>` requires `read:repository`
# on the persona's v2 scope contract. If the scope was
# narrowed/dropped on rotation we catch it here, before the
# next main push reveals it via a checkout failure.
run: |
set -euo pipefail
response_file="$(mktemp)"
code_file="$(mktemp)"
# See first probe step for the rationale on the tempfile-routed
# `-w "%{http_code}"` pattern — the unsafe `|| echo "000"` shape
# is rejected by lint-curl-status-capture.yml.
set +e
curl -sS -o "$response_file" \
--max-time 30 --connect-timeout 10 \
-w "%{http_code}" \
-H "Authorization: token ${AUTO_SYNC_TOKEN}" \
-H "Accept: application/json" \
"https://${GITEA_HOST}/api/v1/repos/${REPO_PATH}" >"$code_file" 2>/dev/null
set -e
status=$(cat "$code_file" 2>/dev/null || true)
[ -z "$status" ] && status="000"
if [ "$status" != "200" ]; then
echo "::error::Token lacks read:repository scope on ${REPO_PATH}: HTTP $status." >&2
echo "::error::Auto-sync's actions/checkout step will fail with this token." >&2
echo "::error::Re-issue with v2 scope contract: read:repository,write:repository,read:user,read:organization,read:issue,write:issue,read:notification,read:misc" >&2
sed -E 's/[A-Fa-f0-9]{32,}/<redacted>/g' "$response_file" >&2 || true
exit 1
fi
echo "Token has read:repository on ${REPO_PATH} ✓"
- name: Verify git HTTPS auth path via no-op dry-run push to staging
# Final probe: exercise the EXACT auth path that
# `actions/checkout` + `git push origin staging` use in
# auto-sync-main-to-staging.yml. Gitea's API and git-HTTPS
# surfaces share the token-lookup code path internally but
# the wire-level error shapes differ — historically (#173)
# the API path was healthy while git-HTTPS rejected, so
# checking only the API would have given false-green.
#
# IMPORTANT: `git ls-remote` on a public repo (which
# molecule-core is) succeeds even with a junk token because
# Gitea falls back to anonymous-read. `ls-remote` therefore
# CANNOT validate auth on this surface. We use
# `git push --dry-run` instead — push is auth-gated even on
# public repos.
#
# NOP shape: read the current staging SHA via authenticated
# ls-remote (the SHA itself is public; auth is incidental
# here, used only to colocate the discovery in one step), then
# `git push --dry-run <SHA>:refs/heads/staging`. Pushing the
# current tip back to itself is "Everything up-to-date" with
# exit 0 when auth succeeds. With a bad token Gitea returns
# HTTP 401 in the smart-protocol handshake and git exits 128
# with "Authentication failed".
#
# The dry-run never reaches Gitea's pre-receive hook (which
# is where branch-protection authz runs), so this probe does
# not validate failure mode C. That's intentional —
# branch-protection-drift.yml owns authz monitoring; this
# canary owns auth.
env:
# Don't hang waiting for password prompt if auth fails on a
# terminal-attached run. (In Actions there's no terminal,
# but the env-var hardens against an interactive runner
# config.)
GIT_TERMINAL_PROMPT: "0"
run: |
set -euo pipefail
# Token is in $AUTO_SYNC_TOKEN (job-level env). Compose the
# URL as a local var that's never echoed.
url="https://oauth2:${AUTO_SYNC_TOKEN}@${GITEA_HOST}/${REPO_PATH}"
# Step a: read current staging SHA. ~1KB; auth-gated only
# on private repos but always works on public — used here
# only to discover the SHA, not to validate auth.
staging_ref=$(timeout 30s git ls-remote --refs "$url" refs/heads/staging 2>&1) || {
redacted=$(echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
echo "::error::ls-remote against staging failed (network/DNS issue):" >&2
echo "$redacted" >&2
exit 1
}
if ! echo "$staging_ref" | grep -qE '^[0-9a-f]{40}[[:space:]]+refs/heads/staging$'; then
echo "::error::ls-remote returned unexpected shape:" >&2
echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g" >&2
exit 1
fi
staging_sha=$(echo "$staging_ref" | awk '{print $1}')
# Step b: spin up an ephemeral local repo. `git push` always
# requires a local repo even when pushing a remote SHA that
# isn't in the local object DB (the protocol negotiates and
# discovers we don't need to send any objects). We don't use
# `actions/checkout` for this — it would clone the whole
# repo (~hundreds of MB) for what's essentially `git init`.
tmp_repo="$(mktemp -d)"
trap 'rm -rf "$tmp_repo"' EXIT
git -C "$tmp_repo" init -q
# Author config required for any git operation; values are
# arbitrary because nothing gets committed here.
git -C "$tmp_repo" config user.email canary@auto-sync.local
git -C "$tmp_repo" config user.name auto-sync-canary
# Step c: dry-run push the current staging SHA back to
# staging. NOP by construction — the remote tip equals the
# SHA we're pushing, so "Everything up-to-date" is the
# success path.
#
# Authentication is checked at the smart-protocol handshake,
# BEFORE the dry-run can compute an empty diff. Bad token
# → "Authentication failed", exit 128. Good token → exit 0.
set +e
push_out=$(timeout 30s git -C "$tmp_repo" push --dry-run "$url" "${staging_sha}:refs/heads/staging" 2>&1)
push_rc=$?
set -e
if [ "$push_rc" -ne 0 ]; then
redacted=$(echo "$push_out" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
echo "::error::Token rotation suspected: git push --dry-run against staging failed via the AUTO_SYNC_TOKEN HTTPS auth path (exit $push_rc)." >&2
echo "::error::This is the EXACT auth path that actions/checkout + git push use in auto-sync-main-to-staging.yml." >&2
echo "::error::Likely cause: AUTO_SYNC_TOKEN was rotated/revoked on Gitea but the repo Actions secret was not updated. Runbook: see header." >&2
echo "$redacted" >&2
exit 1
fi
echo "git HTTPS auth path: NOP push --dry-run to staging → ${staging_sha:0:8} ✓"
- name: Summarise canary result
# Everything passed — surface a green summary. (Failures
# already wrote ::error:: lines and exited above; if we got
# here, all three probes passed.)
run: |
{
echo "## Auto-sync canary: GREEN"
echo ""
echo "AUTO_SYNC_TOKEN is healthy:"
echo "- Authenticates as \`${EXPECTED_PERSONA}\` ✓"
echo "- Has \`read:repository\` scope on \`${REPO_PATH}\` ✓"
echo "- Git HTTPS auth path: no-op dry-run push to \`refs/heads/staging\` succeeds ✓"
echo ""
echo "Auto-sync main → staging will succeed on the next push to main."
echo "If this canary ever goes RED, see the runbook in this workflow's header."
} >> "$GITHUB_STEP_SUMMARY"


@@ -1,255 +0,0 @@
name: Auto-sync main → staging
# Reflects every push to `main` back onto `staging` so the
# staging-as-superset-of-main invariant holds.
#
# ============================================================
# What this workflow does
# ============================================================
#
# On every push to `main`:
# 1. Checks if staging already contains main → no-op.
# 2. Fetches both branches, merges main into staging in the
# runner workspace (fast-forward if possible, else
# `--no-ff` merge commit).
# 3. Pushes staging directly to origin via the
# `devops-engineer` persona's `AUTO_SYNC_TOKEN`.
#
# Authoritative path: a single `git push origin staging` from
# inside this workflow is the SSOT for advancing staging after
# a main push. No PR, no merge queue, no human approval —
# staging is mechanically maintained as a superset of main.
#
# `auto-promote-staging.yml` is the reverse-direction
# counterpart (staging → main, gated on green CI). Together
# they keep the staging-superset-of-main invariant tight.
#
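# The job boils down to this manual sequence (a sketch; the real steps
# below add no-op detection, step summaries, and conflict handling):
# ```
# git fetch origin main staging
# git checkout staging
# git merge-base --is-ancestor origin/main HEAD \
#   || git merge --ff-only origin/main \
#   || git merge --no-ff origin/main -m "chore: sync main → staging (auto)"
# git push origin staging
# ```
#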
# ============================================================
# Why direct push (and not "open a PR")
# ============================================================
#
# Pre-2026-05-06 the canonical SCM was GitHub.com, where:
# - The `staging` branch had a `merge_queue` ruleset that
# blocked ALL direct pushes (no bypass even for org
# admins or the GitHub Actions integration).
# - Therefore this workflow opened a PR via `gh pr create`
# and let auto-merge land it through the queue.
#
# Post-2026-05-06 the canonical SCM is Gitea
# (`git.moleculesai.app/molecule-ai/molecule-core`). Gitea:
# - Has no `merge_queue` concept.
# - Allows direct push to protected branches via per-user
# `push_whitelist_usernames` on the branch protection.
# - Does not expose a GraphQL endpoint, so `gh pr create`
# returns `HTTP 405 Method Not Allowed
# (https://git.moleculesai.app/api/graphql)` — the
# pre-suspension architecture cannot work on Gitea.
#
# The molecule-ai/molecule-core staging branch protection
# (verified via `GET /api/v1/repos/.../branch_protections`)
# whitelists `devops-engineer` for direct push. So the
# correct Gitea-shape architecture is: authenticate as
# `devops-engineer`, merge locally, push staging directly.
#
# This is structurally simpler than the GitHub-era PR dance
# and removes the dependence on `gh` CLI / GraphQL entirely.
#
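# The whitelist can be re-verified from a shell (a sketch; the token is
# a placeholder and field names follow Gitea's branch-protection API):
# ```
# curl -sS -H "Authorization: token $GITEA_TOKEN" \
#   "https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-core/branch_protections" \
#   | jq '.[] | select(.branch_name == "staging") | .push_whitelist_usernames'
# ```
#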
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# This workflow uses `secrets.AUTO_SYNC_TOKEN`, which is a
# personal access token issued to the `devops-engineer`
# persona on Gitea — NOT the founder PAT. The bot-ring
# fingerprint that triggered the GitHub org suspension on
# 2026-05-06 was characterised by founder PAT acting as CI
# at machine speed; per-persona identities split the
# attribution honestly.
#
# Token scope on Gitea: repo write. Push target restricted
# to `staging` (this workflow is the only writer; main is
# untouched). Compromise blast radius: bounded to staging
# branch + this repo's read surface.
#
# Commits are authored by the persona email
# `devops-engineer@agents.moleculesai.app` so commit history
# reflects which automation produced the merge.
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — staging has commits main doesn't, and the merge
# conflicts:
# - The `--no-ff` merge step exits non-zero. Workflow
# fails red. Operator (devops-engineer or human)
# resolves manually:
# git fetch origin
# git checkout staging
# git merge --no-ff origin/main
# # resolve conflicts
# git push origin staging
# - Step summary surfaces the conflict so the failed run
# is self-explanatory.
#
# B — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - `git push` step exits non-zero with `HTTP 401` /
# `403`. Step summary surfaces the failed push.
# - Re-issue the token from `~/.molecule-ai/personas/`
# on the operator host and update the repo Actions
# secret. Re-run the workflow.
#
# C — staging branch protection no longer whitelists
# `devops-engineer`:
# - `git push` exits non-zero with a Gitea protected-
# branch rejection. Step summary surfaces it.
# - Re-add `devops-engineer` to
# `push_whitelist_usernames` on the staging
# protection (Settings → Branches → staging).
#
# D — concurrent push to main while a sync is in flight:
# - The `concurrency` group below serialises runs.
# The second waits for the first; if main advances
# again while we're syncing, the second run picks
# up the new tip on its own fetch.
#
# ============================================================
# Loop safety
# ============================================================
#
# The push to staging from this workflow does NOT itself
# fire a `push: branches: [main]` event (different branch),
# so there's no risk of self-recursion. `auto-promote-staging.yml`
# fires on `workflow_run` of CI etc. — it sees the new
# staging tip on its next gate-completion event, NOT on this
# push directly. No loop.
on:
push:
branches: [main]
# workflow_dispatch lets operators manually backfill a
# missed sync (e.g. if AUTO_SYNC_TOKEN was rotated and a
# main push slipped through while the secret was stale).
workflow_dispatch:
permissions:
contents: write
concurrency:
group: auto-sync-main-to-staging
cancel-in-progress: false
jobs:
sync-staging:
runs-on: ubuntu-latest
steps:
- name: Checkout staging (with devops-engineer push token)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
ref: staging
# AUTO_SYNC_TOKEN authenticates as the
# `devops-engineer` Gitea persona — the only
# identity whitelisted for direct push to
# staging. See header comment for context.
token: ${{ secrets.AUTO_SYNC_TOKEN }}
- name: Configure git author
run: |
# Per-persona identity, NOT founder PAT.
# `feedback_per_agent_gitea_identity_default`.
git config user.name "devops-engineer"
git config user.email "devops-engineer@agents.moleculesai.app"
- name: Check if staging already contains main
id: check
run: |
set -euo pipefail
git fetch origin main
if git merge-base --is-ancestor origin/main HEAD; then
echo "needs_sync=false" >> "$GITHUB_OUTPUT"
{
echo "## No-op"
echo
echo "staging already contains \`origin/main\` ($(git rev-parse --short=8 origin/main))."
} >> "$GITHUB_STEP_SUMMARY"
else
echo "needs_sync=true" >> "$GITHUB_OUTPUT"
MAIN_SHORT=$(git rev-parse --short=8 origin/main)
echo "main_short=${MAIN_SHORT}" >> "$GITHUB_OUTPUT"
echo "::notice::staging is missing main's tip (${MAIN_SHORT}) — merging in-runner and pushing"
fi
- name: Merge main into staging (in-runner)
if: steps.check.outputs.needs_sync == 'true'
id: merge
run: |
set -euo pipefail
# Already on staging from checkout. Try fast-forward
# first (cleanest history); fall back to merge commit
# if staging has commits main doesn't.
if git merge --ff-only origin/main; then
echo "did_ff=true" >> "$GITHUB_OUTPUT"
echo "::notice::Fast-forwarded staging to origin/main"
else
echo "did_ff=false" >> "$GITHUB_OUTPUT"
if ! git merge --no-ff origin/main \
-m "chore: sync main → staging (auto, ${{ steps.check.outputs.main_short }})"; then
# Hygiene: leave the work tree clean before failing.
git merge --abort || true
{
echo "## Conflict"
echo
echo "Auto-merge \`main → staging\` failed with conflicts."
echo "A human (or devops-engineer persona) needs to resolve manually:"
echo
echo '```'
echo "git fetch origin"
echo "git checkout staging"
echo "git merge --no-ff origin/main"
echo "# resolve conflicts"
echo "git push origin staging"
echo '```'
} >> "$GITHUB_STEP_SUMMARY"
exit 1
fi
fi
- name: Push staging to origin
if: steps.check.outputs.needs_sync == 'true'
run: |
set -euo pipefail
# Direct push to staging. devops-engineer persona is
# whitelisted for direct push on the staging branch
# protection (Settings → Branches → staging).
#
# No --force / --force-with-lease: a fast-forward or
# legitimate merge commit on top of current staging
# is the only thing we'd ever push. If origin/staging
# advanced under us (concurrent merge), the push
# legitimately rejects and the next run picks up the
# new state.
if ! git push origin staging; then
{
echo "## Push rejected"
echo
echo "Direct push to \`staging\` failed. Likely causes:"
echo "- \`AUTO_SYNC_TOKEN\` rotated / wrong scope (HTTP 401/403)"
echo "- \`devops-engineer\` no longer in"
echo " \`push_whitelist_usernames\` on the staging"
echo " branch protection (HTTP 422)"
echo "- staging advanced concurrently — re-running this"
echo " workflow on the new main tip will pick it up"
} >> "$GITHUB_STEP_SUMMARY"
exit 1
fi
{
echo "## Auto-sync succeeded"
echo
echo "- staging advanced to: \`$(git rev-parse --short=8 HEAD)\`"
echo "- main tip: \`${{ steps.check.outputs.main_short }}\`"
echo "- Strategy: $([ "${{ steps.merge.outputs.did_ff }}" = "true" ] && echo "fast-forward" || echo "merge commit")"
echo "- Pushed by: \`devops-engineer\` (per-agent persona, anti-bot-ring)"
} >> "$GITHUB_STEP_SUMMARY"


@@ -20,6 +19,19 @@ on:
# a few minutes under load — that's fine for a canary.
- cron: '*/30 * * * *'
workflow_dispatch:
inputs:
keep_on_failure:
description: >-
Skip teardown when the canary fails (debugging only). The
tenant org + EC2 + CF tunnel + DNS stay alive so an operator
can SSM into the workspace EC2 and capture docker logs of the
failing claude-code container. REMEMBER to manually delete
via DELETE /cp/admin/tenants/<slug> when done so the org
doesn't accumulate cost. Only honored on workflow_dispatch;
cron runs always tear down (we don't want unattended cron
to leak resources).
type: boolean
default: false
# Serialise with the full-SaaS workflow so they don't contend for the
# same org-create quota on staging. Different group key from
@@ -80,6 +93,14 @@ jobs:
# is "Token Plan only" but cheap-per-token and fast.
E2E_MODEL_SLUG: MiniMax-M2.7-highspeed
E2E_RUN_ID: "canary-${{ github.run_id }}"
# Debug-only: when an operator dispatches with keep_on_failure=true,
# the canary script's E2E_KEEP_ORG=1 path skips teardown so the
# tenant org + EC2 stay alive for SSM-based log capture. Cron runs
# never set this (the input only exists on workflow_dispatch) so
# unattended cron always tears down. See molecule-core#129
# failure mode #1 — capturing the actual exception requires
# docker logs from the live container.
E2E_KEEP_ORG: ${{ github.event.inputs.keep_on_failure == 'true' && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -137,27 +158,28 @@ jobs:
id: canary
run: bash tests/e2e/test_staging_full_saas.sh
# Alerting: open an issue only after THREE consecutive failures so
# transient flakes (Cloudflare DNS hiccup, AWS API blip) don't spam
# the issue list. If an issue is already open, we still comment on
# every failure so ops sees the streak. Auto-close on next green.
# Alerting: open a sticky issue on the FIRST failure; comment on
# subsequent failures; auto-close on next green. Comment-on-existing
# de-duplicates so a single open issue accumulates the streak —
# ops sees one issue with N comments rather than N issues.
#
# Threshold rationale: canary fires every 30 min, so 3 failures =
# ~90 min of consecutive red — well past any single-run flake but
# still tight enough that a real outage gets surfaced before the
# next deploy window.
# Why no consecutive-failures threshold (e.g., wait 3 runs before
# filing): the prior threshold check used
# `github.rest.actions.listWorkflowRuns()` which Gitea 1.22.6 does
# not expose (returns 404). On Gitea Actions the threshold call
# ALWAYS failed, breaking the entire alerting step and going days
# silent on real regressions (38h+ chronic red on 2026-05-07/08
# before this fix; tracked in molecule-core#129). Filing on first
# failure is also better UX — we want to know about the first red,
# not wait 90 min for it to "count." Real flakes get one issue +
# a quick close-on-green; persistent reds accumulate comments.
- name: Open issue on failure
if: failure()
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
env:
# Inject the workflow path explicitly — context.workflow is
# the *name*, not the file path the actions API needs.
WORKFLOW_PATH: '.github/workflows/canary-staging.yml'
CONSECUTIVE_THRESHOLD: '3'
with:
script: |
const title = '🔴 Canary failing: staging SaaS smoke';
const runURL = `https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
const runURL = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
// Find an existing open canary issue (stable title match).
// If one exists, this isn't a "first failure" — comment and exit.
@@ -177,32 +199,12 @@ jobs:
return;
}
// No open issue yet — check the last N-1 runs' conclusions.
// We open the issue only if the last (THRESHOLD-1) runs ALSO
// failed (so this is the 3rd consecutive red).
const threshold = parseInt(process.env.CONSECUTIVE_THRESHOLD, 10);
const { data: runs } = await github.rest.actions.listWorkflowRuns({
owner: context.repo.owner, repo: context.repo.repo,
workflow_id: process.env.WORKFLOW_PATH,
status: 'completed',
per_page: threshold,
// Skip the current in-progress run; it isn't 'completed' yet.
});
// listWorkflowRuns returns recent first. We need (threshold-1)
// prior failures (current run is the threshold-th).
const priorFailures = (runs.workflow_runs || [])
.slice(0, threshold - 1)
.filter(r => r.id !== context.runId)
.filter(r => r.conclusion === 'failure')
.length;
if (priorFailures < threshold - 1) {
core.info(`Below threshold: ${priorFailures + 1}/${threshold} consecutive failures — not filing yet`);
return;
}
// No open issue yet — file one on this first failure. The
// comment-on-existing branch above means subsequent failures
// accumulate as comments on this same issue, so we don't
// spam new issues per run.
const body =
`Canary run failed at ${new Date().toISOString()}, ` +
`${threshold} consecutive runs red.\n\n` +
`Canary run failed at ${new Date().toISOString()}.\n\n` +
`Run: ${runURL}\n\n` +
`This issue auto-closes on the next green canary run. ` +
`Consecutive failures add a comment here rather than a new issue.`;
@ -211,7 +213,7 @@ jobs:
title, body,
labels: ['canary-staging', 'bug'],
});
core.info(`Opened canary failure issue (${threshold} consecutive reds)`);
core.info('Opened canary failure issue (first red)');
- name: Auto-close canary issue on success
if: success()

View File

@ -51,7 +51,7 @@ name: E2E API Smoke Test
# * Pre-pull `alpine:latest` so the platform-server's provisioner
# (`internal/handlers/container_files.go`) can stand up its
# ephemeral token-write helper without a daemon.io round-trip.
# * Create `molecule-monorepo-net` bridge network if missing so the
# * Create `molecule-core-net` bridge network if missing so the
# provisioner's container.HostConfig {NetworkMode: ...} attach
# succeeds.
# Item #1 (timeouts) — evidence on recent runs (77/3191, ae/4270, 0e/
@ -163,12 +163,12 @@ jobs:
# when the image is already present.
docker pull alpine:latest >/dev/null
# Provisioner attaches workspace containers to
# molecule-monorepo-net (workspace-server/internal/provisioner/
# molecule-core-net (workspace-server/internal/provisioner/
# provisioner.go::DefaultNetwork). The bridge already exists on
# the operator host's docker daemon — `network create` is
# idempotent via `|| true`.
docker network create molecule-monorepo-net >/dev/null 2>&1 || true
echo "alpine:latest pre-pulled; molecule-monorepo-net ensured."
docker network create molecule-core-net >/dev/null 2>&1 || true
echo "alpine:latest pre-pulled; molecule-core-net ensured."
- name: Start Postgres (docker)
if: needs.detect-changes.outputs.api == 'true'
run: |

View File

@ -34,7 +34,7 @@ name: Handlers Postgres Integration
# So we sidestep `services:` entirely. The job container still uses
# host-net (inherited from runner config; required for cache server
# discovery on the bridge IP 172.18.0.17:42631). We launch a sibling
# postgres on the existing `molecule-monorepo-net` bridge with a
# postgres on the existing `molecule-core-net` bridge with a
# UNIQUE name per run — `pg-handlers-${RUN_ID}-${RUN_ATTEMPT}` — and
# read its bridge IP via `docker inspect`. A host-net job container
# can reach a bridge-net container directly via the bridge IP (verified
@ -44,7 +44,7 @@ name: Handlers Postgres Integration
# + No host-port collision; N parallel runs share the bridge cleanly
# + `if: always()` cleanup runs even on test-step failure
# - One more step in the workflow (+~3 lines)
# - Requires `molecule-monorepo-net` to exist on the operator host
# - Requires `molecule-core-net` to exist on the operator host
# (it does; declared in docker-compose.yml + docker-compose.infra.yml)
#
# Class B Hongming-owned CICD red sweep, 2026-05-08.
@ -96,7 +96,7 @@ jobs:
PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
# Bridge network already exists on the operator host (declared
# in docker-compose.yml + docker-compose.infra.yml).
PG_NETWORK: molecule-monorepo-net
PG_NETWORK: molecule-core-net
defaults:
run:
working-directory: workspace-server

View File

@ -119,6 +119,17 @@ jobs:
# symptom, different root cause: staging still has the in-image
# clone path, hits the auth error directly).
#
# 2026-05-08 sub-finding (#192): the clone step ALSO fails when
# any referenced workspace-template repo is private and the
# AUTO_SYNC_TOKEN bearer (devops-engineer persona) lacks read
# access. Root cause: 5 of 9 workspace-template repos
# (openclaw, codex, crewai, deepagents, gemini-cli) had been
# marked private with no team grant. Resolution: flipped them
# to public per `feedback_oss_first_repo_visibility_default`
# (the OSS surface should be public). Layer-3 (customer-private +
# marketplace third-party repos) tracked separately in
# internal#102.
#
# Token shape matches publish-workspace-server-image.yml: AUTO_SYNC_TOKEN
# is the devops-engineer persona PAT, NOT the founder PAT (per
# `feedback_per_agent_gitea_identity_default`). clone-manifest.sh

View File

@ -1,276 +0,0 @@
name: Retarget main PRs to staging
# Mechanical enforcement of SHARED_RULES rule 8 ("Staging-first
# workflow, no exceptions"). When a bot opens a PR against `main`,
# retarget it to `staging` automatically and leave an explanatory
# comment. Human / CEO-authored PRs (the staging→main promotion
# PRs, etc.) are left alone — they're the authorised exception
# to the rule.
#
# ============================================================
# What this workflow does
# ============================================================
#
# On `pull_request_target` opened/reopened against `main`:
# 1. If the PR head is `staging`, skip (the auto-promote PRs
# MUST stay base=main).
# 2. If the PR author is a bot, retarget the PR base to
# `staging` via Gitea REST `PATCH /pulls/{N}` body
# `{"base":"staging"}`.
# 3. If the retarget returns 422 "pull request already exists
# for base branch 'staging'" (issue #1884 case: another PR
# on the same head already targets staging), close the
# now-redundant main-PR via Gitea REST instead of failing
# red.
# 4. Post an explainer comment on the retargeted PR via
# Gitea REST `POST /issues/{N}/comments`.
#
# ============================================================
# Why Gitea REST (and not `gh api / gh pr close / gh pr comment`)
# ============================================================
#
# Pre-2026-05-06 this workflow used `gh api -X PATCH "repos/{owner}/{repo}/pulls/{N}" -f base=staging`
# plus `gh pr close` and `gh pr comment`. After the GitHub→Gitea
# cutover those calls fail because:
#
# - `gh` CLI defaults to `api.github.com`. Even with `GH_HOST`
# pointing at Gitea, `gh pr close / comment` route through
# GraphQL (`/api/graphql`) which Gitea does not expose.
# Empirical: every `gh pr *` call returns
# `HTTP 405 Method Not Allowed (https://git.moleculesai.app/api/graphql)`
# — same root cause as #65 (auto-sync, fixed in PR #66) and
# #73/#195 (auto-promote, fixed in PR #78).
# - `gh api -X PATCH /pulls/{N}` happens to use a REST path
# that Gitea also has, but the `gh` host-resolution layer
# and pagination/retry logic don't always hit Gitea cleanly,
# and the cost of switching to direct `curl` is one extra
# line of code.
#
# So this workflow uses direct `curl` calls to Gitea REST. No
# `gh` CLI dependency, no GraphQL, no flaky host-resolution.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# Pre-fix this workflow used the per-job ephemeral
# `secrets.GITHUB_TOKEN`. On Gitea Actions that token has
# narrow scope and unpredictable cross-PR write capability.
#
# Post-fix: `secrets.AUTO_SYNC_TOKEN` (the `devops-engineer`
# Gitea persona). Same persona used by `auto-sync-main-to-staging.yml`
# (PR #66) and `auto-promote-staging.yml` (PR #78). Token scope:
# `push: true` repo write, sufficient for PR-edit + close + comment.
#
# Why this token does NOT need branch-protection bypass:
# patching a PR's base ref is a PR-level operation that does not
# require push perms on either branch (the PR's own commits stay
# put; only the metadata changes).
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — PATCH base→staging returns 422 "pull request already exists"
# (issue #1884 case):
# - Detected by string-match on response body. Workflow
# falls through to closing the now-redundant main-PR
# (Gitea REST `PATCH /pulls/{N}` with `state: closed`)
# and posts an explanation comment. Step summary surfaces.
#
# B — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - First REST call returns 401/403. Step summary surfaces.
# Re-issue token from `~/.molecule-ai/personas/` on the
# operator host and update repo Actions secret.
#
# C — PR was deleted between trigger and run:
# - REST call returns 404. Workflow exits 0 with a notice
# (the rule was already enforced or the PR is gone).
#
# D — author is not actually a bot but the filter mis-fires:
# - Filter is conservative: only triggers on
# `user.type == 'Bot'`, `login` ends with `[bot]`, or
# known bot logins (`molecule-ai[bot]`, `app/molecule-ai`).
# Human PRs slip through unaffected. If a NEW bot login
# starts shipping main-PRs, add it to the filter.
on:
pull_request_target:
types: [opened, reopened]
branches: [main]
permissions:
pull-requests: write
jobs:
retarget:
name: Retarget to staging
runs-on: ubuntu-latest
# Only fire for bot-authored PRs. Human CEO PRs (staging→main
# promotion) are intentional and pass through.
#
# Head-ref guard: never retarget a PR whose head IS `staging`
# — those are the auto-promote staging→main PRs (opened by
# `devops-engineer` since PR #78 / #195 fix). Retargeting
# head=staging onto base=staging fails with HTTP 422 "no new
# commits between base 'staging' and head 'staging'", which
# would surface as a noisy red workflow run on every
# auto-promote (caught 2026-05-03 on the GitHub-era PR #2588).
if: >-
github.event.pull_request.head.ref != 'staging'
&& (
github.event.pull_request.user.type == 'Bot'
|| endsWith(github.event.pull_request.user.login, '[bot]')
|| github.event.pull_request.user.login == 'app/molecule-ai'
|| github.event.pull_request.user.login == 'molecule-ai[bot]'
|| github.event.pull_request.user.login == 'devops-engineer'
)
steps:
- name: Retarget PR base to staging via Gitea REST
id: retarget
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
# Issue #1884 case: when the bot opens a PR against main
# and there's already another PR on the same head branch
# targeting staging, Gitea's PATCH returns 422 with a
# body mentioning "pull request already exists for base
# branch 'staging'" (the Gitea message wording is
# slightly different from GitHub's; the substring match
# below covers both for forward/back compat).
# The retarget can't proceed — but the right response is
# to close the now-redundant main-PR, not to fail the
# workflow noisily. Detect that specific 422 and close
# instead.
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
echo "Retargeting PR #${PR_NUMBER} (author: ${PR_AUTHOR}) from main → staging"
# Curl-status-capture pattern per `feedback_curl_status_capture_pollution`:
# http_code via -w to its own scalar, body to a tempfile, set +e/-e
# bracket so a non-zero curl exit (e.g. DNS/transport failure) doesn't pollute the script's exit chain.
BODY_FILE=$(mktemp)
REQ='{"base":"staging"}'
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
CURL_RC=$?
set -e
if [ "${CURL_RC}" -ne 0 ]; then
echo "::error::curl PATCH failed (rc=${CURL_RC})"
rm -f "${BODY_FILE}"
exit 1
fi
if [ "${STATUS}" = "201" ] || [ "${STATUS}" = "200" ]; then
NEW_BASE=$(jq -r '.base.ref // "?"' < "${BODY_FILE}")
rm -f "${BODY_FILE}"
if [ "${NEW_BASE}" = "staging" ]; then
echo "::notice::Retargeted PR #${PR_NUMBER} → staging"
echo "outcome=retargeted" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::PATCH returned ${STATUS} but base.ref is '${NEW_BASE}', not 'staging'"
exit 1
fi
# Specifically match the 422 duplicate-base/head error so
# any OTHER PATCH failure (auth, deleted PR, etc.) still
# surfaces as a real workflow failure.
BODY=$(cat "${BODY_FILE}" || true)
rm -f "${BODY_FILE}"
if [ "${STATUS}" = "422" ] && echo "${BODY}" | grep -qE "(pull request already exists for base branch 'staging'|already exists.*base.*staging)"; then
echo "::notice::PR #${PR_NUMBER}: duplicate target-staging PR exists on same head — closing this main-PR as redundant."
# Close the now-redundant main-PR via Gitea REST
# (PATCH state=closed). Post comment explaining
# rationale BEFORE close so the comment lands on the
# PR (commenting on a closed PR works on Gitea, but
# historically caused notification ordering surprises).
CLOSE_BODY_FILE=$(mktemp)
CMT_REQ=$(jq -n '{body:"[retarget-bot] Closing — another PR on the same head branch already targets `staging`. This PR is redundant. See issue #1884 for the rationale."}')
set +e
CMT_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${CMT_REQ}" \
-o "${CLOSE_BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${CMT_STATUS}" != "201" ]; then
echo "::warning::dup-close comment POST returned ${CMT_STATUS}; continuing to close anyway"
cat "${CLOSE_BODY_FILE}" | head -c 300 || true
fi
rm -f "${CLOSE_BODY_FILE}"
CLOSE_REQ='{"state":"closed"}'
CLOSE_RESP=$(mktemp)
set +e
CL_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${CLOSE_REQ}" \
-o "${CLOSE_RESP}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
set -e
if [ "${CL_STATUS}" = "201" ] || [ "${CL_STATUS}" = "200" ]; then
echo "::notice::Closed PR #${PR_NUMBER} as redundant"
echo "outcome=closed-as-duplicate" >> "$GITHUB_OUTPUT"
rm -f "${CLOSE_RESP}"
exit 0
fi
echo "::error::Failed to close redundant PR: HTTP ${CL_STATUS}"
cat "${CLOSE_RESP}" | head -c 300 || true
rm -f "${CLOSE_RESP}"
exit 1
fi
echo "::error::Retarget PATCH failed and was NOT a duplicate-base error: HTTP ${STATUS}"
echo "${BODY}" | head -c 500 >&2
exit 1
- name: Post explainer comment
if: steps.retarget.outputs.outcome == 'retargeted'
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
# PR comments live on the issue endpoint in Gitea
# (PRs ARE issues — same endpoint, different sub-resources
# for diffs/files/etc.). The body uses jq to safely
# encode the multi-line markdown without shell-quote
# nightmares.
REQ=$(jq -n '{body:"[retarget-bot] This PR was opened against `main` and has been retargeted to `staging` automatically.\n\n**Why:** per [SHARED_RULES rule 8](https://git.moleculesai.app/molecule-ai/molecule-ai-org-template-molecule-dev/src/branch/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.\n\n**What changed:** just the base branch — no code change. CI will re-run against `staging`. If you get merge conflicts, rebase on `staging`.\n\n**If this PR is the CEO`s staging→main promotion:** the Action skipped you (only bot-authored PRs are retargeted, head=staging is also exempted). If you see this comment on your CEO PR, that`s a bug — please tag @hongmingwang."}')
BODY_FILE=$(mktemp)
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${STATUS}" = "201" ]; then
echo "::notice::Posted explainer comment on PR #${PR_NUMBER}"
else
echo "::warning::Failed to post explainer (HTTP ${STATUS}) — retarget itself succeeded"
cat "${BODY_FILE}" | head -c 300 || true
fi
rm -f "${BODY_FILE}"

View File

@ -284,7 +284,7 @@ cp .env.example .env
./infra/scripts/setup.sh
# Boots Postgres (:5432), Redis (:6379), Langfuse (:3001),
# and Temporal (:7233 gRPC, :8233 UI) on the shared
# `molecule-monorepo-net` Docker network. Temporal runs with
# `molecule-core-net` Docker network. Temporal runs with
# no auth on localhost — dev-only; production must gate it.
#
# Also populates the template/plugin registry by cloning every repo

View File

@ -283,7 +283,7 @@ cp .env.example .env
./infra/scripts/setup.sh
# Boots Postgres (:5432), Redis (:6379), Langfuse (:3001),
# and Temporal (:7233 gRPC, :8233 UI), all attached to the shared
# `molecule-monorepo-net` Docker network. Temporal runs with no auth by default,
# `molecule-core-net` Docker network. Temporal runs with no auth by default,
# for local development only; production must add mTLS / an API key.
#
# It also pulls every template/plugin repo listed in manifest.json into

canvas/.dockerignore Normal file
View File

@ -0,0 +1,10 @@
# Excluded from `docker build` context. Without this, the COPY . . step in
# canvas/Dockerfile clobbers the freshly-installed node_modules with the
# host's (potentially broken / wrong-arch) copy — the @tailwindcss/oxide
# native binary disagreed and broke `next build`.
node_modules
.next
.git
*.log
.env*
!.env.example

View File

@ -1,7 +1,11 @@
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm install
# `npm ci` (not `install`) for lockfile-exact reproducibility.
# `--include=optional` ensures the platform-specific @tailwindcss/oxide
# native binary lands — without it, postcss fails with "Cannot read
# properties of undefined (reading 'All')" at build time.
RUN npm ci --include=optional
COPY . .
ARG NEXT_PUBLIC_PLATFORM_URL=http://localhost:8080
ARG NEXT_PUBLIC_WS_URL=ws://localhost:8080/ws

View File

@ -17,6 +17,24 @@ import { dirname, join } from "node:path";
// update one heuristic. Production is unaffected: `output: "standalone"`
// bakes resolved env into the build, and the marker file isn't shipped.
loadMonorepoEnv();
// Boot-time matched-pair guard for ADMIN_TOKEN / NEXT_PUBLIC_ADMIN_TOKEN.
// When ADMIN_TOKEN is set on the workspace-server (server-side bearer
// gate, wsauth_middleware.go ~L245), the canvas MUST send the matching
// NEXT_PUBLIC_ADMIN_TOKEN as `Authorization: Bearer ...` on every API
// call. If only one is set, every workspace API call 401s silently —
// the canvas hydrates with empty data and the user sees a broken page
// with no console hint about the auth-config mismatch.
//
// Pre-fix the matched-pair contract was descriptive only (a comment in
// .env): future devs/agents could re-misconfigure with one of the two
// unset and silently 401. Closes the post-PR-#174 self-review gap.
//
// Warn-only (not exit) — production canvas Docker images bake these
// vars into the build at image-build time, and a missed pair there
// would still emit the warning at runtime via the standalone server's
// startup. Killing the process on misconfiguration would turn a
// recoverable auth issue into a hard crashloop.
checkAdminTokenPair();
const nextConfig: NextConfig = {
output: "standalone",
@ -57,6 +75,43 @@ function loadMonorepoEnv() {
);
}
// Boot-time matched-pair guard. Runs after .env has been loaded so the
// check sees the post-load state. The two env vars must be set or
// unset together; one-without-the-other is the silent-401 footgun.
//
// Treats empty string ("") as unset. An explicitly-empty `KEY=` in
// .env counts as set-to-empty in `process.env`, but for auth purposes
// an empty bearer token is equivalent to no token — so both
// `ADMIN_TOKEN=` and an unset ADMIN_TOKEN are equivalent relative to
// the matched-pair invariant.
//
// Returns void; side effect is the console.error warning. Kept as a
// separate function (exported) so a future test can reset env, call
// this, and assert on captured stderr.
export function checkAdminTokenPair(): void {
const serverSet = !!process.env.ADMIN_TOKEN;
const clientSet = !!process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (serverSet === clientSet) return;
// Distinct messages so the operator can tell which half is missing
// — the fix is symmetric (set the other one) but the diagnostic
// mentions which side is currently set so they don't have to grep.
if (serverSet && !clientSet) {
// eslint-disable-next-line no-console
console.error(
"[next.config] ADMIN_TOKEN is set but NEXT_PUBLIC_ADMIN_TOKEN is not — " +
"canvas will 401 against workspace-server because the bearer header " +
"is never attached. Set both to the same value, or unset both.",
);
} else {
// eslint-disable-next-line no-console
console.error(
"[next.config] NEXT_PUBLIC_ADMIN_TOKEN is set but ADMIN_TOKEN is not — " +
"workspace-server will reject the bearer because no AdminAuth gate " +
"is configured. Set both to the same value, or unset both.",
);
}
}
function findMonorepoRoot(start: string): string | null {
let dir = start;
for (let i = 0; i < 6; i++) {

View File

@ -9,6 +9,7 @@
// AttachmentLightbox).
import { useState, useEffect, useRef } from "react";
import { platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
import { isPlatformAttachment, resolveAttachmentHref } from "./uploads";
import { AttachmentChip } from "./AttachmentViews";
@ -43,13 +44,8 @@ export function AttachmentAudio({ workspaceId, attachment, onDownload, tone }: P
void (async () => {
try {
const href = resolveAttachmentHref(workspaceId, attachment.uri);
const headers: Record<string, string> = {};
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const res = await fetch(href, {
headers,
headers: platformAuthHeaders(),
credentials: "include",
signal: AbortSignal.timeout(60_000),
});
@ -116,9 +112,5 @@ export function AttachmentAudio({ workspaceId, attachment, onDownload, tone }: P
);
}
function getTenantSlug(): string | null {
if (typeof window === "undefined") return null;
const host = window.location.hostname;
const m = host.match(/^([^.]+)\.moleculesai\.app$/);
return m ? m[1] : null;
}
// Local getTenantSlug() removed — auth-header construction now goes
// through platformAuthHeaders() from @/lib/api (#178).

View File

@ -35,6 +35,7 @@
// downscale via canvas, but defer that to v2.
import { useState, useEffect, useRef } from "react";
import { platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
import { isPlatformAttachment, resolveAttachmentHref } from "./uploads";
import { AttachmentLightbox } from "./AttachmentLightbox";
@ -75,22 +76,14 @@ export function AttachmentImage({ workspaceId, attachment, onDownload, tone }: P
}
// Platform-auth path: identical to downloadChatFile but we keep
// the blob (don't trigger a Save-As). Use the same headers it does
// by going through it indirectly — no, downloadChatFile triggers a
// Save-As. Need a separate fetch.
// the blob (don't trigger a Save-As). Auth headers come from the
// shared `platformAuthHeaders()` helper — one source of truth for
// every authenticated raw fetch in the canvas (#178).
void (async () => {
try {
const href = resolveAttachmentHref(workspaceId, attachment.uri);
const headers: Record<string, string> = {};
// Read the same env var downloadChatFile reads — single source
// of truth would be cleaner; refactor opportunity for PR-2 if
// we add the same path to AttachmentVideo.
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const res = await fetch(href, {
headers,
headers: platformAuthHeaders(),
credentials: "include",
signal: AbortSignal.timeout(30_000),
});
@ -184,15 +177,7 @@ export function AttachmentImage({ workspaceId, attachment, onDownload, tone }: P
);
}
// Internal helper — duplicated from uploads.ts (it's not exported
// there). Kept local so this component doesn't reach into private
// surface; if AttachmentVideo / AttachmentPDF in PR-2/PR-3 also need
// it, lift to an exported helper at that point (the third-caller
// rule).
function getTenantSlug(): string | null {
if (typeof window === "undefined") return null;
const host = window.location.hostname;
// Tenant subdomain shape: <slug>.moleculesai.app
const m = host.match(/^([^.]+)\.moleculesai\.app$/);
return m ? m[1] : null;
}
// Local getTenantSlug() removed — auth-header construction now goes
// through platformAuthHeaders() from @/lib/api which uses the canonical
// getTenantSlug() from @/lib/tenant. This eliminates the duplicate
// hostname-regex + the duplicate bearer-token-attach pattern (#178).

View File

@ -33,6 +33,7 @@
// timeout, swap to chip. Implemented as a 3-second watchdog.
import { useState, useEffect, useRef } from "react";
import { platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
import { isPlatformAttachment, resolveAttachmentHref } from "./uploads";
import { AttachmentLightbox } from "./AttachmentLightbox";
@ -69,13 +70,8 @@ export function AttachmentPDF({ workspaceId, attachment, onDownload, tone }: Pro
void (async () => {
try {
const href = resolveAttachmentHref(workspaceId, attachment.uri);
const headers: Record<string, string> = {};
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const res = await fetch(href, {
headers,
headers: platformAuthHeaders(),
credentials: "include",
signal: AbortSignal.timeout(60_000),
});
@ -189,9 +185,5 @@ function PdfGlyph() {
);
}
function getTenantSlug(): string | null {
if (typeof window === "undefined") return null;
const host = window.location.hostname;
const m = host.match(/^([^.]+)\.moleculesai\.app$/);
return m ? m[1] : null;
}
// Local getTenantSlug() removed — auth-header construction now goes
// through platformAuthHeaders() from @/lib/api (#178).

View File

@ -26,6 +26,7 @@
// to download the full file.
import { useState, useEffect } from "react";
import { platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
import { isPlatformAttachment, resolveAttachmentHref } from "./uploads";
import { AttachmentChip } from "./AttachmentViews";
@ -57,13 +58,13 @@ export function AttachmentTextPreview({ workspaceId, attachment, onDownload, ton
void (async () => {
try {
const href = resolveAttachmentHref(workspaceId, attachment.uri);
const headers: Record<string, string> = {};
if (isPlatformAttachment(attachment.uri)) {
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
}
// Only attach platform auth headers for in-platform URIs —
// off-platform URLs (HTTP/HTTPS attachments) MUST NOT receive
// our bearer token (it would leak the admin token to a third
// party). The branch is preserved with the new shared helper.
const headers: Record<string, string> = isPlatformAttachment(attachment.uri)
? platformAuthHeaders()
: {};
const res = await fetch(href, {
headers,
credentials: "include",
@ -182,9 +183,5 @@ export function AttachmentTextPreview({ workspaceId, attachment, onDownload, ton
);
}
function getTenantSlug(): string | null {
if (typeof window === "undefined") return null;
const host = window.location.hostname;
const m = host.match(/^([^.]+)\.moleculesai\.app$/);
return m ? m[1] : null;
}
// Local getTenantSlug() removed — auth-header construction now goes
// through platformAuthHeaders() from @/lib/api (#178).

View File

@ -25,6 +25,7 @@
// fetch via service worker. v2 if measured-needed.
import { useState, useEffect, useRef } from "react";
import { platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
import { isPlatformAttachment, resolveAttachmentHref } from "./uploads";
import { AttachmentChip } from "./AttachmentViews";
@ -61,13 +62,8 @@ export function AttachmentVideo({ workspaceId, attachment, onDownload, tone }: P
void (async () => {
try {
const href = resolveAttachmentHref(workspaceId, attachment.uri);
const headers: Record<string, string> = {};
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const res = await fetch(href, {
headers,
headers: platformAuthHeaders(),
credentials: "include",
// Videos are larger than images on average; give the request
// more headroom. The server's per-request body cap (50MB) is
@ -147,11 +143,5 @@ export function AttachmentVideo({ workspaceId, attachment, onDownload, tone }: P
);
}
// Internal helper — same shape as AttachmentImage's. Lifted to a
// shared util in PR-2.5 if a third caller needs it (PDF, audio).
function getTenantSlug(): string | null {
if (typeof window === "undefined") return null;
const host = window.location.hostname;
const m = host.match(/^([^.]+)\.moleculesai\.app$/);
return m ? m[1] : null;
}
// Local getTenantSlug() removed — auth-header construction now goes
// through platformAuthHeaders() from @/lib/api (#178).

View File

@ -64,6 +64,54 @@ describe("extractRequestText", () => {
};
expect(extractRequestText(body)).toBe("");
});
// Regression: delegation.go stores request_body as {"task": "...", "delegation_id": "..."}.
// extractRequestText was checking only the A2A params.message.parts path, so
// outbound delegation messages were rendered as blank bubbles.
// Fix: check body.task first (delegation format), then fall back to A2A.
it("extracts text from body.task (delegation format)", () => {
const body = {
task: "Deploy the staging environment for this sprint's release",
delegation_id: "delg_01jx8q4n3k",
};
expect(extractRequestText(body)).toBe(
"Deploy the staging environment for this sprint's release"
);
});
it("prefers body.task over A2A params when both present", () => {
const body = {
task: "Delegation text wins",
params: {
message: {
parts: [{ kind: "text", text: "A2A text" }],
},
},
};
// body.task is checked first; delegation wins for delegation activities.
expect(extractRequestText(body)).toBe("Delegation text wins");
});
it("falls back to A2A format when body.task is absent", () => {
const body = {
params: {
message: {
parts: [{ kind: "text", text: "A2A fallback" }],
},
},
};
expect(extractRequestText(body)).toBe("A2A fallback");
});
it("returns empty string when body.task is empty string", () => {
const body = { task: "" };
expect(extractRequestText(body)).toBe("");
});
it("returns empty string when body.task is not a string", () => {
const body = { task: 42 };
expect(extractRequestText(body)).toBe("");
});
});
describe("extractResponseText", () => {

View File

@ -114,9 +114,15 @@ function basename(uri: string): string {
return slash >= 0 ? cleaned.slice(slash + 1) : cleaned || "file";
}
/** Extract user message text from an activity log request_body */
/** Extract user message text from an activity log request_body.
*
* Delegation activities from delegation.go store the task text directly
* at `body.task` as a plain string: {"task": "...", "delegation_id": "..."}.
* Check this first before falling back to the A2A JSON-RPC format
* (`body.params.message.parts[].text`). */
export function extractRequestText(body: Record<string, unknown> | null): string {
if (!body) return "";
if (typeof body.task === "string" && body.task) return body.task;
const params = body.params as Record<string, unknown> | undefined;
const msg = params?.message as Record<string, unknown> | undefined;
const parts = msg?.parts as Array<Record<string, unknown>> | undefined;

View File

@ -1,12 +1,16 @@
import { PLATFORM_URL } from "@/lib/api";
import { getTenantSlug } from "@/lib/tenant";
import { PLATFORM_URL, platformAuthHeaders } from "@/lib/api";
import type { ChatAttachment } from "./types";
/** Chat attachments are intentionally uploaded via a direct fetch()
 * instead of the `api.post` helper: `api.post` JSON-stringifies the
* body, which would 500 on a Blob. Mirrors the header plumbing
* (tenant slug, admin token, credentials) so SaaS + self-hosted
* callers work the same way. */
* body, which would 500 on a Blob. Auth headers (tenant slug, admin
 * token, credentials) come from `platformAuthHeaders()`, the same
* helper `request()` uses, so a missing bearer surfaces as a single
* fix site instead of N copies. We deliberately do NOT set
* Content-Type so the browser writes the multipart boundary into the
* header; setting it manually would yield a multipart body the server
* can't parse. See lib/api.ts platformAuthHeaders() for the full
* rationale on why this pair must stay matched. */
export async function uploadChatFiles(
workspaceId: string,
files: File[],
@ -16,18 +20,12 @@ export async function uploadChatFiles(
const form = new FormData();
for (const f of files) form.append("files", f, f.name);
const headers: Record<string, string> = {};
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
// Uploads legitimately take a while on cold cache (tar write +
// docker cp into the container). 60s is comfortable for the 25MB/
// 50MB caps the server enforces.
const res = await fetch(`${PLATFORM_URL}/workspaces/${workspaceId}/chat/uploads`, {
method: "POST",
headers,
headers: platformAuthHeaders(),
body: form,
credentials: "include",
signal: AbortSignal.timeout(60_000),
@ -143,14 +141,8 @@ export async function downloadChatFile(
return;
}
const headers: Record<string, string> = {};
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const res = await fetch(href, {
headers,
headers: platformAuthHeaders(),
credentials: "include",
signal: AbortSignal.timeout(60_000),
});

View File

@ -0,0 +1,130 @@
// @vitest-environment node
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
// Tests for the boot-time matched-pair guard added to next.config.ts.
//
// Why this lives in src/lib/__tests__ even though the function is in
// canvas/next.config.ts:
// - next.config.ts runs as ESM-but-also-CJS depending on which
// consumer loads it (Next.js dev server vs Next.js build); we
// want the test to be a plain ESM module Vitest already handles.
// - Importing from "../../../next.config" pulls in the rest of the
// file (loadMonorepoEnv, the default export, etc.) which has
// side effects on module load (it runs loadMonorepoEnv()
// immediately). To keep the test hermetic we don't import — we
// duplicate the function under test.
//
// Sourcing the function from a shared module would be cleaner, but
// next.config.ts is required to be a single self-contained file by
// Next.js's loader on some host configurations. Pin invariant: the
// duplicated function below MUST stay byte-identical to the one in
// next.config.ts. If you change one, change the other and bump this
// comment.
function checkAdminTokenPair(): void {
const serverSet = !!process.env.ADMIN_TOKEN;
const clientSet = !!process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (serverSet === clientSet) return;
if (serverSet && !clientSet) {
// eslint-disable-next-line no-console
console.error(
"[next.config] ADMIN_TOKEN is set but NEXT_PUBLIC_ADMIN_TOKEN is not — " +
"canvas will 401 against workspace-server because the bearer header " +
"is never attached. Set both to the same value, or unset both.",
);
} else {
// eslint-disable-next-line no-console
console.error(
"[next.config] NEXT_PUBLIC_ADMIN_TOKEN is set but ADMIN_TOKEN is not — " +
"workspace-server will reject the bearer because no AdminAuth gate " +
"is configured. Set both to the same value, or unset both.",
);
}
}
describe("checkAdminTokenPair", () => {
// Snapshot env so individual tests can stomp on it without leaking.
// Rebuild from snapshot in afterEach so the next test sees a known
// baseline regardless of mutation pattern.
let originalEnv: Record<string, string | undefined>;
let errorSpy: ReturnType<typeof vi.spyOn>;
beforeEach(() => {
originalEnv = {
ADMIN_TOKEN: process.env.ADMIN_TOKEN,
NEXT_PUBLIC_ADMIN_TOKEN: process.env.NEXT_PUBLIC_ADMIN_TOKEN,
};
delete process.env.ADMIN_TOKEN;
delete process.env.NEXT_PUBLIC_ADMIN_TOKEN;
errorSpy = vi.spyOn(console, "error").mockImplementation(() => {});
});
afterEach(() => {
if (originalEnv.ADMIN_TOKEN === undefined) delete process.env.ADMIN_TOKEN;
else process.env.ADMIN_TOKEN = originalEnv.ADMIN_TOKEN;
if (originalEnv.NEXT_PUBLIC_ADMIN_TOKEN === undefined) delete process.env.NEXT_PUBLIC_ADMIN_TOKEN;
else process.env.NEXT_PUBLIC_ADMIN_TOKEN = originalEnv.NEXT_PUBLIC_ADMIN_TOKEN;
errorSpy.mockRestore();
});
it("emits no warning when both are unset", () => {
checkAdminTokenPair();
expect(errorSpy).not.toHaveBeenCalled();
});
it("emits no warning when both are set (matched pair, the happy path)", () => {
process.env.ADMIN_TOKEN = "local-dev-admin";
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "local-dev-admin";
checkAdminTokenPair();
expect(errorSpy).not.toHaveBeenCalled();
});
it("warns when ADMIN_TOKEN is set but NEXT_PUBLIC_ADMIN_TOKEN is not", () => {
process.env.ADMIN_TOKEN = "local-dev-admin";
checkAdminTokenPair();
expect(errorSpy).toHaveBeenCalledTimes(1);
// Exact-string assertion — substring would also pass when the
// function's branch logic is broken (e.g. emits both messages, or
// emits the wrong one). Pin the exact message that operators will
// see in their dev console so regressions are visible.
expect(errorSpy).toHaveBeenCalledWith(
"[next.config] ADMIN_TOKEN is set but NEXT_PUBLIC_ADMIN_TOKEN is not — " +
"canvas will 401 against workspace-server because the bearer header " +
"is never attached. Set both to the same value, or unset both.",
);
});
it("warns when NEXT_PUBLIC_ADMIN_TOKEN is set but ADMIN_TOKEN is not", () => {
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "local-dev-admin";
checkAdminTokenPair();
expect(errorSpy).toHaveBeenCalledTimes(1);
expect(errorSpy).toHaveBeenCalledWith(
"[next.config] NEXT_PUBLIC_ADMIN_TOKEN is set but ADMIN_TOKEN is not — " +
"workspace-server will reject the bearer because no AdminAuth gate " +
"is configured. Set both to the same value, or unset both.",
);
});
// Empty string in process.env is the JS-side representation of `KEY=`
// (no value) in a .env file. Treating "" as unset makes the pair
// invariant symmetric: `KEY=` and `unset KEY` produce the same
// verdict. Without this branch, an operator who comments out the
// value but leaves the line would get a false-positive warning.
it("treats empty string as unset (so KEY= and unset KEY are equivalent)", () => {
process.env.ADMIN_TOKEN = "";
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "";
checkAdminTokenPair();
expect(errorSpy).not.toHaveBeenCalled();
});
it("warns when ADMIN_TOKEN is set and NEXT_PUBLIC_ADMIN_TOKEN is empty string", () => {
process.env.ADMIN_TOKEN = "local-dev-admin";
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "";
checkAdminTokenPair();
expect(errorSpy).toHaveBeenCalledTimes(1);
// First branch — server set, client unset.
expect(errorSpy).toHaveBeenCalledWith(
expect.stringContaining("ADMIN_TOKEN is set but NEXT_PUBLIC_ADMIN_TOKEN is not"),
);
});
});

View File

@ -0,0 +1,97 @@
// @vitest-environment jsdom
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
// Tests for platformAuthHeaders — the shared helper extracted in #178
// to consolidate the bearer-token-attach + tenant-slug-attach pattern
// that was previously duplicated across 7 raw-fetch callsites in the
// canvas (uploads + 5 Attachment* components + the api.ts request()
// function).
//
// What we pin here:
// - Returns a fresh object each call (so callers can mutate without
// leaking into each other).
// - Empty result on a non-tenant host with no admin token (the
// localhost / self-hosted shape).
// - Bearer attached when NEXT_PUBLIC_ADMIN_TOKEN is set.
// - X-Molecule-Org-Slug attached when window.location.hostname is a
// tenant subdomain (<slug>.moleculesai.app).
// - Both attached when both apply (the production SaaS shape).
//
// Why jsdom: getTenantSlug() reads window.location.hostname. Node-only
// environment yields no window and getTenantSlug returns null
// unconditionally — wouldn't exercise the slug branch.
import { platformAuthHeaders } from "../api";
describe("platformAuthHeaders", () => {
let originalAdminToken: string | undefined;
beforeEach(() => {
originalAdminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
delete process.env.NEXT_PUBLIC_ADMIN_TOKEN;
});
afterEach(() => {
if (originalAdminToken === undefined) delete process.env.NEXT_PUBLIC_ADMIN_TOKEN;
else process.env.NEXT_PUBLIC_ADMIN_TOKEN = originalAdminToken;
// jsdom resets hostname between tests via the @vitest-environment
// pragma's per-test isolation. No explicit reset needed.
});
it("returns an empty object on a non-tenant host with no admin token", () => {
// jsdom default hostname is "localhost" — not a tenant slug, so
// getTenantSlug() returns null and no X-Molecule-Org-Slug is added.
const headers = platformAuthHeaders();
expect(headers).toEqual({});
});
it("attaches Authorization when NEXT_PUBLIC_ADMIN_TOKEN is set", () => {
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "local-dev-admin";
const headers = platformAuthHeaders();
expect(headers).toEqual({ Authorization: "Bearer local-dev-admin" });
});
it("does NOT attach Authorization when NEXT_PUBLIC_ADMIN_TOKEN is empty string", () => {
// Empty-string env is the JS-side shape of `KEY=` in .env.
// Treating it as unset matches the matched-pair guard in
// next.config.ts (admin-token-pair.test.ts) — symmetric semantics.
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "";
const headers = platformAuthHeaders();
expect(headers).toEqual({});
});
it("attaches X-Molecule-Org-Slug on a tenant subdomain", () => {
Object.defineProperty(window, "location", {
value: { hostname: "reno-stars.moleculesai.app" },
writable: true,
});
const headers = platformAuthHeaders();
expect(headers).toEqual({ "X-Molecule-Org-Slug": "reno-stars" });
});
it("attaches both when both apply (production SaaS shape)", () => {
Object.defineProperty(window, "location", {
value: { hostname: "reno-stars.moleculesai.app" },
writable: true,
});
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "tenant-bearer";
const headers = platformAuthHeaders();
// Pin exact-equality on the full shape — substring/contains
// assertions would also pass for an extra-header bug.
expect(headers).toEqual({
"X-Molecule-Org-Slug": "reno-stars",
Authorization: "Bearer tenant-bearer",
});
});
it("returns a fresh object each call (callers can mutate safely)", () => {
process.env.NEXT_PUBLIC_ADMIN_TOKEN = "tok";
const a = platformAuthHeaders();
const b = platformAuthHeaders();
expect(a).not.toBe(b); // distinct refs
expect(a).toEqual(b); // same content
a["Content-Type"] = "application/json";
// Mutation on `a` does not leak into `b`.
expect(b["Content-Type"]).toBeUndefined();
});
});

View File

@ -21,6 +21,45 @@ export interface RequestOptions {
timeoutMs?: number;
}
/**
* Build the platform auth header set used by every authenticated fetch
* from the canvas. Returns a fresh object so callers can mutate (e.g.
* append `Content-Type` for JSON requests, omit it for FormData).
*
* SaaS cross-origin shape:
* - `X-Molecule-Org-Slug` derived from `window.location.hostname`
* by `getTenantSlug()`. Control plane uses it for fly-replay
 * routing. Empty on localhost / non-tenant hosts; safe to omit.
 * - `Authorization: Bearer <token>` from `NEXT_PUBLIC_ADMIN_TOKEN` baked
* into the canvas build (see canvas/Dockerfile L8/L11). Required by
* the workspace-server when `ADMIN_TOKEN` is set on the server side
* (Tier-2b AdminAuth gate, wsauth_middleware.go ~L245). Empty when
 * no admin token was provisioned; the Tier-1 session-cookie path
* handles that case via `credentials:"include"`.
*
* Why a shared helper: the two-line "read env, attach bearer; read
* slug, attach header" pattern was duplicated across `request()` and
* 7 raw-fetch callsites (chat uploads/download + 5 Attachment*
* components) before this consolidation. A new poller or raw fetch
* that forgets one of the two headers silently 401s against
 * workspace-server when ADMIN_TOKEN is set; the exact bug shape
* called out in #178 / closes the post-#176 self-review gap.
*
* Callers that want JSON Content-Type should spread this and add it
* themselves; FormData callers should NOT add Content-Type (the
* browser sets the multipart boundary). Centralizing the auth pair
* but leaving Content-Type up to the caller is the minimum viable
* shared shape.
*/
export function platformAuthHeaders(): Record<string, string> {
const headers: Record<string, string> = {};
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
return headers;
}
async function request<T>(
method: string,
path: string,
@ -28,17 +67,16 @@ async function request<T>(
retryCount = 0,
options?: RequestOptions,
): Promise<T> {
// SaaS cross-origin shape:
// - X-Molecule-Org-Slug: derived from window.location.hostname by
// getTenantSlug(). Control plane uses it for fly-replay routing.
// Empty on localhost / non-tenant hosts — safe to omit.
// - credentials:"include": sends the session cookie cross-origin.
// Cookie's Domain=.moleculesai.app attribute + cp's CORS allow this.
const headers: Record<string, string> = { "Content-Type": "application/json" };
// JSON-bodied request — Content-Type is JSON. Auth pair comes from
// the shared helper; see its doc comment for the SaaS-shape rationale.
const headers: Record<string, string> = {
"Content-Type": "application/json",
...platformAuthHeaders(),
};
// Re-read slug locally for the 401 handler below — `headers` already
// has it, but the 401 branch needs the bare value to gate the
// session-probe + redirect logic on tenant context.
const slug = getTenantSlug();
if (slug) headers["X-Molecule-Org-Slug"] = slug;
const adminToken = process.env.NEXT_PUBLIC_ADMIN_TOKEN;
if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
const res = await fetch(`${PLATFORM_URL}${path}`, {
method,

View File

@ -7,6 +7,22 @@ export default defineConfig({
test: {
environment: 'node',
exclude: ['e2e/**', 'node_modules/**', '**/dist/**'],
// Issue #22 / vitest pool investigation:
//
// The forks pool spawns one Node.js worker per concurrent slot.
// Each jsdom-environment worker bootstraps a full DOM (~30-50 MB resident
// set) at cold-start. With the default maxWorkers derived from CPU
// count, multiple jsdom workers can start simultaneously, exhausting
// memory on the 2-CPU Gitea Actions runner and causing pool workers to
// fail to respond with "[vitest-pool]: Timeout starting … runner."
//
// Fix: cap maxWorkers at 1 so only one worker is alive at any time.
// Tests still run in parallel within that single worker's process (via
// node's EventLoop) — this is the same parallelism as the `threads`
// pool but without the per-worker jsdom cold-start overhead. 51 test
// files that previously took 50–70 s with 5 failures now run
// sequentially through one worker, eliminating the memory spike.
maxWorkers: 1,
// CI-conditional test timeout (issue #96).
//
// Vitest's 5000ms default is too tight for the first test in any

View File

@ -119,7 +119,7 @@ services:
networks:
default:
name: molecule-monorepo-net
name: molecule-core-net
external: true
volumes:

View File

@ -1,3 +1,7 @@
# Include infra services (Temporal, Langfuse) so `docker compose up` starts the full stack.
include:
- docker-compose.infra.yml
services:
# --- Infrastructure ---
postgres:
@ -12,7 +16,8 @@ services:
volumes:
- pgdata:/var/lib/postgresql/data
networks:
- molecule-monorepo-net
- molecule-core-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-dev}"]
interval: 2s
@ -39,7 +44,7 @@ services:
psql -h postgres -U "$${POSTGRES_USER}" -d postgres -c "CREATE DATABASE langfuse"
fi
networks:
- molecule-monorepo-net
- molecule-core-net
redis:
image: redis:7-alpine
@ -49,7 +54,8 @@ services:
volumes:
- redisdata:/data
networks:
- molecule-monorepo-net
- molecule-core-net
restart: unless-stopped
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 2s
@ -66,7 +72,7 @@ services:
volumes:
- clickhousedata:/var/lib/clickhouse
networks:
- molecule-monorepo-net
- molecule-core-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:8123/ping || exit 1"]
interval: 5s
@ -95,7 +101,7 @@ services:
ports:
- "3001:3000"
networks:
- molecule-monorepo-net
- molecule-core-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/public/health || exit 1"]
interval: 10s
@ -126,6 +132,10 @@ services:
REDIS_URL: redis://redis:6379
PORT: "${PLATFORM_PORT:-8080}"
PLATFORM_URL: "http://platform:${PLATFORM_PORT:-8080}"
# Container network namespace is already isolated; "all interfaces"
# inside the container = the bridge interface only. The loopback
# default (127.0.0.1) would block host-to-container access.
BIND_ADDR: "${BIND_ADDR:-0.0.0.0}"
# Default MOLECULE_ENV=development so the WorkspaceAuth / AdminAuth
# middleware fail-open path activates when ADMIN_TOKEN is unset —
# otherwise the canvas (which runs without a bearer in pure local
@ -195,12 +205,28 @@ services:
# App private key — read-only bind-mount. The host-side path is
# gitignored per .gitignore rules (/.secrets/ + *.pem).
- ./.secrets/github-app.pem:/secrets/github-app.pem:ro
# Per-role persona credentials (molecule-core#242 local surface).
# Sourced at workspace creation time by org_import.go::loadPersonaEnvFile
# when a workspace.yaml carries `role: <name>`. The host-side dir is
# populated by the operator-host bootstrap kit (28 dev-tree personas);
# /etc/molecule-bootstrap/personas is the in-container path the
# platform expects (matches the prod tenant-EC2 path so the same code
# works in both modes).
#
# Read-only mount — workspace-server only reads, never writes here.
# If the host dir is empty/missing the platform's loadPersonaEnvFile
# silently no-ops per its existing semantics, so this mount is safe
# even on a fresh machine that hasn't run the bootstrap kit yet.
- ${MOLECULE_PERSONA_ROOT_HOST:-${HOME}/.molecule-ai/personas}:/etc/molecule-bootstrap/personas:ro
ports:
- "${PLATFORM_PUBLISH_PORT:-8080}:${PLATFORM_PORT:-8080}"
networks:
- molecule-monorepo-net
- molecule-core-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:${PLATFORM_PORT:-8080}/health || exit 1"]
# Plain GET — `--spider` would issue HEAD, which returns 404 because
# /health is registered as GET only.
test: ["CMD-SHELL", "wget -qO /dev/null --tries=1 http://localhost:${PLATFORM_PORT:-8080}/health || exit 1"]
interval: 5s
timeout: 5s
retries: 10
@ -236,9 +262,9 @@ services:
ports:
- "${CANVAS_PUBLISH_PORT:-3000}:${CANVAS_PORT:-3000}"
networks:
- molecule-monorepo-net
- molecule-core-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:${CANVAS_PORT:-3000} || exit 1"]
test: ["CMD-SHELL", "wget -qO /dev/null --tries=1 http://127.0.0.1:${CANVAS_PORT:-3000} || exit 1"]
interval: 10s
timeout: 5s
retries: 10
@ -269,7 +295,7 @@ services:
OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-sk-molecule}
networks:
- molecule-monorepo-net
- molecule-core-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:4000/health || exit 1"]
@ -294,7 +320,7 @@ services:
volumes:
- ollamadata:/root/.ollama
networks:
- molecule-monorepo-net
- molecule-core-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "ollama list || exit 1"]
@ -304,8 +330,8 @@ services:
start_period: 20s
networks:
molecule-monorepo-net:
name: molecule-monorepo-net
molecule-core-net:
name: molecule-core-net
volumes:
pgdata:

View File

@ -67,7 +67,7 @@ On-demand fits naturally with how agents work — an agent only needs to know ab
This is acceptable for MVP because:
- All workspaces are provisioned by the same platform on trusted infrastructure
- Docker network isolation (`molecule-monorepo-net`) limits who can reach workspace endpoints
- Docker network isolation (`molecule-core-net`) limits who can reach workspace endpoints
- The tool is self-hosted — the operator controls the network
**Known gap:** Once workspace A caches workspace B's URL, nothing stops A from calling B directly even after the hierarchy changes and A is no longer supposed to reach B. The cached URL remains valid until the container is restarted or the URL changes.

View File

@ -124,7 +124,7 @@ Six runtime adapters ship production-ready on `main`: LangGraph, DeepAgents, Cla
| Platform ↔ Redis | TCP | Ephemeral state (liveness TTL), caching, pub/sub |
| Workspace ↔ Workspace | HTTP (A2A JSON-RPC 2.0) | Direct peer-to-peer, **platform not in data path** |
| Workspace → Langfuse | HTTP | Automatic OpenTelemetry tracing |
| Docker Network | `molecule-monorepo-net` | Internal-only by default, no exposed DB/Redis ports |
| Docker Network | `molecule-core-net` | Internal-only by default, no exposed DB/Redis ports |
### Core Components
@ -465,7 +465,7 @@ Unknown tier values default to T2 for safety. Applied via `provisioner.ApplyTier
### Docker Networking
- All containers join `molecule-monorepo-net` private network
- All containers join `molecule-core-net` private network
- Container naming: `ws-{workspace_id[:12]}`
- Ephemeral host port binding: `127.0.0.1:0→8000/tcp`

View File

@ -19,7 +19,7 @@ The provisioner is the platform component that deploys workspace containers and
## Docker Networking (Tier 1-3, Tier 4 uses host)
All workspace containers join the `molecule-monorepo-net` Docker network. Containers are named `ws-{id[:12]}` (first 12 chars of workspace UUID). Two exported helpers in `provisioner` package provide the canonical naming:
All workspace containers join the `molecule-core-net` Docker network. Containers are named `ws-{id[:12]}` (first 12 chars of workspace UUID). Two exported helpers in `provisioner` package provide the canonical naming:
- `provisioner.ContainerName(workspaceID)` → `ws-{id[:12]}`
- `provisioner.InternalURL(workspaceID)` → `http://ws-{id[:12]}:8000`
@ -38,7 +38,7 @@ This URL is pre-stored in both Postgres and Redis before the agent registers. Wh
**Why not use Docker-internal URLs?** In local dev, the platform runs on the host (not in Docker), so it cannot resolve Docker container hostnames. The ephemeral port mapping lets the A2A proxy reach agents via localhost. In production (platform in Docker), the Docker-internal URL (`http://ws-{id}:8000`) would work directly.
**Workspace-to-workspace discovery:** When a workspace discovers another workspace (via `X-Workspace-ID` header on `GET /registry/discover/:id`), the platform returns the Docker-internal URL (`http://ws-{first12chars}:8000`) so containers can reach each other directly on `molecule-monorepo-net`. The internal URL is cached in Redis at provision time and also synthesized as a fallback if the cache misses (only for online/degraded workspaces).
**Workspace-to-workspace discovery:** When a workspace discovers another workspace (via `X-Workspace-ID` header on `GET /registry/discover/:id`), the platform returns the Docker-internal URL (`http://ws-{first12chars}:8000`) so containers can reach each other directly on `molecule-core-net`. The internal URL is cached in Redis at provision time and also synthesized as a fallback if the cache misses (only for online/degraded workspaces).
For external HTTPS access (multi-host mode), Nginx on the host handles TLS termination and proxies to the container.

View File

@ -0,0 +1,119 @@
# Canvas Architecture Audit — VERIFIED
> **Status:** VERIFIED — Cross-referenced against molecule-core/canvas/src/ (2026-05-09)
> **Author:** Core-FE (draft), Core-UIUX (verification)
> **Updated:** 2026-05-09 with architecture structure + known issues
## Canvas Stack (Verified)
| Area | Technology | Purpose |
|------|------------|---------|
| Canvas rendering | React Flow (`@xyflow/react` v12) | Node/edge rendering |
| Framework | Next.js 14 (App Router) | Routing, SSR |
| Styling | Tailwind v4 | CSS with custom properties |
| State | Zustand | Client state management |
## Directory Structure (Verified)
```
canvas/src/
├── components/
│ ├── Canvas.tsx # Viewport management, ReactFlow wrapper
│ ├── Toolbar.tsx # Add node/edge controls
│ ├── ContextMenu.tsx # Right-click menu
│ ├── SidePanel.tsx # Properties panel
│ ├── WorkspaceNode.tsx # Node rendering
│ ├── A2AEdge.tsx # Edge rendering
│ └── [tests]/ # Accessibility + component tests
├── stores/
│ └── secrets-store.ts # ⚠️ getGrouped() performance issue
├── hooks/
│ ├── useSocketEvent.ts
│ ├── useTemplateDeploy.tsx
│ └── useWorkspaceName.ts
└── lib/
├── api.ts
├── auth.ts
├── canvas-actions.ts
├── design-tokens.ts # STATUS_CONFIG, TIER_CONFIG
├── theme.ts
    └── theme-provider.tsx   # ThemeProvider, useTheme()
```
## Known Issues
### 🔴 HIGH: secrets-store.ts Performance
**File:** `canvas/src/stores/secrets-store.ts`
**Issue:** `getGrouped()` selector creates new objects every call (Object.fromEntries + arrays). Not memoized.
**Impact:** Causes unnecessary re-renders on frequent selector access.
**Fix needed:** Memoize the selector or use a proper Zustand selector pattern.
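One possible shape of the fix, as a minimal sketch: cache the last input array and only rebuild the grouping when the reference changes. Types and names are illustrative, not the actual `secrets-store.ts` API.
```ts
// Illustrative memoization sketch, not the actual secrets-store.ts implementation.
type Secret = { workspaceId: string; key: string; value: string };

let lastInput: Secret[] | undefined;
let lastResult: Record<string, Secret[]> = {};

export function getGroupedMemo(secrets: Secret[]): Record<string, Secret[]> {
  if (secrets === lastInput) return lastResult; // same array reference, reuse the cached grouping
  const grouped: Record<string, Secret[]> = {};
  for (const s of secrets) {
    (grouped[s.workspaceId] ??= []).push(s); // group secrets by owning workspace
  }
  lastInput = secrets;
  lastResult = grouped;
  return grouped;
}
```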
### 🟡 MEDIUM: Pre-commit Hook Verification
**Issue:** Pre-commit hook checks 'use client' on hook-using components but unclear if it actually fails on violations.
**Action:** Verify the hook is enforcing the rule correctly.
## Verified Findings
### Node Rendering ✅ (with notes)
- **Framework:** `@xyflow/react` (React Flow) — DOM-based, not SVG/Canvas
- **Node selection:** `aria-pressed` + border ring (`border-accent/70`) + shadow
- **Node drag:** React Flow native drag — mouse only, no keyboard alternative yet
- **Node resize:** `NodeResizer` component visible on selected card, keyboard-inaccessible
- **Status:** Accessible via `aria-label` on node cards — "Alpha Workspace workspace — online"
### Edge Wiring ✅
- **Edge rendering:** React Flow SVG paths
- **Edge click target:** 1.5px stroke (CSS `stroke-width: 1.5 !important` in globals.css)
- **Edge creation:** React Flow drag-from-handle
- **Edge anchors:** Visible on hover (`hover:!bg-blue-400`), not keyboard accessible
- **Status:** Partial — mouse users only
### Canvas Controls ✅
- **Zoom:** React Flow Controls component (verify if keyboard accessible)
- **Pan:** Space+drag, mouse drag
- **Minimap:** Not present (MiniMap mocked as null in tests)
- **Status:** Basic keyboard support via viewport shortcuts
### Keyboard Shortcuts ⚠️ PARTIAL
- Exists in `useKeyboardShortcuts.ts` but no `aria-describedby` on trigger buttons
- No dedicated keyboard shortcut help dialog
- **Gap:** Users can't discover shortcuts visually
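One low-cost way to close the discoverability gap, sketched under the assumption that a help dialog exists to open; the component, id, and shortcut names are illustrative:
```tsx
// Illustrative sketch only, not the existing useKeyboardShortcuts.ts API.
function ShortcutHelpButton({ onOpen }: { onOpen: () => void }) {
  return (
    <>
      <button aria-describedby="shortcut-hint" aria-keyshortcuts="Shift+?" onClick={onOpen}>
        Keyboard shortcuts
      </button>
      <span id="shortcut-hint" className="sr-only">
        Opens a dialog listing all canvas keyboard shortcuts
      </span>
    </>
  );
}
```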
### Focus Management ✅ (strong)
- Skip link → `#canvas-main`
- `aria-label` on ReactFlow container ✅
- Focus trap in modals via Radix ✅
- Focus ring: `focus-visible:ring-2 focus-visible:ring-blue-500 focus-visible:ring-offset-2 focus-visible:ring-offset-zinc-950`
### Accessibility Tree ⚠️ PARTIAL
- Canvas is in accessibility tree (React Flow DOM nodes)
- Node state changes not announced to screen readers (no `aria-live` region)
- Context menus announced via `role="menu"`
### Context Menus ✅ (strong)
- `role="menu"`, `role="menuitem"`, `role="separator"`
- `aria-label` with workspace name ✅
- ArrowUp/Down navigation with wrap-around ✅
- Escape + Tab close menu ✅
- Auto-focus first item on open ✅
### Drag and Drop ⚠️ PARTIAL
- **Mouse drag:** React Flow native
- **Drop target:** Visual indicator (`bg-emerald-950/40 border-emerald-400/60`) ✅
- **Keyboard alternative:** None — nodes repositioned only via mouse drag
- **Status:** Mouse-only. Keyboard users cannot rearrange nodes.
---
## Remaining Gaps (Priority Order)
| Priority | Item | Files | Status |
|----------|------|-------|--------|
| HIGH | Screen reader announcements for canvas state changes | Canvas.tsx | Not started |
| MEDIUM | Keyboard shortcut help dialog | useKeyboardShortcuts.ts | Not started |
| MEDIUM | Keyboard-accessible node drag | WorkspaceNode.tsx, useDragHandlers.ts | Not started |
| LOW | Edge anchor keyboard accessibility | A2AEdge.tsx | Not started |
| LOW | Node resize keyboard accessibility | WorkspaceNode.tsx (NodeResizer) | Not started |
---
*Verified 2026-05-09 by Core-UIUX against molecule-core/canvas/src/*

View File

@ -0,0 +1,424 @@
# Canvas Design System v1 — VERIFIED
> **Status:** VERIFIED — Cross-referenced against molecule-core/canvas/src/ (2026-05-09)
> **Authors:** Core-FE (draft), Core-UIUX (verification + updates)
> **Source files verified:**
> - `canvas/src/app/globals.css`
> - `canvas/src/styles/theme-tokens.css`
> - `canvas/src/lib/design-tokens.ts`
> - `canvas/src/components/Tooltip.tsx`
> - `canvas/src/components/ContextMenu.tsx`
> - `canvas/src/components/Canvas.tsx`
> - `canvas/src/components/__tests__/Canvas.a11y.test.tsx`
> - `canvas/src/components/__tests__/ContextMenu.keyboard.test.tsx`
> - `canvas/src/components/__tests__/MissingKeysModal.a11y.test.tsx`
> - `canvas/src/components/__tests__/ConversationTraceModal.a11y.test.tsx`
---
## 1. Color Palette — Three-Mode Theme System
Canvas supports **three themes**: System (follows OS), Light, Dark. Controlled via `ThemeProvider` in `theme-provider.tsx` with preference persisted in `mol_theme` cookie.
**Key principle: Use semantic tokens, NOT raw zinc values for surfaces.**
### 1.1 Theme-Mutable Tokens (use these for surfaces)
Defined in `globals.css` via Tailwind v4 `@theme` block. Automatically flip between light/dark.
**Light theme (warm paper):**
| Token | Tailwind Class | Hex | Usage |
|-------|--------------|-----|-------|
| `--color-surface` | `bg-surface` | `#fafaf7` | Page background |
| `--color-surface-elevated` | `bg-surface-elevated` | `#ffffff` | Elevated cards, modals |
| `--color-surface-sunken` | `bg-surface-sunken` | `#f3f1ec` | Input fields, recessed areas |
| `--color-surface-card` | `bg-surface-card` | `#efece4` | Node cards, chips |
| `--color-line` | `border-line` | `#e6e2d8` | Dividers, borders |
| `--color-line-soft` | `border-line-soft` | `#efece4` | Subtle dividers |
| `--color-ink` | `text-ink` | `#15181c` | Primary text |
| `--color-ink-mid` | `text-ink-mid` | `#5a5e66` | Secondary text |
| `--color-ink-soft` | `text-ink-soft` | `#8b8e95` | Tertiary text, placeholders |
| `--color-accent` | `text-accent` | `#3b5bdb` | Links, primary actions |
| `--color-accent-strong` | `text-accent-strong` | `#1a2f99` | Emphasized accent |
| `--color-warm` | `text-warm` | `#c0532b` | Warnings |
| `--color-good` | `text-good` | `#2f7a4d` | Success states |
| `--color-bad` | `text-bad` | `#b94e4a` | Error states |
**Dark theme:**
| Token | Hex | Usage |
|-------|-----|-------|
| `--color-surface` | `#0e1014` | Page background |
| `--color-surface-elevated` | `#15181c` | Elevated cards |
| `--color-surface-sunken` | `#0a0b0e` | Input fields |
| `--color-surface-card` | `#1a1d23` | Node cards |
| `--color-line` | `#2a2f3a` | Dividers |
| `--color-ink` | `#f4f1e9` | Primary text |
| `--color-ink-mid` | `#c8c2b4` | Secondary text |
| `--color-ink-soft` | `#8d92a0` | Tertiary text |
| `--color-accent` | `#6883e8` | Links (brighter for AA contrast) |
| `--color-accent-strong` | `#8aa1ee` | Emphasized accent |
| `--color-warm` | `#d96f48` | Warnings |
| `--color-good` | `#4ca06e` | Success |
| `--color-bad` | `#d27773` | Errors |
### 1.2 Always-Dark Tokens (terminal surfaces)
Terminals, console modal, log streams **stay dark** in all themes — readable green-on-black doesn't translate to light.
| Token | Tailwind Class | Hex | Usage |
|-------|--------------|-----|-------|
| `--color-bg` | `bg-bg` | `rgb(9 9 11)` / zinc-950 | Terminal background |
| `--color-bg-elev` | `bg-bg-elev` | `rgb(24 24 27)` / zinc-900 | Elevated terminal surfaces |
| `--color-bg-card` | `bg-bg-card` | `rgb(39 39 42)` / zinc-800 | Terminal cards |
| `--color-line-strong` | `border-line-strong` | `rgb(63 63 70)` / zinc-700 | Strong borders |
| `--color-ink-mute` | `text-ink-mute` | `rgb(161 161 170)` / zinc-400 | Muted text |
| `--color-ink-dim` | `text-ink-dim` | `rgb(113 113 122)` / zinc-500 | Dim text |
### 1.3 Raw Zinc Usage Rules
**Use raw zinc for:**
- Borders: `border-zinc-700`, `border-zinc-800`
- Disabled states: `text-zinc-600`, `bg-zinc-800`
- Code highlighting: `bg-zinc-900`, `text-zinc-300`
- Terminal surfaces: `bg-zinc-950` (always-dark)
**NEVER use for surfaces:**
- `bg-zinc-900` or `bg-zinc-950` as page/card backgrounds — use `bg-surface`
- `text-zinc-50` or `text-zinc-100` as primary text — use `text-ink`
- `bg-white`, `bg-gray-50/100` for surfaces — use semantic tokens
### 1.4 Accessibility Contrast
| Pair | Ratio | WCAG |
|------|-------|------|
| `text-ink` on `bg-surface` (light) | ~14.5:1 | AAA |
| `text-ink` on `bg-surface` (dark) | ~15.8:1 | AAA |
| `text-ink-mid` on `bg-surface` (light) | ~5.2:1 | AA |
| `text-ink-mid` on `bg-surface` (dark) | ~5.9:1 | AA |
| `text-accent` on `bg-surface` (light) | ~4.8:1 | AA |
| `text-accent` on `bg-surface` (dark) | ~4.6:1 | AA |
---
## 2. Typography Scale
**Actual font stack** (from `globals.css`):
```
-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", sans-serif
```
No custom fonts loaded — uses OS-native system stack.
| Size Token | Tailwind | Usage |
|------------|----------|-------|
| `text-[10px]` | 10px | Micro badges, tier labels |
| `text-[11px]` | 11px | Tooltip text |
| `text-xs` / `text-[12px]` | 12px | Badges, timestamps |
| `text-sm` / `text-[13px]` | 13–14px | Secondary labels, node titles |
| `text-base` / `text-[16px]` | 16px | Body text |
| `text-lg` | 18px | Section headers |
| `text-xl` | 20px | Modal titles |
**Line height:** `leading-tight` (1.25) for headings, `leading-relaxed` (1.625) for body/tooltips.
---
## 3. Animation / Motion Tokens
**Defined in `canvas/src/styles/theme-tokens.css`** — use these, don't hardcode ms values.
| Token | Value | Usage |
|-------|-------|-------|
| `--mol-duration-fast` | 150ms | Hover states, button feedback |
| `--mol-duration-base` | 300ms | Standard transitions |
| `--mol-duration-spawn` | 350ms | Node spawn animation |
| `--mol-duration-root-complete` | 700ms | Org-deploy root glow |
| `--mol-duration-fit-view` | 800ms | Canvas fit-viewport |
| Token | Value | Usage |
|-------|-------|-------|
| `--mol-easing-standard` | `cubic-bezier(0.2, 0, 0, 1)` | Default ease |
| `--mol-easing-bounce-out` | `cubic-bezier(0.2, 0.8, 0.2, 1.05)` | Node spawn bounce |
| `--mol-easing-emphasize` | `cubic-bezier(0.3, 0, 0, 1)` | Modal/drawer enter |
**CSS usage:**
```css
/* Good — reference the token */
transition: all var(--mol-duration-fast) ease;
/* Bad — hardcoded value */
transition: all 150ms ease;
```
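When a duration is needed in JS rather than CSS (for example, passing the fit-view duration to React Flow), the token can be read from the computed style instead of re-hardcoding 800ms. A sketch, assuming the `--mol-*` properties are defined on `:root`:
```ts
// Sketch: read a motion token in JS instead of duplicating the ms value.
// Assumes the --mol-* custom properties are set on :root (theme-tokens.css).
function readDurationMs(token: string, fallback: number): number {
  const raw = getComputedStyle(document.documentElement).getPropertyValue(token).trim();
  const ms = parseFloat(raw); // "800ms" parses to 800
  return Number.isNaN(ms) ? fallback : ms;
}

// e.g. reactFlow.fitView({ duration: readDurationMs("--mol-duration-fit-view", 800) });
```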
---
## 4. Component Patterns (Verified)
### 4.1 Buttons
```tsx
// Primary — accent background, ink text
<button className="bg-accent hover:bg-accent/90 active:scale-95
text-ink px-4 py-2 rounded-md text-sm font-medium
focus-visible:ring-2 focus-visible:ring-blue-500
focus-visible:ring-offset-2 focus-visible:ring-offset-zinc-900
disabled:opacity-50 disabled:cursor-not-allowed">
Primary
</button>
// Secondary — surface-card background, border-line
<button className="bg-surface-card hover:bg-surface-elevated border border-line
text-ink px-4 py-2 rounded-md text-sm font-medium
focus-visible:ring-2 focus-visible:ring-blue-500
focus-visible:ring-offset-2 focus-visible:ring-offset-zinc-900">
Secondary
</button>
// Ghost — no background, hover surface
<button className="hover:bg-surface-card text-ink-mid hover:text-ink
px-4 py-2 rounded-md text-sm font-medium">
Ghost
</button>
// Danger — bad color, requires confirmation dialog
<button className="bg-bad hover:bg-bad/90 text-white px-4 py-2
rounded-md text-sm font-medium">
Delete
</button>
```
**States:** default, hover, active (`scale-95`), focus (`ring-2 ring-blue-500 ring-offset-2 ring-offset-zinc-900`), disabled (`opacity-50 cursor-not-allowed`).
### 4.2 Inputs
```tsx
// Text input — use semantic tokens for surfaces
<input
className="bg-surface-sunken border border-line text-ink
placeholder:text-ink-soft px-3 py-2 rounded-md text-sm
focus:outline-none focus:ring-2 focus:ring-blue-500
focus:border-transparent
disabled:opacity-50 disabled:cursor-not-allowed"
placeholder="Enter workspace name"
/>
// Error state
<input
className="border-bad focus:ring-bad"
aria-invalid="true"
aria-describedby="error-message"
/>
```
**Label:** `text-sm font-medium text-ink mb-1`
**Error:** `text-xs text-bad mt-1`
### 4.3 Cards
```tsx
// Workspace node card (from WorkspaceNode.tsx)
<div className="bg-surface-sunken/90 border border-line/80
rounded-xl p-3.5 py-2.5
hover:border-zinc-500/60 shadow-lg shadow-black/30
focus-visible:ring-2 focus-visible:ring-accent/70
focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-950">
```
### 4.4 Modals (Radix Dialog)
```tsx
// Backdrop
<div className="fixed inset-0 bg-black/70 backdrop-blur-sm z-50"
aria-hidden="true" />
// Dialog — use surface-card + border-line
<div className="bg-surface-card border border-line rounded-xl
shadow-2xl p-6 max-w-md w-full mx-4">
{/* Modal content */}
</div>
```
Note: node cards are treated as sunken areas and use `--color-surface-sunken` (see 4.3 above); modal dialogs use `bg-surface-card`.
**Important:** Use `@radix-ui/react-dialog` — it provides WCAG 2.1 compliance automatically (focus trap, Escape key, aria-modal, aria-labelledby).
### 4.5 Tooltips
**Verified implementation** (`canvas/src/components/Tooltip.tsx`):
```tsx
// Trigger wraps children
<span aria-describedby="tooltip-id">
{children}
</span>
// Tooltip portal (shows on hover + focus, 400ms delay)
<div id="tooltip-id"
role="tooltip"
className="fixed z-[9999] max-w-[400px] max-h-[300px] overflow-y-auto
px-3 py-2 bg-surface-card border border-line
rounded-lg shadow-2xl shadow-black/60 pointer-events-none">
<div className="text-[11px] text-ink whitespace-pre-wrap break-words leading-relaxed">
{text}
</div>
</div>
```
**WCAG 1.4.13 compliance:** Escape key dismisses tooltip without moving pointer/focus.
### 4.6 Theme Switching
Use `useTheme()` hook from `theme-provider.tsx`:
```tsx
import { useTheme } from "@/lib/theme-provider";
function ThemeToggle() {
const { theme, resolvedTheme, setTheme } = useTheme();
return (
<select
value={theme}
onChange={(e) => setTheme(e.target.value as ThemePreference)}
>
<option value="system">System</option>
<option value="light">Light</option>
<option value="dark">Dark</option>
</select>
);
}
```
**Theme types:**
```ts
type ThemePreference = "system" | "light" | "dark";
type ResolvedTheme = "light" | "dark";
```
**Cookie:** `mol_theme` with `Domain=.moleculesai.app` — persists across surfaces.
---
## 5. Accessibility Rules (WCAG 2.1 AA) — VERIFIED
### 5.1 Focus Management ✅ VERIFIED
- All interactive elements have `focus-visible:ring-2 focus-visible:ring-blue-500 focus-visible:ring-offset-2 focus-visible:ring-offset-zinc-950`
- No `outline-none` without equivalent focus ring
- Radix Dialog traps focus automatically
### 5.2 Semantic HTML ✅ VERIFIED
- Buttons use `<button>` — verified in WorkspaceNode.tsx, ContextMenu.tsx
- Form inputs have associated `<label>` patterns
- Radix Dialog provides role="dialog" + aria-modal
### 5.3 ARIA ✅ VERIFIED
- Icon-only buttons: `aria-label` with descriptive text (not "X")
- Example: `aria-label="Extract ${name} from team"` in WorkspaceNode.tsx
- Live regions: `aria-live="polite"` on Toast component
- Modals: Radix provides `role="dialog"`, `aria-modal="true"`, `aria-labelledby`
- Error messages: `aria-invalid="true"`, `aria-describedby` linking to error text
- Tooltips: `role="tooltip"` + `aria-describedby` on trigger
### 5.4 Keyboard Navigation ✅ VERIFIED
- ContextMenu: ArrowUp/Down wraps, Enter/Space selects, Escape closes, Tab closes
- Modals: Escape closes (Radix), focus returns to trigger
- `prefers-reduced-motion` ✅ (verified in globals.css)
### 5.5 Color Independence ✅
- Status indicators use text labels + icons, not color alone
- `STATUS_CONFIG` has text labels: "Online", "Offline", "Failed", etc.
---
## 6. React Flow Canvas Specifics
Canvas uses `@xyflow/react` (React Flow).
### Canvas Container ✅ VERIFIED
```tsx
// Canvas.tsx wraps ReactFlow with:
<ReactFlow
aria-label="Molecule AI workspace canvas"
// ...
/>
```
### Node Accessibility ✅ VERIFIED
- `role="button"` on workspace node cards
- `tabIndex={0}` for keyboard focus
- `aria-pressed` for selection state
- `aria-label` with workspace name + status
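Put together, an illustrative node card combining the attributes above (not the actual WorkspaceNode.tsx markup):
```tsx
// Illustrative only: combines the verified attributes, not the real WorkspaceNode.tsx.
function NodeCard({ name, status, selected }: { name: string; status: string; selected: boolean }) {
  return (
    <div
      role="button"
      tabIndex={0}
      aria-pressed={selected}
      aria-label={`${name} workspace — ${status}`} // matches the verified label format
      className="bg-surface-sunken/90 border border-line/80 rounded-xl p-3.5"
    >
      {name}
    </div>
  );
}
```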
### Skip Link ✅ VERIFIED
```tsx
<a href="#canvas-main">Skip to canvas</a>
<main id="canvas-main" role="main">
```
---
## 7. Enforcement Checklist
### Color Token Rules
- [x] No `bg-white` / `bg-zinc-50` for surfaces — use `bg-surface`
- [x] No `text-zinc-50` / `text-zinc-100` for surfaces — use `text-ink`
- [x] No `bg-zinc-900` / `bg-zinc-950` for surfaces — use `bg-surface` or `bg-surface-card`
- [x] Raw zinc OK for: borders, disabled states, code, terminal surfaces
### Accessibility Rules
- [x] All buttons have focus rings (verified in tests)
- [x] All modals use Radix Dialog (verified)
- [x] All tooltips use `role="tooltip"` + `aria-describedby` (verified)
- [x] No `outline-none` without focus ring (verified)
- [x] All inputs have visible labels (verified pattern)
- [x] Contrast ratios at 4.5:1 minimum (verified above)
- [x] `prefers-reduced-motion` suppresses all animations (verified in globals.css)
- [x] Context menu has keyboard navigation (verified in ContextMenu.keyboard.test.tsx)
- [x] Theme switching works: System/Light/Dark modes verified
---
## 8. Canvas Architecture (Verified)
**Stack:**
- `@xyflow/react` v12 (React Flow) — node/edge rendering
- Next.js 14 App Router
- Tailwind v4 with CSS custom properties
- Zustand for state management
**Directory Structure:**
```
canvas/src/
├── components/ # Canvas.tsx, Toolbar.tsx, ContextMenu.tsx, SidePanel.tsx, WorkspaceNode.tsx, A2AEdge.tsx
├── stores/ # secrets-store.ts (only store)
├── hooks/ # useSocketEvent.ts, useTemplateDeploy.tsx, useWorkspaceName.ts
├── lib/ # api.ts, auth.ts, canvas-actions.ts, design-tokens.ts, theme.ts, theme-provider.tsx
└── app/ # Next.js App Router
```
## 9. Known Issues (Technical Debt)
### Performance Issues
- **secrets-store.ts getGrouped() selector** — Creates new objects every call (Object.fromEntries + arrays) — not memoized. Causes performance issues with frequent re-renders. Needs selector optimization.
### Code Quality
- Check for `any` types in canvas/ directory
- Verify pre-commit hook actually fails on 'use client' violations (unverified)
- Verify all Zustand selectors avoid object creation (see getGrouped issue above)
- Check 'use client' directive on hook-using components
### Testing
- Add axe-core integration for automated accessibility testing
- Visual regression tests — no screenshot tests exist yet (KI-006)
- Target >80% test coverage on changed files
## 10. Remaining Open Items
### Accessibility Gaps
1. **Screen reader announcements** — Node/edge changes not announced. Need `aria-live="polite"` region (see the sketch after this list).
2. **Keyboard shortcut help dialog** — No dedicated dialog. Shortcuts exist in `useKeyboardShortcuts.ts` but no `aria-describedby` hints on buttons.
3. **Edge anchor accessibility** — React Flow handles purely visual. Need ARIA annotations for screen readers.
4. **Drag-and-drop keyboard alternative** — Mouse only. Need keyboard equivalent for node rearrangement.
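A minimal sketch of the `aria-live` region from item 1: a visually hidden announcer the canvas could update whenever a node or edge changes (component and prop names are illustrative):
```tsx
// Illustrative sketch for item 1, not existing Canvas.tsx code.
function CanvasAnnouncer({ message }: { message: string }) {
  return (
    <div aria-live="polite" role="status" className="sr-only">
      {message}
    </div>
  );
}

// Usage idea: set message to `${node.name} is now ${node.status}` when a node's status changes.
```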
### Performance
5. **secrets-store.ts getGrouped()** — Not memoized, creates new objects every call.

View File

@ -73,7 +73,7 @@ These are applied after CORS middleware on every response.
## 14. No Exposed Database Ports
Postgres and Redis must not expose host ports. They communicate exclusively over the internal Docker network (`molecule-monorepo-net`). Use `docker compose exec` for direct access during development.
Postgres and Redis must not expose host ports. They communicate exclusively over the internal Docker network (`molecule-core-net`). Use `docker compose exec` for direct access during development.
## Related Docs

View File

@ -73,19 +73,19 @@ runner-wide setting, not per-job. Source: gitea/act_runner config docs
Flipping the global `container.network` to `bridge` would break every
other workflow in the repo (cache server discovery,
`molecule-monorepo-net` peer access during integration tests, etc.) —
`molecule-core-net` peer access during integration tests, etc.) —
unacceptable blast radius for a per-test bug.
## Fix shape
`handlers-postgres-integration.yml` no longer uses `services: postgres:`.
It launches a sibling postgres container manually on the existing
`molecule-monorepo-net` bridge network with a per-run unique name:
`molecule-core-net` bridge network with a per-run unique name:
```yaml
env:
PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
PG_NETWORK: molecule-monorepo-net
PG_NETWORK: molecule-core-net
steps:
- name: Start sibling Postgres on bridge network
@ -117,7 +117,7 @@ host-network runner config. Translate using this same pattern:
1. Drop the `services:` block.
2. Use `${{ github.run_id }}-${{ github.run_attempt }}` for unique
container name.
3. Launch on `molecule-monorepo-net` (already trusted bridge in
3. Launch on `molecule-core-net` (already trusted bridge in
`docker-compose.infra.yml`).
4. Read back the bridge IP via `docker inspect` and export as a step env.
5. `if: always()` cleanup step at the end.
@ -131,7 +131,7 @@ in one place.
- Issue #88 (closed by #92): localhost → 127.0.0.1 fix that unmasked
this collision; the IPv6 fix is correct, port collision is the new
layer.
- Issue #94 created `molecule-monorepo-net` + `alpine:latest` as
- Issue #94 created `molecule-core-net` + `alpine:latest` as
prereqs.
- Saved memory `feedback_act_runner_github_server_url` documents
another act_runner-vs-GHA divergence (server URL).

View File

@ -5,7 +5,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
echo "==> Ensuring shared docker network exists..."
docker network create molecule-monorepo-net 2>/dev/null || true
docker network create molecule-core-net 2>/dev/null || true
# Populate the template / plugin registry.
# workspace-configs-templates/, org-templates/, and plugins/ are intentionally

View File

@ -1,5 +1,5 @@
{
"_comment": "Pin refs to release tags for reproducible builds. 'main' is OK while all repos are internal.",
"_comment": "OSS surface registry — every repo listed here MUST be public on git.moleculesai.app. Layer-3 customer/private templates are NOT registered here; they are handled at provision-time via the per-tenant credential resolver (see internal#102 RFC). 'main' refs are pinned to tags before broad rollout.",
"version": 1,
"plugins": [
{"name": "browser-automation", "repo": "molecule-ai/molecule-ai-plugin-browser-automation", "ref": "main"},
@ -40,7 +40,6 @@
{"name": "free-beats-all", "repo": "molecule-ai/molecule-ai-org-template-free-beats-all", "ref": "main"},
{"name": "medo-smoke", "repo": "molecule-ai/molecule-ai-org-template-medo-smoke", "ref": "main"},
{"name": "molecule-worker-gemini", "repo": "molecule-ai/molecule-ai-org-template-molecule-worker-gemini", "ref": "main"},
{"name": "reno-stars", "repo": "molecule-ai/molecule-ai-org-template-reno-stars", "ref": "main"},
{"name": "ux-ab-lab", "repo": "molecule-ai/molecule-ai-org-template-ux-ab-lab", "ref": "main"},
{"name": "mock-bigorg", "repo": "molecule-ai/molecule-ai-org-template-mock-bigorg", "ref": "main"}
]

View File

@ -8,27 +8,24 @@
# Requires: git, jq (lighter than python3 — ~2MB vs ~50MB in Alpine)
#
# Auth (optional):
# When MOLECULE_GITEA_TOKEN is set, embed it as the basic-auth password so
# private Gitea repos clone successfully. When unset, clone anonymously
# (works only for repos that are public on git.moleculesai.app).
# Post-2026-05-08 (#192): every repo in manifest.json is public on
# git.moleculesai.app. Anonymous clone works for the entire registered
# set. The OSS-surface contract is recorded in manifest.json's _comment
# — Layer-3 customer/private templates (e.g. reno-stars) are NOT in the
# manifest; they are handled at provision-time via the per-tenant
# credential resolver (internal#102 RFC).
#
# This is the path the publish-workspace-server-image.yml workflow uses:
# it injects AUTO_SYNC_TOKEN (devops-engineer persona PAT, repo:read on
# the molecule-ai org) so the in-CI pre-clone step succeeds for ALL
# manifest entries — including the 5 private workspace-template-* repos
# (codex, crewai, deepagents, gemini-cli, langgraph) and all 7
# org-template-* repos.
# MOLECULE_GITEA_TOKEN is therefore optional today. Kept supported for
# two reasons: (a) historical CI configs that still inject
# AUTO_SYNC_TOKEN remain harmless, (b) reserved for the case where a
# private internal-only template is later registered via a ci-readonly
# team grant — review must explicitly sign off on that, since it
# violates the public-OSS-surface contract.
#
# The token never enters the Docker image: this script runs in the
# trusted CI context BEFORE `docker buildx build`, populates
# The token (when set) never enters the Docker image: this script runs
# in the trusted CI context BEFORE `docker buildx build`, populates
# .tenant-bundle-deps/, then `Dockerfile.tenant` COPYs from there with
# the .git directories already stripped (see line ~67 below).
#
# For backward compatibility — and so a fresh clone works without
# secrets when (eventually) the workspace-template-* repos flip public —
# the unset path remains a plain anonymous HTTPS clone. That path will
# FAIL with "could not read Username" on private repos today; CI MUST
# set MOLECULE_GITEA_TOKEN.
set -euo pipefail

View File

@ -24,7 +24,7 @@ echo "=== NUKE ==="
docker compose -f "$ROOT/docker-compose.yml" down -v 2>/dev/null || true
docker ps -a --format "{{.Names}}" | grep "^ws-" | xargs -r docker rm -f 2>/dev/null || true
docker volume ls --format "{{.Name}}" | grep "^ws-" | xargs -r docker volume rm 2>/dev/null || true
docker network rm molecule-monorepo-net 2>/dev/null || true
docker network rm molecule-core-net 2>/dev/null || true
echo " cleaned"
echo "=== POPULATE MANIFEST DIRS ==="

View File

@ -1,2 +1,5 @@
# The compiled binary, not the cmd/server package.
/server
# air live-reload build cache (Dockerfile.dev + docker-compose.dev.yml).
/tmp/

View File

@ -15,8 +15,14 @@
FROM golang:1.25-alpine
# air + git (for go mod) + ca-certs (for TLS) + tzdata (for time-zone DB).
RUN apk add --no-cache git ca-certificates tzdata wget \
# air + git (for go mod) + ca-certs (for TLS) + tzdata (for time-zone DB)
# + docker-cli + docker-cli-buildx so the platform binary can shell out to
# /var/run/docker.sock (bind-mounted from host) for local-build provisioning.
# docker-cli alone is insufficient: alpine's docker-cli enables BuildKit by
# default but ships without buildx, producing
# `ERROR: BuildKit is enabled but the buildx component is missing or broken`
# on every `docker build`. docker-cli-buildx provides the buildx subcommand.
RUN apk add --no-cache git ca-certificates tzdata wget docker-cli docker-cli-buildx \
&& go install github.com/air-verse/air@latest
WORKDIR /app/workspace-server
@ -31,7 +37,7 @@ RUN go mod download
# block) so the Dockerfile doesn't need to COPY it. air watches the
# bind-mounted dir for changes.
ENV CGO_ENABLED=1
ENV CGO_ENABLED=0
ENV GOFLAGS="-buildvcs=false"
# Run air with the .air.toml in the bind-mounted source dir.

View File

@ -26,6 +26,14 @@ func TestExtended_WorkspaceDelete(t *testing.T) {
WithArgs(wsDelID).
WillReturnRows(sqlmock.NewRows([]string{"id", "name"}))
// CascadeDelete walks descendants unconditionally (the 0-children
// optimization in the old inline path was dropped during the
// CascadeDelete extraction — descendant CTE returns 0 rows here,
// same end state, one extra cheap query).
mock.ExpectQuery("WITH RECURSIVE descendants").
WithArgs(wsDelID).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
// #73: batch UPDATE happens BEFORE any container teardown.
// Uses ANY($1::uuid[]) even with a single ID for consistency.
mock.ExpectExec("UPDATE workspaces SET status =").

View File

@ -25,6 +25,35 @@ import (
"github.com/Molecule-AI/molecule-monorepo/platform/internal/registry"
"github.com/google/uuid"
)
// insertMCPDelegationRow writes a delegation activity row so the canvas
// Agent Comms tab can show the task text for MCP-initiated delegations.
// Mirrors insertDelegationRow (delegation.go) for the MCP tool path.
func insertMCPDelegationRow(ctx context.Context, db *sql.DB, workspaceID, targetID, delegationID, task string) error {
taskJSON, _ := json.Marshal(map[string]interface{}{
"task": task,
"delegation_id": delegationID,
})
_, err := db.ExecContext(ctx, `
INSERT INTO activity_logs (workspace_id, activity_type, method, source_id, target_id, summary, request_body, status)
VALUES ($1, 'delegation', 'delegate', $2, $3, $4, $5::jsonb, 'pending')
`, workspaceID, workspaceID, targetID, "Delegating to "+targetID, string(taskJSON))
return err
}
// updateMCPDelegationStatus updates a delegation activity row's status.
// Mirrors updateDelegationStatus (delegation.go) for the MCP tool path.
func updateMCPDelegationStatus(ctx context.Context, db *sql.DB, workspaceID, delegationID, status, errorDetail string) {
if _, err := db.ExecContext(ctx, `
UPDATE activity_logs
SET status = $1, error_detail = CASE WHEN $2 = '' THEN error_detail ELSE $2 END
WHERE workspace_id = $3
AND method = 'delegate'
AND request_body->>'delegation_id' = $4
`, status, errorDetail, workspaceID, delegationID); err != nil {
log.Printf("MCP Delegation %s: status update failed: %v", delegationID, err)
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Tool implementations
// ─────────────────────────────────────────────────────────────────────────────
@ -154,6 +183,13 @@ func (h *MCPHandler) toolDelegateTask(ctx context.Context, callerID string, args
return "", fmt.Errorf("workspace %s is not authorised to communicate with %s", callerID, targetID)
}
// Issue #158: write delegation row so canvas Agent Comms tab shows the task text.
delegationID := uuid.New().String()
if err := insertMCPDelegationRow(ctx, h.database, callerID, targetID, delegationID, task); err != nil {
log.Printf("MCP delegate_task: failed to record delegation row: %v", err)
// Non-fatal: still make the A2A call even if activity log write fails.
}
agentURL, err := mcpResolveURL(ctx, h.database, targetID)
if err != nil {
return "", err
@ -197,10 +233,16 @@ func (h *MCPHandler) toolDelegateTask(ctx context.Context, callerID string, args
resp, err := http.DefaultClient.Do(httpReq)
if err != nil {
updateMCPDelegationStatus(ctx, h.database, callerID, delegationID, "failed", err.Error())
return "", fmt.Errorf("A2A call failed: %w", err)
}
defer func() { _ = resp.Body.Close() }()
// A 200/500 from the peer still means the call was dispatched — only
// network errors are truly "failed". Status 'dispatched' is correct for
// any HTTP response (peer's A2A layer handles the actual processing).
updateMCPDelegationStatus(ctx, h.database, callerID, delegationID, "dispatched", "")
body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
if err != nil {
return "", fmt.Errorf("failed to read response: %w", err)
@ -223,7 +265,16 @@ func (h *MCPHandler) toolDelegateTaskAsync(ctx context.Context, callerID string,
return "", fmt.Errorf("workspace %s is not authorised to communicate with %s", callerID, targetID)
}
taskID := uuid.New().String()
delegationID := uuid.New().String()
// Issue #158: write delegation row so canvas Agent Comms tab shows the task text.
// Insert with 'dispatched' status since the goroutine won't update it.
if err := insertMCPDelegationRow(ctx, h.database, callerID, targetID, delegationID, task); err != nil {
log.Printf("MCP delegate_task_async: failed to record delegation row: %v", err)
// Non-fatal: still fire the A2A call.
} else {
updateMCPDelegationStatus(ctx, h.database, callerID, delegationID, "dispatched", "")
}
// Fire and forget in a detached goroutine. Use a background context so
// the call is not cancelled when the HTTP request completes.
@ -244,7 +295,7 @@ func (h *MCPHandler) toolDelegateTaskAsync(ctx context.Context, callerID string,
a2aBody, _ := json.Marshal(map[string]interface{}{
"jsonrpc": "2.0",
"id": taskID,
"id": delegationID,
"method": "message/send",
"params": map[string]interface{}{
"message": map[string]interface{}{
@ -273,7 +324,7 @@ func (h *MCPHandler) toolDelegateTaskAsync(ctx context.Context, callerID string,
_, _ = io.Copy(io.Discard, resp.Body)
}()
return fmt.Sprintf(`{"task_id":%q,"status":"dispatched","target_id":%q}`, taskID, targetID), nil
return fmt.Sprintf(`{"task_id":%q,"status":"dispatched","target_id":%q}`, delegationID, targetID), nil
}
func (h *MCPHandler) toolCheckTaskStatus(ctx context.Context, callerID string, args map[string]interface{}) (string, error) {

View File

@ -13,12 +13,15 @@ import (
"path/filepath"
"strconv"
"strings"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/channels"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/events"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/models"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
"github.com/gin-gonic/gin"
"github.com/lib/pq"
"gopkg.in/yaml.v3"
)
@ -422,6 +425,16 @@ type OrgWorkspace struct {
Tier int `yaml:"tier" json:"tier"`
Template string `yaml:"template" json:"template"`
FilesDir string `yaml:"files_dir" json:"files_dir"`
// Spawning gates whether this workspace (AND its descendants) gets
// provisioned during /org/import. Pointer so we can distinguish
// "explicitly set to false" from "unset" (default = spawn). Use case:
// the dev-tree org template declares the full team structure but a
// developer's local machine only has RAM for a subset; setting
// spawning: false on a leaf or a sub-tree root skips that branch
// entirely without editing the canonical template structure.
// Counted in countWorkspaces same as actual; subtree-skip happens
// at provision time in createWorkspaceTree.
Spawning *bool `yaml:"spawning,omitempty" json:"spawning,omitempty"`
// SystemPrompt is an inline override. Normally each role's system-prompt.md
// lives at `<files_dir>/system-prompt.md` and is copied via the files_dir
// template-copy step; inline overrides that path for ad-hoc workspaces.
@ -558,6 +571,19 @@ func (h *OrgHandler) Import(c *gin.Context) {
var body struct {
Dir string `json:"dir"` // org template directory name
Template OrgTemplate `json:"template"` // or inline template
// Mode controls cleanup behavior of pre-existing workspaces:
// "" / "merge" — additive (default; current behavior).
// Existing workspaces matched by
// (parent_id, name) are skipped; nothing
// outside the new tree is touched.
// "reconcile" — additive + cleanup. After import, any
// online workspace whose name matches an
// imported workspace's name but whose id
// isn't in the import result set is
// cascade-deleted. Catches "previous
// import survived a re-import" zombies
// (the 20:13→21:17 dev-tree case).
Mode string `json:"mode"`
}
if err := c.ShouldBindJSON(&body); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request body"})
@ -581,7 +607,16 @@ func (h *OrgHandler) Import(c *gin.Context) {
orgFile := filepath.Join(orgBaseDir, "org.yaml")
data, err := os.ReadFile(orgFile)
if err != nil {
c.JSON(http.StatusNotFound, gin.H{"error": fmt.Sprintf("org template not found: %s", body.Dir)})
// Audit 2026-05-09 (Core-Security): the prior message echoed
// the user-supplied `body.Dir` verbatim. Path traversal is
// already blocked by resolveInsideRoot above, but echoing
// the raw input back lets a client probe for the existence
// of relative paths inside h.orgDir (a 404 with the input
// vs. a 400 from resolveInsideRoot is itself a signal).
// Drop the input from the message; log full context server-
// side via the resolved path for operator triage.
log.Printf("OrgImport: failed to read %s (requested dir=%q): %v", orgFile, body.Dir, err)
c.JSON(http.StatusNotFound, gin.H{"error": "org template not found"})
return
}
// Expand !include directives before unmarshal. Splits org.yaml
@ -603,6 +638,19 @@ func (h *OrgHandler) Import(c *gin.Context) {
return
}
// Emit started AFTER the YAML is loaded so payload.name carries the
// resolved template name (was: empty when caller passed `dir` instead
// of inline `template`). Pre-parse error paths above return without
// emitting — semantically "we couldn't even start an import" — so
// every started event is guaranteed a paired completed/failed below
// (no orphan started rows in structure_events).
importStart := time.Now()
emitOrgEvent(c.Request.Context(), "org.import.started", map[string]any{
"name": tmpl.Name,
"dir": body.Dir,
"mode": body.Mode,
})
// Required-env preflight — refuses import when any required_env is
// missing from global_secrets. No bypass: the prior `force: true`
// escape hatch was removed (issue #2290) because it was the silent
@ -708,18 +756,171 @@ func (h *OrgHandler) Import(c *gin.Context) {
}
}
// Reconcile mode: prune workspaces present from a previous import that
// share a name with the new tree but are NOT in the new result set.
// Catches the additive-import bug where re-running /org/import with a
// changed tree shape (different parent_id for the same role name) leaves
// the prior workspace online — visible to the canvas, consuming
// containers, and looking like a duplicate. Default mode "" / "merge"
// preserves the old additive behavior.
reconcileRemovedCount := 0
reconcileSkipped := 0
reconcileErrs := []string{}
if body.Mode == "reconcile" && createErr == nil {
ctx := c.Request.Context()
importedNames := []string{}
walkOrgWorkspaceNames(tmpl.Workspaces, &importedNames)
importedIDs := make([]string, 0, len(results))
for _, r := range results {
if id, ok := r["id"].(string); ok && id != "" {
importedIDs = append(importedIDs, id)
}
}
// Empty-set guards: if the import didn't produce any names or any
// IDs, skip — querying with empty arrays would either match
// nothing (harmless) or, worse, match every workspace if a future
// query rewrite drops the IN clause. Belt-and-suspenders.
if len(importedNames) > 0 && len(importedIDs) > 0 {
rows, err := db.DB.QueryContext(ctx, `
SELECT id FROM workspaces
WHERE name = ANY($1::text[])
AND id != ALL($2::uuid[])
AND status != 'removed'
`, pq.Array(importedNames), pq.Array(importedIDs))
if err != nil {
log.Printf("Org import reconcile: orphan query failed: %v", err)
reconcileErrs = append(reconcileErrs, fmt.Sprintf("orphan query: %v", err))
} else {
orphanIDs := []string{}
for rows.Next() {
var orphanID string
if rows.Scan(&orphanID) == nil {
orphanIDs = append(orphanIDs, orphanID)
}
}
rows.Close()
for _, oid := range orphanIDs {
descendantIDs, stopErrs, err := h.workspace.CascadeDelete(ctx, oid)
if err != nil {
log.Printf("Org import reconcile: CascadeDelete(%s) failed: %v", oid, err)
reconcileErrs = append(reconcileErrs, fmt.Sprintf("delete %s: %v", oid, err))
reconcileSkipped++
continue
}
reconcileRemovedCount += 1 + len(descendantIDs)
if len(stopErrs) > 0 {
log.Printf("Org import reconcile: %s had %d stop errors (orphan sweeper will retry)", oid, len(stopErrs))
}
}
log.Printf("Org import reconcile: %d orphans removed (%d cascade descendants), %d skipped", len(orphanIDs), reconcileRemovedCount-len(orphanIDs), reconcileSkipped)
}
}
}
status := http.StatusCreated
resp := gin.H{
"org": tmpl.Name,
"workspaces": results,
"count": len(results),
}
if body.Mode == "reconcile" {
resp["mode"] = "reconcile"
resp["reconcile_removed_count"] = reconcileRemovedCount
if len(reconcileErrs) > 0 {
resp["reconcile_errors"] = reconcileErrs
}
}
if createErr != nil {
status = http.StatusMultiStatus
resp["error"] = createErr.Error()
}
log.Printf("Org import: %s — %d workspaces created", tmpl.Name, len(results))
// results contains both freshly-created AND lookupExistingChild skips
// (entries with "skipped":true). Splitting the count here so the audit
// row reflects "what changed" vs "what was already there" — telemetry
// readers shouldn't need to grep stdout to tell an idempotent re-run
// apart from a fresh-create.
createdCount, skippedCount := 0, 0
for _, r := range results {
if skipped, _ := r["skipped"].(bool); skipped {
skippedCount++
} else {
createdCount++
}
}
log.Printf("Org import: %s — %d created, %d skipped, %d reconciled",
tmpl.Name, createdCount, skippedCount, reconcileRemovedCount)
emitOrgEvent(c.Request.Context(), "org.import.completed", map[string]any{
"name": tmpl.Name,
"dir": body.Dir,
"mode": body.Mode,
"created_count": createdCount,
"skipped_count": skippedCount,
"reconcile_removed_count": reconcileRemovedCount,
"reconcile_errors": len(reconcileErrs),
"duration_ms": time.Since(importStart).Milliseconds(),
"create_error": errString(createErr),
})
c.JSON(status, resp)
}
// walkOrgWorkspaceNames collects every Name in the tree (in any order) into
// names. Used by reconcile to detect orphan workspaces — workspaces with the
// same role name as a freshly-imported one but a different id, surviving from
// a prior import.
func walkOrgWorkspaceNames(workspaces []OrgWorkspace, names *[]string) {
for _, w := range workspaces {
// spawning:false subtrees are still part of the imported tree
// from a logical-tree perspective — DON'T skip the recursion,
// or reconcile would orphan the rest of the subtree on every
// re-import where spawning is toggled. Names of skipped
// workspaces remain registered so reconcile won't double-create
// them when spawning flips back to true.
if w.Name != "" {
*names = append(*names, w.Name)
}
walkOrgWorkspaceNames(w.Children, names)
}
}
// emitOrgEvent records an org-lifecycle event in structure_events so the
// import history is queryable independent of stdout log retention. Errors
// are logged and swallowed — never block the request path on telemetry.
//
// Event-type taxonomy (extend by appending; never rename):
//
// org.import.started — handler entered, request body parsed
// org.import.completed — handler exiting (success or partial)
// org.import.failed — handler exiting with an unrecoverable error
//
// payload fields are documented at each call site.
func emitOrgEvent(ctx context.Context, eventType string, payload map[string]any) {
if payload == nil {
payload = map[string]any{}
}
payloadJSON, err := json.Marshal(payload)
if err != nil {
log.Printf("emitOrgEvent: marshal %s payload failed: %v", eventType, err)
return
}
if _, err := db.DB.ExecContext(ctx, `
INSERT INTO structure_events (event_type, payload, created_at)
VALUES ($1, $2, now())
`, eventType, payloadJSON); err != nil {
log.Printf("emitOrgEvent: insert %s failed: %v", eventType, err)
}
}
// errString returns "" for a nil error, err.Error() otherwise. Lets us put
// nullable error strings in event payloads without checking for nil at every
// call site.
func errString(err error) string {
if err == nil {
return ""
}
return err.Error()
}

View File

@ -42,6 +42,20 @@ import (
// straight into the parent's child-coordinate space without doing a
// canvas-wide absolute-position walk.
func (h *OrgHandler) createWorkspaceTree(ws OrgWorkspace, parentID *string, absX, absY, relX, relY float64, defaults OrgDefaults, orgBaseDir string, results *[]map[string]interface{}, provisionSem chan struct{}) error {
// spawning: false guard — skip this workspace AND all descendants.
// Pointer-typed so we distinguish "explicitly false" from "unset"
// (unset = default to spawn). The guard sits BEFORE any side effect
// (no DB row, no docker provision, no children recursion) so a
// false-spawning subtree is genuinely a no-op except for the log line.
// Use case: dev-tree org template ships the full role taxonomy but a
// developer's machine only has RAM for a subset; a per-workspace
// `spawning: false` lets them narrow without editing the parent
// template's structure.
if ws.Spawning != nil && !*ws.Spawning {
log.Printf("Org import: skipping workspace %q (spawning=false; %d descendant workspace(s) in subtree also skipped)", ws.Name, countWorkspaces(ws.Children))
return nil
}
// Apply defaults
runtime := ws.Runtime
if runtime == "" {
@ -453,8 +467,25 @@ func (h *OrgHandler) createWorkspaceTree(ws OrgWorkspace, parentID *string, absX
envVars := map[string]string{}
// 0. Persona env (lowest precedence; injects the role's Gitea identity:
// GITEA_USER, GITEA_TOKEN, GITEA_TOKEN_SCOPES, GITEA_USER_EMAIL,
// GITEA_SSH_KEY_PATH). Workspace and org .env can override.
loadPersonaEnvFile(ws.Role, envVars)
// GITEA_SSH_KEY_PATH, plus MODEL_PROVIDER/MODEL and the LLM auth
// token like CLAUDE_CODE_OAUTH_TOKEN or MINIMAX_API_KEY).
// Workspace and org .env can override.
//
// Use ws.FilesDir as the persona-dir lookup key, NOT ws.Role. In the
// dev-tree org.yaml shape, `role:` carries the multi-line descriptive
// text the agent reads from its prompt ("Engineering planning and
// team coordination — leads Core Platform, Controlplane, ..."), while
// `files_dir:` holds the short slug (`core-lead`, `dev-lead`, etc.)
// matching `~/.molecule-ai/personas/<files_dir>/env`
// (bind-mounted to `/etc/molecule-bootstrap/personas/<files_dir>/env`).
//
// Pre-fix, this passed `ws.Role` whose multi-word content failed
// isSafeRoleName silently, so every imported workspace booted with
// zero persona-env rows in workspace_secrets — no ANTHROPIC /
// CLAUDE_CODE auth in the container env. The claude_agent_sdk
// then wedged on `query.initialize()` with a 60s control-request
// timeout (caught 2026-05-08 right after dev-only org/import).
loadPersonaEnvFile(ws.FilesDir, envVars)
if orgBaseDir != "" {
// 1. Org root .env (shared defaults)
parseEnvFile(filepath.Join(orgBaseDir, ".env"), envVars)

View File

@ -0,0 +1,158 @@
package handlers
import (
"context"
"sort"
"testing"
"github.com/DATA-DOG/go-sqlmock"
)
// Tests for the reconcile-mode + audit-event additions to OrgHandler.Import.
//
// Background: /org/import was purely additive — re-running with a tree that
// renamed/reparented a role left the prior workspace online (different
// parent_id from the new one, so lookupExistingChild's parent-scoped dedupe
// missed it). The 2026-05-08 dev-tree case left 8 orphans surviving a
// re-import. mode="reconcile" closes the gap; emitOrgEvent makes "what
// happened at 20:13?" queryable instead of stdout-grep archaeology.
func TestWalkOrgWorkspaceNames_FlatTree(t *testing.T) {
tree := []OrgWorkspace{
{Name: "Dev Lead"},
{Name: "Release Manager"},
}
var names []string
walkOrgWorkspaceNames(tree, &names)
sort.Strings(names)
want := []string{"Dev Lead", "Release Manager"}
if !equalStrings(names, want) {
t.Errorf("flat tree: got %v, want %v", names, want)
}
}
func TestWalkOrgWorkspaceNames_NestedTree(t *testing.T) {
tree := []OrgWorkspace{
{
Name: "Dev Lead",
Children: []OrgWorkspace{
{Name: "Core Platform Lead", Children: []OrgWorkspace{{Name: "Core-BE"}}},
{Name: "SDK Lead"},
},
},
}
var names []string
walkOrgWorkspaceNames(tree, &names)
sort.Strings(names)
want := []string{"Core Platform Lead", "Core-BE", "Dev Lead", "SDK Lead"}
if !equalStrings(names, want) {
t.Errorf("nested tree: got %v, want %v", names, want)
}
}
// Pins the contract that spawning:false subtrees still contribute their names
// to the reconcile working set. If the walker started skipping them, a
// re-import that toggled spawning would orphan whichever workspaces had been
// previously imported with spawning:true — the inverse of the bug being
// fixed. Spawning gates *provisioning*, not *reconcile membership*.
func TestWalkOrgWorkspaceNames_SpawningFalseStillCounted(t *testing.T) {
f := false
tree := []OrgWorkspace{
{Name: "Dev Lead", Children: []OrgWorkspace{
{Name: "Skipped Lead", Spawning: &f, Children: []OrgWorkspace{
{Name: "Skipped Child"},
}},
}},
}
var names []string
walkOrgWorkspaceNames(tree, &names)
sort.Strings(names)
want := []string{"Dev Lead", "Skipped Child", "Skipped Lead"}
if !equalStrings(names, want) {
t.Errorf("spawning:false subtree: got %v, want %v", names, want)
}
}
func TestWalkOrgWorkspaceNames_EmptyNamesSkipped(t *testing.T) {
tree := []OrgWorkspace{
{Name: "Dev Lead"},
{Name: ""}, // YAML default / placeholder
{Name: "Release Manager"},
}
var names []string
walkOrgWorkspaceNames(tree, &names)
sort.Strings(names)
want := []string{"Dev Lead", "Release Manager"}
if !equalStrings(names, want) {
t.Errorf("empty-name skip: got %v, want %v", names, want)
}
}
// emitOrgEvent must INSERT into structure_events with event_type + JSON
// payload. Verifies the SQL shape pinning so a future schema rename
// (e.g., switching to audit_events) breaks the test loudly instead of
// silently dropping telemetry.
func TestEmitOrgEvent_InsertsToStructureEvents(t *testing.T) {
mock := setupTestDB(t)
mock.ExpectExec(`INSERT INTO structure_events`).
WithArgs("org.import.started", sqlmock.AnyArg()).
WillReturnResult(sqlmock.NewResult(1, 1))
emitOrgEvent(context.Background(), "org.import.started", map[string]any{
"name": "test-org",
"mode": "reconcile",
})
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
}
// Insert failures are log-and-swallow — telemetry MUST NOT block the
// caller path. If this regresses (e.g., a future patch returns the err),
// org-import requests would fail with HTTP 500 every time a structure_events
// INSERT hiccups, which is strictly worse than losing the row.
func TestEmitOrgEvent_DBErrorIsSwallowed(t *testing.T) {
mock := setupTestDB(t)
mock.ExpectExec(`INSERT INTO structure_events`).
WithArgs("org.import.failed", sqlmock.AnyArg()).
WillReturnError(errSentinelTest)
// Must not panic; must not propagate. The function returns nothing,
// so the contract is "doesn't crash."
emitOrgEvent(context.Background(), "org.import.failed", map[string]any{
"err": "preflight failed",
})
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
}
func TestErrString(t *testing.T) {
if got := errString(nil); got != "" {
t.Errorf("nil error: got %q, want empty", got)
}
if got := errString(errSentinelTest); got != "sentinel" {
t.Errorf("sentinel error: got %q, want \"sentinel\"", got)
}
}
// errSentinelTest is a marker error used for swallow-error assertions.
var errSentinelTest = sentinelErrTest{}
type sentinelErrTest struct{}
func (sentinelErrTest) Error() string { return "sentinel" }
func equalStrings(a, b []string) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}

View File

@ -91,6 +91,14 @@ func (h *PluginsHandler) Install(c *gin.Context) {
return
}
// Record the install in workspace_plugins (core#113 — version-subscription
// foundation). Best-effort: DB write failure is logged but doesn't fail
// the install — the plugin IS in the container; surfacing a 500 here
// would mislead the caller about the install state.
if err := recordWorkspacePluginInstall(ctx, workspaceID, result.PluginName, result.Source.Raw(), req.Track); err != nil {
log.Printf("Plugin install: failed to record %s for %s in workspace_plugins: %v (install succeeded; tracking row missing)", result.PluginName, workspaceID, err)
}
log.Printf("Plugin install: %s via %s → workspace %s (restarting)", result.PluginName, result.Source.Scheme, workspaceID)
c.JSON(http.StatusOK, gin.H{
"status": "installed",

View File

@ -114,6 +114,15 @@ type installRequest struct {
// When present, resolveAndStage verifies the fetched content matches
// before allowing the install to proceed (SAFE-T1102 supply-chain hardening).
SHA256 string `json:"sha256,omitempty"`
// Track is the version-subscription mode for this install (core#113):
// "none" — no auto-update tracking (default)
// "tag:vX.Y.Z" — track a specific version tag
// "tag:latest" — track latest tag, drift on every new tag
// "sha:<full>" — pinned, no drift ever
// The drift detector (separate component, follow-up) reads
// workspace_plugins rows where tracked_ref != 'none' and queues
// updates when upstream resolves to a different SHA.
Track string `json:"track,omitempty"`
}
// stageResult bundles the outputs of resolveAndStage for the caller.

View File

@ -0,0 +1,78 @@
package handlers
// plugins_tracking.go — workspace_plugins DB tracking for the
// version-subscription model (core#113).
//
// Schema lives in migration 20260508160000_workspace_plugins_tracking.up.sql.
// This file is the Go-side write surface used at install time to record
// each plugin's install record. Drift detection / queue / apply are
// follow-up scope (filed as a separate issue once this lands).
import (
"context"
"errors"
"fmt"
"strings"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
)
// trackedRefValues is the closed set of bare-string values the
// workspace_plugins.tracked_ref column accepts. Prefixed values
// ("tag:..." / "sha:...") are validated structurally below.
var trackedRefValues = map[string]bool{
"none": true,
}
// validateTrackedRef returns the canonical form of a track string, or
// an error if the input is malformed. Empty input → "none" (default).
//
// Accepted shapes:
//
// "" — defaults to "none"
// "none" — no tracking
// "tag:vX.Y.Z" — track a specific tag
// "tag:latest" — track latest tag, drift on every new tag
// "sha:<full-sha>" — pinned to commit SHA
func validateTrackedRef(s string) (string, error) {
s = strings.TrimSpace(s)
if s == "" {
return "none", nil
}
if trackedRefValues[s] {
return s, nil
}
if strings.HasPrefix(s, "tag:") && len(s) > 4 {
return s, nil
}
if strings.HasPrefix(s, "sha:") && len(s) > 4 {
return s, nil
}
return "", fmt.Errorf("invalid track value %q: expected 'none' | 'tag:vX.Y.Z' | 'tag:latest' | 'sha:<full>'", s)
}
// recordWorkspacePluginInstall upserts the workspace_plugins row for a
// plugin install. ON CONFLICT (workspace_id, plugin_name) DO UPDATE so
// reinstalling the same plugin name (with a possibly-different source or
// track value) updates the existing row rather than failing.
func recordWorkspacePluginInstall(
ctx context.Context, workspaceID, pluginName, sourceRaw, track string,
) error {
if workspaceID == "" || pluginName == "" || sourceRaw == "" {
return errors.New("recordWorkspacePluginInstall: missing required field")
}
canonicalTrack, err := validateTrackedRef(track)
if err != nil {
return err
}
_, err = db.DB.ExecContext(ctx, `
INSERT INTO workspace_plugins (workspace_id, plugin_name, source_raw, tracked_ref)
VALUES ($1, $2, $3, $4)
ON CONFLICT (workspace_id, plugin_name)
DO UPDATE SET
source_raw = EXCLUDED.source_raw,
tracked_ref = EXCLUDED.tracked_ref,
updated_at = NOW()
`, workspaceID, pluginName, sourceRaw, canonicalTrack)
return err
}

View File

@ -0,0 +1,54 @@
package handlers
import "testing"
// TestValidateTrackedRef: pin the exact set of accepted track values
// the install endpoint stores. Drift detector reads this column; any
// value that slips through here without structural validation would
// silently fail at drift-check time.
func TestValidateTrackedRef(t *testing.T) {
cases := []struct {
in string
want string
err bool
}{
// Defaults
{"", "none", false},
{" ", "none", false},
{"none", "none", false},
// Tag shape
{"tag:v1.0.0", "tag:v1.0.0", false},
{"tag:v0.4.0-gitea.1", "tag:v0.4.0-gitea.1", false},
{"tag:latest", "tag:latest", false},
// SHA shape
{"sha:abc123", "sha:abc123", false},
{"sha:0123456789abcdef0123456789abcdef01234567", "sha:0123456789abcdef0123456789abcdef01234567", false},
// Reject malformed
{"tag:", "", true}, // empty after prefix
{"sha:", "", true}, // empty after prefix
{"latest", "", true}, // bare 'latest' is ambiguous (tag? branch?)
{"main", "", true}, // bare branch name not allowed
{"v1.0.0", "", true}, // missing tag: prefix
{"random", "", true}, // not in allowlist
{"tag", "", true}, // prefix without separator
}
for _, tc := range cases {
got, err := validateTrackedRef(tc.in)
if tc.err {
if err == nil {
t.Errorf("validateTrackedRef(%q) = (%q, nil); want error", tc.in, got)
}
continue
}
if err != nil {
t.Errorf("validateTrackedRef(%q) error: %v", tc.in, err)
continue
}
if got != tc.want {
t.Errorf("validateTrackedRef(%q) = %q; want %q", tc.in, got, tc.want)
}
}
}

View File

@ -134,7 +134,7 @@ func (h *TranscriptHandler) Get(c *gin.Context) {
// - block cloud metadata endpoints (IMDS, GCP, Azure)
// - block link-local IPs (169.254/16 IPv4, fe80::/10 IPv6)
// - loopback is allowed — local dev runs workspaces on 127.0.0.1
// - Docker internal hostnames (host.docker.internal, *.molecule-monorepo-net)
// - Docker internal hostnames (host.docker.internal, *.molecule-core-net)
// are allowed; the whole threat model assumes the platform already
// trusts peers on that network
func validateWorkspaceURL(u *url.URL) error {

View File

@ -323,161 +323,25 @@ func (h *WorkspaceHandler) Delete(c *gin.Context) {
return
}
// Cascade delete: collect ALL descendants (not just direct children) via
// recursive CTE, then stop each container and remove each volume.
// Previous bug: only direct children's containers were stopped, leaving
// grandchildren as orphan running containers after a cascade delete.
descendantIDs := []string{}
if len(children) > 0 {
descRows, err := db.DB.QueryContext(ctx, `
WITH RECURSIVE descendants AS (
SELECT id FROM workspaces WHERE parent_id = $1 AND status != 'removed'
UNION ALL
SELECT w.id FROM workspaces w JOIN descendants d ON w.parent_id = d.id WHERE w.status != 'removed'
)
SELECT id FROM descendants
`, id)
if err != nil {
log.Printf("Delete: descendant query error for %s: %v", id, err)
} else {
for descRows.Next() {
var descID string
if descRows.Scan(&descID) == nil {
descendantIDs = append(descendantIDs, descID)
}
}
descRows.Close()
}
// Delegate the cascade to CascadeDelete so the HTTP path and the
// OrgImport reconcile path share one teardown sequence (#73 race
// guard, container stop, volume removal, token revocation, schedule
// disable, broadcast). The HTTP-specific bits — direct-children 409
// gate above, ?purge=true hard-delete below, response shaping —
// stay in this handler.
descendantIDs, stopErrs, err := h.CascadeDelete(ctx, id)
if err != nil {
// Audit 2026-05-09 (Core-Security): raw `err.Error()` here was
// exposed to HTTP clients verbatim, including wrapped lib/pq
// driver strings that disclose schema column names + index
// hints. Log full error server-side; return a sanitized message
// to the client. Operators trace via the log line below using
// the workspace id.
log.Printf("Delete: CascadeDelete(%s) failed: %v", id, err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "internal error processing delete request"})
return
}
// #73 fix: mark rows 'removed' in the DB FIRST, BEFORE stopping containers
// or removing volumes. Previously the sequence was stop → update-status,
// which left a gap where:
// - the container's last pre-teardown heartbeat could resurrect the row
// via the register-handler UPSERT (now also guarded in #73)
// - the liveness monitor could observe 'online' status + expired Redis
// TTL and trigger RestartByID, recreating a container we're trying
// to destroy
// Marking 'removed' first makes both of those paths no-op via their
// existing `status NOT IN ('removed', ...)` guards.
allIDs := append([]string{id}, descendantIDs...)
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspaces SET status = $1, updated_at = now() WHERE id = ANY($2::uuid[])`,
models.StatusRemoved, pq.Array(allIDs)); err != nil {
log.Printf("Delete status update error for %s: %v", id, err)
}
if _, err := db.DB.ExecContext(ctx,
`DELETE FROM canvas_layouts WHERE workspace_id = ANY($1::uuid[])`,
pq.Array(allIDs)); err != nil {
log.Printf("Delete canvas_layouts error for %s: %v", id, err)
}
// Revoke all auth tokens for the deleted workspaces. Once the workspace is
// gone its tokens are meaningless; leaving them alive would keep
// HasAnyLiveTokenGlobal = true even after the platform is otherwise empty,
// which prevents AdminAuth from returning to fail-open and breaks the E2E
// test's count-zero assertion (and local re-run cleanup).
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspace_auth_tokens SET revoked_at = now()
WHERE workspace_id = ANY($1::uuid[]) AND revoked_at IS NULL`,
pq.Array(allIDs)); err != nil {
log.Printf("Delete token revocation error for %s: %v", id, err)
}
// #1027: cascade-disable all schedules for the deleted workspaces so
// the scheduler never fires a cron into a removed container.
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspace_schedules SET enabled = false, updated_at = now()
WHERE workspace_id = ANY($1::uuid[]) AND enabled = true`,
pq.Array(allIDs)); err != nil {
log.Printf("Delete schedule disable error for %s: %v", id, err)
}
// Now stop containers + remove volumes for all descendants (any depth).
// Any concurrent heartbeat / registration / liveness-triggered restart
// will see status='removed' and bail out early.
//
// Combines two concerns:
//
// 1. Detach cleanup from the request ctx via WithoutCancel + a 30s
// timeout, so when the canvas's `api.del` resolves on our 200
// (and gin cancels c.Request.Context()), in-flight Docker
// stop/remove calls don't get cancelled mid-operation. The
// previous shape leaked containers every time the canvas hung
// up promptly: Stop returned "context canceled", the container
// stayed up, and the next RemoveVolume failed with
// "volume in use". 30s is generous for Docker daemon round-
// trips (typical: <2s) and bounds a stuck daemon.
//
// 2. #1843: aggregate Stop() failures into stopErrs so the
// post-deletion block surfaces them as 500. On the CP/EC2
// backend, Stop() calls control plane's DELETE endpoint to
// terminate the EC2; if that errors (transient 5xx, network),
// the EC2 stays running with no DB row to track it (the
// "orphan EC2 on a 0-customer account" scenario). Loud-fail
// instead of silent-leak — clients retry, Stop's instance_id
// lookup is idempotent against status='removed'. RemoveVolume
// errors stay log-and-continue (local cleanup, not infra-leak).
cleanupCtx, cleanupCancel := context.WithTimeout(
context.WithoutCancel(ctx), 30*time.Second)
defer cleanupCancel()
var stopErrs []error
stopAndRemove := func(wsID string) {
// Stop the workload first via the backend dispatcher (CP for
// SaaS, Docker for self-hosted). Pre-2026-05-05 this gate was
// `if h.provisioner == nil { return }` — early-returning on
// every SaaS tenant left the EC2 running with no DB row to
// track it (issue #2814; the comment below claimed "loud-fail
// instead of silent-leak" but the early-return made it the
// silent path on SaaS).
//
// Check Stop's error before any volume cleanup — the previous
// code discarded it and immediately tried RemoveVolume, which
// always fails with "volume in use" when Stop didn't actually
// kill the container. The orphan sweeper
// (registry/orphan_sweeper.go) catches what we skip here on
// the next reconcile pass.
if err := h.StopWorkspaceAuto(cleanupCtx, wsID); err != nil {
log.Printf("Delete %s stop failed: %v — leaving cleanup for orphan sweeper", wsID, err)
stopErrs = append(stopErrs, fmt.Errorf("stop %s: %w", wsID, err))
return
}
// Volume cleanup is Docker-only — CP-managed workspaces have
// no host-bind volumes to remove. Skip silently when no Docker
// provisioner is wired (the SaaS path already terminated the
// EC2 above; nothing left to do).
if h.provisioner != nil {
if err := h.provisioner.RemoveVolume(cleanupCtx, wsID); err != nil {
log.Printf("Delete %s volume removal warning: %v", wsID, err)
}
}
}
for _, descID := range descendantIDs {
stopAndRemove(descID)
db.ClearWorkspaceKeys(cleanupCtx, descID)
// #2269: drop the per-workspace restartState entry so it
// doesn't accumulate across the platform's lifetime. The
// LoadOrStore that creates the entry (workspace_restart.go)
// has no companion remove path; without this Delete, every
// short-lived workspace leaks ~16 bytes forever.
restartStates.Delete(descID)
// Detach broadcaster ctx for the same reason as the cleanup
// above — RecordAndBroadcast does an INSERT INTO
// structure_events + Redis Publish. If the canvas hangs up,
// a request-ctx-bound INSERT can be cancelled mid-write,
// leaving other WS clients ignorant of the cascade. The DB
// row is already 'removed' so it's recoverable, but the
// inconsistency is avoidable.
h.broadcaster.RecordAndBroadcast(cleanupCtx, string(events.EventWorkspaceRemoved), descID, map[string]interface{}{})
}
stopAndRemove(id)
db.ClearWorkspaceKeys(cleanupCtx, id)
restartStates.Delete(id) // #2269: same as descendants above
h.broadcaster.RecordAndBroadcast(cleanupCtx, string(events.EventWorkspaceRemoved), id, map[string]interface{}{
"cascade_deleted": len(descendantIDs),
})
// If any Stop call failed, surface 500 so the client retries. The DB
// row is already 'removed' (idempotent), and Stop's instance_id
@ -543,6 +407,104 @@ func (h *WorkspaceHandler) Delete(c *gin.Context) {
c.JSON(http.StatusOK, gin.H{"status": "removed", "cascade_deleted": len(descendantIDs)})
}
// CascadeDelete performs the cascade-removal sequence used by the HTTP
// DELETE handler and by OrgImport's reconcile mode: walk descendants, mark
// self+descendants 'removed' first (#73 race guard), stop containers / EC2s,
// remove volumes, revoke tokens, disable schedules, broadcast events.
//
// Idempotent against already-removed rows (the descendant CTE and all UPDATE
// guards skip status='removed'). Returns the descendant id list so the HTTP
// caller can drive the optional `?purge=true` hard-delete path against the
// same set the cascade just touched, plus any per-workspace stop errors so
// callers can surface a retryable failure instead of a silent-leak.
//
// Caller is responsible for the children-confirmation gate (the HTTP handler
// returns 409 when children exist + ?confirm=true is missing); this helper
// always cascades.
func (h *WorkspaceHandler) CascadeDelete(ctx context.Context, id string) ([]string, []error, error) {
if err := validateWorkspaceID(id); err != nil {
return nil, nil, err
}
descendantIDs := []string{}
descRows, err := db.DB.QueryContext(ctx, `
WITH RECURSIVE descendants AS (
SELECT id FROM workspaces WHERE parent_id = $1 AND status != 'removed'
UNION ALL
SELECT w.id FROM workspaces w JOIN descendants d ON w.parent_id = d.id WHERE w.status != 'removed'
)
SELECT id FROM descendants
`, id)
if err != nil {
return nil, nil, fmt.Errorf("descendant query: %w", err)
}
for descRows.Next() {
var descID string
if descRows.Scan(&descID) == nil {
descendantIDs = append(descendantIDs, descID)
}
}
descRows.Close()
allIDs := append([]string{id}, descendantIDs...)
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspaces SET status = $1, updated_at = now() WHERE id = ANY($2::uuid[])`,
models.StatusRemoved, pq.Array(allIDs)); err != nil {
log.Printf("CascadeDelete status update for %s: %v", id, err)
}
if _, err := db.DB.ExecContext(ctx,
`DELETE FROM canvas_layouts WHERE workspace_id = ANY($1::uuid[])`,
pq.Array(allIDs)); err != nil {
log.Printf("CascadeDelete canvas_layouts for %s: %v", id, err)
}
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspace_auth_tokens SET revoked_at = now()
WHERE workspace_id = ANY($1::uuid[]) AND revoked_at IS NULL`,
pq.Array(allIDs)); err != nil {
log.Printf("CascadeDelete token revocation for %s: %v", id, err)
}
if _, err := db.DB.ExecContext(ctx,
`UPDATE workspace_schedules SET enabled = false, updated_at = now()
WHERE workspace_id = ANY($1::uuid[]) AND enabled = true`,
pq.Array(allIDs)); err != nil {
log.Printf("CascadeDelete schedule disable for %s: %v", id, err)
}
cleanupCtx, cleanupCancel := context.WithTimeout(
context.WithoutCancel(ctx), 30*time.Second)
defer cleanupCancel()
var stopErrs []error
stopAndRemove := func(wsID string) {
if err := h.StopWorkspaceAuto(cleanupCtx, wsID); err != nil {
log.Printf("CascadeDelete %s stop failed: %v — leaving cleanup for orphan sweeper", wsID, err)
stopErrs = append(stopErrs, fmt.Errorf("stop %s: %w", wsID, err))
return
}
if h.provisioner != nil {
if err := h.provisioner.RemoveVolume(cleanupCtx, wsID); err != nil {
log.Printf("CascadeDelete %s volume removal warning: %v", wsID, err)
}
}
}
for _, descID := range descendantIDs {
stopAndRemove(descID)
db.ClearWorkspaceKeys(cleanupCtx, descID)
restartStates.Delete(descID)
h.broadcaster.RecordAndBroadcast(cleanupCtx, string(events.EventWorkspaceRemoved), descID, map[string]interface{}{})
}
stopAndRemove(id)
db.ClearWorkspaceKeys(cleanupCtx, id)
restartStates.Delete(id)
h.broadcaster.RecordAndBroadcast(cleanupCtx, string(events.EventWorkspaceRemoved), id, map[string]interface{}{
"cascade_deleted": len(descendantIDs),
})
return descendantIDs, stopErrs, nil
}
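A hedged sketch of the OrgImport reconcile-side call this helper exists for. The wrapping function, staleWorkspaceIDs, and the log wording are assumptions; only CascadeDelete's signature comes from the code above:
// Hypothetical reconcile-side caller; staleWorkspaceIDs is assumed to be the
// set of workspace ids present in the DB but absent from the imported org tree.
func (h *WorkspaceHandler) reconcileRemoveStaleExample(ctx context.Context, staleWorkspaceIDs []string) {
	for _, wsID := range staleWorkspaceIDs {
		descendants, stopErrs, err := h.CascadeDelete(ctx, wsID)
		if err != nil {
			// The descendant walk itself failed; nothing was torn down, so the
			// next reconcile pass simply retries.
			log.Printf("OrgImport reconcile: CascadeDelete(%s) failed: %v", wsID, err)
			continue
		}
		for _, serr := range stopErrs {
			// Rows are already marked 'removed'; the orphan sweeper picks up
			// whatever Stop missed on its next pass.
			log.Printf("OrgImport reconcile: cascade stop error under %s: %v", wsID, serr)
		}
		log.Printf("OrgImport reconcile: removed %s (+%d descendants)", wsID, len(descendants))
	}
}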
// validateWorkspaceID returns an error when id is not a valid UUID.
// #687: prevents 500s from Postgres when a garbage string (e.g. ../../etc/passwd)
// is passed as the :id path parameter.

View File

@ -173,7 +173,7 @@ func (h *WorkspaceHandler) provisionWorkspaceOpts(workspaceID, templatePath stri
log.Printf("Provisioner: failed to cache URL for %s: %v", workspaceID, cacheErr)
}
// Also cache the Docker-internal URL for workspace-to-workspace discovery.
// Containers on molecule-monorepo-net can reach each other by container name.
// Containers on molecule-core-net can reach each other by container name.
internalURL := provisioner.InternalURL(workspaceID)
if cacheErr := db.CacheInternalURL(ctx, workspaceID, internalURL); cacheErr != nil {
log.Printf("Provisioner: failed to cache internal URL for %s: %v", workspaceID, cacheErr)
@ -715,14 +715,30 @@ func deriveProviderFromModelSlug(model string) string {
// payload.Model at boot), this is a no-op — no harm in the switch
// being empty for those cases.
func applyRuntimeModelEnv(envVars map[string]string, runtime, model string) {
// Fall back to the MODEL_PROVIDER workspace secret when the caller
// didn't pass one explicitly. This is the path that "Save+Restart"
// hits — Restart builds its payload from the workspaces row (no model
// column there) so payload.Model is always empty, but the user's
// canvas selection was stored as MODEL_PROVIDER via PUT /model and
// is already loaded into envVars here. Without this fallback hermes
// silently boots with the template default and errors "No LLM
// provider configured" even though the user picked a valid model.
// Resolution order (priority high → low):
// 1. payload.Model (caller passed the canvas-picked model id verbatim)
// 2. envVars["MODEL"] (workspace_secret persisted by /org/import via
// the persona env file — MODEL=MiniMax-M2.7-highspeed etc.)
// 3. envVars["MODEL_PROVIDER"] (legacy: this secret was historically a
// *model id* set by canvas Save+Restart's PUT /model; on the
// post-2026-05-08 persona-env convention it's a *provider slug*
// (e.g. "minimax") which is NOT a valid model id, so this fallback
// only fires when MODEL is absent.)
//
// Pre-fix bug: this function unconditionally OVERWROTE envVars["MODEL"]
// with the MODEL_PROVIDER slug (when payload.Model was empty), wiping
// the operator's explicit per-persona MODEL secret on every restart.
// Symptom: a workspace whose persona env said
// MODEL=MiniMax-M2.7-highspeed booted fine on first /org/import (the
// envVars map was populated direct from the env file), then on the
// next Restart the workspace_secrets-derived MODEL got clobbered by
// MODEL_PROVIDER="minimax" — the literal slug, not a valid model id —
// and the workspace template's adapter routed to providers[0]
// (anthropic-oauth) and wedged at SDK initialize. Caught 2026-05-08
// during Phase 4 verification of template-claude-code PR #9.
if model == "" {
model = envVars["MODEL"]
}
if model == "" {
model = envVars["MODEL_PROVIDER"]
}

View File

@ -724,3 +724,68 @@ func TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes(t *testing.T) {
})
}
}
// TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved locks in the
// 2026-05-08 fix that prevents the MODEL_PROVIDER-as-slug fallback from
// silently overwriting a per-persona MODEL workspace_secret on restart.
//
// Pre-fix bug recurrence guard: when the persona env file (loaded into
// workspace_secrets at /org/import time) declares both MODEL=<id> and
// MODEL_PROVIDER=<slug>, the restart path used to overwrite envVars["MODEL"]
// with the MODEL_PROVIDER slug because applyRuntimeModelEnv's
// payload.Model fallback consulted MODEL_PROVIDER first. Symptom: dev-tree
// workspaces booted fine on first /org/import, then on next restart the
// model id became literal "minimax" and the workspace template's adapter
// failed to match any registry prefix, fell through to anthropic-oauth,
// and wedged at SDK initialize. Caught during Phase 4 verification of
// template-claude-code PR #9.
func TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved(t *testing.T) {
cases := []struct {
name string
envMODEL string
envMP string
wantMODEL string
}{
{
name: "MODEL secret wins over MODEL_PROVIDER slug (persona-env shape on restart)",
envMODEL: "MiniMax-M2.7-highspeed",
envMP: "minimax",
wantMODEL: "MiniMax-M2.7-highspeed",
},
{
name: "MODEL secret wins even when same as MODEL_PROVIDER",
envMODEL: "opus",
envMP: "claude-code",
wantMODEL: "opus",
},
{
name: "MODEL absent → fall back to MODEL_PROVIDER (legacy canvas Save+Restart shape)",
envMODEL: "",
envMP: "MiniMax-M2.7",
wantMODEL: "MiniMax-M2.7",
},
{
name: "Both absent → no MODEL set",
envMODEL: "",
envMP: "",
wantMODEL: "",
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
envVars := map[string]string{}
if tc.envMODEL != "" {
envVars["MODEL"] = tc.envMODEL
}
if tc.envMP != "" {
envVars["MODEL_PROVIDER"] = tc.envMP
}
// payload.Model is empty (the restart case)
applyRuntimeModelEnv(envVars, "claude-code", "")
if got := envVars["MODEL"]; got != tc.wantMODEL {
t.Errorf("MODEL = %q, want %q (envMODEL=%q envMP=%q)",
got, tc.wantMODEL, tc.envMODEL, tc.envMP)
}
})
}
}

View File

@ -813,6 +813,12 @@ func TestWorkspaceDelete_DisablesSchedules(t *testing.T) {
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"id", "name"}))
// CascadeDelete walks descendants unconditionally — 0-children case
// returns 0 rows here.
mock.ExpectQuery("WITH RECURSIVE descendants").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
// Mark workspace as removed
mock.ExpectExec("UPDATE workspaces SET status =").
WillReturnResult(sqlmock.NewResult(0, 1))
@ -935,6 +941,12 @@ func TestWorkspaceDelete_ScheduleDisableOnlyTargetsDeletedWorkspace(t *testing.T
WithArgs(wsA).
WillReturnRows(sqlmock.NewRows([]string{"id", "name"}))
// CascadeDelete walks descendants unconditionally — 0-children case
// returns 0 rows here.
mock.ExpectQuery("WITH RECURSIVE descendants").
WithArgs(wsA).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
// Mark only workspace A as removed
mock.ExpectExec("UPDATE workspaces SET status =").
WillReturnResult(sqlmock.NewResult(0, 1))

View File

@ -67,7 +67,7 @@ var DefaultImage = RuntimeImage(defaultRuntime)
const (
// DefaultNetwork is the Docker network workspaces join.
DefaultNetwork = "molecule-monorepo-net"
DefaultNetwork = "molecule-core-net"
// DefaultPort is the port the A2A server listens on inside the container.
DefaultPort = "8000"
@ -405,7 +405,7 @@ func (p *Provisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string, e
// Apply tier-based container configuration
ApplyTierConfig(hostCfg, cfg, configMount, name)
// Network config — join molecule-monorepo-net with container name as alias
// Network config — join molecule-core-net with container name as alias
networkCfg := &network.NetworkingConfig{
EndpointsConfig: map[string]*network.EndpointSettings{
DefaultNetwork: {

View File

@ -0,0 +1,3 @@
DROP INDEX IF EXISTS workspace_plugins_tracked_not_none;
DROP INDEX IF EXISTS workspace_plugins_workspace_name;
DROP TABLE IF EXISTS workspace_plugins;

View File

@ -0,0 +1,39 @@
-- workspace_plugins: per-workspace record of installed plugins, with the
-- tracked-ref needed for the version-subscription model (core#113).
--
-- Today plugin install state is filesystem-only — `/configs/plugins/<name>/`
-- inside the workspace container. There's no DB record of "what's installed
-- where, from what source, pinned to what." That's fine until you want
-- drift detection (comparing the upstream tag's resolved SHA against the
-- installed one); this table is the foundation for that.
--
-- This migration is purely additive: existing install paths keep working;
-- they'll write to this table on next install. Workspaces with plugins
-- already installed before this migration won't have rows until they're
-- re-installed (acceptable — the tracking is forward-looking).
--
-- tracked_ref values:
-- 'none' — no auto-update tracking (default)
-- 'tag:vX.Y.Z' — track a specific version tag
-- 'tag:latest' — track the latest tag (drift on every new tag)
-- 'sha:<full>' — pinned to a specific commit SHA (no drift ever)
--
-- A subsequent migration adds the plugin_update_queue table once drift
-- detection lands.
CREATE TABLE IF NOT EXISTS workspace_plugins (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
plugin_name TEXT NOT NULL,
source_raw TEXT NOT NULL,
tracked_ref TEXT NOT NULL DEFAULT 'none',
installed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE UNIQUE INDEX IF NOT EXISTS workspace_plugins_workspace_name
ON workspace_plugins(workspace_id, plugin_name);
-- Partial index for the drift detector: only scan rows opted into tracking.
CREATE INDEX IF NOT EXISTS workspace_plugins_tracked_not_none
ON workspace_plugins(tracked_ref) WHERE tracked_ref != 'none';
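A sketch of the drift-detector scan the partial index is shaped for. The function name, query, and column usage are assumptions pending the drift-detection work the header comment references:
// Hypothetical drift scan: reads only rows opted into tracking, the exact
// set workspace_plugins_tracked_not_none covers.
func scanTrackedPluginsExample(ctx context.Context) error {
	rows, err := db.DB.QueryContext(ctx, `
		SELECT workspace_id, plugin_name, source_raw, tracked_ref
		FROM workspace_plugins
		WHERE tracked_ref != 'none'
	`)
	if err != nil {
		return fmt.Errorf("drift scan: %w", err)
	}
	defer rows.Close()
	for rows.Next() {
		var workspaceID, pluginName, sourceRaw, trackedRef string
		if err := rows.Scan(&workspaceID, &pluginName, &sourceRaw, &trackedRef); err != nil {
			continue
		}
		// Resolve trackedRef against sourceRaw's upstream and compare the
		// resolved SHA with the installed one here.
	}
	return rows.Err()
}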

View File

@ -0,0 +1,2 @@
DROP INDEX IF EXISTS workspaces_update_tier_canary;
ALTER TABLE workspaces DROP COLUMN IF EXISTS update_tier;

View File

@ -0,0 +1,26 @@
-- workspaces.update_tier — canary vs production filter for plugin updates
-- (core#115). Composes with the version-subscription DB foundation
-- (core#113, merged) and the upcoming drift+queue+apply endpoint
-- (core#123).
--
-- Tiers:
-- 'production' (default) — fan-out target; only updated AFTER canary soak
-- 'canary' — early-adopter target; updates land here first
--
-- Default 'production' so existing customers (Reno-Stars + any future
-- live tenant) are default-safe. Synthetic dogfooding workspaces opt
-- INTO 'canary' explicitly.
--
-- The column is just metadata at this layer; the actual filter logic
-- ('apply this update only to canary tier first') lives in the future
-- POST /admin/plugin-updates/:id/apply endpoint (core#123).
ALTER TABLE workspaces
ADD COLUMN IF NOT EXISTS update_tier TEXT NOT NULL DEFAULT 'production'
CHECK (update_tier IN ('canary', 'production'));
-- Partial index for the apply endpoint's canary-tier scan: only
-- index canary rows since the apply path queries them most often
-- and the production set is the much larger default.
CREATE INDEX IF NOT EXISTS workspaces_update_tier_canary
ON workspaces(update_tier) WHERE update_tier = 'canary';
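A sketch of the canary-first filter the future apply endpoint (core#123) would run against this column. The function and query shape are assumptions; only the tier values and the partial index come from the migration above:
// Hypothetical canary-tier selection for a queued plugin update. The partial
// index only covers update_tier = 'canary' rows, so this stays cheap even as
// the 'production' default set grows.
func selectCanaryWorkspacesExample(ctx context.Context) ([]string, error) {
	rows, err := db.DB.QueryContext(ctx, `
		SELECT id FROM workspaces
		WHERE update_tier = 'canary' AND status != 'removed'
	`)
	if err != nil {
		return nil, fmt.Errorf("canary tier scan: %w", err)
	}
	defer rows.Close()
	ids := []string{}
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err == nil {
			ids = append(ids, id)
		}
	}
	// Apply the update to these workspaces first; fan out to the 'production'
	// tier only after the canary soak window passes.
	return ids, rows.Err()
}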

View File

@ -43,11 +43,29 @@ if [ "$(id -u)" = "0" ]; then
ln -sfn /root/.claude/sessions /home/agent/.claude/sessions
fi
# --- Per-persona git identity (closes molecule-core#155) ---
# Without this, every team commit lands with an empty author and Gitea
# attributes the work to the founder PAT instead of the persona that
# actually authored it. Same fingerprint that got us suspended on GitHub
# 2026-05-06. GITEA_USER is injected by the provisioner from the
# workspace_secrets table; bot.moleculesai.app is the agent-only domain
# so commits are clearly distinguishable from human authors.
if [ -n "${GITEA_USER:-}" ]; then
git config --global user.name "${GITEA_USER}"
git config --global user.email "${GITEA_USER}@bot.moleculesai.app"
fi
# --- GitHub credential helper setup (issue #547 / #613) ---
# Configure git to use the molecule credential helper for github.com.
# This runs as root so the global gitconfig is written before we drop
# to agent. The helper fetches fresh GitHub App installation tokens
# from the platform API, with caching and env-var fallback.
#
# NOTE: post-suspension (2026-05-06), github.com/Molecule-AI is gone;
# the helper's platform endpoint also 500s (internal#187). The helper
# block is kept for legacy boxes that still have a working token chain;
# post-suspension, the provisioner injects GITEA_TOKEN directly, so this
# path's failure is non-fatal. Full removal is tracked under #171.
if [ -x /app/scripts/molecule-git-token-helper.sh ]; then
# Set credential helper for github.com only (not all hosts).
# The '!' prefix tells git to run the command as a shell command.
@ -55,11 +73,13 @@ if [ "$(id -u)" = "0" ]; then
"!/app/scripts/molecule-git-token-helper.sh"
# Disable other credential helpers for github.com to avoid conflicts.
git config --global "credential.https://github.com.useHttpPath" true
# Move gitconfig to agent's home so it takes effect after gosu.
if [ -f /root/.gitconfig ]; then
cp /root/.gitconfig /home/agent/.gitconfig
chown agent:agent /home/agent/.gitconfig
fi
fi
# Move gitconfig to agent's home so it takes effect after gosu —
# done unconditionally so the per-persona identity survives the drop
# even when the github.com helper block is skipped.
if [ -f /root/.gitconfig ]; then
cp /root/.gitconfig /home/agent/.gitconfig
chown agent:agent /home/agent/.gitconfig
fi
# Create the token cache directory for the agent user.
mkdir -p /home/agent/.molecule-token-cache

View File

@ -434,7 +434,7 @@ async def main(): # pragma: no cover
async def _transcript_handler(request):
# Require workspace bearer token — the same token issued at registration
# and stored in /configs/.auth_token. Any container on molecule-monorepo-net
# and stored in /configs/.auth_token. Any container on molecule-core-net
# could otherwise read the full session log. Closes #287.
#
# #328: fail CLOSED when the token file is unavailable. get_token()

View File

@ -3,7 +3,7 @@ the workspace auth token is not yet on disk.
Prior behaviour (regressed in #287): `if expected:` skipped the auth
check when `get_token()` returned None, so any container on
`molecule-monorepo-net` could read the full session log during the
`molecule-core-net` could read the full session log during the
bootstrap window. The fix lifts the guard into transcript_auth.py for
testability.
"""