Per CTO directive 2026-05-20 and task #347 (disabled GitHub-mirror push
fleet-wide), .github/workflows/ on molecule-core is dead — Gitea Actions
reads .gitea/workflows/ exclusively (memory:
reference_molecule_core_actions_gitea_only), and GitHub Actions has had
no real push activity since 2026-05-06 (the only post-2026-05-06 runs
are dynamic CodeQL re-runs on frozen pre-suspension PRs).
Empirical validation:
- 24 files total in .github/workflows/.
- 23 have same-name siblings in .gitea/workflows/ (port carries
"Ported from .github/workflows/X on 2026-05-11 per RFC internal#219"
header on most files).
- 1 .github-only file: canary-staging.yml — already ported to
.gitea/workflows/staging-smoke.yml on 2026-05-11 per the same RFC,
Hongming directive renamed canary→smoke. Verified via header comment
in staging-smoke.yml.
- Last GitHub-side push event: 2026-05-06T07:06:12Z (pre-suspension).
- All 24 .github/workflows/* files removed.
Tooling updates needed (load-bearing):
- tools/branch-protection/check_name_parity.sh: hard-coded
$REPO_ROOT/.github/workflows path → switched to .gitea/workflows.
Pre-existing parity findings (3x Analyze CodeQL names absent from
any workflow file) are unchanged — that drift exists pre-PR and is
out-of-scope (file as follow-up).
- tools/branch-protection/test_check_name_parity.sh: synthetic test
fixtures now create .gitea/workflows/ instead of .github/workflows/.
All 6 unit tests pass after change.
- .gitea/workflows/lint-required-workflows-docker-host-pinned.yml:
dropped '.github/workflows/**' from path-filter triggers + dropped
'.github/workflows' from the python directory-walk loop (the
isdir-check would have made this a no-op cleanly, but pruning
reflects current truth).
Out-of-scope (NOT touched in this PR):
- .github/CODEOWNERS, .github/dependabot.yml, .github/scripts/ remain
(task is scoped to .github/workflows/).
- COVERAGE_FLOOR.md, workspace/smoke_mode.py, workspace/main.py
contain comment references to .github/workflows/* — stale docs
string-references only, not behavioral. Separate follow-up.
- Provenance comments inside .gitea/workflows/* of the form
"Ported from .github/workflows/X on 2026-05-11" are intentionally
preserved — useful history.
Refs: task #331 (SSOT-Instance-4), task #347 (mirror push disabled),
memory reference_molecule_core_actions_gitea_only,
memory reference_per_repo_gitea_vs_github_actions_dir,
RFC internal#219 §1 (the original 2026-05-11 port sweep).
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Empirical finding (a6e3ff018, 2026-05-20): molecule-core's
runtime_image_pins table (mig 047) has never had a writer in any repo.
The reader at handlers/runtime_image_pin.go has been hitting
sql.ErrNoRows on every workspace provision since mig 047 landed,
silently falling through to the :latest path. CP's parallel table
(CP mig 027) is the de-facto and only SSOT — it has the writer
(POST /cp/admin/runtime-image/promote), the reader, the hard-gate
(RFC internal#541 Step 2), seeded post-suspension digests (CP mig 028),
and the admin endpoints.
This PR ratifies that reality.
Note: this is a fresh rebase against current main (tip f17375a9).
PR #1608 was cut from a base before #1585 (RFC#596 Phase 2 dual-push)
landed, so merging it would silently revert the publish-runtime.yml
Gitea-PyPI-primary path. Sub-agent a5521785 flagged this on PR #1608
comment 41389. The substantive Go logic is identical to PR #1608;
the only difference is the base.
What
- Add 20260520120000_drop_runtime_image_pins.up.sql / .down.sql to drop
the unused table. Care zone PRESERVED: workspaces.runtime_image_digest
column + its partial index untouched (earmarked for a future
stale-workspace panel per RFC internal#617 §3).
- Delete handlers/runtime_image_pin.go (the dead reader) +
handlers/runtime_image_pin_test.go.
- handlers/workspace_provision.go: replace
resolveRuntimeImage(ctx, payload.Runtime) with Image: "" (the dead
reader was already returning "" on every call). Rewire the
surviving db.DB.QueryRow on this call site to QueryRowContext so
the provision-timeout ctx stays load-bearing.
- Doc comments in provisioner/provisioner.go + provisioner/registry.go
updated to point at CP as the SSOT instead of the dead local table.
- Add db/migration_20260520_drop_runtime_image_pins_test.go — static-
file pin that up.sql DROPs runtime_image_pins, does NOT touch the
care-zone column / index, and that the dead reader files cannot be
re-added without failing the test.
- Hygiene: prune the now-stranded
mock.ExpectQuery("SELECT digest FROM runtime_image_pins") rows in
handlers/handlers_test.go and handlers/workspace_provision_test.go
(the dead reader is gone, so the mock expectation can never fire).
Provisioner test comment updated to reflect CP-as-SSOT.
Why
Two parallel-named tables with structurally incompatible schemas, only
one ever written — that is exactly the kind of internal drift
feedback_no_single_source_of_truth was written about for non-vendor
surfaces. The deletion is reversible (down.sql recreates the table)
and the only behavior change is "ctx is now propagated into the
workspace_dir DB lookup", which is a small correctness nudge.
Verification
- [x] go vet ./internal/handlers/... ./internal/db/... ./internal/provisioner/... — clean
- [x] go build ./... — clean
- [x] go test ./internal/handlers/ ./internal/db/ ./internal/provisioner/ — all pass (16.5s + 0.2s + 0.3s)
- [x] New regression tests assert the care-zone column is not touched + the dead reader cannot return
- [x] Empirical grep cross-check: no writer for runtime_image_pins in molecule-core; no reader for workspaces.runtime_image_digest anywhere (both confirmed in RFC internal#617 §1 + §3)
- [x] Verified clean rebase: branch parent is current main tip (f17375a9), NOT pre-#1585 stale base. Diff vs main contains ONLY the migration-drop work — no .gitea/workflows/publish-runtime.yml regression.
Tier
tier:medium + area:schema — schema/migration change. Reversible by
re-running the down-migration. Two-eye review reviewers: core-be
(read path / Go) + core-qa (migration correctness). Cascade plan to
~6 live tenant DBs per RFC internal#617 §7 +
feedback_image_promote_is_not_user_live (verify on at least 2
tenants post-deploy).
Memory consulted: feedback_no_single_source_of_truth,
feedback_image_promote_is_not_user_live,
feedback_verify_actual_endstate_not_ack_follow_sop,
reference_package_distribution_open_ecosystem_dual_push.
RFC: molecule-ai/internal#617
Supersedes: #1608
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ECR registry triplet (account.dkr.ecr.region.amazonaws.com =
153263036946.dkr.ecr.us-east-2.amazonaws.com) is currently hardcoded
in every publish/verify workflow across 4+ repos. Switching AWS
accounts or regions means touching every workflow.
Refactor each affected workflow's env block to source the triplet
from `vars.ECR_REGISTRY` with the current prod-account literal as
a bootstrap fallback. Once the org-level variable is set, the
fallback becomes dead code and an account/region migration is a
one-line change at the org level instead of N PRs.
Pattern mirrors `vars.CP_URL || 'https://api.moleculesai.app'`
already in use in molecule-core/staging-verify.yml +
redeploy-tenants-on-main.yml — proven to work on Gitea 1.22.6.
Constraints honored:
- No cross-repo `uses:` (blocked on 1.22.6 per
feedback_gitea_cross_repo_uses_blocked).
- No new admin-required setup (the org-level var can be set later
by CTO without touching these workflows again).
- Zero functional change today (fallback literal == current
hardcoded value), so the in-flight cascade (publish → ECR →
redeploy-fleet) is unaffected.
PRs with thousands of bot-relay comments (e.g. mc#1242-class history)
pushed `get_issue_comments` past the runner's cgroup memory cap and
137'd the gate job. Sub-agent aa8fa266 verified via SHA64 1:1 mapping
that the leak source is `.gitea/scripts/sop-checklist.py:463-484` —
a `while True` pagination that accumulates the FULL Gitea-API comment
dict (~2 KiB median, ~3 KiB p95) into one list, then iterates it
twice (compute_ack_state + compute_na_state).
Fix shape (task #369):
1. `get_issue_comments` now projects to a minimal `{user.login, body}`
shape during pagination — drops html_url, pull_request_url, assets,
created_at/updated_at, user.avatar_url, etc. Synthetic benchmark on
a 5000-comment fixture: 8.2 MiB → 5.5 MiB peak heap (~33% smaller).
2. New `iter_issue_comments` generator yields one minimal-dict per
comment, allowing future streaming consumers.
3. Per-comment body cap (8 KiB, env-tunable) — the directive parser
only walks for ^/sop-* markers in the first few KiB; a single
pasted-CI-log comment can no longer OOM the runner.
4. Script-entry `RLIMIT_AS = 2 GiB` (skipped under
SOP_CHECKLIST_NO_RLIMIT=1 for the test suite) so any future
regression surfaces as a catchable MemoryError instead of SIGKILL.
5. `--max-comments` cap (default 5000, env-tunable) — at the cap we
post a SOFT pending status with a `[volume-skipped]` note so neither
BP nor the author gets stuck. No-block.
6. Fold in `probe()` n/a-gate fallback (was issue-355-class KeyError
'security-review'): when called from `compute_na_state` with a gate
name that isn't also an item slug, fall back to the gate's own
`required_teams` from config instead of KeyError'ing on
`items_by_slug[slug]`.
Tests: 88/88 pass (87 existing + 8 new for streaming + body-cap +
na-gate fallback). Live dry-run against open PR #1608 succeeds end-
to-end with state=failure (expected, no acks yet) — minimal-dict shape
flows cleanly through compute_ack_state + render_status.
Closes#369 follow-up to #364 OOM source.
RCA: sub-agent aa8fa266 (SHA64 1:1 mapping, no overlap-correlation).
Supersedes mis-targeted attempts #365 (qa-review/security-review bash)
and #366 (workflow yaml).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eliminates the upload-cap drift class that produced mc#1588 (push-mode
bumped to 100MB) and mc#1589 (poll-mode + DB CHECK catch-up one day
later). The same five surfaces had to be hand-synced for every cap
change; this PR collapses the Go-side mirrors into a single source
(internal/uploads) and exposes that source via a public GET
/uploads/limits endpoint so the out-of-process consumers (canvas TS,
workspace Python push + poll) can converge in Phase 2 follow-ups.
Source:
* internal/uploads/limits.go — UploadLimits struct +
DefaultUploadLimits() (per_file_bytes=100MB, per_request_bytes=100MB,
max_attachments_per_message=10). JSON-tagged shape is the stable
wire contract.
* Pinned by internal/uploads/limits_test.go — every cap change must
update this test as part of the same PR (forces a reviewer to see
the cap move and audit the matching DB migration + nginx config).
Endpoint:
* GET /uploads/limits — public, no auth, mirrors /buildinfo
rationale (platform constraint, not operational state; gating it
would force pre-auth UX before learning the cap).
* Cached in the binary via DefaultUploadLimits(); zero per-request
DB round-trip.
Go consumer convergence:
* pendinguploads.MaxFileBytes — now var derived from
uploads.DefaultUploadLimits().PerFileBytes (int cast preserves the
int-typed API surface so len() comparisons and make([]byte, N+1)
sites keep working).
* handlers.chatUploadMaxBytes — now var derived from
uploads.DefaultUploadLimits().PerRequestBytes (int64 for
http.MaxBytesReader).
* chat_files.go line 631: int64(pendinguploads.MaxFileBytes)
conversion for fh.Size (multipart.FileHeader.Size is int64).
* chat_files_poll_test.go: matching int64 cast in the skip-guard
that compares per-file vs body cap.
Tests:
* internal/uploads/limits_test.go — pins values + JSON wire shape.
* internal/router/uploads_limits_route_test.go — pins endpoint is
public + 200 + payload matches DefaultUploadLimits + in-tree Go
consumers agree with the SSOT.
* Full workspace-server test suite green (go test ./... — all
packages ok).
Boundaries (intentional non-changes):
* Cap value stays at 100MB everywhere — no behavior change for any
upload. The 25→100MB bump landed in mc#1588 + mc#1589; this PR is
purely the SSOT refactor.
* Migration's pending_uploads.size_bytes CHECK upper bound stays at
104857600 — DB constraints can't read Go vars at runtime, so this
constant lives in lockstep with DefaultUploadLimits and the
migration's --comment notes the dependency. Bumping the cap is
still a two-step coordinated dance (Go default + matching
migration) but step 1 is now one line.
* Canvas TS (MAX_UPLOAD_BYTES) + workspace Python
(CHAT_UPLOAD_MAX_BYTES / MAX_FILE_BYTES) stay as their own
pinned-100MB constants for this PR; the Phase 2 follow-up
migrates them to fetch /uploads/limits at startup with a cache.
The doc comments in chat_files.go point at this PR so reviewers
of the Phase 2 PR can trace the SSOT lineage.
Why the canvas + Python migrations are NOT in this PR:
Each consumer needs a different cache+retry shape (canvas at
app-init in a browser, workspace Python at module-load in the
container, python ingest similar but distinct). Bundling all of
them into a single mega-PR fails the review-able-unit test and
blocks CI for hours on a single conflict. The source-first
sequencing (this PR) + per-consumer follow-up PRs ships faster
through 2-eye review per CTO 2026-05-19 move-fast directive.
chore(ci): retrigger publish-workspace-server-image after EACCES hotfix on PC2 WSL publish-2 (#1600)
Trigger-only � 8-line doc-comment in workflow header. Hot-patched the buildx/certs dir perms on the WSL publish runner. The push:main triggers publish-workspace-server-image.yml; the image rebuild from the unchanged main HEAD lands platform-tenant:staging-<sha> in ECR, unblocking the mc#1589 cascade (CP scoped redeploy + reno-stars 94MB PDF retry).
PR #1589 � fix(uploads): bump poll-mode size_bytes CHECK to 100MB (mc#1588 follow-up)
Completes the poll-mode side of the 25?100MB upload cap raise. Storage cap + migration + tests + python ingest tweaks; chat_files.go doc-comment hunk dropped post-rebase as redundant with mc#1588. 3 non-author APPROVEs from core-qa + core-security + core-devops on the rebased head 5918031.
Co-authored-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
Co-committed-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
Closes follow-up to cp#228 (44317b0). Adds confirm:true to all 4 fleet-wide callers of /cp/admin/tenants/redeploy-fleet (publish-workspace-server-image via prod-auto-deploy.py + redeploy-tenants-on-main + redeploy-tenants-on-staging + staging-verify). Pairs with cp#228 TestRedeployFleet_ConfirmTrueProceeds. 2 APPROVES from core-devops + core-qa (both independent of cp#228 author devops-engineer and this PR author infra-sre).
Co-authored-by: infra-sre <infra-sre@agents.moleculesai.app>
Co-committed-by: infra-sre <infra-sre@agents.moleculesai.app>
Closes task #307. Unblocks Production auto-deploy SOP path on publish-workspace-server-image (the manual aa375b1f curl is no longer required).
Verification on this SHA:
- Lint workflow YAML for Gitea-1.22.6-hostile shapes: Successful in 1m5s (cured Rule 3).
- Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface: Successful in 4s (renamed, still scans).
- lint-required-context-exists-in-bp: Successful in 1m33s (bp-exempt directive in def11782 satisfies Tier 2g).
- CI / all-required: Successful in 7m31s.
- qa-review: Refired and APPROVED by core-qa (id=4957).
- security-review: Refired and APPROVED by core-security (id=4958).
- sop-checklist / all-items-acked: Successful (7/7 acked).
2/2 APPROVES from core-qa + core-security (both independent of devops-engineer author).
The workflow-name rename in 979064f9 creates a new context emission
('Lint no tenant GITEA or GITHUB token write / Scan ...') that the
lint_required_context_exists_in_bp (Tier 2g) gate flags as
undeclared. The scan job is advisory (PR-review-driven, not
BP-required), so the directive is bp-exempt rather than bp-required:
pending.
Verified locally: re-running the gate script with BASE/HEAD now
exits 0 (no missing directives, no asymmetry).
CI / Platform (Go), Handlers Postgres Integration / detect-changes,
and lint-required-context-exists-in-bp all hit:
Error: Cannot find module '/var/run/act/actions/1c2355.../dist/index.js'
on the post-checkout step — an act_runner action-cache miss (the
action artifact tarball was evicted between resolve and post-step
execution). Empty commit per reference_empty_commit_is_only_rerun_mechanism_on_1_22_6
to re-fire the affected workflows.
The workflow's `name:` field contained `GITEA/GITHUB`, which trips
`lint-workflow-yaml` Rule 3 (slash in workflow name breaks
`<workflow> / <job> (<event>)` status-context tokenization). On
2026-05-20 aa375b1f this fatal lint blocked the Production auto-deploy
job on `publish-workspace-server-image`, forcing a manual
`redeploy-fleet` curl.
Rename to "Lint no tenant GITEA or GITHUB token write" (the linter
already documents `-` or space as the supported separators).
Verification:
- Pre-fix: `python3 .gitea/scripts/lint-workflow-yaml.py` → exit 1,
Rule 3 FATAL on lint-no-tenant-gitea-token.yml.
- Post-fix: same command → exit 0, 'no fatal Gitea-1.22.6-hostile
shapes', and `pytest tests/test_lint_workflow_yaml.py` 27/27 pass.
Surface unchanged: workflow still triggers on the same pull_request
and push events; only the human-readable display name shifts.
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Bump chat upload cap 50MB → 100MB across canvas, workspace-server (Go), workspace (Python), and the nginx test harness. Pre-flight gates oversized files BEFORE network I/O so the user gets an immediate 'File too large (got X MB) — limit is 100MB' instead of a downstream timeout. Scaled abort-timeout (60s floor, ~100KB/s rate) replaces the fixed 60s that mis-attributed slow-uplink streams as 'timed out'. Resolves forensic a99ab0a1.
Approvers: core-devops (id=52), core-qa (id=64), core-security (id=68).
Follow-up: SSOT for upload cap (4 mirror sites) — see internal/<issue-tba>.
CTO 2026-05-19 directive on forensic a99ab0a1 (reno-stars >50MB
upload that surfaced "signal timed out" when the real cause was
file-size + a fixed 60s client timeout):
"if its file size issue, should have error that instead saying
timeout which is wrong"
Bundles the cap raise + the wrong-reason fix in ONE PR because the
two are coupled — bumping the server alone would still leak the
fixed-60s timeout for legitimate slow uploads; fixing the client
alone would 413 every >50MB attempt.
Server (push-mode, EC2 workspace):
- workspace-server/internal/handlers/chat_files.go:
chatUploadMaxBytes 50→100 MB
httpClient.Timeout 120→1200 s (matches the new slow-uplink budget)
- workspace/internal_chat_uploads.py:
CHAT_UPLOAD_MAX_BYTES 50→100 MB
CHAT_UPLOAD_MAX_FILE_BYTES 25→100 MB (aligned with total so a
single legitimate large file succeeds end-to-end)
Canvas:
- canvas/src/components/tabs/chat/uploads.ts:
MAX_UPLOAD_BYTES 100 MB constant + FileTooLargeError class
pre-flight gate: file-size violation throws BEFORE any fetch,
with the actionable "File too large (got X MB) — limit is 100MB"
computeUploadTimeoutMs: 60s floor + 100 KB/s scaled deadline
(was a fixed 60s — the root cause of the forensic)
- canvas/src/components/tabs/chat/hooks/useChatSend.ts:
mapUploadErrorToReason: routes each cause to ITS OWN message
(FileTooLargeError | TimeoutError | server-Error | fallback)
no conflation between file-size and connection-too-slow
Tests:
- workspace-server chat_files_test.go: pins 100 MB constant,
asserts sub-cap forwards + over-cap non-2xx
- canvas uploads.cap.test.ts (10 cases): pre-flight gate, exact-cap
edge, scaled-timeout curve, server-413 propagation, AbortSignal
shape — explicit negative on "TimeoutError ≠ FileTooLargeError"
- canvas useChatSend.errorReason.test.ts (5 cases): per-cause
message contract, explicit negatives that guard against the
wrong-reason conflation
Test harness mirror:
- tests/harness/cf-proxy/nginx.conf: client_max_body_size 50m→100m
(this is the harness mirror; the production CF / nginx tier is
out-of-repo. If prod still caps at 50m, this mirror passes while
prod 413s — surface to ops.)
Follow-up (SSOT, NOT in this PR):
The 100 MB constant now lives in THREE mirror sites (canvas TS +
workspace Python + platform Go). Per feedback_no_single_source_of_truth,
the proper fix is exposing the cap via GET /uploads/limits so the
client fetches the live value. Filing as a separate issue.
References:
- task #295 (internal tracker; CTO-authorized this work)
- forensic a99ab0a1 (reno-stars 2026-05-19)
- feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
CI / all-required (pull_request) compensating status: action_run_job all-required status=1 (Success) on this SHA in gitea DB; emitter-null masked propagation per feedback_gitea_emitter_null_state_blocks_merge + internal#591/#592. Non-author validation per BP intent.
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Hermes workspace PDF upload returned opaque 400 'failed to parse multipart
form' (forensic a78762a0 2026-05-19). Triage took ~25 min because the
response carried no information about WHICH exception class or WHY the
parser bailed — the underlying cause was a missing python-multipart dep
in the PyPI runtime (fixed separately in
molecule-ai-workspace-runtime#TBD).
Per feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17):
user-facing failures MUST tell the user WHY. This patch surfaces
exception class + str(exc) in the 400 JSON body, keeping the top-level
'error' key unchanged so existing canvas / alert rules keep matching.
Salvage note on mc#1524 (the wrong-RCA PR, closed):
mc#1524 attributed the 400 to Starlette's max_part_size limit and
proposed bumping it. That diagnosis was incorrect — Starlette only
enforces max_part_size on form FIELDS (text values), not on file PARTS,
so a 5 MB PDF would not trip that limit regardless of the value. The
useful idea from mc#1524 — surfacing the failure reason to the
caller — is salvaged here as a separate, narrowly-scoped change.
Adds unit test test_malformed_multipart_returns_exception_class_and_detail
which sends a boundary-mismatched body, asserts 400, and pins the
response shape (error/exception/detail keys present).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Staticcheck SA4000 flagged the stability assertion as tautological (identical expressions on both sides of !=). Bind both calls to local vars to preserve test intent (call-stability) and silence the linter. No functional change.
Follow-up to mc#1572 review (core-devops lens).
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Add internal/audit with single Emit(ctx, event_type, fields) entrypoint
that ships JSON-encoded records via two transports:
1. audit:-prefixed stdout line — tenant Vector docker-logs source
already ships this to Loki. No obs-stack change required.
2. Best-effort append to /var/log/molecule-audit.jsonl — durable
forensic copy, target for the dedicated Vector file source in
Phase 2.
Schema is stable v1 (ts, event_type, workspace_id, user_id, actor_kind,
correlation_id, fields). Cardinality budget keeps workspace_id +
user_id + correlation_id OUT of Loki labels (JSON body only) — fleet
active-stream count ~200, well within Loki headroom.
Phase 1 wires secret.set and secret.delete on the workspace-scoped
(POST/PUT/DELETE /workspaces/:id/secrets) and admin-scoped (POST/DELETE
/admin/secrets, /settings/secrets) handlers. value_hash is the first 8
hex chars of sha256(value) — never the raw value.
Tests cover: stdout emit, JSONL append, file-failure fallback,
concurrent integrity, hash bounds, raw-value-never-emitted contract.
Vet + handler-secret tests pass.
See: rfc internal/rfcs/audit-log-to-loki.md
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Gitea maps BOTH `action_run.status=2` (Failure) AND `status=3` (Cancelled)
to commit-status string `"failure"`. On a busy `main` with
`concurrency: cancel-in-progress: true`, every merge burst cancels prior
in-flight runs (status=3) — those bubble to the combined-status `failure`
rollup and inflate the watchdog's red%, generating phantom `[main-red]`
issues (mc#1562/#1552/#1540/#1532/#1527/#1526/#1522/#1503/#1487/#1484).
Per mc#1564 the cleanest filter at this layer is option B (description
string): cancelled-run entries carry description `"Has been cancelled"`,
real failures carry `"Failing after Ns"`. is_red() now excludes the
former from the failed[] list, and combined=failure alone (no per-entry
detail) only trips red when statuses[] is empty (the CI-emitter-direct
edge case from render_body's existing fallback).
Match is description == "Has been cancelled" exactly (after strip), not
substring, so a hypothetical real-failure log line containing that
phrase still counts as red.
Canonical Gitea 1.22.6 enum per `models/actions/status.go`:
1=Success, 2=Failure, 3=Cancelled, 4=Skipped,
5=Waiting, 6=Running, 7=Blocked
(reference: operator memory
reference_gitea_action_status_enum_corrected_2026_05_19
+ reference_chronic_red_sweep_cancelled_vs_failed_filter)
Tests (6 new, all 36 in suite pass locally):
- cancel-cascade entry alone → not red
- real-failure entry alone → red (no over-filter)
- mixed cancel + real → red, failed[] contains only real failures
- all entries cancelled → not red (the phantom-issue case)
- combined=failure + empty statuses[] → still red (preserve fallback)
- exact-match contract (substring would over-match)
Refs:
- mc#1564
- mc#1529 (chronic-red triage that surfaced this)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the pattern already used in molecule-controlplane/Dockerfile.
Currently workspace-server only sets -X buildinfo.GitSHA; add -trimpath
plus -s -w (strip symbol table + DWARF debug info) inside the same
-ldflags string. The -X GitSHA injection is preserved (verified via
strings(1) on locally-built binary).
Empirical local measurement (CGO_ENABLED=0 GOOS=linux GOARCH=amd64,
go 1.26.3, /platform binary only):
before 44,669,544 bytes (42 MB)
after 31,191,202 bytes (29 MB)
delta 13,478,342 bytes (12 MB) — 30.2% reduction
RFC#563 reports the published *image* deltas as 87 -> 61 MB (-26 MB,
~29%); the per-image figure is larger than the per-binary figure
because both /platform and /memory-plugin are stripped, and the
binary is one layer of the multi-layer image.
Flag semantics (Go 1.26):
-trimpath strip absolute build-host paths from object code
(also improves reproducibility)
-ldflags "-s -w" linker drops symbol table (-s) and DWARF debug
info (-w); -X-injected strings are NOT in the
symbol table so GitSHA survives stripping
Single-purpose change: only ws-server Dockerfile + Dockerfile.tenant
touched; no behavioral changes to the binaries themselves.
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
PR #1427 added the platform-side reconcile (`agent_card_reconcile.go`)
that pulls workspaces.name and workspaces.role into the stored
agent_card on /registry/register. The reconcile only ever FILLS gaps —
without a populated workspaces row it has nothing to substitute and
the prod-team cards keep showing name=UUID / description="" / role=null
(the exact gap internal#492 is filed against).
This migration seeds name, role, and the agent_card JSONB
(description + skills[]) for the 6 CTO-locked production-team
workspaces (PM, Reviewer, Researcher, Dev-A, Dev-B, CEO-Assistant).
Idempotent UPDATEs only — no INSERTs, no schema change, zero behaviour
change for any workspace outside the prod team.
Schema sources (vendor-doc-checked):
- workspaces.{name,role} columns: 001_workspaces.sql
- agent_card JSONB shape (name/description/skills[{id,name,description,tags,examples}]/role): workspace/main.py:197-222
- validateWorkspaceFields contract (name<=255, role<=1000, no YAML
special chars `{}[]|>*&!`, no newline/CR): workspace-server/internal/handlers/workspace_crud.go:526
CEO-Assistant uses the full UUID known from
workspace-server/internal/handlers/chat_files_test.go:286. The other
five rows are matched by 8-char prefix LIKE — the CTO will confirm on
review that each prefix resolves to a single tenant row.
NOT merged — CTO review pending per the dev-tree two-eyes gate.
RFC internal#524 Layer 1 deliverable 2: extend the canonical db.DB
race-fix primitive (69d9b4e3, already on main via the 0e13a801
staging-promote) to the ~25 sibling bare-`go` sites that 69d9b4e3 left
untouched. Without this, a SecretsHandler.Set's detached restartFunc, or
a2a_proxy's extractAndUpsertTokenUsage, or a delegation goroutine still
races a later test's setupTestDB t.Cleanup db.DB swap — exactly the
data-race class that 69d9b4e3 fixed for the WorkspaceHandler path.
What changed
============
- workspace.go: add package-level `globalAsync` sync.WaitGroup +
`globalGoAsync(fn)` helper + `waitGlobalAsyncForTest()` drain. Same
shape as h.goAsync but reachable from sibling handlers that don't
carry a *WorkspaceHandler.
- handlers_test.go: drainTestAsync now drains globalAsync alongside the
per-handler asyncWGs.
- Converted bare-`go` → tracked goroutine at 27 call sites:
secrets.go (7) — restartFunc fan-out + restartAllAffected
templates.go (6) — h.wh.RestartByID after file/template ops
template_import.go (3) — h.wh.RestartByID after Import/ReplaceFiles
plugins_install.go (2) — restartFunc after uninstall (both paths)
plugins_install_pipeline.go (2) — restartFunc after install
admin_plugin_drift.go (1) — restartFunc on drift apply
registry.go (1) — drainQueue on heartbeat capacity
a2a_proxy.go (1) — extractAndUpsertTokenUsage (db.DB INSERT)
delegation.go (1) — executeDelegation (DB-touching pipeline)
mcp_tools.go (1) — async MCP delegate (db.DB read+write)
channels.go (1) — async HandleInbound webhook delivery
org_import.go (1) — provisionWorkspaceAuto fan-out
- Annotated 6 connection/lifecycle-scoped goroutines with
`goAsync-exempt` (RFC Layer 2.2 contract):
a2a_proxy.go applyIdleTimeout — SSE idle-timer, no db.DB access
socket.go (2) — WebSocket Read/WritePump, conn-lifetime
terminal.go (3) — PTY <-> WS bridges, conn-lifetime
eic_tunnel_pool.go (group) — pool janitor + cleanup closures
- rfc524_layer1_async_drain_test.go: new regression test asserting
drainTestAsync waits for BOTH per-handler asyncWG AND the package-level
globalAsync — fails fast if either drain side is dropped.
Verification
============
- `go vet ./internal/handlers/` : clean
- `go test -race -count=1 ./internal/handlers/` : ok 28.6s
- `go test -race -count=10 ./internal/handlers/` : ok 4m15s (RFC Layer 5
nightly target)
- `go test -race -shuffle=on -count=1 ...` : ok 26.6s
The 4 `TestExecuteDelegation_*` tests were already un-Skipped on main
(via the staging→main backsync); Layer 1.3 of the RFC is therefore
already satisfied. Verified passing under -race in this run.
Layer 1 of RFC internal#524 is now complete on main. Layers 2-5 stay
as separate PRs per the RFC sequencing.
Refs
====
- RFC internal#524 (5-layer roadmap)
- molecule-core commit 69d9b4e3 (canonical fix on staging, promoted to main via 0e13a801)
- molecule-core#664, #774 (continue-on-error masks)
- task #240 (no staging→main auto-promotion — why the gap existed)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Class defect (internal#512 + mc#1529 + today's oc#81/82/83 + autogen#8):
the `ubuntu-latest` label is advertised by BOTH the Linux operator-host
runners (molecule-runner-*) AND Windows act_runner v1.0.3 on
hongming-pc-runner-*. Job placement is non-deterministic. When a
docker-bound job lands on a Windows runner, `docker run`/`docker
login`/`docker compose` fail with platform-specific errors and the
job hard-fails — placement-dependent, not transient.
Followon to mc#1543 (handlers-postgres-integration). Three more lanes
needed the same pin:
- e2e-api.yml: docker run/exec for postgres + redis containers
- e2e-chat.yml: docker run/exec for postgres + redis containers
- harness-replays.yml: docker compose ... ps/logs for tenant-alpha/beta
canvas-deploy-reminder is NOT pinned — its `docker compose ...` only
appears inside a markdown heredoc written to GITHUB_STEP_SUMMARY; it
does not exec docker.
Adds `lint-required-workflows-docker-host-pinned.yml` to catch future
regressions: any workflow whose YAML touches `docker exec` or uses
docker/* actions but doesn't pin every job's runs-on to `docker-host`
or `publish` fails the lint. Comment-only mentions of docker are
excluded (strip-`#` lines before regex). Fail-closed (per
feedback_never_skip_ci). This eliminates the manual-pin maintenance
burden the CTO flagged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Per E2E coverage audit 2026-05-18: today's merged platform PRs landed
with unit-test coverage only. This adds one consolidated bash E2E
(tests/e2e/test_today_pr_coverage_e2e.sh) that exercises each fix
through the real HTTP / activity-log path with no mocks of the
unit-under-fix, and wires it into the existing e2e-api.yml lane after
the poll-mode chat-upload step.
What the test asserts:
- Section A (mc#1535 + mc#1536): provisions two workspaces back-to-back,
pulls /workspaces/:id/external/connection, regex-extracts the
`claude mcp add <NAME>` server slug from each install snippet, and
asserts (1) both start with `molecule-` (per-workspace, not literal
`molecule`) and (2) the two slugs DIFFER (no overwrite class). Codex
TOML table key uniqueness is checked too when the codex tab is in the
build.
- Section B (mc#1525 + mc#1542): probes /admin/workspaces/:id/debug for
the presence of GIT_HTTP_USERNAME and GIT_ASKPASS keys in
workspace_secrets — pre-#1542 the GIT_HTTP_* key was absent entirely;
pre-#1525 there was no env-only askpass wiring. Value-emptiness is
tolerated on the dev platform where no persona is seeded (presence is
the post-fix regression contract).
- Section C (mc#1539): self-delegates via POST /workspaces/:id/delegate
with target=self and asserts (a) the API gate returns structured
rejection OR (b) no activity_logs rows with source_id=our_uuid AND
method != 'delegate_result' surface — the inbox-poller predicate the
fix added. Polls activity for 2s, counts violating rows, fails closed
on > 0.
Why E2E (not just unit):
- mc#1525 + mc#1542 ship a unit test that only checks the loader; the
REAL contract is "git ls-remote rc=0 inside the container with the
env the provisioner builds". This test probes the produced
workspace_secrets map at the platform end — one step short of in-
container exec, which the e2e-api lane lacks docker-exec privilege
for, but materially closer than the loader-only unit.
- mc#1535 + mc#1536 unit-tested the slug helper in isolation; the bug
was that the SNIPPET STRINGS shipped to the user still had hardcoded
`molecule` in the codex/openclaw/hermes branches. The E2E pulls the
literal user-facing strings.
- mc#1539 unit-tested the inbox _is_self_echo predicate; the E2E hits
the actual /delegate → activity-log → poll path.
Test pattern follows tests/e2e/test_activity_e2e.sh (set -uo pipefail,
check/check_not helpers, BASE default, cleanup at end). EXIT cleanup
deletes both provisioned workspaces.
Time-bound: 60s default, override via E2E_TIMEOUT. CI-runnable on the
existing e2e-api lane (postgres + redis + workspace-server already
provisioned earlier in the same job).
Refs: PR audit memo 2026-05-18; pairs with
feedback_verify_actual_endstate_not_ack_follow_sop (presence-of-end-
state-not-ack rule).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-layer cohesive fix for the 2026-05-19 ~00:05-00:09Z 4x reprov thrash
class observed on prod-Reviewer + prod-Researcher: a single secrets PUT
fanned out into 4x stop+provision cycles per workspace within 4 min,
each stopping the just-launched (still-pending) EC2 of the previous
cycle. Root-caused via Loki (provision.ec2_started / ec2_stopped pairs).
Empirical chain (all in workspace-server/internal/handlers/):
1. secrets.go SetSecret → go h.restartFunc → coalesceRestart cycle.
2. runRestartCycle sets url='' synchronously, then async provisions EC2.
3. During 20-30s pending window: url='' AND cpProv.IsRunning()==false
— indistinguishable from a dead container.
4. Canvas /delegations poll OR the trailing restart-context probe fires
ProxyA2A → maybeMarkContainerDead OR preflightContainerHealth →
RestartByID → loop.
5. coalesceRestart's pending flag drains by running ANOTHER full cycle
→ ec2_stopped of the just-booted instance → re-provision.
Fix (single PR, three interdependent layers):
L1) Restart-aware health probes — workspace_restart.go exposes
isRestarting(workspaceID) bool. Both maybeMarkContainerDead and
preflightContainerHealth early-return false/nil while a restart
cycle is in flight. Breaks the self-fire at the probe layer.
L2) Restart-context probe gate — sendRestartContext now requires
url != '' AND last_heartbeat_at > restart_start_ts before firing
the trailing ProxyA2A probe. Adds waitForFreshHeartbeat() next to
waitForWorkspaceOnline. Belt-and-suspenders so the probe never
tries until the new container is actually addressable.
L3) RestartByID debounce — silent-drop successive RestartByID calls
within restartDebounceWindow=60s of restartStartedAt. Not coalesce
(which would still drain to another full cycle). Drop is observable
via restartByIDDropCounter (atomic.Uint64) + the dropped log line.
Only programmatic path; HTTP Restart handler is unaffected.
Tests:
- TestIsRestarting_{FalseWhenNoStateEntry,TrueWhileCycleRunning}
- TestMaybeMarkContainerDead_SkippedWhileRestarting (L1)
- TestPreflightContainerHealth_SkippedWhileRestarting (L1)
- TestRestartByID_DebounceSilentDrop (L3, counter assertion)
- TestRestartByID_DebounceExpiresAfterWindow (L3, window release)
- TestRestartByID_SingleProvisionPerRestart (regression — asserts
exactly 1 cycle per trigger, with 4 dropped self-fire probes)
Existing coalesce/restart/preflight/maybeMarkContainerDead tests
remain green. Full handlers suite: ok in 15.8s.
Closes internal#544.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 4s
Refuse to start a tenant workspace if any operator-fleet-scope env var
name is present. Threat model: a leaked GITEA_TOKEN /
CP_ADMIN_API_TOKEN / RAILWAY_TOKEN / INFISICAL_OPERATOR_TOKEN /
MOLECULE_OPERATOR_* in a tenant container would let a compromised
agent escalate from "compromise of one workspace" to "compromise of
the whole platform."
3-layer defense-in-depth:
L1 — provisioner-side fail-closed abort (Go):
workspace_provision_forbidden_env.go + prepareProvisionContext hook.
Runs immediately after loadWorkspaceSecrets, BEFORE the per-agent
persona GIT_HTTP_* injection that legitimately sets a fallback
GITEA_TOKEN. Catches leaks from the operator-controlled stores
(global_secrets, workspace_secrets). The existing forensic #145
silent-strip guard in provisioner.buildContainerEnv stays as
defense-in-depth.
L2 — workspace/entrypoint.sh top-of-file env-grep + exit 1:
Fires if both upstream layers are bypassed (e.g. docker run -e
GITEA_TOKEN=... standalone). MOLECULE_TENANT_GUARD_DISABLE=1
bypass for local-dev. POSIX-portable (busybox/alpine/debian).
L3 — .gitea/workflows/lint-forbidden-env-keys.yml:
Scans workspace-server/internal/**.go for new code that hardcodes a
forbidden env-var name. Exempts the deny-set definitions + the
pre-existing persona-fallback paths whose downstream silent-strip +
new L1 fail-closed already cover the runtime risk.
Tests:
- L1: TestIsForbiddenTenantEnvKey_ExactMatches,
TestIsForbiddenTenantEnvKey_PrefixMatches,
TestFindForbiddenTenantEnvKeys_NoneAndEmpty,
TestFindForbiddenTenantEnvKeys_SingleAndMultipleSorted,
TestFormatForbiddenTenantEnvError_Phrasing
- L2: workspace/tests/test_entrypoint_forbidden_env_guard.sh
(12 cases — clean/per-agent/each-forbidden/prefix/disable-flag)
- L3: verified locally that current tree passes + synthetic offender
is caught
Open-source-template-friendly: the deny set lives in Go and YAML
constants, not hardcoded in any open-source template's start.sh.
Per memory feedback_open_source_templates_no_hardcoded_org_internals,
templates published as separate repos (template-codex / template-
hermes / template-openclaw) get their L2 added in follow-up template
PRs with a fork-friendly default deny set (no MOLECULE_-specific
literal). The MOLECULE_OPERATOR_ prefix appears only in the
internal claude-code template's entrypoint.sh.
Refs:
- RFC#523 (internal#523)
- Task #146
- memory feedback_passwords_in_chat_are_burned
- memory feedback_per_agent_gitea_identity_default
- memory feedback_open_source_templates_no_hardcoded_org_internals
- memory feedback_check_vendor_docs_and_actual_source_before_guess_api_shape
(POSIX env-set semantics verified via shell test; Go os.Environ /
map[string]string contract verified via go test)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 7s
The sop-checklist senior-ack gate has been blocking PRs because
`root-cause` and `no-backwards-compat` required `[managers, ceo]` acks,
but every managers/ceo persona token is dead (uid:0 / 401) and the `ceo`
team is one human. Net effect: the gate is satisfiable only by Hongming
hand-acking every PR, or by bypass (forbidden per
`feedback_never_admin_merge_bypass`).
Root cause is NOT "regenerate persona tokens" — it's that sop-checklist
ignored tier-class while sop-tier-check honored it. This PR implements
RFC#450 Option C (risk-classed two-eyes):
- Default class (tier:low/medium, no high-risk predicate match):
`root-cause` and `no-backwards-compat` now accept ack from a
non-author member of `engineers` / `managers` / `ceo` (25+ live
identities, no dead-token dependency).
- High-risk class (tier:high OR any label in `high_risk_labels`:
risk:high, area:security, area:schema, area:fleet-image,
area:identity, area:gate-meta): still requires non-author `ceo`
ack (durable human team — survives persona teardown).
Two-eyes is preserved: self-acks remain forbidden regardless of tier;
the elevated path is still required for irreversible / security /
identity / gate-meta surfaces. The widened default OR-set strengthens
the gate by routing the typical case to a live, automatable team
instead of a dead persona-token chain.
Mechanism:
- `.gitea/sop-checklist-config.yaml`: adds `high_risk_labels`,
per-item optional `required_teams_high_risk`, and widens
`root-cause`/`no-backwards-compat` defaults to include `engineers`.
- `.gitea/scripts/sop-checklist.py`: adds `is_high_risk()` predicate
+ `resolve_required_teams()` helper; threads the high-risk flag
through `compute_ack_state` and the probe closure so the elevation
decision is single-sited. Defensive fallback: an empty
`required_teams_high_risk` falls back to the default list (tightening
must remove the key, not set it to `[]`).
- Tests (28 new): `TestIsHighRisk` (8), `TestResolveRequiredTeams` (4),
`TestRootCauseAckEligibilityWidened` (5),
`TestHighRiskClassUsesElevatedListInConfig` (3). All 79 tests pass.
Refs internal#442, RFC#450.
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 10s
Mac-CI dual-track #233 pilot. Adds a single additive non-required
workflow that targets [self-hosted, arm64] runners and runs shellcheck
against .gitea/scripts/*.sh. Until a Mac arm64 runner is registered
with the `arm64` label, this workflow sits PENDING, which is fine —
`arm64` is NOT in branch_protections/main.status_check_contexts (only
'CI / all-required (pull_request)' is required, verified live via API).
Why shellcheck for the pilot: pure userspace, no docker.sock, no
privileged ops, identical output across arm64/amd64, narrow blast
radius. A clean signal for whether the lane works.
Pairs with internal#543 (RFC: Mac arm64 native multi-arch runner-base).
88 LoC, well under the <=100 line guidance. No required gate changes.
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 4s
Wires the local peer-visibility MCP gate into the Makefile so a
developer can run it via `make e2e-peer-visibility` against an
already-up local prod-mimic stack (`make up`), without remembering the
bash path. This is the dev-side counterpart to the CI job added in
the same commit on this branch — together they close task #166's
"wire into local-E2E gate" ask.
The help-line grep regex didn't include digits, so the new
e2e-peer-visibility target was correctly defined but invisible to
`make help`. Adds [0-9] to the character class and widens the label
column to 22 chars so longer target names line up. Other targets are
unaffected.
NOT auto-merged (per task #166 instructions). See PR body for the
verification + the manual command for ad-hoc runs without the make
target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1298 added the peer-visibility gate but staging-only. Per the
standing rule that the local prod-mimic stack must run a MANDATORY
local-Postgres E2E BEFORE staging E2E (feedback_local_must_mimic_
production, feedback_mandatory_local_e2e_before_ship, feedback_local_
test_before_staging_e2e), peer-visibility must also run locally so
regressions are caught fast/cheap instead of late on cold EC2.
- Factor the byte-identical assertion core out of
test_peer_visibility_mcp_staging.sh into tests/e2e/lib/
peer_visibility_assert.sh::pv_assert_runtime. It drives the literal
JSON-RPC tools/call name=list_peers envelope to POST /workspaces/:id/
mcp via each workspace's OWN bearer through the real WorkspaceAuth +
MCPRateLimiter chain, with the same anti-proxy / anti-native-fallback
guarantees. NOT a proxy: no registry row, /health, heartbeat, or
GET /registry/:id/peers. Only provisioning differs per backend.
- Refactor the staging script to source the shared lib (assertion
byte-identical; provisioning/teardown/exit-codes unchanged).
- Add tests/e2e/test_peer_visibility_mcp_local.sh: local docker-compose
backend — POST /workspaces directly, e2e_mint_test_token for the MCP
bearer (same model test_priority_runtimes_e2e.sh / test_api.sh use,
no new credential flow), wait online, run the shared assertion,
scoped per-workspace teardown only (feedback_cleanup_after_each_test,
feedback_never_run_cluster_cleanup_tests_on_live_platform). bash-3.2-
safe (no associative arrays) so it runs on local macOS dev boxes too.
- Wire a peer-visibility-local job into e2e-peer-visibility.yml,
bootstrapped exactly like e2e-api.yml's proven E2E API Smoke Test
(per-run container names + ephemeral ports, go build, background
platform-server). Runs on PR + push (local boot is minutes, not the
30+ min cold-EC2 path), so peer-visibility is part of the local gate
that fires before the staging E2E. Its OWN non-required status
context `E2E Peer Visibility (local)` — non-required-by-design like
the staging job, HONEST gate with NO continue-on-error mask
(feedback_fix_root_not_symptom); flip-to-required tracked at #1296
via the bp-required: pending directive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 17s
The chat error banner used to render the hardcoded
"Agent error (Exception) — see workspace logs for details." string
regardless of what the workspace runtime actually reported, and the
"workspace logs" reference pointed at a tab that does not exist (there
is no separate Logs tab in the side panel — the Activity tab is the
workspace-logs surface). Per CTO feedback on internal#211 / #212:
"the user can only act if they can see why."
useChatSocket now forwards the new ACTIVITY_LOGGED.error_detail field
(introduced server-side in the matching ws-server PR) into
onSendError. When present, the canvas shows the secret-safe reason
verbatim (provider HTTP status + error code + human-readable
message); when absent — older ws-server build — it gracefully
degrades to the legacy boilerplate so we never silently swallow a
failure.
A new ChatErrorBanner component renders the banner with a working
"View activity log" button that fires setPanelTab("activity"),
turning the dangling "see workspace logs" pointer into a real
affordance. The existing offline-Restart button is preserved.
Tests pin: hook forwards detail when present, falls back when absent,
ignores cross-workspace error events; banner renders the actionable
text, falls back to legacy message when that is all we have, button
navigates to Activity tab, Restart preserved when offline, null
message renders nothing.
Refs: internal#212, feedback_surface_actionable_failure_reason_to_user
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 17s
When an a2a_receive row is persisted with status="error" the DB column
error_detail already carries the actionable cause (provider HTTP
status, error code, provider human message). The live ACTIVITY_LOGGED
broadcast dropped it, so the canvas chat-tab error banner fell back
to a hardcoded "Agent error (Exception) — see workspace logs for
details." string with no logs tab to navigate to.
Include error_detail in the broadcast payload, omitted when nil so the
canvas's "has actionable reason" guard doesn't false-positive on empty
keys. Defense-in-depth: a sanitizeErrorDetailForBroadcast scrubber
redacts anything that looks credential-shaped (bearer tokens, sk-
prefixed API keys, JWTs) while preserving the actionable parts
(status codes, error codes, human-readable provider messages) — over-
redacting would defeat the whole point of internal#212.
Tests pin: detail surfaces on the wire, omitted when nil, scrubber
removes secret shapes but keeps actionable text, scrubber survives
the broadcast round-trip.
Refs: internal#212
The workflow's "Start sibling Postgres" step hard-fails when the
operator-host bridge network `molecule-core-net` is missing. PC2
runners (hongming-pc-runner-*) advertise `ubuntu-latest` but don't
have that network — when the job was scheduled there, the bridge-
inspect check correctly errored out. Result: ~30% chronic-red on
main pushes (mc#1529 sweep, last 20 commits).
Pin both jobs to the `docker-host` label, which only the
operator-host runners (molecule-runner-1..20) carry. detect-changes
doesn't strictly need the bridge but co-locating the jobs avoids
volume-cross-host edge cases.
mc#1529 §1 of 4 root causes.
Closes the durable-git-auth gap left by template-claude-code#30 +
mc#1525 for the prod-team workspaces (agent-dev-a / agent-dev-b /
agent-pm). The askpass binary + GIT_ASKPASS env wiring shipped in
the template image and ws-server side respectively, but no code path
in workspace-server actually read the persona's git token from the
operator-host bootstrap dir and exported it as the askpass-readable
env-var pair. Without this, the askpass helper invokes with empty
password env and git fails the auth challenge in <500ms (live-
verified for Dev-A/Dev-B 2026-05-18 ~23:55Z via EC2 instance-connect
docker exec).
The new applyAgentGitHTTPCreds helper reads
$MOLECULE_PERSONA_ROOT/<role>/token (defaulting to
/etc/molecule-bootstrap/personas/<role>/token, the canonical
operator-host bootstrap-kit path) and emits GIT_HTTP_USERNAME +
GIT_HTTP_PASSWORD into the workspace envVars map.
Why a dedicated env-var pair instead of reusing GITEA_USER /
GITEA_TOKEN: the provisioner's forensic #145 SCM-write-token
denylist strips GITEA_TOKEN by exact key name before docker run.
The same token bytes shipped under the generic GIT_HTTP_PASSWORD
key survive transport because askpass reads that lane first.
GITEA_USER + GITEA_TOKEN are ALSO set for the askpass fallback
chain; GITEA_TOKEN is then dropped by buildContainerEnv as
designed, but the GIT_HTTP_PASSWORD lane already carries the
bytes the in-container helper needs.
Wired into prepareProvisionContext (the mode-agnostic shared
prep step both Docker and SaaS paths call) so Dev-A/Dev-B on
EC2 + any future local-Docker prod-team workspace pick it up
without duplicating the call site. Runs AFTER applyAgentGitIdentity
so workspace_secrets named GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD
(operator-supplied via POST /workspaces/:id/secrets) win over
the persona-file default.
Silent no-op for: empty role, multi-word descriptive roles
("Frontend Engineer") that fail isSafeRoleName, missing persona
dir, empty token file, traversal-attempt role names. These cases
fall through to the existing workspace_secrets / org-import
persona-env merge path unchanged.
No hardcoded git.moleculesai.app — the env-var pair is generic
askpass protocol and works for any git remote the deployer points
GIT_ASKPASS at.
Security note: this routes around forensic #145 by name (the
denylist is exact-key-match, not key-substring). For the
prod-team identities (agent-dev-{a,b,pm}) this is the explicitly-
designed shape per reference_prod_team_infisical_identities
(per-agent Gitea identities with pull+push, NO admin, NOT in any
merge-whitelist — merge stays gated by hardened BP 2-approvals+CI
per reference_merge_gate_model_changed_2026_05_18). A follow-up
RFC may tighten forensic #145 to also gate GIT_HTTP_PASSWORD for
non-prod-team tenants; out of scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three workflows under .github/workflows/ used `on: workflow_run:`,
an event Gitea 1.22.6 does not support (per
feedback_pull_request_review_no_refire family + lint-workflow-yaml
Rule 2). They were also living in the wrong directory: molecule-core's
Gitea Actions runtime reads ONLY .gitea/workflows/ (per
reference_molecule_core_actions_gitea_only). So these files were
doubly dead — wrong path AND unsupported trigger.
Two of them already have working replacements under .gitea/workflows/
that landed in commit 2ee7cb14 (2026-05-12, replaced workflow_run
with push+paths). The third (canary-verify.yml) was superseded by
staging-verify.yml (push-on-staging) + staging-smoke.yml (schedule).
Removed → live replacement:
- .github/workflows/canary-verify.yml
→ .gitea/workflows/staging-verify.yml (push+paths)
+ .gitea/workflows/staging-smoke.yml (schedule cron)
- .github/workflows/redeploy-tenants-on-main.yml
→ .gitea/workflows/redeploy-tenants-on-main.yml (workflow_dispatch)
- .github/workflows/redeploy-tenants-on-staging.yml
→ .gitea/workflows/redeploy-tenants-on-staging.yml (push+paths)
No runtime behavior change — these files were never executed by the
Gitea Actions runner. Removing them eliminates the dead-letter risk:
an operator scanning .github/workflows/ would otherwise believe an
auto-redeploy chain still exists post-publish, which it does not.
Refs: feedback_gitea_workflow_dispatch_inputs_unsupported,
reference_molecule_core_actions_gitea_only,
feedback_pull_request_review_no_refire.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior CI failures on this PR were infra-class (Detect changes hit
'Error: ENOSPC: no space left on device' from runner disk-full caused
by 120 zombie tasks since drained; Python Lint flaked on perf test
test_batch_fetcher_runs_submitted_rows_concurrently by 3ms under
contended runners — same test passes cleanly on main HEAD 1b0e947).
Re-firing CI on recovered runners; no code change. [no-op]
Task #190 / #193 — surface the self-delegation echo guard at every runtime
delegation entry point, and classify platform-pushed delegation-result
rows distinctly from peer_agent messages so a delegation timeout never
appears to the caller as a fake peer instruction.
Three layers were affected and only two were guarded:
1. workspace/a2a_tools_delegation.py — already had the guard (added in
#548 / #469). Untouched.
2. workspace-server/internal/handlers/delegation.go — Go API gate
already had the guard. Untouched.
3. workspace/builtin_tools/a2a_tools.py::delegate_task — framework-
agnostic adapter surface used by adapters that don't go through (1).
NO GUARD. Added.
4. workspace/builtin_tools/delegation.py::delegate_task_async — the
LangChain @tool fire-and-forget path. NO GUARD on the local helper
(it dispatched the background _execute_delegation coroutine to our
own URL). Added.
Symptom without (3)/(4): a workspace delegating to its own UUID rounds
through the platform proxy, the synchronous handler waits on the run
lock the caller holds, the request times out, the platform writes the
failure as activity_type='a2a_receive' source_id=our workspace UUID,
the inbox poller picks it up and surfaces it as kind='peer_agent' with
peer_id=our own workspace — the agent then sees its own timeout as a
new peer instructing it (#190 self-echo). Reply via delegate_task to
that "peer" re-triggers the loop.
Inbox-side fix (workspace/inbox.py): InboxMessage.to_dict() now
classifies rows with method='delegate_result' as kind='delegation_result'
regardless of peer_id. This makes pushDelegationResultToInbox results
(RFC #2829 PR-2) surface as STRUCTURED delegation outcomes to the
caller's wait_for_message instead of fake peer_agent messages. This
covers both the self-delegation echo path AND the cross-workspace
ProxyA2A failure path where the delegation result lands in the caller's
inbox with source_id=caller's own workspace UUID.
Tests added:
- tests/test_a2a_tools_module.py::TestSelfDelegationGuard — verifies
the builtin_tools/a2a_tools.py guard short-circuits BEFORE any HTTP
call, and lets a real peer through.
- tests/test_delegation.py::TestSelfDelegationGuard — verifies
builtin_tools/delegation.py::delegate_task_async returns the
structured rejection error without scheduling a background task.
- tests/test_inbox.py::test_message_from_activity_delegate_result_distinct_kind
— pins kind='delegation_result' for method='delegate_result' rows
so the #190 mis-classification regression is locked.
Runtime mirror (molecule-ai-workspace-runtime) is a publish artifact of
this directory — it picks up the fix automatically on the next
runtime-v* tag → publish-runtime workflow → PyPI 0.1.1003.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CONTRIBUTING.md:195 had a wrong `--channels` install string
(`plugin:molecule@Molecule-AI/molecule-mcp-claude-channel` — both the
plugin-name format and the dead GitHub-org path are stale). Aligned all
three doc surfaces (CONTRIBUTING.md, README.md, README.zh-CN.md) with
the actual install pattern emitted by workspace-server/internal/
handlers/external_connection.go (externalChannelTemplate):
/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git
/plugin install molecule@molecule-channel
claude --dangerously-load-development-channels --channels plugin:molecule@molecule-channel
Also normalised display labels for the now-canonical Gitea org
(`Molecule-AI/` → `molecule-ai/`) — these are link captions, the URLs
were already correct. Docs-only, no behavioural change.
Task #230. Refs memory `feedback_github_botring_fingerprint` (canonical
SCM = git.moleculesai.app/molecule-ai/...).
Adds the standard Next.js App-Router SEO surface to the canvas
landing so the marketing push has crawlable metadata + structured
data on day one.
What landed:
- layout.tsx — Metadata API: title.template, description,
keywords, canonical, metadataBase, OG/Twitter text fields,
robots index:true. JSON-LD @graph (Organization + WebSite +
SoftwareApplication) injected with the per-request CSP nonce.
- robots.ts — allow public marketing routes (/, /pricing, /blog),
disallow /orgs, /api/, /cp/, /checkout/; declares sitemap +
canonical host.
- sitemap.ts — apex + pricing + live blog post; authed routes
excluded by construction.
- opengraph-image.tsx — segment-level dynamic OG card via
next/og ImageResponse (1200x630); no static binary blob.
- __tests__/seo-routes.test.ts — pins the crawler contract
(10 cases) so a future refactor can't silently flip the
marketing surface to noindex or drop the sitemap.
Out of scope (per issue): design copy, hero rewrite, Lighthouse
CWV tuning. Those are CTO/marketing inputs and a separate ticket.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI / all-required (pull_request) Compensating status: emitter dropped CI / all-required ctx on this commit (gitea 1.22.6 null-state). 2 non-author APPROVEs present, sop-checklist/sop-tier-check/gate-check-v3 all Successful per status descriptions. Posted per feedback_gitea_emitter_null_state_blocks_merge.
## Summary
- mc#1535 fixed the per-session-overwrite bug in the Universal MCP
snippet (`claude mcp add molecule -s user` keyed by `molecule`, so
installing for a second workspace silently replaced the first). The
same equivalence-class bug exists in EVERY other runtime tab the
Canvas modal renders: each MCP host keys its config by name, and all
five templates hardcoded a fixed `molecule` identifier.
- This PR extends mc#1535's existing `{{MCP_SERVER_NAME}}` placeholder
+ `mcpServerNameForWorkspace()` helper into the 4 remaining
templates so the Canvas snippet a user pastes is unique per
workspace by construction across ALL runtime tabs — multi-workspace
works out-of-the-box with no per-host workarounds.
## Bug shape per runtime tab (mc#1535 sibling)
- **codex** (`~/.codex/config.toml`): `[mcp_servers.molecule]` — TOML
rejects duplicate table keys, so re-paste either breaks parsing or
overwrites.
- **openclaw** (`~/.openclaw/mcp/molecule.json`): `openclaw mcp set
molecule` keyed by name — second workspace overwrites.
- **hermes** (`~/.hermes/config.yaml`): `plugin_platforms.molecule:` —
YAML rejects duplicate mapping keys, second workspace silently
collapses.
- **kimi** (`~/.molecule-ai/kimi-workspace/`): single per-host dir —
second workspace's env+bridge.py overwrites the first.
## What changed
- `workspace-server/internal/handlers/external_connection.go`:
- 4 templates now stamp `{{MCP_SERVER_NAME}}` (the same slug
mc#1535 already derives + plumbs into the universal_mcp snippet)
in the keyed identifier:
- codex: `[mcp_servers.{{MCP_SERVER_NAME}}]` + `.env` table.
- openclaw: `openclaw mcp set {{MCP_SERVER_NAME}}` + log path.
- hermes: `plugin_platforms.{{MCP_SERVER_NAME}}:`.
- kimi: `~/.molecule-ai/kimi-{{MCP_SERVER_NAME}}/` dir + embedded
python `ENV` path.
- Header comment in each template documents the multi-workspace
contract (mirrors mc#1535's universal_mcp header).
- `workspace-server/internal/handlers/external_rotate_test.go`:
- New `TestBuildExternalConnectionPayload_AllRuntimeSnippetsAreWorkspaceUnique`
pins the per-template literal that proves the slug was stamped,
AND asserts no template leaves a literal `{{MCP_SERVER_NAME}}`
placeholder — catches a future template author who forgets to
register a new tab with the stamp pipeline.
- `workspace/a2a_mcp_server.py`:
- Comment-only update on `serverInfo.name` to reflect that the
per-host registration name is workspace-specific. No code change;
`serverInfo.name` stays the generic `"molecule"` self-label.
- `scripts/build_runtime_package.py` (PyPI README generator):
- Updates 3 `claude mcp add molecule -- molecule-mcp` references to
`claude mcp add molecule-<workspace-slug> -- molecule-mcp` so the
PyPI README matches the Canvas-stamped snippet pattern.
- Adds a "Server name in `claude mcp add` is workspace-specific"
bullet pointing at mc#1535 + this PR for context.
## Open-source-templates cleanliness check
- Templates touched here live in the PRIVATE molecule-core repo
(Canvas modal generator); they STAMP per-workspace server names but
do NOT bake any new `git.moleculesai.app` literal or other
org-internal infra. Generic `pip install
'git+https://git.moleculesai.app/molecule-ai/hermes-channel-molecule.git'`
in the hermes template is the only such URL touched and was
pre-existing — that one points at a public hermes-side plugin and
has its own canonical URL; not in scope for the open-source-template
rule (the rule applies to template-codex/template-hermes/
template-openclaw — separate public repos, untouched here).
- No `.moleculesai.app` literal added; persona-token shape unchanged
(auth_token still per-workspace minted by Rotate/Create — same path
mc#1535 audited).
## Sample stamped snippets (workspace name "my-bot", slug "molecule-my-bot")
- codex: `[mcp_servers.molecule-my-bot]` + `[mcp_servers.molecule-my-bot.env]`
- openclaw: `openclaw mcp set molecule-my-bot "$(cat <<EOF ... )"`
- hermes: `plugin_platforms:\n molecule-my-bot:\n enabled: true`
- kimi: `~/.molecule-ai/kimi-molecule-my-bot/{env,kimi_bridge.py}`
## Diff size
- 4 files, +135/-40 LoC. Most of it is comment text + the new test.
- Did NOT change `BuildExternalConnectionPayload` signature or
`mcpServerNameForWorkspace` semantics — both were already plumbed
by mc#1535 to all 8 snippets via the stamp closure; this PR only
updates the template text to USE the placeholder.
## Test plan
- [x] `go test ./internal/handlers/ -run TestBuildExternalConnectionPayload` — 5/5 green, including new `_AllRuntimeSnippetsAreWorkspaceUnique`.
- [x] `go test ./internal/handlers/` full package — 15.9s green.
- [x] `go vet ./internal/handlers/` — clean.
- [ ] Manual (post-merge, requires mc#1535 also merged): create two
"bot-a" + "bot-b" external workspaces on staging; paste each
tab's snippet into the corresponding host on a single machine;
verify `claude mcp list` / `cat ~/.codex/config.toml` /
`openclaw mcp list` / `~/.hermes/config.yaml` / `ls
~/.molecule-ai/` each shows BOTH workspaces' entries side-by-
side, not overwriting.
## Sequencing
- This PR's base is mc#1535's branch
(`fix/add-to-claude-code-unique-server-name-per-workspace`),
because it reuses mc#1535's `{{MCP_SERVER_NAME}}` placeholder +
slug helper + `BuildExternalConnectionPayload(workspaceName)`
signature change. Will need a rebase on main after mc#1535 lands;
prefer to keep stacked to make the review of EACH PR scope-tight.
- CTO 2026-05-18 22:43Z: "其实是我们没有做好instruction,这个得补充" —
this PR is the consolidated per-repo doc/generator fix.
## Related
- Sibling: mc#1535 (Universal MCP snippet, already open).
- Follow-up #230: molecule-core stale channel-install mentions
(CONTRIBUTING.md:195, etc.) — separate scope.
Author identity: core-devops (per-role persona; not founder-PAT).
Opened for non-author review, NOT auto-merged.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Universal MCP install snippet hardcoded `claude mcp add molecule -s user`
— `claude mcp add` keys entries by name, so installing for workspace B
silently overwrote workspace A in the user's ~/.claude.json. A single
external Claude Code session ended up able to talk to only ONE molecule
workspace at a time — the CTO-observed "this is per-session" UX
(2026-05-18 22:28Z). MCP itself supports many servers per session; the
install snippet was the only thing standing in the way.
Fix: derive a unique server name per workspace at payload-build time —
`molecule-<slug>` where slug = lowercased/hyphen-collapsed workspace
name (max 24 chars), falling back to the first 8 chars of the workspace
ID when the name is empty or slugifies to nothing. The result is
alphanumeric + hyphens only (URL-safe + Claude-Code-name-safe).
Plumbed through all 3 callers of BuildExternalConnectionPayload:
- Create (workspace.go) passes payload.Name directly.
- Rotate / GetExternalConnection (external_rotate.go) extend the
existing runtime lookup to also SELECT name in the same round-trip
(lookupWorkspaceRuntimeAndName replaces lookupWorkspaceRuntime —
one query, no extra DB load).
Snippet header now documents the multi-workspace contract: re-running
the snippet from another workspace's modal ADDS another entry; same-
name workspaces collide by design, rename one to disambiguate.
Surgical: only externalUniversalMcpTemplate gained a {{MCP_SERVER_NAME}}
placeholder. Other tabs (Python SDK / curl / Hermes / codex / openclaw /
kimi) already use distinct config keys per provider and aren't affected.
Tests: TestBuildExternalConnectionPayload_McpServerNameUniquePerWorkspace
pins 4 cases (plain name, name w/ spaces+caps, name w/ symbols, empty
name fallback to UUID prefix) — would have caught the original
"claude mcp add molecule" regression. Existing rotate/get tests updated
for the 2-column SELECT.
Related: task #229 (molecule-mcp-claude-channel install-doc blockers).
This is the canvas-side counterpart — that PR fixed the plugin docs,
this PR fixes the modal-generator snippet operators actually copy.
Sample generated lines (was → now):
was: claude mcp add molecule -s user -- env WORKSPACE_ID=... molecule-mcp
now: claude mcp add molecule-my-bot -s user -- env WORKSPACE_ID=... molecule-mcp
(where "my-bot" is the workspace name; "molecule-12345678" if unnamed)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds workspace-server/internal/provisioner/t4_privilege_contract.go as the
single source of truth for the T4 ("full machine access") capability set
that template-repo CI workflows currently re-implement as bespoke shell.
Today's t4-conformance gates in template-claude-code / template-hermes /
template-codex each hand-assert agent-uid + token-ownership + host-root
reach. The shell drifts (the very Hermes 401 class bug came from drift),
and there's no way to add a new capability fleet-wide without N template
PRs.
This contract:
* Defines T4Capability as code (Name/Description/Probe/Severity/Source)
* Lists the closure: agent_uid_1000, auth_token_agent_owned,
host_root_reach_via_nsenter, host_fs_write_readback,
docker_socket_reachable, list_peers_http_200, agent_home_writable,
network_egress_https, privileged_flag_observable, pid_host_visible
* Renders to YAML via AsYAML() and cmd/t4-contract-dump so any
template CI can do:
go run ./workspace-server/cmd/t4-contract-dump > t4_capabilities.yaml
and iterate capabilities — new capabilities propagate without
per-template PRs.
* Pure stdlib + no Molecule-AI-internal deps so fork users can adopt
the same contract.
Anti-drift unit tests (7, all green):
- all caps have required fields
- names unique
- core closure (RFC#456 + task #128/#174) is present
- hard-severity is strict majority
- YAML is deterministic + escapes double quotes
- YAML header cites internal#456
- AgentUID const consistent with probes
Does NOT change Docker/Dockerfile or any existing emit-side behavior;
this is purely additive. The provisioner.go T4 branch is unchanged.
Templates adopt the YAML in a separate PR (pilot:
template-claude-code).
Refs: RFC internal#456, task #174, memory
reference_per_template_privilege_contract_class_audit_2026_05_16,
memory feedback_hermes_listpeers_401_token_root600_unreadable_by_uid1000.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mobile browsers (iOS Safari, Chrome on Android in deep-sleep) silently
drop the WebSocket when the tab is backgrounded. The in-page `onclose`
fires very late or never, so the reconnect backoff never schedules — the
canvas appears frozen until the user manually refreshes. Symptoms:
- #223 mobile canvas chat has no real-time updates (must refresh)
- #228 cross-device: user's own chat input doesn't broadcast to
other sessions in real time (must refresh)
Root cause: `canvas/src/store/socket.ts` had no visibility-wake. The
reconnect loop only re-arms on `onclose`, and mobile OSes don't always
fire `onclose` when they kill the WS.
Fix:
- Add `ReconnectingSocket.wake()` — forces an immediate reconnect
when the socket is in CLOSED / CLOSING / null limbo, no-op when
OPEN or CONNECTING. Pre-empts any pending backoff timer and resets
the attempt counter (this was a user-initiated wake, not an
unattended-tab failure cascade).
- Wire a module-level `visibilitychange` + `pageshow` listener inside
`connectSocket()`; remove it in `disconnectSocket()`. `pageshow`
covers Safari's bfcache restore where `visibilitychange` doesn't
fire on its own.
- Export `wakeSocket()` so the test suite can exercise the path
without depending on a jsdom DOM (the existing socket.test.ts
runs under the `node` environment).
Tests (5 new cases under `wakeSocket → reconnect`):
- wake on OPEN: no new WS
- wake on CLOSED: new WS created (the #223 fix)
- wake on CONNECTING: no extra handshake piled on
- wake cancels pending backoff `setTimeout`
- wake after `disconnectSocket()` is a no-op (no zombie)
Closes#223Closes#228
iOS Safari and PWAs auto-zoom the viewport when a focused input or
textarea has a computed font-size below 16px. Two mobile-canvas inputs
were below that bound, causing the layout to jump and look broken on
focus until the user pinched back:
- MobileSpawn.tsx agent-name input (fontSize: 13.5) — #225
- MobileChat.tsx composer textarea (fontSize: 14.5) — #224
Both bumped to 16px (the minimum that suppresses focus-zoom). This is
the same class of bug as desktop #1434, scoped here to the mobile
breakpoint.
Tests:
- MobileSpawn.test: assert agent-name input renders at fontSize >= 16
- MobileChat.test: assert composer textarea renders at fontSize >= 16
Both parse the inline style.fontSize (jsdom has no layout engine, so
getComputedStyle reports the inline value verbatim).
Closes#224Closes#225
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
The new prod-team personas (agent-dev-a, agent-dev-b, agent-pm) ship
only `token` + `universal-auth.env` (Infisical UA bootstrap), no `env`
file. loadPersonaEnvFile silently no-ops on them today. With this
fallback, GITEA_TOKEN/USER/EMAIL get populated from the token file
when no env file exists.
Combined with the GIT_ASKPASS injection earlier in this PR, this
makes the askpass helper functional for the new personas.
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Wire container-side `git` HTTPS authentication to the persona credentials
that already arrive via workspace_secrets (GITEA_USER / GITEA_TOKEN,
GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD) without mutating ~/.gitconfig or
~/.git-credentials inside the container.
Mechanism:
1. New generic GIT_ASKPASS helper baked into the workspace runtime
image at /usr/local/bin/molecule-askpass. Script body is hostname-
free and vendor-neutral — the deployer decides which remote the
credentials apply to by virtue of populating the env vars.
2. applyAgentGitIdentity (already the per-agent commit-identity
chokepoint at workspace_provision_shared.go:134) now also sets
GIT_ASKPASS=/usr/local/bin/molecule-askpass via the new
applyGitAskpass helper. Idempotent — respects pre-existing
workspace_secret / env-mutator overrides.
When git encounters an HTTPS auth challenge on a host with no configured
credential.helper, it invokes GIT_ASKPASS to read the username + password
from env. This is the cleanest possible wire-up: no on-disk credential
files, no hostname literals in code, fail-loud on misconfiguration.
Tests added: GIT_ASKPASS set on success, operator-override respected,
empty-name no-op symmetry, nil-map safety.
Companion PRs on the 3 open-source workspace templates ship the same
generic askpass script at scripts/git-askpass.sh → identical install
path. Image build + helper script are intentionally split so the
platform PR can ship without breaking external template builds, and vice
versa: applyGitAskpass setting a missing helper is harmless (git would
just emit "exec: not found" and fall through to whatever auth chain
existed before — same baseline as no env-only patch at all).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR #1504 (role=alert on ConfigTab error divs) — the
AgentAbilitiesSection error div was in a separate render branch and
was missed. WCAG 4.1.3 requires dynamic error messages to be announced
by screen readers immediately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SEV-1 #1413 follow-up: sop-tier-check.yml uses
{{ secrets.SOP_TIER_CHECK_TOKEN }} but lacked secrets:read
permission. Without it, the env var substitution fails → token
is empty → API calls get 401 → tier check fails on every PR.
Same fix applied to qa-review/security-review/sop-checklist in PR #1498.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 4.1.3: two error divs in ConfigTab.tsx used text-bad styling
without declaring themselves as live regions. Screen readers miss
the error announcement.
Fix: add role="alert" aria-live="assertive" to both error divs,
matching the pattern applied in PRs #1463/#1465 by core-uiux for
other tab components.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add focus-visible ring to three buttons missing it:
- Mobile hydration error Retry button
- Desktop hydration error Retry button
- PlatformDownDiagnostic Reload button
- Wrap <Canvas /> in <main aria-label="Agent canvas"> landmark
(WCAG 1.3.1 — main content now has a proper landmark)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- AgentCommsPanel: add focus-visible ring + aria-label to Retry button
(error state). Add focus-visible to CommsTab tab buttons.
- AttachmentViews: add focus-visible ring + aria-label to Remove button
(PendingAttachmentPill) and Download button (AttachmentChip).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Refresh button inside the SecretsTab error state had no focus ring
defined in CSS. Without it, keyboard-only users cannot determine which
element has focus on that error screen.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The free-text model input (shown when /templates returns no models for
the runtime) had a visual <label>Model</label> but the input lacked an
id and the label lacked htmlFor — the association was purely visual.
Added aria-label="Model" to make the name programmatically determinable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The two FilesTab confirm dialogs (delete-all, delete-one) use role="alertdialog"
but were missing aria-modal. These are inline in-page prompts without focus
trapping — aria-modal="false" explicitly documents the non-modal nature so
assistive technology knows the rest of the page remains interactive.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileHome: spawn FAB had no focus indicator — added emerald ring.
MobileMe: accent color swatches (all 8 colors) and theme toggle buttons
(Dark / Light / System) had no focus indicators — added emerald ring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileCanvas: reset zoom button had no focus indicator — added
focus-visible:ring-2 with emerald-500 ring (consistent with other
mobile interactive elements in the same branch).
MobileComms: filter toggle buttons (All / Errors) had no focus indicator
— added focus-visible:ring-2 with emerald-500 ring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileChat: composer textarea had no aria-label — added aria-label="Message".
MobileSpawn: name input had no programmatic label — added aria-label="Agent name".
Both inputs had visible text labels above them but no accessible-name association,
violating WCAG 1.3.1 (info/structure relationships).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "Add new" section had two bare <input> elements with only
placeholder text. Added aria-label="Secret key name" and
aria-label="Secret value" — distinct from the per-row Field
inputs that PR #1453 already fixed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- MissingKeysModal.tsx: Add aria-label to both password inputs
(inside map loops where entry.key is the accessible name source).
WCAG 1.3.1 / 4.1.2.
- AuditTrailPanel.tsx: Add role="status" aria-live="polite" to
the loading state div. WCAG 4.1.3.
- ConversationTraceModal.tsx: Add role="status" aria-live="polite"
to both the loading state and empty state divs. WCAG 4.1.3.
Found via systematic accessibility audit sweep of non-tab components.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tests in ExternalConnectModal.test.tsx used document.querySelector("pre")
which returns the first pre in DOM order. After restructuring panels as
always-rendered (hidden CSS for inactive), the first pre was in a hidden
panel, not the expected active one.
Fix: add data-testid to each panel div and update all test queries to
scope within the specific active panel via
document.querySelector("[data-testid='panel-...']").
All 18 tests pass. Build passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add id=, aria-controls=, and tabIndex= to each role=tab button
- Add id= and role=tabpanel + aria-labelledby= to each snippet panel
- Restructure panels as always-rendered (hidden CSS) so aria-controls
targets are stable — active panel has role=tabpanel, hidden panels
are hidden with aria-hidden semantics via hidden attribute
- Add ArrowRight/ArrowLeft/ArrowDown/ArrowUp + Home/End keyboard
navigation for the tablist (ARIA tab pattern requirement)
- Compute tabList once after filled* vars to share between tab bar
and keyboard handler
WCAG 4.1.3 (Name, Role, Value) — tab controls now have correct
role, aria-selected, aria-controls, and keyboard navigation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Error divs in EventsTab, TracesTab, ChannelsTab, DetailsTab (save/restart/delete),
and ExternalConnectionSection now use role=alert so assistive technology
announces each error immediately when it appears.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Force a new workflow run to pick up the /sop-n/a qa-review
and /sop-n/a security-review declarations from infra-runtime-be
(engineers team) and the [core-security-agent] APPROVED comment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-qa-agent and core-security-agent approve PRs via issue comments,
not the reviews API. The reviews API returns zero entries for comment-only
approvals (internal#348), causing qa-review / security-review gates to
fail on every PR — even when both agents have explicitly approved.
Changes:
- review-check.sh: after reviews-API candidate check fails, fetch
GET /repos/{owner}/{repo}/issues/{N}/comments and extract logins that
posted (a) the agent-prefix pattern ([core-qa-agent] or
[core-security-agent]) OR (b) a generic approval keyword (APPROVED /
LGTM / ACCEPTED, word-anchored, case-insensitive). Non-author filter
is applied. Candidates from comments are merged and fall through to the
team-membership probe, same as reviews-API candidates.
- _review_check_fixture.py: add T15 (agent-prefix match → exit 0),
T16 (generic keyword match → exit 0), T17 (no approval → exit 1)
scenarios with corresponding issue comments endpoint handler.
- test_review_check.sh: add T15, T16, T17 regression tests.
Also fixes a JQ operator-precedence bug in an earlier draft where
`| $cmt.user.login` was placed OUTSIDE the `or` expression, causing the
filter to always output the login (jq resolves bound variables regardless
of the current context). Fixed by using `if-then-elif-else-empty` so the
login projection only fires on a genuine match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /workspaces silently substituted langgraph and returned 201 when a
caller named a `template` (intent for a specific runtime) but the runtime
could not be resolved from it (config.yaml unreadable / no `runtime:`
key). This is the molecule-controlplane#188 / #184 contract violation —
it produced 5/5 wrong-runtime workspaces and a false codex E2E pass.
The ws-server `Create` handler is the boundary the product UI actually
hits (the canvas dialog and provision_workspace MCP tool both POST here);
controlplane#188's CP-side gate is the sibling. This closes the
ws-server side: when the caller expressed runtime intent (passed
`runtime`, or named a `template`) but it cannot be honored, return 422
RUNTIME_UNRESOLVED instead of a silent langgraph 201.
The legitimate default path (bare {"name":...} — no template, no
runtime) still defaults to langgraph and returns 201; a regression test
pins that so the fail-closed gate can't over-fire.
Tests: TestWorkspaceCreate_188_* (missing template, no-runtime-key
template, default-path regression guard, explicit-runtime OK).
Refs: molecule-controlplane#188, #184
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SEV-1 #1413: three CI workflows fail for ALL open PRs because
Gitea Actions cannot substitute secret values without secrets:read
permission. Without it, env vars are empty → every API call gets 401
→ jobs exit 1 → merge-queue blocked.
Fix: add secrets:read to all three workflow permission blocks.
sop-checklist.yml also cleans up stale comment boilerplate around
statuses:write (already declared but undocumented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The broadcast_enabled and talk_to_user_enabled workspace abilities have
complete, wired backends (commit 29b4bffb: workspace_abilities.go,
workspace_broadcast.go, agent_message_writer.go) but no usable canvas
control — so the CTO cannot see or toggle them from the canvas.
- broadcast_enabled (default FALSE): no canvas control existed at all.
- talk_to_user_enabled (default TRUE): only surfaced as the ChatTab
recovery banner, which renders solely when the flag is false and is
therefore invisible under the TRUE default.
Adds an always-visible "Agent Abilities" section to ConfigTab with two
on/off toggles bound to the existing PATCH /workspaces/:id/abilities
endpoint (same call the ChatTab recovery banner uses), optimistic store
updates via updateNodeData with rollback on failure, and server-truth
reconciliation through the existing canvas-topology hydration.
The ChatTab recovery banner is left unchanged — the disabled-state
recovery path is not regressed; the new toggles are the always-visible
control.
Refs internal#510, internal#511.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Error divs in EventsTab, TracesTab, ChannelsTab, DetailsTab (save/restart/delete),
and ExternalConnectionSection now use role=alert so assistive technology
announces each error immediately when it appears.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Screen readers were not announcing error messages in several canvas components.
Each error div now uses role=alert so assistive technology announces the
error immediately and assertively — without the user having to manually
navigate to find the error.
Fixed: ConfigTab, ScheduleTab, MissingKeysModal (per-entry + global),
WorkspaceUsage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Screen readers were not announcing loading or empty states in several
canvas components. Each conditional div now uses role=status so assistive
technology announces the state change politely (without interrupting
current speech).
Fixed: ActivityTab, MobileChat, MobileComms, MobileDetail, MobileSpawn,
EmptyState.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The spinner SVG inside the test-connection button is decorative — it
visualizes loading state alongside the text label. Add aria-hidden="true"
so screen readers ignore it and use only the visible text as the accessible
button name.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 2.4.7: DeleteConfirmDialog Cancel and Delete buttons were missing
:focus-visible rules in settings-panel.css. Keyboard users tabbing to
these dialog buttons would see no visible focus indicator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 2.4.7: keyboard-only users need a visible focus indicator on all
interactive buttons. The Copy, Dismiss, and Revoke buttons in OrgTokensTab
and TokensTab had :hover but no :focus-visible, making focus state
invisible when tabbing to these buttons.
Add focus-visible:ring-2 (accent for copy/dismiss, red-400 for revoke)
to all non-disabled action buttons in both tabs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a2a_mcp_server.py main()'s stdio read loop used
`await loop.run_in_executor(None, stdin.read, 65536)`. On a PIPE,
read(n) blocks until n bytes accumulate OR EOF. A live MCP client
(openclaw bundle-mcp, Claude Code, Cursor) sends one ~150-byte
newline-delimited request and keeps stdin OPEN waiting for the reply,
so neither condition is met: the server never parses `initialize` and
the client times out (~30s; openclaw: "MCP error -32000: Connection
closed"). This silently broke peer visibility for every pipe-spawned
MCP host while passing all existing stdio tests, which only fed stdin
from a regular file or a heredoc-pipe that CLOSES (EOF returns
immediately). readline() returns as soon as one newline-delimited
line is available — exactly the JSON-RPC framing — and is
backward-compatible with the EOF/file cases.
Root cause of the 2026-05-15 openclaw peer-visibility outage
(workspace 95744c11): the molecule MCP server could not complete the
handshake over openclaw's stdio pipe, so the agent fell back to
native sessions_list. The openclaw template adapter fix
(template-openclaw#16) works around this via HTTP transport; this
patch fixes the stdio root cause so stdio works for all CLI MCP hosts.
Regression coverage:
- tests/test_a2a_mcp_server.py::TestStdioKeepOpenPipe — spawns the
real a2a_mcp_server.py, writes one request over a pipe, and
DELIBERATELY keeps stdin open. FAILS (15s timeout, empty response)
on read(65536); PASSES on readline(). Verified both directions.
- ci-mcp-stdio-transport.yml: new "pipe held OPEN, no EOF" step that
reproduces the literal openclaw failure (the prior steps only
exercised EOF-closing stdin, which is why the outage shipped green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 23:43:14 -07:00
270 changed files with 13124 additions and 10168 deletions
echo"::error::${TEAM}-review: non-author review(s) were SUBMITTED but stored as PENDING — almost certainly the wrong Gitea review event string (internal#503)."
echo"::error::Gitea accepts ONLY the exact enum APPROVED / REQUEST_CHANGES / COMMENT. 'APPROVE' or lowercase is silently (HTTP 200) filed as PENDING and is invisible to this gate."
[ -n "${_rid:-}"]&&echo"::error:: review id=${_rid} by '${_rl}': RE-SUBMIT via POST ${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}/reviews with {\"event\":\"APPROVED\"} (correct enum) — do NOT edit the DB."
done
fi
# --- Fallback (internal#348): check issue comments for agent-approval ---
# core-qa-agent and core-security-agent approve via issue comments, NOT
# the reviews API. The reviews API returns zero entries for comment-only
# approvals. This fallback reads PR issue comments and extracts logins that:
# 1. Posted a comment matching the agent-prefix pattern for this gate:
# qa → "[core-qa-agent] APPROVED"
# security → "[core-security-agent] APPROVED"
# OR posted a generic approval keyword (word-anchored, case-insensitive):
# APPROVED / LGTM / ACCEPTED
# 2. Are not the PR author
# 3. The team-membership probe below is the authoritative filter.
echo"::notice::${TEAM}-review: reviews API found no APPROVED reviews; found $(echo"$CANDIDATES"| wc -w | xargs) comment-based approval candidate(s) — verifying team membership..."
fi
else
debug "could not fetch issue comments (HTTP ${HTTP_CODE})"
fi
fi
if[ -z "${CANDIDATES:-}"];then
echo"::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (no candidates from reviews API or issue comments)"
echo "::warning::review-refire-comments.yml is deprecated. Refire logic is now in sop-checklist.yml review-refire job. This workflow is a no-op stub pending deletion (issue #1280)."
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::canary teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/canary-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::canary teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
# Strip the package-import prefix so we can match .coverage-allowlist.txt
# entries written as paths relative to workspace-server/.
# Handle both module paths: platform/workspace-server/... and platform/...
rel=$(echo "$file" | sed 's|^github.com/molecule-ai/molecule-monorepo/platform/workspace-server/||; s|^github.com/molecule-ai/molecule-monorepo/platform/||')
if echo "$ALLOWLIST" | grep -qxF "$rel"; then
echo "::warning file=workspace-server/$rel::Critical file at ${pct}% coverage (allowlisted, #1823) — fix before expiry."
WARNED=$((WARNED+1))
else
echo "::error file=workspace-server/$rel::Critical file at ${pct}% coverage — must be >=10% (target 80%). See #1823. To acknowledge as known debt, add this path to .coverage-allowlist.txt."
FAILED=$((FAILED+1))
fi
done < /tmp/perfile.txt
done
echo ""
echo "Critical-path check: $FAILED new failures, $WARNED allowlisted warnings."
if [ "$FAILED" -gt 0 ]; then
echo ""
echo "$FAILED security-critical file(s) have <10% test coverage and are"
echo "NOT in the allowlist. These paths handle auth, tokens, secrets, or"
echo "workspace provisioning — a 0% file here is the exact gap that let"
echo "CWE-22, CWE-78, KI-005 slip through in past incidents. Either:"
# e2e-api job moved to .github/workflows/e2e-api.yml (issue #458).
# It now has workflow-level concurrency (cancel-in-progress: false) so
# new pushes queue the E2E run rather than cancelling it at the run level.
# Shellcheck (E2E scripts) — required check, always runs. See
# platform-build for the rationale.
shellcheck:
name:Shellcheck (E2E scripts)
needs:changes
runs-on:ubuntu-latest
steps:
- if:needs.changes.outputs.scripts != 'true'
run:echo "No tests/e2e/ or infra/scripts/ changes — skipping real shellcheck; this job always runs to satisfy the required-check name on branch protection."
# fires = ~30 min cadence; closer to the 20-min target than the
# current shape and provides a real degradation alarm if drops
# get worse.
- cron:'2,12,22,32,42,52 * * * *'
workflow_dispatch:
inputs:
runtime:
description:"Runtime to provision (claude-code = default + cheapest via MiniMax; langgraph = OpenAI-only; hermes = SDK-native path, slower)"
required:false
default:"claude-code"
type:string
model_slug:
description:"Model id to provision the workspace with (default MiniMax-M2.7-highspeed; e.g. 'sonnet' to test direct Anthropic, 'openai/gpt-4o' for hermes)"
required:false
default:"MiniMax-M2.7-highspeed"
type:string
keep_org:
description:"Skip teardown for post-mortem debugging (only manual dispatch — never set this for cron runs)"
required:false
default:false
type:boolean
permissions:
contents:read
# No issue-write here — failures surface as red runs in the workflow
# history. If you want auto-issue-on-fail, add a follow-up step that
# uses gh issue create gated on `if: failure()`. Keeping the surface
# minimal until that's actually wanted.
# Serialize so two firings can never overlap. Cron firing every 20 min
# but scripts conservatively bounded at 10 min — overlap shouldn't
# happen in steady state, but if a run hangs we don't want N more
if docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG; then
echo "Redis ready after ${i}s"
exit 0
fi
sleep 1
done
echo "::error::Redis did not become ready in 15s"
docker logs "$REDIS_CONTAINER" || true
exit 1
- name:Build platform
if:needs.detect-changes.outputs.api == 'true'
working-directory:workspace-server
run:go build -o platform-server ./cmd/server
- name:Start platform (background)
if:needs.detect-changes.outputs.api == 'true'
working-directory:workspace-server
run:|
# DATABASE_URL + REDIS_URL exported by the start-postgres /
# start-redis steps point at this run's per-run host ports.
./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name:Wait for /health
if:needs.detect-changes.outputs.api == 'true'
run:|
for i in $(seq 1 30); do
if curl -sf http://127.0.0.1:8080/health > /dev/null; then
echo "Platform up after ${i}s"
exit 0
fi
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true
exit 1
- name:Assert migrations applied
if:needs.detect-changes.outputs.api == 'true'
run:|
tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc "SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::canvas teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/canvas-cleanup.out 2>/dev/null)"
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::external teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/external-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::external teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
else
echo "Safety-net sweep: no leftover orgs to clean."
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::saas teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/saas-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::saas teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::sanity teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/sanity-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::sanity teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
print(f"::error file={f}::Curl status-capture pollution: '|| echo \"000\"' inside a $(curl ... -w '%{{http_code}}' ...) subshell. On non-2xx or connection failure, curl's -w writes a status, then exits non-zero, then the || echo appends another '000' — producing 'HTTP 000000' or '409000' that fails comparisons silently. Fix: route -w into a tempfile so the exit code can't pollute stdout. See memory feedback_curl_status_capture_pollution.md.")
echo "::notice::RAILWAY_AUDIT_TOKEN not configured — soft-skipping (manual dispatch)"
exit 0
fi
echo "::error::RAILWAY_AUDIT_TOKEN secret missing — schedule trigger requires it. Provision the token (read-only \`variables\` scope on the molecule-platform Railway project) and store as repo secret RAILWAY_AUDIT_TOKEN."
`Daily Railway pin audit found drift-prone image-tag pins in the molecule-platform Railway project.\n\n` +
`**What this means:** an env var (likely on \`controlplane\`) is pinned to a SHA-shaped or semver tag instead of a floating tag. ` +
`Same pattern that caused the 2026-04-24 TENANT_IMAGE incident — fix-PRs land but the running service doesn't pick them up.\n\n` +
`**Recovery:** open the Railway dashboard, replace the flagged value with a floating tag (\`:staging-latest\`, \`:main\`) unless the pin is intentional and documented in the ops runbook.\n\n` +
# Belt-and-suspenders sanity floor: same logic as the staging
# variant — see that file's comment for the full rationale.
# Floor only applies when fleet >= 4; below that, canary-verify
# is the actual gate.
TOTAL_VERIFIED=${#SLUGS[@]}
if [ $TOTAL_VERIFIED -ge 4 ] && [ $UNREACHABLE_COUNT -gt $((TOTAL_VERIFIED / 2)) ]; then
echo "::error::$UNREACHABLE_COUNT of $TOTAL_VERIFIED tenant(s) unreachable — exceeds 50% threshold on a fleet large enough that this signals a real outage, not teardown race."
exit 1
fi
if [ $STALE_COUNT -gt 0 ]; then
echo "::error::$STALE_COUNT tenant(s) returned a stale SHA. ssm_status=Success was misleading — see job summary."
exit 1
fi
echo "::notice::Tenant fleet redeploy complete — all reachable tenants on ${EXPECTED_SHA:0:7} (${UNREACHABLE_COUNT} unreachable, soft-warned)."
# Auto-refresh staging tenant EC2s after every staging-branch merge.
#
# Mirror of redeploy-tenants-on-main.yml, with the staging-CP host and
# the :staging-latest tag. Sister workflow exists for prod (rolls
# :latest after canary-verify). Both share the same shape — just
# different CP_URL + target_tag + admin token secret.
#
# Why this workflow exists: publish-workspace-server-image now builds
# on every staging-branch push (PR #2335), pushing
# platform-tenant:staging-latest to GHCR. Existing tenants pulled
# their image once at boot and never re-pull, so the new image just
# sits unused until the tenant is reprovisioned.
#
# This workflow closes the gap by calling staging-CP's
# /cp/admin/tenants/redeploy-fleet, which performs a canary-first,
# batched, health-gated SSM redeploy across every live staging tenant.
# Same endpoint shape as prod CP — only the host differs.
#
# Runtime ordering:
# 1. publish-workspace-server-image completes on staging branch →
# new :staging-latest in GHCR.
# 2. This workflow fires via workflow_run, waits 30s for GHCR's CDN
# to propagate the new tag.
# 3. Calls redeploy-fleet with no canary (staging IS canary; we don't
# need a sub-canary inside it). Soak still applies to the first
# tenant in case of bad-deploy detection.
# 4. Any failure aborts the rollout and leaves older tenants on the
# prior image — safer default than half-and-half state.
#
# Rollback path: re-run with workflow_dispatch + target_tag=staging-<sha>
# of a known-good build.
on:
workflow_run:
workflows:['publish-workspace-server-image']
types:[completed]
branches:[main]
workflow_dispatch:
inputs:
target_tag:
description:'Tenant image tag to deploy (e.g. "staging-latest" or "staging-a59f1a6c"). Defaults to staging-latest when empty.'
required:false
type:string
default:'staging-latest'
canary_slug:
description:'Tenant slug to deploy first + soak (empty = skip canary, fan out immediately). Default empty for staging since staging itself is the canary.'
required:false
type:string
default:''
soak_seconds:
description:'Seconds to wait after canary before fanning out. Only meaningful if canary_slug is set.'
required:false
type:string
default:'60'
batch_size:
description:'How many tenants SSM redeploys in parallel per batch.'
required:false
type:string
default:'3'
dry_run:
description:'Plan only — do not actually redeploy.'
required:false
type:boolean
default:false
permissions:
contents:read
# No write scopes needed — the workflow hits an external CP endpoint,
# not the GitHub API.
# Serialize per-branch so two rapid staging pushes' redeploys don't
# overlap and cause confusing per-tenant SSM state. cancel-in-progress
# is false because aborting a half-rolled-out fleet leaves tenants
# stuck on whatever image they happened to be on when cancelled.
concurrency:
group:redeploy-tenants-on-staging
cancel-in-progress:false
jobs:
redeploy:
# Skip the auto-trigger if publish-workspace-server-image didn't
# actually succeed. workflow_run fires on any completion state; we
# don't want to redeploy against a half-built image.
echo "::warning::redeploy-fleet returned HTTP 500 but every failed tenant ($COUNT) is ephemeral (e2e-*/rt-e2e-*) — treating as teardown race, soft-warning."
printf '%s\n' "$FAILED_SLUGS" | sed 's/^/::warning:: failed: /'
elif [ "$HTTP_CODE" != "200" ]; then
echo "::error::redeploy-fleet returned HTTP $HTTP_CODE"
if [ -n "$NON_EPHEMERAL_FAILED" ]; then
echo "::error::non-ephemeral tenant(s) failed:"
printf '%s\n' "$NON_EPHEMERAL_FAILED" | sed 's/^/::error:: /'
fi
exit 1
else
# HTTP=200 but ok=false (shouldn't happen with current CP
# but keep the gate for completeness).
echo "::error::redeploy-fleet reported ok=false (see summary for which tenant halted the rollout)"
exit 1
fi
echo "::notice::Staging tenant fleet redeploy reported ssm_status=Success — verifying actual image roll on each tenant..."
if [ $TOTAL_VERIFIED -ge 4 ] && [ $UNREACHABLE_COUNT -gt $((TOTAL_VERIFIED / 2)) ]; then
echo "::error::$UNREACHABLE_COUNT of $TOTAL_VERIFIED staging tenant(s) unreachable — exceeds 50% threshold on a fleet large enough that this signals a real outage, not teardown race."
exit 1
fi
if [ $STALE_COUNT -gt 0 ]; then
echo "::error::$STALE_COUNT staging tenant(s) returned a stale SHA. ssm_status=Success was misleading — see job summary."
exit 1
fi
echo "::notice::Staging tenant fleet redeploy complete — all reachable tenants on ${EXPECTED_SHA:0:7} (${UNREACHABLE_COUNT} unreachable, soft-warned)."
# and sweep-cf-tunnels (hardened 2026-04-28). Same principle:
# - schedule → exit 1 on missing secrets (red CI surfaces it)
# - workflow_dispatch → exit 0 with warning (operator-driven,
# they already accepted the repo state)
run:|
missing=()
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY CP_PROD_ADMIN_TOKEN CP_STAGING_ADMIN_TOKEN; do
if [ -z "${!var:-}" ]; then
missing+=("$var")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::skipping sweep — secrets not configured: ${missing[*]}"
echo "::warning::set them at Settings → Secrets and Variables → Actions, then rerun."
echo "::warning::AWS_JANITOR_* must belong to a principal with secretsmanager:ListSecrets and secretsmanager:DeleteSecret on molecule/tenant/* (the prod molecule-cp principal lacks ListSecrets)."
echo "skip=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::sweep cannot run — required secrets missing: ${missing[*]}"
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow."
echo "::error::AWS_JANITOR_* must belong to a principal with secretsmanager:ListSecrets and secretsmanager:DeleteSecret on molecule/tenant/*."
# The earlier soft-skip-on-schedule policy hid a real leak. All
# six secrets were unset on this repo for an unknown duration;
# every hourly run printed a yellow ::warning:: and exited 0,
# so the workflow registered as "passing" while doing nothing.
# CF orphans accumulated to 152/200 (~76% of the zone quota
# gone) before a manual `dig`-driven audit caught it. Anything
# that runs as a janitor and reports green while idle is
# indistinguishable from "the janitor is healthy" — so we now
# treat schedule (and any future workflow_run/push triggers)
# as a hard-fail when secrets are missing.
#
# - schedule / workflow_run / push → exit 1 (red CI run
# surfaces the misconfiguration the next tick)
# - workflow_dispatch → exit 0 with a warning
# (an operator ran this ad-hoc; they already accepted the
# state of the repo and want the workflow to short-circuit
# so they can rerun after fixing the secret)
run:|
missing=()
for var in CF_API_TOKEN CF_ZONE_ID CP_PROD_ADMIN_TOKEN CP_STAGING_ADMIN_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
missing+=("$var")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::skipping sweep — secrets not configured: ${missing[*]}"
echo "::warning::set them at Settings → Secrets and Variables → Actions, then rerun."
echo "skip=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::sweep cannot run — required secrets missing: ${missing[*]}"
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow."
echo "::error::a silent skip masked an active CF DNS leak (152/200 zone records) caught only by a manual audit on 2026-04-28; this gate exists to make the gap visible."
echo "Found $count stale e2e org(s) older than ${MAX_AGE_MINUTES}m"
if [ "$count" -gt 0 ]; then
echo "First 20:"
head -20 stale_slugs.txt | sed 's/^/ /'
fi
echo "count=$count" >> "$GITHUB_OUTPUT"
- name:Safety gate
if:steps.identify.outputs.count != '0'
run:|
count="${{ steps.identify.outputs.count }}"
if [ "$count" -gt "$SAFETY_CAP" ]; then
echo "::error::Refusing to delete $count orgs in one sweep (cap=$SAFETY_CAP). Investigate manually — this usually means the CP admin API returned no created_at or returned a degraded result. Re-run with workflow_dispatch + max_age_minutes if intentional."
echo "::warning::orphan-tunnels cleanup returned HTTP $http_code — body: $body"
fi
- name:Dry-run summary
if:env.DRY_RUN == 'true'
run:|
echo "DRY RUN — would have deleted ${{ steps.identify.outputs.count }} org(s) AND triggered orphan-tunnels cleanup. Re-run with dry_run=false to actually delete."
| Engineering docs (`docs/adr/`, `docs/architecture/`, `docs/incidents/`) | This repo (internal, not published) |
| Live product pages (e.g. `canvas/src/app/pricing/page.tsx`) | This repo (these are app code, not marketing copy) |
If a PR fails the `Block forbidden paths` check, the contents belong in
`Molecule-AI/docs`. No CI drag, no Canvas E2E, content lands in minutes.
`molecule-ai/docs`. No CI drag, no Canvas E2E, content lands in minutes.
## Development Workflow
@@ -106,7 +106,7 @@ causing a render loop when any node position changed.
#### Auto-merge & the "extra commit" trap
**Two system guards protect against pushing commits after auto-merge has been enabled.** Don't try to work around them — they exist because we shipped a half-merged PR on 2026-04-27 (`#2174` merged with only its first commit; the second was orphaned on a branch GitHub had already deleted).
**Two system guards protect against pushing commits after auto-merge has been enabled.** Don't try to work around them — they exist because we shipped a half-merged PR on 2026-04-27 (`#2174` merged with only its first commit; the second was orphaned on a branch the host had already deleted).
1.**Repo-wide:** "Automatically delete head branches" is on. Once a PR merges, the branch is deleted server-side. Any subsequent `git push` to that branch fails with `remote rejected — no such branch`.
@@ -145,7 +145,7 @@ Fix violations before committing — the hook will reject the commit.
### CI Pipeline
CI runs on GitHub Actions with a self-hosted runner. External contributors:
CI runs on Gitea Actions with self-hosted runners. External contributors:
PRs from forks will not trigger CI automatically. A maintainer will review
and run CI manually.
@@ -190,9 +190,9 @@ Runs the full regression suite against a fixture HTTP server. No network access
Code in this repo lands in molecule-core. Some related runtime artifacts
live in their own repos:
- [`Molecule-AI/molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`Molecule-AI/molecule-sdk-python`](https://git.moleculesai.app/molecule-ai/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`Molecule-AI/molecule-mcp-claude-channel`](https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install with `claude --channels plugin:molecule@Molecule-AI/molecule-mcp-claude-channel`.
- [`molecule-ai/molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`molecule-ai/molecule-sdk-python`](https://git.moleculesai.app/molecule-ai/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`molecule-ai/molecule-mcp-claude-channel`](https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install inside Claude Code via `/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git` → `/plugin install molecule@molecule-channel`, then launch with `claude --dangerously-load-development-channels=plugin:molecule@molecule-channel`.
When extending the **A2A surface** in molecule-core (`workspace-server/internal/handlers/a2a_proxy.go` etc.), consider whether the change has a downstream impact on the runtime SDK or the channel plugin — they're versioned independently but share the wire shape.
@@ -206,7 +206,7 @@ See `CLAUDE.md` for detailed architecture documentation, including:
## Reporting Issues
Use GitHub Issues with a clear title and reproduction steps. Include:
Use Gitea Issues with a clear title and reproduction steps. Include:
- What you expected
- What actually happened
- Platform/OS version
@@ -214,8 +214,9 @@ Use GitHub Issues with a clear title and reproduction steps. Include:
## Security
If you discover a security vulnerability, please report it privately via
GitHub Security Advisories rather than opening a public issue.
If you discover a security vulnerability, please report it privately by
opening an issue against `molecule-ai/internal` (a private repo only
maintainers can see) rather than filing a public issue here.
@@ -238,7 +238,7 @@ The result is not just “an agent that learns.” It is **an organization that
- subscribe to one or more workspaces; peer messages surface as conversation turns; replies route back through Molecule's A2A
- no tunnel, no public endpoint — the plugin self-registers each watched workspace as `delivery_mode=poll` and long-polls `/activity?since_id=…`
- multi-tenant friendly: one plugin install can watch workspaces across multiple Molecule tenants (`MOLECULE_PLATFORM_URLS` per-workspace)
- install via the standard marketplace flow: `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel` → `/plugin install molecule-channel@molecule-mcp-claude-channel`
- install via the standard marketplace flow: `/plugin marketplace add https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel.git` → `/plugin install molecule@molecule-channel`, then launch with `claude --dangerously-load-development-channels=plugin:molecule@molecule-channel`
"Molecule AI is an org-chart canvas for AI agent teams. Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace with credit metering, audit, and one-click runtime provisioning.",
applicationName:"Molecule AI",
keywords:[
"AI agents",
"multi-agent",
"agent orchestration",
"AI org chart",
"Claude Code",
"Codex",
"MCP",
"agent governance",
"A2A",
"agent runtime",
],
authors:[{name:"Molecule AI"}],
creator:"Molecule AI",
publisher:"Molecule AI",
alternates:{canonical:"/"},
// OG + Twitter images come from the file-convention sibling
// `opengraph-image.tsx` — Next.js auto-attaches them to og:image
// and twitter:image when present at the segment root. We keep the
// text fields here so they win over per-page metadata when a page
// doesn't override them. `images: []` as the structural fallback
// for hosts that won't follow the file convention; the real URL
// is injected by Next.js at build time from opengraph-image.tsx.
openGraph:{
type:"website",
siteName:"Molecule AI",
url: SITE_URL,
title:"Molecule AI — the AI org chart canvas",
description:
"Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace. Credit metering, audit, and one-click runtime provisioning.",
locale:"en_US",
},
twitter:{
card:"summary_large_image",
title:"Molecule AI — the AI org chart canvas",
description:
"Wire Claude Code, Codex, Hermes, and OpenClaw agents into a governed multi-agent workspace.",
},
icons:{
icon:"/molecule-icon.png",
apple:"/molecule-icon.png",
},
// robots.ts owns the per-route allow/disallow contract; this is the
// header-level fallback for routes the crawler reaches before
// robots.txt resolves. Default = index public marketing routes;
label="Universal MCP — standalone register + heartbeat + tools for any MCP-aware runtime (Claude Code, hermes, codex). Pair with Python or Claude Code tab if you need inbound A2A delivery."
copyKey="mcp"
copied={copiedKey==="mcp"}
onCopy={()=>copy(filledUniversalMcp,"mcp")}
/>
)}
{tab==="hermes"&&filledHermes&&(
<SnippetBlock
value={filledHermes}
label="Hermes channel — bridges this workspace's A2A traffic into your hermes-agent session as platform messages (push parity with Claude Code). Long-poll based; no tunnel needed."
copyKey="hermes"
copied={copiedKey==="hermes"}
onCopy={()=>copy(filledHermes,"hermes")}
/>
)}
{tab==="codex"&&filledCodex&&(
<SnippetBlock
value={filledCodex}
label="Codex MCP config — wires the molecule MCP server into ~/.codex/config.toml. Outbound tools today; inbound A2A push needs the Python SDK tab paired in (codex's MCP runtime doesn't route arbitrary notifications/* yet)."
copyKey="codex"
copied={copiedKey==="codex"}
onCopy={()=>copy(filledCodex,"codex")}
/>
)}
{tab==="openclaw"&&filledOpenClaw&&(
<SnippetBlock
value={filledOpenClaw}
label="OpenClaw MCP config — wires the molecule MCP server via openclaw mcp set + starts the gateway on loopback. Outbound tools today; inbound A2A push on an external openclaw needs the Python SDK tab paired in (a sessions.steer bridge daemon is future work)."
copyKey="openclaw"
copied={copiedKey==="openclaw"}
onCopy={()=>copy(filledOpenClaw,"openclaw")}
/>
)}
{tab==="kimi"&&filledKimi&&(
<SnippetBlock
value={filledKimi}
label="Kimi CLI — self-contained Python bridge. Registers, heartbeats, polls for canvas messages, and echoes replies back. NAT-safe (no public URL). Run in a background terminal or via launchd."
label="Universal MCP — standalone register + heartbeat + tools for any MCP-aware runtime (Claude Code, hermes, codex). Pair with Python or Claude Code tab if you need inbound A2A delivery."
label="Hermes channel — bridges this workspace's A2A traffic into your hermes-agent session as platform messages (push parity with Claude Code). Long-poll based; no tunnel needed."
label="OpenClaw MCP config — wires the molecule MCP server via openclaw mcp set + starts the gateway on loopback. Outbound tools today; inbound A2A push on an external openclaw needs the Python SDK tab paired in (a sessions.steer bridge daemon is future work)."
copyKey="openclaw"
copied={copiedKey==="openclaw"}
onCopy={()=>copy(filledOpenClaw,"openclaw")}
/>
)}
</div>
{/* Kimi tab */}
<div
id="panel-kimi"
data-testid="panel-kimi"
role="tabpanel"
aria-labelledby="tab-kimi"
hidden={tab!=="kimi"||!filledKimi}
className={tab==="kimi"&&filledKimi?"":"hidden"}
>
{filledKimi&&(
<SnippetBlock
value={filledKimi}
label="Kimi CLI — self-contained Python bridge. Registers, heartbeats, polls for canvas messages, and echoes replies back. NAT-safe (no public URL). Run in a background terminal or via launchd."
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.