Compare commits

..

4 Commits

Author SHA1 Message Date
Molecule AI Dev Engineer A (Kimi) 323c080743 test(handlers): add org_scope coverage — orgRootID + sameOrg at 0% (#1953)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Check migration collisions / Migration version collision check (pull_request) Successful in 20s
CI / Detect changes (pull_request) Successful in 19s
CI / Python Lint & Test (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 14s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 12s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 37s
security-review / approved (pull_request) Failing after 11s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 1m12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m40s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m17s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m11s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m24s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 4m28s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m21s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
gate-check-v3 / gate-check (pull_request) Successful in 12s
qa-review / approved (pull_request) Failing after 8s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 32s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request) Successful in 8s
sop-checklist / all-items-acked (pull_request) Successful in 9s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m21s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m11s
CI / Canvas (Next.js) (pull_request) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m42s
E2E Chat / E2E Chat (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m10s
CI / Platform (Go) (pull_request) Successful in 5m28s
CI / all-required (pull_request) Successful in 36m6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request_target) Has been skipped
Adds 10 sqlmock-backed tests covering the cross-tenant isolation
helpers introduced in #1953.

Covered:
- orgRootID: happy path (child→root), workspace-is-root, no rows,
  DB error, empty root string
- sameOrg: identical IDs (short-circuit), same org root,
  different org roots, orgRootID fails, orgRootID not found

Closes #1953 follow-up (test debt)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 04:59:11 +00:00
Molecule AI Dev Engineer A (Kimi) 952949ee95 test(handlers): add PatchAbilities coverage — workspace_abilities.go at 0%
Adds 11 sqlmock-backed tests covering the PATCH /workspaces/:id/abilities
handler (PatchAbilities):

- Invalid workspace ID → 400
- Invalid JSON body → 400
- Empty body (no fields) → 400
- Workspace not found → 404
- Existence query error → 404 (fail-closed)
- Patch broadcast_enabled only → 200
- Patch talk_to_user_enabled only → 200
- Patch both fields → 200
- DB error on broadcast update → 500
- DB error on talk_to_user update → 500
- DB error on broadcast when both supplied → 500 (partial update not committed)

Closes #1312

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 04:59:11 +00:00
Molecule AI Dev Engineer A (Kimi) 049df05ca5 ci(workflows): flip cancel-in-progress false→true on 16 workflows (#1357)
Gitea 1.22.6 does not honor cancel-in-progress: false for scheduled/push
events — queued runs accumulate as stale scheduled tasks instead of
waiting, saturating the runner pool (#1357). Flipping to true lets
obsolete in-flight runs cancel correctly, freeing slots.

Safe-flip set (PM + Eng B reviewed, 16 workflows):
- ci-required-drift, staging-smoke, e2e-staging-sanity
- sweep-cf-orphans, sweep-aws-secrets, sweep-cf-tunnels, sweep-stale-e2e-orgs
- e2e-chat, e2e-legacy-advisory, e2e-peer-visibility, e2e-staging-canvas
- continuous-synth-e2e, railway-pin-audit
- handlers-postgres-integration, harness-replays, e2e-api

Excluded (protected — half-rolled fleet / auto-promote / merge ordering):
- e2e-staging-external, e2e-staging-saas, gitea-merge-queue
- redeploy-tenants-on-staging, redeploy-tenants-on-main
- main-red-watchdog, publish-workspace-server-image, status-reaper
- gate-check-v3

Fixes #1357

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 04:59:11 +00:00
Molecule AI Dev Engineer A (Kimi) 0078f702ee ci(workflows): renew continue-on-error tracker mc#774 → mc#1982
mc#774 reached its 14-day renewal cap, causing lint-continue-on-error-tracking
to fail across all workflow PRs and making main red (#1975). Renew the forced-
renewal tracker by creating mc#1982 and updating all 37 job-level mask comments.

Affected: 34 workflow files with continue-on-error: true directives.
Next renewal due: 2026-06-11.

Fixes #1975
Refs: mc#774, mc#1982, feedback_chained_defects_in_never_tested_workflows

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 04:59:11 +00:00
270 changed files with 2956 additions and 25363 deletions
+6 -15
View File
@@ -19,22 +19,13 @@ REDIS_URL=redis://localhost:6379
# itself to 3000 in canvas/package.json, so sourcing this file before
# `npm run dev` won't accidentally make Next.js try to bind 8080.
PORT=8080
# ---- Admin credential — REQUIRED in EVERY environment (auth is fail-closed) ----
# Auth is fail-CLOSED everywhere now (harden/no-fail-open-auth): there is NO
# dev-mode escape hatch. AdminAuth / WorkspaceAuth / discovery all require a
# real credential. The canvas authenticates by sending this value as a bearer
# (it reads NEXT_PUBLIC_ADMIN_TOKEN — set it to the SAME value).
# ---- Admin credential — REQUIRED to close issue #684 (AdminAuth bearer bypass) ----
# When ADMIN_TOKEN is set, only this value is accepted on /admin/* and /approvals/* routes.
# (When unset, a fresh install 401s on admin routes and any valid workspace bearer
# is the only deprecated fallback once tokens exist — set ADMIN_TOKEN to close #684.)
# Generate: openssl rand -base64 32 (scripts/dev-start.sh provisions a fixed dev value)
# Without it, any valid workspace bearer token can call admin endpoints (backward compat
# fallback, still vulnerable). Set this in every environment, rotate when compromised.
# Generate: openssl rand -base64 32
# Store in fly secrets / deployment env — NEVER commit the actual value here.
ADMIN_TOKEN=
# NEXT_PUBLIC_ADMIN_TOKEN= # Canvas-side mirror of ADMIN_TOKEN. The canvas
# bakes this into its bundle and sends it as the
# bearer. MUST equal ADMIN_TOKEN (next.config.ts
# warns if the pair is half-set). dev-start.sh
# exports it for you.
SECRETS_ENCRYPTION_KEY= # 32-byte key (raw or base64). Leave empty for plaintext (dev only).
CONFIGS_DIR= # Path to workspace-configs-templates/ (auto-discovered if empty)
PLUGINS_DIR= # Path to plugins/ directory (default: /plugins in container)
@@ -43,7 +34,7 @@ PLUGINS_DIR= # Path to plugins/ directory (default: /plugins i
# MOLECULE_MCP_ALLOW_SEND_MESSAGE= # Set to "true" to include send_message_to_user in the MCP bridge tool list (issue #810). Excluded by default to prevent unintended WebSocket pushes from CLI sessions.
# MOLECULE_MCP_URL=http://localhost:8080 # Platform URL for opencode MCP config (opencode.json). Same as PLATFORM_URL; separate var so opencode configs can reference it without ambiguity.
# WORKSPACE_DIR= # Optional global host path bind-mounted to /workspace in every container. Per-workspace workspace_dir column overrides this; if neither is set each workspace gets an isolated Docker named volume.
MOLECULE_ENV=development # Environment label (development/staging/production). Used for log tagging and for NON-security local-dev conveniences (loopback HTTP bind, relaxed rate-limit bucket). It is NOT an auth lever — auth is fail-closed in every environment. SaaS deployments MUST set MOLECULE_ENV=production.
MOLECULE_ENV=development # Environment label (development/staging/production). Used for log tagging and for the AdminAuth dev-mode escape hatch (lets the Canvas dashboard keep working after the first workspace is created, when ADMIN_TOKEN is unset). SaaS deployments MUST set MOLECULE_ENV=production.
# MOLECULE_ENABLE_TEST_TOKENS= # Set to 1 to expose GET /admin/workspaces/:id/test-token (mints a fresh bearer token for E2E scripts). The route is auto-enabled when MOLECULE_ENV != production; this flag is the explicit override. Leave unset/0 in prod — the route 404s unless enabled.
# MOLECULE_ORG_ID= # SaaS only: org UUID set by control plane on tenant machines. When set, workspace provisioning auto-routes through the control plane API instead of Docker.
# CP_PROVISION_URL= # Override control plane URL for workspace provisioning (default: https://api.moleculesai.app). Only needed for testing against a non-production control plane.
@@ -60,7 +51,7 @@ MOLECULE_ENV=development # Environment label (development/
# MOLECULE_IN_DOCKER= # Set when running the platform inside Docker (accepts 1/0, true/false). Triggers A2A proxy to rewrite 127.0.0.1:<port> agent URLs to Docker bridge hostnames. Auto-detected via /.dockerenv; only set if detection fails or to force off.
# GitHub
# GITHUB_REPO=owner/repo # Target repo for agent initial_prompt clone (e.g. Molecule-AI/molecule-core). Read inside workspace containers.
# GITHUB_REPO=owner/repo # Target repo for agent initial_prompt clone (e.g. Molecule-AI/molecule-monorepo). Read inside workspace containers.
# GITHUB_TOKEN= # Personal access token / installation token used by agents that clone private repos. Register as a global secret via POST /admin/secrets for propagation to workspace env. Token is used in-URL during clone and then scrubbed from .git/config via `git remote set-url`.
# Webhooks
+12 -28
View File
@@ -18,24 +18,15 @@
# per §SOP-6 security model). No-op when merged=false.
#
# Required env (set by the workflow):
# GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER
# plus one of REQUIRED_CHECKS_JSON (preferred) or REQUIRED_CHECKS (legacy)
# GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER, REQUIRED_CHECKS
#
# REQUIRED_CHECKS_JSON is a JSON object keyed by branch name. Each value
# is an array of status-check context names that branch protection
# requires for that branch. The script looks up the PR's base branch and
# evaluates only the checks declared for that branch.
#
# {"main": ["CI / all-required (pull_request)", ...],
# "staging": ["CI / all-required (pull_request)", ...]}
#
# REQUIRED_CHECKS (legacy) is a newline-separated list used when the
# JSON variable is not set. Declared in the workflow YAML rather than
# fetched from /branch_protections (which needs admin scope — sop-tier-bot
# has read-only). Trade dynamism for simplicity: when the required-check
# set changes, update both branch protection AND this env. Keeping them
# in sync is less complexity than granting the audit bot admin perms on
# every repo.
# REQUIRED_CHECKS is a newline-separated list of status-check context
# names that branch protection requires. Declared in the workflow YAML
# rather than fetched from /branch_protections (which needs admin
# scope — sop-tier-bot has read-only). Trade dynamism for simplicity:
# when the required-check set changes, update both branch protection
# AND this env. Keeping them in sync is less complexity than granting
# the audit bot admin perms on every repo.
set -euo pipefail
@@ -43,10 +34,7 @@ set -euo pipefail
: "${GITEA_HOST:?required}"
: "${REPO:?required}"
: "${PR_NUMBER:?required}"
if [ -z "${REQUIRED_CHECKS_JSON:-}" ] && [ -z "${REQUIRED_CHECKS:-}" ]; then
echo "::error::Either REQUIRED_CHECKS_JSON or REQUIRED_CHECKS must be set"
exit 1
fi
: "${REQUIRED_CHECKS:?required (newline-separated context names)}"
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
@@ -77,14 +65,10 @@ if [ -z "$MERGE_SHA" ]; then
exit 0
fi
# 2. Required status checks — branch-aware JSON dict takes precedence.
if [ -n "${REQUIRED_CHECKS_JSON:-}" ]; then
REQUIRED=$(echo "$REQUIRED_CHECKS_JSON" | jq -r --arg branch "$BASE_BRANCH" '.[$branch] // [] | .[]')
else
REQUIRED="$REQUIRED_CHECKS"
fi
# 2. Required status checks declared in the workflow env.
REQUIRED="$REQUIRED_CHECKS"
if [ -z "${REQUIRED//[[:space:]]/}" ]; then
echo "::notice::REQUIRED_CHECKS empty for branch '$BASE_BRANCH' — force-merge not applicable."
echo "::notice::REQUIRED_CHECKS empty — force-merge not applicable."
exit 0
fi
+29 -78
View File
@@ -8,8 +8,7 @@ pair diverges.
Sources:
A. `.gitea/workflows/ci.yml` jobs (CI source — the actual job set)
B. `status_check_contexts` in branch_protections (the merge gate)
C. `REQUIRED_CHECKS_JSON` (preferred) or `REQUIRED_CHECKS` (legacy)
env in audit-force-merge.yml (the audit env)
C. `REQUIRED_CHECKS` env in audit-force-merge.yml (the audit env)
Three failure classes:
F1 Job in (A) is not under the sentinel's `needs:` — sentinel
@@ -251,21 +250,13 @@ def sentinel_needs(ci_doc: dict) -> set[str]:
return set(needs)
def required_checks_env(audit_doc: dict, branch: str) -> set[str]:
"""Pull the required-checks env value from audit-force-merge.yml.
def required_checks_env(audit_doc: dict) -> set[str]:
"""Pull the REQUIRED_CHECKS env value from audit-force-merge.yml.
Walks the YAML AST per `feedback_behavior_based_ast_gates`: we do
NOT grep for env keys — that breaks under reformatting,
NOT grep for `REQUIRED_CHECKS:` — that breaks under reformatting,
multi-job workflows, or a future move of the env to a different
step. Instead, look inside every job's every step's `env:` map.
Supports two variants:
- REQUIRED_CHECKS_JSON (preferred): JSON dict keyed by branch name.
We extract the array for the target branch.
- REQUIRED_CHECKS (legacy): newline-separated list of context names.
"""
found_json: list[str] = []
found_legacy: list[str] = []
step. Instead, look inside every job's every step's `env:` map."""
found: list[str] = []
jobs = audit_doc.get("jobs", {})
if not isinstance(jobs, dict):
sys.stderr.write(f"::warning::{AUDIT_WORKFLOW_PATH} has no jobs: mapping\n")
@@ -277,67 +268,27 @@ def required_checks_env(audit_doc: dict, branch: str) -> set[str]:
if not isinstance(step, dict):
continue
step_env = step.get("env") or {}
if isinstance(step_env, dict):
if "REQUIRED_CHECKS_JSON" in step_env:
v = step_env["REQUIRED_CHECKS_JSON"]
if isinstance(v, str):
found_json.append(v)
if "REQUIRED_CHECKS" in step_env:
v = step_env["REQUIRED_CHECKS"]
if isinstance(v, str):
found_legacy.append(v)
# JSON variant takes precedence.
if found_json:
if len(found_json) > 1:
sys.stderr.write(
f"::error::REQUIRED_CHECKS_JSON env present in {len(found_json)} steps; ambiguous\n"
)
sys.exit(3)
try:
parsed = json.loads(found_json[0])
except json.JSONDecodeError as e:
sys.stderr.write(
f"::error::REQUIRED_CHECKS_JSON is not valid JSON: {e}\n"
)
sys.exit(3)
if not isinstance(parsed, dict):
sys.stderr.write(
f"::error::REQUIRED_CHECKS_JSON parsed to {type(parsed).__name__}, expected dict\n"
)
sys.exit(3)
branch_checks = parsed.get(branch)
if branch_checks is None:
sys.stderr.write(
f"::error::REQUIRED_CHECKS_JSON has no entry for branch '{branch}'\n"
)
sys.exit(3)
if not isinstance(branch_checks, list):
sys.stderr.write(
f"::error::REQUIRED_CHECKS_JSON['{branch}'] is {type(branch_checks).__name__}, expected list\n"
)
sys.exit(3)
return {str(item).strip() for item in branch_checks if str(item).strip()}
# Legacy variant fallback.
if found_legacy:
if len(found_legacy) > 1:
# Defensive: refuse to guess which one is canonical.
sys.stderr.write(
f"::error::REQUIRED_CHECKS env present in {len(found_legacy)} steps; ambiguous\n"
)
sys.exit(3)
raw = found_legacy[0]
# YAML block-scalars (`|`) leave a trailing newline + blanks; trim
# consistently with audit-force-merge.sh's parser so both sides
# produce identical sets.
return {line.strip() for line in raw.splitlines() if line.strip()}
sys.stderr.write(
f"::error::Neither REQUIRED_CHECKS_JSON nor REQUIRED_CHECKS env found in any step of "
f"{AUDIT_WORKFLOW_PATH}\n"
)
sys.exit(3)
if isinstance(step_env, dict) and "REQUIRED_CHECKS" in step_env:
v = step_env["REQUIRED_CHECKS"]
if isinstance(v, str):
found.append(v)
if not found:
sys.stderr.write(
f"::error::REQUIRED_CHECKS env not found in any step of "
f"{AUDIT_WORKFLOW_PATH}\n"
)
sys.exit(3)
if len(found) > 1:
# Defensive: refuse to guess which one is canonical.
sys.stderr.write(
f"::error::REQUIRED_CHECKS env present in {len(found)} steps; ambiguous\n"
)
sys.exit(3)
raw = found[0]
# YAML block-scalars (`|`) leave a trailing newline + blanks; trim
# consistently with audit-force-merge.sh's parser so both sides
# produce identical sets.
return {line.strip() for line in raw.splitlines() if line.strip()}
# --------------------------------------------------------------------------
@@ -379,7 +330,7 @@ def detect_drift(branch: str) -> tuple[list[str], dict]:
jobs = ci_job_names(ci_doc)
jobs_all = ci_jobs_all(ci_doc)
needs = sentinel_needs(ci_doc)
env_set = required_checks_env(audit_doc, branch)
env_set = required_checks_env(audit_doc)
# Protection
# api() raises ApiError on non-2xx. Transient 5xx should fail loud.
@@ -573,7 +524,7 @@ def render_body(branch: str, findings: list[str], debug: dict) -> str:
"- **F2**: rename the protection context to match an emitter, "
"or remove it from `status_check_contexts` "
"(PATCH `/api/v1/repos/{owner}/{repo}/branch_protections/{branch}`).",
"- **F3a / F3b**: bring `REQUIRED_CHECKS_JSON` (or `REQUIRED_CHECKS` legacy) env in "
"- **F3a / F3b**: bring `REQUIRED_CHECKS` env in "
"`.gitea/workflows/audit-force-merge.yml` into set-equality with "
"`status_check_contexts` (single PR, both files).",
"",
-5
View File
@@ -26,10 +26,6 @@ PROFILES: dict[str, dict[str, str]] = {
"handlers": (
r"^workspace-server/internal/handlers/"
r"|^workspace-server/internal/wsauth/"
# #2149: the scheduler real-PG integration tests run in this same
# workflow (they reuse its migrated Postgres), so changes to the
# scheduler package must trigger the job too.
r"|^workspace-server/internal/scheduler/"
r"|^workspace-server/migrations/"
r"|^\.gitea/workflows/handlers-postgres-integration\.yml$"
),
@@ -178,4 +174,3 @@ def main(argv: list[str]) -> int:
if __name__ == "__main__":
sys.exit(main(sys.argv[1:]))
@@ -466,40 +466,12 @@ def fetch_log(target_url: str) -> str | None:
def grep_fail_markers(log_text: str) -> list[str]:
"""Return up to 5 sample matching lines for any FAIL_PATTERNS hit.
Empty list = clean log.
Heuristic: skip lines where the marker appears inside script source
(e.g. ``echo "::error::..."`` in a ``::group::Run`` block) rather
than actual execution output. The Gitea Actions log prints the raw
script before executing it; ``echo "::error::"`` lines in that
display are false positives.
"""
Empty list = clean log."""
matches: list[str] = []
in_run_group = False
group_depth = 0
for line in log_text.splitlines():
stripped = line.strip()
# Track Gitea Actions group markers so we can skip the
# ``::group::Run`` script-source display blocks.
if stripped.startswith("::group::Run"):
in_run_group = True
group_depth = 1
continue
if stripped == "::endgroup::":
if in_run_group:
in_run_group = False
group_depth = 0
continue
if in_run_group:
continue
for pat in FAIL_PATTERNS:
if pat in line:
# Additional false-positive guard: ``echo "::error::"``
# is script source, not a runtime error emission.
if pat == "::error::":
prefix = line[: line.index(pat)].strip()
if prefix.endswith('echo') or prefix.endswith("echo '") or prefix.endswith('echo "'):
break
# Truncate to keep error output bounded.
matches.append(line.strip()[:240])
break
if len(matches) >= 5:
-146
View File
@@ -208,61 +208,6 @@ def _raise_for_redeploy_result(status: int, body: dict, slugs: list[str]) -> Non
)
def rollout_stragglers(enumerated: list[str], results: list[dict]) -> list[str]:
"""Return every enumerated tenant NOT proven on the target build.
A straggler is any tenant the rollout was supposed to cover that the
CP could not verify is running the target image tag — whether it
errored, was skipped, or SSM-succeeded onto the wrong image
(internal#724). CP marks each per-tenant result row with
``verified_on_target`` (the REDEPLOY_RUNNING_IMAGE docker-inspect
proof). A tenant enumerated for the rollout but absent from the
result set (no batch ever ran it) is also a straggler — that is the
exact agents-team silent-skip class.
Backward-compat: an OLDER CP that doesn't emit ``verified_on_target``
yet returns rows without the key. Treat a missing key as verified so
this surfacing degrades to the previous (ok-based) behavior against an
un-upgraded CP, rather than failing every deploy spuriously. Once the
CP fix is deployed the key is always present and real stragglers are
caught.
"""
verified: set[str] = set()
for row in results:
if str(row.get("ssm_status") or "") == "DryRun":
continue
slug = str(row.get("slug") or "").strip()
if not slug:
continue
# Missing key (old CP) => assume verified; present key is authoritative.
if "verified_on_target" not in row or row.get("verified_on_target"):
verified.add(slug)
return sorted(s for s in dict.fromkeys(enumerated) if s not in verified)
def assert_full_coverage(enumerated: list[str], aggregate: dict, dry_run: bool) -> None:
"""Fail the rollout if any enumerated tenant is not on the target build.
This is the no-silent-skip gate (internal#724). A dry run proves
nothing landed, so coverage is not asserted for it.
"""
if dry_run:
return
stragglers = rollout_stragglers(enumerated, aggregate.get("results") or [])
if stragglers:
msg = (
f"incomplete rollout: {len(stragglers)} tenant(s) not verified on target "
f"after redeploy-fleet: {', '.join(stragglers)} "
f"(enumerated {len(set(enumerated))})"
)
aggregate["ok"] = False
aggregate["error"] = msg
aggregate["stragglers"] = stragglers
raise RolloutFailed(msg, aggregate)
def execute_scoped_rollout(
plan: dict,
token: str,
@@ -309,14 +254,6 @@ def execute_scoped_rollout(
aggregate["error"] = str(exc)
raise RolloutFailed(str(exc), aggregate) from exc
# No-silent-skip coverage gate (internal#724): every enumerated tenant
# must be PROVEN on the target build. A per-tenant HTTP-200/ok response
# is not proof — a tenant that SSM-succeeded but stayed on the old tag,
# or one enumerated but never batched, is a straggler. Surfacing it as
# a RolloutFailed makes the deploy step exit non-zero instead of
# silently reporting success (the exact agents-team failure mode).
assert_full_coverage(all_slugs, aggregate, dry_run)
return aggregate
@@ -364,71 +301,6 @@ def _api_json_optional(url: str, token: str) -> tuple[int, dict | None]:
return exc.code, None
def current_branch_head(env: dict[str, str]) -> str | None:
"""Return the SHA at the tip of the deploy branch (main) per Gitea, or None.
Used to detect a *superseded* deploy job (see `superseded_by`). Fail-safe:
any read error / missing token returns None so the caller treats the job as
NOT superseded and the strict /buildinfo verify still runs. We never let an
unreadable head silently green a deploy.
"""
token = env.get("GITEA_TOKEN", "").strip()
if not token:
return None
host = env.get("GITEA_HOST", "git.moleculesai.app")
repo = env.get("GITHUB_REPOSITORY", "molecule-ai/molecule-core")
# Deploy lane is on: push:main; the branch is always main here, but read it
# from the ref name when present so a future branch rename doesn't break us.
branch = env.get("GITHUB_REF_NAME", "").strip() or "main"
url = f"https://{host}/api/v1/repos/{repo}/branches/{quote(branch, safe='')}"
status, body = _api_json_optional(url, token)
if status != 200 or not isinstance(body, dict):
return None
commit = body.get("commit")
if isinstance(commit, dict):
head = commit.get("id") or commit.get("sha")
if isinstance(head, str) and head.strip():
return head.strip()
return None
def superseded_by(env: dict[str, str]) -> str | None:
"""Return the newer head SHA if THIS deploy job has been superseded, else None.
This workflow runs with no `concurrency:` (intentional — Gitea 1.22.6 cancels
queued runs, which is unacceptable for a prod deploy). When two main pushes
land close together, BOTH deploy-production jobs run. The newer push rolls the
fleet forward first; the OLDER job's strict /buildinfo verify then sees tenants
on the NEWER SHA and false-reds with "$slug is stale" — even though the fleet
is AHEAD, not behind. Git SHAs aren't ordered, so the verify can't tell ahead
from behind on its own (and /buildinfo exposes only git_sha, no build time).
Resolve it at the source of truth for ordering — the branch ref: if main's
current head is a DIFFERENT SHA than the one this job is deploying, a newer
commit has landed and this job is superseded; the newest job's verify is the
authoritative one. We return that head SHA so the caller can log it and exit
success early, skipping the strict-equality verify for this stale job.
Fail-safe: returns None (NOT superseded) when the head can't be read or equals
our SHA, so a genuinely-behind tenant under the LATEST deploy job still fails
the strict verify loudly. This never suppresses a real-stale signal — it only
excuses a job that is no longer the latest from asserting exact equality.
"""
sha = env.get("GITHUB_SHA", "").strip()
if not sha:
return None
head = current_branch_head(env)
if not head:
return None
# SHA lengths can differ (short vs full); compare on the shorter prefix.
n = min(len(head), len(sha))
if head[:n].lower() == sha[:n].lower():
return None
return head
def live_disable_flag(env: dict[str, str]) -> str:
"""Return a live disable value from Gitea variables when readable.
@@ -507,14 +379,6 @@ def main() -> int:
sub.add_parser("plan", help="print production deploy plan as JSON")
sub.add_parser("assert-enabled", help="fail if production deploy is currently disabled")
sub.add_parser("wait-ci", help="block until required CI context is green")
sub.add_parser(
"check-superseded",
help=(
"exit 0 if a newer commit has landed on the deploy branch (this job "
"is superseded; prints the newer head SHA), exit 10 if this job is "
"still the latest"
),
)
rollout_parser = sub.add_parser("rollout", help="execute canary-first scoped production rollout")
rollout_parser.add_argument("--plan", required=True, help="path to prod-auto-deploy plan JSON")
rollout_parser.add_argument("--response", required=True, help="path to write aggregate response JSON")
@@ -530,16 +394,6 @@ def main() -> int:
if args.command == "wait-ci":
wait_for_ci_context(dict(os.environ))
return 0
if args.command == "check-superseded":
newer = superseded_by(dict(os.environ))
if newer:
print(newer)
return 0
# Exit 10 (not 0, not 1): "this job is still the latest". The
# workflow treats only exit 0 as superseded; 10 means proceed to
# the strict verify. A non-zero code here is informational, not a
# failure — the workflow step swallows it.
return 10
if args.command == "rollout":
rollout_from_plan_file(args.plan, args.response, dict(os.environ))
return 0
+1 -15
View File
@@ -296,15 +296,7 @@ fi
# 403 → token owner is not in this team (Gitea 1.22.6 'Must be a team
# member' constraint — see follow-up issue for token-provisioning)
# 404 → not a member
# Track whether every candidate returned 403 (token owner not in team).
# When this happens the root cause is a token-provisioning issue, not a
# reviewer-eligibility issue — surface it clearly so ops don't waste time
# verifying team roster (Bug C / RFC#324 follow-up).
_ALL_CANDIDATES_403="yes"
_CANDIDATE_COUNT=0
for U in $CANDIDATES; do
_CANDIDATE_COUNT=$((_CANDIDATE_COUNT + 1))
CODE=$(curl -sS -o "$TEAM_PROBE_TMP" -w '%{http_code}' \
-K "$CURL_AUTH_FILE" "${API}/teams/${TEAM_ID}/members/${U}")
debug "probe ${U} in team ${TEAM} (id=${TEAM_ID}) → HTTP ${CODE}"
@@ -325,20 +317,14 @@ for U in $CANDIDATES; do
continue
;;
404)
_ALL_CANDIDATES_403="no"
debug "${U} not a member of ${TEAM}"
;;
*)
_ALL_CANDIDATES_403="no"
echo "::warning::team-probe for ${U} in ${TEAM} returned unexpected HTTP ${CODE}"
cat "$TEAM_PROBE_TMP" >&2
;;
esac
done
if [ "$_ALL_CANDIDATES_403" = "yes" ] && [ "$_CANDIDATE_COUNT" -gt 0 ]; then
echo "::error::${TEAM}-review FAILED — every candidate returned 403 (token owner is not a member of the ${TEAM} team). This is a TOKEN PROVISIONING issue, not a reviewer-eligibility issue. Add the token owner to the '${TEAM}' Gitea team (id=${TEAM_ID}) or use a token whose owner is already in that team."
else
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
fi
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
exit 1
+4 -10
View File
@@ -13,26 +13,20 @@ set -euo pipefail
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
# Branch-protection requires the (pull_request_target) context variant.
# The refire path must post the EXACT BP-required name so the gate flips.
CONTEXT="${TEAM}-review / approved (pull_request_target)"
CONTEXT="${TEAM}-review / approved (pull_request)"
TARGET_URL="https://${GITEA_HOST}/${OWNER}/${NAME}/pulls/${PR_NUMBER}"
authfile=$(mktemp)
post_authfile=$(mktemp)
prfile=$(mktemp)
postfile=$(mktemp)
# shellcheck disable=SC2329 # invoked by EXIT trap
cleanup() {
rm -f "$authfile" "$post_authfile" "$prfile" "$postfile"
rm -f "$authfile" "$prfile" "$postfile"
}
trap cleanup EXIT
chmod 600 "$authfile" "$post_authfile"
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
# STATUS_POST_TOKEN is narrow-scoped write:repository for explicit status POST.
# Falls back to GITEA_TOKEN for backward compatibility (e.g. local test).
printf 'header = "Authorization: token %s"\n' "${STATUS_POST_TOKEN:-$GITEA_TOKEN}" > "$post_authfile"
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}")
@@ -74,7 +68,7 @@ body=$(jq -nc \
'{state:$state, context:$context, description:$description, target_url:$target_url}')
code=$(curl -sS -o "$postfile" -w '%{http_code}' -X POST \
-K "$post_authfile" -H "Content-Type: application/json" \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"${API}/repos/${OWNER}/${NAME}/statuses/${head_sha}")
if [ "$code" != "200" ] && [ "$code" != "201" ]; then
+9 -96
View File
@@ -6,8 +6,8 @@
# RFC#351 Step 2 of 6 (implementation MVP).
#
# Invoked by .gitea/workflows/sop-checklist.yml on:
# - pull_request_target: [opened, edited, synchronize, reopened, labeled, unlabeled]
# - issue_comment: [created] # edited/deleted omitted (Gitea 1.22.6 job-parsing quirk)
# - pull_request_target: [opened, edited, synchronize, reopened]
# - issue_comment: [created, edited, deleted]
#
# Flow:
# 1. Load .gitea/sop-checklist-config.yaml (from BASE ref — trusted).
@@ -639,7 +639,9 @@ def load_config(path: str) -> dict[str, Any]:
# yaml is an optional dep; the canonical loader is used when available,
# but the SOP runs on runners that may not have PyYAML installed. The
# fallback _load_config_minimal covers the same config shape without
import yaml # type: ignore[import-not-found] # optional dep; fall back silently if absent
# requiring the dep, so the ignore is safe: if yaml loads, we use it;
# otherwise we fall back silently.
import yaml # type: ignore[import-not-found]
with open(path, encoding="utf-8") as f:
return yaml.safe_load(f)
except ImportError:
@@ -895,47 +897,6 @@ def resolve_required_teams(item: dict[str, Any], high_risk: bool) -> list[str]:
return list(item.get("required_teams") or [])
# ---------------------------------------------------------------------------
# CI status validation for testing-class AI acks (internal#760 CTO hardening)
# ---------------------------------------------------------------------------
# Slugs that require CI / all-required green before an AI ack is valid.
_TESTING_CLASS_SLUGS = {"comprehensive-testing", "local-postgres-e2e", "staging-smoke"}
# Human-only carve-out: these items can NEVER be acked by AI, regardless
# of config drift. Any item in this set MUST NOT have ai_ack_eligible.
# migration / schema are future-proofing — not yet in config items, but
# the code guard rejects them proactively (CTO hardening, msg 1388c76f).
_HUMAN_ONLY_SLUGS = {"root-cause", "no-backwards-compat", "migration", "schema"}
def get_ci_status(client: GiteaClient, owner: str, repo: str, sha: str) -> str:
"""Return the state of CI / all-required (pull_request) for `sha`.
Looks through the commit statuses and returns the state string
("success", "failure", "pending", "error") or "missing" if the
context is not found. This prevents an AI agent from attesting
"tests pass" independently of the actual CI run.
"""
code, data = client._req( # noqa: SLF001
"GET", f"/repos/{owner}/{repo}/statuses/{sha}"
)
if code != 200:
return "unknown"
if not data or not isinstance(data, list):
return "missing"
# Gitea returns statuses newest-first. Find the latest for our context.
for status in data:
if status.get("context") == "CI / all-required (pull_request)":
return status.get("state", "unknown")
return "missing"
# ---------------------------------------------------------------------------
# Main entry point
# ---------------------------------------------------------------------------
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser()
p.add_argument("--owner", required=True)
@@ -1029,9 +990,6 @@ def main(argv: list[str] | None = None) -> int:
# one membership lookup per team.
team_member_cache: dict[tuple[str, int], bool | None] = {}
# Pre-resolve the ai-sop-ack team id once (None if the team does not exist).
ai_sop_ack_team_id = client.resolve_team_id(args.owner, "ai-sop-ack")
def probe(slug: str, users: list[str]) -> list[str]:
# `slug` may be either an items-key (compute_ack_state caller) OR
# an n/a-gate key (compute_na_state caller). Previously this hard
@@ -1075,7 +1033,7 @@ def main(argv: list[str] | None = None) -> int:
for t in data:
if t.get("name") == tn:
tid = t.get("id")
client._team_id_cache[(args.owner, tn)] = tid # noqa: SLF001 # write-through cache; intentional side-effect for reuse across calls
client._team_id_cache[(args.owner, tn)] = tid # noqa: SLF001 # internal write-through cache
break
if tid is not None:
team_ids.append(tid)
@@ -1086,18 +1044,14 @@ def main(argv: list[str] | None = None) -> int:
file=sys.stderr,
)
approved: list[str] = []
rejected_ai_ineligible: list[str] = []
rejected_ci_not_green: list[str] = []
for u in users:
# 1) Human required_teams membership check
in_human_team = False
for tid in team_ids:
cache_key = (u, tid)
if cache_key not in team_member_cache:
team_member_cache[cache_key] = client.is_team_member(tid, u)
result = team_member_cache[cache_key]
if result is True:
in_human_team = True
approved.append(u)
break
if result is None:
print(
@@ -1107,44 +1061,6 @@ def main(argv: list[str] | None = None) -> int:
)
# Treat as not-in-team for this user/team pair; loop
# may still find membership in another team.
if in_human_team:
approved.append(u)
continue
# 2) AI-sop-ack team membership check (only for items that allow it).
if slug in items_by_slug:
item = items_by_slug[slug]
# Defensive: human-only carve-out is enforced in code, not just
# config. Even if ai_ack_eligible were mistakenly added to a
# migration/schema item, the AI path is rejected here.
if slug in _HUMAN_ONLY_SLUGS:
rejected_ai_ineligible.append(u)
continue
if item.get("ai_ack_eligible") and ai_sop_ack_team_id is not None:
cache_key = (u, ai_sop_ack_team_id)
if cache_key not in team_member_cache:
team_member_cache[cache_key] = client.is_team_member(
ai_sop_ack_team_id, u
)
result = team_member_cache[cache_key]
if result is True:
# 2a) Testing-class items require real CI artifact evidence.
if slug in _TESTING_CLASS_SLUGS:
ci_state = get_ci_status(
client, args.owner, args.repo, head_sha
)
if ci_state != "success":
print(
f"::warning::AI ack for {slug} rejected: "
f"CI / all-required is {ci_state}, not success",
file=sys.stderr,
)
rejected_ci_not_green.append(u)
continue
approved.append(u)
continue
# If we get here, user is not approved for this slug.
rejected_ai_ineligible.append(u)
return approved
ack_state = compute_ack_state(
@@ -1228,13 +1144,10 @@ def main(argv: list[str] | None = None) -> int:
)
na_desc = ", ".join(sorted(na_descs)) if na_descs else "(none)"
# internal#818: na-declarations is an informational context, not a merge
# gate. An empty declaration list is a terminal success state — pending
# here poisons the PR combined status.
na_status_state = "success"
na_status_state = "success" if na_descs else "pending"
# review-check.sh reads the description to discover which gates are N/A.
# Include the gate names so it can grep for them.
na_description = f"N/A: {na_desc}"
na_description = f"N/A: {na_desc}" if na_descs else "N/A: (none)"
if not args.dry_run:
client.post_status(
+1 -14
View File
@@ -114,19 +114,6 @@ if [ -z "$WHOAMI" ]; then
fi
echo "::notice::token resolves to user: $WHOAMI"
# 0.5 Read PR head SHA so we can reject stale approvals after head moves
# (internal#816). Reviews carry the commit_id they were submitted against.
HEAD_SHA=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}" | jq -r '.head.sha // ""') || true
if [ -z "$HEAD_SHA" ]; then
echo "::error::Failed to fetch PR head SHA — token may be invalid."
if [ "${SOP_FAIL_OPEN:-}" = "1" ]; then
echo "::warning::SOP_FAIL_OPEN=1 — exiting 0 so CI does not block."
exit 0
fi
exit 1
fi
debug "pr-head-sha=$HEAD_SHA"
# 1. Read tier label. || true ensures set -euo pipefail does not abort the
# script if curl or jq fails (e.g. 401 from empty token).
LABELS=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/issues/${PR_NUMBER}/labels" | jq -r '.[].name') || true
@@ -278,7 +265,7 @@ if [ $_REVIEWS_EXIT -ne 0 ] || [ -z "$REVIEWS" ]; then
fi
exit 1
fi
APPROVERS=$(echo "$REVIEWS" | jq -r --arg head_sha "$HEAD_SHA" '[.[] | select(.state=="APPROVED" and .commit_id == $head_sha) | .user.login] | unique | .[]') || true
APPROVERS=$(echo "$REVIEWS" | jq -r '[.[] | select(.state=="APPROVED") | .user.login] | unique | .[]') || true
if [ -z "$APPROVERS" ]; then
echo "::error::No approving reviews on this PR. Set SOP_DEBUG=1 and re-run for diagnostics."
exit 1
@@ -21,7 +21,6 @@ Scenarios:
T16_comments_generic_approval — reviews empty; comments have "APPROVED" by team member → exit 0
T17_comments_no_approval — reviews empty; comments have no approval keywords → exit 1
T18_review_wrong_team_comment_right_team — review candidate 404s, comment candidate passes
T19_ai_sop_ack_approved — ai-sop-ack member APPROVED review → team probe 404 → exit 1
Usage:
FIXTURE_STATE_DIR=/tmp/x python3 _review_check_fixture.py 8080
@@ -117,12 +116,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
{"state": "CHANGES_REQUESTED", "dismissed": False, "user": {"login": "bob"}, "commit_id": "abc1234"},
{"state": "APPROVED", "dismissed": False, "user": {"login": "core-devops"}, "commit_id": "abc1234"},
])
if sc == "T19_ai_sop_ack_approved":
# ai-sop-ack member submitted APPROVED review — must NOT count
# toward qa-review (team_id=20) or security-review (team_id=21).
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "user": {"login": "ai-reviewer"}, "commit_id": "abc1234"},
])
# Default: one non-author APPROVED
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "user": {"login": "core-devops"}, "commit_id": "abc1234"},
@@ -164,9 +157,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
return self._empty(403)
if sc == "T18_review_wrong_team_comment_right_team" and login == "core-devops":
return self._empty(404)
if sc == "T19_ai_sop_ack_approved" and login == "ai-reviewer":
# ai-sop-ack member is NOT in qa (20) or security (21).
return self._empty(404)
# T7_team_member: member
return self._empty(204)
@@ -1,5 +1,4 @@
import importlib.util
import json
import sys
from pathlib import Path
from unittest.mock import patch
@@ -37,76 +36,6 @@ def _make_audit_doc(required_checks: list[str]) -> dict:
}
def _make_audit_doc_json(required_checks_json: dict) -> dict:
return {
"jobs": {
"audit": {
"steps": [
{"env": {"REQUIRED_CHECKS_JSON": json.dumps(required_checks_json)}}
]
}
}
}
# ---------------------------------------------------------------------------
# required_checks_env — dual-variant parsing
# ---------------------------------------------------------------------------
def test_required_checks_env_prefers_json_over_legacy():
doc = {
"jobs": {
"audit": {
"steps": [
{
"env": {
"REQUIRED_CHECKS_JSON": json.dumps(
{"main": ["ctx-a"], "staging": ["ctx-b"]}
),
"REQUIRED_CHECKS": "ctx-legacy\nctx-old",
}
}
]
}
}
}
assert drift.required_checks_env(doc, "main") == {"ctx-a"}
assert drift.required_checks_env(doc, "staging") == {"ctx-b"}
def test_required_checks_env_falls_back_to_legacy():
doc = _make_audit_doc(["legacy-ctx"])
assert drift.required_checks_env(doc, "main") == {"legacy-ctx"}
def test_required_checks_env_json_missing_branch_fails():
doc = _make_audit_doc_json({"staging": ["ctx-b"]})
try:
drift.required_checks_env(doc, "main")
except SystemExit as exc:
assert exc.code == 3
else:
raise AssertionError("expected SystemExit(3)")
def test_required_checks_env_json_malformed_fails():
doc = {
"jobs": {
"audit": {
"steps": [
{"env": {"REQUIRED_CHECKS_JSON": "not-json"}}
]
}
}
}
try:
drift.required_checks_env(doc, "main")
except SystemExit as exc:
assert exc.code == 3
else:
raise AssertionError("expected SystemExit(3)")
# ---------------------------------------------------------------------------
# sentinel_needs
# ---------------------------------------------------------------------------
@@ -11,100 +11,21 @@ def load_workflow(name: str) -> dict:
return yaml.safe_load(f)
def _all_required(workflow: dict) -> dict:
return workflow["jobs"]["all-required"]
def test_all_required_uses_dedicated_meta_runner_lane():
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
all_required = workflow["jobs"]["all-required"]
# Stays on the dedicated `ci-meta` lane (the sentinel does no docker
# work, so it must NOT occupy the general docker-host pool).
assert all_required["runs-on"] == "ci-meta"
assert "needs" not in all_required
def test_all_required_is_needs_aggregator_not_a_polling_gate():
"""fix/ci-scheduler-fanout (2026-06-01): the sentinel was converted
from a status-polling loop (which squatted a ci-meta executor slot for
up to 40 min per PR) into a plain `needs:` aggregator that frees the
slot immediately. Pin the new shape so a regression to the poller is
caught.
"""
def test_all_required_reuses_path_filter_before_polling():
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
all_required = workflow["jobs"]["all-required"]
rendered = str(all_required)
# The job MUST aggregate via `needs:` (the slot-freeing design).
assert "needs" in all_required, "all-required must be a needs: aggregator"
# It MUST NOT reintroduce the polling loop / per-SHA status fetch that
# was the throughput sink.
assert "detect-changes.py" not in rendered, (
"all-required must not run the detect-changes poller path"
)
assert "commits/" not in rendered and "statuses" not in rendered, (
"all-required must not poll commit statuses (the slot-squat path)"
)
def test_all_required_does_not_use_if_always():
"""Plain `needs:` works on Gitea 1.22.6 / act_runner v0.6.1; `needs:` +
`if: always()` is BROKEN (feedback_gitea_needs_works_only_ifalways_broken)
and would let a non-success need pass the gate. The sentinel must use
plain `needs:` WITHOUT a job-level `if: always()`.
"""
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
job_if = all_required.get("if")
assert not (isinstance(job_if, str) and "always()" in job_if), (
"all-required must not combine needs: with if: always()"
)
def test_all_required_needs_matches_ci_required_drift_f1_set():
"""The sentinel `needs:` list MUST equal ci-required-drift.py's
`ci_job_names()` set: every job MINUS the sentinel itself MINUS jobs
whose `if:` gates on github.event_name/github.ref (event-gated jobs
skip on PRs and a `needs:` on a skipped job would never let the
sentinel run). If they diverge, ci-required-drift F1 fires.
"""
workflow = load_workflow("ci.yml")
jobs = workflow["jobs"]
sentinel = "all-required"
expected = set()
for key, body in jobs.items():
if key == sentinel:
continue
gate = body.get("if") if isinstance(body, dict) else None
if isinstance(gate, str) and (
"github.event_name" in gate or "github.ref" in gate
):
# event-gated → legitimately skips on some triggers; excluded
# from both `needs:` and the F1 set.
continue
expected.add(key)
needs = jobs[sentinel].get("needs", [])
if isinstance(needs, str):
needs = [needs]
actual = set(needs)
assert actual == expected, (
f"all-required needs: {sorted(actual)} != ci_job_names() "
f"{sorted(expected)} — ci-required-drift F1 would fire"
)
def test_all_required_needs_reference_real_jobs():
"""F1b guard: every entry in `needs:` must name an existing job."""
workflow = load_workflow("ci.yml")
jobs = workflow["jobs"]
needs = jobs["all-required"].get("needs", [])
if isinstance(needs, str):
needs = [needs]
job_keys = set(jobs)
for dep in needs:
assert dep in job_keys, f"all-required needs unknown job {dep!r}"
assert "--profile ci" in rendered
assert ".gitea/scripts/detect-changes.py" in rendered
assert "REQUIRE_PLATFORM" in rendered
assert "REQUIRE_CANVAS" in rendered
assert "REQUIRE_SCRIPTS" in rendered
@@ -1,244 +0,0 @@
"""Live-fire regression test for #2159 — gate auto-fire runtime verification.
Static tests (test_gate_review_auto_fire.py) validate that the workflow YAML
is structurally correct. This test validates the *runtime* path: submitting an
APPROVED review to a PR whose head contains the current gate workflows causes
Gitea Actions to queue the qa-review + security-review workflows and POST the
branch-protection-required (pull_request_target) contexts within a reasonable
window.
Skipped when Gitea API credentials are not available. Intended for:
- manual developer verification
- CI jobs provisioned with a service-account token
Environment:
GITEA_HOST — default: git.moleculesai.app
GITEA_TOKEN — token with read:repository + write:issues (for review POST)
REPO — default: molecule-ai/molecule-core
LIVEFIRE_PR_NUMBER — optional; if omitted the test tries to find a
suitable open PR automatically, or skips.
LIVEFIRE_TIMEOUT_SEC — default: 120
"""
import base64
import json
import os
import re
import time
import urllib.error
import urllib.request
from pathlib import Path
import pytest
import yaml
GITEA_HOST = os.environ.get("GITEA_HOST", "git.moleculesai.app")
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
REPO = os.environ.get("REPO", "molecule-ai/molecule-core")
LIVEFIRE_PR_NUMBER = os.environ.get("LIVEFIRE_PR_NUMBER", "")
LIVEFIRE_TIMEOUT_SEC = int(os.environ.get("LIVEFIRE_TIMEOUT_SEC", "120"))
REQUIRED_CONTEXTS = [
"qa-review / approved (pull_request_target)",
"security-review / approved (pull_request_target)",
]
skip_no_token = pytest.mark.skipif(
not GITEA_TOKEN,
reason="GITEA_TOKEN not set — live-fire test requires API credentials",
)
def _api(method: str, path: str, body: dict | None = None) -> tuple[int, dict]:
url = f"https://{GITEA_HOST}/api/v1{path}"
headers = {
"Authorization": f"token {GITEA_TOKEN}",
"Content-Type": "application/json",
}
data = json.dumps(body).encode() if body else None
req = urllib.request.Request(url, data=data, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
raw = resp.read()
code = resp.status
except urllib.error.HTTPError as exc:
raw = exc.read()
code = exc.code
payload = json.loads(raw) if raw else {}
return code, payload
def _get_pr(number: int) -> dict:
code, pr = _api("GET", f"/repos/{REPO}/pulls/{number}")
if code != 200:
pytest.fail(f"GET /pulls/{number} returned HTTP {code}: {pr}")
return pr
def _list_open_prs() -> list[dict]:
code, prs = _api("GET", f"/repos/{REPO}/pulls?state=open&limit=50")
if code != 200:
pytest.fail(f"GET /pulls?state=open returned HTTP {code}: {prs}")
return prs
def _pr_has_trigger_in_head(pr: dict) -> bool:
"""Return True if the PR head contains pull_request_review in both workflows."""
head_sha = pr["head"]["sha"]
for wf_name in ("qa-review.yml", "security-review.yml"):
path = f"/repos/{REPO}/contents/.gitea/workflows/{wf_name}?ref={head_sha}"
code, payload = _api("GET", path)
if code != 200:
return False
raw = base64.b64decode(payload.get("content", "")).decode("utf-8")
wf = yaml.safe_load(raw)
on = wf.get(True) or wf.get("on") or {}
if isinstance(on, str):
if on != "pull_request_review":
return False
elif "pull_request_review" not in on:
return False
return True
def _find_suitable_pr() -> dict:
if LIVEFIRE_PR_NUMBER:
pr = _get_pr(int(LIVEFIRE_PR_NUMBER))
if pr.get("state") != "open":
pytest.skip(f"PR {LIVEFIRE_PR_NUMBER} is not open")
return pr
prs = _list_open_prs()
for pr in prs:
if _pr_has_trigger_in_head(pr):
return pr
pytest.skip("No open PR found whose head contains the pull_request_review trigger")
def _submit_approved_review(pr_number: int) -> dict:
code, review = _api(
"POST",
f"/repos/{REPO}/pulls/{pr_number}/reviews",
{"body": "Live-fire test APPROVED review", "event": "APPROVED"},
)
# 200 = created, 422 = review already exists (idempotent enough for our purposes)
if code not in (200, 201, 422):
pytest.fail(f"POST /pulls/{pr_number}/reviews returned HTTP {code}")
return review
def _get_status_snapshot(sha: str) -> dict[str, dict]:
"""Return mapping context -> {id, updated_at, target_url} for required contexts."""
code, statuses = _api("GET", f"/repos/{REPO}/statuses/{sha}?limit=100")
if code != 200:
return {}
result: dict[str, dict] = {}
for st in statuses:
ctx = st.get("context", "")
if ctx in REQUIRED_CONTEXTS:
result[ctx] = {
"id": st.get("id"),
"updated_at": st.get("updated_at", st.get("created_at", "")),
"target_url": st.get("target_url"),
}
return result
def _extract_run_id(target_url: str | None) -> str | None:
"""Extract the Actions run_id from a status target_url."""
if not target_url:
return None
m = re.search(r"/actions/runs/(\d+)", target_url)
return m.group(1) if m else None
def _poll_fresh_statuses(
sha: str,
prior_snapshot: dict[str, dict],
timeout_sec: int = LIVEFIRE_TIMEOUT_SEC,
) -> dict[str, dict]:
"""Poll until required contexts appear fresh (newer timestamp, id, or run)."""
deadline = time.monotonic() + timeout_sec
found: dict[str, dict] = {}
while time.monotonic() < deadline:
code, statuses = _api("GET", f"/repos/{REPO}/statuses/{sha}?limit=100")
if code == 200:
for st in statuses:
ctx = st.get("context", "")
if ctx in REQUIRED_CONTEXTS:
updated_at = st.get("updated_at", st.get("created_at", ""))
status_id = st.get("id")
target_url = st.get("target_url")
prior = prior_snapshot.get(ctx, {})
# Fresh if timestamp changed, id changed, or target_url changed.
is_fresh = (
ctx not in prior_snapshot
or updated_at != prior.get("updated_at", "")
or status_id != prior.get("id")
or target_url != prior.get("target_url")
)
if is_fresh:
found[ctx] = {
"state": st.get("state", st.get("status", "")),
"updated_at": updated_at,
"id": status_id,
"target_url": target_url,
}
if all(ctx in found for ctx in REQUIRED_CONTEXTS):
return found
time.sleep(5)
return found
@skip_no_token
class TestGateAutoFireLive:
def test_auto_fire_posts_required_contexts(self):
"""Submit APPROVED review; assert BP-required contexts appear fresh within timeout."""
pr = _find_suitable_pr()
pr_number = pr["number"]
head_sha = pr["head"]["sha"]
# Capture pre-existing status snapshot so we can prove FRESH contexts
# were posted after the review submission (not stale from a prior run).
prior_snapshot = _get_status_snapshot(head_sha)
prior_run_ids = {
_extract_run_id(s["target_url"])
for s in prior_snapshot.values()
if _extract_run_id(s["target_url"])
}
review = _submit_approved_review(pr_number)
found = _poll_fresh_statuses(head_sha, prior_snapshot)
missing = [ctx for ctx in REQUIRED_CONTEXTS if ctx not in found]
if missing:
pytest.fail(
f"After {LIVEFIRE_TIMEOUT_SEC}s, fresh contexts still missing: {missing}. "
f"Found: {found}. Prior snapshot: {prior_snapshot}. "
f"PR #{pr_number} head={head_sha}. "
f"This indicates the pull_request_review trigger did not fire at runtime."
)
# The contexts appeared fresh — that's the proof of auto-fire.
# We do NOT assert success vs failure; the evaluator decides that.
# The point of #2159 is that the workflows QUEUE and POST at all.
for ctx, info in found.items():
state = info["state"]
assert state in ("pending", "success", "failure"), (
f"Unexpected state {state!r} for {ctx}"
)
# CR2 Finding 1: prove a NEW workflow run was triggered, not just
# an in-place status update. Gitea 1.22.6 lacks REST /actions/runs/*
# endpoints, so we use the run_id embedded in the status target_url
# as a proxy for distinct run_id.
run_id = _extract_run_id(info.get("target_url"))
if run_id and run_id in prior_run_ids:
pytest.fail(
f"Context {ctx!r} has target_url run_id {run_id} which existed "
f"BEFORE the review was submitted. This means the status was "
f"updated in-place by an existing run, not by a new workflow "
f"run triggered from the pull_request_review event."
)
@@ -1,168 +0,0 @@
"""Regression test #765 — gate auto-fire on real qa/security APPROVED review.
Validates the structural configuration of qa-review.yml and security-review.yml
so that a real team-member APPROVED review fires the workflow and POSTs the
exact branch-protection-required context name. This is the test #2020's
stale-context failure would have caught.
"""
from pathlib import Path
import yaml
ROOT = Path(__file__).resolve().parents[2]
def load_workflow(name: str) -> dict:
with (ROOT / "workflows" / name).open() as f:
return yaml.safe_load(f)
def _job_guard_string(workflow: dict) -> str:
"""Return the raw job-level `if:` string for the single job."""
jobs = workflow["jobs"]
# Both qa-review and security-review have exactly one job named "approved".
job = jobs["approved"]
return str(job.get("if", ""))
def _post_step(workflow: dict) -> dict:
"""Return the explicit POST /statuses step from the job steps list."""
jobs = workflow["jobs"]
steps = jobs["approved"]["steps"]
for step in steps:
name = step.get("name", "")
if "Post required status context" in name:
return step
raise AssertionError("No explicit POST status step found")
class TestQaReviewDirectTrigger:
def test_trigger_is_pull_request_review_submitted(self):
wf = load_workflow("qa-review.yml")
# PyYAML parses bare 'on' as boolean True.
on = wf[True]
assert "pull_request_review" in on, (
"qa-review must trigger on pull_request_review"
)
types = on["pull_request_review"].get("types", [])
assert "submitted" in types, (
"pull_request_review must include 'submitted' type"
)
def test_job_guard_requires_approved_state(self):
wf = load_workflow("qa-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
def test_post_step_uses_status_post_token(self):
wf = load_workflow("qa-review.yml")
post = _post_step(wf)
env = post.get("env", {})
assert env.get("GITEA_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"POST step must use STATUS_POST_TOKEN for write-scoped status POST"
)
def test_post_step_context_name_exact(self):
"""The context POSTed must byte-match the branch-protection requirement."""
wf = load_workflow("qa-review.yml")
post = _post_step(wf)
run = post.get("run", "")
assert '"qa-review / approved (pull_request_target)"' in run, (
"POST step must emit exact BP-required context name"
)
class TestSecurityReviewDirectTrigger:
def test_trigger_is_pull_request_review_submitted(self):
wf = load_workflow("security-review.yml")
# PyYAML parses bare 'on' as boolean True.
on = wf[True]
assert "pull_request_review" in on, (
"security-review must trigger on pull_request_review"
)
types = on["pull_request_review"].get("types", [])
assert "submitted" in types, (
"pull_request_review must include 'submitted' type"
)
def test_job_guard_requires_approved_state(self):
wf = load_workflow("security-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
def test_post_step_uses_status_post_token(self):
wf = load_workflow("security-review.yml")
post = _post_step(wf)
env = post.get("env", {})
assert env.get("GITEA_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"POST step must use STATUS_POST_TOKEN for write-scoped status POST"
)
def test_post_step_context_name_exact(self):
"""The context POSTed must byte-match the branch-protection requirement."""
wf = load_workflow("security-review.yml")
post = _post_step(wf)
run = post.get("run", "")
assert '"security-review / approved (pull_request_target)"' in run, (
"POST step must emit exact BP-required context name"
)
class TestRefireScriptContextName:
"""review-refire-status.sh must emit the BP-required (pull_request_target) context."""
def test_refire_script_context_is_pull_request_target(self):
script = ROOT / "scripts" / "review-refire-status.sh"
content = script.read_text()
assert 'CONTEXT="${TEAM}-review / approved (pull_request_target)"' in content, (
"refire script CONTEXT must be the exact BP-required (pull_request_target) variant"
)
assert 'approved (pull_request)"' not in content, (
"refire script must NOT post bare (pull_request) context"
)
class TestRefireTokenSeparation:
"""The /qa-recheck + /security-recheck backstop must also use STATUS_POST_TOKEN."""
def _refire_step(self, workflow_name: str, step_name_keyword: str) -> dict:
wf = load_workflow(workflow_name)
jobs = wf["jobs"]
steps = jobs["review-refire"]["steps"]
for step in steps:
name = step.get("name", "")
if step_name_keyword in name:
return step
raise AssertionError(f"No refire step matching {step_name_keyword!r}")
def test_qa_refire_uses_status_post_token(self):
step = self._refire_step("sop-checklist.yml", "Refire qa-review")
env = step.get("env", {})
assert env.get("STATUS_POST_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"qa refire must receive STATUS_POST_TOKEN env var"
)
# Evaluator stays on read token
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"qa refire evaluator must stay on read-scoped token"
)
def test_security_refire_uses_status_post_token(self):
step = self._refire_step("sop-checklist.yml", "Refire security-review")
env = step.get("env", {})
assert env.get("STATUS_POST_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"security refire must receive STATUS_POST_TOKEN env var"
)
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"security refire evaluator must stay on read-scoped token"
)
@@ -1,145 +0,0 @@
"""Stale-head diagnostic test for #2159.
Deterministically reports whether a PR's HEAD contains the pull_request_review
trigger in qa-review.yml and security-review.yml. If the trigger is absent,
auto-fire on APPROVED review is impossible for that PR.
This is used as a self-diagnostic for future stale-PR situations (PRs opened
before #2157 merged, or branches cut from old bases).
Environment:
GITEA_HOST — default: git.moleculesai.app
GITEA_TOKEN — token with read:repository scope (optional; falls back to local files)
REPO — default: molecule-ai/molecule-core
PR_NUMBER — required when running against a real PR
"""
import base64
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
import pytest
import yaml
GITEA_HOST = os.environ.get("GITEA_HOST", "git.moleculesai.app")
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
REPO = os.environ.get("REPO", "molecule-ai/molecule-core")
PR_NUMBER = os.environ.get("PR_NUMBER", "")
ROOT = Path(__file__).resolve().parents[2]
def _api(method: str, path: str) -> tuple[int, dict]:
url = f"https://{GITEA_HOST}/api/v1{path}"
headers = {"Authorization": f"token {GITEA_TOKEN}"}
req = urllib.request.Request(url, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
return resp.status, json.loads(resp.read())
except urllib.error.HTTPError as exc:
body = exc.read()
return exc.code, json.loads(body) if body else {}
def _fetch_workflow_from_ref(workflow_name: str, ref: str) -> dict:
path = f"/repos/{REPO}/contents/.gitea/workflows/{workflow_name}?ref={ref}"
code, payload = _api("GET", path)
if code != 200:
pytest.fail(
f"GET {path} returned HTTP {code}: {payload}. "
f"Cannot determine whether PR head contains the trigger."
)
raw = base64.b64decode(payload.get("content", "")).decode("utf-8")
return yaml.safe_load(raw)
def _fetch_workflow_local(workflow_name: str) -> dict:
p = ROOT / "workflows" / workflow_name
if not p.exists():
pytest.fail(f"Local workflow file not found: {p}")
return yaml.safe_load(p.read_text())
def _has_pull_request_review_trigger(wf: dict) -> bool:
on = wf.get(True) or wf.get("on") or {}
if isinstance(on, list):
return "pull_request_review" in on
if isinstance(on, dict):
return "pull_request_review" in on
if isinstance(on, str):
return on == "pull_request_review"
return False
def _diagnose_pr(pr_number: int) -> dict[str, bool]:
code, pr = _api("GET", f"/repos/{REPO}/pulls/{pr_number}")
if code != 200:
pytest.fail(f"GET /pulls/{pr_number} returned HTTP {code}: {pr}")
head_ref = pr["head"]["ref"]
head_sha = pr["head"]["sha"]
results: dict[str, bool] = {}
for wf_name in ("qa-review.yml", "security-review.yml"):
wf = _fetch_workflow_from_ref(wf_name, head_sha)
results[wf_name] = _has_pull_request_review_trigger(wf)
return {
"pr_number": pr_number,
"head_ref": head_ref,
"head_sha": head_sha,
"triggers": results,
"auto_fire_possible": all(results.values()),
}
def _diagnose_local() -> dict[str, bool]:
results: dict[str, bool] = {}
for wf_name in ("qa-review.yml", "security-review.yml"):
wf = _fetch_workflow_local(wf_name)
results[wf_name] = _has_pull_request_review_trigger(wf)
return {
"pr_number": None,
"head_ref": "local-checkout",
"head_sha": None,
"triggers": results,
"auto_fire_possible": all(results.values()),
}
class TestStaleHeadDiagnostic:
"""Test deterministically reports 'auto-fire impossible for this PR' when
the PR head lacks the pull_request_review trigger.
"""
def test_local_checkout_has_pull_request_review_trigger(self):
"""Local files (the ones in this checkout) must contain the trigger.
This is the baseline: if the checkout itself is stale, every PR cut
from it will also be stale.
"""
diag = _diagnose_local()
missing = [n for n, ok in diag["triggers"].items() if not ok]
if missing:
pytest.fail(
f"Local checkout is missing pull_request_review trigger in: {missing}. "
f"This branch cannot produce PRs that auto-fire."
)
@pytest.mark.skipif(not GITEA_TOKEN, reason="GITEA_TOKEN not set")
@pytest.mark.skipif(not PR_NUMBER, reason="PR_NUMBER not set")
def test_pr_head_has_pull_request_review_trigger(self):
"""When PR_NUMBER is given, assert the PR head contains the trigger."""
diag = _diagnose_pr(int(PR_NUMBER))
if not diag["auto_fire_possible"]:
missing = [n for n, ok in diag["triggers"].items() if not ok]
pytest.fail(
f"Auto-fire impossible for PR #{diag['pr_number']}. "
f"Head ref={diag['head_ref']} sha={diag['head_sha']}. "
f"Missing trigger in: {missing}. "
f"This PR needs /qa-recheck + /security-recheck fallback, or a rebase onto current main."
)
@@ -355,260 +355,3 @@ def test_rollout_from_plan_file_writes_partial_response_on_failure(tmp_path):
assert response_path.read_text(encoding="utf-8").strip()
assert '"ok": false' in response_path.read_text(encoding="utf-8")
assert '"slug": "hongming"' in response_path.read_text(encoding="utf-8")
# ──────────────────────────────────────────────────────────────────────
# No-silent-skip coverage gate (internal#724)
# ──────────────────────────────────────────────────────────────────────
def test_rollout_stragglers_flags_tenant_not_on_target():
# b SSM-succeeded but its container is on the old tag → straggler.
stragglers = prod.rollout_stragglers(
["a", "b", "c"],
[
{"slug": "a", "verified_on_target": True},
{"slug": "b", "verified_on_target": False, "running_image": "platform-tenant:staging-old"},
{"slug": "c", "verified_on_target": True},
],
)
assert stragglers == ["b"]
def test_rollout_stragglers_flags_enumerated_tenant_with_no_result():
# agents-team class: enumerated but no batch ever produced a row for it.
stragglers = prod.rollout_stragglers(
["a", "agents-team"],
[{"slug": "a", "verified_on_target": True}],
)
assert stragglers == ["agents-team"]
def test_rollout_stragglers_missing_key_is_backward_compatible():
# Older CP without verified_on_target → treat as verified (no spurious fail).
stragglers = prod.rollout_stragglers(
["a", "b"],
[{"slug": "a", "healthz_ok": True}, {"slug": "b", "healthz_ok": True}],
)
assert stragglers == []
def test_rollout_stragglers_ignores_dry_run_rows():
stragglers = prod.rollout_stragglers(
["a"], [{"slug": "a", "ssm_status": "DryRun"}]
)
# dry-run row is skipped, so "a" has no verifying row → straggler.
assert stragglers == ["a"]
def test_scoped_rollout_fails_when_a_tenant_stays_on_old_tag():
# Every per-tenant call returns ok=True, but agents-team is NOT
# verified_on_target. The rollout must still fail loudly — this is
# the exact "reported success, one tenant silently skipped" bug.
def fake_redeploy(_cp_url, _token, body):
rows = []
for slug in body["only_slugs"]:
rows.append({"slug": slug, "verified_on_target": slug != "agents-team"})
return 200, {"ok": True, "results": rows}
try:
prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
except prod.RolloutFailed as exc:
assert "incomplete rollout" in str(exc)
assert exc.response["stragglers"] == ["agents-team"]
assert exc.response["ok"] is False
else:
raise AssertionError("expected an incomplete rollout to fail loudly")
def test_scoped_rollout_passes_when_all_tenants_verified_on_target():
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [{"slug": s, "verified_on_target": True} for s in body["only_slugs"]],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
assert aggregate["ok"] is True
assert "stragglers" not in aggregate
def test_scoped_rollout_dry_run_does_not_assert_coverage():
# A dry run proves nothing landed; coverage must NOT be asserted or
# every plan would fail.
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [{"slug": s, "ssm_status": "DryRun"} for s in body["only_slugs"]],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": True,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["a", "b"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
assert aggregate["ok"] is True
# --- Superseded-deploy guard (false-stale fix) -----------------------------
#
# Scenario this fixes: no `concurrency:` on the prod-deploy workflow means two
# close main pushes run BOTH deploy-production jobs. eb31bcf (Fix A) and 286338
# (Fix C) merge back-to-back; the 286338 job rolls the fleet to staging-2863380
# first; the OLDER eb31bcf job's strict verify then sees tenants on 2863380 and
# false-reds "stale" though the fleet is AHEAD. superseded_by detects that main's
# head is no longer eb31bcf and lets the older job succeed without weakening the
# behind-tenant signal for whichever job IS the latest.
def test_superseded_by_returns_newer_head_when_main_moved_ahead(monkeypatch):
# eb31bcf job: main head is now 2863380 -> superseded, return the newer head.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
newer = prod.superseded_by({"GITHUB_SHA": "eb31bcffullhash"})
assert newer == "2863380fullhash"
def test_superseded_by_none_when_this_job_is_still_head(monkeypatch):
# 2863380 job (the latest): head == our SHA -> NOT superseded -> strict verify
# runs, so a genuinely-behind tenant still fails loudly.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
def test_superseded_by_matches_on_short_vs_full_sha_prefix(monkeypatch):
# GITHUB_SHA is full; Gitea may return a different-length id. Equal prefixes
# must NOT count as superseded (avoid false-skipping the real latest job).
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380FULLHASH")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
def test_superseded_by_fail_safe_returns_none_when_head_unreadable(monkeypatch):
# Fail-safe: unreadable head (no token / API error) must NOT be treated as
# superseded, so the strict verify still runs and never silently greens.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: None)
assert prod.superseded_by({"GITHUB_SHA": "eb31bcffullhash"}) is None
def test_superseded_by_none_without_github_sha(monkeypatch):
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
assert prod.superseded_by({}) is None
def test_current_branch_head_parses_gitea_branch_commit_id(monkeypatch):
captured = {}
def fake_optional(url, _token):
captured["url"] = url
return 200, {"name": "main", "commit": {"id": "2863380fullhash"}}
monkeypatch.setattr(prod, "_api_json_optional", fake_optional)
head = prod.current_branch_head(
{"GITEA_TOKEN": "secret", "GITHUB_REPOSITORY": "molecule-ai/molecule-core"}
)
assert head == "2863380fullhash"
assert captured["url"].endswith("/repos/molecule-ai/molecule-core/branches/main")
def test_current_branch_head_uses_ref_name_branch(monkeypatch):
captured = {}
def fake_optional(url, _token):
captured["url"] = url
return 200, {"commit": {"sha": "deadbeef"}}
monkeypatch.setattr(prod, "_api_json_optional", fake_optional)
head = prod.current_branch_head(
{"GITEA_TOKEN": "secret", "GITHUB_REF_NAME": "release"}
)
assert head == "deadbeef"
assert captured["url"].endswith("/branches/release")
def test_current_branch_head_none_without_token():
assert prod.current_branch_head({}) is None
def test_current_branch_head_none_on_non_200(monkeypatch):
monkeypatch.setattr(prod, "_api_json_optional", lambda _u, _t: (500, None))
assert prod.current_branch_head({"GITEA_TOKEN": "secret"}) is None
# --- #2213: superseded check must fire BEFORE production side effects ----------
#
# Real incident shape: two main pushes land ~2 min apart. The OLDER deploy job
# (GITHUB_SHA=7a72516, target staging-7a72516) started LATE — main head was
# already 7f25373. The #2194 guard only protected the *verify* step, so the
# older job still:
# 1. rolled the canary (hongming) BACKWARD to staging-7a72516 (the #2213 red,
# seen as the newer job's verify reading hongming on the old SHA), then
# 2. promoted :latest backward to the older image,
# before finally skipping verify. The workflow now calls this same superseded
# check BEFORE the redeploy + promote steps and gates both off when it fires.
# These tests pin the contract that check-superseded relies on for the exact
# incident shape.
def test_superseded_by_fires_for_older_job_when_newer_already_head(monkeypatch):
# Older job (7a72516) re-checks the head just before rollout and finds the
# newer merge (7f25373) already owns main -> superseded -> skip side effects.
monkeypatch.setattr(
prod, "current_branch_head", lambda _env: "7f25373309eca54a36f08c371ff783c3a47c3f8d"
)
newer = prod.superseded_by(
{"GITHUB_SHA": "7a72516f7e7ba1a710c4f393fef08be8d22e1866"}
)
assert newer == "7f25373309eca54a36f08c371ff783c3a47c3f8d"
def test_superseded_by_none_for_latest_job_so_it_still_rolls(monkeypatch):
# The newer job (7f25373) IS the head -> NOT superseded -> it proceeds to
# roll the fleet and verify, so a genuinely-behind tenant still fails loud.
monkeypatch.setattr(
prod, "current_branch_head", lambda _env: "7f25373309eca54a36f08c371ff783c3a47c3f8d"
)
assert (
prod.superseded_by(
{"GITHUB_SHA": "7f25373309eca54a36f08c371ff783c3a47c3f8d"}
)
is None
)
+2 -23
View File
@@ -205,8 +205,6 @@ chmod +x "$FIXTURE_DIR/bin/curl"
# Helper: run the script with fixture environment
run_review_check() {
local scenario="$1"
local team="${2:-qa}"
local team_id="${3:-20}"
echo "$scenario" >"$FIX_STATE_DIR/scenario"
local out
set +e
@@ -217,8 +215,8 @@ run_review_check() {
REPO="molecule-ai/molecule-core" \
PR_NUMBER="999" \
DEFAULT_BRANCH="main" \
TEAM="$team" \
TEAM_ID="$team_id" \
TEAM="qa" \
TEAM_ID="20" \
REVIEW_CHECK_DEBUG="0" \
REVIEW_CHECK_STRICT="0" \
bash "$SCRIPT" 2>&1
@@ -374,25 +372,6 @@ assert_eq "T18 exit code 0 (comment approval still considered)" "0" "$T18_RC"
assert_contains "T18 comment candidate notice" "comment-based approval" "$T18_OUT"
assert_contains "T18 comment approver accepted" "APPROVED by core-qa-agent" "$T18_OUT"
# T19 — ai-sop-ack member APPROVED review must NOT count toward qa-review
# or security-review (R1 hardening refinement, msg 1388c76f).
echo
echo "== T19 ai-sop-ack APPROVED review excluded from qa-review gate =="
T19_OUT=$(run_review_check "T19_ai_sop_ack_approved" "qa" "20")
T19_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T19 exit code 1 (ai-sop-ack not in qa team)" "1" "$T19_RC"
assert_contains "T19 ai-reviewer excluded from qa" "candidates: ai-reviewer" "$T19_OUT"
assert_contains "T19 none are in qa team" "none are in team" "$T19_OUT"
# T20 — same ai-sop-ack member must also be excluded from security-review gate.
echo
echo "== T20 ai-sop-ack APPROVED review excluded from security-review gate =="
T20_OUT=$(run_review_check "T19_ai_sop_ack_approved" "security" "21")
T20_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T20 exit code 1 (ai-sop-ack not in security team)" "1" "$T20_RC"
assert_contains "T20 ai-reviewer excluded from security" "candidates: ai-reviewer" "$T20_OUT"
assert_contains "T20 none are in security team" "none are in team" "$T20_OUT"
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL"
-401
View File
@@ -1003,404 +1003,3 @@ class TestComputeNaStateAcceptsGateNotInItems(unittest.TestCase):
comments, "alice", na_gates, lambda *_: ["alice"]
)
self.assertFalse(na_state["security-review"]["declared"])
# ---------------------------------------------------------------------------
# internal#760 ceremony — ai-sop-ack team + ai_ack_eligible per-item flag
# ---------------------------------------------------------------------------
class TestAIAckEligibleConfig(unittest.TestCase):
"""CTO-controlled allowlist (msg 1388c76f):
ai_ack_eligible: comprehensive-testing, local-postgres-e2e, staging-smoke,
five-axis-review, memory-consulted
human-only: root-cause, no-backwards-compat
"""
def test_ai_ack_eligible_items(self):
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
eligible = {
"comprehensive-testing",
"local-postgres-e2e",
"staging-smoke",
"five-axis-review",
"memory-consulted",
}
for slug in eligible:
self.assertTrue(
items_by_slug[slug].get("ai_ack_eligible"),
f"{slug} must be ai_ack_eligible",
)
def test_human_only_items(self):
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
human_only = {"root-cause", "no-backwards-compat"}
for slug in human_only:
self.assertFalse(
items_by_slug[slug].get("ai_ack_eligible", False),
f"{slug} must NOT be ai_ack_eligible (human-only)",
)
def test_testing_class_slugs_constant(self):
"""_TESTING_CLASS_SLUGS must match the three testing items."""
self.assertEqual(
sop._TESTING_CLASS_SLUGS,
{"comprehensive-testing", "local-postgres-e2e", "staging-smoke"},
)
def test_human_only_slugs_constant(self):
"""_HUMAN_ONLY_SLUGS encodes the migration/schema carve-out.
If this set changes, the CTO must approve the widening.
"""
self.assertEqual(
sop._HUMAN_ONLY_SLUGS,
{"root-cause", "no-backwards-compat", "migration", "schema"},
)
def test_human_only_invariant_enforced_in_code_and_config(self):
"""Every config-present slug in _HUMAN_ONLY_SLUGS must be human-only.
This test fails if a migration/schema-class item accidentally
acquires ai_ack_eligible via config drift. migration/schema are
future-proofing slugs not yet in the live config; they are checked
by the production probe closure but skipped here.
"""
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
for slug in sop._HUMAN_ONLY_SLUGS:
if slug not in items_by_slug:
# Future-proofing slug (e.g. migration, schema) — not yet
# in config, but the code guard still rejects AI acks.
continue
self.assertFalse(
items_by_slug[slug].get("ai_ack_eligible", False),
f"{slug} is in _HUMAN_ONLY_SLUGS and must NEVER be ai_ack_eligible",
)
class TestAIAckEligibilityProbe(unittest.TestCase):
"""The probe closure in main() delegates to compute_ack_state.
We simulate the AI-ack path by injecting a probe that behaves like
the production probe (human team first, then ai-sop-ack fallback).
"""
def setUp(self):
self.items = _items_by_slug()
self.aliases = _numeric_aliases()
def _probe_human_then_ai(self, human_users, ai_users):
"""Return users in human_users immediately; users in ai_users only
if the item is ai_ack_eligible."""
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u in human_users:
approved.append(u)
elif u in ai_users and item.get("ai_ack_eligible"):
approved.append(u)
return approved
return probe
def test_ai_ack_passes_for_eligible_item(self):
comments = [_comment("ai-bot", "/sop-ack five-axis-review")]
probe = self._probe_human_then_ai(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["five-axis-review"]["ackers"], ["ai-bot"])
def test_ai_ack_rejected_for_human_only_item(self):
comments = [_comment("ai-bot", "/sop-ack root-cause")]
probe = self._probe_human_then_ai(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["root-cause"]["ackers"], [])
self.assertIn("ai-bot", state["root-cause"]["rejected"]["not_in_team"])
def test_human_ack_still_works_for_ai_eligible_item(self):
comments = [_comment("bob", "/sop-ack comprehensive-testing")]
probe = self._probe_human_then_ai(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], ["bob"])
def test_ai_ack_rejected_for_testing_item_when_ci_red(self):
# Simulate the production probe that checks CI status for testing items.
# When CI is not green, ai-sop-ack member is rejected.
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u == "ai-bot" and item.get("ai_ack_eligible"):
# Testing items require CI green; simulate CI red.
if slug in sop._TESTING_CLASS_SLUGS:
continue # rejected: CI not green
approved.append(u)
return approved
comments = [_comment("ai-bot", "/sop-ack comprehensive-testing")]
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], [])
def test_ai_ack_passes_for_testing_item_when_ci_green(self):
# Simulate CI green → AI ack passes.
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u == "ai-bot" and item.get("ai_ack_eligible"):
if slug in sop._TESTING_CLASS_SLUGS:
# CI is green → allow
pass
approved.append(u)
return approved
comments = [_comment("ai-bot", "/sop-ack comprehensive-testing")]
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], ["ai-bot"])
class TestAIAckHumanOnlyMigrationSchema(unittest.TestCase):
"""RC 8322: migration and schema items are human-only regardless of
any future config that might accidentally mark them ai_ack_eligible.
These slugs are not yet in the live config items list; the tests use
synthetic items so the production guard can be exercised directly.
"""
def setUp(self):
# Synthetic items — if live config ever adds migration/schema,
# they MUST stay human-only. The probe below mirrors the actual
# production closure logic (human team first, then AI fallback
# with _HUMAN_ONLY_SLUGS guard).
self.items = {
"migration": {
"slug": "migration",
"ai_ack_eligible": True,
"required_teams": ["engineers"],
},
"schema": {
"slug": "schema",
"ai_ack_eligible": True,
"required_teams": ["engineers"],
},
}
self.aliases = {}
def _production_like_probe(self, human_users, ai_users):
"""Return a probe that mirrors the production closure's guard."""
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u in human_users:
approved.append(u)
elif u in ai_users:
# Production guard: _HUMAN_ONLY_SLUGS rejects AI acks
# regardless of the ai_ack_eligible flag.
if slug in sop._HUMAN_ONLY_SLUGS:
continue
if item.get("ai_ack_eligible"):
approved.append(u)
return approved
return probe
def test_ai_ack_rejected_for_migration(self):
comments = [_comment("ai-bot", "/sop-ack migration")]
probe = self._production_like_probe(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["migration"]["ackers"], [])
self.assertIn("ai-bot", state["migration"]["rejected"]["not_in_team"])
def test_ai_ack_rejected_for_schema(self):
comments = [_comment("ai-bot", "/sop-ack schema")]
probe = self._production_like_probe(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["schema"]["ackers"], [])
self.assertIn("ai-bot", state["schema"]["rejected"]["not_in_team"])
def test_human_ack_still_works_for_migration(self):
# Human team member acking migration/schema is unaffected.
comments = [_comment("bob", "/sop-ack migration")]
probe = self._production_like_probe(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["migration"]["ackers"], ["bob"])
def test_human_ack_still_works_for_schema(self):
comments = [_comment("bob", "/sop-ack schema")]
probe = self._production_like_probe(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["schema"]["ackers"], ["bob"])
class TestGetCIStatus(unittest.TestCase):
"""Verify get_ci_status reads the correct context from commit statuses."""
def _client_with_statuses(self, statuses):
client = sop.GiteaClient("git.example.com", "tok")
def fake_req(method, path, body=None, ok_codes=(200, 201, 204)):
return 200, statuses
client._req = fake_req # type: ignore[method-assign]
return client
def test_ci_green_returns_success(self):
client = self._client_with_statuses([
{"context": "CI / all-required (pull_request)", "state": "success"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "success"
)
def test_ci_red_returns_failure(self):
client = self._client_with_statuses([
{"context": "CI / all-required (pull_request)", "state": "failure"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "failure"
)
def test_missing_context_returns_missing(self):
client = self._client_with_statuses([
{"context": "some-other-context", "state": "success"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "missing"
)
def test_api_error_returns_unknown(self):
client = sop.GiteaClient("git.example.com", "tok")
def fake_req(method, path, body=None, ok_codes=(200, 201, 204)):
return 500, {"error": "boom"}
client._req = fake_req # type: ignore[method-assign]
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "unknown"
)
# ---------------------------------------------------------------------------
# internal#818 — na-declarations status must be terminal success
# ---------------------------------------------------------------------------
class TestNaDeclarationsStatusTerminal(unittest.TestCase):
"""Regression for internal#818: the na-declarations context is
informational, not a merge gate. An empty N/A declaration list must
post `success` (not `pending`) so it does not poison the PR combined
status."""
def _run_with_fake_client(self, fake_client_class):
"""Swap GiteaClient temporarily and invoke main() with a fake token."""
orig_client = sop.GiteaClient
orig_token = os.environ.get("GITEA_TOKEN")
try:
sop.GiteaClient = fake_client_class
os.environ["GITEA_TOKEN"] = "fake-token"
return sop.main([
"--owner", "o", "--repo", "r", "--pr", "1",
"--config", CONFIG_PATH,
"--gitea-host", "git.example.com",
])
finally:
sop.GiteaClient = orig_client
if orig_token is None:
os.environ.pop("GITEA_TOKEN", None)
else:
os.environ["GITEA_TOKEN"] = orig_token
def test_empty_na_descriptions_posts_success(self):
posted = []
class FakeClient(sop.GiteaClient):
def get_pr(self, owner, repo, pr):
return {
"state": "open",
"user": {"login": "alice"},
"head": {"sha": "abc123"},
"labels": [],
}
def get_issue_comments(self, owner, repo, issue, max_comments=None):
return []
def resolve_team_id(self, org, team_name):
return None
def is_team_member(self, team_id, login):
return False
def post_status(self, owner, repo, sha, state, context,
description, target_url=""):
posted.append({
"state": state,
"context": context,
"description": description,
})
rc = self._run_with_fake_client(FakeClient)
self.assertEqual(rc, 0)
na_posts = [p for p in posted if "na-declarations" in p["context"]]
self.assertEqual(len(na_posts), 1, f"expected one na-declarations post, got {posted}")
self.assertEqual(na_posts[0]["state"], "success")
self.assertEqual(na_posts[0]["description"], "N/A: (none)")
def test_populated_na_descriptions_posts_success(self):
posted = []
class FakeClient(sop.GiteaClient):
def get_pr(self, owner, repo, pr):
return {
"state": "open",
"user": {"login": "alice"},
"head": {"sha": "abc123"},
"labels": [],
}
def get_issue_comments(self, owner, repo, issue, max_comments=None):
return [
{"user": {"login": "bob"}, "body": "/sop-n/a qa-review N/A: docs-only"},
]
def resolve_team_id(self, org, team_name):
return 1
def is_team_member(self, team_id, login):
return True
def post_status(self, owner, repo, sha, state, context,
description, target_url=""):
posted.append({
"state": state,
"context": context,
"description": description,
})
rc = self._run_with_fake_client(FakeClient)
self.assertEqual(rc, 0)
na_posts = [p for p in posted if "na-declarations" in p["context"]]
self.assertEqual(len(na_posts), 1)
self.assertEqual(na_posts[0]["state"], "success")
self.assertIn("qa-review", na_posts[0]["description"])
@@ -1,66 +0,0 @@
#!/usr/bin/env bash
# Regression test for internal#816 — sop-tier-check must ignore APPROVED
# reviews that were submitted against an old PR head SHA.
#
# Bug: the script collected approvers with
# jq '[.[] | select(.state=="APPROVED") | .user.login]'
# without filtering on .commit_id == HEAD_SHA. After a PR head moved,
# stale approvals looked valid to the tier gate.
#
# Fix: the jq filter now includes
# select(.state=="APPROVED" and .commit_id == $head_sha)
# where $head_sha is the current PR head fetched from the API.
set -euo pipefail
# jq may not be on PATH in all environments (e.g. dev containers).
PATH="/tmp/bin:$PATH"
command -v jq >/dev/null 2>&1 || { echo "::error::jq required but not found"; exit 1; }
PASS=0
FAIL=0
assert_eq() {
local label="$1"
local expected="$2"
local got="$3"
if [ "$expected" = "$got" ]; then
echo " PASS $label"
PASS=$((PASS + 1))
else
echo " FAIL $label"
echo " expected: <$expected>"
echo " got: <$got>"
FAIL=$((FAIL + 1))
fi
}
# Sample reviews matching the shape from Gitea API
REVIEWS_JSON='[
{"state":"APPROVED","commit_id":"abc123","user":{"login":"bob"}},
{"state":"APPROVED","commit_id":"old456","user":{"login":"alice"}},
{"state":"COMMENT","commit_id":"abc123","user":{"login":"carol"}},
{"state":"APPROVED","commit_id":"abc123","user":{"login":"dave"}},
{"state":"REQUEST_CHANGES","commit_id":"abc123","user":{"login":"eve"}}
]'
echo "test: jq filter keeps only APPROVED on current head"
GOT=$(echo "$REVIEWS_JSON" | jq -r --arg head_sha "abc123" \
'[.[] | select(.state=="APPROVED" and .commit_id == $head_sha) | .user.login] | unique | .[]')
assert_eq "current-head approvers" "bob dave" "$(echo "$GOT" | tr '\n' ' ' | sed 's/ $//')"
echo "test: jq filter with all-stale reviews yields empty"
GOT=$(echo "$REVIEWS_JSON" | jq -r --arg head_sha "new789" \
'[.[] | select(.state=="APPROVED" and .commit_id == $head_sha) | .user.login] | unique | .[]')
assert_eq "all-stale yields empty" "" "$GOT"
echo "test: jq filter handles null commit_id gracefully"
NULL_JSON='[{"state":"APPROVED","commit_id":null,"user":{"login":"mallory"}}]'
GOT=$(echo "$NULL_JSON" | jq -r --arg head_sha "abc123" \
'[.[] | select(.state=="APPROVED" and .commit_id == $head_sha) | .user.login] | unique | .[]')
assert_eq "null commit_id excluded" "" "$GOT"
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL"
[ "$FAIL" -eq 0 ]
+1 -29
View File
@@ -32,26 +32,6 @@
# AUTHOR SELF-ACK IS FORBIDDEN regardless of which team contains them
# — the gate script enforces commenter != PR author before checking
# team membership.
#
# AI-SOP-ACK TEAM (internal#760 ceremony design, CTO-approved):
# The `ai-sop-ack` team contains AI agent identities that can ack
# SOP-checklist items ON BEHALF OF automated evidence. An AI ack is
# only valid when:
# 1. the item has `ai_ack_eligible: true`
# 2. the item is NOT in the human-only carve-out (migration/schema)
# 3. for testing-class items, CI / all-required (pull_request) is
# green on the current head SHA
#
# AI acks NEVER count toward qa-review or security-review gates —
# those remain human-team-only (enforced by review-check.sh team
# probe against TEAM_ID 20/21).
#
# INITIAL ai_ack_eligible allowlist (CTO-controlled, msg 1388c76f):
# comprehensive-testing, local-postgres-e2e, staging-smoke,
# five-axis-review, memory-consulted
# HUMAN-ONLY carve-out:
# root-cause, no-backwards-compat
# Any widening requires an explicit config change reviewed by CTO.
version: 1
@@ -103,31 +83,25 @@ items:
numeric_alias: 1
pr_section_marker: "Comprehensive testing performed"
required_teams: [qa, engineers]
ai_ack_eligible: true
description: >-
What was tested, how, edge cases covered. Ack from any qa-team
member (or engineers fallback while qa is small). AI ack valid
only when CI / all-required (pull_request) is green.
member (or engineers fallback while qa is small).
- slug: local-postgres-e2e
numeric_alias: 2
pr_section_marker: "Local-postgres E2E run"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Link to local CI artifact, or "N/A: pure-frontend change". Ack
from any engineer who can verify the local DB test actually ran.
AI ack valid only when CI / all-required (pull_request) is green.
- slug: staging-smoke
numeric_alias: 3
pr_section_marker: "Staging-smoke verified or pending"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Link to canary run, or "scheduled post-merge". Ack from any
engineer (core-devops/infra-sre are members of engineers team).
AI ack valid only when CI / all-required (pull_request) is green.
- slug: root-cause
numeric_alias: 4
@@ -146,7 +120,6 @@ items:
numeric_alias: 5
pr_section_marker: "Five-Axis review walked"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Correctness / readability / architecture / security / performance.
Ack from any non-author engineer.
@@ -167,7 +140,6 @@ items:
numeric_alias: 7
pr_section_marker: "Memory/saved-feedback consulted"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
List of feedback memories applicable to this change. Ack from
any engineer who has the same memory access.
+6 -18
View File
@@ -47,25 +47,13 @@ jobs:
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
# Required-status-check contexts to evaluate at merge time.
# Branch-aware JSON dict: keys are protected branch names,
# values are arrays of context names that branch protection
# requires for that branch. Mirror this against branch
# protection (settings → branches → protected branch →
# required checks) for each branch listed here.
#
# Newline-separated. Mirror this against branch protection
# (settings → branches → protected branch → required checks).
# Declared here rather than fetched from /branch_protections
# because that endpoint requires admin write — sop-tier-bot is
# read-only by design (least-privilege).
REQUIRED_CHECKS_JSON: |
{
"main": [
"CI / all-required (pull_request)",
"E2E API Smoke Test / E2E API Smoke Test (pull_request)",
"Handlers Postgres Integration / Handlers Postgres Integration (pull_request)"
],
"staging": [
"CI / all-required (pull_request)",
"sop-checklist / all-items-acked (pull_request)"
]
}
REQUIRED_CHECKS: |
CI / all-required (pull_request)
E2E API Smoke Test / E2E API Smoke Test (pull_request)
Handlers Postgres Integration / Handlers Postgres Integration (pull_request)
run: bash .gitea/scripts/audit-force-merge.sh
@@ -42,9 +42,11 @@ jobs:
check:
name: Migration version collision check
runs-on: ubuntu-latest
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port.
# mc#1982 mask removed — no surfaced defects in this lane.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 5
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-1
View File
@@ -96,7 +96,6 @@ env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: advisory arm64 pilot, non-gating by design (internal#418).
fast-checks:
name: fast-checks
# AND-set: only the Mac arm64 runner advertises macos-self-hosted.
+1 -1
View File
@@ -57,7 +57,7 @@ permissions:
# can produce duplicate comments before the title-search dedup wins.
concurrency:
group: ci-required-drift
cancel-in-progress: false
cancel-in-progress: true
jobs:
drift:
+159 -147
View File
@@ -25,9 +25,10 @@
# sufficient for `actions/checkout` against this same repo.
#
# 4. Docs — no docs/scripts reference github.com URLs that need swapping.
# The canvas-deploy-status step (core#2226, formerly canvas-deploy-reminder)
# writes the canvas ordered-deploy status into the step summary; it points
# at the ECR canvas image and the publish workflow, no ghcr.io prose.
# The canvas-deploy-reminder step writes a `ghcr.io/...` image
# reference into the step summary text — that's documentation prose
# pointing at the ECR-mirrored canvas image and stays unchanged for
# this port (a separate cleanup if ghcr→ECR sweep is in scope).
#
# Cross-links:
# - RFC: internal#219 (CI/CD hard-gate hardening)
@@ -105,7 +106,7 @@ jobs:
name: Platform (Go)
needs: changes
runs-on: ubuntu-latest
# mc#1982 (closed 2026-05-14): Phase 4 flip of the platform-build job.
# mc#774 (closed 2026-05-14): Phase 4 flip of the platform-build job.
# Phase 4 (#656) originally flipped this to continue-on-error: false based on
# Phase-3-masked "green on main 2026-05-12". Two failure classes then surfaced:
# (1) 4x delegation_test.go sqlmock gaps (PR #669 / #634 fix-forward, closed).
@@ -356,33 +357,6 @@ jobs:
name: Run E2E bash unit tests (no live infra)
run: |
bash tests/e2e/test_model_slug.sh
# molecule-core#1995 (#1994 follow-on): fail-direction proof for
# the A2A real-completion + byok-routing assertion helpers
# (lib/completion_assert.sh). Offline (no LLM, no network): it
# asserts an error-as-text payload FAILS the real-completion gate
# — the exact trap the historical shape-only `"kind":"text"`
# check missed. If a refactor weakens the gate to a shape check,
# this step goes red on every PR.
bash tests/e2e/test_completion_assert_unit.sh
# harden/e2e-staging-saas-failclosed: fail-direction proof for the
# E2E_REQUIRE_LIVE fail-closed-on-skip guard in
# test_staging_full_saas.sh. Offline (no LLM/network/provisioning):
# asserts the guard exits 5 when a live lifecycle did NOT run and
# passes when all milestones fired — so a refactor that lets the
# staging gate report green without a real provision→online→A2A
# cycle goes red on every PR.
bash tests/e2e/test_require_live_guard_unit.sh
# harden/enforce-ci-gates-core-v2 (PR #2286): fail-direction proof
# for the E2E_REQUIRE_LIVE zero-validated gate in
# test_priority_runtimes_e2e.sh (the REQUIRED `E2E API Smoke Test`).
# Offline (no LLM/network/provisioning): sources that script under
# its unit source-guard and drives the REAL evaluate_require_live_gate
# — asserts REQUIRE_LIVE=1 + zero validated → RED (the false-green
# trap), REQUIRE_LIVE=1 + >=1 validated → GREEN, and REQUIRE_LIVE
# unset + zero validated → GREEN (loud skip). CI can't provision a
# live arm to prove this, so this unit test IS the regression gate:
# a revert of the zero-validated→RED logic goes red on every PR.
bash tests/e2e/test_require_live_priority_gate_unit.sh
- if: ${{ needs.changes.outputs.scripts == 'true' }}
name: Test ECR promote-tenant-image script (mock-driven, no live infra)
@@ -407,61 +381,61 @@ jobs:
# mc#959 root-fix (sre)
canvas-deploy-status:
# core#2226: replaces the old advisory "Canvas Deploy Reminder". The canvas
# image now has a real ORDERED auto-deploy (publish-canvas-image.yml:
# build → push :staging-<sha> → wait green main CI → promote :latest by
# digest), and docker-compose pins via CANVAS_IMAGE_TAG. There is no longer
# a manual "go run docker compose pull by hand" step to remind operators
# about — so this job just records, on a canvas-touching main push, that the
# ordered deploy is handling it (and where to watch), instead of prescribing
# a manual action that determinism made obsolete.
name: Canvas Deploy Status
canvas-deploy-reminder:
name: Canvas Deploy Reminder
runs-on: docker-host
# Job-level `if:` so ci-required-drift.py's ci_job_names() detects this as
# github.ref-gated and skips it from the required-context F1 set (mc#1982).
# Step-level exit 0 handles the "not a canvas main push" case.
# mc#774 root-fix: added job-level `if:` so ci-required-drift.py's
# ci_job_names() detects this as github.ref-gated and skips it from F1.
# The step-level exit 0 handles the "not main push" case; the job-level
# `if:` makes the gating explicit so the drift script sees it.
# Runs on both main and staging pushes; step exits 0 when not applicable.
if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/staging' }}
needs: [changes, canvas-build]
steps:
- name: Record canvas ordered-deploy status
- name: Write deploy reminder to step summary
env:
COMMIT_SHA: ${{ github.sha }}
CANVAS_CHANGED: ${{ needs.changes.outputs.canvas }}
EVENT_NAME: ${{ github.event_name }}
REF_NAME: ${{ github.ref }}
# github.server_url resolves via the workflow-level env override to the
# Gitea instance, so RUN_URL points at the Gitea run page (not github.com).
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions
# github.server_url resolves via the workflow-level env override
# to the Gitea instance, so the RUN_URL points at the Gitea run
# page (not github.com). See feedback_act_runner_github_server_url.
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
set -euo pipefail
if [ "$CANVAS_CHANGED" != "true" ] || [ "$EVENT_NAME" != "push" ] || [ "$REF_NAME" != "refs/heads/main" ]; then
echo "Canvas deploy status not applicable for event=$EVENT_NAME ref=$REF_NAME canvas_changed=$CANVAS_CHANGED."
echo "Canvas deploy reminder not applicable for event=$EVENT_NAME ref=$REF_NAME canvas_changed=$CANVAS_CHANGED."
exit 0
fi
# Write body to a temp file — avoids backtick escaping in shell.
cat > /tmp/deploy-status.md << 'BODY'
## Canvas ordered deploy in progress — no manual action required
cat > /tmp/deploy-reminder.md << 'BODY'
## Canvas build passed — deploy required
This canvas-touching main push triggers `publish-canvas-image`, which now
runs an ORDERED, CI-gated deploy (core#2226) — the same shape as the
platform's deploy-production:
The `publish-canvas-image` workflow is now building a fresh Docker image
(`ghcr.io/molecule-ai/canvas:latest`) in the background.
1. Build → push `molecule-ai/canvas:staging-<sha>` + `:staging-latest`.
2. Wait for green main CI on this SHA.
3. Promote `:latest` to the verified `:staging-<sha>` by digest.
Once it completes (~35 min), apply on the host machine with:
```bash
cd <runner-workspace>
git pull origin main
docker compose pull canvas && docker compose up -d canvas
```
Tenants/hosts pin via `CANVAS_IMAGE_TAG` (default `latest` = the last
CI-green build), so a deploy is reproducible — no hand-run
`docker compose pull` needed. Watch the run in the canvas publish workflow.
If you need to rebuild from local source instead (e.g. testing unreleased
changes or a new `NEXT_PUBLIC_*` URL), use:
```bash
docker compose build canvas && docker compose up -d canvas
```
BODY
printf '\n> Posted automatically by CI · commit `%s` · [publish workflow](%s)\n' \
"$COMMIT_SHA" "$RUN_URL" >> /tmp/deploy-status.md
printf '\n> Posted automatically by CI · commit `%s` · [build log](%s)\n' \
"$COMMIT_SHA" "$RUN_URL" >> /tmp/deploy-reminder.md
# Gitea has no commit-comments API; write to GITHUB_STEP_SUMMARY, which
# both GitHub and Gitea Actions render as the run's summary page.
cat /tmp/deploy-status.md >> "$GITHUB_STEP_SUMMARY"
# Gitea has no commit-comments API; write to GITHUB_STEP_SUMMARY,
# which both GitHub Actions and Gitea Actions render as the
# workflow run's summary page. (#75 / PR-D)
cat /tmp/deploy-reminder.md >> "$GITHUB_STEP_SUMMARY"
# Python Lint & Test — required check, always runs.
# Runtime Python moved to molecule-ai-workspace-runtime. Keep this context as
@@ -493,10 +467,10 @@ jobs:
#
# Emits `CI / all-required (<event>)` where <event> is the workflow trigger
# (e.g. `CI / all-required (pull_request)`, `CI / all-required (push)`).
# Branch protection requires the event-suffixed name —
# Branch protection MUST be updated to require the event-suffixed name —
# requiring `CI / all-required` (bare, no suffix) silently blocks all merges
# because Gitea treats absent status contexts as pending (not skipped), and
# no workflow emits the bare name. BP requires
# no workflow emits the bare name. Fixed: BP now requires
# `CI / all-required (pull_request)` per issue #1473.
#
# Closes the failure mode where status_check_contexts on molecule-core/main
@@ -505,91 +479,129 @@ jobs:
# red silently merged through. See internal#286 for the three concrete
# tonight-of-2026-05-11 incidents that prompted the emergency bump.
#
# ── 2026-06-01 CI-scheduler-overload fix (fix/ci-scheduler-fanout) ──
# PREVIOUS shape: a poll-gate that ran detect-changes then LOOPED on
# `GET /commits/{sha}/statuses` every 15s for up to 40 min, occupying a
# `ci-meta` executor slot the entire time it waited for upstream jobs.
# With only 2 ci-meta runners, that poll-loop squatted half the lane on
# every PR — a confirmed throughput sink in the live RCA (two concurrent
# `JOB-all-required` containers observed pinning the lane). The polling
# design existed only to dodge the Gitea `needs:` + `if: always()` bug,
# where an always()-guarded sentinel could be marked skipped before
# upstream jobs settled (leaving BP pending forever).
# This job deliberately has no `needs:`. Gitea 1.22/act_runner can mark a
# job-level `if: always()` + `needs:` sentinel as skipped before upstream
# jobs settle, leaving branch protection with a permanent pending
# `CI / all-required` context. Instead, this independent sentinel polls the
# required commit-status contexts for this SHA and fails if any fail, skip,
# or never emit. It runs the same path detector as `changes` and only waits
# for path-relevant jobs; Gitea can otherwise leave needs/output-skipped
# jobs permanently pending with "Blocked by required conditions". It runs on
# the dedicated `ci-meta` lane so the poller does not occupy the same
# general runner pool as the jobs it is waiting for.
#
# NEW shape: a plain `needs:` aggregator with NO polling loop. This is
# safe here — and was NOT safe at the time the poller was written —
# because every aggregated CI job now gates its real work PER-STEP
# (`if: needs.changes.outputs.* != 'true'`) rather than at the JOB level.
# A per-step-gated job always reaches a terminal SUCCESS (it no-ops its
# expensive steps but the job itself still completes), so it is never
# `skipped`. Plain `needs:` (WITHOUT `if: always()`) works correctly on
# Gitea 1.22.6 / act_runner v0.6.1 — only `needs:` + `if: always()` is
# broken (feedback_gitea_needs_works_only_ifalways_broken). We therefore
# use plain `needs:` + an explicit per-need result check (NOT
# `if: always()`); if any need fails/errors, Gitea never starts this job
# and BP sees `CI / all-required` go red via the failed dependency
# propagation — exactly the gate we want, with zero runner-squat.
# canvas-deploy-reminder is intentionally NOT included in all-required.needs.
# It is an informational main-push reminder, not a PR quality gate. Keeping
# it in this dependency list lets a skipped reminder skip the required
# sentinel before the `always()` guard can emit a branch-protection status.
#
# The `needs:` list MUST stay in lockstep with ci-required-drift.py's
# F1 check (`ci_job_names()` = every job MINUS the sentinel MINUS jobs
# whose `if:` gates on github.event_name/github.ref). canvas-deploy-
# reminder is event-gated (`if: github.ref == refs/heads/{main,staging}`)
# so it is intentionally EXCLUDED — it skips on PRs and a `needs:` on a
# skipped job would never let the sentinel run. If a new always-running
# CI job is added, add it here too or ci-required-drift F1 will flag it.
#
# Stays on the dedicated `ci-meta` lane (no docker work, so the
# docker-host-pin lint does not apply), but now the job is sub-second:
# it only inspects already-settled `needs.*.result` values, so it frees
# the slot immediately instead of holding it for the whole CI duration.
#
needs:
- changes
- platform-build
- canvas-build
- shellcheck
- python-lint
continue-on-error: false
runs-on: ci-meta
timeout-minutes: 5
timeout-minutes: 45
steps:
- name: Verify all aggregated CI jobs succeeded
# NO polling, NO API call, NO checkout. Because this job lists the
# aggregated jobs under `needs:` (without `if: always()`), Gitea only
# starts it once every need has reached SUCCESS — a failed/errored
# need short-circuits the job and propagates red to the
# `CI / all-required` context. This explicit check is a
# belt-and-suspenders assertion + a readable run summary; the real
# gating is the `needs:` edge itself.
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- id: check
env:
CHANGES_RESULT: ${{ needs.changes.result }}
PLATFORM_RESULT: ${{ needs.platform-build.result }}
CANVAS_RESULT: ${{ needs.canvas-build.result }}
SHELLCHECK_RESULT: ${{ needs.shellcheck.result }}
PYTHON_LINT_RESULT: ${{ needs.python-lint.result }}
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
PR_BASE_REF: ${{ github.event.pull_request.base.ref }}
PUSH_BEFORE: ${{ github.event.before }}
run: |
python3 .gitea/scripts/detect-changes.py \
--profile ci \
--event-name "${{ github.event_name }}" \
--pr-base-sha "$PR_BASE_SHA" \
--base-ref "$PR_BASE_REF" \
--push-before "${GITHUB_EVENT_BEFORE:-$PUSH_BEFORE}"
- name: Wait for required CI contexts
env:
GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}
API_ROOT: ${{ github.server_url }}/api/v1
REPOSITORY: ${{ github.repository }}
COMMIT_SHA: ${{ github.sha }}
EVENT_NAME: ${{ github.event_name }}
REQUIRE_PLATFORM: ${{ steps.check.outputs.platform }}
REQUIRE_CANVAS: ${{ steps.check.outputs.canvas }}
REQUIRE_SCRIPTS: ${{ steps.check.outputs.scripts }}
run: |
set -euo pipefail
fail=0
check() {
name="$1"; result="$2"
printf 'CI / %s = %s\n' "$name" "$result"
# `success` is the only green terminal state we accept. A plain
# `needs:` job is only started when all needs succeed, so reaching
# this step already implies success — but assert explicitly so a
# future `if: always()` reintroduction (which WOULD let non-success
# through) fails loudly instead of silently passing the gate.
if [ "$result" != "success" ]; then
echo "::error::aggregated CI job '${name}' did not succeed (result=${result})"
fail=1
fi
}
check "Detect changes" "$CHANGES_RESULT"
check "Platform (Go)" "$PLATFORM_RESULT"
check "Canvas (Next.js)" "$CANVAS_RESULT"
check "Shellcheck (E2E scripts)" "$SHELLCHECK_RESULT"
check "Python Lint & Test" "$PYTHON_LINT_RESULT"
if [ "$fail" -ne 0 ]; then
echo "::error::all-required: one or more aggregated CI jobs did not succeed"
exit 1
fi
echo "OK: all aggregated CI jobs succeeded — CI / all-required green."
python3 - <<'PY'
import json
import os
import sys
import time
import urllib.error
import urllib.request
token = os.environ["GITEA_TOKEN"]
api_root = os.environ["API_ROOT"].rstrip("/")
repo = os.environ["REPOSITORY"]
sha = os.environ["COMMIT_SHA"]
event = os.environ["EVENT_NAME"]
required = [
f"CI / Detect changes ({event})",
f"CI / Python Lint & Test ({event})",
]
if os.environ.get("REQUIRE_PLATFORM") == "true":
required.append(f"CI / Platform (Go) ({event})")
if os.environ.get("REQUIRE_CANVAS") == "true":
required.append(f"CI / Canvas (Next.js) ({event})")
if os.environ.get("REQUIRE_SCRIPTS") == "true":
required.append(f"CI / Shellcheck (E2E scripts) ({event})")
terminal_bad = {"failure", "error"}
deadline = time.time() + 40 * 60
last_summary = None
def fetch_statuses():
statuses = []
for page in range(1, 6):
url = f"{api_root}/repos/{repo}/commits/{sha}/statuses?page={page}&limit=100"
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
with urllib.request.urlopen(req, timeout=10) as resp:
chunk = json.load(resp)
if not chunk:
break
statuses.extend(chunk)
latest = {}
for item in statuses:
ctx = item.get("context")
if not ctx:
continue
prev = latest.get(ctx)
if prev is None or (item.get("updated_at") or item.get("created_at") or "") >= (prev.get("updated_at") or prev.get("created_at") or ""):
latest[ctx] = item
return latest
while True:
try:
latest = fetch_statuses()
except (TimeoutError, OSError, urllib.error.URLError) as exc:
if time.time() >= deadline:
print(f"FAIL: status polling did not recover before deadline: {exc}", file=sys.stderr)
sys.exit(1)
print(f"WARN: status poll failed, retrying: {exc}", flush=True)
time.sleep(15)
continue
states = {ctx: (latest.get(ctx) or {}).get("status") or (latest.get(ctx) or {}).get("state") or "missing" for ctx in required}
summary = ", ".join(f"{ctx}={state}" for ctx, state in states.items())
if summary != last_summary:
print(summary, flush=True)
last_summary = summary
bad = {ctx: state for ctx, state in states.items() if state in terminal_bad}
if bad:
print("FAIL: required CI context failed:", file=sys.stderr)
for ctx, state in bad.items():
desc = (latest.get(ctx) or {}).get("description") or ""
print(f" - {ctx}: {state} {desc}", file=sys.stderr)
sys.exit(1)
if all(state == "success" for state in states.values()):
print(f"OK: all {len(required)} required CI contexts succeeded")
sys.exit(0)
if time.time() >= deadline:
print("FAIL: timed out waiting for required CI contexts:", file=sys.stderr)
for ctx, state in states.items():
print(f" - {ctx}: {state}", file=sys.stderr)
sys.exit(1)
time.sleep(15)
PY
+1 -9
View File
@@ -92,7 +92,7 @@ permissions:
# stacking up.
concurrency:
group: continuous-synth-e2e
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -166,10 +166,6 @@ jobs:
# canary path. The script picks the right blob shape based on
# which key is non-empty.
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
# google-adk canary path — AI-Studio key (config model
# google_genai:gemini-2.5-pro). PROD disallows API keys (Vertex+ADC);
# the keyed path is CI-only. Dispatch with E2E_RUNTIME=google-adk.
E2E_GOOGLE_API_KEY: ${{ secrets.MOLECULE_STAGING_GOOGLE_API_KEY }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -221,10 +217,6 @@ jobs:
required_secret_name="MOLECULE_STAGING_OPENAI_API_KEY"
required_secret_value="${E2E_OPENAI_API_KEY:-}"
;;
google-adk)
required_secret_name="MOLECULE_STAGING_GOOGLE_API_KEY"
required_secret_value="${E2E_GOOGLE_API_KEY:-}"
;;
*)
echo "::warning::Unknown E2E_RUNTIME='${E2E_RUNTIME}' — skipping LLM-key check"
required_secret_name=""
+13 -111
View File
@@ -101,7 +101,7 @@ concurrency:
# See e2e-staging-canvas.yml's identical concurrency block for the full
# rationale and the 2026-04-28 incident reference.
group: e2e-api-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -123,9 +123,8 @@ jobs:
# integration). See internal#512 for the class defect.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
api: ${{ steps.decide.outputs.api }}
steps:
@@ -161,9 +160,8 @@ jobs:
# detect-changes for the full rationale.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 15
env:
# Unique per-run container names so concurrent runs on the host-
@@ -272,24 +270,6 @@ jobs:
echo "::error::Redis did not become ready in 15s"
docker logs "$REDIS_CONTAINER" || true
exit 1
- name: Set deterministic admin token for the e2e platform
if: needs.detect-changes.outputs.api == 'true'
run: |
# AdminAuth (workspace-server/internal/middleware/wsauth_middleware.go:164)
# reads ADMIN_TOKEN. Setting it (a) closes isDevModeFailOpen (devmode.go:50
# returns false when ADMIN_TOKEN is non-empty), so admin routes require a
# bearer, and (b) makes Tier-2b accept a bearer that constant-time-equals
# ADMIN_TOKEN. The platform process inherits ADMIN_TOKEN from $GITHUB_ENV.
#
# MOLECULE_ADMIN_TOKEN is the var the e2e scripts send as the bearer
# (tests/e2e/_lib.sh:33 e2e_mint_workspace_token, and the run_mock
# org-import curl). Set BOTH to the SAME value so the bearer the test
# sends == the secret the platform checks. Deterministic test value;
# this platform is ephemeral, single-run, and never reachable off-host.
E2E_ADMIN_TOKEN="e2e-api-admin-${{ github.run_id }}-${{ github.run_attempt }}"
echo "ADMIN_TOKEN=${E2E_ADMIN_TOKEN}" >> "$GITHUB_ENV"
echo "MOLECULE_ADMIN_TOKEN=${E2E_ADMIN_TOKEN}" >> "$GITHUB_ENV"
echo "Admin token configured for the e2e platform (ADMIN_TOKEN + MOLECULE_ADMIN_TOKEN)."
- name: Build platform
if: needs.detect-changes.outputs.api == 'true'
working-directory: workspace-server
@@ -345,57 +325,19 @@ jobs:
# start-redis steps point at this run's per-run host ports.
./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (with migration completion gate)
# Issue #2205: 30 one-second probes is insufficient when the migration
# chain is still running; /health can flip true before migrations
# finish, so subsequent steps that touch the DB fail. Hybrid fix:
# bump timeout to 300s AND gate exit on the same workspaces-table
# existence check the downstream "Assert migrations applied" uses.
- name: Wait for /health
if: needs.detect-changes.outputs.api == 'true'
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction (it WILL be exceeded as migrations accrue).
# Use a generous wall-clock budget that comfortably exceeds
# cold-start + full-migration time, polling fast. This is robust to a
# growing chain WITHOUT masking a genuinely dead platform: if the
# background platform-server process has exited (e.g. a broken
# migration crashed it), we stop and fail loudly at once instead of
# waiting out the whole budget.
#
# Issue #2205: /health can flip true before migrations finish on a
# growing chain, so we gate exit on the workspaces-table existence
# check the downstream "Assert migrations applied" uses.
DEADLINE_SECS=300 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
for i in $(seq 1 30); do
if curl -sf "$BASE/health" > /dev/null; then
tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc \
"SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'" 2>/dev/null || echo "0")
if [ "$tables" = "1" ]; then
echo "Platform healthy + migrations applied after $(( $(date +%s) - start ))s"
exit 0
fi
fi
# Fast-fail: if the platform process died, /health will never come.
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy with migrations applied within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
echo "Platform up after ${i}s"
exit 0
fi
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true
exit 1
- name: Assert migrations applied
if: needs.detect-changes.outputs.api == 'true'
run: |
@@ -412,51 +354,11 @@ jobs:
- name: Run E2E API tests
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_api.sh
- name: Run keyless feature-contract E2E (terminal-diagnose / webhooks / budget / checkpoints / audit / traces / session-search / rescue / llm-billing-mode / resume / hibernate)
# Keyless required-lane coverage for feature endpoints that ship without
# an LLM key (runtime=external fixture). Each asserts the real HTTP
# contract + a meaningful failure mode (401/400/fail-closed) so a
# regression goes RED, not silently green. The mock-runtime A2A canned
# round-trip is covered by the priority-runtimes `mock` arm, not here.
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_keyless_feature_contracts_e2e.sh
- name: Run secrets-dispatch contract test (keyless SECRETS_JSON branch order)
# Previously orphaned (no workflow referenced it). Hermetic unit-style
# contract over test_staging_full_saas.sh's LLM-key branch precedence —
# needs no platform, no bearer, no network. Guards the 2026-05-03
# "wrong key shape wins" incident class.
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_secrets_dispatch.sh
- name: Run notify-with-attachments E2E
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_notify_attachments_e2e.sh
- name: "Run priority-runtimes E2E (REQUIRE-LIVE: mock validates the runtime plumbing end-to-end)"
# E2E_REQUIRE_LIVE=1 is ON: the run MUST validate >=1 runtime end-to-end
# or it exits NON-zero (RED). This is now SAFE because the `mock` arm can
# actually provision in CI: the only blocker was that POST /org/import and
# POST /admin/workspaces/:id/tokens are AdminAuth-gated
# (router.go:778 + :427) and this job previously configured NO admin token,
# so every admin call 401'd ("admin auth required"). The "Set deterministic
# admin token" step above now sets ADMIN_TOKEN on the platform AND exports
# the matching MOLECULE_ADMIN_TOKEN the e2e scripts send as the bearer, so
# the mock arm can org-import → online → mint token → canned A2A reply →
# validated(). That guarantees VALIDATED>=1 on a healthy platform, so the
# REQUIRED `E2E API Smoke Test` gate now HONESTLY validates a runtime
# end-to-end; if the mock plumbing (DB insert, status flip, A2A proxy,
# activity logging, or the admin-auth wiring) genuinely breaks, the gate
# goes RED instead of false-green. The zero-validated→RED decision is also
# regression-gated WITHOUT provisioning by the bash unit test
# tests/e2e/test_require_live_priority_gate_unit.sh (wired into ci.yml's
# "Run E2E bash unit tests" job), so a revert of that logic still fails CI.
#
# MiniMax stays an OPPORTUNISTIC best-effort arm: create is registry-fragile
# in CI (422 UNREGISTERED_MODEL_FOR_RUNTIME), so a miss is reported via
# bestfail() and never reds the gate — mock carries the required validation,
# MiniMax is a bonus real-LLM check when it comes up. ZERO new credentials.
- name: Run priority-runtimes E2E (claude-code + hermes — skips when keys absent)
if: needs.detect-changes.outputs.api == 'true'
env:
E2E_REQUIRE_LIVE: '1'
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
run: bash tests/e2e/test_priority_runtimes_e2e.sh
- name: Install standalone runtime parser from Gitea registry
if: needs.detect-changes.outputs.api == 'true'
+15 -100
View File
@@ -32,7 +32,7 @@ on:
concurrency:
group: e2e-chat-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -113,28 +113,6 @@ jobs:
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
#
# PROMOTION-READINESS (toward required gate — do NOT flip continue-on-error
# without CTO sign-off, that's the irreversible call):
# NOW FAIL-CLOSED:
# - Postgres/Redis/platform/canvas readiness are already bounded
# readiness-polls that hard-fail (and dump logs) at their deadline,
# not fixed sleeps — preserved.
# - passWithNoTests:false + forbidOnly (playwright.config.ts) → a
# renamed/moved spec or stray test.only can no longer green the lane.
# - REQUIRE-LIVE guard in "Run Playwright E2E tests" → chat==true must
# actually execute >=1 test, else exit 1.
# - chat-desktop "activity log" test no longer swallows its assertion.
# STILL BLOCKS PROMOTION:
# - The echo round-trip asserts on rendered "Echo: ..." text but never
# asserts the echo runtime actually RECEIVED the A2A request
# (fixtures/echo-runtime.ts exposes lastRequest, unused) — an
# optimistic client-side render could pass without a real round-trip.
# Add a server-received assertion before required.
# - The "No-op pass" path (detect-changes chat!=true) is a legitimate
# paths-filter skip, but a required gate needs it to be a neutral
# check, not a green "success", so a skipped heavy lane can't be
# mistaken for a passed one.
continue-on-error: true
timeout-minutes: 15
env:
@@ -264,36 +242,16 @@ jobs:
- name: Wait for /health
if: needs.detect-changes.outputs.chat == 'true'
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
for i in $(seq 1 30); do
if curl -sf "http://127.0.0.1:${PLATFORM_PORT}/health" > /dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
echo "Platform up after ${i}s"
exit 0
fi
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true
exit 1
- name: Install canvas dependencies
if: needs.detect-changes.outputs.chat == 'true'
@@ -320,68 +278,25 @@ jobs:
export NEXT_PUBLIC_WS_URL="ws://127.0.0.1:${PLATFORM_PORT}/ws"
npx next dev --turbopack -p "${CANVAS_PORT}" > canvas.log 2>&1 &
echo $! > canvas.pid
# Readiness must wait for the actual chat route to *compile*, not
# just for the dev server to bind the port. `next dev --turbopack`
# accepts the TCP connection well before it has compiled a route
# on first request, so a bare `curl /` can 200 (or hang) while the
# page the tests load is still building. We therefore probe the
# real route the specs navigate to (`/?m=chat`) and require a 2xx,
# which only happens once Turbopack has finished the first
# compile. The previous 30s budget was also too tight for a cold
# Turbopack first-compile on a loaded operator-host runner — the
# `Canvas did not start in 30s` flake. Raise to 120s (job
# timeout-minutes is 15, so this is comfortably bounded) and probe
# every 2s.
READY=""
for i in $(seq 1 60); do
# Tempfile-routed -w + set +e/-e prevents curl-exit-code
# pollution of the captured status (lint-curl-status-capture.yml).
set +e
curl -s -o /dev/null -w '%{http_code}' "http://localhost:${CANVAS_PORT}/?m=chat" > /tmp/canvas-ready.code
set -e
CODE=$(cat /tmp/canvas-ready.code 2>/dev/null || echo "000")
if [ "$CODE" -ge 200 ] && [ "$CODE" -lt 400 ]; then
echo "Canvas (chat route compiled) up after ~$((i*2))s (HTTP ${CODE})"
READY=1
break
for i in $(seq 1 30); do
if curl -sf "http://localhost:${CANVAS_PORT}" > /dev/null 2>&1; then
echo "Canvas up after ${i}s"
exit 0
fi
sleep 2
sleep 1
done
if [ -z "$READY" ]; then
echo "::error::Canvas chat route did not compile in 120s (last HTTP ${CODE})"
cat canvas.log || true
exit 1
fi
echo "::error::Canvas did not start in 30s"
cat canvas.log || true
exit 1
- name: Run Playwright E2E tests
if: needs.detect-changes.outputs.chat == 'true'
working-directory: canvas
env:
# CI=1 activates forbidOnly in playwright.config.ts (a stray
# `test.only` would otherwise green the suite while skipping the
# rest). passWithNoTests:false (also in the config) already makes
# a zero-match selection exit non-zero.
CI: "1"
run: |
set -euo pipefail
export E2E_PLATFORM_URL="http://127.0.0.1:${PLATFORM_PORT}"
export E2E_DATABASE_URL="${DATABASE_URL}"
export PLAYWRIGHT_BASE_URL="http://localhost:${CANVAS_PORT}"
# REQUIRE-LIVE guard (mirrors CP serving-e2e SERVING_E2E_REQUIRE_LIVE):
# this lane reached here only because detect-changes said chat==true,
# so it MUST actually execute the round-trip specs. `pipefail` makes
# a real test failure (playwright non-zero) abort here under `set -e`;
# passWithNoTests:false makes a zero-match selection non-zero too. The
# explicit grep below is belt-and-braces: assert the list reporter
# printed an executed-count summary, so a silent all-skip / no-op can
# never report green.
npx playwright test e2e/chat-desktop.spec.ts e2e/chat-mobile.spec.ts \
--reporter=list 2>&1 | tee /tmp/pw-chat.out
if ! grep -qE '[0-9]+ (passed|failed|skipped)' /tmp/pw-chat.out; then
echo "::error::E2E Chat REQUIRE-LIVE: chat==true but Playwright reported no executed tests — specs missing or all-skipped, refusing to report green."
exit 1
fi
npx playwright test e2e/chat-desktop.spec.ts e2e/chat-mobile.spec.ts
- name: Dump platform log on failure
if: failure() && needs.detect-changes.outputs.chat == 'true'
+6 -30
View File
@@ -15,7 +15,7 @@ on:
concurrency:
group: e2e-legacy-advisory
cancel-in-progress: false
cancel-in-progress: true
permissions:
contents: read
@@ -130,37 +130,13 @@ jobs:
run: |
set -euo pipefail
./workspace-server/platform-server > workspace-server/platform.log 2>&1 &
PLATFORM_PID=$!
echo "$PLATFORM_PID" > workspace-server/platform.pid
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
start=$(date +%s)
while :; do
if curl -sf "$BASE/health" >/dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
exit 0
fi
if ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
echo $! > workspace-server/platform.pid
for i in $(seq 1 30); do
curl -sf "$BASE/health" >/dev/null && exit 0
sleep 1
done
cat workspace-server/platform.log || true
exit 1
- name: Run comprehensive E2E
run: bash tests/e2e/test_comprehensive_e2e.sh
+5 -30
View File
@@ -115,7 +115,7 @@ concurrency:
# would let a queued staging/main push behind a PR run get cancelled,
# leaving any gate that reads "completed run at SHA" stuck.
group: e2e-peer-visibility-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -126,7 +126,6 @@ jobs:
# push/dispatch/cron only (30+ min). This is NOT a fake-green mask of
# the real assertion — it validates the driving script's bash syntax
# and inline-python so a broken test script fails at PR time.
# bp-required: pending #1296 — PR emitter, not yet required (tracked in #1296).
pr-validate:
name: E2E Peer Visibility
runs-on: ubuntu-latest
@@ -268,36 +267,12 @@ jobs:
echo $! > platform.pid
- name: Wait for /health
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
if curl -sf "$BASE/health" > /dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
exit 0
fi
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
for i in $(seq 1 30); do
curl -sf "$BASE/health" > /dev/null && { echo "Platform up after ${i}s"; exit 0; }
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true; exit 1
- name: Run LOCAL fresh-provision peer-visibility E2E (literal MCP list_peers)
# HONEST gate — NO continue-on-error. The local backend uses
# external-mode workspaces so this context tests the literal MCP
+11 -46
View File
@@ -12,30 +12,9 @@ name: E2E Staging Canvas (Playwright)
#
# Playwright test suite that provisions a fresh staging org per run and
# verifies every workspace-panel tab renders REAL content (not just an
# empty/errored container). Complements e2e-staging-saas.yml (which tests
# the API shape) by exercising the actual browser + canvas bundle against
# live staging.
#
# PROMOTION-READINESS (toward making this a HARD merge-gate):
# NOW RELIABLE (spec hardened — staging-tabs.spec.ts):
# - All waits condition-based (toBeVisible/toHaveAttribute/expect.poll);
# no fixed waitForTimeout in the spec.
# - Tabs asserted on settled REAL content, not "container visible".
# - ErrorBoundary + visible error alerts fail non-degraded tabs.
# - Tab-list parity-checked vs live DOM; fail-closed on missing tenant.
# STILL BLOCKS PROMOTION-TO-REQUIRED (do NOT remove continue-on-error —
# CTO-owned, RFC internal#219 §1):
# - Infra dependency: real staging EC2 per run (12-20 min cold boot);
# AWS/Cloudflare/CP availability would become merge-blockers.
# - Shared-zone TLS/DNS/ACME propagation flake surface is upstream of
# this repo and outside its control.
# - Required-gate correctness needs CP_STAGING_ADMIN_API_TOKEN GUARANTEED
# present; today's skip-if-absent (core#2225) is right for non-gating
# but would skip-green a required check.
# - Single hermes/platform_managed workspace; agent-dependent content
# (live chat/traces round-trip) not exercised on staging (#2162).
# The full checklist lives at the foot of canvas/e2e/staging-tabs.spec.ts.
# verifies every workspace-panel tab renders without crashing. Complements
# e2e-staging-saas.yml (which tests the API shape) by exercising the
# actual browser + canvas bundle against live staging.
#
# Triggers: push to main, PR touching canvas sources + this workflow only
# after the PR enters `merge-queue`, manual dispatch, and scheduled cron to
@@ -83,7 +62,7 @@ concurrency:
# wasted CI is acceptable given the alternative is losing staging-tip
# data that auto-promote-staging needs.
group: e2e-staging-canvas-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -188,30 +167,16 @@ jobs:
- if: needs.detect-changes.outputs.canvas == 'true'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# Skip-if-absent (core#2225), mirroring the serving-e2e gate's
# skip-if-secret-unset contract: a MISSING CI secret is an operator
# CONFIG gap, not a code regression, so it must not paint this E2E
# red. When CP_STAGING_ADMIN_API_TOKEN is unset we emit a LOUD
# ::warning:: + ::notice:: and skip the real provision/test steps (the
# job still completes green). When the secret IS present we run the
# full suite exactly as before. Operators: set
# CP_STAGING_ADMIN_API_TOKEN as a repo/org Actions secret on
# molecule-core to actually exercise this E2E.
- name: Check admin token (skip-if-absent)
id: token_check
- name: Verify admin token present
if: needs.detect-changes.outputs.canvas == 'true'
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::warning::CP_STAGING_ADMIN_API_TOKEN is not set on this runner — SKIPPING the staging canvas E2E (cannot auth to staging CP). This is an operator config gap, not a code failure; set the secret on molecule-core (repo or org Actions secrets) to run it. See core#2225."
echo "::notice::E2E Staging Canvas skipped: CP_STAGING_ADMIN_API_TOKEN absent."
echo "present=false" >> "$GITHUB_OUTPUT"
else
echo "CP_STAGING_ADMIN_API_TOKEN present ✓ — running staging canvas E2E."
echo "present=true" >> "$GITHUB_OUTPUT"
echo "::error::Missing CP_STAGING_ADMIN_API_TOKEN"
exit 2
fi
- name: Set up Node
if: needs.detect-changes.outputs.canvas == 'true' && steps.token_check.outputs.present == 'true'
if: needs.detect-changes.outputs.canvas == 'true'
uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
with:
node-version: '20'
@@ -219,11 +184,11 @@ jobs:
cache-dependency-path: canvas/package-lock.json
- name: Install canvas deps
if: needs.detect-changes.outputs.canvas == 'true' && steps.token_check.outputs.present == 'true'
if: needs.detect-changes.outputs.canvas == 'true'
run: npm ci
- name: Install Playwright browsers
if: needs.detect-changes.outputs.canvas == 'true' && steps.token_check.outputs.present == 'true'
if: needs.detect-changes.outputs.canvas == 'true'
timeout-minutes: 10
run: |
PREBAKED_PLAYWRIGHT=/ms-playwright
@@ -235,7 +200,7 @@ jobs:
npx playwright install --with-deps chromium
- name: Run staging canvas E2E
if: needs.detect-changes.outputs.canvas == 'true' && steps.token_check.outputs.present == 'true'
if: needs.detect-changes.outputs.canvas == 'true'
run: npx playwright test --config=playwright.staging.config.ts
- name: Upload Playwright report on failure
-28
View File
@@ -85,25 +85,6 @@ jobs:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
#
# PROMOTION-READINESS (toward required gate — do NOT flip continue-on-error
# without CTO sign-off, that's the irreversible call):
# NOW FAIL-CLOSED:
# - Missing CP_STAGING_ADMIN_API_TOKEN → hard exit 2 (preflight).
# - Staging CP unhealthy → hard exit 1 (preflight, not a workspace bug).
# - Harness E2E_REQUIRE_LIVE=1 → exit 5 if a clean exit didn't prove
# all four awaiting_agent transitions (no silent skip).
# - Sweep transition (step 6) is now a bounded readiness-poll, not a
# fixed sleep + one-shot assert → no more sweep-cadence flake.
# - register / re-register retry ONLY transient edge 5xx (bounded),
# fail closed on 4xx → no more cold-boot-502 flake.
# STILL BLOCKS PROMOTION:
# - Single shared staging tenant + EC2 quota window: an infra-side
# provisioning outage (not a code bug) would turn the gate red.
# Needs an infra-class vs code-class signal split before required.
# - "CP unhealthy → exit 1" currently looks identical to a real
# failure on the run page; required-gate would need it demoted to
# a neutral/skip so staging flakiness can't block merges.
continue-on-error: true
timeout-minutes: 25
@@ -143,15 +124,6 @@ jobs:
- name: Run external-runtime E2E
id: e2e
# E2E_REQUIRE_LIVE=1: the harness fails CLOSED (exit 5) if it ever
# reaches a clean exit without proving all four awaiting_agent
# transitions. Mirrors CP serving-e2e SERVING_E2E_REQUIRE_LIVE — a
# silent skip / early-return / dropped assertion can no longer
# masquerade as green. Token-missing and CP-unhealthy already
# hard-fail in the two preflight steps above, so reaching this step
# means a real cycle is expected.
env:
E2E_REQUIRE_LIVE: "1"
run: bash tests/e2e/test_staging_external_runtime.sh
# Mirror the e2e-staging-saas.yml safety net: if the runner is
-199
View File
@@ -1,199 +0,0 @@
name: E2E Staging Reconciler (heals terminated EC2)
# Live staging proof for the core#2261 instance-state reconciler
# (workspace-server/internal/registry/cp_instance_reconciler.go). The
# real-infra complement to the deterministic unit tests: provisions a real
# staging workspace, TERMINATES its EC2, and asserts the reconciler flips it
# off 'online' (PRIMARY gate) and auto-reprovisions on a new instance_id
# (SECONDARY, best-effort). See
# tests/e2e/test_reconciler_heals_terminated_instance.sh for the assertion
# contract + timeouts.
#
# Modeled on e2e-staging-saas.yml. Same secrets + same Gitea-port caveats:
# - Dropped workflow_dispatch.inputs (Gitea 1.22.6 parser rejects them).
# - Dropped merge_group / environment (no Gitea equivalent).
# - Workflow-level env.GITHUB_SERVER_URL pinned per
# feedback_act_runner_github_server_url.
#
# NOT a required check (yet). This is a brand-new live E2E that provisions +
# terminates real EC2 (costs money, shares the cp#245 cold-boot flake
# surface). A new live e2e must NOT hard-gate every merge until it has a
# green track record. continue-on-error: true surfaces failures without
# blocking. PROMOTE to branch-required (flip continue-on-error → false AND
# add "E2E Staging Reconciler" to branch protection) once it has run green on
# main for several consecutive days — same de-flake discipline the
# platform-boot job in e2e-staging-saas.yml documents.
on:
# Run when the reconciler itself, the script, or the libs it depends on
# change — so a reconciler regression is caught on the PR that introduces
# it (paths filter), plus a daily schedule to catch infra/AMI drift.
push:
branches: [main]
paths:
- 'workspace-server/internal/registry/cp_instance_reconciler.go'
- 'tests/e2e/test_reconciler_heals_terminated_instance.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/lib/model_slug.sh'
- '.gitea/workflows/e2e-staging-reconciler.yml'
pull_request:
branches: [main]
paths:
- 'workspace-server/internal/registry/cp_instance_reconciler.go'
- 'tests/e2e/test_reconciler_heals_terminated_instance.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/lib/model_slug.sh'
- '.gitea/workflows/e2e-staging-reconciler.yml'
workflow_dispatch:
schedule:
# 08:00 UTC daily — offset from e2e-staging-saas (07:00) so the two live
# harnesses don't fight over staging's per-hour org-creation quota.
- cron: '0 8 * * *'
# Serialize against itself: staging has a finite per-hour org-creation quota,
# and a cancelled run mid-teardown leaks EC2. cancel-in-progress: false
# mirrors e2e-staging-saas.yml.
concurrency:
group: e2e-staging-reconciler
cancel-in-progress: false
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# PR-validation path: always posts success so a workflow-only / script-only
# PR has a status check (this workflow's real job only fires on the paths
# filter). Mirrors the pr-validate job in e2e-staging-saas.yml.
pr-validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
continue-on-error: true
- name: YAML validation (best-effort)
run: |
echo "e2e-staging-reconciler.yml — PR validation: workflow YAML is valid."
echo "Live E2E step runs only when the reconciler / script / libs change."
continue-on-error: true
e2e-staging-reconciler:
name: E2E Staging Reconciler
runs-on: ubuntu-latest
# NOT required yet — surface failures without blocking merges. Flip to
# false + add to branch protection once green on main for a de-flake
# window (see the header note). mc#1982: do not renew this mask silently.
continue-on-error: true
timeout-minutes: 60
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
# Single admin-bearer secret drives provision + tenant-token retrieval +
# teardown (= Railway staging CP_ADMIN_API_TOKEN). Same secret name the
# saas workflow canonicalised to under internal#322.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
# Leak-check is REQUIRED here: this test deliberately terminates an EC2,
# so teardown MUST positively confirm no slug-tagged box survives.
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
# claude-code + MiniMax is the cheapest boot-to-online path (same as the
# saas job). The reconciler test never makes a completion, but the key is
# wired so the first boot reaches online on the same path the saas
# harness uses. First non-empty wins in the script's priority chain.
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
E2E_RUNTIME: claude-code
# Platform-managed create path (moonshot/kimi-k2.6, no tenant key) — the
# combo proven to create cleanly; this test only needs the ws online.
E2E_LLM_PATH: platform
E2E_MODEL_SLUG: MiniMax-M2
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify required secrets present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — this test terminates an EC2 and verifies no leak; AWS creds are mandatory"
exit 2
fi
done
echo "Required secrets present ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Skipping — not a reconciler bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run reconciler heal E2E
id: e2e
run: bash tests/e2e/test_reconciler_heals_terminated_instance.sh
# Belt-and-braces teardown: the script installs its own EXIT trap, but if
# the runner is cancelled the trap may not fire. This always() step
# double-deletes any e2e-rec-* org from THIS run. The admin DELETE is
# idempotent so double-invoking is safe.
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
# Slug shape: e2e-rec-YYYYMMDD-<run_id>-<attempt>-...
if run_id:
prefixes = tuple(f'e2e-rec-{d}-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-rec-{d}-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/rec-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/rec-cleanup.code
set -e
code=$(cat /tmp/rec-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::reconciler teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/rec-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::reconciler teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
+2 -190
View File
@@ -48,10 +48,7 @@ on:
- 'workspace-server/internal/handlers/a2a_proxy.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/provisioner/**'
- 'workspace-server/internal/providers/providers.yaml'
- 'tests/e2e/test_staging_full_saas.sh'
- 'tests/e2e/lib/completion_assert.sh'
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- '.gitea/workflows/e2e-staging-saas.yml'
@@ -63,10 +60,7 @@ on:
- 'workspace-server/internal/handlers/a2a_proxy.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/provisioner/**'
- 'workspace-server/internal/providers/providers.yaml'
- 'tests/e2e/test_staging_full_saas.sh'
- 'tests/e2e/lib/completion_assert.sh'
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- '.gitea/workflows/e2e-staging-saas.yml'
@@ -124,12 +118,7 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# Raised 45→75: step 10b now exercises pause→resume→online +
# hibernate→wake→online, each of which RE-PROVISIONS the parent (CP
# re-provision + heartbeat recovery, not a fresh EC2 cold start, but still
# minutes). The base provision→online→A2A matrix fits in ~35 min; the two
# extra lifecycle reprovisions need headroom under WORKSPACE_ONLINE_TIMEOUT.
timeout-minutes: 75
timeout-minutes: 45
permissions:
contents: read
@@ -166,39 +155,15 @@ jobs:
# E2E_RUNTIME=hermes or =codex via workflow_dispatch can still
# exercise the OpenAI path.
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
# google-adk (operator-dispatched only) auths Gemini with an
# AI-Studio key. Org policy disallows API keys in PROD (Vertex+ADC
# there); CI uses the keyed AI-Studio path with config model
# google_genai:gemini-2.5-pro. Vertex remains the supported prod path.
E2E_GOOGLE_API_KEY: ${{ secrets.MOLECULE_STAGING_GOOGLE_API_KEY }}
E2E_RUNTIME: ${{ github.event.inputs.runtime || 'claude-code' }}
# Pin the model when running on the default claude-code path —
# the per-runtime default ("sonnet") routes to direct Anthropic
# and defeats the cost saving. Operators can override via the
# workflow_dispatch flow (no input wired here yet — runtime
# override is enough for ad-hoc).
#
# #2263 deploy-skew: the claude-code default is the COLON-namespaced BYOK
# id `minimax:MiniMax-M2.7`, NOT bare `MiniMax-M2`. The deployed staging
# ws-server's compiled registry can lag source; validateRegisteredModelForRuntime
# 400s the bare form on an older image (the sibling Platform Boot job, on
# the SAME image, succeeds with namespaced `moonshot/kimi-k2.6`). The colon
# form stays in the BYOK `minimax` arm (providers.yaml:851) so it resolves
# provider=minimax (BYOK) and the #1994 byok-not-platform guard still
# passes — the slash/platform form `minimax/MiniMax-M2.7` would not.
E2E_MODEL_SLUG: ${{ github.event.inputs.runtime == 'hermes' && 'openai/gpt-4o' || github.event.inputs.runtime == 'codex' && 'openai/gpt-4o' || github.event.inputs.runtime == 'google-adk' && 'google_genai:gemini-2.5-pro' || 'minimax:MiniMax-M2.7' }}
E2E_MODEL_SLUG: ${{ github.event.inputs.runtime == 'hermes' && 'openai/gpt-4o' || github.event.inputs.runtime == 'codex' && 'openai/gpt-4o' || 'MiniMax-M2' }}
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
# Lifecycle transitions (step 10b): pause→resume→online +
# hibernate→wake→online on the provisioned parent. `auto` runs them in
# full mode (this job). Set `off` to skip the ~2x-reprovision cost on an
# ad-hoc dispatch. The timeout-minutes above is sized for this being on.
E2E_LIFECYCLE: auto
# Fail-closed-on-skip: in CI the harness MUST prove ≥1 full
# provision→online→A2A cycle. If it reaches the end having validated
# nothing (a future short-circuit / skip path), it exits 5 rather than
# reporting a false green. Mirrors CP serving-e2e SERVING_E2E_REQUIRE_LIVE.
E2E_REQUIRE_LIVE: '1'
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -245,10 +210,6 @@ jobs:
required_secret_name="MOLECULE_STAGING_OPENAI_API_KEY"
required_secret_value="${E2E_OPENAI_API_KEY:-}"
;;
google-adk)
required_secret_name="MOLECULE_STAGING_GOOGLE_API_KEY"
required_secret_value="${E2E_GOOGLE_API_KEY:-}"
;;
*)
echo "::warning::Unknown E2E_RUNTIME='${E2E_RUNTIME}' — skipping LLM-key check"
required_secret_name=""
@@ -343,152 +304,3 @@ jobs:
echo "::warning::saas teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
# ── PLATFORM-MANAGED BOOT REGRESSION (moonshot/kimi NOT_CONFIGURED) ──────────
#
# The REAL-boot complement to the deterministic unit suite
# (workspace_provision_platform_boot_test.go). Provisions a REAL staging
# claude-code workspace on the PLATFORM-managed path — provider=platform,
# model=moonshot/kimi-k2.6, NO tenant LLM key — and asserts it reaches
# status=online (NOT not_configured) and a completion returns 200, via the same
# online-wait + completion-assert the BYOK job uses.
#
# Why a SEPARATE job (not a matrix leg of e2e-staging-saas): the platform path
# injects NO secret and pins a different model, so its env block diverges from
# the BYOK job's. A dedicated job keeps each path's "verify key present" preflight
# honest (BYOK requires a key; platform requires its ABSENCE not to matter) and
# gives the regression its own named commit-status for branch protection.
#
# Add `E2E Staging Platform Boot` to branch protection after 3 consecutive
# green runs on main (de-flake window; this path shares the cp#245
# boot-timeout flake surface the BYOK job has, so it must prove stable before
# it can BLOCK — see the gate-making plan in the PR body).
# bp-required: pending #2187
e2e-staging-platform-boot:
name: E2E Staging Platform Boot
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface without blocking until the de-flake window
# closes. mc#1982: do NOT renew this mask silently — the gate-making plan
# tracks the flip to false under #2187.
continue-on-error: true
timeout-minutes: 45
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
# The regression combo: claude-code + platform-managed + moonshot/kimi-k2.6.
# NO E2E_*_API_KEY is set — platform-managed billing is owned by Molecule via
# the CP LLM proxy. The harness's E2E_LLM_PATH=platform branch sends empty
# secrets and pin-selects the platform model.
E2E_RUNTIME: claude-code
E2E_LLM_PATH: platform
# Smoke mode: a single parent workspace is enough to prove online +
# completion for the platform path (the A2A/delegation matrix is the BYOK
# job's job). Override E2E_DEFAULT_PLATFORM_MODEL via workflow_dispatch to
# exercise another platform model id.
E2E_MODE: smoke
E2E_RUN_ID: "platform-${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
# Fail-closed-on-skip (see BYOK job). smoke mode still runs steps 2/4/7/8b,
# so all four required milestones (provisioned/tenant_online/
# workspace_online/a2a_roundtrip) fire — the guard is valid for this lane too.
E2E_REQUIRE_LIVE: '1'
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — EC2 leak verification cannot run"
exit 2
fi
done
echo "Admin token present ✓"
- name: Assert NO BYOK key leaks into the platform run
run: |
# The whole point of this job is the platform-managed path. A stray
# E2E_*_API_KEY in the runner env would (via the harness) still be
# skipped by the E2E_LLM_PATH=platform branch — but assert their
# absence loudly here so a future env edit can't silently convert this
# into a masked BYOK run that no longer exercises the regression.
for var in E2E_MINIMAX_API_KEY E2E_ANTHROPIC_API_KEY E2E_OPENAI_API_KEY; do
if [ -n "${!var:-}" ]; then
echo "::warning::$var is set in this platform-boot job's env — the harness ignores it on E2E_LLM_PATH=platform, but it should not be wired here."
fi
done
echo "Platform-managed path: no tenant LLM key required ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Skipping — not a workspace bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run platform-managed boot E2E (online + completion)
id: e2e
run: bash tests/e2e/test_staging_full_saas.sh
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
# smoke mode slugs are e2e-smoke-YYYYMMDD-platform-<run_id>-...
if run_id:
prefixes = tuple(f'e2e-smoke-{d}-platform-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-smoke-{d}-platform-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/plat-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/plat-cleanup.code
set -e
code=$(cat /tmp/plat-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::platform-boot teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/plat-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::platform-boot teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
+1 -1
View File
@@ -26,7 +26,7 @@ env:
concurrency:
group: e2e-staging-sanity
cancel-in-progress: false
cancel-in-progress: true
permissions:
issues: write
@@ -69,7 +69,7 @@ on:
branches: [main, staging]
concurrency:
group: handlers-pg-integ-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -88,9 +88,8 @@ jobs:
# surprises and keeps the routing rule discoverable in one place.
runs-on: docker-host
# mc#1982 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
handlers: ${{ steps.filter.outputs.handlers }}
steps:
@@ -120,9 +119,8 @@ jobs:
# exists). See detect-changes for the full routing rationale.
runs-on: docker-host
# mc#1982 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
# Unique name per run so concurrent jobs don't collide on the
# bridge network. ${RUN_ID}-${RUN_ATTEMPT} is unique even across
@@ -243,8 +241,7 @@ jobs:
# MUST exist for the integration tests to be meaningful. Hard-
# fail if any didn't land — that would be a real regression we
# want loud.
# workspace_schedules added for the #2149 scheduler integration tests.
for tbl in delegations workspaces activity_logs pending_uploads workspace_schedules; do
for tbl in delegations workspaces activity_logs pending_uploads; do
if ! psql -h "${PG_HOST}" -U postgres -d molecule -tA \
-c "SELECT 1 FROM information_schema.tables WHERE table_name = '$tbl'" \
| grep -q 1; then
@@ -254,19 +251,6 @@ jobs:
echo "✓ $tbl table present"
done
- if: needs.detect-changes.outputs.handlers == 'true'
name: Preflight — INTEGRATION_DB_URL must be present
run: |
# Belt-and-suspenders: if the postgres-start step failed to
# export INTEGRATION_DB_URL, fail loud BEFORE go test can
# t.Skip its way to a green build. Closes the workflow-level
# fail-open gap identified in PR #2166 blocker #2.
if [ -z "${INTEGRATION_DB_URL:-}" ]; then
echo "::error::INTEGRATION_DB_URL is empty — postgres-start step did not export the connection string"
exit 1
fi
echo "INTEGRATION_DB_URL is set"
- if: needs.detect-changes.outputs.handlers == 'true'
name: Run integration tests
run: |
@@ -275,16 +259,6 @@ jobs:
# workflow runs don't fight over a host-net 5432 port.
go test -tags=integration -timeout 5m -v ./internal/handlers/ -run "^TestIntegration_"
- if: needs.detect-changes.outputs.handlers == 'true'
name: Run scheduler integration tests (#2149)
run: |
# #2149: real-PG regression coverage for the scheduler firing loop
# (tick → A2A fire → write-back of last_run_at/next_run_at/run_count/
# activity_logs jsonb incl. invalid-UTF-8 sanitization + sweepPhantomBusy).
# Reuses the same migrated Postgres (workspace_schedules / activity_logs
# / workspaces all landed by the migration replay step above).
go test -tags=integration -timeout 5m -v ./internal/scheduler/ -run "^TestIntegration_"
- if: failure() && needs.detect-changes.outputs.handlers == 'true'
name: Diagnostic dump on failure
env:
+1 -1
View File
@@ -54,7 +54,7 @@ concurrency:
# cancellation deadlock — see e2e-api.yml's concurrency block for
# the 2026-04-28 incident that codified this pattern.
group: harness-replays-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -1,6 +1,6 @@
name: lint-bp-context-emit-match
# Tier 2f scheduled lint (per mc#1982) — detects drift between
# Tier 2f scheduled lint (per mc#774) — detects drift between
# `branch_protections/<branch>.status_check_contexts` and the set of
# contexts emitted by `.gitea/workflows/*.yml`.
#
@@ -60,7 +60,7 @@ name: lint-bp-context-emit-match
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - internal#349 (cross-repo BP sweep)
# - feedback_phantom_required_check_after_gitea_migration
# - feedback_tier_label_ids_are_per_repo
@@ -91,10 +91,10 @@ jobs:
name: lint-bp-context-emit-match
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port,
# well past the 7-clean-run threshold. Scheduled failure is now
# a hard CI signal.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface drift without blocking. After 7
# clean scheduled runs on main, flip to false so a scheduled
# failure is a hard CI signal.
continue-on-error: true # mc#1982 Phase 3 — flip to false after 7 clean main runs
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
@@ -45,12 +45,12 @@ name: lint-continue-on-error-tracking
# close-and-flip, or document the deliberate keep-mask in a fresh
# 14-day-renewable tracker. After main is clean for 3 days,
# follow-up PR flips this workflow's continue-on-error to false.
# Tracking: mc#1982.
# Tracking: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#1982 (the empirical masked-3-weeks case)
# - mc#774 (the RFC that specs this lint)
# - mc#774 (the empirical masked-3-weeks case)
# - feedback_chained_defects_in_never_tested_workflows
# - feedback_behavior_based_ast_gates
# - feedback_strict_root_only_after_class_a
@@ -48,9 +48,11 @@ jobs:
scan:
name: Scan workflows for curl status-capture pollution
runs-on: ubuntu-latest
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port.
# mc#1982 mask removed — no surfaced defects in this lane.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Find curl ... -w '%{http_code}' ... || echo "000" subshells
@@ -25,21 +25,6 @@ name: Lint forbidden tenant-env keys
# feedback_path_filtered_workflow_cant_be_required). The scan itself
# targets workspace_secrets-writer paths via grep -r; it's fast
# (sub-second) so unconditional run is fine.
#
# ── 2026-06-01 CI-scheduler-fanout consolidation (fix/ci-scheduler-fanout) ──
# The RFC#523 sibling lint formerly in its own file
# `lint-no-tenant-gitea-token.yml` (the broader "no repo-host token into
# any tenant-writer surface" scan) is now a SECOND job in THIS workflow
# (`scan-tenant-token-write`). Both are sub-second Go-source greps that
# fired as two separate workflow runs on every PR — pure scheduler
# fan-out. Folding the sibling in here drops one workflow run + one
# checkout per PR while keeping BOTH scans firing unconditionally on
# every PR (the no-paths discipline above is preserved — neither job is
# paths-filtered). The moved job keeps its exact `name:` so its emitted
# status context is unchanged in substance; its `# bp-exempt:` directive
# moves with it (Tier 2g). The old `Lint no tenant GITEA or GITHUB token
# write / …` context is retired (a disappearing context needs no
# directive; only NEW emitters do).
on:
pull_request:
@@ -181,126 +166,3 @@ jobs:
fi
echo "OK No forbidden operator-scope env key names hardcoded in writer paths."
# bp-exempt: advisory RFC#523 lint; PR review gate is review-driven, not BP-driven.
# (Carried with the workflow-name rename in PR mc#1593 so the renamed
# context emission satisfies lint_required_context_exists_in_bp Tier 2g.)
scan-tenant-token-write:
name: Scan for repo-host token write into tenant workspace surface
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Find Go files referencing a tenant-writer surface AND a repo-host token
run: |
set -euo pipefail
# Repo-host token NAMES — the threat-model subset. Operator-fleet
# tokens (CP_ADMIN_API_TOKEN, RAILWAY_TOKEN, INFISICAL_*) are
# caught by lint-forbidden-env-keys.yml's broader deny set; this
# lint focuses on the git-host class so a single co-occurrence
# match has a low false-positive rate.
FORBIDDEN_KEYS=(
"GITEA_TOKEN"
"GITEA_PAT"
"GITHUB_TOKEN"
"GITHUB_PAT"
"GH_TOKEN"
)
# Tenant-writer surface markers. A file matches the surface set
# if it references ANY of these strings. This is the "is this
# code path writing into a tenant workspace?" heuristic.
# Curated to catch the actual code shapes used in this repo
# (verified by grep against current main 2026-05-19):
# - "workspace_secrets" / "global_secrets" → DB table writes
# - "seedAllowList" → CP-side seed table
# - "/settings/secrets" → tenant HTTP API write
# - "envVars[" → in-memory env map write
# - "containerEnv" → docker-run env-set
# - "userData" → EC2 user-data script
# - "provisionPayload" / "provisionContext" → provision-request shape
SURFACE_PATTERN='workspace_secrets|global_secrets|seedAllowList|/settings/secrets|envVars\[|containerEnv|userData|provisionPayload|provisionContext'
# Files that legitimately reference these names AND a surface
# marker, but do so for guard / strip / test / doc-comment
# reasons. New entries require reviewer signoff and a one-line
# justification in the diff.
EXEMPT_FILES=(
# RFC#523 L1 deny-set source-of-truth + tests
"workspace-server/internal/handlers/workspace_provision_forbidden_env.go"
"workspace-server/internal/handlers/workspace_provision_forbidden_env_test.go"
# Forensic-#145 silent-strip denylist (defense-in-depth, by design lists the names)
"workspace-server/internal/provisioner/provisioner.go"
"workspace-server/internal/provisioner/provisioner_test.go"
# Pre-RFC#523 persona-fallback / org-helper paths. The L1
# fail-closed runs BEFORE these writers; downstream silent-strip
# also covers them. See applyAgentGitHTTPCreds doc-comment.
"workspace-server/internal/handlers/agent_git_identity.go"
"workspace-server/internal/handlers/org_helpers.go"
"workspace-server/internal/handlers/org.go"
# CP→platform admin auth (NOT a tenant env write).
"workspace-server/internal/provisioner/cp_provisioner.go"
)
# Build an extended-regex alternation of forbidden keys.
KEY_ALT="$(IFS='|'; echo "${FORBIDDEN_KEYS[*]}")"
# Find candidate files: Go non-test sources that contain a
# tenant-writer surface marker.
mapfile -t CANDIDATES < <(
grep -rlE --include='*.go' --exclude='*_test.go' \
"${SURFACE_PATTERN}" . 2>/dev/null \
| sed 's|^\./||' \
| sort -u
)
if [ "${#CANDIDATES[@]}" -eq 0 ]; then
echo "OK No tenant-writer-surface files found in tree (unexpected, but not a lint failure)."
exit 0
fi
HITS=""
for f in "${CANDIDATES[@]}"; do
# Skip exempt files.
skip=0
for ex in "${EXEMPT_FILES[@]}"; do
if [ "$f" = "$ex" ]; then skip=1; break; fi
done
[ "$skip" = "1" ] && continue
# File contains a surface marker; now grep for a forbidden
# key NAME. We require a QUOTED-literal match to avoid
# firing on a comment like "// also handle GITEA_TOKEN".
#
# The literal form catches:
# - os.Getenv("GITEA_TOKEN")
# - envVars["GITEA_TOKEN"] = ...
# - {envKey: "GITEA_TOKEN", tenantKey: "GITEA_TOKEN"}
# but not:
# - // see GITEA_TOKEN below (no quotes)
found=$(grep -nE "\"(${KEY_ALT})\"" "$f" 2>/dev/null || true)
if [ -n "$found" ]; then
HITS="${HITS}--- ${f} ---\n${found}\n"
fi
done
if [ -n "$HITS" ]; then
echo "::error::Task #146 lint: repo-host token name(s) quoted in a tenant-writer-surface file:"
printf "$HITS"
echo ""
echo "These files reference a tenant-writer surface (workspace_secrets,"
echo "seedAllowList, /settings/secrets, containerEnv, userData, etc.)"
echo "AND quote a repo-host token name (GITEA_TOKEN/GITHUB_TOKEN/…)."
echo "Per RFC#523 threat model, tenant workspaces MUST NOT receive"
echo "operator-scope repo-host tokens. If your code legitimately needs"
echo "to reference one of these names in a tenant-writer file (e.g."
echo "a deny-set definition or silent-strip list), add the file to"
echo "EXEMPT_FILES with a one-line justification — reviewer signoff"
echo "required."
exit 1
fi
echo "OK No tenant-writer-surface file co-mentions a repo-host token literal."
+4 -4
View File
@@ -1,6 +1,6 @@
name: lint-mask-pr-atomicity
# Tier 2d hard-gate lint (per mc#1982) — blocks PRs that touch
# Tier 2d hard-gate lint (per mc#774) — blocks PRs that touch
# `.gitea/workflows/ci.yml` and modify ONLY ONE of {continue-on-error,
# all-required.sentinel.needs} without a `Paired: #NNN` reference in
# the PR body or in a commit message.
@@ -37,13 +37,13 @@ name: lint-mask-pr-atomicity
# This workflow lands at `continue-on-error: true` (Phase 3 — surface
# regressions without blocking PRs while the rule beds in).
# Follow-up PR flips to `false` once we have ≥3 days of clean runs on
# `main` and no false-positives. Tracking issue: mc#1982.
# `main` and no false-positives. Tracking issue: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - PR#665 / PR#668 (the empirical split-pair)
# - mc#1982 (the main-red incident the split caused)
# - mc#774 (the main-red incident the split caused)
# - feedback_strict_root_only_after_class_a
# - feedback_behavior_based_ast_gates
#
@@ -0,0 +1,182 @@
name: Lint no tenant GITEA or GITHUB token write
# Task #146 — CI guardrail companion to RFC#523's `lint-forbidden-env-keys.yml`.
#
# `lint-forbidden-env-keys.yml` (Layer 3) catches code that hardcodes a
# forbidden env-var key NAME as a quoted literal in workspace_secrets
# writer paths under workspace-server/internal/.
#
# This workflow catches a BROADER class: any code path that reads a
# repo-host token (GITEA_TOKEN / GITHUB_TOKEN / GH_TOKEN) and then writes
# it into a TENANT WORKSPACE's env, secret store, user-data, or
# provision payload. This is the actual RFC#523 threat-model statement —
# the goal is "no tenant workspace ever receives an operator-scope repo
# token," not just "no _quoted_ literal `GITEA_TOKEN`." A future writer
# could route the value via a variable, a struct field, or a config key
# and slip past the existing literal scan; this lint catches those
# routing patterns at PR review time.
#
# Scope
# Scans the WHOLE repo's Go sources (not just workspace-server/) for
# co-occurrences of:
# - a repo-host token NAME (GITEA_TOKEN / GITHUB_TOKEN / GH_TOKEN /
# GITEA_PAT / GITHUB_PAT) used as os.Getenv argument or string
# literal
# - within a file that ALSO references a tenant-writer surface
# (`tenant`, `workspace_secrets`, `global_secrets`, `seedAllowList`,
# `/settings/secrets`, `userData`, `provisionPayload`,
# `envVars[`, `containerEnv`).
#
# Co-occurrence (not single-line) is the false-positive control: a
# file that just LOGS the variable name (e.g. "missing GITEA_TOKEN")
# without touching any tenant surface won't fire.
#
# Drift contract with lint-forbidden-env-keys.yml
# Both lints share the same FORBIDDEN_KEYS list (a subset — only the
# repo-host tokens, since this lint's threat model is "tenant gets
# write access to operator's git host"). If RFC#523's deny set grows,
# update BOTH this file AND lint-forbidden-env-keys.yml AND the Go
# source-of-truth in
# workspace-server/internal/handlers/workspace_provision_forbidden_env.go.
#
# Open-source-template-friendly
# The patterns scanned are generic (no MOLECULE_-prefix literals).
# A fork can copy this workflow as-is and adjust FORBIDDEN_KEYS.
#
# Path-filter discipline
# No `paths:` filter — required-status workflows must run on every PR
# per `feedback_path_filtered_workflow_cant_be_required`. Scan is
# sub-second.
on:
pull_request:
types: [opened, synchronize, reopened]
push:
branches: [main, staging]
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: advisory RFC#523 lint; PR review gate is review-driven, not BP-driven.
# (Carried with the workflow-name rename in PR mc#1593 so the renamed
# context emission satisfies lint_required_context_exists_in_bp Tier 2g.)
scan:
name: Scan for repo-host token write into tenant workspace surface
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Find Go files referencing a tenant-writer surface AND a repo-host token
run: |
set -euo pipefail
# Repo-host token NAMES — the threat-model subset. Operator-fleet
# tokens (CP_ADMIN_API_TOKEN, RAILWAY_TOKEN, INFISICAL_*) are
# caught by lint-forbidden-env-keys.yml's broader deny set; this
# lint focuses on the git-host class so a single co-occurrence
# match has a low false-positive rate.
FORBIDDEN_KEYS=(
"GITEA_TOKEN"
"GITEA_PAT"
"GITHUB_TOKEN"
"GITHUB_PAT"
"GH_TOKEN"
)
# Tenant-writer surface markers. A file matches the surface set
# if it references ANY of these strings. This is the "is this
# code path writing into a tenant workspace?" heuristic.
# Curated to catch the actual code shapes used in this repo
# (verified by grep against current main 2026-05-19):
# - "workspace_secrets" / "global_secrets" → DB table writes
# - "seedAllowList" → CP-side seed table
# - "/settings/secrets" → tenant HTTP API write
# - "envVars[" → in-memory env map write
# - "containerEnv" → docker-run env-set
# - "userData" → EC2 user-data script
# - "provisionPayload" / "provisionContext" → provision-request shape
SURFACE_PATTERN='workspace_secrets|global_secrets|seedAllowList|/settings/secrets|envVars\[|containerEnv|userData|provisionPayload|provisionContext'
# Files that legitimately reference these names AND a surface
# marker, but do so for guard / strip / test / doc-comment
# reasons. New entries require reviewer signoff and a one-line
# justification in the diff.
EXEMPT_FILES=(
# RFC#523 L1 deny-set source-of-truth + tests
"workspace-server/internal/handlers/workspace_provision_forbidden_env.go"
"workspace-server/internal/handlers/workspace_provision_forbidden_env_test.go"
# Forensic-#145 silent-strip denylist (defense-in-depth, by design lists the names)
"workspace-server/internal/provisioner/provisioner.go"
"workspace-server/internal/provisioner/provisioner_test.go"
# Pre-RFC#523 persona-fallback / org-helper paths. The L1
# fail-closed runs BEFORE these writers; downstream silent-strip
# also covers them. See applyAgentGitHTTPCreds doc-comment.
"workspace-server/internal/handlers/agent_git_identity.go"
"workspace-server/internal/handlers/org_helpers.go"
"workspace-server/internal/handlers/org.go"
# CP→platform admin auth (NOT a tenant env write).
"workspace-server/internal/provisioner/cp_provisioner.go"
)
# Build an extended-regex alternation of forbidden keys.
KEY_ALT="$(IFS='|'; echo "${FORBIDDEN_KEYS[*]}")"
# Find candidate files: Go non-test sources that contain a
# tenant-writer surface marker.
mapfile -t CANDIDATES < <(
grep -rlE --include='*.go' --exclude='*_test.go' \
"${SURFACE_PATTERN}" . 2>/dev/null \
| sed 's|^\./||' \
| sort -u
)
if [ "${#CANDIDATES[@]}" -eq 0 ]; then
echo "OK No tenant-writer-surface files found in tree (unexpected, but not a lint failure)."
exit 0
fi
HITS=""
for f in "${CANDIDATES[@]}"; do
# Skip exempt files.
skip=0
for ex in "${EXEMPT_FILES[@]}"; do
if [ "$f" = "$ex" ]; then skip=1; break; fi
done
[ "$skip" = "1" ] && continue
# File contains a surface marker; now grep for a forbidden
# key NAME. We require a QUOTED-literal match to avoid
# firing on a comment like "// also handle GITEA_TOKEN".
#
# The literal form catches:
# - os.Getenv("GITEA_TOKEN")
# - envVars["GITEA_TOKEN"] = ...
# - {envKey: "GITEA_TOKEN", tenantKey: "GITEA_TOKEN"}
# but not:
# - // see GITEA_TOKEN below (no quotes)
found=$(grep -nE "\"(${KEY_ALT})\"" "$f" 2>/dev/null || true)
if [ -n "$found" ]; then
HITS="${HITS}--- ${f} ---\n${found}\n"
fi
done
if [ -n "$HITS" ]; then
echo "::error::Task #146 lint: repo-host token name(s) quoted in a tenant-writer-surface file:"
printf "$HITS"
echo ""
echo "These files reference a tenant-writer surface (workspace_secrets,"
echo "seedAllowList, /settings/secrets, containerEnv, userData, etc.)"
echo "AND quote a repo-host token name (GITEA_TOKEN/GITHUB_TOKEN/…)."
echo "Per RFC#523 threat model, tenant workspaces MUST NOT receive"
echo "operator-scope repo-host tokens. If your code legitimately needs"
echo "to reference one of these names in a tenant-writer file (e.g."
echo "a deny-set definition or silent-strip list), add the file to"
echo "EXEMPT_FILES with a one-line justification — reviewer signoff"
echo "required."
exit 1
fi
echo "OK No tenant-writer-surface file co-mentions a repo-host token literal."
@@ -13,7 +13,7 @@ name: Lint pre-flip continue-on-error
# job-level status. The precondition the PR claimed to verify was
# structurally fooled by the bug being flipped.
#
# mc#1982 captured the surfaced defects (2 mutually-masked regressions):
# mc#774 captured the surfaced defects (2 mutually-masked regressions):
# - Class 1: sqlmock helper drift since 2f36bb9a (24 days old)
# - Class 2: OFFSEC-001 contract collision since 7d1a189f (1 day old)
#
@@ -55,7 +55,7 @@ name: Lint pre-flip continue-on-error
# - YAML parse error in one of the workflow files: warn-only,
# don't block — the YAML lint workflows catch this separately.
#
# Cross-links: PR#656, mc#1982, PR#665 (interim re-mask),
# Cross-links: PR#656, mc#774, PR#665 (interim re-mask),
# Quirk #10 (internal#342 + dup #287), hongming-pc2 charter
# §SOP-N rule (e), feedback_strict_root_only_after_class_a,
# feedback_no_shared_persona_token_use.
@@ -1,6 +1,6 @@
name: lint-required-context-exists-in-bp
# Tier 2g hard-gate lint (per mc#1982) — diff-based PR-time
# Tier 2g hard-gate lint (per mc#774) — diff-based PR-time
# check. When a PR adds a NEW commit-status emission (workflow YAML
# `name:` + job `name:`-or-key + on:-event), the workflow file must
# carry one of three directives adjacent to the new job:
@@ -16,7 +16,7 @@ name: lint-required-context-exists-in-bp
# PR#656 added `CI / all-required (pull_request)` as a sentinel
# context that workflows emit, but BP did NOT list it. When
# platform-build failed, all-required failed, but BP let the PR
# merge anyway → cascade to mc#1982. With this lint, PR#656 would
# merge anyway → cascade to mc#774. With this lint, PR#656 would
# have been blocked until either the BP PATCH ran alongside OR
# the author added a `bp-required: pending` directive.
#
@@ -27,7 +27,7 @@ name: lint-required-context-exists-in-bp
# share the workflow-context enumeration helpers
# (`_event_map`, `workflow_contexts`, `_job_display`) but the
# semantics are intentionally distinct so they're separate scripts.
# Co-design is documented in mc#1982.
# Co-design is documented in mc#774.
#
# Directive comment lives in the workflow file (NOT PR body)
# ----------------------------------------------------------
@@ -42,13 +42,13 @@ name: lint-required-context-exists-in-bp
# Lands at `continue-on-error: true` (Phase 3 — surface the
# pattern without blocking PRs while the directive convention
# beds in). After 7 days of clean runs on `main` with no false
# positives, follow-up flips to `false`. Tracking: mc#1982.
# positives, follow-up flips to `false`. Tracking: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - PR#656 (the empirical case)
# - mc#1982 (the surfaced cascade)
# - mc#774 (the surfaced cascade)
# - feedback_phantom_required_check_after_gitea_migration (Tier 2f cousin)
# - feedback_behavior_based_ast_gates
#
@@ -81,10 +81,10 @@ jobs:
name: lint-required-context-exists-in-bp
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port,
# well past the 7-clean-day threshold. PR-time failure is now
# a hard CI signal.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface the pattern without blocking PRs
# while the directive convention beds in. Follow-up flip to false
# after 7 clean days on main. mc#1982.
continue-on-error: true # mc#1982 Phase 3 — flip to false after 7 clean main runs
steps:
- name: Check out PR head with full history (need base SHA blobs)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -49,56 +49,37 @@ jobs:
GITHUB_SERVER_URL: https://git.moleculesai.app
steps:
- name: Identify runner
id: identify
continue-on-error: true
run: |
set -eu
echo "arch=$(uname -m)"
echo "kernel=$(uname -sr)"
echo "shell=$BASH_VERSION"
# Sanity: must actually be arm64. If amd64 sneaks in here,
# the job skips gracefully rather than hard-failing, because
# a mislabelled runner is an ops concern, not a code defect.
# Pilot lane must not make main red (#2146).
# fail fast — that means the label routing is wrong.
case "$(uname -m)" in
aarch64|arm64)
echo "arm64 confirmed"
echo "arm64=true" >> "$GITHUB_OUTPUT"
;;
*)
echo "ERROR: expected arm64, got $(uname -m) — label routing may be wrong"
echo "arm64=false" >> "$GITHUB_OUTPUT"
exit 1
;;
aarch64|arm64) echo "arm64 confirmed" ;;
*) echo "ERROR: expected arm64, got $(uname -m)"; exit 1 ;;
esac
- name: Checkout
if: steps.identify.outputs.arm64 == 'true'
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Install shellcheck (arm64)
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
if command -v shellcheck >/dev/null 2>&1; then
echo "shellcheck already present: $(shellcheck --version | head -1)"
else
# Prefer apt if the runner base ships it; else download the
# correct platform binary (darwin vs linux).
# Prefer apt if the runner base ships it; else download arm64 binary.
if command -v apt-get >/dev/null 2>&1; then
sudo apt-get update -qq
sudo apt-get install -y --no-install-recommends shellcheck
else
SC_VER=v0.10.0
if [ "$(uname -s)" = "Darwin" ]; then
SC_PKG="shellcheck-${SC_VER}.darwin.aarch64.tar.xz"
else
SC_PKG="shellcheck-${SC_VER}.linux.aarch64.tar.xz"
fi
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/${SC_PKG}" \
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/shellcheck-${SC_VER}.linux.aarch64.tar.xz" \
| tar -xJf - --strip-components=1
sudo mv shellcheck /usr/local/bin/
fi
@@ -106,15 +87,14 @@ jobs:
shellcheck --version | head -2
- name: Run shellcheck on .gitea/scripts/*.sh
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
# Only the scripts we control under .gitea/scripts. Pilot
# scope is intentionally narrow — broaden in a follow-up
# once the lane is proven.
if ! command -v shellcheck >/dev/null 2>&1 || ! shellcheck --version >/dev/null 2>&1; then
echo "WARN: shellcheck not functional — skipping (pilot mode)"
if ! command -v shellcheck >/dev/null 2>&1; then
echo "WARN: shellcheck binary not found — skipping (pilot mode)"
exit 0
fi
# NOTE: macOS ships Bash 3.2 (Apple license), no `mapfile`
+5 -153
View File
@@ -14,37 +14,10 @@ name: publish-canvas-image
# authenticate to ghcr.io.
#
# Builds, pushes, and (ordered) deploys the standalone canvas Docker image to
# ECR whenever a commit lands on main that touches canvas code.
#
# Ordered deploy (core#2226) — mirrors publish-workspace-server-image.yml so the
# standalone `molecule-ai/canvas` image is deterministic + verifiable, not a
# side effect of the platform fleet pulling a mutable `:latest`:
#
# build-and-push: build → push :staging-<sha> + :staging-latest + :sha-<sha>
# (does NOT move :latest — an unpromoted build must never
# become the prod-blessed tag).
# promote-canvas: waits for green main CI on this SHA, then re-points
# :latest to the verified :staging-<sha> by digest
# (imagetools create — no rebuild). So `:latest` == the
# current prod-blessed canvas, byte-identical to staging-<sha>.
#
# Tag scheme produced (parallels platform-tenant):
# :staging-<sha> — per-commit immutable digest, what docker-compose pins to.
# :staging-latest — most recent BUILD on main (last-writer-wins, NOT gated).
# :sha-<sha> — kept for back-compat with any consumer pinning the old tag.
# :latest — most recent CI-GREEN build. Only moved by promote-canvas.
#
# WHY this is the canvas analogue of the platform's deploy-production, not a
# literal copy: the standalone canvas co-deploys with the platform on the same
# host via the root docker-compose.yml (`docker compose pull && up -d`). Gating
# the canvas `:latest` promotion on the SAME green-main-CI signal the platform
# deploy waits on makes platform + canvas roll together by the same SHA. The
# canvas has no per-tenant fleet of its own and no /buildinfo endpoint, so there
# is no fleet-rollout / per-tenant verify step to mirror here — CI-green +
# digest-pin + immutable :staging-<sha> is the determinism contract. (A future
# canvas /buildinfo would let this assert the served SHA like the platform does;
# tracked in core#2226.)
# Builds and pushes the canvas Docker image to ECR whenever a commit lands
# on main that touches canvas code. Previously canvas changes were visible in
# CI (npm run build passed) but the live container was never updated —
# operators had to manually run `docker compose build canvas` each time.
#
# Mirror of publish-platform-image.yml, adapted for the Next.js canvas layer.
# See that workflow for inline notes on macOS Keychain isolation and QEMU.
@@ -57,7 +30,6 @@ on:
# platform-only / docs-only / MCP-only merges.
- 'canvas/**'
- '.gitea/workflows/publish-canvas-image.yml'
workflow_dispatch:
# NOTE (Gitea port): the original GitHub workflow had a
# `workflow_dispatch:` manual trigger for the
# non-canvas-merge-but-need-fresh-image scenario. Dropped in the
@@ -97,10 +69,6 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
# Exposed so promote-canvas re-points :latest to the EXACT per-commit tag
# this build produced (digest-level), never a re-resolved mutable tag.
staging_sha: ${{ steps.tags.outputs.staging_sha }}
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -172,7 +140,6 @@ jobs:
shell: bash
run: |
echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
echo "staging_sha=staging-${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
- name: Resolve build args
id: build_args
@@ -208,19 +175,8 @@ jobs:
build-args: |
NEXT_PUBLIC_PLATFORM_URL=${{ steps.build_args.outputs.platform_url }}
NEXT_PUBLIC_WS_URL=${{ steps.build_args.outputs.ws_url }}
# Bake the merge SHA into the image so /api/buildinfo reports the
# served canvas SHA (core#2235). Mirrors how the platform image
# surfaces GIT_SHA at /buildinfo. Full 40-char SHA (not the
# 7-char tag) so the fleet redeploy verification can match exactly.
BUILD_SHA=${{ github.sha }}
# Ordered deploy (core#2226): the build job pushes the immutable
# per-commit tag + the build-tracking staging-latest + the legacy
# back-compat :sha-<sha> tag. It does NOT push :latest — :latest is
# the prod-blessed tag and is only re-pointed by promote-canvas after
# green main CI, so an unpromoted/red build can never become :latest.
tags: |
${{ env.IMAGE_NAME }}:${{ steps.tags.outputs.staging_sha }}
${{ env.IMAGE_NAME }}:staging-latest
${{ env.IMAGE_NAME }}:latest
${{ env.IMAGE_NAME }}:sha-${{ steps.tags.outputs.sha }}
# Gitea artifact-cache reachability is best-effort on the operator
# runner network. Do not let cache export fail an image that already
@@ -229,107 +185,3 @@ jobs:
org.opencontainers.image.source=https://git.moleculesai.app/${{ github.repository }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.description=Molecule AI canvas (Next.js 15 + React Flow)
# bp-exempt: post-merge canvas promote side-effect; merge is gated by CI /
# all-required and this job waits for green push CI on the SHA before acting.
promote-canvas:
name: Promote canvas :latest to CI-green build
needs: build-and-push
# Only on a real main push — workflow_dispatch / non-main never promotes.
if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
# Side-effect deploy only; the image publish above is the durable artifact.
# mc#1982: do NOT renew this mask silently — it mirrors deploy-production's
# contract (a flaky promote must not red the ship lane), tracked in core#2226.
continue-on-error: true
runs-on: publish
timeout-minutes: 60
env:
# Same green-main-CI gate the platform deploy-production waits on, so
# platform + canvas advance :latest off the identical signal/SHA.
GITEA_HOST: git.moleculesai.app
GITEA_TOKEN: ${{ secrets.PROD_AUTO_DEPLOY_CONTROL_TOKEN || secrets.AUTO_SYNC_TOKEN }}
CI_STATUS_TIMEOUT_SECONDS: "3600"
# Re-uses the platform's disable kill-switch: when prod auto-deploy is
# paused, the canvas :latest promote pauses too (correct — an unpromoted
# build must not become :latest while the fleet is frozen).
PROD_AUTO_DEPLOY_DISABLED: ${{ vars.PROD_AUTO_DEPLOY_DISABLED || secrets.PROD_AUTO_DEPLOY_DISABLED || '' }}
steps:
# The publish runner's default HOME (/home/hongming) is not writable, so
# docker credential saves fail and halt the promote (#2193 on the platform
# side). Point HOME + DOCKER_CONFIG at the writable job temp dir.
- name: Prepare writable HOME + Docker config
run: |
set -euo pipefail
H="$RUNNER_TEMP/canvas-promote-home"
mkdir -p "$H/.docker"
echo "HOME=$H" >> "$GITHUB_ENV"
echo "DOCKER_CONFIG=$H/.docker" >> "$GITHUB_ENV"
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Resolve promote gate
id: gate
env:
PROD_AUTO_DEPLOY_DISABLED: ${{ env.PROD_AUTO_DEPLOY_DISABLED }}
run: |
set -euo pipefail
if [ -n "${PROD_AUTO_DEPLOY_DISABLED:-}" ]; then
case "$(printf '%s' "$PROD_AUTO_DEPLOY_DISABLED" | tr '[:upper:]' '[:lower:]')" in
1|true|yes|on|disabled|disable)
echo "enabled=false" >> "$GITHUB_OUTPUT"
echo "::notice::Canvas :latest promote skipped: PROD_AUTO_DEPLOY_DISABLED=$PROD_AUTO_DEPLOY_DISABLED"
{
echo "## Canvas :latest promote skipped"
echo ""
echo "Reason: \`PROD_AUTO_DEPLOY_DISABLED=$PROD_AUTO_DEPLOY_DISABLED\`. The CI-green build is published as \`:staging-${GITHUB_SHA::7}\`; \`:latest\` was left unchanged."
} >> "$GITHUB_STEP_SUMMARY"
exit 0 ;;
esac
fi
if [ -z "${GITEA_TOKEN:-}" ]; then
echo "::error::AUTO_SYNC_TOKEN/PROD_AUTO_DEPLOY_CONTROL_TOKEN is required so the canvas promote can wait for green CI."
exit 1
fi
echo "enabled=true" >> "$GITHUB_OUTPUT"
- name: Wait for green main CI on this SHA
if: ${{ steps.gate.outputs.enabled == 'true' }}
run: |
set -euo pipefail
# Same SSOT wait the platform deploy uses: blocks until the required
# push contexts (CI / all-required (push) + Secret scan) go green on
# THIS sha, and fails closed if any required context terminally fails.
python3 .gitea/scripts/prod-auto-deploy.py wait-ci
- name: Promote canvas :latest to the CI-green image
if: ${{ steps.gate.outputs.enabled == 'true' }}
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
STAGING_SHA_TAG: ${{ needs.build-and-push.outputs.staging_sha }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
run: |
set -euo pipefail
# Fail-safe: if the build job's output didn't propagate, recompute the
# immutable per-commit tag from the SHA so we never promote a guess.
SHA_TAG="${STAGING_SHA_TAG:-staging-${GITHUB_SHA::7}}"
ECR_REGISTRY="${IMAGE_NAME%%/*}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${ECR_REGISTRY}"
# Digest-level re-tag (no pull/rebuild): :latest becomes byte-identical
# to the verified :staging-<sha> for this commit.
docker buildx imagetools create \
--tag "${IMAGE_NAME}:latest" \
"${IMAGE_NAME}:${SHA_TAG}"
{
echo "## Canvas :latest promoted"
echo ""
echo "Re-pointed \`molecule-ai/canvas:latest\` → \`${SHA_TAG}\` (by digest)."
echo ":latest now tracks the CI-green canvas build for commit \`${GITHUB_SHA::7}\`."
echo ""
echo "Tenants/hosts that \`docker compose pull canvas\` now get the same build the platform deploy rolled for this SHA."
} >> "$GITHUB_STEP_SUMMARY"
@@ -16,24 +16,14 @@ name: publish-workspace-server-image
#
# Image tags produced:
# :staging-<sha> — per-commit digest, stable for canary verify
# :staging-latest — tracks most recent BUILD on this branch (set by the
# build job, last-writer-wins, NOT prod-gated)
# :latest — tracks the most recent PROD-PROMOTED build. Re-pointed by the
# deploy-production job ONLY after green main CI + canary +
# fleet rollout + /buildinfo verification pass. So :latest ==
# "current prod image", never the raw build. (Added 2026-06-03
# after a stale :latest — last moved 2026-05-10 — reverted a
# production tenant on a no-arg redeploy.)
# :staging-latest — tracks most recent build on this branch
#
# Production auto-deploy:
# After both platform and tenant images are pushed, deploy-production waits
# for strict required push contexts on the same SHA to go green, then
# calls the production CP redeploy-fleet endpoint with target_tag=
# staging-<sha>. On success (rollout + buildinfo verified) it re-points
# :latest to the same SHA. Set repo variable or secret
# PROD_AUTO_DEPLOY_DISABLED=true to stop production rollout while keeping
# image publishing enabled — in which case :latest is NOT advanced either
# (correct: an unpromoted build must not become :latest).
# staging-<sha>. Set repo variable or secret PROD_AUTO_DEPLOY_DISABLED=true
# to stop production rollout while keeping image publishing enabled.
#
# Primary ECR target: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*
# Optional staging tenant mirror target:
@@ -115,19 +105,6 @@ jobs:
echo "Docker daemon OK"
echo "::endgroup::"
# Pre-flight: verify every repo in manifest.json actually exists.
#
# Why: deleting a template repo without updating manifest.json breaks
# clone-manifest.sh with a generic git 404, which looks like a
# transient network error and wastes debug time. We catch it here
# with a per-entry ::error:: annotation naming the missing repo
# (issue #2192). This is the push-time complement to PR #2186's
# PR-time manifest-entry-existence gate.
- name: Validate manifest entries exist
run: |
set -euo pipefail
bash scripts/check-manifest-repos-exist.sh manifest.json
# Pre-clone manifest deps before docker build.
#
# Why: workspace-template-* repos on Gitea are private. The pre-fix
@@ -275,25 +252,7 @@ jobs:
PROD_AUTO_DEPLOY_BATCH_SIZE: ${{ vars.PROD_AUTO_DEPLOY_BATCH_SIZE || '3' }}
PROD_AUTO_DEPLOY_DRY_RUN: ${{ vars.PROD_AUTO_DEPLOY_DRY_RUN || '' }}
PROD_ALLOW_NON_PROD_CP_URL: ${{ vars.PROD_ALLOW_NON_PROD_CP_URL || '' }}
# #2213: per-tenant /buildinfo settle budget. A freshly-swapped tenant can
# keep serving the old image at the edge for a short drain window; the
# verify step polls each tenant up to this budget before declaring it stale.
PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS || '240' }}
PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS || '20' }}
steps:
# The publish runner's default HOME (/home/hongming) is not writable, so
# git/docker credential saves fail (`Error saving credentials: mkdir
# /home/hongming: permission denied`) and halt the production rollout
# (#2193). Point HOME + DOCKER_CONFIG at the writable job temp dir —
# mirrors build-and-push's "Prepare writable Docker config" fix above.
- name: Prepare writable HOME + Docker config
run: |
set -euo pipefail
H="$RUNNER_TEMP/auto-deploy-home"
mkdir -p "$H/.docker"
echo "HOME=$H" >> "$GITHUB_ENV"
echo "DOCKER_CONFIG=$H/.docker" >> "$GITHUB_ENV"
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -338,50 +297,8 @@ jobs:
set -euo pipefail
python3 .gitea/scripts/prod-auto-deploy.py wait-ci
# Superseded-job guard — BEFORE any production side effect (#2213).
#
# This workflow has no `concurrency:` (see header: Gitea 1.22.6 cancels
# queued prod deploys). So two close main pushes run BOTH deploy-production
# jobs. The verify step already skips its strict /buildinfo check when this
# job is superseded (#2194) — but that guard was AFTER the redeploy and the
# :latest promote, so an OLDER job that started late still:
# 1. rolled the whole fleet BACKWARD to its older tag (canary hongming
# was reverted from the newer SHA — the #2213 red), then
# 2. promoted :latest backward to the older image,
# and only THEN skipped verify and exited green. A superseded job must do
# NEITHER. We re-check the branch head here, immediately before the rollout,
# and skip every side effect when a newer commit already owns main.
#
# exit 0 + non-empty stdout => superseded (newer head printed); the redeploy
# and promote steps are gated off via this output. exit 10 => this job is
# still the latest, proceed to roll the fleet. Fail-safe: a head that can't
# be read returns NOT-superseded (exit 10), so a genuine deploy is never
# silently skipped. (Re-checked again at verify time to catch a newer job
# that lands DURING this rollout.)
- name: Check superseded before production side effects
id: supersede
if: ${{ steps.plan.outputs.enabled == 'true' }}
run: |
set -euo pipefail
set +e
NEWER_HEAD="$(python3 .gitea/scripts/prod-auto-deploy.py check-superseded)"
SUPERSEDED_EXIT=$?
set -e
if [ "$SUPERSEDED_EXIT" -eq 0 ] && [ -n "$NEWER_HEAD" ]; then
echo "superseded=true" >> "$GITHUB_OUTPUT"
echo "::notice::Superseded before rollout: main head is now ${NEWER_HEAD:0:7} (this job deploys ${GITHUB_SHA:0:7}). Skipping redeploy + :latest promote so an older job never rolls the fleet backward."
{
echo "## Production auto-deploy skipped — superseded before rollout"
echo ""
echo "This deploy job's SHA \`${GITHUB_SHA:0:7}\` is no longer the head of \`main\` (now \`${NEWER_HEAD:0:7}\`)."
echo "A newer deploy job owns the fleet; rolling it backward to this older build would revert tenants and \`:latest\`. No side effects performed."
} >> "$GITHUB_STEP_SUMMARY"
else
echo "superseded=false" >> "$GITHUB_OUTPUT"
fi
- name: Call production CP redeploy-fleet
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
if: ${{ steps.plan.outputs.enabled == 'true' }}
run: |
set -euo pipefail
python3 .gitea/scripts/prod-auto-deploy.py assert-enabled
@@ -410,27 +327,13 @@ jobs:
echo ""
echo "### Per-tenant result"
echo ""
echo "| Slug | Phase | SSM Status | Exit | Healthz | On target | Error present |"
echo "|------|-------|------------|------|---------|-----------|---------------|"
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \(.verified_on_target) | \((.error // "") != "") |"' "$HTTP_RESPONSE" || true
# internal#724: stragglers are tenants enumerated but not proven
# on the target build. Surface them loudly — a non-empty list
# means the rollout did NOT fully land.
STRAGGLERS="$(jq -r '(.stragglers // []) | join(", ")' "$HTTP_RESPONSE")"
if [ -n "$STRAGGLERS" ]; then
echo ""
echo "### ⚠ Stragglers (NOT on target tag \`$TARGET_TAG\`)"
echo ""
echo "\`$STRAGGLERS\`"
fi
echo "| Slug | Phase | SSM Status | Exit | Healthz | Error present |"
echo "|------|-------|------------|------|---------|---------------|"
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \((.error // "") != "") |"' "$HTTP_RESPONSE" || true
} >> "$GITHUB_STEP_SUMMARY"
OK="$(jq -r '.ok' "$HTTP_RESPONSE")"
if [ "$OK" != "true" ]; then
STRAGGLERS="$(jq -r '(.stragglers // []) | join(", ")' "$HTTP_RESPONSE")"
if [ -n "$STRAGGLERS" ]; then
echo "::error::incomplete rollout — tenants not on target tag $TARGET_TAG: $STRAGGLERS"
fi
echo "::error::redeploy-fleet reported ok=false; production rollout halted."
exit 1
fi
@@ -440,66 +343,18 @@ jobs:
fi
- name: Verify reachable tenants report this SHA
# Skip when superseded BEFORE rollout: the redeploy step did not run, so
# there is no redeploy-fleet response to verify against and the newer job
# owns verification (#2213). The in-step guard below still catches the
# case where a newer job lands DURING this job's rollout.
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
if: ${{ steps.plan.outputs.enabled == 'true' }}
env:
TENANT_DOMAIN: moleculesai.app
run: |
set -euo pipefail
RESP="$RUNNER_TEMP/prod-redeploy-response.json"
# Superseded-job guard. This workflow has no `concurrency:` (header
# explains why: Gitea 1.22.6 cancels queued prod deploys). So two
# close main pushes run BOTH deploy-production jobs. The newer one
# rolls the fleet to its (newer) build first; this older job's strict
# equality check below would then see tenants on the NEWER SHA and
# false-red "$slug is stale" even though the fleet is AHEAD, not
# behind (git SHAs aren't ordered; /buildinfo exposes only git_sha).
#
# If main's current head is no longer THIS job's SHA, a newer commit
# has landed and this deploy is superseded — the newest job's verify
# is authoritative. Skip strict verify and succeed. exit 0 => newer
# head printed (superseded); exit 10 => still the latest, proceed to
# the strict verify so a genuinely-behind tenant still fails loudly.
set +e
NEWER_HEAD="$(python3 .gitea/scripts/prod-auto-deploy.py check-superseded)"
SUPERSEDED_EXIT=$?
set -e
if [ "$SUPERSEDED_EXIT" -eq 0 ] && [ -n "$NEWER_HEAD" ]; then
echo "::notice::Superseded deploy: main head is now ${NEWER_HEAD:0:7} (this job deployed ${GITHUB_SHA:0:7}). The fleet is at or ahead of this build; the newer deploy job's verify is authoritative. Skipping strict SHA verify."
{
echo ""
echo "### Buildinfo verification skipped — superseded deploy"
echo ""
echo "This deploy job's SHA \`${GITHUB_SHA:0:7}\` is no longer the head of \`main\` (now \`${NEWER_HEAD:0:7}\`)."
echo "A newer deploy job is rolling the fleet forward; its verify is authoritative."
} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
mapfile -t SLUGS < <(jq -r '.results[]? | .slug' "$RESP")
if [ ${#SLUGS[@]} -eq 0 ]; then
echo "::error::No tenants returned from redeploy-fleet; refusing to mark production deploy verified."
exit 1
fi
# Per-tenant settle/retry budget (#2213). A tenant whose container the
# CP just swapped can keep serving the OLD image at the edge for a short
# window while the old container drains — /buildinfo returns HTTP 200
# with the previous SHA, which `curl --retry` does NOT retry (it only
# retries connection/5xx failures, not a stale-but-200 body). Without a
# settle window a still-rolling tenant false-reds "stale" on the very
# first poll. So poll each tenant's /buildinfo until it reports the
# target SHA or the budget is exhausted; only THEN declare it stale or
# unreachable. This never masks a genuinely stuck tenant — a tenant that
# never reaches the target within the budget still fails loud (and the
# superseded-job revert class is already blocked before rollout above).
SETTLE_BUDGET_SECONDS="${PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS:-240}"
SETTLE_INTERVAL_SECONDS="${PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS:-20}"
STALE_COUNT=0
UNREACHABLE_COUNT=0
UNHEALTHY_COUNT=0
@@ -511,36 +366,18 @@ jobs:
continue
fi
url="https://${slug}.${TENANT_DOMAIN}/buildinfo"
deadline=$(( $(date +%s) + SETTLE_BUDGET_SECONDS ))
actual=""
last_actual=""
on_target=false
while :; do
body="$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$url" || true)"
actual="$(echo "$body" | jq -r '.git_sha // ""' 2>/dev/null || echo "")"
[ -n "$actual" ] && last_actual="$actual"
if [ "$actual" = "$GITHUB_SHA" ]; then
on_target=true
break
fi
now=$(date +%s)
if [ "$now" -ge "$deadline" ]; then
break
fi
# Still rolling (stale 200) or transiently unreachable — wait and
# re-poll within the settle budget rather than failing on first read.
remaining=$(( deadline - now ))
echo "$slug: waiting for target SHA (have '${actual:0:7}', want ${GITHUB_SHA:0:7}; ${remaining}s left)"
sleep "$SETTLE_INTERVAL_SECONDS"
done
if [ "$on_target" = true ]; then
echo "$slug: ${actual:0:7}"
elif [ -z "$last_actual" ]; then
echo "::error::$slug did not return /buildinfo after deploy (waited ${SETTLE_BUDGET_SECONDS}s)."
body="$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$url" || true)"
actual="$(echo "$body" | jq -r '.git_sha // ""' 2>/dev/null || echo "")"
if [ -z "$actual" ]; then
echo "::error::$slug did not return /buildinfo after deploy."
UNREACHABLE_COUNT=$((UNREACHABLE_COUNT + 1))
else
echo "::error::$slug is stale: actual=${last_actual:0:7}, expected=${GITHUB_SHA:0:7} (waited ${SETTLE_BUDGET_SECONDS}s)"
continue
fi
if [ "$actual" != "$GITHUB_SHA" ]; then
echo "::error::$slug is stale: actual=${actual:0:7}, expected=${GITHUB_SHA:0:7}"
STALE_COUNT=$((STALE_COUNT + 1))
else
echo "$slug: ${actual:0:7}"
fi
done
@@ -558,69 +395,3 @@ jobs:
if [ "$STALE_COUNT" -gt 0 ] || [ "$UNHEALTHY_COUNT" -gt 0 ] || [ "$UNREACHABLE_COUNT" -gt 0 ]; then
exit 1
fi
# Re-point :latest to the just-promoted image — ONLY after the
# production rollout + buildinfo verification above have passed.
#
# WHY HERE (promote point), not at build time:
# The platform-tenant ECR `:latest` tag was last moved 2026-05-10
# and went 3.5 weeks stale because the build step only pushes
# :staging-<sha> + :staging-latest and never re-points :latest. A
# no-arg POST /cp/admin/tenants/:slug/redeploy (whose default tag
# fell through to "latest") then pulled the 3.5-week-old image and
# REVERTED the tenant (incident: molecule-adk-demo, 2026-06-03).
#
# The defense-in-depth half of this fix changes that redeploy
# default to :staging-latest, but :latest itself must also be
# kept meaningful. We make :latest track the PROD-BLESSED build,
# not the raw build: by living at the end of deploy-production —
# after `wait-ci` (green main CI), the canary-first batched fleet
# rollout, AND the /buildinfo SHA verification — :latest only ever
# advances to a SHA that is actually green and confirmed running
# across the live fleet. So `:latest` == "current prod image",
# and any consumer that pulls :latest (legacy callers, manual
# `docker pull`, a redeploy that somehow still resolves "latest")
# gets the blessed image instead of whatever happened to build.
#
# Re-tag is digest-level (imagetools create), so no rebuild and
# :latest is byte-identical to :staging-<sha> for this commit.
# Gate on supersede: a superseded older job must NOT move :latest backward
# to its older image (#2213 — 275383 promoted :latest → the older
# staging-7a72516 after a newer job had already shipped). :latest must only
# ever advance under the job that owns main's head.
- name: Promote :latest to the verified prod image
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
env:
TENANT_IMAGE_NAME: ${{ env.TENANT_IMAGE_NAME }}
STAGING_TENANT_IMAGE_NAME: ${{ env.STAGING_TENANT_IMAGE_NAME }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
run: |
set -euo pipefail
SHA_TAG="staging-${GITHUB_SHA::7}"
PROD_ECR_REGISTRY="${TENANT_IMAGE_NAME%%/*}"
STAGING_ECR_REGISTRY="${STAGING_TENANT_IMAGE_NAME%%/*}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${PROD_ECR_REGISTRY}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${STAGING_ECR_REGISTRY}"
# imagetools create copies the source manifest to the new tag by
# digest (no pull/rebuild). :latest now points at the exact image
# that just passed the prod gate.
docker buildx imagetools create \
--tag "${TENANT_IMAGE_NAME}:latest" \
"${TENANT_IMAGE_NAME}:${SHA_TAG}"
docker buildx imagetools create \
--tag "${STAGING_TENANT_IMAGE_NAME}:latest" \
"${STAGING_TENANT_IMAGE_NAME}:${SHA_TAG}"
{
echo ""
echo "### :latest promoted"
echo ""
echo "Re-pointed \`platform-tenant:latest\` → \`${SHA_TAG}\` (prod + staging ECR)."
echo ":latest now tracks the prod-blessed, fleet-verified image."
} >> "$GITHUB_STEP_SUMMARY"
+8 -89
View File
@@ -9,22 +9,10 @@
# Triggers on:
# - `pull_request_target`: opened, synchronize, reopened
# → initial status posts when PR opens / re-pushes
# - `pull_request_review` types: [submitted]
# → re-evaluate when a team member submits an APPROVE review so
# the gate flips immediately (no wait for the next push or
# slash-command). Verified live: sop-tier-check.yml uses this
# same event and provably fires (produces
# `sop-tier-check / tier-check (pull_request_review)` contexts).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so
# only APPROVE reviews run the evaluator; COMMENT and
# REQUEST_CHANGES are skipped at the job level.
# Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS
# the required context via the API. Trust boundary preserved
# (BASE ref, no PR-head).
# - comment refires are handled by `sop-checklist.yml` review-refire job
# → `/qa-recheck` slash-command re-evaluates this gate.
# - comment refires are handled by `review-refire-comments.yml`
# → a single issue_comment dispatcher prevents every SOP/review
# comment from enqueueing separate qa/security/tier jobs on
# Gitea 1.22.6 before job-level `if:` can skip them.
# Workflow name = `qa-review` ; job name = `approved`.
# The job's own pass/fail conclusion publishes the status context
# `qa-review / approved (<event>)` — NO `POST /statuses` call → NO
@@ -97,26 +85,21 @@ name: qa-review
on:
pull_request_target:
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
permissions:
contents: read
pull-requests: read
statuses: write
secrets: read
jobs:
# bp-exempt: PR review bot signal; required merge state is enforced by CI / all-required.
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
# Comment-triggered refires live in review-refire-comments.yml. Keeping
# this workflow PR-only avoids comment-triggered queue storms.
if: |
github.event_name == 'pull_request_target' ||
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
github.event_name == 'pull_request_target'
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -160,7 +143,6 @@ jobs:
ref: ${{ github.event.repository.default_branch }}
- name: Evaluate qa-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -175,66 +157,3 @@ jobs:
REVIEW_CHECK_DEBUG: '0'
REVIEW_CHECK_STRICT: '0'
run: bash .gitea/scripts/review-check.sh
- name: Post required status context on pull_request_review
# Gitea Actions auto-publishes (pull_request_review) context
# for this event, but branch-protection requires (pull_request_target).
# We explicitly POST the BP-required context so the gate flips.
# Trust boundary: same BASE-ref script result, no PR-head code.
#
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
GITEA_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
EVAL_OUTCOME: ${{ steps.eval.outcome }}
run: |
set -euo pipefail
authfile=$(mktemp)
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
prfile=$(mktemp)
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/pulls/${PR_NUMBER}")
if [ "$code" != "200" ]; then
echo "::error::GET /pulls/${PR_NUMBER} returned HTTP ${code}"
rm -f "$prfile" "$authfile"
exit 1
fi
head_sha=$(jq -r '.head.sha // ""' "$prfile")
rm -f "$prfile"
if [ "$EVAL_OUTCOME" = "success" ]; then
status_state="success"
description="Approved via pull_request_review trigger"
else
status_state="failure"
description="Review check failed via pull_request_review trigger"
fi
body=$(jq -nc \
--arg state "$status_state" \
--arg context "qa-review / approved (pull_request_target)" \
--arg description "$description" \
'{state:$state, context:$context, description:$description}')
post_code=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/statuses/${head_sha}")
rm -f "$authfile"
if [ "$post_code" != "200" ] && [ "$post_code" != "201" ]; then
echo "::error::POST /statuses/${head_sha} returned HTTP ${post_code}"
exit 1
fi
echo "::notice::posted ${status_state} for context=\"qa-review / approved (pull_request_target)\" on sha=${head_sha}"
+1 -1
View File
@@ -40,7 +40,7 @@ env:
concurrency:
group: railway-pin-audit
cancel-in-progress: false
cancel-in-progress: true
permissions:
issues: write
+4 -87
View File
@@ -6,44 +6,25 @@
#
# See `qa-review.yml` header for the full A1-α / A1.1 / A4 / A5 design
# rationale; everything below is identical in shape.
#
# A1-α addendum (internal#760): review-event trigger added so the security
# gate flips immediately when a team member submits an APPROVE review.
# Uses `pull_request_review` types: [submitted] — verified live via
# sop-tier-check.yml which provably fires this event (produces
# `sop-tier-check / tier-check (pull_request_review)` contexts).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so only APPROVE
# reviews run the evaluator; COMMENT and REQUEST_CHANGES are skipped at
# the job level. Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS the required
# context via the API. Trust boundary preserved (BASE ref, no PR-head).
name: security-review
on:
pull_request_target:
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
permissions:
contents: read
pull-requests: read
statuses: write
secrets: read
jobs:
# bp-exempt: PR security review bot signal; required merge state is enforced by CI / all-required.
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
# Comment-triggered refires live in review-refire-comments.yml. Keeping
# this workflow PR-only avoids comment-triggered queue storms.
if: |
github.event_name == 'pull_request_target' ||
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
github.event_name == 'pull_request_target'
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -76,7 +57,6 @@ jobs:
ref: ${{ github.event.repository.default_branch }}
- name: Evaluate security-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -88,66 +68,3 @@ jobs:
REVIEW_CHECK_DEBUG: '0'
REVIEW_CHECK_STRICT: '0'
run: bash .gitea/scripts/review-check.sh
- name: Post required status context on pull_request_review
# Gitea Actions auto-publishes (pull_request_review) context
# for this event, but branch-protection requires (pull_request_target).
# We explicitly POST the BP-required context so the gate flips.
# Trust boundary: same BASE-ref script result, no PR-head code.
#
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
GITEA_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
EVAL_OUTCOME: ${{ steps.eval.outcome }}
run: |
set -euo pipefail
authfile=$(mktemp)
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
prfile=$(mktemp)
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/pulls/${PR_NUMBER}")
if [ "$code" != "200" ]; then
echo "::error::GET /pulls/${PR_NUMBER} returned HTTP ${code}"
rm -f "$prfile" "$authfile"
exit 1
fi
head_sha=$(jq -r '.head.sha // ""' "$prfile")
rm -f "$prfile"
if [ "$EVAL_OUTCOME" = "success" ]; then
status_state="success"
description="Approved via pull_request_review trigger"
else
status_state="failure"
description="Review check failed via pull_request_review trigger"
fi
body=$(jq -nc \
--arg state "$status_state" \
--arg context "security-review / approved (pull_request_target)" \
--arg description "$description" \
'{state:$state, context:$context, description:$description}')
post_code=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/statuses/${head_sha}")
rm -f "$authfile"
if [ "$post_code" != "200" ] && [ "$post_code" != "201" ]; then
echo "::error::POST /statuses/${head_sha} returned HTTP ${post_code}"
exit 1
fi
echo "::notice::posted ${status_state} for context=\"security-review / approved (pull_request_target)\" on sha=${head_sha}"
+6 -6
View File
@@ -179,10 +179,10 @@ jobs:
- name: Refire qa-review status
if: steps.classify.outputs.run_qa == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
# RFC_324_TEAM_READ_TOKEN is read-only (team membership read scope only).
# review-refire-status.sh POSTs to /statuses — requires write scope.
# SOP_TIER_CHECK_TOKEN carries write:repository + write:issue + read:organization.
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
@@ -198,10 +198,10 @@ jobs:
- name: Refire security-review status
if: steps.classify.outputs.run_security == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
# RFC_324_TEAM_READ_TOKEN is read-only (team membership read scope only).
# review-refire-status.sh POSTs to /statuses — requires write scope.
# SOP_TIER_CHECK_TOKEN carries write:repository + write:issue + read:organization.
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
+18 -32
View File
@@ -33,20 +33,11 @@
# 2026-05-17 (internal#189 Phase 1).
#
# BURN-IN CLOSED 2026-05-17 (internal#189 Phase 1): The 7-day burn-in
# window closed. As of 2026-06-04 the residual masks left behind by the
# burn-in are removed for real (the comment previously claimed this while
# the masks still persisted — that was stale):
# - continue-on-error: true on the jq-install step (redundant; the step
# already exits 0) and on the tier-check step (the burn-in mask).
# - the `|| true` after the sop-tier-check.sh invocation, which masked
# real tier-gate verdicts.
# AND-composition is now fully enforced and the tier-check step can
# honestly red CI on a real SOP-6 violation. SOP_FAIL_OPEN=1 is RETAINED
# as sanctioned infra-resilience: it fails-open only on token/network/jq
# faults, never on a real gate verdict. If you need to temporarily
# re-introduce a mask, file a tracker and follow the mc#1982 protocol
# (Tier 2e lint requires a current tracker within 2 lines of any
# continue-on-error: true).
# window closed. continue-on-error: true has been removed from the
# tier-check job; AND-composition is now fully enforced. If you need
# to temporarily re-introduce a mask, file a tracker and follow the
# mc#1982 protocol (Tier 2e lint requires a current tracker within
# 2 lines of any continue-on-error: true).
name: sop-tier-check
@@ -99,11 +90,10 @@ jobs:
# GitHub releases may be unreachable from some runner networks
# (infra#241 follow-up: GitHub timeout after 3s on 5.78.80.188
# runners). The sop-tier-check script has its own fallback as a
# third line of defense, and this step's final command
# (`jq --version ... || echo`) already exits 0 unconditionally — so
# the step cannot fail the job on its own.
# continue-on-error REMOVED 2026-06-04 (mc#1982 directive: root-fix
# and remove, do not renew). It was redundant masking, not a gate.
# third line of defense. continue-on-error: true ensures this step
# failing does not block the job.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
run: |
# apt-get is the primary method — Ubuntu package mirrors are reliably
# reachable from runner containers. GitHub releases may be blocked
@@ -120,11 +110,11 @@ jobs:
jq --version 2>/dev/null || echo "::notice::jq not yet available — script fallback will retry"
- name: Verify tier label + reviewer team membership
# continue-on-error REMOVED 2026-06-04 (expired internal#189 Phase 1
# burn-in, window closed 2026-05-17; mc#1982 directive: root-fix and
# remove, do not renew). SOP_FAIL_OPEN=1 below still fails-open on
# token/network/infra errors only (never on a real tier-gate verdict),
# so this step can now honestly fail CI on a genuine SOP-6 violation.
# continue-on-error: true at step level — job-level is ignored by Gitea
# Actions (quirk #10, internal runbooks). Belt-and-suspenders with
# SOP_FAIL_OPEN=1 + || true below.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -133,13 +123,9 @@ jobs:
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
SOP_DEBUG: '0'
SOP_LEGACY_CHECK: '0'
# SOP_FAIL_OPEN=1 fails-open ONLY on infra faults (empty/invalid
# token, unreachable Gitea API, missing jq) — see the guarded
# `exit 0` branches in sop-tier-check.sh. It does NOT mask a real
# tier-gate verdict: a missing tier label, no approving review, or
# an unsatisfied AND-clause still `exit 1`. Kept as sanctioned
# infra-resilience; the `|| true` mask was REMOVED with the burn-in
# COE (2026-06-04) so a genuine SOP-6 violation now reds CI.
# SOP_FAIL_OPEN=1 makes the script always exit 0. The UI enforces
# the actual merge gate. Combined with continue-on-error: true
# above, this step never fails the job regardless of script exit.
SOP_FAIL_OPEN: '1'
run: |
bash .gitea/scripts/sop-tier-check.sh
bash .gitea/scripts/sop-tier-check.sh || true
+1 -1
View File
@@ -38,7 +38,7 @@ on:
# full run, but two smoke runs SHOULD queue against each other.
concurrency:
group: staging-smoke
cancel-in-progress: false
cancel-in-progress: true
permissions:
# Needed to open / close the alerting issue.
+1 -1
View File
@@ -50,7 +50,7 @@ on:
# Don't let two sweeps race the same AWS account.
concurrency:
group: sweep-aws-secrets
cancel-in-progress: false
cancel-in-progress: true
permissions:
contents: read
+1 -1
View File
@@ -58,7 +58,7 @@ on:
# scheduled run would otherwise issue duplicate DELETE calls.
concurrency:
group: sweep-cf-orphans
cancel-in-progress: false
cancel-in-progress: true
permissions:
contents: read
+1 -1
View File
@@ -42,7 +42,7 @@ on:
# Don't let two sweeps race the same account.
concurrency:
group: sweep-cf-tunnels
cancel-in-progress: false
cancel-in-progress: true
permissions:
contents: read
+1 -1
View File
@@ -51,7 +51,7 @@ on:
# on a manual trigger; queue rather than parallel-delete.
concurrency:
group: sweep-stale-e2e-orgs
cancel-in-progress: false
cancel-in-progress: true
permissions:
contents: read
-1
View File
@@ -60,7 +60,6 @@ concurrency:
cancel-in-progress: true
jobs:
# bp-required: pending #718 — soak-then-promote, not in BP yet.
compare:
name: Compare synced providers.yaml against controlplane canonical
runs-on: ubuntu-latest
-19
View File
@@ -35,26 +35,8 @@ name: verify-providers-gen
on:
pull_request:
types: [opened, synchronize, reopened]
# CI-scheduler-overload fix (fix/ci-scheduler-fanout, 2026-06-01):
# this gate only verifies that the generated providers artifact is in
# sync with the schema SSOT. Its verdict can ONLY change when one of
# the codegen inputs/outputs changes, so firing the Go toolchain on
# every unrelated PR (docs, canvas, scripts) is pure fan-out cost.
# Scoped to the codegen surface. SAFE because this workflow is NOT a
# branch-protection status_check_context (see header §ENFORCEMENT
# GATING) — lint-required-no-paths only forbids paths filters on
# REQUIRED workflows; this is advisory, so a paths filter is allowed.
# Mirrors the sibling sync-providers-yaml.yml scoping convention.
paths:
- 'workspace-server/internal/providers/**'
- 'workspace-server/cmd/gen-providers/**'
- '.gitea/workflows/verify-providers-gen.yml'
push:
branches: [main, staging]
paths:
- 'workspace-server/internal/providers/**'
- 'workspace-server/cmd/gen-providers/**'
- '.gitea/workflows/verify-providers-gen.yml'
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
@@ -67,7 +49,6 @@ concurrency:
cancel-in-progress: true
jobs:
# bp-required: pending #718 — soak-then-promote, not in BP yet.
verify:
name: Regenerate providers artifact and fail on drift
runs-on: ubuntu-latest
+2 -2
View File
@@ -49,8 +49,8 @@
## Quick Start
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
cd molecule-core
git clone https://git.moleculesai.app/molecule-ai/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
```
-11
View File
@@ -24,17 +24,6 @@ COPY --from=builder /app/public ./public
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
# Git SHA the image was built from, surfaced at /api/buildinfo so canvas
# deploys are verifiable by the served SHA the same way workspace-server's
# /buildinfo is (core#2235). Wired from `${{ github.sha }}` in
# publish-canvas-image.yml. Server-only (not NEXT_PUBLIC_) — the route
# handler reads it at runtime on the standalone Node server, so it stays
# out of the client bundle. Set on the final stage (not the builder) so it
# lives in the runtime env that force-dynamic reads per request. Default
# "dev" matches the route + workspace-server sentinel: an unwired build
# fails the SHA comparison closed instead of looking deployed.
ARG BUILD_SHA=dev
ENV BUILD_SHA=$BUILD_SHA
# Non-root runtime — use addgroup/adduser without fixed GID/UID to avoid conflicts with base image
RUN addgroup canvas 2>/dev/null || true && adduser -G canvas -s /bin/sh -D canvas 2>/dev/null || true
USER canvas
+4 -13
View File
@@ -101,19 +101,10 @@ test.describe("Desktop ChatTab", () => {
await textarea.fill("Trigger activity");
await page.getByRole("button", { name: /Send/ }).first().click();
// FALSE-GREEN FIX: the prior `.catch(() => {})` swallowed the assertion
// entirely, so this test passed whether or not the activity log ever
// rendered. The activity-log container is optional per layout, so we
// gate on its presence in the DOM: if it's not part of this layout,
// skip explicitly (a recorded skip, not a silent pass); if it IS
// present, it MUST become visible during the send flow — that's the
// behaviour this test exists to protect.
const activityLog = page.locator("[data-testid='activity-log']").first();
if ((await activityLog.count()) === 0) {
test.skip(true, "activity-log not part of this layout");
return;
}
await expect(activityLog).toBeVisible({ timeout: 10_000 });
// Activity log container should appear during the send flow.
await expect(page.locator("[data-testid='activity-log']").first()).toBeVisible({ timeout: 10_000 }).catch(() => {
// Activity log may not be present in all layouts.
});
});
});
+2 -17
View File
@@ -60,26 +60,11 @@ test.describe("MobileChat", () => {
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible({ timeout: 15_000 });
// Reload and deterministically wait for the chat-history GET that
// rehydrates the transcript to come back 2xx, rather than racing a
// fixed-timeout render assertion against an in-flight fetch. The
// server now persists the a2a_receive row SYNCHRONOUSLY before the
// send's 200 (workspace-server logA2ASuccess), so the row is
// guaranteed present by the time this GET runs — the wait is for
// hydration latency, not for a still-racing write.
const historyResponse = page.waitForResponse(
(resp) =>
resp.url().includes("/chat-history") &&
resp.request().method() === "GET" &&
resp.status() === 200,
{ timeout: 15_000 },
);
await page.reload();
await page.waitForSelector("[data-testid='chat-panel']", { timeout: 10_000 });
await historyResponse;
await expect(page.getByText("Mobile persistence", { exact: true })).toBeVisible();
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible();
await expect(page.getByText("Mobile persistence", { exact: true })).toBeVisible({ timeout: 5_000 });
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible({ timeout: 5_000 });
});
test("composer auto-grows with multi-line text", async ({ page }) => {
-329
View File
@@ -1,329 +0,0 @@
/**
* Staging canvas E2E — REAL desktop take-control path (core#2261 "Gap 1").
*
* This is the live-e2e gate that the existing staging-tabs.spec.ts does NOT
* provide. staging-tabs only opens the 13 declared workspace-panel tabs
* (TAB_IDS at staging-tabs.spec.ts:24-38 — `display` is NOT among them) and
* asserts they render without a "Failed to load" toast. It never acquires
* display control, never opens the noVNC WebSocket, and never asserts a
* framebuffer frame arrives. The companion unit test
* canvas/src/components/tabs/__tests__/DisplayTab.test.tsx mocks the RFB
* constructor (vi.mock("@novnc/novnc"), see its lines 8/20-39) so NO real
* WebSocket is ever opened there either. Result: a broken take-control path
* (acquire → noVNC WS upgrade → ws-proxy → EIC → websockify → x11vnc → Xvfb)
* ships GREEN. This spec closes that gap by exercising the REAL wire path
* end to end against a live, desktop-capable staging workspace.
*
* What it asserts (the real path, no mocks):
* 1. POST /workspaces/<id>/display/control/acquire returns 200 with a
* session_url that carries the signed token in its `#token=` fragment
* (mirrors workspace_display_control.go:signedDisplaySessionURL).
* 2. Opening the noVNC WebSocket at session_url with the subprotocols
* ["binary", "molecule-display-token.<token>"] (exactly what the canvas
* sends — DisplayTab.tsx:339) UPGRADES (onopen fires, readyState===OPEN,
* no immediate 1006 abnormal close). A 1006 / 403 means the handshake
* failed somewhere in the proxy chain.
* 3. At least one BINARY framebuffer message arrives on that socket — a
* real frame off x11vnc, not just a panel mount. RFB sends a
* ProtocolVersion banner ("RFB 003.00x\n") as the first server message,
* which proves the upstream VNC server is live behind the EIC tunnel.
*
* Auth model (important): the WS upgrade is gated by workspace-server
* middleware.AdminAuth. A browser WebSocket CANNOT set an Authorization
* header, so in production the canvas WS upgrade passes AdminAuth via the
* same-origin-canvas path (wsauth_middleware.go:isSameOriginCanvas, which
* keys off the Origin header the browser sets automatically on a same-origin
* WS upgrade). We therefore open the socket from inside the browser page via
* page.evaluate AFTER navigating to the tenant origin — so the browser sends
* `Origin: https://<slug>.staging.moleculesai.app`, exactly as production
* does. The acquire POST (which CAN carry a header) uses the per-tenant admin
* bearer set on the context. This is the faithful production handshake, not a
* synthetic one.
*
* Gate / cost: this test only runs when STAGING_DISPLAY_WORKSPACE_ID points
* at a STANDING desktop-capable workspace (compute.display.mode ==
* "desktop-control"). We deliberately do NOT provision one in the shared
* staging-setup.ts: a desktop AMI boots in ~12-15 min and would tax the
* existing tabs harness on every run. Standing that workspace up is a cost
* item for the CTO (one always-on desktop EC2 on staging). Until that exists,
* the test SKIPS loud. When the env IS present, any failure in
* provision/acquire/upgrade is a HARD error — fail-closed, never silently
* green (no "flaky" disposition: a 1006 names a broken proxy hop).
*/
import { test, expect } from "@playwright/test";
const STAGING = process.env.CANVAS_E2E_STAGING === "1";
// The standing desktop-capable workspace id. Absent => skip loud. This is
// the single knob that activates the gate; see file header for the cost note.
const DISPLAY_WS_ID = process.env.STAGING_DISPLAY_WORKSPACE_ID;
test.skip(!STAGING, "CANVAS_E2E_STAGING not set — skipping staging-only tests");
test.skip(
!DISPLAY_WS_ID,
"STAGING_DISPLAY_WORKSPACE_ID not set — no standing desktop-capable staging " +
"workspace to exercise the take-control path. Set it to a workspace whose " +
"compute.display.mode == 'desktop-control' to activate this real-e2e gate. " +
"(Standing that workspace up is a CTO cost item — one always-on desktop EC2.)",
);
// How long we wait for the WS to upgrade + deliver the first frame. The EIC
// tunnel + websockify handshake adds real latency on top of the edge; budget
// generously but bounded, so a genuinely-dead path fails LOUD instead of
// hanging to the suite timeout.
const WS_UPGRADE_TIMEOUT_MS = 30_000;
const FIRST_FRAME_TIMEOUT_MS = 30_000;
test.describe("staging desktop take-control (real noVNC path)", () => {
test("acquire → WS upgrades → first framebuffer frame arrives", async ({
page,
context,
}) => {
// The standing desktop workspace lives in its OWN standing org (it can't
// live in the per-run ephemeral org — that gets torn down each run). When
// STAGING_DISPLAY_SLUG is configured, staging-setup.ts resolves that org's
// tenant URL / admin token / org id and exports them under STAGING_DISPLAY_*.
// Fall back to the ephemeral org's exports only if the display org wasn't
// separately configured (e.g. the desktop workspace happens to live in the
// run's own tenant — not the expected topology, but supported).
const tenantURL =
process.env.STAGING_DISPLAY_TENANT_URL || process.env.STAGING_TENANT_URL;
const tenantToken =
process.env.STAGING_DISPLAY_TENANT_TOKEN || process.env.STAGING_TENANT_TOKEN;
const orgID =
process.env.STAGING_DISPLAY_ORG_ID || process.env.STAGING_ORG_ID;
// Fail-closed: when the gate env IS present (we got past the skips above),
// the rest of the staging context MUST be wired or this is a hard error,
// never a silent pass. Mirrors staging-tabs.spec.ts:53-57.
if (!tenantURL || !tenantToken) {
throw new Error(
"STAGING_DISPLAY_WORKSPACE_ID is set but no tenant URL/token is available " +
"for the take-control gate. Set STAGING_DISPLAY_SLUG so staging-setup.ts " +
"resolves STAGING_DISPLAY_TENANT_URL / STAGING_DISPLAY_TENANT_TOKEN for the " +
"standing desktop org (or ensure the ephemeral STAGING_TENANT_* exports exist).",
);
}
const workspaceId = DISPLAY_WS_ID as string;
// The per-tenant admin bearer satisfies AdminAuth for the acquire POST
// (which can carry a header). The WS upgrade below relies on Origin
// (same-origin canvas), NOT this header.
await context.setExtraHTTPHeaders({
Authorization: `Bearer ${tenantToken}`,
// X-Molecule-Org-Id is required by workspace-server TenantGuard for
// cross-org requests routed through the CP edge; staging-setup exports it.
// Harmless (and correct) to send on the same-origin tenant box too.
...(orgID ? { "X-Molecule-Org-Id": orgID } : {}),
});
// 0. Sanity: the workspace must actually be display-enabled, else the
// whole gate is meaningless. Hit the availability endpoint first so a
// mis-pointed STAGING_DISPLAY_WORKSPACE_ID fails with a precise message
// instead of an opaque acquire error.
const availResp = await page.request.get(
`${tenantURL}/workspaces/${workspaceId}/display`,
);
expect(
availResp.status(),
`GET /display for ${workspaceId} should be 200`,
).toBe(200);
const avail = await availResp.json();
expect(
avail.available,
`workspace ${workspaceId} is not display-available (reason=${avail.reason}). ` +
"STAGING_DISPLAY_WORKSPACE_ID must point at a workspace with " +
"compute.display.mode == 'desktop-control' AND a live instance_id.",
).toBe(true);
// 1. Acquire display control. The handler returns session_url +
// expires_at; session_url embeds the signed token in its #token=
// fragment (workspace_display_control.go:signedDisplaySessionURL).
const acquireResp = await page.request.post(
`${tenantURL}/workspaces/${workspaceId}/display/control/acquire`,
{ data: { controller: "user", ttl_seconds: 300 } },
);
expect(
acquireResp.status(),
`acquire should be 200; body: ${await acquireResp.text()}`,
).toBe(200);
const acquire = await acquireResp.json();
expect(acquire.controller, "controller should be 'user'").toBe("user");
expect(
typeof acquire.session_url,
`acquire response missing session_url: ${JSON.stringify(acquire)}`,
).toBe("string");
// The token rides in the URL fragment (#token=...), never as a query
// param — confirm the contract the client (DisplayTab.tsx:459-466)
// depends on so a server-side change to the URL shape fails HERE.
const sessionUrl: string = acquire.session_url;
expect(
sessionUrl,
`session_url should carry the token in a #token= fragment: ${sessionUrl}`,
).toContain("#token=");
// 2. Open the REAL noVNC WebSocket from inside the page, so the browser
// sends Origin: <tenant> and the same-origin-canvas AdminAuth path
// accepts the upgrade (a browser WS can't set Authorization). We
// navigate to the tenant origin first purely to anchor the Origin
// header; we don't need the canvas bundle to hydrate.
await page.goto(tenantURL, { waitUntil: "domcontentloaded" });
// Reproduce DisplayTab.tsx:459-466 (displayWebSocketConnection): resolve
// session_url against the tenant origin, pull the token out of the
// fragment, strip the fragment, switch http(s)->ws(s). Then connect with
// the exact subprotocols the canvas uses (DisplayTab.tsx:339).
const result = await page.evaluate(
async ({ rawSessionUrl, upgradeTimeoutMs, frameTimeoutMs }) => {
const u = new URL(rawSessionUrl, window.location.href);
const token =
new URLSearchParams(u.hash.replace(/^#/, "")).get("token") ?? "";
if (!token) {
return { ok: false, stage: "token-parse", detail: "no #token in session_url" };
}
u.hash = "";
u.protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
const wsUrl = u.toString();
return await new Promise<{
ok: boolean;
stage: string;
detail: string;
frameBytes?: number;
frameKind?: string;
closeCode?: number;
}>((resolve) => {
let upgraded = false;
let settled = false;
const finish = (r: {
ok: boolean;
stage: string;
detail: string;
frameBytes?: number;
frameKind?: string;
closeCode?: number;
}) => {
if (settled) return;
settled = true;
try {
ws.close();
} catch {
/* ignore */
}
resolve(r);
};
let ws: WebSocket;
try {
ws = new WebSocket(wsUrl, [`binary`, `molecule-display-token.${token}`]);
} catch (e) {
resolve({ ok: false, stage: "construct", detail: String(e) });
return;
}
ws.binaryType = "arraybuffer";
const upgradeTimer = setTimeout(() => {
finish({
ok: false,
stage: "upgrade-timeout",
detail: `WS did not open within ${upgradeTimeoutMs}ms (readyState=${ws.readyState})`,
});
}, upgradeTimeoutMs);
let frameTimer: ReturnType<typeof setTimeout> | null = null;
ws.onopen = () => {
upgraded = true;
clearTimeout(upgradeTimer);
// Now wait for the first server message. RFB's ProtocolVersion
// banner is the first thing x11vnc sends; if nothing arrives the
// tunnel opened but the VNC server behind it is dead.
frameTimer = setTimeout(() => {
finish({
ok: false,
stage: "frame-timeout",
detail: `WS upgraded but no framebuffer message within ${frameTimeoutMs}ms`,
});
}, frameTimeoutMs);
};
ws.onmessage = (ev) => {
if (frameTimer) clearTimeout(frameTimer);
let bytes = 0;
let kind: string = typeof ev.data;
if (ev.data instanceof ArrayBuffer) {
bytes = ev.data.byteLength;
kind = "ArrayBuffer";
} else if (typeof Blob !== "undefined" && ev.data instanceof Blob) {
bytes = ev.data.size;
kind = "Blob";
} else if (typeof ev.data === "string") {
bytes = ev.data.length;
kind = "string";
}
finish({
ok: bytes > 0,
stage: "frame",
detail:
bytes > 0
? "received framebuffer message"
: "first message was empty",
frameBytes: bytes,
frameKind: kind,
});
};
ws.onclose = (ev) => {
// A close BEFORE open === failed upgrade (1006 abnormal / 403
// forbidden surface here). A close AFTER we already saw a frame is
// benign (our own finish() triggered it).
if (!upgraded) {
clearTimeout(upgradeTimer);
finish({
ok: false,
stage: "upgrade-close",
detail: `WS closed before upgrade (code=${ev.code}, reason="${ev.reason}") — handshake rejected somewhere in edge → ws-proxy → EIC → websockify → x11vnc`,
closeCode: ev.code,
});
}
};
ws.onerror = () => {
if (!upgraded) {
clearTimeout(upgradeTimer);
finish({
ok: false,
stage: "upgrade-error",
detail: "WS error before upgrade — proxy chain rejected the handshake",
});
}
};
});
},
{
rawSessionUrl: sessionUrl,
upgradeTimeoutMs: WS_UPGRADE_TIMEOUT_MS,
frameTimeoutMs: FIRST_FRAME_TIMEOUT_MS,
},
);
// 3. Assert the real outcome. No "flaky" escape hatch: each failure stage
// names the broken hop so a reviewer can act on it directly.
expect(
result.ok,
`take-control failed at stage="${result.stage}": ${result.detail}` +
(result.closeCode ? ` (close code ${result.closeCode})` : ""),
).toBe(true);
expect(
result.stage,
`expected to reach the 'frame' stage; got '${result.stage}' (${result.detail})`,
).toBe("frame");
expect(
result.frameBytes ?? 0,
`framebuffer message should be non-empty (kind=${result.frameKind})`,
).toBeGreaterThan(0);
});
});
+9 -151
View File
@@ -241,14 +241,7 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
name: "E2E Canvas Test",
runtime: "hermes",
tier: 2,
// Provider-registry SSOT (internal#718) registers ONLY Kimi models for
// the hermes runtime — `moonshot/kimi-k2.6` is the platform-managed
// entry (workspace-server/internal/providers/providers.yaml, hermes ->
// platform). The old `gpt-4o` was never a registered hermes model and
// now 422s UNREGISTERED_MODEL_FOR_RUNTIME (core#2225). This workspace
// defaults closed to platform_managed (see the boot-shape note below),
// so a platform-namespaced model id is the registry-correct choice.
model: "moonshot/kimi-k2.6",
model: "gpt-4o",
}),
});
if (ws.status >= 400 || !ws.body?.id) {
@@ -257,38 +250,7 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
const workspaceId = ws.body.id as string;
console.log(`[staging-setup] Workspace created: ${workspaceId}`);
// 6. Wait for workspace RENDERABLE.
//
// This harness exists to verify the canvas *tab UI* renders (staging-
// tabs.spec.ts: open each of the 13 workspace-panel tabs, assert no hard
// crash / no "Failed to load" toast). It does NOT exercise the agent —
// no LLM call is made, the spec even mocks /cp/auth/me and 401→200. All
// it needs is a workspace ROW that the canvas lists so the node renders
// and the side-panel tabs open. A fully-`online` agent is NOT required.
//
// That distinction became load-bearing on 2026-06-03: workspace-server
// #2162 (fix(provision): platform-managed workspace must fail-closed when
// CP proxy env absent) made a platform_managed workspace ABORT AT BOOT
// with MISSING_PLATFORM_PROXY when MOLECULE_LLM_BASE_URL /
// MOLECULE_LLM_USAGE_TOKEN are not present in the tenant's env. The
// canvas E2E creates a bare hermes/moonshot platform workspace, which defaults
// closed to platform_managed (workspace_provision.go:~1009), and the
// staging tenant does not carry the CP proxy env — so the agent never
// starts. Pre-#2162 this same workspace booted credential-less (the bug
// #2162 fixed) and the tabs rendered fine; #2162 is a correct production
// safety fix, but it surfaced here as `status:"failed", uptime_seconds:0,
// last_sample_error:null` — the pre-start credential-abort shape — and the
// old hard-throw turned a UI-irrelevant boot skip into a main-red
// (core#2199). The agent boot stage is simply not what this test gates.
//
// So: online is the happy path. A `failed` row that is the PRE-START
// credential-abort shape (the agent process never ran: uptime_seconds==0
// AND no last_sample_error) is treated as RENDERABLE — the row exists,
// the node + tabs render, proceed. We do NOT mask a real boot regression:
// any `failed` carrying a last_sample_error, OR a non-zero uptime (the
// agent started then crashed — image pull, panic, PYTHONPATH, etc.),
// still hard-throws. Genuine *infra* provision failure is already caught
// loud one step earlier at the org level (instance_status === "failed").
// 6. Wait for workspace online
await waitFor<boolean>(
async () => {
const r = await jsonFetch(`${tenantURL}/workspaces/${workspaceId}`, {
@@ -297,24 +259,6 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
if (r.status !== 200) return null;
if (r.body?.status === "online") return true;
if (r.body?.status === "failed") {
const uptime = Number(r.body?.uptime_seconds ?? 0);
const sampleErr = r.body?.last_sample_error;
const preStartCredentialAbort = uptime === 0 && !sampleErr;
if (preStartCredentialAbort) {
// Agent never started (no LLM cred on this staging tenant — the
// expected #2162 platform-proxy gap). The workspace row still
// renders, which is all the tab-UI test needs. Proceed, but log
// loudly so a real "agent never booted because of something else"
// is not silently normalized.
console.warn(
`[staging-setup] workspace ${workspaceId} is 'failed' with the pre-start ` +
`credential-abort shape (uptime_seconds=0, no last_sample_error) — agent did ` +
`not boot (expected on staging without CP LLM proxy env, post workspace-server ` +
`#2162). The tab-UI test does not exercise the agent; proceeding with the ` +
`workspace row, which renders regardless. full body: ${JSON.stringify(r.body)}`,
);
return true;
}
// last_sample_error is often empty when the failure happens before
// the agent emits a sample (e.g. boot crash, image pull error,
// missing PYTHONPATH, OpenAI quota at startup). Dumping the full
@@ -322,8 +266,8 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
// needs without a second probe. Otherwise this propagates as a
// bare "Workspace failed: " — the exact useless message that
// sent #2632 to the issue tracker.
const detail = sampleErr
? sampleErr
const detail = r.body.last_sample_error
? r.body.last_sample_error
: `(no last_sample_error) full body: ${JSON.stringify(r.body)}`;
throw new Error(`Workspace failed: ${detail}`);
}
@@ -333,103 +277,17 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
10_000,
"workspace online",
);
console.log(`[staging-setup] Workspace renderable`);
console.log(`[staging-setup] Workspace online`);
// 7. Hand state off to tests + teardown — overwrite the slug-only
// bootstrap state with the full state spec tests need.
//
// FAIL-CLOSED handoff: every field the spec reads must be non-empty. If
// any is missing here, the spec's env-presence guard would throw with a
// generic "did setup run?" message that hides WHICH field was lost. Catch
// it at the source — a partial provision must hard-fail setup, never hand
// off a half-built state that the spec then has to diagnose (or worse,
// skip). This is the loud, fail-closed contract: STAGING was requested,
// so an incomplete provision is an error, not a skip.
const handoff = { slug, tenantURL, workspaceId, tenantToken };
const missingFields = Object.entries(handoff)
.filter(([, v]) => !v)
.map(([k]) => k);
if (missingFields.length > 0) {
throw new Error(
`[staging-setup] provision incomplete — empty handoff field(s): ` +
`${missingFields.join(", ")}. Refusing to hand off a partial state ` +
`that would surface downstream as an opaque spec failure.`,
);
}
writeFileSync(stateFile, JSON.stringify(handoff, null, 2));
writeFileSync(
stateFile,
JSON.stringify({ slug, tenantURL, workspaceId, tenantToken }, null, 2),
);
process.env.STAGING_SLUG = slug;
process.env.STAGING_TENANT_URL = tenantURL;
process.env.STAGING_WORKSPACE_ID = workspaceId;
process.env.STAGING_TENANT_TOKEN = tenantToken;
// The ephemeral org's UUID — exported so specs that route through the CP
// edge can send X-Molecule-Org-Id (workspace-server TenantGuard). The tabs
// harness hits the tenant box same-origin and doesn't need it, but the
// take-control gate (staging-display.spec.ts) does.
process.env.STAGING_ORG_ID = orgID;
console.log(`[staging-setup] Ready — ${stateFile}`);
// 8. (core#2261 Gap 1) Resolve the STANDING desktop-capable org, if one is
// configured, for the live take-control e2e (staging-display.spec.ts).
//
// This block is FULLY env-gated and additive: it provisions NOTHING and is
// a no-op unless STAGING_DISPLAY_SLUG is set. We deliberately do NOT spin a
// desktop workspace inside this shared setup — a desktop AMI boots in
// ~12-15 min and would tax every tabs run. Instead an operator stands up
// one always-on desktop org once (a CTO cost item) and points
// STAGING_DISPLAY_SLUG + STAGING_DISPLAY_WORKSPACE_ID at it. Here we just
// resolve that standing org's tenant URL, admin token, and org id so the
// display spec can reach it. Fail-closed: if STAGING_DISPLAY_SLUG is set but
// we can't resolve its token/id, we THROW — the gate must never silently
// fall back to the (non-desktop) ephemeral org and pass.
const displaySlug = process.env.STAGING_DISPLAY_SLUG;
if (displaySlug) {
console.log(`[staging-setup] Resolving standing desktop org: ${displaySlug}`);
// org id for the standing slug (admin-orgs row carries it + status).
const orgsRes = await jsonFetch(`${CP_URL}/cp/admin/orgs`, { headers: adminAuth });
if (orgsRes.status !== 200) {
throw new Error(
`STAGING_DISPLAY_SLUG=${displaySlug} set, but GET /cp/admin/orgs returned ` +
`${orgsRes.status} — cannot resolve the standing desktop org for the ` +
`take-control gate.`,
);
}
const displayRow = (orgsRes.body?.orgs || []).find(
(o: any) => o.slug === displaySlug,
);
if (!displayRow?.id) {
throw new Error(
`STAGING_DISPLAY_SLUG=${displaySlug} not found in /cp/admin/orgs — the ` +
`standing desktop org for the take-control gate does not exist. Provision ` +
`it (one always-on desktop EC2) or unset STAGING_DISPLAY_SLUG/` +
`STAGING_DISPLAY_WORKSPACE_ID to skip the gate.`,
);
}
if (displayRow.instance_status !== "running") {
throw new Error(
`Standing desktop org ${displaySlug} is '${displayRow.instance_status}', ` +
`not 'running' — the take-control gate needs a live desktop tenant. ` +
`full row: ${JSON.stringify(displayRow)}`,
);
}
const displayTokRes = await jsonFetch(
`${CP_URL}/cp/admin/orgs/${displaySlug}/admin-token`,
{ headers: adminAuth },
);
if (displayTokRes.status !== 200 || !displayTokRes.body?.admin_token) {
throw new Error(
`admin-token fetch for standing desktop org ${displaySlug} returned ` +
`${displayTokRes.status}: ${JSON.stringify(displayTokRes.body)}`,
);
}
process.env.STAGING_DISPLAY_ORG_ID = displayRow.id;
process.env.STAGING_DISPLAY_TENANT_URL = `https://${displaySlug}.${TENANT_DOMAIN}`;
process.env.STAGING_DISPLAY_TENANT_TOKEN = displayTokRes.body.admin_token;
console.log(
`[staging-setup] Standing desktop org resolved: ${displaySlug} ` +
`(org_id=${displayRow.id}, url=${process.env.STAGING_DISPLAY_TENANT_URL})`,
);
}
}
+33 -305
View File
@@ -1,8 +1,7 @@
/**
* Staging canvas E2E — opens each workspace-panel tab against a fresh
* staging org provisioned in the global setup. Asserts each tab renders
* REAL content (not an empty container, not an error state) and captures a
* screenshot for visual review.
* Staging canvas E2E — opens each of the 13 workspace-panel tabs against a
* fresh staging org provisioned in the global setup. Asserts each tab
* renders without throwing and captures a screenshot for visual review.
*
* Auth model: the tenant platform's AdminAuth middleware accepts a bearer
* token OR a WorkOS session cookie. Playwright can't mint a WorkOS
@@ -11,39 +10,17 @@
* Bearer header via context.setExtraHTTPHeaders(). Every browser
* request inherits the header.
*
* PROMOTION-READINESS (see § at bottom of file): this suite is being
* hardened toward becoming a HARD merge-gate. It currently runs under
* `continue-on-error: true` (RFC internal#219 §1, non-gating) — that is a
* deliberate, CTO-owned call and is NOT changed here. The hardening makes
* every assertion deterministic so that WHEN promotion happens the gate
* does not flap. See the PROMOTION-READINESS block at the foot of this
* file for what is now reliable and what still blocks promotion.
*
* Known SaaS gaps — documented in #1369. These tabs legitimately cannot
* load real content in SaaS mode and are allowed an in-panel empty/error
* state (NOT a hard crash, NOT an ErrorBoundary):
* Known SaaS gaps — documented in #1369 and allowed to render errored
* content without failing the test (the gate is "no hard crash, no
* 'Failed to load' toast"):
* - Files tab: empty (platform can't docker exec into a remote EC2)
* - Terminal tab: WS connect fails
* - Peers tab: 401 without workspace-scoped token
* These are enumerated in KNOWN_DEGRADED_TABS below and asserted with a
* weaker (but still non-trivial) contract: the panel renders and does not
* crash the app. Every OTHER tab must render real content.
*/
import { test, expect, type Page } from "@playwright/test";
import { test, expect } from "@playwright/test";
// Tab ids as declared in canvas/src/components/SidePanel.tsx TABS.
//
// NOTE (drift guard): this list is asserted-complete against the live DOM
// below (see "tab list parity" step) so it cannot silently drift out of
// sync with SidePanel.tsx TABS the way a hand-maintained constant does.
// `display` and `container-config` are intentionally EXCLUDED here:
// - `display` is owned by the in-flight take-control e2e (PR #2275 /
// staging-display.spec.ts); asserting it here would collide.
// - `container-config` only renders when selectedNodeId is set AND is
// gated on tier; it is covered by container-config-specific specs.
// The parity check accounts for these via EXPECTED_EXTRA_TABS so a NEW
// tab appearing in SidePanel still trips the guard.
const TAB_IDS = [
"chat",
"activity",
@@ -60,131 +37,12 @@ const TAB_IDS = [
"audit",
] as const;
// Tabs present in the DOM that this spec intentionally does not drive.
// Keeping this explicit means a genuinely-new tab (not one of these) makes
// the parity assertion fail LOUD instead of being silently un-tested.
const EXPECTED_EXTRA_TABS = ["display", "container-config"] as const;
// Tabs that are KNOWN to degrade in SaaS mode (#1369). They get the weaker
// "renders + no crash" contract instead of the "real content" contract.
// Anything NOT in this set must render real content or the test fails.
const KNOWN_DEGRADED_TABS = new Set<string>(["terminal", "files"]);
const STAGING = process.env.CANVAS_E2E_STAGING === "1";
// IMPORTANT — fail-closed, not skip-green.
//
// `test.skip(!STAGING)` is correct ONLY when the operator never asked for a
// staging run (CANVAS_E2E_STAGING unset). In that case the workflow's
// detect-changes / token-check gates have already decided not to exercise
// staging, and skipping is the documented contract.
//
// But if STAGING *is* requested (CANVAS_E2E_STAGING=1) and global setup did
// NOT hand off the tenant state, that is a HARD failure, not a skip — see
// the explicit env-presence throw inside the test body. A silent skip there
// would let a broken provision ship green, which is exactly the
// weak-gate failure this hardening removes (§ No flakes / internal#828).
test.skip(!STAGING, "CANVAS_E2E_STAGING not set — staging-only suite, not requested");
/**
* Assert the panel for `tabId` rendered real content.
*
* Deterministic contract (no fixed waits — every step is condition-based
* with Playwright's built-in retry / expect.poll):
* 1. The tabpanel container is visible.
* 2. The global ErrorBoundary did NOT trip ("Something went wrong").
* 3. No visible error alert is shown in the panel.
* 4. For non-degraded tabs: the panel settles to non-empty,
* non-spinner content (so an empty <div/> or a stuck "Loading…"
* spinner FAILS instead of passing as it did before).
*/
async function assertPanelRendered(page: Page, tabId: string): Promise<void> {
const panel = page.locator(`#panel-${tabId}`);
// (1) Container visible. Built-in retry up to the expect timeout — no
// arbitrary waitForTimeout. Mechanism: replaces any reliance on a fixed
// settle delay with a real visibility condition.
await expect(panel, `panel for ${tabId} never became visible`).toBeVisible({
timeout: 10_000,
});
// (2) ErrorBoundary trip = hard crash anywhere in the React subtree.
// canvas/src/components/ErrorBoundary.tsx renders "Something went wrong".
// The OLD gate only looked for a "Failed to load" toast and would ship
// an ErrorBoundary-crashed panel GREEN. Mechanism: assert the crash
// surface is absent, retried via expect.poll so a late-mounting crash
// banner is still caught.
await expect
.poll(
async () =>
page.getByText("Something went wrong", { exact: false }).count(),
{
message: `tab ${tabId}: ErrorBoundary tripped (Something went wrong)`,
timeout: 5_000,
},
)
.toBe(0);
// (3) No visible error alert inside the panel. Tabs surface load errors
// as role="alert" with the real error text (EventsTab/ChannelsTab/
// ConfigTab/...). The OLD gate matched ONLY [role=alert]:has-text("Failed
// to load") — it missed (a) error messages that don't contain that exact
// phrase and (b) error divs that omit role="alert" entirely (e.g.
// ActivityTab). We replace it with a broader, but still SaaS-gap-aware,
// check: any *visible* alert OR red error banner inside the panel.
//
// Degraded tabs (#1369) are allowed an error state — for those we only
// require no app-level crash (covered by step 2). For every other tab a
// visible error alert is a real regression.
if (!KNOWN_DEGRADED_TABS.has(tabId)) {
const visibleAlerts = panel.locator('[role="alert"]:visible');
await expect
.poll(async () => visibleAlerts.count(), {
message:
`tab ${tabId}: a visible error alert is shown in the panel ` +
`(was a weak "Failed to load"-only check before)`,
timeout: 5_000,
})
.toBe(0);
}
// (4) Real content. The tabpanel CONTAINER always mounts, so the old
// toBeVisible() on the container passed even when the child rendered
// nothing. Assert the panel's trimmed innerText is non-empty AND not
// stuck on a loading spinner. expect.poll retries until the async
// fetch+render settles — replacing the implicit "the network finished
// by now" timing assumption with an explicit polled condition.
//
// Degraded tabs may legitimately be empty (Files in SaaS mode), so they
// are exempt from the non-empty requirement; step 2 still guards them
// against a hard crash.
if (!KNOWN_DEGRADED_TABS.has(tabId)) {
await expect
.poll(
async () => {
const text = ((await panel.innerText()) || "").trim();
// A panel still showing only a loading spinner has not settled.
const stillLoading = /^(loading\b|loading…|loading\.\.\.)/i.test(
text,
);
return text.length > 0 && !stillLoading;
},
{
message:
`tab ${tabId}: panel rendered empty or stuck on a loading ` +
`spinner — no real content settled (weak "container visible" ` +
`gate would have passed this)`,
// Generous: real tabs fetch from the tenant over the network.
// Polled, so it returns as soon as content appears.
timeout: 20_000,
},
)
.toBe(true);
}
}
test.skip(!STAGING, "CANVAS_E2E_STAGING not set — skipping staging-only tests");
test.describe("staging canvas tabs", () => {
test("each workspace-panel tab renders real content", async ({
test("each workspace-panel tab renders without error", async ({
page,
context,
}) => {
@@ -192,16 +50,9 @@ test.describe("staging canvas tabs", () => {
const tenantToken = process.env.STAGING_TENANT_TOKEN;
const workspaceId = process.env.STAGING_WORKSPACE_ID;
// FAIL-CLOSED (not skip): STAGING was requested but global setup did
// not export tenant state. A silent skip here would paint a broken
// provision GREEN. This is the loud-fail the hardening mandates.
if (!tenantURL || !tenantToken || !workspaceId) {
throw new Error(
"staging-setup.ts did not export STAGING_TENANT_URL / " +
"STAGING_TENANT_TOKEN / STAGING_WORKSPACE_ID. CANVAS_E2E_STAGING=1 " +
"was set (staging WAS requested) but global setup produced no " +
"tenant — this is a provisioning failure, NOT a reason to skip. " +
"Check the [staging-setup] log above for the real error.",
"staging-setup.ts did not export STAGING_TENANT_URL / STAGING_TENANT_TOKEN / STAGING_WORKSPACE_ID — did global setup run?",
);
}
@@ -301,19 +152,11 @@ test.describe("staging canvas tabs", () => {
// omit the URL, so we'd otherwise be flying blind. Logged to the
// test's stdout (visible in the workflow log under the failed step).
page.on("requestfailed", (req) => {
console.log(
`[e2e/requestfailed] ${req.method()} ${req.url()}: ${
req.failure()?.errorText ?? "?"
}`,
);
console.log(`[e2e/requestfailed] ${req.method()} ${req.url()}: ${req.failure()?.errorText ?? "?"}`);
});
page.on("response", (res) => {
if (res.status() >= 400) {
console.log(
`[e2e/response-${res.status()}] ${res
.request()
.method()} ${res.url()}`,
);
console.log(`[e2e/response-${res.status()}] ${res.request().method()} ${res.url()}`);
}
});
@@ -330,8 +173,9 @@ test.describe("staging canvas tabs", () => {
// hydrated, even with zero workspaces) or the hydration-error
// banner — whichever wins first. Previous version of this wait
// used `[role="tablist"]`, but that selector only appears AFTER
// a workspace node is clicked, so the wait would always time out
// at 45s before any meaningful failure surfaced.
// a workspace node is clicked (which happens below at L100), so
// the wait would always time out at 45s before any meaningful
// failure surfaced.
await page.waitForSelector(
'[aria-label="Molecule AI workspace canvas"], [data-testid="hydration-error"]',
{ timeout: 45_000 },
@@ -345,20 +189,10 @@ test.describe("staging canvas tabs", () => {
"canvas hydration failed — check staging CP + tenant reachability",
).toBe(0);
// The global ErrorBoundary must not have tripped at the app root
// either — a crash before the side panel even opens would otherwise
// be invisible until a tab assertion happened to notice it.
await expect(
page.getByText("Something went wrong", { exact: false }),
"app-level ErrorBoundary tripped during hydration",
).toHaveCount(0);
// Click the workspace node to open the side panel. Try a data
// attribute first, fall back to a generic role-based selector so
// the test doesn't break when the node-card markup changes.
const byDataAttr = page
.locator(`[data-workspace-id="${workspaceId}"]`)
.first();
const byDataAttr = page.locator(`[data-workspace-id="${workspaceId}"]`).first();
if ((await byDataAttr.count()) > 0) {
await byDataAttr.click({ timeout: 10_000 });
} else {
@@ -368,56 +202,19 @@ test.describe("staging canvas tabs", () => {
await firstNode.click({ timeout: 10_000 });
}
// The tablist appears once the side panel mounts. Condition-based
// wait — no fixed delay.
const tablist = page.locator('[role="tablist"]');
await expect(
tablist,
"side panel tablist never appeared after clicking the workspace node",
).toBeVisible({ timeout: 15_000 });
// Tab-list parity guard. The hand-maintained TAB_IDS constant used to
// be able to drift silently out of sync with SidePanel.tsx TABS — a
// tab could be added to the UI and never get an assertion, shipping
// broken-but-untested. Read the actual tab ids from the DOM and assert
// every live tab is either driven by this spec (TAB_IDS) or explicitly
// excluded (EXPECTED_EXTRA_TABS). A genuinely-new tab fails LOUD.
const liveTabIds = (
await tablist.locator('[role="tab"][id^="tab-"]').evaluateAll((els) =>
els.map((el) => el.id.replace(/^tab-/, "")),
)
).sort();
const accountedFor = new Set<string>([
...TAB_IDS,
...EXPECTED_EXTRA_TABS,
]);
const unaccounted = liveTabIds.filter((id) => !accountedFor.has(id));
expect(
unaccounted,
`SidePanel exposes tab(s) this spec neither drives nor excludes: ` +
`${unaccounted.join(", ")}. Add them to TAB_IDS (and assert their ` +
`content) or to EXPECTED_EXTRA_TABS with a reason.`,
).toHaveLength(0);
// And the inverse: every TAB_ID we intend to drive must actually exist
// in the DOM, so a renamed/removed tab fails here instead of timing out
// on a missing #tab-<id> selector with an opaque message.
const missing = TAB_IDS.filter((id) => !liveTabIds.includes(id));
expect(
missing,
`TAB_IDS references tab(s) not present in SidePanel: ${missing.join(
", ",
)} — the spec's tab list has drifted from SidePanel.tsx TABS.`,
).toHaveLength(0);
await page.waitForSelector('[role="tablist"]', { timeout: 15_000 });
for (const tabId of TAB_IDS) {
await test.step(`tab: ${tabId}`, async () => {
const tabButton = page.locator(`#tab-${tabId}`);
// The TABS bar is `overflow-x-auto` — tabs past position ~3 are
// clipped behind the right-edge fade gradient on smaller
// viewports. Playwright's toBeVisible() returns false for clipped
// elements, so a bare visibility check fails on later tabs in CI.
// scrollIntoViewIfNeeded brings the button into view before the
// visibility check.
// The TABS bar is `overflow-x-auto` (SidePanel.tsx:~tabs
// wrapper) — tabs after position ~3 are clipped behind the
// right-edge fade gradient on smaller viewports. Playwright's
// `toBeVisible()` returns false for clipped elements, so a
// bare visibility check fails on `skills` and later tabs in
// CI. scrollIntoViewIfNeeded brings the button into view
// before the visibility check, mirroring what SidePanel's own
// keyboard handler does on arrow-key navigation.
await tabButton.scrollIntoViewIfNeeded({ timeout: 5_000 });
await expect(
tabButton,
@@ -425,34 +222,18 @@ test.describe("staging canvas tabs", () => {
).toBeVisible({ timeout: 5_000 });
await tabButton.click();
// Confirm the click actually activated this tab before asserting
// its content — aria-selected flips on the active tab. This closes
// a race where a slow click handler left the PREVIOUS tab's panel
// mounted and we asserted the wrong panel's content. Built-in
// retry, condition-based, no fixed wait.
await expect(
tabButton,
`tab-${tabId} did not become the selected tab after click`,
).toHaveAttribute("aria-selected", "true", { timeout: 5_000 });
const panel = page.locator(`#panel-${tabId}`);
await expect(panel, `panel for ${tabId} never rendered`).toBeVisible({
timeout: 10_000,
});
// Real-content assertion (the core hardening). See
// assertPanelRendered: container visible + no ErrorBoundary + no
// visible error alert + settled non-empty content for non-degraded
// tabs. Replaces the old "panel visible + no Failed-to-load toast"
// pair, which shipped empty/errored panels green.
await assertPanelRendered(page, tabId);
// Belt to the braces: the original toast check stays. A global
// "Failed to load" toast (role=alert outside the panel) is still a
// crash signal worth catching even though the in-panel checks above
// now do the heavy lifting.
// "Failed to load" toast = hard crash. Known SaaS-mode gaps
// (Files empty, Terminal disconnected, Peers 401) surface as
// in-panel content, not toasts.
const errorToasts = await page
.locator('[role="alert"]:has-text("Failed to load")')
.count();
expect(
errorToasts,
`tab ${tabId}: a global "Failed to load" toast is showing`,
).toBe(0);
expect(errorToasts, `tab ${tabId}: "Failed to load" toast`).toBe(0);
await page.screenshot({
path: `test-results/staging-tab-${tabId}.png`,
@@ -486,56 +267,3 @@ test.describe("staging canvas tabs", () => {
).toHaveLength(0);
});
});
/*
* PROMOTION-READINESS — staging canvas E2E → HARD merge-gate
* ----------------------------------------------------------
* NOW RELIABLE (deterministic; these no longer flap on timing):
* - Every wait is condition-based (toBeVisible / toHaveAttribute /
* expect.poll). There is NO fixed waitForTimeout / sleep in the spec;
* the only setTimeout is the bounded poll-interval inside
* staging-setup.ts waitFor(), which has a hard deadline.
* - Tabs are asserted on REAL settled content (non-empty, non-spinner),
* not just "container is visible" — an empty or stuck-loading panel now
* fails instead of shipping green.
* - The ErrorBoundary ("Something went wrong") is asserted absent at app
* hydration AND per tab — a React subtree crash can no longer pass.
* - Visible error alerts inside a panel fail non-degraded tabs (was a
* weak [role=alert]:has-text("Failed to load")-only check that missed
* both other error phrasings and role-less error divs).
* - The driven tab list is parity-checked against the live DOM, so a new
* SidePanel tab can't ship un-tested and a removed one fails loud.
* - Click→activation is confirmed (aria-selected) before asserting the
* panel, removing a wrong-panel race.
* - The suite is fail-closed: CANVAS_E2E_STAGING=1 with no tenant state
* hard-errors (never skips→green); CANVAS_E2E_STAGING unset cleanly
* skips (operator did not request staging).
*
* STILL BLOCKS PROMOTION-TO-REQUIRED (do NOT flip continue-on-error here —
* CTO-owned, RFC internal#219 §1):
* - INFRA DEPENDENCY: each run provisions a real staging EC2 tenant
* (12-20 min cold boot). Required-gate latency + AWS/Cloudflare/CP
* availability become merge-blockers. A staging outage would freeze
* main even though the code is fine — unacceptable for a required check
* until staging has an SLA or this runs against a warm pre-provisioned
* pool.
* - SHARED-RESOURCE FLAKE SURFACE: TLS/DNS/ACME propagation on a shared
* staging zone (staging-setup TLS_TIMEOUT_MS) is outside this repo's
* control. Deterministic here ≠ deterministic upstream.
* - SECRET DEPENDENCY: CP_STAGING_ADMIN_API_TOKEN must be present on the
* runner. The workflow's skip-if-absent (core#2225) keeps a missing
* secret from painting red — correct for non-gating, but a REQUIRED
* check must instead guarantee the secret is always present, else it
* skip-greens the very thing it is supposed to enforce.
* - SINGLE-WORKSPACE COVERAGE: one hermes/platform_managed workspace that
* does NOT boot an agent on staging (no CP LLM proxy env, workspace-
* server #2162). Tabs render, but agent-dependent content paths (live
* chat round-trip, traces from a real run) are not exercised.
*
* PROMOTION CHECKLIST (when CTO signs off on making this required):
* 1. Warm pre-provisioned tenant pool OR a staging SLA bounding boot time.
* 2. Guarantee CP_STAGING_ADMIN_API_TOKEN on the gating runner; turn the
* skip-if-absent into a hard error for the required path.
* 3. Decide whether agent-dependent tabs need a wired LLM proxy on the
* staging tenant (covers chat/traces real content) before gating them.
*/
-8
View File
@@ -7,14 +7,6 @@ export default defineConfig({
fullyParallel: false,
workers: 1,
retries: 0,
// Fail CLOSED when an explicit spec selection matches zero tests.
// Playwright defaults this to true, so `playwright test e2e/chat-*.spec.ts`
// would exit 0 (green) if those files were renamed/moved/deleted — a
// false-green that would silently gut the e2e-chat gate after a refactor.
// forbidOnly likewise stops a stray `test.only` from green-ing the suite
// while skipping every other case.
passWithNoTests: false,
forbidOnly: !!process.env.CI,
use: {
baseURL: process.env.PLAYWRIGHT_BASE_URL || "http://localhost:3000",
headless: true,
@@ -1,17 +1,12 @@
/**
* Canvas /api/buildinfo — version-display endpoint mirroring
* workspace-server's /buildinfo. Lets `curl <url>/api/buildinfo`
* confirm which git SHA is live on a canvas deployment (core#2235).
* confirm which git SHA is live on a canvas deployment.
*/
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { GET } from "../route";
const ENV_KEYS = [
"BUILD_SHA",
"VERCEL_GIT_COMMIT_SHA",
"VERCEL_GIT_COMMIT_REF",
"VERCEL_ENV",
];
const ENV_KEYS = ["VERCEL_GIT_COMMIT_SHA", "VERCEL_GIT_COMMIT_REF", "VERCEL_ENV"];
describe("GET /api/buildinfo", () => {
let saved: Record<string, string | undefined>;
@@ -28,24 +23,13 @@ describe("GET /api/buildinfo", () => {
}
});
it("returns dev sentinel when no SHA source is set", async () => {
it("returns dev sentinel when Vercel env vars are unset", async () => {
const res = await GET();
const body = await res.json();
expect(body).toEqual({ git_sha: "dev", git_ref: "", vercel_env: "local" });
});
it("reports BUILD_SHA baked into the Docker image (fleet deploy path)", async () => {
// BUILD_SHA is the authoritative source for the ECR-image fleet deploy,
// which never runs on Vercel. It must win even when a Vercel var is also
// present in the environment.
process.env.BUILD_SHA = "deadbeefcafe";
process.env.VERCEL_GIT_COMMIT_SHA = "should-not-win";
const res = await GET();
const body = await res.json();
expect(body.git_sha).toBe("deadbeefcafe");
});
it("falls back to the SHA Vercel injected when BUILD_SHA is unset", async () => {
it("reports the SHA Vercel injected at build time", async () => {
process.env.VERCEL_GIT_COMMIT_SHA = "abc1234567890";
process.env.VERCEL_GIT_COMMIT_REF = "main";
process.env.VERCEL_ENV = "production";
+8 -27
View File
@@ -1,36 +1,17 @@
import { NextResponse } from "next/server";
// Mirror of workspace-server's GET /buildinfo (PR #2398). Lets a developer
// or the fleet redeploy workflow confirm which git SHA is live on a canvas
// deployment with the same `curl <url>/api/buildinfo` flow used against
// tenant workspaces (core#2235; cross-ref core#2226).
// confirm which git SHA is live on a canvas deployment with the same
// `curl <url>/buildinfo` flow they use against tenant workspaces.
//
// SHA source, in priority order:
// 1. BUILD_SHA — server-only env baked into the canvas Docker image at
// build time (Dockerfile `ARG BUILD_SHA` → `ENV BUILD_SHA`, wired
// from `${{ github.sha }}` in publish-canvas-image.yml). This is the
// authoritative source for the fleet's ECR-image deploy path, which
// does NOT run on Vercel. Read server-side here (App Router route
// handler runs on the standalone Node server, `output: "standalone"`),
// so it is intentionally NOT a NEXT_PUBLIC_ var — keeping it out of
// the client bundle.
// 2. VERCEL_GIT_COMMIT_SHA — Vercel injects this at build time when the
// canvas is deployed via Vercel rather than the Docker image.
// 3. "dev" — local `next dev` / test harness, where neither is set. Same
// sentinel workspace-server uses pre-ldflags-injection, so both
// surfaces speak the same vocabulary and an unconfigured deploy
// fails the SHA comparison closed instead of round-tripping "".
//
// force-dynamic so the response is evaluated at request time against the
// runtime env of the standalone server (where ENV BUILD_SHA lives), not
// frozen into a static asset at `next build`.
export const dynamic = "force-dynamic";
// Vercel injects VERCEL_GIT_COMMIT_SHA / _REF / VERCEL_ENV at build time
// from the deploying commit; outside Vercel (local `next dev`, harness)
// these are unset and the endpoint reports `git_sha: "dev"`. Same sentinel
// the workspace-server uses pre-ldflags-injection so both surfaces speak
// the same vocabulary.
export async function GET() {
const sha =
process.env.BUILD_SHA ?? process.env.VERCEL_GIT_COMMIT_SHA ?? "dev";
return NextResponse.json({
git_sha: sha,
git_sha: process.env.VERCEL_GIT_COMMIT_SHA ?? "dev",
git_ref: process.env.VERCEL_GIT_COMMIT_REF ?? "",
vercel_env: process.env.VERCEL_ENV ?? "local",
});
+1 -1
View File
@@ -41,7 +41,7 @@ export default function PricingPage() {
<p className="mt-2 text-ink-mid">
We publish the{" "}
<a
href="https://git.moleculesai.app/molecule-ai/molecule-core"
href="https://git.moleculesai.app/molecule-ai/molecule-monorepo"
className="text-accent underline hover:text-accent"
>
full source on GitHub
+25 -97
View File
@@ -8,13 +8,9 @@ import { ExternalConnectModal, type ExternalConnectionInfo } from "./ExternalCon
import {
ProviderModelSelector,
buildProviderCatalog,
buildProviderCatalogFromRegistry,
findProviderForModel,
isPlatformManagedProvider,
type SelectorModel,
type SelectorValue,
type RegistryProvider,
type RegistryModel,
} from "./ProviderModelSelector";
interface WorkspaceOption {
@@ -36,27 +32,16 @@ interface TemplateSpec {
model?: string;
models?: SelectorModel[];
providers?: string[];
// internal#718 P3 registry-served fields (additive; absent on older
// backends and for non-registry runtimes). When registry_backed is true the
// provider→model catalog is built from registry_providers/registry_models so
// each model's DERIVED provider (e.g. moonshot/kimi-k2.6 → "platform") drives
// the dropdown bucket and the create payload's llm_provider — instead of the
// legacy inferVendor heuristic that slash-splits the id into "moonshot".
// Mirrors ConfigTab's RuntimeOption loader (RFC#340 Fix C).
registry_backed?: boolean;
registry_providers?: RegistryProvider[];
registry_models?: RegistryModel[];
}
const DEFAULT_RUNTIME = "claude-code";
const RUNTIME_OPTIONS = [
{ value: "claude-code", label: "Claude Code" },
{ value: "codex", label: "OpenAI Codex CLI" },
{ value: "google-adk", label: "Google ADK" },
{ value: "hermes", label: "Hermes" },
{ value: "openclaw", label: "OpenClaw" },
];
const BASE_RUNTIME_TEMPLATE_IDS = new Set(["claude-code-default", "codex", "google-adk", "hermes", "openclaw"]);
const BASE_RUNTIME_TEMPLATE_IDS = new Set(["claude-code-default", "codex", "hermes", "openclaw"]);
const DEFAULT_HEADLESS_INSTANCE_TYPE = "t3.medium";
const DEFAULT_HEADLESS_ROOT_GB = 30;
const DEFAULT_DISPLAY_INSTANCE_TYPE = "t3.xlarge";
@@ -182,53 +167,15 @@ export function CreateWorkspaceButton() {
}),
[runtime, templateSpecs],
);
// The /templates row backing the LLM picker: an explicitly-selected
// workspace template wins, else the base runtime template row.
const llmSourceSpec = useMemo<TemplateSpec | null>(
() => selectedTemplateSpec ?? selectedRuntimeTemplateSpec,
const llmModels = useMemo(
() => {
const sourceSpec = selectedTemplateSpec ?? selectedRuntimeTemplateSpec;
if (!sourceSpec?.models?.length) return [];
return sourceSpec.models;
},
[selectedRuntimeTemplateSpec, selectedTemplateSpec],
);
// internal#718 P3 / RFC#340 Fix C: a runtime is registry-backed when the
// /templates row says so AND it served a non-empty registry_models set.
// Mirrors ConfigTab's `registryBacked` derivation exactly.
const registryBacked = useMemo(
() =>
llmSourceSpec?.registry_backed === true &&
(llmSourceSpec.registry_models?.length ?? 0) > 0,
[llmSourceSpec],
);
// Models fed to the selector dropdown. For a registry-backed runtime use the
// registry-served native set, carrying each model's DERIVED provider so the
// selector buckets it correctly (moonshot/kimi-k2.6 → "platform", not the
// inferVendor "moonshot"). Otherwise fall back to the template-served
// models[] + the legacy heuristic — same fallback ConfigTab keeps.
const llmModels = useMemo<SelectorModel[]>(
() => {
if (registryBacked) {
return (llmSourceSpec?.registry_models ?? []).map((m) => ({
id: m.id,
name: m.name,
...(m.provider ? { provider: m.provider } : {}),
}));
}
return llmSourceSpec?.models?.length ? llmSourceSpec.models : [];
},
[registryBacked, llmSourceSpec],
);
// Registry-backed path: build the catalog from registry_providers/
// registry_models so dropdown labels + billing + the derived provider come
// from the provider-registry SSOT (restores the "Platform" bucket). Legacy
// path: re-infer from models[] via buildProviderCatalog (inferVendor).
const llmCatalog = useMemo(
() =>
registryBacked
? buildProviderCatalogFromRegistry(
llmSourceSpec?.registry_providers ?? [],
llmSourceSpec?.registry_models ?? [],
)
: buildProviderCatalog(llmModels),
[registryBacked, llmSourceSpec, llmModels],
);
const llmCatalog = useMemo(() => buildProviderCatalog(llmModels), [llmModels]);
const selectedLLMProvider = useMemo(
() => llmCatalog.find((p) => p.id === llmSelection.providerId) ?? llmCatalog[0],
[llmCatalog, llmSelection.providerId],
@@ -236,7 +183,7 @@ export function CreateWorkspaceButton() {
useEffect(() => {
if (llmCatalog.length === 0) return;
const sourceDefault = llmSourceSpec?.model?.trim();
const sourceDefault = (selectedTemplateSpec ?? selectedRuntimeTemplateSpec)?.model?.trim();
const platformProvider = llmCatalog.find((p) => p.vendor === "platform");
const matched = sourceDefault ? findProviderForModel(llmCatalog, sourceDefault) : null;
const next = platformProvider ?? matched ?? llmCatalog[0];
@@ -249,7 +196,7 @@ export function CreateWorkspaceButton() {
envVars: next.envVars,
});
setLLMSecret("");
}, [llmCatalog, llmSourceSpec]);
}, [llmCatalog, selectedRuntimeTemplateSpec, selectedTemplateSpec]);
// Reset form and load workspaces whenever dialog opens
useEffect(() => {
@@ -291,15 +238,7 @@ export function CreateWorkspaceButton() {
setError("Model is required");
return;
}
// Platform-managed providers need NO user credential — the platform injects
// its own usage token (MOLECULE_LLM_USAGE_TOKEN = tenant admin_token) at
// provision time. Only BYOK providers require a user-supplied key. (#2245)
if (
!isExternal &&
!isPlatformManagedProvider(selectedLLMProvider) &&
selectedLLMProvider?.envVars.length &&
!llmSecret.trim()
) {
if (!isExternal && selectedLLMProvider?.envVars.length && !llmSecret.trim()) {
setError("Provider credential is required");
return;
}
@@ -334,11 +273,7 @@ export function CreateWorkspaceButton() {
? {
model: llmSelection.model.trim(),
llm_provider: nativeProvider.vendor,
// Only BYOK providers carry a user secret. For platform-managed
// the token is provisioner-injected; sending an (empty) secret
// here would clobber it — so omit it entirely. (#2245)
...(nativeProvider.envVars.length > 0 &&
!isPlatformManagedProvider(nativeProvider)
...(nativeProvider.envVars.length > 0
? { secrets: { [nativeProvider.envVars[0]]: llmSecret.trim() } }
: {}),
}
@@ -525,7 +460,6 @@ export function CreateWorkspaceButton() {
</div>
<ProviderModelSelector
models={llmModels}
catalog={registryBacked ? llmCatalog : undefined}
value={llmSelection}
onChange={(next) => {
setLLMSelection(next);
@@ -534,26 +468,20 @@ export function CreateWorkspaceButton() {
idPrefix="create-workspace-llm"
variant="stack"
/>
{isPlatformManagedProvider(selectedLLMProvider) ? (
<div className="text-[11px] text-ink-soft">
Platform-managed no API key required.
{selectedLLMProvider.envVars.length > 0 && (
<div>
<label htmlFor="llm-secret-input" className="text-[11px] text-ink-mid block mb-1">
{selectedLLMProvider.envVars[0]}
</label>
<input
id="llm-secret-input"
type="password"
value={llmSecret}
onChange={(e) => setLLMSecret(e.target.value)}
autoComplete="off"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors font-mono"
/>
</div>
) : (
selectedLLMProvider.envVars.length > 0 && (
<div>
<label htmlFor="llm-secret-input" className="text-[11px] text-ink-mid block mb-1">
{selectedLLMProvider.envVars[0]}
</label>
<input
id="llm-secret-input"
type="password"
value={llmSecret}
onChange={(e) => setLLMSecret(e.target.value)}
autoComplete="off"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors font-mono"
/>
</div>
)
)}
</div>
)}
+1 -116
View File
@@ -49,48 +49,6 @@ export interface ProviderEntry {
wildcard: boolean;
/** Optional tooltip text (rendered as native title=). */
tooltip?: string;
/** Billing mode the DERIVED provider implies, when this entry came from the
* registry-backed payload (internal#718 P3): "platform_managed" | "byok".
* Undefined for entries built by the legacy inferVendor heuristic. */
billingMode?: "platform_managed" | "byok";
}
/** A provider is "platform-managed" when the Molecule platform proxies the LLM
* call and injects its own usage credential — the tenant admin_token, surfaced
* to the workspace as MOLECULE_LLM_USAGE_TOKEN by the CP provisioner
* (controlplane ec2.go: `MOLECULE_LLM_USAGE_TOKEN="$ADMIN_TOKEN"`). The user
* supplies NO key for these: the credential is internal plumbing, not a user
* input. Detected by vendor==="platform" (the platform proxy provider, which
* declares MOLECULE_LLM_USAGE_TOKEN in its AuthEnv) OR
* billingMode==="platform_managed" (registry-backed, internal#718 P3). BYOK
* providers return false and DO require a user-supplied credential. */
export function isPlatformManagedProvider(
p?: Pick<ProviderEntry, "vendor" | "billingMode"> | null,
): boolean {
return p?.vendor === "platform" || p?.billingMode === "platform_managed";
}
/** RegistryProvider mirrors one entry of GET /templates `registry_providers`
* (workspace-server registryProviderView): the registry's native provider for
* a runtime, with its display label, auth-env NAMES, and billing mode. This is
* the SSOT the dropdown labels come from — the canvas drops VENDOR_LABELS for
* registry-backed runtimes (internal#718 P3, retire-list #4). */
export interface RegistryProvider {
name: string;
display_name?: string;
auth_env?: string[];
billing_mode?: "platform_managed" | "byok";
deprecated?: boolean;
}
/** RegistryModel mirrors one entry of GET /templates `registry_models`: a
* native model id annotated with its DERIVED provider (registry name) and the
* billing_mode that provider implies. */
export interface RegistryModel {
id: string;
name?: string;
provider?: string;
billing_mode?: "platform_managed" | "byok";
}
export interface SelectorValue {
@@ -110,13 +68,6 @@ interface Props {
models: SelectorModel[];
value: SelectorValue;
onChange: (next: SelectorValue) => void;
/** Optional pre-built provider catalog. When provided, the selector uses it
* verbatim instead of re-inferring one from `models` via
* buildProviderCatalog — the registry-backed path (internal#718 P3), where
* the parent builds the catalog from the registry-served providers/models
* so dropdown labels + billing come from the provider-registry SSOT rather
* than the inferVendor heuristic. Omitted = legacy heuristic over `models`. */
catalog?: ProviderEntry[];
/** Display variant. "grid" = label+control side-by-side (used in ConfigTab
* Runtime section). "stack" = vertical (used in MissingKeysModal). */
variant?: "grid" | "stack";
@@ -300,66 +251,6 @@ export function buildProviderCatalog(models: SelectorModel[]): ProviderEntry[] {
return Array.from(buckets.values());
}
/** Build the provider catalog from a REGISTRY-BACKED GET /templates payload
* (registry_providers + registry_models) — internal#718 P3, retire-list #4.
*
* Unlike buildProviderCatalog (which RE-INFERS vendor from model-id prefixes
* + env via inferVendor/VENDOR_LABELS/BARE_VENDOR_PATTERNS), this trusts the
* registry: each model carries its DERIVED `provider` (a registry provider
* name) and the dropdown label/billing/auth come from the matching
* `registry_providers` entry. The canvas can render no provider/model the
* registry did not serve ("only registered selectable"), and the billing-mode
* shown reflects the derived provider rather than a hardcoded rule.
*
* A provider with no served model is omitted (no empty buckets). Models whose
* `provider` doesn't match a registry_providers entry still get a bucket
* keyed by the raw provider name (defensive — should not happen for a
* well-formed registry payload), so a model is never silently dropped. */
export function buildProviderCatalogFromRegistry(
registryProviders: RegistryProvider[],
registryModels: RegistryModel[],
): ProviderEntry[] {
const byName = new Map<string, RegistryProvider>();
for (const p of registryProviders) byName.set(p.name, p);
// Bucket models by their derived provider name, preserving registry order.
const buckets = new Map<string, ProviderEntry>();
for (const m of registryModels) {
const vendor = (m.provider ?? "").trim();
if (!vendor) continue; // un-annotated registry model — skip from the
// provider cascade (selectable elsewhere via free-text); it has no
// derived provider to bucket under.
const meta = byName.get(vendor);
const wildcard = m.id.includes("*");
let entry = buckets.get(vendor);
if (!entry) {
entry = {
id: `registry|${vendor}`,
vendor,
label: meta?.display_name || vendor,
envVars: meta?.auth_env ?? [],
models: [],
wildcard,
billingMode: meta?.billing_mode ?? m.billing_mode,
tooltip: VENDOR_TOOLTIPS[vendor],
};
buckets.set(vendor, entry);
}
entry.models.push({ id: m.id, name: m.name, provider: vendor });
entry.wildcard = entry.wildcard || wildcard;
}
// Decorate label with model-count when ≥2 concrete models share the bucket,
// matching buildProviderCatalog's UX.
for (const e of buckets.values()) {
if (!e.wildcard && e.models.length > 1) {
e.label = `${e.label} (${e.models.length} models)`;
}
}
return Array.from(buckets.values());
}
/** Find the provider entry that contains a given model id. Used by
* callers to back-derive the provider when only the model is known
* (e.g. ConfigTab loading from saved state). */
@@ -392,7 +283,6 @@ export function ProviderModelSelector({
models,
value,
onChange,
catalog: catalogProp,
variant = "stack",
allowCustomModelEscape = false,
disabled = false,
@@ -403,12 +293,7 @@ export function ProviderModelSelector({
const providerSelectId = `${baseId}-provider`;
const modelSelectId = `${baseId}-model`;
// Registry-backed path (internal#718 P3): use the parent-supplied catalog
// verbatim; otherwise re-infer one from `models` via the legacy heuristic.
const catalog = useMemo(
() => catalogProp ?? buildProviderCatalog(models),
[catalogProp, models],
);
const catalog = useMemo(() => buildProviderCatalog(models), [models]);
const selected = useMemo(
() => catalog.find((p) => p.id === value.providerId) ?? null,
[catalog, value.providerId],
@@ -1,82 +1,411 @@
// @vitest-environment jsdom
/**
* Focused tests for BudgetSection's PER-PERIOD progress-bar math + aria (#49).
* Tests for BudgetSection (issue #541).
*
* Behavioral coverage (loading, save, 402 banners, USD formatting, legacy
* back-compat) lives in tabs/__tests__/BudgetSection.test.tsx — this file
* deliberately covers only the per-period progress percentage + aria-valuenow
* + the over-budget colouring, which that suite doesn't assert in detail. Kept
* separate to avoid duplicating the behavioral suite (one component, no
* parallel/identical suites).
* Covers:
* - Loading state
* - Stats row: used / limit, "Unlimited" when null
* - Progress bar: correct percentage, capped at 100%, absent when no limit
* - Budget remaining text
* - Input pre-fill (existing limit / blank when null)
* - Save: PATCH with number, PATCH with null (blank input)
* - 402 on GET → exceeded banner, no fetch-error text
* - 402 on PATCH → exceeded banner
* - Non-402 fetch error → error text
* - Non-402 save error → save error alert
* - Section header and subheading
* - Fetch error does not show stats
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, waitFor, cleanup } from "@testing-library/react";
import {
render,
screen,
fireEvent,
waitFor,
cleanup,
act,
} from "@testing-library/react";
// ── Mock api ──────────────────────────────────────────────────────────────────
vi.mock("@/lib/api", () => ({
api: { get: vi.fn(), patch: vi.fn() },
api: {
get: vi.fn(),
patch: vi.fn(),
},
}));
import { api } from "@/lib/api";
import { BudgetSection } from "../tabs/BudgetSection";
const mockGet = vi.mocked(api.get);
const mockPatch = vi.mocked(api.patch);
type P = { limit: number | null; spend: number; remaining: number | null };
// ── Helpers ───────────────────────────────────────────────────────────────────
// Build a periods response where the named period has the given limit/spend.
function withMonthly(limit: number | null, spend: number) {
const blank: P = { limit: null, spend: 0, remaining: null };
const monthly: P = { limit, spend, remaining: limit == null ? null : limit - spend };
function budgetResponse(overrides: Partial<{
budget_limit: number | null;
budget_used: number;
budget_remaining: number | null;
}> = {}) {
return {
periods: { hourly: blank, daily: blank, weekly: blank, monthly },
budget_limit: limit,
monthly_spend: spend,
budget_remaining: monthly.remaining,
budget_limit: 1000,
budget_used: 250,
budget_remaining: 750,
...overrides,
};
}
beforeEach(() => vi.clearAllMocks());
afterEach(() => cleanup());
function make402Error(): Error {
return new Error("API GET /workspaces/ws-1/budget: 402 Payment Required");
}
async function renderLoaded(data: unknown) {
function make402PatchError(): Error {
return new Error("API PATCH /workspaces/ws-1/budget: 402 Payment Required");
}
function makeGenericError(msg = "network timeout"): Error {
return new Error(`API GET /workspaces/ws-1/budget: 500 ${msg}`);
}
beforeEach(() => {
vi.clearAllMocks();
});
afterEach(() => {
cleanup();
});
// ── Rendering helpers ─────────────────────────────────────────────────────────
async function renderLoaded(budgetData = budgetResponse()) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(data as any);
mockGet.mockResolvedValueOnce(budgetData as any);
render(<BudgetSection workspaceId="ws-1" />);
// Wait for loading to finish
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
}
describe("BudgetSection — per-period progress bar", () => {
it("renders the bar for a limited period and omits it for an unlimited one", async () => {
await renderLoaded(withMonthly(1000, 250));
expect(screen.getByTestId("budget-monthly-fill")).toBeTruthy();
expect(screen.queryByTestId("budget-hourly-fill")).toBeNull(); // hourly unlimited
// ── Loading state ─────────────────────────────────────────────────────────────
describe("BudgetSection — loading state", () => {
it("shows loading indicator while fetch is in flight", () => {
// Never resolve
mockGet.mockReturnValue(new Promise(() => {}));
render(<BudgetSection workspaceId="ws-1" />);
expect(screen.getByTestId("budget-loading")).toBeTruthy();
expect(screen.getByText("Loading…")).toBeTruthy();
});
it("fills to 25%", async () => {
await renderLoaded(withMonthly(1000, 250));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("25%");
});
it("fills to 50%", async () => {
await renderLoaded(withMonthly(1000, 500));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("50%");
});
it("caps fill at 100% when spend exceeds limit", async () => {
await renderLoaded(withMonthly(1000, 4000));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("100%");
});
it("sets aria-valuenow to the computed percentage on the progressbar", async () => {
await renderLoaded(withMonthly(1000, 250));
const bars = screen.getAllByRole("progressbar");
// the monthly bar is the only one rendered (others unlimited)
expect(bars).toHaveLength(1);
expect(bars[0].getAttribute("aria-valuenow")).toBe("25");
});
it("shows a 0% bar when spend is 0 against a set limit", async () => {
await renderLoaded(withMonthly(1000, 0));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("0%");
it("hides loading indicator after fetch resolves", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(budgetResponse() as any);
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
});
});
// ── Section header ────────────────────────────────────────────────────────────
describe("BudgetSection — header and subheading", () => {
it("renders 'Budget' as the section heading", async () => {
await renderLoaded();
expect(screen.getByText("Budget")).toBeTruthy();
});
it("renders the subheading 'Limit total message credits for this workspace'", async () => {
await renderLoaded();
expect(
screen.getByText("Limit total message credits for this workspace")
).toBeTruthy();
});
it("renders 'Budget limit (credits)' label for the input", async () => {
await renderLoaded();
expect(screen.getByText("Budget limit (credits)")).toBeTruthy();
});
});
// ── Stats row ─────────────────────────────────────────────────────────────────
describe("BudgetSection — stats row", () => {
it("shows budget_used in the stats row", async () => {
await renderLoaded(budgetResponse({ budget_used: 350, budget_limit: 1000 }));
expect(screen.getByTestId("budget-used-value").textContent).toBe("350");
});
it("shows budget_limit in the stats row", async () => {
await renderLoaded(budgetResponse({ budget_used: 100, budget_limit: 500 }));
expect(screen.getByTestId("budget-limit-value").textContent).toBe("500");
});
it("shows 'Unlimited' when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
expect(screen.getByTestId("budget-limit-value").textContent).toBe("Unlimited");
});
it("shows budget_remaining when present", async () => {
await renderLoaded(budgetResponse({ budget_remaining: 750 }));
expect(screen.getByTestId("budget-remaining").textContent).toContain("750");
expect(screen.getByTestId("budget-remaining").textContent).toContain("credits remaining");
});
it("hides budget_remaining row when null", async () => {
await renderLoaded(budgetResponse({ budget_remaining: null }));
expect(screen.queryByTestId("budget-remaining")).toBeNull();
});
it("does not crash when budget_used is missing from the response", async () => {
// Backend for a provisioning-stuck workspace may return a partial
// shape. Regression: previously this threw
// "Cannot read properties of undefined (reading 'toLocaleString')"
// and crashed the whole Details tab.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await renderLoaded({ budget_limit: 1000, budget_remaining: null } as any);
expect(screen.getByTestId("budget-used-value").textContent).toBe("0");
});
});
// ── Progress bar ──────────────────────────────────────────────────────────────
describe("BudgetSection — progress bar", () => {
it("renders the progress bar when budget_limit is set", async () => {
await renderLoaded(budgetResponse({ budget_used: 250, budget_limit: 1000 }));
expect(screen.getByRole("progressbar")).toBeTruthy();
});
it("does NOT render progress bar when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
expect(screen.queryByRole("progressbar")).toBeNull();
});
it("fills to the correct percentage (25%)", async () => {
await renderLoaded(budgetResponse({ budget_used: 250, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("25%");
});
it("fills to the correct percentage (50%)", async () => {
await renderLoaded(budgetResponse({ budget_used: 500, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("50%");
});
it("caps fill at 100% when budget_used exceeds budget_limit", async () => {
await renderLoaded(budgetResponse({ budget_used: 1500, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("100%");
});
it("progress bar has aria-valuenow equal to the calculated percentage", async () => {
await renderLoaded(budgetResponse({ budget_used: 300, budget_limit: 1000 }));
const bar = screen.getByRole("progressbar");
expect(bar.getAttribute("aria-valuenow")).toBe("30");
});
it("shows 0% progress bar when budget_used is absent from the response", async () => {
// Regression: budget_used is optional (provisioning-stuck workspaces return
// partial shapes). Without the `?? 0` guard the progressPct calculation
// throws a TypeScript strict-null error and the build fails.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await renderLoaded({ budget_limit: 1000, budget_remaining: null } as any);
const bar = screen.getByRole("progressbar");
expect(bar.getAttribute("aria-valuenow")).toBe("0");
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("0%");
});
});
// ── Input pre-fill ────────────────────────────────────────────────────────────
describe("BudgetSection — input pre-fill", () => {
it("pre-fills input with existing budget_limit", async () => {
await renderLoaded(budgetResponse({ budget_limit: 500 }));
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
expect(input.value).toBe("500");
});
it("leaves input empty when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
expect(input.value).toBe("");
});
});
// ── Save — PATCH calls ────────────────────────────────────────────────────────
describe("BudgetSection — save", () => {
it("calls PATCH /workspaces/:id/budget with budget_limit as integer", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: 800 }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "800" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
expect(mockPatch.mock.calls[0][0]).toBe("/workspaces/ws-1/budget");
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBe(800);
});
it("sends budget_limit: 0 (not null) when input is '0' — zero-credit budget", async () => {
// Regression for QA bug report: `parseInt("0") || null` would yield null.
// The correct form `raw !== "" ? parseInt(raw, 10) : null` must return 0.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: 0, budget_used: 0, budget_remaining: 0 }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "0" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBe(0);
expect(body.budget_limit).not.toBeNull();
});
it("sends budget_limit: null when input is blank", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: null, budget_remaining: null }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBeNull();
});
it("updates displayed stats after successful save", async () => {
const updated = budgetResponse({ budget_limit: 2000, budget_used: 500, budget_remaining: 1500 });
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(updated as any);
await renderLoaded(budgetResponse({ budget_limit: 1000, budget_used: 250 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "2000" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-limit-value").textContent).toBe("2,000")
);
});
it("shows save error message on non-402 PATCH failure", async () => {
mockPatch.mockRejectedValueOnce(
new Error("API PATCH /workspaces/ws-1/budget: 500 server error")
);
await renderLoaded();
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-save-error")).toBeTruthy()
);
expect(screen.getByTestId("budget-save-error").textContent).toContain("500");
});
});
// ── 402 handling ──────────────────────────────────────────────────────────────
describe("BudgetSection — 402 handling", () => {
it("shows exceeded banner when GET returns 402", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
expect(screen.getByText("Budget exceeded — messages blocked")).toBeTruthy();
});
it("does NOT show fetch error text when GET returns 402 (only banner)", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.queryByTestId("budget-loading")).toBeNull()
);
expect(screen.queryByTestId("budget-fetch-error")).toBeNull();
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
});
it("shows exceeded banner when PATCH returns 402", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(budgetResponse() as any);
mockPatch.mockRejectedValueOnce(make402PatchError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
// Should NOT also show the save-error alert
expect(screen.queryByTestId("budget-save-error")).toBeNull();
});
it("clears exceeded banner after a successful save", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
// Now a successful PATCH (limit was raised)
const updated = budgetResponse({ budget_limit: 5000, budget_used: 250, budget_remaining: 4750 });
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(updated as any);
await act(async () => {
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "5000" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
});
await waitFor(() =>
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull()
);
});
});
// ── Non-402 fetch error ───────────────────────────────────────────────────────
describe("BudgetSection — non-402 fetch errors", () => {
it("shows fetch error text on non-402 GET failure", async () => {
mockGet.mockRejectedValueOnce(makeGenericError("internal server error"));
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-fetch-error")).toBeTruthy()
);
expect(screen.getByTestId("budget-fetch-error").textContent).toContain("500");
});
it("does NOT show stats row on fetch error", async () => {
mockGet.mockRejectedValueOnce(makeGenericError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
expect(screen.queryByTestId("budget-stats-row")).toBeNull();
});
it("does NOT show exceeded banner on non-402 fetch error", async () => {
mockGet.mockRejectedValueOnce(makeGenericError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull();
});
});
@@ -2,7 +2,6 @@
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, fireEvent, waitFor, cleanup } from "@testing-library/react";
import { CreateWorkspaceButton } from "../CreateWorkspaceDialog";
import { isPlatformManagedProvider } from "../ProviderModelSelector";
vi.mock("@/lib/api", () => ({
api: {
@@ -66,34 +65,6 @@ const SAMPLE_TEMPLATES = [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: [] },
],
},
// #2245 fixtures. The real registry `platform` provider declares
// MOLECULE_LLM_USAGE_TOKEN in its auth_env — the default mock above masks the
// bug by using required_env:[]. This template gives the platform provider a
// non-empty auth env (matching production) so the credential-suppression
// logic is actually exercised.
{
id: "platform-managed-test",
name: "Platform Managed Test",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
providers: ["platform", "minimax"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: ["MOLECULE_LLM_USAGE_TOKEN"] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
],
},
// BYOK-only template (no platform provider) — the credential requirement
// MUST still hold for these (no-regression guard).
{
id: "byok-only-test",
name: "BYOK Only Test",
runtime: "claude-code",
model: "openai/gpt-4o",
providers: ["openai"],
models: [
{ id: "openai/gpt-4o", name: "GPT-4o", required_env: ["OPENAI_API_KEY"] },
],
},
];
beforeEach(() => {
@@ -242,7 +213,6 @@ describe("CreateWorkspaceDialog", () => {
expect(runtimeTexts).toEqual([
"Claude Code",
"OpenAI Codex CLI",
"Google ADK",
"Hermes",
"OpenClaw",
]);
@@ -483,182 +453,6 @@ describe("CreateWorkspaceDialog — dynamic runtime provider picker", () => {
});
});
// ---------------------------------------------------------------------------
// Registry-backed provider catalog (RFC#340 Fix C)
//
// Regression guard for the mis-bucketing bug: when a registry-backed
// claude-code template serves `moonshot/kimi-k2.6` whose DERIVED provider is
// `platform`, the dialog must build the dropdown from registry_providers/
// registry_models (buildProviderCatalogFromRegistry) — NOT the legacy
// inferVendor heuristic which slash-splits the id into "moonshot". The
// distinguishing trait of this fixture: the plain `models[]` array does NOT
// carry an explicit `provider` field, so the LEGACY path would bucket the
// model under "moonshot" and send llm_provider:"moonshot". Only the
// registry-backed path yields the Platform bucket + llm_provider:"platform".
// ---------------------------------------------------------------------------
// claude-code template whose plain models[] is UN-annotated (no explicit
// provider). The derived-provider annotation lives ONLY in registry_models.
const REGISTRY_TEMPLATE = {
id: "claude-code-default",
name: "Claude Code Agent",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
// Legacy fields — note: NO explicit provider on the platform model, so the
// legacy inferVendor path would slash-split it into "moonshot".
providers: ["platform", "minimax", "anthropic"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", required_env: [] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", required_env: ["ANTHROPIC_API_KEY"] },
],
// Registry-served SSOT (internal#718 P3). DeriveProvider resolved
// moonshot/kimi-k2.6 → "platform"; MiniMax-M2.7 → "minimax".
registry_backed: true,
registry_providers: [
{ name: "platform", display_name: "Platform", auth_env: [], billing_mode: "platform_managed" },
{ name: "minimax", display_name: "MiniMax", auth_env: ["MINIMAX_API_KEY"], billing_mode: "byok" },
{ name: "anthropic", display_name: "Anthropic API", auth_env: ["ANTHROPIC_API_KEY"], billing_mode: "byok" },
],
registry_models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", billing_mode: "platform_managed" },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", provider: "minimax", billing_mode: "byok" },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", provider: "anthropic", billing_mode: "byok" },
],
};
// Registry-backed platform provider WITH a non-empty auth_env — this matches
// the PRODUCTION provider view, which ships the raw AuthEnv
// ([MOLECULE_LLM_USAGE_TOKEN]). REGISTRY_TEMPLATE above uses auth_env:[] so it
// never exercises suppression; this one drives the billingMode==="platform_
// managed" branch end-to-end through buildProviderCatalogFromRegistry. (#2245)
const REGISTRY_TEMPLATE_PLATFORM_AUTHENV = {
...REGISTRY_TEMPLATE,
registry_providers: [
{
name: "platform",
display_name: "Platform",
auth_env: ["MOLECULE_LLM_USAGE_TOKEN"],
billing_mode: "platform_managed",
},
{ name: "minimax", display_name: "MiniMax", auth_env: ["MINIMAX_API_KEY"], billing_mode: "byok" },
{ name: "anthropic", display_name: "Anthropic API", auth_env: ["ANTHROPIC_API_KEY"], billing_mode: "byok" },
],
};
describe("CreateWorkspaceDialog — registry-backed provider catalog (RFC#340 Fix C)", () => {
beforeEach(() => {
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return [REGISTRY_TEMPLATE] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
});
it("shows the Platform provider bucket for the registry-backed claude-code runtime", async () => {
await openDialog();
const providerSelect = await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(sel).toBeTruthy();
return sel;
});
const labels = Array.from(providerSelect.options).map((o) => o.text.trim());
// Registry display_name "Platform" appears — NOT "moonshot" from the
// legacy slash-split heuristic.
expect(labels).toContain("Platform");
expect(labels).not.toContain("moonshot");
// Bucket id is the registry-keyed id, vendor is the bare provider name.
const values = Array.from(providerSelect.options).map((o) => o.value);
expect(values).toContain("registry|platform");
});
it("sends llm_provider: platform (not moonshot) for moonshot/kimi-k2.6", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Kimi Agent" },
});
// Wait for the registry default to settle on the Platform bucket + model.
await waitFor(() => {
const modelSelect = document.querySelector("[data-testid='model-select']") as HTMLSelectElement;
expect(modelSelect?.value).toBe("moonshot/kimi-k2.6");
});
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.model).toBe("moonshot/kimi-k2.6");
expect(body.llm_provider).toBe("platform");
// Platform is auth-env-free → no BYOK secret.
expect(body.secrets).toBeUndefined();
});
it("buckets MiniMax-M2.7 under its derived provider and sends llm_provider: minimax", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "MiniMax Agent" },
});
await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(Array.from(sel.options).map((o) => o.value)).toContain("registry|minimax");
});
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "registry|minimax" },
});
fireEvent.change(document.getElementById("llm-secret-input") as HTMLInputElement, {
target: { value: "sk-minimax-test" },
});
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.model).toBe("MiniMax-M2.7");
expect(body.llm_provider).toBe("minimax");
expect(body.secrets).toEqual({ MINIMAX_API_KEY: "sk-minimax-test" });
});
it("suppresses the credential for a registry-backed platform provider that declares an auth_env — billingMode path (#2245)", async () => {
// Override the default REGISTRY_TEMPLATE (auth_env:[]) with the production-
// shaped one whose platform provider declares MOLECULE_LLM_USAGE_TOKEN.
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return [REGISTRY_TEMPLATE_PLATFORM_AUTHENV] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Registry Platform Agent" },
});
// Platform is the default bucket; even with a non-empty auth_env the key
// field must NOT render (suppressed via billingMode==="platform_managed").
await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(sel?.value).toBe("registry|platform");
});
expect(screen.getByText("Platform-managed — no API key required.")).toBeTruthy();
expect(document.getElementById("llm-secret-input")).toBeNull();
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
expect(screen.queryByText("Provider credential is required")).toBeNull();
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.llm_provider).toBe("platform");
// The provisioner-injected MOLECULE_LLM_USAGE_TOKEN must NOT be clobbered.
expect(body.secrets).toBeUndefined();
});
});
// ---------------------------------------------------------------------------
// budget_limit field tests (#541)
// ---------------------------------------------------------------------------
@@ -740,70 +534,3 @@ describe("CreateWorkspaceDialog — budget_limit field", () => {
expect(budgetInput.value).toBe("");
});
});
describe("CreateWorkspaceDialog — platform-managed credential suppression (#2245)", () => {
describe("isPlatformManagedProvider", () => {
it("is true for the platform proxy vendor", () => {
expect(isPlatformManagedProvider({ vendor: "platform" })).toBe(true);
});
it("is true for a registry billingMode of platform_managed", () => {
expect(
isPlatformManagedProvider({ vendor: "minimax", billingMode: "platform_managed" }),
).toBe(true);
});
it("is false for a BYOK provider", () => {
expect(isPlatformManagedProvider({ vendor: "anthropic", billingMode: "byok" })).toBe(false);
expect(isPlatformManagedProvider({ vendor: "minimax" })).toBe(false);
});
it("is false for null/undefined", () => {
expect(isPlatformManagedProvider(null)).toBe(false);
expect(isPlatformManagedProvider(undefined)).toBe(false);
});
});
it("platform-managed provider with a declared auth env requires NO credential, hides the key field, and sends NO secret", async () => {
await openDialog();
await setTemplate("platform-managed-test");
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Platform Agent" },
});
// The credential input must NOT render for platform-managed; a "no key
// required" note appears instead.
await waitFor(() =>
expect(screen.getByText("Platform-managed — no API key required.")).toBeTruthy(),
);
expect(screen.queryByLabelText("MOLECULE_LLM_USAGE_TOKEN")).toBeNull();
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
// No validation error, and the provisioner-injected token is NOT clobbered
// by an empty secret.
expect(screen.queryByText("Provider credential is required")).toBeNull();
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.llm_provider).toBe("platform");
expect(body.secrets).toBeUndefined();
});
it("BYOK provider still requires a credential and renders the key field (no-regression)", async () => {
await openDialog();
await setTemplate("byok-only-test");
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "BYOK Agent" },
});
// The credential field IS rendered for BYOK...
await waitFor(() => expect(screen.getByLabelText("OPENAI_API_KEY")).toBeTruthy());
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
// ...and create stays blocked until it's filled.
await waitFor(() =>
expect(screen.getByText("Provider credential is required")).toBeTruthy(),
);
expect(mockPost).not.toHaveBeenCalled();
});
});
@@ -1,110 +0,0 @@
// @vitest-environment jsdom
//
// internal#718 P3 (retire-list #4) — when GET /templates serves a
// registry-backed selectable list (registry_providers + registry_models with
// display_name / billing_mode / derived provider), the canvas builds the
// provider catalog FROM that registry data instead of re-inferring vendor
// from model-id prefixes (VENDOR_LABELS / BARE_VENDOR_PATTERNS / inferVendor).
// The heuristic path stays only as the fallback for non-registry runtimes /
// older backends.
import { describe, it, expect } from "vitest";
import {
buildProviderCatalogFromRegistry,
type RegistryProvider,
type RegistryModel,
} from "../ProviderModelSelector";
// Mirrors the registry-served claude-code payload from GET /templates
// (registry_providers / registry_models). display_name + billing_mode come
// from the registry, NOT from the canvas VENDOR_LABELS map.
const CLAUDE_CODE_REGISTRY_PROVIDERS: RegistryProvider[] = [
{
name: "anthropic-oauth",
display_name: "Claude Code subscription",
auth_env: ["CLAUDE_CODE_OAUTH_TOKEN"],
billing_mode: "byok",
},
{
name: "anthropic-api",
display_name: "Anthropic API",
auth_env: ["ANTHROPIC_API_KEY"],
billing_mode: "byok",
},
{
name: "platform",
display_name: "Platform",
auth_env: ["ANTHROPIC_API_KEY", "MOLECULE_LLM_USAGE_TOKEN"],
billing_mode: "platform_managed",
},
];
const CLAUDE_CODE_REGISTRY_MODELS: RegistryModel[] = [
{ id: "sonnet", provider: "anthropic-oauth", billing_mode: "byok" },
{ id: "opus", provider: "anthropic-oauth", billing_mode: "byok" },
{ id: "claude-opus-4-7", provider: "anthropic-api", billing_mode: "byok" },
{ id: "anthropic/claude-opus-4-7", provider: "platform", billing_mode: "platform_managed" },
];
describe("buildProviderCatalogFromRegistry", () => {
it("buckets models by their DERIVED registry provider, not by inferred vendor", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
const byVendor = new Map(catalog.map((p) => [p.vendor, p]));
// anthropic-oauth bucket holds the two OAuth-derived models.
const oauth = byVendor.get("anthropic-oauth");
expect(oauth).toBeDefined();
expect(oauth!.models.map((m) => m.id).sort()).toEqual(["opus", "sonnet"]);
// platform bucket holds the platform-namespaced model.
const platform = byVendor.get("platform");
expect(platform).toBeDefined();
expect(platform!.models.map((m) => m.id)).toEqual(["anthropic/claude-opus-4-7"]);
});
it("labels providers from the registry display_name, not VENDOR_LABELS", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
const oauth = catalog.find((p) => p.vendor === "anthropic-oauth");
// Registry display_name "Claude Code subscription" (decorated with the
// model count by the catalog builder is acceptable; assert it carries the
// registry label, not an inferred one).
expect(oauth!.label).toContain("Claude Code subscription");
});
it("carries the registry billing_mode per provider", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
expect(catalog.find((p) => p.vendor === "anthropic-oauth")!.billingMode).toBe("byok");
expect(catalog.find((p) => p.vendor === "platform")!.billingMode).toBe("platform_managed");
});
it("surfaces the registry auth_env on the provider entry", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
expect(catalog.find((p) => p.vendor === "anthropic-oauth")!.envVars).toEqual([
"CLAUDE_CODE_OAUTH_TOKEN",
]);
});
it("only includes providers that actually have at least one served model", () => {
// anthropic-api is a registry provider but has no model in this slice →
// it should not appear as an empty bucket.
const models: RegistryModel[] = [
{ id: "sonnet", provider: "anthropic-oauth", billing_mode: "byok" },
];
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
models,
);
expect(catalog.map((p) => p.vendor)).toEqual(["anthropic-oauth"]);
});
});
+137 -158
View File
@@ -7,28 +7,10 @@ import { api } from "@/lib/api";
// Types
// ---------------------------------------------------------------------------
// Period keys MUST match the server SSOT (workspace-server budget_periods.go).
type BudgetPeriod = "hourly" | "daily" | "weekly" | "monthly";
const PERIODS: { key: BudgetPeriod; label: string }[] = [
{ key: "hourly", label: "Hourly" },
{ key: "daily", label: "Daily" },
{ key: "weekly", label: "Weekly" },
{ key: "monthly", label: "Monthly" },
];
interface PeriodBudget {
limit: number | null; // USD cents; null = no limit
spend: number; // rolling-window spend, USD cents
remaining: number | null; // null when no limit
}
interface BudgetData {
periods?: Partial<Record<BudgetPeriod, PeriodBudget>>;
// legacy fields (pre-multi-period server) — tolerated for back-compat
budget_limit?: number | null;
monthly_spend?: number;
budget_remaining?: number | null;
budget_limit: number | null;
budget_used?: number; // optional — provisioning-stuck workspaces return partial shapes
budget_remaining: number | null;
}
interface Props {
@@ -44,71 +26,31 @@ function isApiError402(e: unknown): boolean {
return e instanceof Error && /: 402( |$)/.test(e.message);
}
/** USD cents → "$X.XX". */
function fmtUSD(cents: number): string {
return `$${(cents / 100).toLocaleString(undefined, { minimumFractionDigits: 2, maximumFractionDigits: 2 })}`;
}
/** Normalize the server payload (multi-period or legacy) into a period map. */
function periodsFrom(data: BudgetData | null): Record<BudgetPeriod, PeriodBudget> {
const base: Record<BudgetPeriod, PeriodBudget> = {
hourly: { limit: null, spend: 0, remaining: null },
daily: { limit: null, spend: 0, remaining: null },
weekly: { limit: null, spend: 0, remaining: null },
monthly: { limit: null, spend: 0, remaining: null },
};
if (!data) return base;
if (data.periods) {
for (const { key } of PERIODS) {
const p = data.periods[key];
if (p) base[key] = { limit: p.limit ?? null, spend: p.spend ?? 0, remaining: p.remaining ?? null };
}
return base;
}
// legacy: map the single monthly limit/spend
base.monthly = {
limit: data.budget_limit ?? null,
spend: data.monthly_spend ?? 0,
remaining: data.budget_remaining ?? null,
};
return base;
}
// ---------------------------------------------------------------------------
// Component
// ---------------------------------------------------------------------------
/**
* BudgetSection — per-workspace LLM budget, four independent rolling windows
* (hourly / daily / weekly / monthly). Each period has its own ceiling (USD);
* spend is the rolling-window LLM cost. Crossing ANY period blocks new work
* (server returns 402). Sends PATCH {budget_limits:{period:cents|null}}.
* BudgetSection — dedicated "Budget" section in the workspace details panel.
*
* - Fetches GET /workspaces/:id/budget on mount for live usage stats
* - Shows a progress bar (budget_used / budget_limit, blue-500, capped 100%)
* - Allows updating budget_limit via PATCH /workspaces/:id/budget
* - Shows a 402-specific "Budget exceeded" amber banner for any blocked state
*/
export function BudgetSection({ workspaceId }: Props) {
const [budget, setBudget] = useState<BudgetData | null>(null);
const [loading, setLoading] = useState(true);
const [fetchError, setFetchError] = useState<string | null>(null);
// One input per period, in USD cents (string for controlled inputs).
const [limitInputs, setLimitInputs] = useState<Record<BudgetPeriod, string>>({
hourly: "",
daily: "",
weekly: "",
monthly: "",
});
const [limitInput, setLimitInput] = useState("");
const [saving, setSaving] = useState(false);
const [saveError, setSaveError] = useState<string | null>(null);
/** True when a 402 has been seen from any API call in this section. */
const [budgetExceeded, setBudgetExceeded] = useState(false);
const syncInputs = useCallback((data: BudgetData | null) => {
const p = periodsFrom(data);
setLimitInputs({
hourly: p.hourly.limit != null ? String(p.hourly.limit) : "",
daily: p.daily.limit != null ? String(p.daily.limit) : "",
weekly: p.weekly.limit != null ? String(p.weekly.limit) : "",
monthly: p.monthly.limit != null ? String(p.monthly.limit) : "",
});
}, []);
// ── Fetch current budget data ─────────────────────────────────────────────
const loadBudget = useCallback(async () => {
setLoading(true);
@@ -116,7 +58,7 @@ export function BudgetSection({ workspaceId }: Props) {
try {
const data = await api.get<BudgetData>(`/workspaces/${workspaceId}/budget`);
setBudget(data);
syncInputs(data);
setLimitInput(data.budget_limit != null ? String(data.budget_limit) : "");
} catch (e) {
if (isApiError402(e)) {
setBudgetExceeded(true);
@@ -126,30 +68,29 @@ export function BudgetSection({ workspaceId }: Props) {
} finally {
setLoading(false);
}
}, [workspaceId, syncInputs]);
}, [workspaceId]);
useEffect(() => {
loadBudget();
}, [loadBudget]);
// ── Save handler ──────────────────────────────────────────────────────────
const handleSave = async () => {
setSaving(true);
setSaveError(null);
// Build the per-period map: blank → null (clear); a number → that ceiling.
const budget_limits: Record<BudgetPeriod, number | null> = {
hourly: null,
daily: null,
weekly: null,
monthly: null,
};
for (const { key } of PERIODS) {
const raw = limitInputs[key].trim();
budget_limits[key] = raw !== "" ? parseInt(raw, 10) : null;
}
const raw = limitInput.trim();
// Use explicit empty-string check (not falsy check) so that a
// user-entered "0" is sent as budget_limit: 0, not null (unlimited).
const parsedLimit = raw !== "" ? parseInt(raw, 10) : null;
try {
const updated = await api.patch<BudgetData>(`/workspaces/${workspaceId}/budget`, { budget_limits });
const updated = await api.patch<BudgetData>(`/workspaces/${workspaceId}/budget`, {
budget_limit: parsedLimit,
});
setBudget(updated);
syncInputs(updated);
setLimitInput(updated.budget_limit != null ? String(updated.budget_limit) : "");
// Clear exceeded state if the save succeeded (limit was raised or removed)
setBudgetExceeded(false);
} catch (e) {
if (isApiError402(e)) {
@@ -162,15 +103,24 @@ export function BudgetSection({ workspaceId }: Props) {
}
};
const periods = periodsFrom(budget);
// ── Progress calculation ──────────────────────────────────────────────────
const progressPct =
budget && budget.budget_limit != null && budget.budget_limit > 0
? Math.min(100, Math.round(((budget.budget_used ?? 0) / budget.budget_limit) * 100))
: 0;
// ── Render ────────────────────────────────────────────────────────────────
return (
<div className="space-y-3" data-testid="budget-section">
{/* Section header */}
<div>
<h3 className="text-xs font-semibold text-ink-mid uppercase tracking-wider">Budget</h3>
<h3 className="text-xs font-semibold text-ink-mid uppercase tracking-wider">
Budget
</h3>
<p className="text-[11px] text-ink-mid mt-0.5">
Cap LLM spend for this workspace per period crossing any limit pauses new work
Limit total message credits for this workspace
</p>
</div>
@@ -181,14 +131,32 @@ export function BudgetSection({ workspaceId }: Props) {
data-testid="budget-exceeded-banner"
className="flex items-center gap-2 px-3 py-2 rounded-lg bg-surface border border-amber-700/50 text-warm text-xs font-medium"
>
<svg width="13" height="13" viewBox="0 0 13 13" fill="none" aria-hidden="true" className="shrink-0">
<path d="M6.5 1.5L11.5 10.5H1.5L6.5 1.5Z" stroke="currentColor" strokeWidth="1.4" strokeLinejoin="round" />
<path d="M6.5 5.5V7.5M6.5 9.5h.01" stroke="currentColor" strokeWidth="1.4" strokeLinecap="round" />
<svg
width="13"
height="13"
viewBox="0 0 13 13"
fill="none"
aria-hidden="true"
className="shrink-0"
>
<path
d="M6.5 1.5L11.5 10.5H1.5L6.5 1.5Z"
stroke="currentColor"
strokeWidth="1.4"
strokeLinejoin="round"
/>
<path
d="M6.5 5.5V7.5M6.5 9.5h.01"
stroke="currentColor"
strokeWidth="1.4"
strokeLinecap="round"
/>
</svg>
Budget exceeded new work paused
Budget exceeded messages blocked
</div>
)}
{/* Usage stats */}
{loading ? (
<p className="text-xs text-ink-mid" data-testid="budget-loading">
Loading
@@ -197,78 +165,89 @@ export function BudgetSection({ workspaceId }: Props) {
<p className="text-xs text-bad" data-testid="budget-fetch-error">
{fetchError}
</p>
) : (
<div className="space-y-3">
{PERIODS.map(({ key, label }) => {
const p = periods[key];
const pct =
p.limit != null && p.limit > 0 ? Math.min(100, Math.round((p.spend / p.limit) * 100)) : 0;
const over = p.limit != null && p.spend >= p.limit;
return (
<div key={key} className="space-y-1" data-testid={`budget-period-${key}`}>
<div className="flex items-baseline justify-between">
<label htmlFor={`budget-${key}-${workspaceId}`} className="text-xs text-ink-mid">
{label}
</label>
<span className="text-[11px] font-mono text-ink-mid">
<span data-testid={`budget-${key}-spend`}>{fmtUSD(p.spend)}</span>
<span className="mx-1">/</span>
<span data-testid={`budget-${key}-limit`}>{p.limit != null ? fmtUSD(p.limit) : "∞"}</span>
</span>
</div>
{p.limit != null && (
<div
role="progressbar"
aria-label={`${label} budget usage`}
aria-valuenow={pct}
aria-valuemin={0}
aria-valuemax={100}
className="h-1.5 w-full rounded-full bg-surface-card overflow-hidden"
>
<div
data-testid={`budget-${key}-fill`}
className={`h-full rounded-full transition-all duration-300 ${over ? "bg-bad" : "bg-accent"}`}
style={{ width: `${pct}%` }}
/>
</div>
)}
<input
id={`budget-${key}-${workspaceId}`}
type="number"
min="0"
step="1"
value={limitInputs[key]}
onChange={(e) => setLimitInputs((s) => ({ ...s, [key]: e.target.value }))}
placeholder="USD cents — blank for unlimited"
data-testid={`budget-${key}-input`}
className="w-full bg-surface-card border border-line rounded-lg px-3 py-1.5 text-xs text-ink-mid placeholder-zinc-500 focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/30 transition-colors"
/>
</div>
);
})}
) : budget ? (
<div className="space-y-2">
{/* Stats row */}
<div className="flex items-baseline justify-between" data-testid="budget-stats-row">
<span className="text-xs text-ink-mid">Credits used</span>
<span className="text-xs font-mono text-ink-mid">
<span data-testid="budget-used-value">{(budget.budget_used ?? 0).toLocaleString()}</span>
<span className="text-ink-mid mx-1">/</span>
<span data-testid="budget-limit-value">
{budget.budget_limit != null
? budget.budget_limit.toLocaleString()
: "Unlimited"}
</span>
</span>
</div>
<p className="text-[11px] text-ink-mid">Limits are USD cents (e.g. 500 = $5.00). Blank = unlimited.</p>
{saveError && (
{/* Progress bar (only when limit is set) */}
{budget.budget_limit != null && (
<div
role="alert"
data-testid="budget-save-error"
className="px-3 py-1.5 rounded-lg bg-red-950/40 border border-red-800/50 text-xs text-bad"
role="progressbar"
aria-label="Budget usage"
aria-valuenow={progressPct}
aria-valuemin={0}
aria-valuemax={100}
className="h-1.5 w-full rounded-full bg-surface-card overflow-hidden"
>
{saveError}
<div
data-testid="budget-progress-fill"
className="h-full rounded-full bg-accent transition-all duration-300"
style={{ width: `${progressPct}%` }}
/>
</div>
)}
<button
onClick={handleSave}
disabled={saving}
data-testid="budget-save-btn"
className="px-4 py-1.5 bg-accent-strong hover:bg-accent active:bg-accent-strong rounded-lg text-xs font-medium text-white disabled:opacity-50 transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-900"
>
{saving ? "Saving…" : "Save"}
</button>
{/* Remaining credits */}
{budget.budget_remaining != null && (
<p className="text-[11px] text-ink-mid" data-testid="budget-remaining">
{budget.budget_remaining.toLocaleString()} credits remaining
</p>
)}
</div>
)}
) : null}
{/* Input + Save */}
<div className="space-y-1.5 pt-1">
<label
htmlFor={`budget-limit-input-${workspaceId}`}
className="text-[11px] text-ink-mid block"
>
Budget limit (credits)
</label>
<input
id={`budget-limit-input-${workspaceId}`}
type="number"
min="0"
step="1"
value={limitInput}
onChange={(e) => setLimitInput(e.target.value)}
placeholder="e.g. 1000 — blank for unlimited"
data-testid="budget-limit-input"
className="w-full bg-surface-card border border-line rounded-lg px-3 py-2 text-sm text-ink-mid placeholder-zinc-500 focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/30 transition-colors"
/>
<p className="text-xs text-ink-mid">Leave blank for unlimited</p>
{saveError && (
<div
role="alert"
data-testid="budget-save-error"
className="px-3 py-1.5 rounded-lg bg-red-950/40 border border-red-800/50 text-xs text-bad"
>
{saveError}
</div>
)}
<button
onClick={handleSave}
disabled={saving}
data-testid="budget-save-btn"
className="px-4 py-1.5 bg-accent-strong hover:bg-accent active:bg-accent-strong rounded-lg text-xs font-medium text-white disabled:opacity-50 transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-900"
>
{saving ? "Saving…" : "Save"}
</button>
</div>
</div>
);
}
+15 -127
View File
@@ -11,12 +11,8 @@ import { ExternalConnectionSection } from "./ExternalConnectionSection";
import {
ProviderModelSelector,
buildProviderCatalog,
buildProviderCatalogFromRegistry,
findProviderForModel,
type SelectorValue,
type ProviderEntry,
type RegistryProvider,
type RegistryModel,
} from "../ProviderModelSelector";
import { isExternalLikeRuntime } from "@/lib/externalRuntimes";
@@ -262,17 +258,6 @@ interface RuntimeOption {
// canvas falls back to deriving unique vendor prefixes from
// models[].id (still adapter-driven, just inferred).
providers: string[];
// registryBacked / registryProviders / registryModels come from the
// registry-served GET /templates fields (internal#718 P3). When
// registryBacked is true, the selectable provider+model list is built from
// the registry (registryProviders/registryModels) — display labels +
// billing mode + derived provider come from the provider-registry SSOT, not
// the canvas VENDOR_LABELS / billingModeForProvider vocabularies. When
// false (non-registry runtime / older backend), the canvas falls back to
// the template-served models[] + its inferVendor heuristic.
registryBacked: boolean;
registryProviders: RegistryProvider[];
registryModels: RegistryModel[];
}
// deriveProvidersFromModels — when a template doesn't ship an explicit
@@ -337,32 +322,6 @@ export function billingModeForProvider(provider: string): LLMBillingMode {
return "byok";
}
// billingModeForSelectedProvider — internal#718 P3 (retire-list #5): the
// billing mode the Config tab shows/sends for the selected PROVIDER, sourced
// from the registry-served catalog when available rather than the hardcoded
// billingModeForProvider rule.
//
// When the runtime is registry-backed, GET /templates serves each provider's
// DERIVED billing_mode (platform_managed for the closed platform provider,
// byok otherwise) on the ProviderEntry. We read it off the catalog so the UI
// reflects the registry SSOT — the same predicate billing/credential emission
// keys off the derived provider.
//
// Falls back to billingModeForProvider when: no catalog (non-registry runtime
// / older backend), or the provider string isn't carried by the catalog
// (e.g. a stale saved value). The fallback keeps the legacy behavior intact
// for everything the registry doesn't yet speak to.
export function billingModeForSelectedProvider(
provider: string,
catalog?: ProviderEntry[],
): LLMBillingMode {
if (catalog && catalog.length > 0) {
const entry = catalog.find((p) => p.vendor === provider.trim());
if (entry?.billingMode) return entry.billingMode;
}
return billingModeForProvider(provider);
}
// Fallback used when /templates can't be fetched (offline, older backend).
// Keep in sync with manifest.json workspace_templates as a defensive default.
// Model + env suggestions only flow when the backend is reachable.
@@ -377,20 +336,13 @@ export function billingModeForSelectedProvider(
// config.yaml` on the container is a separate runtime-internal file,
// not this one.
const RUNTIMES_WITH_OWN_CONFIG = new Set<string>(["external", "kimi", "kimi-cli", "openclaw"]);
// The runtime picker is SSOT-driven: options come from GET /templates,
// which workspace-server already gates to the manifest.json maintained set
// (loadRuntimesFromManifest). A hand-maintained frontend allowlist silently
// dropped runtimes the backend added (google-adk shipped in manifest but was
// filtered out, so its workspaces rendered the wrong default option). A
// template may still opt OUT of the picker via `displayable: false` on its
// /templates row. See project_canvas_runtime_dropdown_ssot_fix.
const SUPPORTED_RUNTIME_VALUES = new Set(["claude-code", "codex", "openclaw", "hermes"]);
const FALLBACK_RUNTIME_OPTIONS: RuntimeOption[] = [
{ value: "claude-code", label: "Claude Code", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "codex", label: "Codex", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "google-adk", label: "Google ADK", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "openclaw", label: "OpenClaw", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "hermes", label: "Hermes", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "claude-code", label: "Claude Code", models: [], providers: [] },
{ value: "codex", label: "Codex", models: [], providers: [] },
{ value: "openclaw", label: "OpenClaw", models: [], providers: [] },
{ value: "hermes", label: "Hermes", models: [], providers: [] },
];
export function ConfigTab({ workspaceId }: Props) {
@@ -581,49 +533,20 @@ export function ConfigTab({ workspaceId }: Props) {
useEffect(() => {
let cancelled = false;
api.get<Array<{
id: string;
name?: string;
runtime?: string;
models?: ModelSpec[];
providers?: string[];
// internal#718 P3 registry-served fields (additive; absent on older
// backends and for non-registry runtimes).
registry_backed?: boolean;
registry_providers?: RegistryProvider[];
registry_models?: RegistryModel[];
displayable?: boolean;
}>>("/templates")
api.get<Array<{ id: string; name?: string; runtime?: string; models?: ModelSpec[]; providers?: string[] }>>("/templates")
.then((rows) => {
if (cancelled || !Array.isArray(rows)) return;
const byRuntime = new Map<string, RuntimeOption>();
for (const r of rows) {
const v = (r.runtime || "").trim();
if (!v) continue;
// Honor an explicit opt-out; absent/true means show it.
if (r.displayable === false) continue;
if (!SUPPORTED_RUNTIME_VALUES.has(v)) continue;
// Last template wins if two templates share a runtime — rare, and the
// one with the richer models list is probably newer.
const existing = byRuntime.get(v);
const models = Array.isArray(r.models) ? r.models : [];
const providers = Array.isArray(r.providers) ? r.providers : [];
const registryProviders = Array.isArray(r.registry_providers) ? r.registry_providers : [];
const registryModels = Array.isArray(r.registry_models) ? r.registry_models : [];
const registryBacked = r.registry_backed === true && registryModels.length > 0;
// Prefer the richer payload: a registry-backed entry, then more
// template models. Keeps the "last/richer template wins" intent.
const score = (o: RuntimeOption) => (o.registryBacked ? 1000 : 0) + o.models.length;
const candidate: RuntimeOption = {
value: v,
label: r.name || v,
models,
providers,
registryBacked,
registryProviders,
registryModels,
};
if (!existing || score(candidate) > score(existing)) {
byRuntime.set(v, candidate);
if (!existing || models.length > existing.models.length) {
byRuntime.set(v, { value: v, label: r.name || v, models, providers });
}
}
if (byRuntime.size > 0) setRuntimeOptions(Array.from(byRuntime.values()));
@@ -634,13 +557,7 @@ export function ConfigTab({ workspaceId }: Props) {
// Models + env hints for the currently-selected runtime.
const selectedRuntime = runtimeOptions.find((o) => o.value === (config.runtime || "")) ?? null;
// Memoised so its identity is stable across renders — it feeds several
// useMemo dependency arrays below (registry/legacy catalog, selector models)
// and a fresh `[]` literal each render would defeat their memoisation.
const availableModels: ModelSpec[] = useMemo(
() => selectedRuntime?.models ?? [],
[selectedRuntime?.models],
);
const availableModels: ModelSpec[] = selectedRuntime?.models ?? [];
// Provider suggestions for the legacy free-text input fallback (used
// when /templates returned no models for this runtime, e.g. hermes
// workspaces). Prefer the runtime's declarative providers list,
@@ -654,37 +571,9 @@ export function ConfigTab({ workspaceId }: Props) {
// Vendor-aware catalog shared with the selector. Memoised so the
// catalog identity is stable across renders (selector relies on it).
//
// internal#718 P3: when the runtime is registry-backed, build the catalog
// FROM the registry-served providers/models (display labels + billing +
// derived provider from the provider-registry SSOT) instead of re-inferring
// vendor from model-id prefixes. Falls back to the inferVendor heuristic
// for non-registry runtimes / older backends.
const registryBacked = selectedRuntime?.registryBacked ?? false;
const providerCatalog = useMemo(
() =>
registryBacked
? buildProviderCatalogFromRegistry(
selectedRuntime?.registryProviders ?? [],
selectedRuntime?.registryModels ?? [],
)
: buildProviderCatalog(availableModels),
[registryBacked, selectedRuntime?.registryProviders, selectedRuntime?.registryModels, availableModels],
);
// Models fed to the selector dropdown: the registry-served native set for a
// registry-backed runtime (so the dropdown can render no unregistered
// option), else the template-served models.
const selectorModels: ModelSpec[] = useMemo(
() =>
registryBacked
? (selectedRuntime?.registryModels ?? []).map((m) => ({
id: m.id,
name: m.name,
// carry the derived provider so the selector buckets correctly
...(m.provider ? { provider: m.provider } : {}),
}))
: availableModels,
[registryBacked, selectedRuntime?.registryModels, availableModels],
() => buildProviderCatalog(availableModels),
[availableModels],
);
// Derive the selector's current value from the form state. Provider
@@ -1004,10 +893,9 @@ export function ConfigTab({ workspaceId }: Props) {
— empty = "auto-derive from model slug" was the pre-PR-5
behavior; selecting any provider here writes LLM_PROVIDER
and triggers an auto-restart. */}
{selectorModels.length > 0 ? (
{availableModels.length > 0 ? (
<ProviderModelSelector
models={selectorModels}
catalog={registryBacked ? providerCatalog : undefined}
models={availableModels}
value={selectorValue}
onChange={(next) => {
setSelectorValue(next);
@@ -1020,7 +908,7 @@ export function ConfigTab({ workspaceId }: Props) {
setConfig((prev) => {
const v = next.model;
const prevModelId = prev.runtime_config?.model || prev.model || "";
const prevSpec = selectorModels.find((m) => m.id === prevModelId) ?? null;
const prevSpec = availableModels.find((m) => m.id === prevModelId) ?? null;
const prevRequired = prev.runtime_config?.required_env ?? [];
const wasTemplateDriven =
prevRequired.length === 0 ||
@@ -29,15 +29,8 @@ type FormState = {
displayMode: string;
displayProtocol: string;
resolution: string;
dataPersistence: string; // "" (auto) | "persist" | "ephemeral" — internal#734
};
// internal#734: per-workspace durable-data choice. "" = auto (desktop-control
// keeps data, others follow the org default). Human labels for the selector.
const DATA_PERSISTENCE_OPTIONS = ["", "persist", "ephemeral"];
const dataPersistenceLabel = (v: string): string =>
v === "persist" ? "Always keep (persist)" : v === "ephemeral" ? "Don't keep (ephemeral)" : "Auto";
export function ContainerConfigTab({ workspaceId, data }: Props) {
const runtime = data.runtime;
const instanceType = data.compute?.instance_type;
@@ -46,10 +39,9 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
const displayProtocol = data.compute?.display?.protocol;
const displayWidth = data.compute?.display?.width;
const displayHeight = data.compute?.display?.height;
const dataPersistence = data.compute?.data_persistence;
const initial = useMemo(
() => formFromData({ runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight, dataPersistence }),
[runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight, dataPersistence],
() => formFromData({ runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight }),
[runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight],
);
const [form, setForm] = useState<FormState>(initial);
const [saving, setSaving] = useState(false);
@@ -92,8 +84,6 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
display: form.displayEnabled
? { mode: form.displayMode, protocol: form.displayProtocol, width, height }
: { mode: "none" },
// internal#734: omit when "auto" so the wire/default behavior is unchanged.
...(form.dataPersistence ? { data_persistence: form.dataPersistence } : {}),
};
const resp = await api.patch<{ needs_restart?: boolean }>(`/workspaces/${workspaceId}`, {
@@ -186,18 +176,6 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
onChange={(resolution) => setForm((s) => ({ ...s, resolution }))}
/>
)}
<SelectField
id="data-persistence"
label="Saved data (cookies, downloads, memory)"
value={form.dataPersistence}
options={DATA_PERSISTENCE_OPTIONS}
optionLabel={dataPersistenceLabel}
onChange={(dataPersistence) => setForm((s) => ({ ...s, dataPersistence }))}
/>
<p className="-mt-1 text-[10px] leading-snug text-ink-soft">
Whether this workspace&apos;s data survives a restart/recreate. Auto keeps it for
browser (desktop) workspaces; Ephemeral never keeps it (privacy).
</p>
</div>
<div className="mt-4 flex items-center justify-end gap-2">
@@ -253,7 +231,6 @@ function formFromData(data: {
displayProtocol?: string;
displayWidth?: number;
displayHeight?: number;
dataPersistence?: string;
}): FormState {
const width = data.displayWidth ?? 1920;
const height = data.displayHeight ?? 1080;
@@ -266,7 +243,6 @@ function formFromData(data: {
displayMode: data.displayMode && data.displayMode !== "none" ? data.displayMode : "desktop-control",
displayProtocol: data.displayProtocol || "novnc",
resolution,
dataPersistence: data.dataPersistence || "",
};
}
+1 -19
View File
@@ -29,7 +29,6 @@ export function DetailsTab({ workspaceId, data }: Props) {
const [peers, setPeers] = useState<PeerData[]>([]);
const [saving, setSaving] = useState(false);
const [confirmDelete, setConfirmDelete] = useState(false);
const [eraseData, setEraseData] = useState(false); // internal#734: erase saved data on delete
const [peersError, setPeersError] = useState<string | null>(null);
const [saveError, setSaveError] = useState<string | null>(null);
const [deleteError, setDeleteError] = useState<string | null>(null);
@@ -94,10 +93,7 @@ export function DetailsTab({ workspaceId, data }: Props) {
const handleDelete = async () => {
setDeleteError(null);
try {
// internal#734: erase_data=true asks the server to prune this workspace's
// durable data volume (cookies / downloads / memory). Default off keeps it
// for the orphan-sweeper grace.
await api.del(`/workspaces/${workspaceId}?confirm=true${eraseData ? "&erase_data=true" : ""}`, {
await api.del(`/workspaces/${workspaceId}?confirm=true`, {
headers: { "X-Confirm-Name": name },
});
// Mirror the server-side cascade — drop the row + every
@@ -327,19 +323,6 @@ export function DetailsTab({ workspaceId, data }: Props) {
<h3 id="delete-confirm-title" className="text-xs font-medium text-bad">
Confirm deletion
</h3>
<label className="flex items-start gap-2 text-[11px] text-ink-mid">
<input
type="checkbox"
aria-label="Also erase saved data"
checked={eraseData}
onChange={(e) => setEraseData(e.target.checked)}
className="mt-0.5 h-3.5 w-3.5 accent-red-600"
/>
<span>
Also erase saved data (cookies, downloads, agent memory). Cannot be undone.
Unchecked keeps it recoverable briefly.
</span>
</label>
<div className="flex gap-2">
<button
type="button"
@@ -356,7 +339,6 @@ export function DetailsTab({ workspaceId, data }: Props) {
onClick={() => {
setConfirmDelete(false);
setDeleteError(null);
setEraseData(false);
// Return focus to the trigger so keyboard users aren't stranded
deleteButtonRef.current?.focus();
}}
+8 -94
View File
@@ -1,6 +1,6 @@
"use client";
import { useCallback, useEffect, useRef, useState } from "react";
import { useEffect, useRef, useState } from "react";
import { api } from "@/lib/api";
import type RFB from "@novnc/novnc";
@@ -33,11 +33,6 @@ export function DisplayTab({ workspaceId }: Props) {
const [controlBusy, setControlBusy] = useState(false);
const [sessionUrl, setSessionUrl] = useState<string | null>(null);
const requestGeneration = useRef(0);
// Freshest signed session URL (token bound to the lease's expires_at). The
// renewal timer keeps this current WITHOUT swapping the live stream's
// sessionUrl (which would needlessly reconnect the desktop); the stream uses
// it only when it has to reconnect after an unclean drop.
const latestSessionUrlRef = useRef<string | null>(null);
useEffect(() => {
const generation = requestGeneration.current + 1;
@@ -46,7 +41,6 @@ export function DisplayTab({ workspaceId }: Props) {
setStatus(null);
setControl(null);
setSessionUrl(null);
latestSessionUrlRef.current = null;
setError(null);
setControlError(null);
setControlBusy(false);
@@ -75,41 +69,6 @@ export function DisplayTab({ workspaceId }: Props) {
};
}, [workspaceId]);
// Acquire (or re-acquire) the display-control lease as the current holder.
// Re-acquiring extends the 300s server-side lock AND returns a freshly-signed
// session URL (token bound to the new expires_at). Used both to renew the
// lease on a timer and to mint a non-stale token for each reconnect — a
// cached URL can be past its ~300s expiry, which would make a reconnect 401.
const reacquireSession = useCallback(async (): Promise<string | null> => {
const generation = requestGeneration.current;
try {
const next = await api.post<DisplayControlStatus>(
`/workspaces/${workspaceId}/display/control/acquire`,
{ controller: "user", ttl_seconds: 300 },
);
if (requestGeneration.current !== generation) return null;
setControl(next);
if (next.session_url) latestSessionUrlRef.current = next.session_url;
return next.session_url ?? null;
} catch {
// Transient failure, or another holder took over: the live stream keeps
// running on its existing connection; a reconnect re-evaluates control.
return null;
}
}, [workspaceId]);
// Renew the lease while we hold it. The lock is a 300s lease with no
// server-side auto-renewal, so without this the control (and the session
// token) silently expire mid-session — the user appears "kicked" every ~5
// minutes. We renew well inside the TTL and do not touch the live stream.
useEffect(() => {
if (!sessionUrl) return;
const timer = setInterval(() => {
void reacquireSession();
}, 120_000);
return () => clearInterval(timer);
}, [sessionUrl, reacquireSession]);
const acquireControl = async () => {
const generation = requestGeneration.current;
const controlPath = `/workspaces/${workspaceId}/display/control`;
@@ -123,7 +82,6 @@ export function DisplayTab({ workspaceId }: Props) {
if (requestGeneration.current !== generation) return;
setControl(next);
setSessionUrl(next.session_url || null);
latestSessionUrlRef.current = next.session_url || null;
} catch (err) {
if (requestGeneration.current !== generation) return;
setControlError("Failed to take control");
@@ -150,7 +108,6 @@ export function DisplayTab({ workspaceId }: Props) {
if (requestGeneration.current !== generation) return;
setControl(next);
setSessionUrl(null);
latestSessionUrlRef.current = null;
} catch (err) {
if (requestGeneration.current !== generation) return;
setControlError("Failed to release control");
@@ -278,11 +235,7 @@ export function DisplayTab({ workspaceId }: Props) {
/>
</div>
{sessionUrl ? (
<DesktopStream
sessionUrl={sessionUrl}
latestSessionUrlRef={latestSessionUrlRef}
reacquireSession={reacquireSession}
/>
<DesktopStream sessionUrl={sessionUrl} />
) : (
<div className="flex flex-1 items-center justify-center p-8 text-center">
<div>
@@ -358,15 +311,7 @@ function DisplayControlBar({
);
}
function DesktopStream({
sessionUrl,
latestSessionUrlRef,
reacquireSession,
}: {
sessionUrl: string;
latestSessionUrlRef: { current: string | null };
reacquireSession: () => Promise<string | null>;
}) {
function DesktopStream({ sessionUrl }: { sessionUrl: string }) {
const containerRef = useRef<HTMLDivElement | null>(null);
const rfbRef = useRef<RFB | null>(null);
const [streamError, setStreamError] = useState<string | null>(null);
@@ -384,37 +329,20 @@ function DesktopStream({
clipboardTimer = setTimeout(() => setClipboardStatus(null), 2500);
};
let attempts = 0;
let retryTimer: ReturnType<typeof setTimeout> | null = null;
const maxAttempts = 10;
async function connect(reacquire = false) {
async function connect() {
setStreamError(null);
try {
// On a reconnect, mint a fresh lease + token first — the original token
// is only ~300s, so a cached URL can be expired and would 401. The
// initial connect already holds a fresh token from acquireControl.
if (reacquire) await reacquireSession();
const mod = await import("@novnc/novnc");
if (cancelled || !containerRef.current) return;
const stream = displayWebSocketConnection(latestSessionUrlRef.current || sessionUrl);
const stream = displayWebSocketConnection(sessionUrl);
rfb = new mod.default(containerRef.current, stream.url, {
wsProtocols: ["binary", `molecule-display-token.${stream.token}`],
});
rfbRef.current = rfb;
rfb.scaleViewport = true;
// Do NOT request a server-side resize: the workspace display runs a
// fixed Xorg modeline and x11vnc rejects SetDesktopSize ("Resize is
// administratively prohibited"), which spams the console on every
// (re)connect. scaleViewport already fits the fixed framebuffer to the
// container client-side, so we don't need the server to resize.
rfb.resizeSession = false;
rfb.resizeSession = true;
rfb.focusOnClick = true;
rfb.focus({ preventScroll: true });
rfb.addEventListener("connect", () => {
attempts = 0;
if (!cancelled) setStreamError(null);
});
rfb.addEventListener("clipboard", (event: Event) => {
const text = (event as CustomEvent<{ text?: string }>).detail?.text ?? "";
if (!text) return;
@@ -425,20 +353,7 @@ function DesktopStream({
});
rfb.addEventListener("disconnect", (event: Event) => {
const detail = (event as CustomEvent<{ clean?: boolean }>).detail;
rfbRef.current = null;
if (cancelled || detail?.clean) return;
// Auto-reconnect after an unclean drop (idle/network blip, brief
// agent hiccup); bounded backoff so a genuinely-dead session still
// surfaces an error instead of looping forever.
if (attempts < maxAttempts) {
attempts += 1;
setStreamError(`Reconnecting to desktop… (attempt ${attempts})`);
retryTimer = setTimeout(() => {
if (!cancelled) void connect(true);
}, Math.min(1000 * attempts, 5000));
} else {
setStreamError("Desktop stream disconnected.");
}
if (!cancelled && !detail?.clean) setStreamError("Desktop stream disconnected.");
});
} catch {
if (!cancelled) setStreamError("Desktop stream could not be opened.");
@@ -448,12 +363,11 @@ function DesktopStream({
connect();
return () => {
cancelled = true;
if (retryTimer) clearTimeout(retryTimer);
if (clipboardTimer) clearTimeout(clipboardTimer);
rfbRef.current = null;
rfb?.disconnect();
};
}, [sessionUrl, reacquireSession, latestSessionUrlRef]);
}, [sessionUrl]);
useEffect(() => {
const onPaste = (event: ClipboardEvent) => {
@@ -5,10 +5,9 @@ import React from "react";
import { BudgetSection } from "../BudgetSection";
import { api } from "@/lib/api";
// Multi-period budget (#49): the API now returns a `periods` map
// (hourly/daily/weekly/monthly), each {limit, spend, remaining} in USD cents.
// The UI renders one row per period and PATCHes {budget_limits:{period:cents|null}}.
// Queue-based mock for the api module. Each api call shifts from the queue.
// Tests push with qGet/qPatch and the module-level mockImplementation
// reads from the queue.
type QueueEntry = { body?: unknown; err?: Error };
const apiQueue: QueueEntry[] = [];
@@ -41,49 +40,45 @@ const WS_ID = "budget-test-ws";
function qGet(body: unknown) {
apiQueue.push({ body });
}
function qGetErr(status: number, msg: string) {
apiQueue.push({ err: new Error(`${msg}: ${status}`) });
}
function qPatch(body: unknown) {
apiQueue.push({ body });
}
function qPatchErr(status: number, msg: string) {
apiQueue.push({ err: new Error(`${msg}: ${status}`) });
}
type P = { limit: number | null; spend: number; remaining: number | null };
// makeBudget builds the periods response. Override any subset of periods.
function makeBudget(overrides: Partial<Record<"hourly" | "daily" | "weekly" | "monthly", Partial<P>>> = {}) {
const blank: P = { limit: null, spend: 0, remaining: null };
const mk = (o?: Partial<P>): P => {
const p = { ...blank, ...(o ?? {}) };
if (p.limit != null && p.remaining == null) p.remaining = p.limit - p.spend;
return p;
};
const periods = {
hourly: mk(overrides.hourly),
daily: mk(overrides.daily),
weekly: mk(overrides.weekly),
monthly: mk(overrides.monthly),
};
function makeBudget(overrides: Partial<{
budget_limit: number | null;
budget_used: number;
budget_remaining: number | null;
}> = {}) {
return {
periods,
budget_limit: periods.monthly.limit,
monthly_spend: periods.monthly.spend,
budget_remaining: periods.monthly.remaining,
budget_limit: 10_000,
budget_used: 3_500,
budget_remaining: 6_500,
...overrides,
};
}
describe("BudgetSection (multi-period)", () => {
describe("BudgetSection", () => {
describe("loading state", () => {
it("shows loading indicator while fetching", async () => {
let resolveGet: (v: unknown) => void;
vi.mocked(api.get).mockImplementationOnce(
async () => new Promise((r) => { resolveGet = r as (v: unknown) => void; }),
);
render(<BudgetSection workspaceId={WS_ID} />);
expect(screen.getByTestId("budget-loading")).toBeTruthy();
// Resolve after render to verify state clears
resolveGet!(makeBudget());
await vi.waitFor(() => {
expect(screen.queryByTestId("budget-loading")).toBeNull();
@@ -94,16 +89,21 @@ describe("BudgetSection (multi-period)", () => {
describe("fetch error state", () => {
it("shows error message on non-402 fetch failure", async () => {
qGetErr(500, "Internal Server Error");
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-fetch-error")).toBeTruthy();
});
expect(screen.getByTestId("budget-fetch-error")!.textContent).toContain("500");
});
it("shows the exceeded banner (not a fetch error) on a 402", async () => {
it("shows 402 as exceeded banner, not fetch error", async () => {
// 402 means the budget limit was hit — different UX from a network/API error.
qGetErr(402, "Payment Required");
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
});
@@ -111,105 +111,220 @@ describe("BudgetSection (multi-period)", () => {
});
});
describe("rendering periods", () => {
it("renders all four period rows", async () => {
qGet(makeBudget());
describe("budget loaded — display", () => {
it("renders used / limit stats row", async () => {
qGet(makeBudget({ budget_limit: 10_000, budget_used: 3_500 }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
for (const k of ["hourly", "daily", "weekly", "monthly"]) {
expect(screen.getByTestId(`budget-period-${k}`)).toBeTruthy();
}
expect(screen.getByTestId("budget-used-value")!.textContent).toBe("3,500");
});
expect(screen.getByTestId("budget-limit-value")!.textContent).toBe("10,000");
});
it("renders 'Unlimited' when budget_limit is null", async () => {
qGet(makeBudget({ budget_limit: null, budget_used: 1_000, budget_remaining: null }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-limit-value")!.textContent).toBe("Unlimited");
});
});
it("formats spend and limit as USD per period", async () => {
qGet(makeBudget({ monthly: { limit: 10_000, spend: 3_500 } }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-monthly-spend")!.textContent).toBe("$35.00");
});
expect(screen.getByTestId("budget-monthly-limit")!.textContent).toBe("$100.00");
});
it("renders remaining credits when present", async () => {
qGet(makeBudget({ budget_limit: 10_000, budget_used: 3_500, budget_remaining: 6_500 }));
it("shows ∞ for a period with no limit", async () => {
qGet(makeBudget({ hourly: { limit: null, spend: 1_000 } }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-hourly-limit")!.textContent).toBe("");
expect(screen.getByTestId("budget-remaining")!.textContent).toContain("6,500");
expect(screen.getByTestId("budget-remaining")!.textContent).toContain("credits remaining");
});
});
it("renders the progress bar only for periods with a limit", async () => {
qGet(makeBudget({ monthly: { limit: 10_000, spend: 12_000 }, hourly: { limit: null, spend: 5_000 } }));
it("omits remaining credits when budget_remaining is null", async () => {
qGet(makeBudget({ budget_limit: 10_000, budget_used: 3_500, budget_remaining: null }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-monthly-fill")).toBeTruthy();
expect(screen.queryByTestId("budget-remaining")).toBeNull();
});
});
it("caps progress bar at 100% when used > limit", async () => {
// Over-limit: 12000 used of 10000 limit should show 100%, not 120%.
qGet(makeBudget({ budget_limit: 10_000, budget_used: 12_000, budget_remaining: null }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
const fill = screen.getByTestId("budget-progress-fill");
expect(fill.getAttribute("style")).toContain("100%");
});
});
it("omits progress bar when budget_limit is null (unlimited)", async () => {
qGet(makeBudget({ budget_limit: null, budget_used: 5_000, budget_remaining: null }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.queryByTestId("budget-progress-fill")).toBeNull();
});
expect(screen.queryByTestId("budget-hourly-fill")).toBeNull();
// over-budget fill caps at 100%
const fill = screen.getByTestId("budget-monthly-fill") as HTMLElement;
expect(fill.style.width).toBe("100%");
});
});
describe("save", () => {
it("PATCHes budget_limits for all four periods and clears the exceeded banner", async () => {
qGet(makeBudget({ monthly: { limit: 10_000, spend: 3_500 } }));
qPatch(makeBudget({ hourly: { limit: 500, spend: 0 }, monthly: { limit: 20_000, spend: 0 } }));
describe("budget exceeded (402)", () => {
it("shows exceeded banner when load returns 402", async () => {
qGetErr(402, "Payment Required");
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-hourly-input")).toBeTruthy();
});
fireEvent.change(screen.getByTestId("budget-hourly-input"), { target: { value: "500" } });
fireEvent.click(screen.getByTestId("budget-save-btn"));
await vi.waitFor(() => {
expect(vi.mocked(api.patch)).toHaveBeenCalled();
});
const [, body] = vi.mocked(api.patch).mock.calls[0];
expect((body as { budget_limits: Record<string, number | null> }).budget_limits).toMatchObject({
hourly: 500,
monthly: 10_000, // unchanged input echoes the loaded limit
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
expect(screen.getByTestId("budget-exceeded-banner")!.textContent).toContain("Budget exceeded");
});
});
it("shows a save error on non-402 PATCH failure", async () => {
it("clears exceeded banner after successful save", async () => {
qGetErr(402, "Payment Required");
qPatch(makeBudget({ budget_limit: 50_000, budget_used: 0, budget_remaining: 50_000 }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
});
const input = screen.getByTestId("budget-limit-input");
fireEvent.change(input, { target: { value: "50000" } });
const saveBtn = screen.getByTestId("budget-save-btn");
fireEvent.click(saveBtn);
await vi.waitFor(() => {
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull();
});
});
});
describe("save flow", () => {
it("shows save error on non-402 patch failure", async () => {
qGet(makeBudget());
qPatchErr(500, "Internal Server Error");
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-save-btn")).toBeTruthy();
expect(screen.getByTestId("budget-limit-input")).toBeTruthy();
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
const saveBtn = screen.getByTestId("budget-save-btn");
fireEvent.click(saveBtn);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-save-error")).toBeTruthy();
expect(screen.getByTestId("budget-save-error")!.textContent).toContain("500");
});
expect(screen.getByTestId("budget-save-error")!.textContent).toContain("500");
});
it("surfaces the exceeded banner on a 402 PATCH", async () => {
qGet(makeBudget());
qPatchErr(402, "Payment Required");
it("updates input to new limit value after successful save", async () => {
qGet(makeBudget({ budget_limit: 10_000 }));
qPatch(makeBudget({ budget_limit: 20_000 }));
render(<BudgetSection workspaceId={WS_ID} />);
// Wait for the input to appear (loading → loaded)
await vi.waitFor(() => {
expect(screen.getByTestId("budget-save-btn")).toBeTruthy();
expect(screen.queryByTestId("budget-loading")).toBeNull();
});
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
// Debug: check what values are rendered
const limitValue = screen.getByTestId("budget-limit-value")?.textContent;
expect(input.value).toBe("10000"); // initial value from API
expect(limitValue).toBe("10,000");
fireEvent.change(input, { target: { value: "20000" } });
expect(input.value).toBe("20000");
fireEvent.click(screen.getByTestId("budget-save-btn"));
await vi.waitFor(() => {
expect((screen.getByTestId("budget-limit-input") as HTMLInputElement).value).toBe("20000");
});
});
it("sends null when input is cleared (unlimited)", async () => {
qGet(makeBudget({ budget_limit: 10_000 }));
qPatch(makeBudget({ budget_limit: null }));
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-limit-input")).toBeTruthy();
});
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
fireEvent.change(input, { target: { value: "" } });
fireEvent.click(screen.getByTestId("budget-save-btn"));
await vi.waitFor(() => {
// After save with null limit, input should show empty (unlimited)
expect(input.value).toBe("");
});
});
it("shows saving state on button while patch is in flight", async () => {
qGet(makeBudget());
let resolvePatch: (v: unknown) => void;
vi.mocked(api.patch).mockImplementationOnce(
async () => new Promise((r) => { resolvePatch = r as (v: unknown) => void; }),
);
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-limit-input")).toBeTruthy();
});
fireEvent.change(screen.getByTestId("budget-limit-input"), { target: { value: "50000" } });
fireEvent.click(screen.getByTestId("budget-save-btn"));
const btn = screen.getByTestId("budget-save-btn");
expect(btn.textContent).toContain("Saving");
resolvePatch!(makeBudget({ budget_limit: 50_000 }));
await vi.waitFor(() => {
expect(btn.textContent).toContain("Save");
});
});
});
describe("isApiError402 — regression coverage", () => {
it("classifies ': 402' with space as 402", async () => {
qGetErr(402, "Payment Required");
qPatch(makeBudget());
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
});
});
});
describe("legacy payload back-compat", () => {
it("maps a pre-multi-period {budget_limit, monthly_spend} response to the monthly row", async () => {
qGet({ budget_limit: 5_000, monthly_spend: 1_000, budget_remaining: 4_000 });
it("classifies non-402 error messages as regular fetch errors", async () => {
qGetErr(503, "Service Unavailable");
render(<BudgetSection workspaceId={WS_ID} />);
await vi.waitFor(() => {
expect(screen.getByTestId("budget-monthly-limit")!.textContent).toBe("$50.00");
expect(screen.getByTestId("budget-fetch-error")).toBeTruthy();
});
expect(screen.getByTestId("budget-monthly-spend")!.textContent).toBe("$10.00");
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull();
});
});
});
@@ -1,87 +0,0 @@
// @vitest-environment jsdom
//
// Regression: project_canvas_runtime_dropdown_ssot_fix — a google-adk
// workspace's Config tab showed the wrong runtime ("LangGraph (default)"
// / first option) because a hardcoded frontend allowlist
// (SUPPORTED_RUNTIME_VALUES) dropped google-adk from the /templates-derived
// options even though the backend served it. A Save from that state would
// PATCH runtime to the wrong value and break the ADK agent.
//
// The fix: the dropdown is SSOT-driven — it trusts GET /templates (which the
// backend already gates to the manifest maintained set) and hides a runtime
// only when its row carries `displayable: false`. This pins: a google-adk
// workspace shows "google-adk" selected, and a displayable:false template is
// not offered.
import { describe, it, expect, vi, afterEach, beforeEach } from "vitest";
import { render, screen, cleanup, waitFor } from "@testing-library/react";
import React from "react";
afterEach(cleanup);
const apiGet = vi.fn();
const apiPatch = vi.fn();
const apiPut = vi.fn();
vi.mock("@/lib/api", () => ({
api: {
get: (path: string) => apiGet(path),
patch: (path: string, body: unknown) => apiPatch(path, body),
put: (path: string, body: unknown) => apiPut(path, body),
post: vi.fn(),
del: vi.fn(),
},
}));
vi.mock("@/store/canvas", () => ({
useCanvasStore: Object.assign(
(selector: (s: unknown) => unknown) => selector({ restartWorkspace: vi.fn(), updateNodeData: vi.fn() }),
{ getState: () => ({ restartWorkspace: vi.fn(), updateNodeData: vi.fn() }) },
),
}));
vi.mock("../AgentCardSection", () => ({
AgentCardSection: () => <div data-testid="agent-card-stub" />,
}));
import { ConfigTab } from "../ConfigTab";
function wireApi(templates: Array<{ id: string; name?: string; runtime?: string; models?: unknown[]; displayable?: boolean }>) {
apiGet.mockImplementation((path: string) => {
if (path === "/workspaces/ws-adk") return Promise.resolve({ runtime: "google-adk" });
if (path === "/workspaces/ws-adk/model") return Promise.resolve({ model: "vertex:gemini-2.5-pro" });
if (path === "/workspaces/ws-adk/files/config.yaml") return Promise.resolve({ content: "name: adk\nruntime: google-adk\n" });
if (path === "/templates") return Promise.resolve(templates);
return Promise.reject(new Error(`unmocked api.get: ${path}`));
});
}
beforeEach(() => {
apiGet.mockReset();
apiPatch.mockReset();
apiPut.mockReset();
});
describe("ConfigTab — google-adk runtime (SSOT dropdown)", () => {
it("shows google-adk selected in the runtime dropdown (#ssot-fix)", async () => {
wireApi([
{ id: "claude-code", name: "Claude Code", runtime: "claude-code", models: [] },
{ id: "google-adk", name: "Google ADK", runtime: "google-adk", models: [] },
]);
render(<ConfigTab workspaceId="ws-adk" />);
const select = await waitFor(() => screen.getByRole("combobox", { name: /runtime/i }));
expect((select as HTMLSelectElement).value).toBe("google-adk");
const opts = Array.from((select as HTMLSelectElement).options).map((o) => o.value);
expect(opts).toContain("google-adk");
});
it("hides a template flagged displayable:false", async () => {
wireApi([
{ id: "google-adk", name: "Google ADK", runtime: "google-adk", models: [] },
{ id: "legacy", name: "Legacy", runtime: "legacy", models: [], displayable: false },
]);
render(<ConfigTab workspaceId="ws-adk" />);
const select = await waitFor(() => screen.getByRole("combobox", { name: /runtime/i }));
const opts = Array.from((select as HTMLSelectElement).options).map((o) => o.value);
expect(opts).toContain("google-adk");
expect(opts).not.toContain("legacy");
});
});
@@ -1,78 +0,0 @@
// @vitest-environment jsdom
//
// internal#718 P3 (retire-list #5) — the billing-mode the Config tab shows /
// sends must reflect the DERIVED provider per the registry, not the hardcoded
// billingModeForProvider("" | "platform" → platform_managed else byok) rule.
// When the runtime is registry-backed, billingModeForSelectedProvider reads the
// registry-served billing_mode off the provider catalog entry. The hardcoded
// rule remains only as the fallback for non-registry runtimes / older backends.
import { describe, it, expect } from "vitest";
import { billingModeForSelectedProvider, billingModeForProvider } from "../ConfigTab";
import {
buildProviderCatalogFromRegistry,
type RegistryProvider,
type RegistryModel,
} from "../../ProviderModelSelector";
const REGISTRY_PROVIDERS: RegistryProvider[] = [
{ name: "anthropic-oauth", display_name: "Claude Code subscription", auth_env: ["CLAUDE_CODE_OAUTH_TOKEN"], billing_mode: "byok" },
{ name: "platform", display_name: "Platform", auth_env: ["ANTHROPIC_API_KEY"], billing_mode: "platform_managed" },
// DISCRIMINATING fixture (review #7790): a provider whose registry-served
// billing_mode DISAGREES with the hardcoded name-based rule. Its name is not
// "platform"/"" so billingModeForProvider() would call it "byok", yet the
// registry serves "platform_managed" (the federation-ready shape the SSOT is
// built for — a managed provider that isn't literally named "platform").
// billingModeForSelectedProvider MUST return the REGISTRY value here; the
// only way to get "platform_managed" out is to honor the catalog, so this
// case fails if the impl ever regresses to the hardcoded rule.
{ name: "managed-federated", display_name: "Managed (federated)", auth_env: [], billing_mode: "platform_managed" },
];
const REGISTRY_MODELS: RegistryModel[] = [
{ id: "sonnet", provider: "anthropic-oauth", billing_mode: "byok" },
{ id: "anthropic/claude-opus-4-7", provider: "platform", billing_mode: "platform_managed" },
// model bucketed under the disagreeing provider so the catalog builds an
// entry for it (buildProviderCatalogFromRegistry only emits a provider entry
// for providers that own at least one model).
{ id: "managed/some-model", provider: "managed-federated", billing_mode: "platform_managed" },
];
describe("billingModeForSelectedProvider (registry-driven)", () => {
const catalog = buildProviderCatalogFromRegistry(REGISTRY_PROVIDERS, REGISTRY_MODELS);
it("reads platform_managed from the registry for the platform provider", () => {
expect(billingModeForSelectedProvider("platform", catalog)).toBe("platform_managed");
});
it("reads byok from the registry for a BYOK provider", () => {
// anthropic-oauth derives to byok via the REGISTRY. (Note: the hardcoded
// rule would ALSO say byok for this non-'platform' name, so on its own this
// assertion does NOT prove the registry is authoritative — it agrees either
// way. The registry-WINS proof is the disagreement case below.)
expect(billingModeForSelectedProvider("anthropic-oauth", catalog)).toBe("byok");
});
it("lets the registry billing_mode WIN when it disagrees with the hardcoded rule", () => {
// 'managed-federated' is not '' / 'platform', so the legacy name-based rule
// classifies it byok — but the registry serves platform_managed. The
// registry is the SSOT, so billingModeForSelectedProvider must return
// platform_managed. This is the discriminating case: it FAILS if the impl
// regresses to billingModeForProvider (which would return byok here).
expect(billingModeForProvider("managed-federated")).toBe("byok"); // sanity: the rules genuinely disagree
expect(billingModeForSelectedProvider("managed-federated", catalog)).toBe("platform_managed");
});
it("falls back to the hardcoded rule when no registry catalog is supplied", () => {
// Non-registry runtime / older backend → catalog empty/undefined → the
// legacy mapping still applies ('' | 'platform' → platform_managed).
expect(billingModeForSelectedProvider("", undefined)).toBe("platform_managed");
expect(billingModeForSelectedProvider("platform", undefined)).toBe("platform_managed");
expect(billingModeForSelectedProvider("minimax", undefined)).toBe("byok");
});
it("falls back to the hardcoded rule when the provider is not in the registry catalog", () => {
// A provider string the registry catalog doesn't carry (stale saved
// value) → fall back to the legacy rule rather than guessing.
expect(billingModeForSelectedProvider("some-byo-vendor", catalog)).toBe("byok");
});
});
@@ -297,25 +297,6 @@ describe("DetailsTab — delete workflow", () => {
expect(mockSelectNode).toHaveBeenCalledWith(null);
});
// internal#734: checking "also erase saved data" adds &erase_data=true so the
// server prunes the data volume. Default (unchecked) must NOT send it.
it("checking erase-saved-data sends erase_data=true on delete", async () => {
mockApi.del.mockResolvedValue(undefined);
render(<DetailsTab workspaceId="ws-1" data={data()} />);
await flush();
fireEvent.click(screen.getByRole("button", { name: /delete workspace/i }));
await flush();
fireEvent.click(screen.getByRole("checkbox", { name: /erase saved data/i }));
const confirmBtn = Array.from(document.querySelectorAll("button")).find(
(b) => b.textContent === "Confirm Delete",
) as HTMLButtonElement;
fireEvent(confirmBtn, new MouseEvent("click", { bubbles: true }));
await flush();
expect(mockApi.del).toHaveBeenCalledWith("/workspaces/ws-1?confirm=true&erase_data=true", {
headers: { "X-Confirm-Name": "Test Workspace" },
});
});
it("cancelling delete returns to view mode", async () => {
mockApi.del.mockResolvedValue(undefined);
render(<DetailsTab workspaceId="ws-1" data={data()} />);
@@ -2,13 +2,12 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { cleanup, fireEvent, render, screen, waitFor } from "@testing-library/react";
const { mockGet, mockPost, mockRFBConstructor, mockRFBClipboardPasteFrom, mockRFBFocus, rfbInstances } = vi.hoisted(() => ({
const { mockGet, mockPost, mockRFBConstructor, mockRFBClipboardPasteFrom, mockRFBFocus } = vi.hoisted(() => ({
mockGet: vi.fn(),
mockPost: vi.fn(),
mockRFBConstructor: vi.fn(),
mockRFBClipboardPasteFrom: vi.fn(),
mockRFBFocus: vi.fn(),
rfbInstances: [] as EventTarget[],
}));
vi.mock("@/lib/api", () => ({
@@ -32,7 +31,6 @@ vi.mock("@novnc/novnc", () => ({
this.url = url;
this.options = options;
mockRFBConstructor(target, url, options);
rfbInstances.push(this);
}
clipboardPasteFrom(text: string) {
mockRFBClipboardPasteFrom(text);
@@ -54,7 +52,6 @@ describe("DisplayTab", () => {
mockRFBConstructor.mockReset();
mockRFBClipboardPasteFrom.mockReset();
mockRFBFocus.mockReset();
rfbInstances.length = 0;
});
it("renders unavailable state for non-display workspaces", async () => {
@@ -403,62 +400,6 @@ describe("DisplayTab", () => {
});
expect(screen.getByRole("button", { name: "Take control" })).toBeTruthy();
});
it("auto-reconnects the desktop stream after an unclean disconnect but not a clean one", async () => {
mockGet
.mockResolvedValueOnce({
available: true,
mode: "desktop-control",
protocol: "novnc",
width: 1920,
height: 1080,
})
.mockResolvedValueOnce({ controller: "none" });
// Initial acquire returns token "signed"; the reconnect re-acquire mints a
// FRESH token "signed2" (the lock/token is only ~300s — reconnecting with a
// cached, possibly-expired token would 401 and never recover).
mockPost
.mockResolvedValueOnce({
controller: "user",
controlled_by: "admin-token",
expires_at: "2026-05-23T08:48:27Z",
session_url: "/workspaces/ws-display/display/session/websockify#token=signed",
})
.mockResolvedValue({
controller: "user",
controlled_by: "admin-token",
expires_at: "2026-05-23T08:53:27Z",
session_url: "/workspaces/ws-display/display/session/websockify#token=signed2",
});
render(<DisplayTab workspaceId="ws-display" />);
await waitFor(() => {
expect(screen.getByRole("button", { name: "Take control" })).toBeTruthy();
});
fireEvent.click(screen.getByRole("button", { name: "Take control" }));
await waitFor(() => {
expect(rfbInstances.length).toBe(1);
});
expect(mockRFBConstructor.mock.calls[0][2].wsProtocols).toContain("molecule-display-token.signed");
// An idle/network drop closes the websocket uncleanly. The client must
// re-acquire a fresh token and reconnect instead of giving up — this is the
// "disconnects every ~5 min and stays dead" report.
rfbInstances[0].dispatchEvent(new CustomEvent("disconnect", { detail: { clean: false } }));
await waitFor(
() => {
expect(rfbInstances.length).toBe(2);
},
{ timeout: 3000 },
);
// Reconnect dialed with the FRESH token, not the stale original.
expect(mockRFBConstructor.mock.calls[1][2].wsProtocols).toContain("molecule-display-token.signed2");
// A clean disconnect (the user released control) must NOT reconnect.
rfbInstances[1].dispatchEvent(new CustomEvent("disconnect", { detail: { clean: true } }));
await new Promise((resolve) => setTimeout(resolve, 1100));
expect(rfbInstances.length).toBe(2);
});
});
function deferred<T>() {
-1
View File
@@ -5,7 +5,6 @@
const RUNTIME_NAMES: Record<string, string> = {
"claude-code": "Claude Code",
codex: "Codex",
"google-adk": "Google ADK",
hermes: "Hermes",
openclaw: "OpenClaw",
kimi: "Kimi",
-3
View File
@@ -368,9 +368,6 @@ export interface WorkspaceCompute {
width?: number;
height?: number;
};
// internal#734: per-workspace durable-data choice. "persist" | "ephemeral" |
// undefined (auto). Controls whether the data volume survives recreate.
data_persistence?: string;
}
let socket: ReconnectingSocket | null = null;
+8 -25
View File
@@ -159,28 +159,15 @@ services:
# --- Canvas ---
canvas:
# The publish-canvas-image CI workflow runs an ORDERED deploy (core#2226):
# build → push :staging-<sha> + :staging-latest → (after green main CI)
# re-point :latest to the verified :staging-<sha> by digest. So both tags
# below resolve to a CI-green, reproducible build, never a raw/red one.
#
# Reproducible deploy: pin CANVAS_IMAGE_TAG to the immutable per-commit tag
# the ordered deploy produced, e.g.
# CANVAS_IMAGE_TAG=staging-<sha> docker compose pull canvas && docker compose up -d canvas
# This makes a tenant/host deploy reproducible (resolves the standing
# `TODO: pin canvas ECR image digest`). Unset it and the default `latest`
# is the prod-blessed tag the ordered deploy keeps pointed at the last
# green build — still deterministic vs. the old raw `:latest`.
#
# To pin by content digest instead of tag (fully immutable):
# aws ecr describe-images --repository-name molecule-ai/canvas \
# --image-tags staging-<sha> --region us-east-2 \
# --query 'imageDetails[0].imageDigest' --output text
# then set CANVAS_IMAGE_TAG=staging-<sha>@<digest> (compose passes it through).
#
# The publish-canvas-image CI workflow pushes a fresh image to GHCR on
# every canvas/** merge to main. To update the running container:
# docker compose pull canvas && docker compose up -d canvas
# First-time local setup or testing unreleased changes — build from source:
# docker compose build canvas && docker compose up -d canvas
# Note: ECR images require AWS auth — `aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 153263036946.dkr.ecr.us-east-2.amazonaws.com` before pull.
# Local dev keeps working via the `build:` context below (docker compose build canvas).
image: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas:${CANVAS_IMAGE_TAG:-latest}
# Digest-pin requires: aws ecr describe-images --repository-name molecule-ai/canvas --image-tags latest --query 'imageDetails[0].imageDigest'
# TODO: pin canvas ECR image digest once AWS creds are available in CI.
image: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas:latest
build:
context: ./canvas
dockerfile: Dockerfile
@@ -188,10 +175,6 @@ services:
NEXT_PUBLIC_PLATFORM_URL: ${NEXT_PUBLIC_PLATFORM_URL:-http://localhost:${PLATFORM_PUBLISH_PORT:-8080}}
NEXT_PUBLIC_WS_URL: ${NEXT_PUBLIC_WS_URL:-ws://localhost:${PLATFORM_PUBLISH_PORT:-8080}/ws}
NEXT_PUBLIC_ADMIN_TOKEN: ${ADMIN_TOKEN:-}
# SHA surfaced at /api/buildinfo (core#2235). CI passes the real merge
# SHA via the publish-canvas-image workflow build-args; local compose
# builds default to "dev" (the route's unwired sentinel).
BUILD_SHA: ${BUILD_SHA:-dev}
depends_on:
platform:
condition: service_healthy
+6 -6
View File
@@ -1,7 +1,7 @@
# Molecule AI — Comprehensive Technical Documentation
> Definitive technical reference for the Molecule AI Agent Team platform.
> Based on a full non-invasive scan of the [molecule-core](https://git.moleculesai.app/molecule-ai/molecule-core) repository.
> Based on a full non-invasive scan of the [molecule-monorepo](https://git.moleculesai.app/molecule-ai/molecule-monorepo) repository.
---
@@ -1131,11 +1131,11 @@ Molecule AI's workspace abstraction is **runtime-agnostic by design**. A workspa
## Links
- **GitHub**: https://git.moleculesai.app/molecule-ai/molecule-core
- **Architecture Docs**: https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/architecture
- **API Protocol**: https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/api-protocol
- **Agent Runtime**: https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/agent-runtime
- **Product Docs**: https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/product
- **GitHub**: https://git.moleculesai.app/molecule-ai/molecule-monorepo
- **Architecture Docs**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/architecture
- **API Protocol**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/api-protocol
- **Agent Runtime**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/agent-runtime
- **Product Docs**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/product
---
+1 -1
View File
@@ -114,7 +114,7 @@ Opt-in pattern: when `idle_prompt` is non-empty in `config.yaml`, the workspace
Three Gin middleware classes gate server-side routes. Full contract in `docs/runbooks/admin-auth.md`.
- **`middleware.AdminAuth(db.DB)`** — strict bearer-only and **fail-closed in every environment** (harden/no-fail-open-auth). Used for any route where a forged request could leak prompts/memory, create/mutate workspaces, or leak ops intel. The former lazy-bootstrap fail-open (pass when `HasAnyLiveTokenGlobal` returns 0) and the dev-mode escape hatch have both been removed — a fresh install must provision `ADMIN_TOKEN` to reach admin routes.
- **`middleware.AdminAuth(db.DB)`** — strict bearer-only. Used for any route where a forged request could leak prompts/memory, create/mutate workspaces, or leak ops intel. Lazy-bootstrap fail-open when `HasAnyLiveTokenGlobal` returns 0.
- **`middleware.CanvasOrBearer(db.DB)`** — accepts a bearer token OR an Origin matching `CORS_ORIGINS`. Used **only** for cosmetic routes where a forged request has zero data/security impact. Currently only on `PUT /canvas/viewport`. Do not extend this to any route that leaks data or creates resources — see the runbook.
- **`middleware.WorkspaceAuth(db.DB)`** — binds a bearer token to `:id`. Workspace A's token cannot hit workspace B's sub-routes. Used for the entire `/workspaces/:id/*` group except the A2A proxy (which has its own `CanCommunicate` layer).
+1 -1
View File
@@ -82,7 +82,7 @@ DATABASE_URL=postgres://dev:dev@postgres:5432/molecule?sslmode=prefer
REDIS_URL=redis://redis:6379
PORT=8080
SECRETS_ENCRYPTION_KEY=dev-key-change-in-production
WORKSPACE_DIR=/path/to/molecule-core # Optional global fallback; prefer per-workspace workspace_dir in org.yaml or API
WORKSPACE_DIR=/path/to/molecule-monorepo # Optional global fallback; prefer per-workspace workspace_dir in org.yaml or API
```
### Canvas (Next.js)
+5 -3
View File
@@ -16,9 +16,11 @@ workspace container running on it) over an [EC2 Instance Connect
Endpoint](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-setup-ec2-instance-connect-endpoint.html).
End users see a terminal; no direct public SSH ingress is required.
Tracking: originally `molecule-core#1528` (resolved 2026-04-22). Future
terminal work is tracked in `molecule-core` issues (workspace-server scope)
and in `molecule-controlplane` issues for the EIC / per-tenant SG path.
Tracking: originally `molecule-core#1528` (resolved 2026-04-22). The
`molecule-core` repo has since been renamed to `molecule-monorepo` and no
longer accepts new issues under the old name; future terminal work is
tracked in `molecule-monorepo` issues (workspace-server scope) and in
`molecule-controlplane` issues for the EIC / per-tenant SG path.
## Where things are
+1 -1
View File
@@ -64,7 +64,7 @@ When opencode connects to the Molecule MCP endpoint, the agent gains access to:
"tool": "delegate_task",
"arguments": {
"target": "research-lead",
"task": "Summarise the last 7 days of commits in Molecule-AI/molecule-core"
"task": "Summarise the last 7 days of commits in Molecule-AI/molecule-monorepo"
}
}
```
+7 -7
View File
@@ -1,6 +1,6 @@
# Internal content policy
The `Molecule-AI/molecule-core` repo is **public**. Anything internal
The `Molecule-AI/molecule-monorepo` repo is **public**. Anything internal
(positioning, competitive briefs, sales playbooks, PMM/press drip, draft
campaigns, raw research notes, ops runbooks, retrospectives) lives in
**`Molecule-AI/internal`**.
@@ -18,14 +18,14 @@ This page is the canonical decision tree.
| Draft campaign asset (still iterating, not yet customer-visible) | `Molecule-AI/internal/marketing/campaigns/` |
| Roadmap discussion, planning doc, retrospective | `Molecule-AI/internal/PLAN.md` or `Molecule-AI/internal/retrospectives/` |
| Runbook, ops procedure, incident postmortem | `Molecule-AI/internal/runbooks/` |
| **Public-ready** blog post (final draft, ready to ship to docs site) | `Molecule-AI/molecule-core/docs/blog/` |
| **Public-ready** tutorial / quickstart | `Molecule-AI/molecule-core/docs/tutorials/` |
| Public DevRel content (code samples, demos for users) | `Molecule-AI/molecule-core/docs/devrel/` |
| API reference, architecture docs for external developers | `Molecule-AI/molecule-core/docs/api/` |
| **Public-ready** blog post (final draft, ready to ship to docs site) | `Molecule-AI/molecule-monorepo/docs/blog/` |
| **Public-ready** tutorial / quickstart | `Molecule-AI/molecule-monorepo/docs/tutorials/` |
| Public DevRel content (code samples, demos for users) | `Molecule-AI/molecule-monorepo/docs/devrel/` |
| API reference, architecture docs for external developers | `Molecule-AI/molecule-monorepo/docs/api/` |
| Code, tests, infrastructure | wherever is appropriate inside this repo |
**Rule of thumb:** *"Would I be comfortable if a competitor / journalist / customer
read this verbatim today?"* — yes → `molecule-core/docs/`. No / not yet → `internal/`.
read this verbatim today?"* — yes → `monorepo/docs/`. No / not yet → `internal/`.
## Why
@@ -82,7 +82,7 @@ git push -u origin HEAD
gh pr create --base main --fill
```
Yes, this is more steps than `cd molecule-core && git add research/foo.md`.
Yes, this is more steps than `cd molecule-monorepo && git add research/foo.md`.
That cost is intentional: the friction is the point. Public space and
internal space are different products with different audiences and
different durability guarantees.
+7 -13
View File
@@ -17,14 +17,14 @@ This path is aligned to the current repository and current UI. It gets you from
## The one-command path
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
cd molecule-core
git clone https://git.moleculesai.app/molecule-ai/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
```
That single script:
1. Generates an `ADMIN_TOKEN` into `.env` (first run only — preserved on re-runs) and exports the matching `NEXT_PUBLIC_ADMIN_TOKEN` so the canvas authenticates with it. Auth is **fail-closed in every environment** (including local dev) — there is no dev-mode fail-open; the canvas reaches admin/workspace routes only because it sends this bearer.
1. Generates an `ADMIN_TOKEN` into `.env` (first run only — preserved on re-runs)
2. Brings up Postgres, Redis, Langfuse, ClickHouse, and Temporal via `infra/scripts/setup.sh`
3. Populates the workspace template + plugin registry from `manifest.json`
4. Builds and starts the platform on `http://localhost:8080`
@@ -42,8 +42,8 @@ If you'd rather run each component yourself — useful when you're iterating on
### Step 1: Clone the repository
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
cd molecule-core
git clone https://git.moleculesai.app/molecule-ai/molecule-monorepo.git
cd molecule-monorepo
```
### Step 2: Start the shared infrastructure
@@ -62,17 +62,11 @@ If you only want the raw compose flow:
docker compose -f docker-compose.infra.yml up -d
```
> **Auth is fail-closed even in local dev.** Pick any local admin token and
> set it on *both* sides — the platform (`ADMIN_TOKEN`) and the canvas
> (`NEXT_PUBLIC_ADMIN_TOKEN`, same value). Without it the canvas 401s on every
> admin/workspace call. (`scripts/dev-start.sh` does this for you; the manual
> steps below set it explicitly.)
### Step 3: Start the platform
```bash
cd workspace-server
ADMIN_TOKEN=dev-local-admin-token MOLECULE_ENV=development go run ./cmd/server
go run ./cmd/server
```
The control plane listens on `http://localhost:8080`.
@@ -84,7 +78,7 @@ In a new terminal:
```bash
cd canvas
npm install
NEXT_PUBLIC_ADMIN_TOKEN=dev-local-admin-token npm run dev # MUST match ADMIN_TOKEN above
npm run dev
```
Open `http://localhost:3000`.
+4 -32
View File
@@ -1,29 +1,5 @@
# Admin Authentication Runbook
## Auth is fail-CLOSED in every environment — `ADMIN_TOKEN` is the bootstrap credential
Per the CTO "nothing should be fail-open" directive, **every** auth path on the
workspace-server fails closed — there is no dev-mode / zero-token / DB-outage
hatch that grants access. This includes:
- `AdminAuth` and `WorkspaceAuth` (admin + per-workspace routes),
- `CanvasOrBearer` (the cosmetic `PUT /canvas/viewport` route), and
- `validateDiscoveryCaller` (`/registry/:id/peers`, `/registry/discover/:id`).
Consequence for **bootstrap**: a brand-new self-hosted / dev install has **no
DB-backed tokens yet**, and there is no longer a fail-open that lets the first
request through. The **only** way to reach admin routes (and to mint the first
workspace token via `POST /admin/workspaces/:id/tokens`) is to set `ADMIN_TOKEN`
in the platform environment and present it as the bearer. This is the "local
mimics production" principle: there is no zero-config bootstrap.
- **Local dev:** `scripts/dev-start.sh` provisions a deterministic
`ADMIN_TOKEN` into `.env` (and exports the matching `NEXT_PUBLIC_ADMIN_TOKEN`
so the canvas authenticates with it). See `docs/quickstart.md`.
- **Self-hosted / SaaS:** set `ADMIN_TOKEN` to a strong random secret
(`openssl rand -base64 32`) in the platform env and bake the matching
`NEXT_PUBLIC_ADMIN_TOKEN` into the canvas bundle.
## Required: set `MOLECULE_ENV` in all non-dev environments
```bash
@@ -31,10 +7,8 @@ mimics production" principle: there is no zero-config bootstrap.
MOLECULE_ENV=production
```
This matches the production tenant default. NOTE: `MOLECULE_ENV` no longer gates
any auth decision — it only drives NON-security local-dev conveniences (loopback
bind, relaxed rate limit). Setting it to `dev`/`development` does **not** relax
authentication. Staging and production smoke tests should use the real user/API
This matches the production tenant default and disables development-only
shortcuts. Staging and production smoke tests should use the real user/API
workflow: create a workspace, then mint a one-time displayed workspace bearer
with `POST /admin/workspaces/:id/tokens`.
@@ -49,7 +23,5 @@ The platform uses `ADMIN_TOKEN` as the bearer credential for admin-gated endpoin
| `POST /org/import` | `Authorization: Bearer <ADMIN_TOKEN>` |
| `POST /admin/workspaces/:id/tokens` | `Authorization: Bearer <ADMIN_TOKEN>`; plaintext token returned once |
Missing or invalid bearer → **401 in every environment** (fail-closed; no
dev-mode fail-open). If the auth datastore is unreachable, auth-gated routes
return **503** (`platform_unavailable`) — an availability tradeoff that grants no
access — rather than allowing the request through.
Missing or invalid `ADMIN_TOKEN` → AdminAuth fails open in dev mode (no token set), or
returns 401 in production mode (token set but invalid).
@@ -1,124 +0,0 @@
# Engineer-Agent Gitea Token Scope Runbook
## Symptom
Engineer-class agents (e.g. `agent-dev-a`, `agent-dev-b`) fail swarm-pull issue discovery or receive HTTP 403 when calling Gitea issue-list APIs, while PR review and repository API operations continue to work.
Typical failing call:
```bash
GET /api/v1/repos/molecule-ai/molecule-core/issues?state=open&labels=approved&limit=50
# => 403 Forbidden
```
Typical working calls (same token):
```bash
GET /api/v1/repos/molecule-ai/molecule-core/pulls?state=open&limit=50
POST /api/v1/repos/molecule-ai/molecule-core/pulls/1666/comments
# => 200 OK
```
## Root Cause
Gitea v1.22.6 routes issue-list under the `Issue` scope category (`routers/api/v1/api.go:1379-1491`), while PR routes live under repository/pull routing (`api.go:1278-1305`). The scope gate derives required read/write level from HTTP method (`api.go:309-313`), so `GET /issues?...` requires `read:issue`.
Engineer-class agent PATs were provisioned with repository and PR scopes but without `read:issue`, causing the asymmetric 403.
## Detection
1. **Agent-side**: swarm-pull workflow logs show `403 Forbidden` on issue enumeration but not on PR list/review.
2. **Platform-side**: Gitea access logs show `GET /repos/{owner}/{repo}/issues` returning 403 for the affected token.
3. **Reproduction** (from any workspace with a suspected token):
```bash
TOKEN=$(cat /configs/secrets.d/GITEA_TOKEN)
PLATFORM="https://git.moleculesai.app"
# Should succeed — confirms token is live
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: token $TOKEN" \
"$PLATFORM/api/v1/user"
# Will 403 if the token lacks read:issue
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: token $TOKEN" \
"$PLATFORM/api/v1/repos/molecule-ai/molecule-core/issues?state=open&limit=1"
```
## Immediate Fix
### Step 1: Issue fresh PATs with correct scopes
From a Gitea site-admin account (or via the Gitea web UI → Settings → Applications):
1. Navigate to the affected user's profile (e.g. `agent-dev-a`).
2. Go to **Settings → Applications → Generate New Token**.
3. Select scopes:
- `read:repository` (existing)
- `write:repository` (existing, if push is required)
- `read:issue` (**add this**)
- `write:issue` (add only if agents must comment/edit issues)
- `read:pull-request` / `write:pull-request` (existing)
- `read:comment` / `write:comment` (existing, if PR review is required)
4. Copy the plaintext token immediately — it is shown only once.
### Step 2: Update workspace secrets
For each affected engineer workspace, update the Gitea token secret:
```bash
# Via the platform API (admin auth required)
PLATFORM="https://agents-team.moleculesai.app"
ADMIN_TOKEN="<your-admin-token>"
WORKSPACE_ID="<affected-workspace-id>"
NEW_GITEA_TOKEN="<fresh-token-from-step-1>"
curl -X POST "$PLATFORM/workspaces/$WORKSPACE_ID/secrets" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"GITEA_TOKEN\": \"$NEW_GITEA_TOKEN\"
}"
```
Restart the workspace so the runtime re-reads secrets:
```bash
curl -X POST "$PLATFORM/workspaces/$WORKSPACE_ID/restart" \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Step 3: Smoke-test
From the restarted workspace, verify all three paths:
```bash
# 1. Issue list (the previously failing path)
curl -s -H "Authorization: token $GITEA_TOKEN" \
"https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-core/issues?state=open&labels=approved&limit=1" | jq '.[0].number'
# 2. PR list (should still work)
curl -s -H "Authorization: token $GITEA_TOKEN" \
"https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-core/pulls?state=open&limit=1" | jq '.[0].number'
# 3. Swarm-pull discovery (end-to-end)
# Trigger the agent's autonomous tick or delegate a task that enumerates open issues.
```
## Long-Term Fix
Update the **workspace secret injection path** that writes `/configs/secrets.d/GITEA_TOKEN` for engineer-class agents. The provisioning template or secret-distribution job should request `read:issue` (and optionally `write:issue`) at token-creation time.
File locations to audit:
- `.gitea/scripts/` — any token-provisioning automation
- `infra/terraform/` or equivalent — IAM/secret-manager templates
- `workspace-configs-templates/` — engineer-class workspace templates that declare required secrets
## Prevention
1. **Token scope checklist**: when provisioning new engineer-class agent tokens, verify the scope set includes `read:issue` before distributing the secret.
2. **Monitoring**: add an agent health-check that probes `GET /repos/molecule-ai/molecule-core/issues?limit=1` and surfaces a non-fatal warning if it returns 403.
3. **Documentation**: update the onboarding runbook for new engineer agents to include the full required scope list.
## References
- Gitea issue #1750: [RCA: engineer-token read:issue scope gap blocks swarm-pull workflow](https://git.moleculesai.app/molecule-ai/molecule-core/issues/1750)
- Gitea source: `routers/api/v1/api.go:309-313` (scope gate), `api.go:1278-1305` (PR routing), `api.go:1379-1491` (issue routing)
- Related: PR #1542 (provisioner git-creds injection), PR #1669 (auth_token inline mint)
-11
View File
@@ -1,16 +1,5 @@
# Running a Gemini CLI Workspace on Molecule AI
> **⚠️ Accuracy correction (2026-05-29):** this page is **aspirational, not
> shipped.** There is **no `gemini-cli` runtime** in `manifest.json` or the
> provisioner's `knownRuntimes`, and the "PR #379" cited below is unrelated (a
> CI-workflow-cleanup PR, not a gemini-cli adapter). Do not follow this as-is.
>
> **For Gemini on Molecule, use the real `google-adk` runtime instead** — see
> [`google-adk-runtime.md`](./google-adk-runtime.md) (ADK engine + Gemini on
> Vertex AI/AI Studio), implemented in PR
> [`molecule-ai-workspace-template-google-adk#1`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-google-adk) per RFC `internal#730`.
> This gemini-cli page is retained only until it's either implemented for real or removed.
Molecule AI now ships a `gemini-cli` runtime adapter alongside the existing `claude-code` adapter. This tutorial walks you from zero to a running Gemini agent workspace in under five minutes.
## What you'll need

Some files were not shown because too many files have changed in this diff Show More