Compare commits

..

2 Commits

Author SHA1 Message Date
Molecule AI Dev Engineer A (Kimi) 4fee8b9d86 docs(readme): remove duplicate trailing heading
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
audit-force-merge / audit (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 8s
Check migration collisions / Migration version collision check (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 26s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 34s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 55s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
Harness Replays / detect-changes (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m21s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m11s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 5m7s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 6s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m24s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m5s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
qa-review / approved (pull_request) Successful in 4s
security-review / approved (pull_request) Failing after 4s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 2s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m23s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 8m11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Waiting to run
gate-check-v3 / gate-check (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
The PR appended a redundant # molecule-core heading after the license
section. The README already contains a full Quick Start section and the
hero header; the trailing duplicate served no purpose and broke document
structure.

Addresses review feedback on PR #1837.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 11:51:30 +00:00
Molecule AI Dev Engineer B (MiniMax) 6d802abcd1 docs: add quick-start context to README
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
audit-force-merge / audit (pull_request) Has been skipped
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
Check migration collisions / Migration version collision check (pull_request) Successful in 9s
CI / Detect changes (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
CI / all-required (pull_request) Successful in 45s
Harness Replays / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 17s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 8s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 38s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Bypassed by agent-dev-a
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 53s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 9s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Bypassed by agent-dev-a
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m18s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m20s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m6s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m25s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m25s
sop-tier-check / tier-check (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 56s
gate-check-v3 / gate-check (pull_request) Bypassed by agent-dev-a
sop-checklist / na-declarations (pull_request) Bypassed by agent-dev-a
qa-review / approved (pull_request) Bypassed by agent-dev-a
security-review / approved (pull_request) Bypassed by agent-dev-a
sop-checklist / approved (pull_request) Bypassed by agent-dev-a
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 5m1s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m21s
CI / Platform (Go) (pull_request) Has been cancelled
CI / Canvas (Next.js) (pull_request) Has been cancelled
CI / Shellcheck (E2E scripts) (pull_request) Has been cancelled
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
No production code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 07:38:11 +00:00
356 changed files with 4781 additions and 30543 deletions
+1 -1
View File
@@ -51,7 +51,7 @@ MOLECULE_ENV=development # Environment label (development/
# MOLECULE_IN_DOCKER= # Set when running the platform inside Docker (accepts 1/0, true/false). Triggers A2A proxy to rewrite 127.0.0.1:<port> agent URLs to Docker bridge hostnames. Auto-detected via /.dockerenv; only set if detection fails or to force off.
# GitHub
# GITHUB_REPO=owner/repo # Target repo for agent initial_prompt clone (e.g. Molecule-AI/molecule-core). Read inside workspace containers.
# GITHUB_REPO=owner/repo # Target repo for agent initial_prompt clone (e.g. Molecule-AI/molecule-monorepo). Read inside workspace containers.
# GITHUB_TOKEN= # Personal access token / installation token used by agents that clone private repos. Register as a global secret via POST /admin/secrets for propagation to workspace env. Token is used in-URL during clone and then scrubbed from .git/config via `git remote set-url`.
# Webhooks
+12 -28
View File
@@ -18,24 +18,15 @@
# per §SOP-6 security model). No-op when merged=false.
#
# Required env (set by the workflow):
# GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER
# plus one of REQUIRED_CHECKS_JSON (preferred) or REQUIRED_CHECKS (legacy)
# GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER, REQUIRED_CHECKS
#
# REQUIRED_CHECKS_JSON is a JSON object keyed by branch name. Each value
# is an array of status-check context names that branch protection
# requires for that branch. The script looks up the PR's base branch and
# evaluates only the checks declared for that branch.
#
# {"main": ["CI / all-required (pull_request)", ...],
# "staging": ["CI / all-required (pull_request)", ...]}
#
# REQUIRED_CHECKS (legacy) is a newline-separated list used when the
# JSON variable is not set. Declared in the workflow YAML rather than
# fetched from /branch_protections (which needs admin scope — sop-tier-bot
# has read-only). Trade dynamism for simplicity: when the required-check
# set changes, update both branch protection AND this env. Keeping them
# in sync is less complexity than granting the audit bot admin perms on
# every repo.
# REQUIRED_CHECKS is a newline-separated list of status-check context
# names that branch protection requires. Declared in the workflow YAML
# rather than fetched from /branch_protections (which needs admin
# scope — sop-tier-bot has read-only). Trade dynamism for simplicity:
# when the required-check set changes, update both branch protection
# AND this env. Keeping them in sync is less complexity than granting
# the audit bot admin perms on every repo.
set -euo pipefail
@@ -43,10 +34,7 @@ set -euo pipefail
: "${GITEA_HOST:?required}"
: "${REPO:?required}"
: "${PR_NUMBER:?required}"
if [ -z "${REQUIRED_CHECKS_JSON:-}" ] && [ -z "${REQUIRED_CHECKS:-}" ]; then
echo "::error::Either REQUIRED_CHECKS_JSON or REQUIRED_CHECKS must be set"
exit 1
fi
: "${REQUIRED_CHECKS:?required (newline-separated context names)}"
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
@@ -77,14 +65,10 @@ if [ -z "$MERGE_SHA" ]; then
exit 0
fi
# 2. Required status checks — branch-aware JSON dict takes precedence.
if [ -n "${REQUIRED_CHECKS_JSON:-}" ]; then
REQUIRED=$(echo "$REQUIRED_CHECKS_JSON" | jq -r --arg branch "$BASE_BRANCH" '.[$branch] // [] | .[]')
else
REQUIRED="$REQUIRED_CHECKS"
fi
# 2. Required status checks declared in the workflow env.
REQUIRED="$REQUIRED_CHECKS"
if [ -z "${REQUIRED//[[:space:]]/}" ]; then
echo "::notice::REQUIRED_CHECKS empty for branch '$BASE_BRANCH' — force-merge not applicable."
echo "::notice::REQUIRED_CHECKS empty — force-merge not applicable."
exit 0
fi
+14 -28
View File
@@ -274,8 +274,7 @@ def required_checks_env(audit_doc: dict) -> set[str]:
found.append(v)
if not found:
sys.stderr.write(
f"::error::REQUIRED_CHECKS env not found in any step of "
f"{AUDIT_WORKFLOW_PATH}\n"
f"::error::REQUIRED_CHECKS env not found in any step of {AUDIT_WORKFLOW_PATH}\n"
)
sys.exit(3)
if len(found) > 1:
@@ -385,15 +384,10 @@ def detect_drift(branch: str) -> tuple[list[str], dict]:
contexts = set(protection.get("status_check_contexts") or [])
# ----- F1: job exists in CI but not under sentinel.needs -----
# Post-#1766 contract: the sentinel may deliberately have no `needs:`
# and instead poll path-relevant statuses dynamically. In that case
# F1 is a false positive — skip it. F1b (typos in existing needs)
# is naturally skipped when needs is empty.
missing_from_needs = sorted(jobs - needs)
if missing_from_needs and needs:
if missing_from_needs:
findings.append(
"F1 — jobs in ci.yml NOT under sentinel `needs:` "
"(sentinel doesn't gate them):\n"
"F1 — jobs in ci.yml NOT under sentinel `needs:` (sentinel doesn't gate them):\n"
+ "\n".join(f" - {n}" for n in missing_from_needs)
)
@@ -403,8 +397,7 @@ def detect_drift(branch: str) -> tuple[list[str], dict]:
stale_needs = sorted(needs - jobs_all)
if stale_needs:
findings.append(
"F1b — sentinel `needs:` lists jobs NOT present in ci.yml "
"(typo or removed job):\n"
"F1b — sentinel `needs:` lists jobs NOT present in ci.yml (typo or removed job):\n"
+ "\n".join(f" - {n}" for n in stale_needs)
)
@@ -412,9 +405,7 @@ def detect_drift(branch: str) -> tuple[list[str], dict]:
# Compute the contexts the CI YAML actually produces. The sentinel
# is in (B) intentionally (`ci / all-required (pull_request)`); we
# whitelist it explicitly.
emitted_contexts = {
expected_context(j) for j in jobs
} | {expected_context(SENTINEL_JOB)}
emitted_contexts = {expected_context(j) for j in jobs} | {expected_context(SENTINEL_JOB)}
# Contexts NOT produced by ci.yml may still come from other
# workflows in the repo (Secret scan etc). We can't enumerate
# every workflow's emissions cheaply; instead, flag only contexts
@@ -427,9 +418,8 @@ def detect_drift(branch: str) -> tuple[list[str], dict]:
)
if stale_protection:
findings.append(
"F2 — protection `status_check_contexts` entries with `ci / ` "
"prefix that NO job in ci.yml emits "
"(stale name → silent advisory gate):\n"
"F2 — protection `status_check_contexts` entries with `ci / ` prefix that NO "
"job in ci.yml emits (stale name → silent advisory gate):\n"
+ "\n".join(f" - {c}" for c in stale_protection)
)
@@ -504,8 +494,7 @@ def render_body(branch: str, findings: list[str], debug: dict) -> str:
f"# Drift detected on `{REPO}/{branch}`",
"",
"Auto-filed by `.gitea/workflows/ci-required-drift.yml` "
"(RFC [internal#219]"
"(https://git.moleculesai.app/molecule-ai/internal/issues/219) §4 + §6).",
"(RFC [internal#219](https://git.moleculesai.app/molecule-ai/internal/issues/219) §4 + §6).",
"",
"## Findings",
"",
@@ -516,11 +505,8 @@ def render_body(branch: str, findings: list[str], debug: dict) -> str:
"",
"## Resolution",
"",
"- **F1 / F1b**: if the sentinel job has a `needs:` block, add "
"the missing job to it in `.gitea/workflows/ci.yml`, or remove "
"the stale entry. If the sentinel deliberately has no `needs:` "
"(path-aware polling sentinel per post-#1766 contract), this "
"finding is expected and F1 is skipped.",
"- **F1 / F1b**: add the missing job to `all-required.needs:` "
"in `.gitea/workflows/ci.yml`, or remove the stale entry.",
"- **F2**: rename the protection context to match an emitter, "
"or remove it from `status_check_contexts` "
"(PATCH `/api/v1/repos/{owner}/{repo}/branch_protections/{branch}`).",
@@ -561,12 +547,12 @@ def file_or_update(
if dry_run:
print(f"::notice::[dry-run] would file/update drift issue for {branch}")
print("::group::[dry-run] title")
print(f"::group::[dry-run] title")
print(title)
print("::endgroup::")
print("::group::[dry-run] body")
print(f"::endgroup::")
print(f"::group::[dry-run] body")
print(body)
print("::endgroup::")
print(f"::endgroup::")
return
existing = find_open_issue(title)
+2 -4
View File
@@ -15,6 +15,7 @@ import subprocess
import sys
from pathlib import Path
PROFILES: dict[str, dict[str, str]] = {
"ci": {
"platform": r"^workspace-server/",
@@ -152,10 +153,7 @@ def parse_args(argv: list[str]) -> argparse.Namespace:
parser.add_argument("--event-name", default=os.environ.get("GITHUB_EVENT_NAME", ""))
parser.add_argument("--pr-base-sha", default="")
parser.add_argument("--base-ref", default="")
parser.add_argument(
"--push-before",
default=os.environ.get("GITHUB_EVENT_BEFORE", ""),
)
parser.add_argument("--push-before", default=os.environ.get("GITHUB_EVENT_BEFORE", ""))
return parser.parse_args(argv)
+1 -3
View File
@@ -183,9 +183,7 @@ def required_contexts_green(
status = latest_statuses.get(context)
state = status_state(status or {})
if state != "success":
if pr_labels and _is_tier_low_pending_ok(
latest_statuses, context, pr_labels
):
if pr_labels and _is_tier_low_pending_ok(latest_statuses, context, pr_labels):
continue # tier:low soft-fail: accept pending sop-checklist
missing_or_bad.append(f"{context}={state or 'missing'}")
return not missing_or_bad, missing_or_bad
@@ -13,9 +13,11 @@ from __future__ import annotations
import argparse
import glob
import re
import sys
from pathlib import Path
from typing import NamedTuple
SELF = ".gitea/workflows/lint-curl-status-capture.yml"
+1 -1
View File
@@ -283,7 +283,7 @@ def _ensure_labels(repo: str, names: list[str]) -> list[int]:
if status != "ok" or not isinstance(labels, list):
return []
out: list[int] = []
by_name = {label["name"]: label["id"] for label in labels if isinstance(label, dict)}
by_name = {l["name"]: l["id"] for l in labels if isinstance(l, dict)}
for n in names:
if n in by_name:
out.append(by_name[n])
@@ -82,7 +82,7 @@ import sys
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any
@@ -466,40 +466,12 @@ def fetch_log(target_url: str) -> str | None:
def grep_fail_markers(log_text: str) -> list[str]:
"""Return up to 5 sample matching lines for any FAIL_PATTERNS hit.
Empty list = clean log.
Heuristic: skip lines where the marker appears inside script source
(e.g. ``echo "::error::..."`` in a ``::group::Run`` block) rather
than actual execution output. The Gitea Actions log prints the raw
script before executing it; ``echo "::error::"`` lines in that
display are false positives.
"""
Empty list = clean log."""
matches: list[str] = []
in_run_group = False
group_depth = 0
for line in log_text.splitlines():
stripped = line.strip()
# Track Gitea Actions group markers so we can skip the
# ``::group::Run`` script-source display blocks.
if stripped.startswith("::group::Run"):
in_run_group = True
group_depth = 1
continue
if stripped == "::endgroup::":
if in_run_group:
in_run_group = False
group_depth = 0
continue
if in_run_group:
continue
for pat in FAIL_PATTERNS:
if pat in line:
# Additional false-positive guard: ``echo "::error::"``
# is script source, not a runtime error emission.
if pat == "::error::":
prefix = line[: line.index(pat)].strip()
if prefix.endswith('echo') or prefix.endswith("echo '") or prefix.endswith('echo "'):
break
# Truncate to keep error output bounded.
matches.append(line.strip()[:240])
break
if len(matches) >= 5:
@@ -669,15 +641,6 @@ def main(argv: list[str] | None = None) -> int:
base_workflows = workflows_at_sha(BASE_SHA)
head_workflows = workflows_at_sha(HEAD_SHA)
# Ignore workflow files that are identical on both sides — old branches
# that haven't rebased onto main carry stale copies of workflows that
# were updated later. Comparing those stale copies against the current
# base produces false-positive "flips".
base_workflows = {
p: t for p, t in base_workflows.items()
if p in head_workflows and head_workflows[p] != t
}
head_workflows = {p: t for p, t in head_workflows.items() if p in base_workflows}
flips = detect_flips(base_workflows, head_workflows)
if not flips:
+30 -268
View File
@@ -90,15 +90,6 @@ API = f"https://{GITEA_HOST}/api/v1" if GITEA_HOST else ""
# match by exact title without parsing.
TITLE_PREFIX = "[main-red]"
# Contexts that are scheduled or non-required — their pending/failure
# state should not block stale-issue closeout (mc#1789).
SCHEDULED_CONTEXT_PATTERNS = (
"Staging SaaS smoke",
"Continuous synthetic E2E",
"main-red-watchdog",
"ci-arm64-advisory",
)
# Settling window (seconds) between initial red detection and the
# pre-file recheck. The recheck filters out the two largest false-
# positive classes seen in mc#1597..1630 (task #394, 2026-05-21):
@@ -274,11 +265,6 @@ def get_combined_status(sha: str) -> dict:
return body
def _entry_state(s: dict) -> str:
"""Per-entry status key in Gitea 1.22.6 is `status`; fall back to `state`."""
return s.get("status") or s.get("state") or ""
def is_red(status: dict) -> tuple[bool, list[dict]]:
"""Return (is_red, failed_statuses).
@@ -326,6 +312,9 @@ def is_red(status: dict) -> tuple[bool, list[dict]]:
# "no per-context entries were in a red state" fallback even when
# the combined-state correctly flagged red. See
# `feedback_smoke_test_vendor_truth_not_shape_match`.
def _entry_state(s: dict) -> str:
return s.get("status") or s.get("state") or ""
def _is_cancel_cascade(s: dict) -> bool:
"""status=3 entry per Gitea 1.22.6 description-string contract.
Match exactly (after strip) — substring match would catch
@@ -364,15 +353,6 @@ def title_for(sha: str) -> str:
return f"{TITLE_PREFIX} {REPO}: {sha[:10]}"
def _is_scheduled_context(context: str) -> bool:
"""Return True if `context` is a known scheduled/non-required job.
These contexts run on a schedule and should not block stale-issue
closeout when main's required CI has recovered (mc#1789).
"""
return any(pattern.lower() in context.lower() for pattern in SCHEDULED_CONTEXT_PATTERNS)
def list_open_red_issues() -> list[dict]:
"""All open issues whose title starts with `[main-red] {repo}: `.
@@ -382,34 +362,23 @@ def list_open_red_issues() -> list[dict]:
file-or-update path to POST a duplicate — exactly the regression
class the helper-raises contract closes.
Pagination is exhausted (mc#1789). The old "by design ≤ 1" invariant
was false — backlog can exceed 50 open issues.
Gitea issue search returns at most 50/page; we only need open
`[main-red]` issues which are by design ≤ 1 at any time per repo,
so a single page is enough.
"""
prefix = f"{TITLE_PREFIX} {REPO}: "
all_issues: list[dict] = []
page = 1
limit = 50
while True:
_, results = api(
"GET",
f"/repos/{OWNER}/{NAME}/issues",
query={"state": "open", "type": "issues", "limit": str(limit), "page": str(page)},
_, results = api(
"GET",
f"/repos/{OWNER}/{NAME}/issues",
query={"state": "open", "type": "issues", "limit": "50"},
)
if not isinstance(results, list):
raise ApiError(
f"issue search returned non-list body (got {type(results).__name__})"
)
if not isinstance(results, list):
raise ApiError(
f"issue search returned non-list body (got {type(results).__name__})"
)
matched = [
i for i in results
if isinstance(i, dict)
prefix = f"{TITLE_PREFIX} {REPO}: "
return [i for i in results if isinstance(i, dict)
and isinstance(i.get("title"), str)
and i["title"].startswith(prefix)
]
all_issues.extend(matched)
if len(results) < limit:
break
page += 1
return all_issues
and i["title"].startswith(prefix)]
def find_open_issue_for_sha(sha: str) -> dict | None:
@@ -605,156 +574,10 @@ def file_or_update_red(
sys.stderr.write(f"::warning::label '{RED_LABEL}' not found on repo\n")
def close_stale_red_issues(
current_sha: str,
current_status: dict,
*,
dry_run: bool = False,
) -> int:
"""Close open [main-red] issues whose specific failing contexts have
all recovered on `current_sha`, even though `main` is still red for
other reasons (mc#1789).
When main stays red across consecutive SHAs for *different* causes,
`close_open_red_issues_for_other_shas` never fires (it only runs when
main is green). This function prevents stale issues from accumulating
indefinitely by comparing per-context recovery across SHAs.
An issue is considered stale when every context that was in a failed
state on the issue's SHA is now either `success` on the current HEAD
or absent (workflow removed / renamed). Issues whose original SHA had
a combined-red-with-no-detail (empty statuses list) are skipped — we
cannot verify recovery without per-context data.
Returns the number of issues closed.
"""
open_red = list_open_red_issues()
if not open_red:
return 0
current_statuses = current_status.get("statuses") or []
closed = 0
for issue in open_red:
title = issue.get("title", "")
prefix = f"{TITLE_PREFIX} {REPO}: "
if not title.startswith(prefix):
continue
short_sha = title[len(prefix):]
if short_sha == current_sha[:10]:
continue
# Query status for the old SHA. Short SHA should resolve; if it
# doesn't (GC'd, force-pushed, ambiguous), skip conservatively.
try:
old_status = get_combined_status(short_sha)
except ApiError:
continue
old_red, old_failed = is_red(old_status)
if not old_red:
# Open issue for a now-green SHA — close it via the normal path.
num = issue.get("number")
if isinstance(num, int):
comment = (
f"Commit `{short_sha}` is no longer red. Closing as the "
f"failure context has recovered or expired."
)
if dry_run:
print(
f"::notice::[dry-run] would close issue #{num} "
f"({title}) — old SHA is now green"
)
closed += 1
continue
api(
"POST",
f"/repos/{OWNER}/{NAME}/issues/{num}/comments",
body={"body": comment},
)
api(
"PATCH",
f"/repos/{OWNER}/{NAME}/issues/{num}",
body={"state": "closed"},
)
print(
f"::notice::Closed stale main-red issue #{num} "
f"(old SHA {short_sha} is now green)"
)
closed += 1
continue
if not old_failed:
# Combined red with no per-context detail — can't verify recovery.
continue
# Verify every failed context from the old SHA has recovered.
all_recovered = True
recovered_ctxs: list[str] = []
still_failing_ctxs: list[str] = []
for s in old_failed:
ctx = s.get("context", "")
if not ctx:
continue
current_match = None
for cs in current_statuses:
if isinstance(cs, dict) and cs.get("context") == ctx:
current_match = cs
break
if current_match is None:
recovered_ctxs.append(ctx)
elif _entry_state(current_match) == "success":
recovered_ctxs.append(ctx)
else:
all_recovered = False
still_failing_ctxs.append(ctx)
if not all_recovered:
continue
num = issue.get("number")
if not isinstance(num, int):
continue
comment = (
f"The failing contexts from this SHA (`{short_sha}`) have "
f"recovered on current HEAD `{current_sha[:10]}`: "
f"{', '.join(recovered_ctxs)}. "
f"Main is still red for other reasons; see the current "
f"`[main-red]` issue for `{current_sha[:10]}`."
)
if dry_run:
print(
f"::notice::[dry-run] would close stale issue #{num} "
f"({title}) — contexts recovered"
)
closed += 1
continue
api(
"POST",
f"/repos/{OWNER}/{NAME}/issues/{num}/comments",
body={"body": comment},
)
api(
"PATCH",
f"/repos/{OWNER}/{NAME}/issues/{num}",
body={"state": "closed"},
)
print(
f"::notice::Closed stale main-red issue #{num} "
f"(contexts recovered at {current_sha[:10]})"
)
closed += 1
return closed
def close_open_red_issues_for_other_shas(
current_sha: str,
*,
dry_run: bool = False,
close_same_sha: bool = False,
) -> int:
"""When main is green at current_sha, close any open `[main-red]`
issues whose title references a different SHA. Returns the number
@@ -763,25 +586,15 @@ def close_open_red_issues_for_other_shas(
Lineage note: we only close issues whose title prefix matches; if
a human renamed the issue or added a suffix this won't touch it.
That's intentional — manual editorial state takes precedence.
Args:
close_same_sha: set True when the caller already knows main is
green at current_sha (e.g. recovery block) and wants to close
the open issue for THIS SHA too. Defaults False so the
green-path callers never accidentally close an issue they just
filed on the same tick.
"""
target_title = title_for(current_sha)
open_red = list_open_red_issues()
closed = 0
for issue in open_red:
if issue.get("title") == target_title:
if not close_same_sha:
# Same SHA — caller should not have invoked this if main is
# green. Skip defensively (guards against green-path callers
# that accidentally pass the SHA they just filed for).
continue
# close_same_sha=True: close even this SHA's issue (recovery path)
# Same SHA — caller should not have invoked this if main is
# green. Skip defensively.
continue
num = issue.get("number")
if not isinstance(num, int):
continue
@@ -886,10 +699,6 @@ def run_once(*, dry_run: bool = False) -> int:
f"{sha[:10]} but HEAD is now {recheck_sha[:10]} on "
f"{WATCH_BRANCH}; next cron tick will re-evaluate."
)
# HEAD drifted — close any stale main-red issue for the prior SHA
# before returning, so we don't leave stale open issues when main
# is no longer pointing at the red commit.
close_open_red_issues_for_other_shas(recheck_sha, dry_run=dry_run)
return 0
recheck_status = get_combined_status(sha)
@@ -902,9 +711,6 @@ def run_once(*, dry_run: bool = False) -> int:
f"{recheck_status.get('state')!r} on recheck; "
f"initial red was a transient cancel-cascade."
)
# CI recovered on the same SHA — close any stale main-red issue
# that was filed on a prior tick for this SHA.
close_open_red_issues_for_other_shas(sha, dry_run=dry_run, close_same_sha=True)
return 0
# Still red after settling — file/update. Use the recheck data
@@ -920,68 +726,24 @@ def run_once(*, dry_run: bool = False) -> int:
print(f"::warning::main is RED at {sha[:10]} on {WATCH_BRANCH}: "
f"{len(failed)} failed context(s)")
file_or_update_red(sha, failed, debug, dry_run=dry_run)
stale_closed = close_stale_red_issues(sha, recheck_status, dry_run=dry_run)
if stale_closed:
emit_loki_event("main_red_stale_closed", sha, [])
print(
f"::notice::Closed {stale_closed} stale main-red issue(s) "
f"whose contexts recovered at {sha[:10]}"
)
else:
# Green or pending-with-no-real-failures. Close stale issues
# from earlier SHAs when required CI has recovered.
#
# mc#1789: main often sits at combined `pending` because
# scheduled/non-required contexts (Staging SaaS smoke,
# Continuous synthetic E2E, main-red-watchdog itself,
# ci-arm64-advisory) are still running. We close stale issues
# as long as no *non-scheduled* context has failed and no
# *non-scheduled* context is still pending — i.e. required CI
# is effectively green.
#
# The success-only gate is preserved for the canonical green
# path; the extended check below only fires when combined is
# `pending` but all required work is done.
combined_state = status.get("state")
if combined_state == "success":
should_close = True
close_reason = "GREEN"
else:
statuses = status.get("statuses") or []
non_scheduled_pending = [
s for s in statuses
if isinstance(s, dict)
and (_entry_state(s) == "pending")
and not _is_scheduled_context(s.get("context", ""))
]
non_scheduled_failed = [
s for s in statuses
if isinstance(s, dict)
and (_entry_state(s) in {"failure", "error"})
and not _is_scheduled_context(s.get("context", ""))
]
# Cancel-cascade already filtered by is_red(); red=False
# here means no real failures. We additionally check that
# no non-scheduled context is still pending.
should_close = not non_scheduled_pending and not non_scheduled_failed
close_reason = "pending-but-required-green"
if should_close:
# Green (or pending — pending is treated as not-red so we don't
# spam during the post-merge CI window). Close any stale issues
# from earlier SHAs only when we're actually green; pending
# means CI hasn't finished and the prior issue might still be
# accurate.
if status.get("state") == "success":
closed = close_open_red_issues_for_other_shas(sha, dry_run=dry_run)
if closed:
emit_loki_event(
"main_returned_to_green", sha,
[],
)
print(
f"::notice::main is {close_reason} at {sha[:10]} on {WATCH_BRANCH} "
f"(closed {closed} stale issue(s))"
)
print(f"::notice::main is GREEN at {sha[:10]} on {WATCH_BRANCH} "
f"(closed {closed} stale issue(s))")
else:
print(
f"::notice::main has pending-or-failed required CI at {sha[:10]} "
f"on {WATCH_BRANCH} (combined state={combined_state!r}; no action)"
)
print(f"::notice::main is PENDING at {sha[:10]} on {WATCH_BRANCH} "
f"(combined state={status.get('state')!r}; no action)")
return 0
+1 -301
View File
@@ -17,6 +17,7 @@ import urllib.error
import urllib.request
from urllib.parse import quote
TRUE_VALUES = {"1", "true", "yes", "on", "disabled", "disable"}
PROD_CP_URL = "https://api.moleculesai.app"
DEFAULT_REQUIRED_CONTEXTS = [
@@ -24,7 +25,6 @@ DEFAULT_REQUIRED_CONTEXTS = [
"Secret scan / Scan diff for credential-shaped strings (push)",
]
TERMINAL_FAILURE_STATES = {"failure", "error", "cancelled", "canceled", "skipped"}
REDEPLOY_PATH = "/cp/admin/tenants/redeploy-fleet"
def truthy_flag(value: str | None) -> bool:
@@ -130,217 +130,6 @@ def required_contexts(env: dict[str, str]) -> list[str]:
return [line.strip() for line in raw.replace(",", "\n").splitlines() if line.strip()]
def chunks(items: list[str], size: int) -> list[list[str]]:
return [items[i : i + size] for i in range(0, len(items), size)]
class RolloutFailed(RuntimeError):
def __init__(self, message: str, response: dict):
super().__init__(message)
self.response = response
def slugs_from_redeploy_response(body: dict) -> list[str]:
slugs: list[str] = []
for row in body.get("results") or []:
slug = str(row.get("slug") or "").strip()
if slug:
slugs.append(slug)
return slugs
def scoped_redeploy_body(base: dict, slugs: list[str]) -> dict:
body = dict(base)
body.pop("canary_slug", None)
body["only_slugs"] = slugs
body["soak_seconds"] = 0
body["batch_size"] = max(1, len(slugs))
return body
def cp_api_json(method: str, url: str, token: str, body: dict | None = None) -> tuple[int, dict]:
data = None
headers = {
"Authorization": f"Bearer {token}",
"Accept": "application/json",
}
if body is not None:
data = json.dumps(body).encode("utf-8")
headers["Content-Type"] = "application/json"
req = urllib.request.Request(url, data=data, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=120) as resp:
return resp.status, json.loads(resp.read())
except urllib.error.HTTPError as exc:
raw = exc.read().decode("utf-8", errors="replace")
try:
parsed = json.loads(raw)
except json.JSONDecodeError:
parsed = {"error": raw[:500]}
return exc.code, parsed
def plan_rollout_slugs(cp_url: str, token: str, body: dict, redeploy=None) -> list[str]:
if redeploy is None:
redeploy = redeploy_scoped
dry_run_body = dict(body)
dry_run_body["dry_run"] = True
status, resp = redeploy(cp_url, token, dry_run_body)
if status != 200:
raise RuntimeError(f"dry-run redeploy-fleet returned HTTP {status}: {resp.get('error', '')}")
if resp.get("ok") is not True:
raise RuntimeError(f"dry-run redeploy-fleet reported ok={resp.get('ok')}: {resp.get('error', '')}")
slugs = slugs_from_redeploy_response(resp)
if not slugs:
raise RuntimeError("dry-run redeploy-fleet returned no rollout candidates")
return slugs
def redeploy_scoped(cp_url: str, token: str, body: dict) -> tuple[int, dict]:
return cp_api_json("POST", f"{cp_url}{REDEPLOY_PATH}", token, body)
def _raise_for_redeploy_result(status: int, body: dict, slugs: list[str]) -> None:
if status != 200 or body.get("ok") is not True:
raise RuntimeError(
"redeploy scoped call failed for "
f"{','.join(slugs)}: HTTP {status}, ok={body.get('ok')}"
)
def rollout_stragglers(enumerated: list[str], results: list[dict]) -> list[str]:
"""Return every enumerated tenant NOT proven on the target build.
A straggler is any tenant the rollout was supposed to cover that the
CP could not verify is running the target image tag — whether it
errored, was skipped, or SSM-succeeded onto the wrong image
(internal#724). CP marks each per-tenant result row with
``verified_on_target`` (the REDEPLOY_RUNNING_IMAGE docker-inspect
proof). A tenant enumerated for the rollout but absent from the
result set (no batch ever ran it) is also a straggler — that is the
exact agents-team silent-skip class.
Backward-compat: an OLDER CP that doesn't emit ``verified_on_target``
yet returns rows without the key. Treat a missing key as verified so
this surfacing degrades to the previous (ok-based) behavior against an
un-upgraded CP, rather than failing every deploy spuriously. Once the
CP fix is deployed the key is always present and real stragglers are
caught.
"""
verified: set[str] = set()
for row in results:
if str(row.get("ssm_status") or "") == "DryRun":
continue
slug = str(row.get("slug") or "").strip()
if not slug:
continue
# Missing key (old CP) => assume verified; present key is authoritative.
if "verified_on_target" not in row or row.get("verified_on_target"):
verified.add(slug)
return sorted(s for s in dict.fromkeys(enumerated) if s not in verified)
def assert_full_coverage(enumerated: list[str], aggregate: dict, dry_run: bool) -> None:
"""Fail the rollout if any enumerated tenant is not on the target build.
This is the no-silent-skip gate (internal#724). A dry run proves
nothing landed, so coverage is not asserted for it.
"""
if dry_run:
return
stragglers = rollout_stragglers(enumerated, aggregate.get("results") or [])
if stragglers:
msg = (
f"incomplete rollout: {len(stragglers)} tenant(s) not verified on target "
f"after redeploy-fleet: {', '.join(stragglers)} "
f"(enumerated {len(set(enumerated))})"
)
aggregate["ok"] = False
aggregate["error"] = msg
aggregate["stragglers"] = stragglers
raise RolloutFailed(msg, aggregate)
def execute_scoped_rollout(
plan: dict,
token: str,
list_slugs=plan_rollout_slugs,
redeploy=redeploy_scoped,
sleep=time.sleep,
) -> dict:
cp_url = plan["cp_url"]
base_body = plan["body"]
all_slugs = list_slugs(cp_url, token, base_body)
batch_size = int(base_body.get("batch_size") or 1)
canary_slug = str(base_body.get("canary_slug") or "").strip()
dry_run = bool(base_body.get("dry_run"))
aggregate = {"ok": True, "results": []}
if canary_slug:
if canary_slug not in all_slugs:
raise RuntimeError(f"configured canary slug {canary_slug!r} is not a running tenant")
body = scoped_redeploy_body(base_body, [canary_slug])
print(f"POST {cp_url}{REDEPLOY_PATH} only_slugs={','.join(body['only_slugs'])}")
status, resp = redeploy(cp_url, token, body)
aggregate["results"].extend(resp.get("results") or [])
try:
_raise_for_redeploy_result(status, resp, [canary_slug])
except RuntimeError as exc:
aggregate["ok"] = False
aggregate["error"] = str(exc)
raise RolloutFailed(str(exc), aggregate) from exc
soak_seconds = int(base_body.get("soak_seconds") or 0)
if soak_seconds > 0 and not dry_run:
print(f"Canary passed; soaking locally for {soak_seconds}s")
sleep(soak_seconds)
remaining = [slug for slug in all_slugs if slug != canary_slug]
for group in chunks(remaining, batch_size):
body = scoped_redeploy_body(base_body, group)
print(f"POST {cp_url}{REDEPLOY_PATH} only_slugs={','.join(group)}")
status, resp = redeploy(cp_url, token, body)
aggregate["results"].extend(resp.get("results") or [])
try:
_raise_for_redeploy_result(status, resp, group)
except RuntimeError as exc:
aggregate["ok"] = False
aggregate["error"] = str(exc)
raise RolloutFailed(str(exc), aggregate) from exc
# No-silent-skip coverage gate (internal#724): every enumerated tenant
# must be PROVEN on the target build. A per-tenant HTTP-200/ok response
# is not proof — a tenant that SSM-succeeded but stayed on the old tag,
# or one enumerated but never batched, is a straggler. Surfacing it as
# a RolloutFailed makes the deploy step exit non-zero instead of
# silently reporting success (the exact agents-team failure mode).
assert_full_coverage(all_slugs, aggregate, dry_run)
return aggregate
def rollout_from_plan_file(plan_path: str, response_path: str, env: dict[str, str]) -> None:
token = env.get("CP_ADMIN_API_TOKEN", "").strip()
if not token:
raise ValueError("CP_ADMIN_API_TOKEN is required for production auto-deploy")
with open(plan_path, "r", encoding="utf-8") as fh:
plan = json.load(fh)
if not plan.get("enabled"):
raise RuntimeError("production auto-deploy plan is disabled")
try:
response = execute_scoped_rollout(plan, token)
except RolloutFailed as exc:
response = exc.response
with open(response_path, "w", encoding="utf-8") as fh:
json.dump(response, fh, sort_keys=True)
fh.write("\n")
raise
with open(response_path, "w", encoding="utf-8") as fh:
json.dump(response, fh, sort_keys=True)
fh.write("\n")
def _api_json(url: str, token: str) -> dict:
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
try:
@@ -364,71 +153,6 @@ def _api_json_optional(url: str, token: str) -> tuple[int, dict | None]:
return exc.code, None
def current_branch_head(env: dict[str, str]) -> str | None:
"""Return the SHA at the tip of the deploy branch (main) per Gitea, or None.
Used to detect a *superseded* deploy job (see `superseded_by`). Fail-safe:
any read error / missing token returns None so the caller treats the job as
NOT superseded and the strict /buildinfo verify still runs. We never let an
unreadable head silently green a deploy.
"""
token = env.get("GITEA_TOKEN", "").strip()
if not token:
return None
host = env.get("GITEA_HOST", "git.moleculesai.app")
repo = env.get("GITHUB_REPOSITORY", "molecule-ai/molecule-core")
# Deploy lane is on: push:main; the branch is always main here, but read it
# from the ref name when present so a future branch rename doesn't break us.
branch = env.get("GITHUB_REF_NAME", "").strip() or "main"
url = f"https://{host}/api/v1/repos/{repo}/branches/{quote(branch, safe='')}"
status, body = _api_json_optional(url, token)
if status != 200 or not isinstance(body, dict):
return None
commit = body.get("commit")
if isinstance(commit, dict):
head = commit.get("id") or commit.get("sha")
if isinstance(head, str) and head.strip():
return head.strip()
return None
def superseded_by(env: dict[str, str]) -> str | None:
"""Return the newer head SHA if THIS deploy job has been superseded, else None.
This workflow runs with no `concurrency:` (intentional — Gitea 1.22.6 cancels
queued runs, which is unacceptable for a prod deploy). When two main pushes
land close together, BOTH deploy-production jobs run. The newer push rolls the
fleet forward first; the OLDER job's strict /buildinfo verify then sees tenants
on the NEWER SHA and false-reds with "$slug is stale" — even though the fleet
is AHEAD, not behind. Git SHAs aren't ordered, so the verify can't tell ahead
from behind on its own (and /buildinfo exposes only git_sha, no build time).
Resolve it at the source of truth for ordering — the branch ref: if main's
current head is a DIFFERENT SHA than the one this job is deploying, a newer
commit has landed and this job is superseded; the newest job's verify is the
authoritative one. We return that head SHA so the caller can log it and exit
success early, skipping the strict-equality verify for this stale job.
Fail-safe: returns None (NOT superseded) when the head can't be read or equals
our SHA, so a genuinely-behind tenant under the LATEST deploy job still fails
the strict verify loudly. This never suppresses a real-stale signal — it only
excuses a job that is no longer the latest from asserting exact equality.
"""
sha = env.get("GITHUB_SHA", "").strip()
if not sha:
return None
head = current_branch_head(env)
if not head:
return None
# SHA lengths can differ (short vs full); compare on the shorter prefix.
n = min(len(head), len(sha))
if head[:n].lower() == sha[:n].lower():
return None
return head
def live_disable_flag(env: dict[str, str]) -> str:
"""Return a live disable value from Gitea variables when readable.
@@ -507,17 +231,6 @@ def main() -> int:
sub.add_parser("plan", help="print production deploy plan as JSON")
sub.add_parser("assert-enabled", help="fail if production deploy is currently disabled")
sub.add_parser("wait-ci", help="block until required CI context is green")
sub.add_parser(
"check-superseded",
help=(
"exit 0 if a newer commit has landed on the deploy branch (this job "
"is superseded; prints the newer head SHA), exit 10 if this job is "
"still the latest"
),
)
rollout_parser = sub.add_parser("rollout", help="execute canary-first scoped production rollout")
rollout_parser.add_argument("--plan", required=True, help="path to prod-auto-deploy plan JSON")
rollout_parser.add_argument("--response", required=True, help="path to write aggregate response JSON")
args = parser.parse_args()
try:
@@ -530,19 +243,6 @@ def main() -> int:
if args.command == "wait-ci":
wait_for_ci_context(dict(os.environ))
return 0
if args.command == "check-superseded":
newer = superseded_by(dict(os.environ))
if newer:
print(newer)
return 0
# Exit 10 (not 0, not 1): "this job is still the latest". The
# workflow treats only exit 0 as superseded; 10 means proceed to
# the strict verify. A non-zero code here is informational, not a
# failure — the workflow step swallows it.
return 10
if args.command == "rollout":
rollout_from_plan_file(args.plan, args.response, dict(os.environ))
return 0
except Exception as exc: # noqa: BLE001 - CLI should render operator-friendly errors.
print(f"::error::{exc}", file=sys.stderr)
return 1
+6 -25
View File
@@ -12,7 +12,6 @@
# ≥ 1 review on the PR where:
# • state == APPROVED
# • review.dismissed == false
# • review.official != false (excludes draft/mis-filed APPROVED reviews)
# • review.user.login != PR.user.login (non-author)
# • review.user.login ∈ team-members
#
@@ -202,7 +201,6 @@ fi
JQ_FILTER='.[]
| select(.state == "APPROVED")
| select(.dismissed != true)
| select(.official != false)
| select(.user.login != $author)'
if [ "${REVIEW_CHECK_STRICT:-}" = "1" ]; then
JQ_FILTER="${JQ_FILTER}
@@ -296,15 +294,7 @@ fi
# 403 → token owner is not in this team (Gitea 1.22.6 'Must be a team
# member' constraint — see follow-up issue for token-provisioning)
# 404 → not a member
# Track whether every candidate returned 403 (token owner not in team).
# When this happens the root cause is a token-provisioning issue, not a
# reviewer-eligibility issue — surface it clearly so ops don't waste time
# verifying team roster (Bug C / RFC#324 follow-up).
_ALL_CANDIDATES_403="yes"
_CANDIDATE_COUNT=0
for U in $CANDIDATES; do
_CANDIDATE_COUNT=$((_CANDIDATE_COUNT + 1))
CODE=$(curl -sS -o "$TEAM_PROBE_TMP" -w '%{http_code}' \
-K "$CURL_AUTH_FILE" "${API}/teams/${TEAM_ID}/members/${U}")
debug "probe ${U} in team ${TEAM} (id=${TEAM_ID}) → HTTP ${CODE}"
@@ -314,31 +304,22 @@ for U in $CANDIDATES; do
exit 0
;;
403)
# Token owner is not in the team being probed; Gitea 1.22.6 refuses
# to confirm membership in this case. Do NOT hard-fail the gate on a
# 403 — doing so would fail the entire gate if ANY candidate triggers
# a 403, even when other valid team-members exist. Instead skip this
# candidate and continue checking others. If all candidates produce
# 403 (token owner can't query any of them) the final exit fires.
echo "::warning::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — skipping; cannot confirm membership)"
# Token owner is not in the team being probed; the API refuses to
# confirm membership. This is the RFC#324 follow-up token-scope gap.
# Fail closed — never grant approval on a 403; surface clearly.
echo "::error::team-probe for ${U} in ${TEAM} returned 403 (token owner not in ${TEAM} team — RFC#324 token-scope follow-up). Cannot confirm membership; failing closed."
cat "$TEAM_PROBE_TMP" >&2
continue
exit 1
;;
404)
_ALL_CANDIDATES_403="no"
debug "${U} not a member of ${TEAM}"
;;
*)
_ALL_CANDIDATES_403="no"
echo "::warning::team-probe for ${U} in ${TEAM} returned unexpected HTTP ${CODE}"
cat "$TEAM_PROBE_TMP" >&2
;;
esac
done
if [ "$_ALL_CANDIDATES_403" = "yes" ] && [ "$_CANDIDATE_COUNT" -gt 0 ]; then
echo "::error::${TEAM}-review FAILED — every candidate returned 403 (token owner is not a member of the ${TEAM} team). This is a TOKEN PROVISIONING issue, not a reviewer-eligibility issue. Add the token owner to the '${TEAM}' Gitea team (id=${TEAM_ID}) or use a token whose owner is already in that team."
else
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
fi
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (candidates: $(echo "$CANDIDATES" | tr '\n' ',' | sed 's/,$//') — none are in team)"
exit 1
+4 -10
View File
@@ -13,26 +13,20 @@ set -euo pipefail
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
# Branch-protection requires the (pull_request_target) context variant.
# The refire path must post the EXACT BP-required name so the gate flips.
CONTEXT="${TEAM}-review / approved (pull_request_target)"
CONTEXT="${TEAM}-review / approved (pull_request)"
TARGET_URL="https://${GITEA_HOST}/${OWNER}/${NAME}/pulls/${PR_NUMBER}"
authfile=$(mktemp)
post_authfile=$(mktemp)
prfile=$(mktemp)
postfile=$(mktemp)
# shellcheck disable=SC2329 # invoked by EXIT trap
cleanup() {
rm -f "$authfile" "$post_authfile" "$prfile" "$postfile"
rm -f "$authfile" "$prfile" "$postfile"
}
trap cleanup EXIT
chmod 600 "$authfile" "$post_authfile"
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
# STATUS_POST_TOKEN is narrow-scoped write:repository for explicit status POST.
# Falls back to GITEA_TOKEN for backward compatibility (e.g. local test).
printf 'header = "Authorization: token %s"\n' "${STATUS_POST_TOKEN:-$GITEA_TOKEN}" > "$post_authfile"
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}")
@@ -74,7 +68,7 @@ body=$(jq -nc \
'{state:$state, context:$context, description:$description, target_url:$target_url}')
code=$(curl -sS -o "$postfile" -w '%{http_code}' -X POST \
-K "$post_authfile" -H "Content-Type: application/json" \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"${API}/repos/${OWNER}/${NAME}/statuses/${head_sha}")
if [ "$code" != "200" ] && [ "$code" != "201" ]; then
+13 -107
View File
@@ -6,8 +6,8 @@
# RFC#351 Step 2 of 6 (implementation MVP).
#
# Invoked by .gitea/workflows/sop-checklist.yml on:
# - pull_request_target: [opened, edited, synchronize, reopened, labeled, unlabeled]
# - issue_comment: [created] # edited/deleted omitted (Gitea 1.22.6 job-parsing quirk)
# - pull_request_target: [opened, edited, synchronize, reopened]
# - issue_comment: [created, edited, deleted]
#
# Flow:
# 1. Load .gitea/sop-checklist-config.yaml (from BASE ref — trusted).
@@ -338,6 +338,7 @@ def compute_ack_state(
# Filter out self-acks and unknown slugs.
ackers_per_slug: dict[str, list[str]] = {s: [] for s in items_by_slug}
rejected_self: dict[str, list[str]] = {s: [] for s in items_by_slug}
rejected_unknown: dict[str, list[str]] = {s: [] for s in items_by_slug}
pending_team_check: dict[str, list[str]] = {s: [] for s in items_by_slug}
for (user, slug), kind in latest_directive.items():
@@ -636,11 +637,8 @@ def load_config(path: str) -> dict[str, Any]:
dep by keeping the config shape constrained.
"""
try:
# yaml is an optional dep; the canonical loader is used when available,
# but the SOP runs on runners that may not have PyYAML installed. The
# fallback _load_config_minimal covers the same config shape without
import yaml # type: ignore[import-not-found] # optional dep; fall back silently if absent
with open(path, encoding="utf-8") as f:
import yaml # type: ignore[import-not-found]
with open(path) as f:
return yaml.safe_load(f)
except ImportError:
return _load_config_minimal(path)
@@ -654,19 +652,13 @@ def _load_config_minimal(path: str) -> dict[str, Any]:
item map: scalars + lists of scalars. Does NOT support nested lists,
YAML anchors, multi-doc, or flow style.
"""
with open(path, encoding="utf-8") as f:
with open(path) as f:
lines = f.readlines()
return _parse_minimal_yaml(lines)
def _parse_minimal_yaml(lines: list[str]) -> dict[str, Any]:
"""Hand-rolled subset parser. See _load_config_minimal docstring.
C901: function is necessarily long — it implements a finite-state YAML
subset (scalars, maps, lists of maps at fixed depth). No utility refactors
meaningfully reduce length without degrading readability. All branches
are exhaustively tested in test_parse_minimal_yaml.py.
"""
def _parse_minimal_yaml(lines: list[str]) -> dict[str, Any]: # noqa: C901
"""Hand-rolled subset parser. See _load_config_minimal docstring."""
# Strip comments + blank lines but preserve indentation.
cleaned: list[tuple[int, str]] = []
for raw in lines:
@@ -850,7 +842,7 @@ def render_status(
def get_tier_mode(pr: dict[str, Any], cfg: dict[str, Any]) -> str:
"""Read tier label, return 'hard' or 'soft' per cfg.tier_failure_mode."""
labels = pr.get("labels") or []
tier_labels = [label.get("name", "") for label in labels if (label.get("name", "") or "").startswith("tier:")]
tier_labels = [l.get("name", "") for l in labels if (l.get("name", "") or "").startswith("tier:")]
mode_map = cfg.get("tier_failure_mode") or {}
default_mode = cfg.get("default_mode", "hard")
for tl in tier_labels:
@@ -873,7 +865,7 @@ def is_high_risk(pr: dict[str, Any], cfg: dict[str, Any]) -> bool:
Governance fix for internal#442 — closes the inconsistency between
sop-tier-check (tier-aware) and sop-checklist (was tier-blind).
"""
label_set = {(label.get("name") or "") for label in (pr.get("labels") or [])}
label_set = {(l.get("name") or "") for l in (pr.get("labels") or [])}
if "tier:high" in label_set:
return True
high_risk_labels = set(cfg.get("high_risk_labels") or [])
@@ -895,47 +887,6 @@ def resolve_required_teams(item: dict[str, Any], high_risk: bool) -> list[str]:
return list(item.get("required_teams") or [])
# ---------------------------------------------------------------------------
# CI status validation for testing-class AI acks (internal#760 CTO hardening)
# ---------------------------------------------------------------------------
# Slugs that require CI / all-required green before an AI ack is valid.
_TESTING_CLASS_SLUGS = {"comprehensive-testing", "local-postgres-e2e", "staging-smoke"}
# Human-only carve-out: these items can NEVER be acked by AI, regardless
# of config drift. Any item in this set MUST NOT have ai_ack_eligible.
# migration / schema are future-proofing — not yet in config items, but
# the code guard rejects them proactively (CTO hardening, msg 1388c76f).
_HUMAN_ONLY_SLUGS = {"root-cause", "no-backwards-compat", "migration", "schema"}
def get_ci_status(client: GiteaClient, owner: str, repo: str, sha: str) -> str:
"""Return the state of CI / all-required (pull_request) for `sha`.
Looks through the commit statuses and returns the state string
("success", "failure", "pending", "error") or "missing" if the
context is not found. This prevents an AI agent from attesting
"tests pass" independently of the actual CI run.
"""
code, data = client._req( # noqa: SLF001
"GET", f"/repos/{owner}/{repo}/statuses/{sha}"
)
if code != 200:
return "unknown"
if not data or not isinstance(data, list):
return "missing"
# Gitea returns statuses newest-first. Find the latest for our context.
for status in data:
if status.get("context") == "CI / all-required (pull_request)":
return status.get("state", "unknown")
return "missing"
# ---------------------------------------------------------------------------
# Main entry point
# ---------------------------------------------------------------------------
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser()
p.add_argument("--owner", required=True)
@@ -1029,9 +980,6 @@ def main(argv: list[str] | None = None) -> int:
# one membership lookup per team.
team_member_cache: dict[tuple[str, int], bool | None] = {}
# Pre-resolve the ai-sop-ack team id once (None if the team does not exist).
ai_sop_ack_team_id = client.resolve_team_id(args.owner, "ai-sop-ack")
def probe(slug: str, users: list[str]) -> list[str]:
# `slug` may be either an items-key (compute_ack_state caller) OR
# an n/a-gate key (compute_na_state caller). Previously this hard
@@ -1068,14 +1016,14 @@ def main(argv: list[str] | None = None) -> int:
tid = client.resolve_team_id(args.owner, tn)
if tid is None:
# Try the list endpoint as a fallback.
code, data = client._req( # noqa: SLF001 # internal helper; called from loop in caller context
code, data = client._req( # noqa: SLF001
"GET", f"/orgs/{args.owner}/teams"
)
if code == 200 and isinstance(data, list):
for t in data:
if t.get("name") == tn:
tid = t.get("id")
client._team_id_cache[(args.owner, tn)] = tid # noqa: SLF001 # write-through cache; intentional side-effect for reuse across calls
client._team_id_cache[(args.owner, tn)] = tid # noqa: SLF001
break
if tid is not None:
team_ids.append(tid)
@@ -1086,18 +1034,14 @@ def main(argv: list[str] | None = None) -> int:
file=sys.stderr,
)
approved: list[str] = []
rejected_ai_ineligible: list[str] = []
rejected_ci_not_green: list[str] = []
for u in users:
# 1) Human required_teams membership check
in_human_team = False
for tid in team_ids:
cache_key = (u, tid)
if cache_key not in team_member_cache:
team_member_cache[cache_key] = client.is_team_member(tid, u)
result = team_member_cache[cache_key]
if result is True:
in_human_team = True
approved.append(u)
break
if result is None:
print(
@@ -1107,44 +1051,6 @@ def main(argv: list[str] | None = None) -> int:
)
# Treat as not-in-team for this user/team pair; loop
# may still find membership in another team.
if in_human_team:
approved.append(u)
continue
# 2) AI-sop-ack team membership check (only for items that allow it).
if slug in items_by_slug:
item = items_by_slug[slug]
# Defensive: human-only carve-out is enforced in code, not just
# config. Even if ai_ack_eligible were mistakenly added to a
# migration/schema item, the AI path is rejected here.
if slug in _HUMAN_ONLY_SLUGS:
rejected_ai_ineligible.append(u)
continue
if item.get("ai_ack_eligible") and ai_sop_ack_team_id is not None:
cache_key = (u, ai_sop_ack_team_id)
if cache_key not in team_member_cache:
team_member_cache[cache_key] = client.is_team_member(
ai_sop_ack_team_id, u
)
result = team_member_cache[cache_key]
if result is True:
# 2a) Testing-class items require real CI artifact evidence.
if slug in _TESTING_CLASS_SLUGS:
ci_state = get_ci_status(
client, args.owner, args.repo, head_sha
)
if ci_state != "success":
print(
f"::warning::AI ack for {slug} rejected: "
f"CI / all-required is {ci_state}, not success",
file=sys.stderr,
)
rejected_ci_not_green.append(u)
continue
approved.append(u)
continue
# If we get here, user is not approved for this slug.
rejected_ai_ineligible.append(u)
return approved
ack_state = compute_ack_state(
+1 -1
View File
@@ -33,7 +33,7 @@ def scenario() -> str:
p = os.path.join(STATE_DIR, "scenario")
if not os.path.isfile(p):
return "T1_success"
with open(p, encoding="utf-8") as f:
with open(p) as f:
return f.read().strip()
+4 -13
View File
@@ -21,7 +21,6 @@ Scenarios:
T16_comments_generic_approval — reviews empty; comments have "APPROVED" by team member → exit 0
T17_comments_no_approval — reviews empty; comments have no approval keywords → exit 1
T18_review_wrong_team_comment_right_team — review candidate 404s, comment candidate passes
T19_ai_sop_ack_approved — ai-sop-ack member APPROVED review → team probe 404 → exit 1
Usage:
FIXTURE_STATE_DIR=/tmp/x python3 _review_check_fixture.py 8080
@@ -34,6 +33,7 @@ import re
import sys
import urllib.parse
STATE_DIR = os.environ.get("FIXTURE_STATE_DIR", "/tmp")
@@ -41,7 +41,7 @@ def scenario() -> str:
p = os.path.join(STATE_DIR, "scenario")
if not os.path.isfile(p):
return "T1_pr_open"
with open(p, encoding="utf-8") as f:
with open(p) as f:
return f.read().strip()
@@ -81,7 +81,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
# GET /repos/{owner}/{name}/pulls/{pr_number}
m = re.match(r"^/api/v1/repos/([^/]+)/([^/]+)/pulls/(\d+)$", path)
if m:
pr_num = m.group(3)
owner, name, pr_num = m.group(1), m.group(2), m.group(3)
if sc == "T2_pr_closed":
return self._json(200, {
"number": int(pr_num),
@@ -117,12 +117,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
{"state": "CHANGES_REQUESTED", "dismissed": False, "user": {"login": "bob"}, "commit_id": "abc1234"},
{"state": "APPROVED", "dismissed": False, "user": {"login": "core-devops"}, "commit_id": "abc1234"},
])
if sc == "T19_ai_sop_ack_approved":
# ai-sop-ack member submitted APPROVED review — must NOT count
# toward qa-review (team_id=20) or security-review (team_id=21).
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "user": {"login": "ai-reviewer"}, "commit_id": "abc1234"},
])
# Default: one non-author APPROVED
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "user": {"login": "core-devops"}, "commit_id": "abc1234"},
@@ -157,16 +151,13 @@ class Handler(http.server.BaseHTTPRequestHandler):
# GET /teams/{team_id}/members/{username}
m = re.match(r"^/api/v1/teams/(\d+)/members/([^/]+)$", path)
if m:
login = m.group(2)
team_id, login = m.group(1), m.group(2)
if sc == "T8_team_not_member":
return self._empty(404)
if sc == "T9_team_403":
return self._empty(403)
if sc == "T18_review_wrong_team_comment_right_team" and login == "core-devops":
return self._empty(404)
if sc == "T19_ai_sop_ack_approved" and login == "ai-reviewer":
# ai-sop-ack member is NOT in qa (20) or security (21).
return self._empty(404)
# T7_team_member: member
return self._empty(204)
@@ -1,176 +0,0 @@
import importlib.util
import sys
from pathlib import Path
from unittest.mock import patch
SCRIPT = Path(__file__).resolve().parents[1] / "ci-required-drift.py"
spec = importlib.util.spec_from_file_location("ci_required_drift", SCRIPT)
drift = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = drift
spec.loader.exec_module(drift)
# Module-level constants are loaded from env at import time; set them
# explicitly so unit tests can import without the full env contract.
drift.SENTINEL_JOB = "all-required"
drift.CI_WORKFLOW_PATH = ".gitea/workflows/ci.yml"
drift.AUDIT_WORKFLOW_PATH = ".gitea/workflows/audit-force-merge.yml"
# ---------------------------------------------------------------------------
# Helper fixtures
# ---------------------------------------------------------------------------
def _make_ci_doc(jobs: dict) -> dict:
return {"jobs": jobs}
def _make_audit_doc(required_checks: list[str]) -> dict:
return {
"jobs": {
"audit": {
"steps": [
{"env": {"REQUIRED_CHECKS": "\n".join(required_checks)}}
]
}
}
}
# ---------------------------------------------------------------------------
# sentinel_needs
# ---------------------------------------------------------------------------
def test_sentinel_needs_returns_empty_when_absent():
doc = _make_ci_doc({"all-required": {"runs-on": "ubuntu-latest"}})
assert drift.sentinel_needs(doc) == set()
def test_sentinel_needs_parses_list():
doc = _make_ci_doc(
{"all-required": {"needs": ["platform-build", "canvas-build"]}}
)
assert drift.sentinel_needs(doc) == {"platform-build", "canvas-build"}
def test_sentinel_needs_parses_string():
doc = _make_ci_doc({"all-required": {"needs": "platform-build"}})
assert drift.sentinel_needs(doc) == {"platform-build"}
# ---------------------------------------------------------------------------
# ci_job_names / ci_jobs_all
# ---------------------------------------------------------------------------
def test_ci_job_names_excludes_sentinel_and_event_gated():
doc = _make_ci_doc(
{
"platform-build": {},
"canvas-build": {"if": "github.event_name == 'pull_request'"},
"main-push": {"if": "github.ref == 'refs/heads/main'"},
"all-required": {},
}
)
assert drift.ci_job_names(doc) == {"platform-build"}
def test_ci_jobs_all_includes_event_gated():
doc = _make_ci_doc(
{
"platform-build": {},
"canvas-build": {"if": "github.event_name == 'pull_request'"},
"all-required": {},
}
)
assert drift.ci_jobs_all(doc) == {"platform-build", "canvas-build"}
# ---------------------------------------------------------------------------
# detect_drift — F1 / F1b with mocked I/O
# ---------------------------------------------------------------------------
SAMPLE_PROTECTION = {
"status_check_contexts": [
"CI / all-required (pull_request)",
"Secret scan / Scan diff for credential-shaped strings (pull_request)",
]
}
def test_detect_drift_no_needs_sentinel_skips_f1():
"""Post-#1766 contract: all-required has no needs: → F1 is a false positive."""
ci = _make_ci_doc(
{
"platform-build": {},
"canvas-build": {},
"all-required": {},
}
)
audit = _make_audit_doc(
[
"CI / all-required (pull_request)",
"Secret scan / Scan diff for credential-shaped strings (pull_request)",
]
)
with patch.object(drift, "load_yaml", side_effect=[ci, audit]):
with patch.object(drift, "api", return_value=(200, SAMPLE_PROTECTION)):
findings, debug = drift.detect_drift("main")
assert findings == []
assert debug["sentinel_needs"] == []
def test_detect_drift_typo_in_needs_triggers_f1b():
"""F1b still catches typos when needs exists."""
ci = _make_ci_doc(
{
"platform-build": {},
"all-required": {"needs": ["platfom-build"]}, # typo
}
)
audit = _make_audit_doc(["CI / all-required (pull_request)"])
with patch.object(drift, "load_yaml", side_effect=[ci, audit]):
with patch.object(drift, "api", return_value=(200, SAMPLE_PROTECTION)):
findings, _ = drift.detect_drift("main")
assert any("F1b" in f for f in findings)
assert any("platfom-build" in f for f in findings)
def test_detect_drift_missing_job_in_needs_triggers_f1():
"""F1 still fires when needs is non-empty and jobs are missing."""
ci = _make_ci_doc(
{
"platform-build": {},
"canvas-build": {},
"all-required": {"needs": ["platform-build"]},
}
)
audit = _make_audit_doc(["CI / all-required (pull_request)"])
with patch.object(drift, "load_yaml", side_effect=[ci, audit]):
with patch.object(drift, "api", return_value=(200, SAMPLE_PROTECTION)):
findings, _ = drift.detect_drift("main")
assert any("F1 —" in f for f in findings)
assert any("canvas-build" in f for f in findings)
assert not any("F1b" in f for f in findings)
def test_detect_drift_no_f1_when_needs_empty_even_with_jobs():
"""Explicit regression guard: empty needs + existing jobs = no F1."""
ci = _make_ci_doc(
{
"platform-build": {},
"canvas-build": {},
"all-required": {"needs": []},
}
)
audit = _make_audit_doc(["CI / all-required (pull_request)"])
with patch.object(drift, "load_yaml", side_effect=[ci, audit]):
with patch.object(drift, "api", return_value=(200, SAMPLE_PROTECTION)):
findings, _ = drift.detect_drift("main")
assert not any("F1 —" in f for f in findings)
@@ -11,100 +11,21 @@ def load_workflow(name: str) -> dict:
return yaml.safe_load(f)
def _all_required(workflow: dict) -> dict:
return workflow["jobs"]["all-required"]
def test_all_required_uses_dedicated_meta_runner_lane():
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
all_required = workflow["jobs"]["all-required"]
# Stays on the dedicated `ci-meta` lane (the sentinel does no docker
# work, so it must NOT occupy the general docker-host pool).
assert all_required["runs-on"] == "ci-meta"
assert "needs" not in all_required
def test_all_required_is_needs_aggregator_not_a_polling_gate():
"""fix/ci-scheduler-fanout (2026-06-01): the sentinel was converted
from a status-polling loop (which squatted a ci-meta executor slot for
up to 40 min per PR) into a plain `needs:` aggregator that frees the
slot immediately. Pin the new shape so a regression to the poller is
caught.
"""
def test_all_required_reuses_path_filter_before_polling():
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
all_required = workflow["jobs"]["all-required"]
rendered = str(all_required)
# The job MUST aggregate via `needs:` (the slot-freeing design).
assert "needs" in all_required, "all-required must be a needs: aggregator"
# It MUST NOT reintroduce the polling loop / per-SHA status fetch that
# was the throughput sink.
assert "detect-changes.py" not in rendered, (
"all-required must not run the detect-changes poller path"
)
assert "commits/" not in rendered and "statuses" not in rendered, (
"all-required must not poll commit statuses (the slot-squat path)"
)
def test_all_required_does_not_use_if_always():
"""Plain `needs:` works on Gitea 1.22.6 / act_runner v0.6.1; `needs:` +
`if: always()` is BROKEN (feedback_gitea_needs_works_only_ifalways_broken)
and would let a non-success need pass the gate. The sentinel must use
plain `needs:` WITHOUT a job-level `if: always()`.
"""
workflow = load_workflow("ci.yml")
all_required = _all_required(workflow)
job_if = all_required.get("if")
assert not (isinstance(job_if, str) and "always()" in job_if), (
"all-required must not combine needs: with if: always()"
)
def test_all_required_needs_matches_ci_required_drift_f1_set():
"""The sentinel `needs:` list MUST equal ci-required-drift.py's
`ci_job_names()` set: every job MINUS the sentinel itself MINUS jobs
whose `if:` gates on github.event_name/github.ref (event-gated jobs
skip on PRs and a `needs:` on a skipped job would never let the
sentinel run). If they diverge, ci-required-drift F1 fires.
"""
workflow = load_workflow("ci.yml")
jobs = workflow["jobs"]
sentinel = "all-required"
expected = set()
for key, body in jobs.items():
if key == sentinel:
continue
gate = body.get("if") if isinstance(body, dict) else None
if isinstance(gate, str) and (
"github.event_name" in gate or "github.ref" in gate
):
# event-gated → legitimately skips on some triggers; excluded
# from both `needs:` and the F1 set.
continue
expected.add(key)
needs = jobs[sentinel].get("needs", [])
if isinstance(needs, str):
needs = [needs]
actual = set(needs)
assert actual == expected, (
f"all-required needs: {sorted(actual)} != ci_job_names() "
f"{sorted(expected)} — ci-required-drift F1 would fire"
)
def test_all_required_needs_reference_real_jobs():
"""F1b guard: every entry in `needs:` must name an existing job."""
workflow = load_workflow("ci.yml")
jobs = workflow["jobs"]
needs = jobs["all-required"].get("needs", [])
if isinstance(needs, str):
needs = [needs]
job_keys = set(jobs)
for dep in needs:
assert dep in job_keys, f"all-required needs unknown job {dep!r}"
assert "--profile ci" in rendered
assert ".gitea/scripts/detect-changes.py" in rendered
assert "REQUIRE_PLATFORM" in rendered
assert "REQUIRE_CANVAS" in rendered
assert "REQUIRE_SCRIPTS" in rendered
@@ -1,244 +0,0 @@
"""Live-fire regression test for #2159 — gate auto-fire runtime verification.
Static tests (test_gate_review_auto_fire.py) validate that the workflow YAML
is structurally correct. This test validates the *runtime* path: submitting an
APPROVED review to a PR whose head contains the current gate workflows causes
Gitea Actions to queue the qa-review + security-review workflows and POST the
branch-protection-required (pull_request_target) contexts within a reasonable
window.
Skipped when Gitea API credentials are not available. Intended for:
- manual developer verification
- CI jobs provisioned with a service-account token
Environment:
GITEA_HOST — default: git.moleculesai.app
GITEA_TOKEN — token with read:repository + write:issues (for review POST)
REPO — default: molecule-ai/molecule-core
LIVEFIRE_PR_NUMBER — optional; if omitted the test tries to find a
suitable open PR automatically, or skips.
LIVEFIRE_TIMEOUT_SEC — default: 120
"""
import base64
import json
import os
import re
import time
import urllib.error
import urllib.request
from pathlib import Path
import pytest
import yaml
GITEA_HOST = os.environ.get("GITEA_HOST", "git.moleculesai.app")
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
REPO = os.environ.get("REPO", "molecule-ai/molecule-core")
LIVEFIRE_PR_NUMBER = os.environ.get("LIVEFIRE_PR_NUMBER", "")
LIVEFIRE_TIMEOUT_SEC = int(os.environ.get("LIVEFIRE_TIMEOUT_SEC", "120"))
REQUIRED_CONTEXTS = [
"qa-review / approved (pull_request_target)",
"security-review / approved (pull_request_target)",
]
skip_no_token = pytest.mark.skipif(
not GITEA_TOKEN,
reason="GITEA_TOKEN not set — live-fire test requires API credentials",
)
def _api(method: str, path: str, body: dict | None = None) -> tuple[int, dict]:
url = f"https://{GITEA_HOST}/api/v1{path}"
headers = {
"Authorization": f"token {GITEA_TOKEN}",
"Content-Type": "application/json",
}
data = json.dumps(body).encode() if body else None
req = urllib.request.Request(url, data=data, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
raw = resp.read()
code = resp.status
except urllib.error.HTTPError as exc:
raw = exc.read()
code = exc.code
payload = json.loads(raw) if raw else {}
return code, payload
def _get_pr(number: int) -> dict:
code, pr = _api("GET", f"/repos/{REPO}/pulls/{number}")
if code != 200:
pytest.fail(f"GET /pulls/{number} returned HTTP {code}: {pr}")
return pr
def _list_open_prs() -> list[dict]:
code, prs = _api("GET", f"/repos/{REPO}/pulls?state=open&limit=50")
if code != 200:
pytest.fail(f"GET /pulls?state=open returned HTTP {code}: {prs}")
return prs
def _pr_has_trigger_in_head(pr: dict) -> bool:
"""Return True if the PR head contains pull_request_review in both workflows."""
head_sha = pr["head"]["sha"]
for wf_name in ("qa-review.yml", "security-review.yml"):
path = f"/repos/{REPO}/contents/.gitea/workflows/{wf_name}?ref={head_sha}"
code, payload = _api("GET", path)
if code != 200:
return False
raw = base64.b64decode(payload.get("content", "")).decode("utf-8")
wf = yaml.safe_load(raw)
on = wf.get(True) or wf.get("on") or {}
if isinstance(on, str):
if on != "pull_request_review":
return False
elif "pull_request_review" not in on:
return False
return True
def _find_suitable_pr() -> dict:
if LIVEFIRE_PR_NUMBER:
pr = _get_pr(int(LIVEFIRE_PR_NUMBER))
if pr.get("state") != "open":
pytest.skip(f"PR {LIVEFIRE_PR_NUMBER} is not open")
return pr
prs = _list_open_prs()
for pr in prs:
if _pr_has_trigger_in_head(pr):
return pr
pytest.skip("No open PR found whose head contains the pull_request_review trigger")
def _submit_approved_review(pr_number: int) -> dict:
code, review = _api(
"POST",
f"/repos/{REPO}/pulls/{pr_number}/reviews",
{"body": "Live-fire test APPROVED review", "event": "APPROVED"},
)
# 200 = created, 422 = review already exists (idempotent enough for our purposes)
if code not in (200, 201, 422):
pytest.fail(f"POST /pulls/{pr_number}/reviews returned HTTP {code}")
return review
def _get_status_snapshot(sha: str) -> dict[str, dict]:
"""Return mapping context -> {id, updated_at, target_url} for required contexts."""
code, statuses = _api("GET", f"/repos/{REPO}/statuses/{sha}?limit=100")
if code != 200:
return {}
result: dict[str, dict] = {}
for st in statuses:
ctx = st.get("context", "")
if ctx in REQUIRED_CONTEXTS:
result[ctx] = {
"id": st.get("id"),
"updated_at": st.get("updated_at", st.get("created_at", "")),
"target_url": st.get("target_url"),
}
return result
def _extract_run_id(target_url: str | None) -> str | None:
"""Extract the Actions run_id from a status target_url."""
if not target_url:
return None
m = re.search(r"/actions/runs/(\d+)", target_url)
return m.group(1) if m else None
def _poll_fresh_statuses(
sha: str,
prior_snapshot: dict[str, dict],
timeout_sec: int = LIVEFIRE_TIMEOUT_SEC,
) -> dict[str, dict]:
"""Poll until required contexts appear fresh (newer timestamp, id, or run)."""
deadline = time.monotonic() + timeout_sec
found: dict[str, dict] = {}
while time.monotonic() < deadline:
code, statuses = _api("GET", f"/repos/{REPO}/statuses/{sha}?limit=100")
if code == 200:
for st in statuses:
ctx = st.get("context", "")
if ctx in REQUIRED_CONTEXTS:
updated_at = st.get("updated_at", st.get("created_at", ""))
status_id = st.get("id")
target_url = st.get("target_url")
prior = prior_snapshot.get(ctx, {})
# Fresh if timestamp changed, id changed, or target_url changed.
is_fresh = (
ctx not in prior_snapshot
or updated_at != prior.get("updated_at", "")
or status_id != prior.get("id")
or target_url != prior.get("target_url")
)
if is_fresh:
found[ctx] = {
"state": st.get("state", st.get("status", "")),
"updated_at": updated_at,
"id": status_id,
"target_url": target_url,
}
if all(ctx in found for ctx in REQUIRED_CONTEXTS):
return found
time.sleep(5)
return found
@skip_no_token
class TestGateAutoFireLive:
def test_auto_fire_posts_required_contexts(self):
"""Submit APPROVED review; assert BP-required contexts appear fresh within timeout."""
pr = _find_suitable_pr()
pr_number = pr["number"]
head_sha = pr["head"]["sha"]
# Capture pre-existing status snapshot so we can prove FRESH contexts
# were posted after the review submission (not stale from a prior run).
prior_snapshot = _get_status_snapshot(head_sha)
prior_run_ids = {
_extract_run_id(s["target_url"])
for s in prior_snapshot.values()
if _extract_run_id(s["target_url"])
}
review = _submit_approved_review(pr_number)
found = _poll_fresh_statuses(head_sha, prior_snapshot)
missing = [ctx for ctx in REQUIRED_CONTEXTS if ctx not in found]
if missing:
pytest.fail(
f"After {LIVEFIRE_TIMEOUT_SEC}s, fresh contexts still missing: {missing}. "
f"Found: {found}. Prior snapshot: {prior_snapshot}. "
f"PR #{pr_number} head={head_sha}. "
f"This indicates the pull_request_review trigger did not fire at runtime."
)
# The contexts appeared fresh — that's the proof of auto-fire.
# We do NOT assert success vs failure; the evaluator decides that.
# The point of #2159 is that the workflows QUEUE and POST at all.
for ctx, info in found.items():
state = info["state"]
assert state in ("pending", "success", "failure"), (
f"Unexpected state {state!r} for {ctx}"
)
# CR2 Finding 1: prove a NEW workflow run was triggered, not just
# an in-place status update. Gitea 1.22.6 lacks REST /actions/runs/*
# endpoints, so we use the run_id embedded in the status target_url
# as a proxy for distinct run_id.
run_id = _extract_run_id(info.get("target_url"))
if run_id and run_id in prior_run_ids:
pytest.fail(
f"Context {ctx!r} has target_url run_id {run_id} which existed "
f"BEFORE the review was submitted. This means the status was "
f"updated in-place by an existing run, not by a new workflow "
f"run triggered from the pull_request_review event."
)
@@ -1,168 +0,0 @@
"""Regression test #765 — gate auto-fire on real qa/security APPROVED review.
Validates the structural configuration of qa-review.yml and security-review.yml
so that a real team-member APPROVED review fires the workflow and POSTs the
exact branch-protection-required context name. This is the test #2020's
stale-context failure would have caught.
"""
from pathlib import Path
import yaml
ROOT = Path(__file__).resolve().parents[2]
def load_workflow(name: str) -> dict:
with (ROOT / "workflows" / name).open() as f:
return yaml.safe_load(f)
def _job_guard_string(workflow: dict) -> str:
"""Return the raw job-level `if:` string for the single job."""
jobs = workflow["jobs"]
# Both qa-review and security-review have exactly one job named "approved".
job = jobs["approved"]
return str(job.get("if", ""))
def _post_step(workflow: dict) -> dict:
"""Return the explicit POST /statuses step from the job steps list."""
jobs = workflow["jobs"]
steps = jobs["approved"]["steps"]
for step in steps:
name = step.get("name", "")
if "Post required status context" in name:
return step
raise AssertionError("No explicit POST status step found")
class TestQaReviewDirectTrigger:
def test_trigger_is_pull_request_review_submitted(self):
wf = load_workflow("qa-review.yml")
# PyYAML parses bare 'on' as boolean True.
on = wf[True]
assert "pull_request_review" in on, (
"qa-review must trigger on pull_request_review"
)
types = on["pull_request_review"].get("types", [])
assert "submitted" in types, (
"pull_request_review must include 'submitted' type"
)
def test_job_guard_requires_approved_state(self):
wf = load_workflow("qa-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
def test_post_step_uses_status_post_token(self):
wf = load_workflow("qa-review.yml")
post = _post_step(wf)
env = post.get("env", {})
assert env.get("GITEA_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"POST step must use STATUS_POST_TOKEN for write-scoped status POST"
)
def test_post_step_context_name_exact(self):
"""The context POSTed must byte-match the branch-protection requirement."""
wf = load_workflow("qa-review.yml")
post = _post_step(wf)
run = post.get("run", "")
assert '"qa-review / approved (pull_request_target)"' in run, (
"POST step must emit exact BP-required context name"
)
class TestSecurityReviewDirectTrigger:
def test_trigger_is_pull_request_review_submitted(self):
wf = load_workflow("security-review.yml")
# PyYAML parses bare 'on' as boolean True.
on = wf[True]
assert "pull_request_review" in on, (
"security-review must trigger on pull_request_review"
)
types = on["pull_request_review"].get("types", [])
assert "submitted" in types, (
"pull_request_review must include 'submitted' type"
)
def test_job_guard_requires_approved_state(self):
wf = load_workflow("security-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
def test_post_step_uses_status_post_token(self):
wf = load_workflow("security-review.yml")
post = _post_step(wf)
env = post.get("env", {})
assert env.get("GITEA_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"POST step must use STATUS_POST_TOKEN for write-scoped status POST"
)
def test_post_step_context_name_exact(self):
"""The context POSTed must byte-match the branch-protection requirement."""
wf = load_workflow("security-review.yml")
post = _post_step(wf)
run = post.get("run", "")
assert '"security-review / approved (pull_request_target)"' in run, (
"POST step must emit exact BP-required context name"
)
class TestRefireScriptContextName:
"""review-refire-status.sh must emit the BP-required (pull_request_target) context."""
def test_refire_script_context_is_pull_request_target(self):
script = ROOT / "scripts" / "review-refire-status.sh"
content = script.read_text()
assert 'CONTEXT="${TEAM}-review / approved (pull_request_target)"' in content, (
"refire script CONTEXT must be the exact BP-required (pull_request_target) variant"
)
assert 'approved (pull_request)"' not in content, (
"refire script must NOT post bare (pull_request) context"
)
class TestRefireTokenSeparation:
"""The /qa-recheck + /security-recheck backstop must also use STATUS_POST_TOKEN."""
def _refire_step(self, workflow_name: str, step_name_keyword: str) -> dict:
wf = load_workflow(workflow_name)
jobs = wf["jobs"]
steps = jobs["review-refire"]["steps"]
for step in steps:
name = step.get("name", "")
if step_name_keyword in name:
return step
raise AssertionError(f"No refire step matching {step_name_keyword!r}")
def test_qa_refire_uses_status_post_token(self):
step = self._refire_step("sop-checklist.yml", "Refire qa-review")
env = step.get("env", {})
assert env.get("STATUS_POST_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"qa refire must receive STATUS_POST_TOKEN env var"
)
# Evaluator stays on read token
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"qa refire evaluator must stay on read-scoped token"
)
def test_security_refire_uses_status_post_token(self):
step = self._refire_step("sop-checklist.yml", "Refire security-review")
env = step.get("env", {})
assert env.get("STATUS_POST_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"security refire must receive STATUS_POST_TOKEN env var"
)
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"security refire evaluator must stay on read-scoped token"
)
@@ -1,145 +0,0 @@
"""Stale-head diagnostic test for #2159.
Deterministically reports whether a PR's HEAD contains the pull_request_review
trigger in qa-review.yml and security-review.yml. If the trigger is absent,
auto-fire on APPROVED review is impossible for that PR.
This is used as a self-diagnostic for future stale-PR situations (PRs opened
before #2157 merged, or branches cut from old bases).
Environment:
GITEA_HOST — default: git.moleculesai.app
GITEA_TOKEN — token with read:repository scope (optional; falls back to local files)
REPO — default: molecule-ai/molecule-core
PR_NUMBER — required when running against a real PR
"""
import base64
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
import pytest
import yaml
GITEA_HOST = os.environ.get("GITEA_HOST", "git.moleculesai.app")
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
REPO = os.environ.get("REPO", "molecule-ai/molecule-core")
PR_NUMBER = os.environ.get("PR_NUMBER", "")
ROOT = Path(__file__).resolve().parents[2]
def _api(method: str, path: str) -> tuple[int, dict]:
url = f"https://{GITEA_HOST}/api/v1{path}"
headers = {"Authorization": f"token {GITEA_TOKEN}"}
req = urllib.request.Request(url, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
return resp.status, json.loads(resp.read())
except urllib.error.HTTPError as exc:
body = exc.read()
return exc.code, json.loads(body) if body else {}
def _fetch_workflow_from_ref(workflow_name: str, ref: str) -> dict:
path = f"/repos/{REPO}/contents/.gitea/workflows/{workflow_name}?ref={ref}"
code, payload = _api("GET", path)
if code != 200:
pytest.fail(
f"GET {path} returned HTTP {code}: {payload}. "
f"Cannot determine whether PR head contains the trigger."
)
raw = base64.b64decode(payload.get("content", "")).decode("utf-8")
return yaml.safe_load(raw)
def _fetch_workflow_local(workflow_name: str) -> dict:
p = ROOT / "workflows" / workflow_name
if not p.exists():
pytest.fail(f"Local workflow file not found: {p}")
return yaml.safe_load(p.read_text())
def _has_pull_request_review_trigger(wf: dict) -> bool:
on = wf.get(True) or wf.get("on") or {}
if isinstance(on, list):
return "pull_request_review" in on
if isinstance(on, dict):
return "pull_request_review" in on
if isinstance(on, str):
return on == "pull_request_review"
return False
def _diagnose_pr(pr_number: int) -> dict[str, bool]:
code, pr = _api("GET", f"/repos/{REPO}/pulls/{pr_number}")
if code != 200:
pytest.fail(f"GET /pulls/{pr_number} returned HTTP {code}: {pr}")
head_ref = pr["head"]["ref"]
head_sha = pr["head"]["sha"]
results: dict[str, bool] = {}
for wf_name in ("qa-review.yml", "security-review.yml"):
wf = _fetch_workflow_from_ref(wf_name, head_sha)
results[wf_name] = _has_pull_request_review_trigger(wf)
return {
"pr_number": pr_number,
"head_ref": head_ref,
"head_sha": head_sha,
"triggers": results,
"auto_fire_possible": all(results.values()),
}
def _diagnose_local() -> dict[str, bool]:
results: dict[str, bool] = {}
for wf_name in ("qa-review.yml", "security-review.yml"):
wf = _fetch_workflow_local(wf_name)
results[wf_name] = _has_pull_request_review_trigger(wf)
return {
"pr_number": None,
"head_ref": "local-checkout",
"head_sha": None,
"triggers": results,
"auto_fire_possible": all(results.values()),
}
class TestStaleHeadDiagnostic:
"""Test deterministically reports 'auto-fire impossible for this PR' when
the PR head lacks the pull_request_review trigger.
"""
def test_local_checkout_has_pull_request_review_trigger(self):
"""Local files (the ones in this checkout) must contain the trigger.
This is the baseline: if the checkout itself is stale, every PR cut
from it will also be stale.
"""
diag = _diagnose_local()
missing = [n for n, ok in diag["triggers"].items() if not ok]
if missing:
pytest.fail(
f"Local checkout is missing pull_request_review trigger in: {missing}. "
f"This branch cannot produce PRs that auto-fire."
)
@pytest.mark.skipif(not GITEA_TOKEN, reason="GITEA_TOKEN not set")
@pytest.mark.skipif(not PR_NUMBER, reason="PR_NUMBER not set")
def test_pr_head_has_pull_request_review_trigger(self):
"""When PR_NUMBER is given, assert the PR head contains the trigger."""
diag = _diagnose_pr(int(PR_NUMBER))
if not diag["auto_fire_possible"]:
missing = [n for n, ok in diag["triggers"].items() if not ok]
pytest.fail(
f"Auto-fire impossible for PR #{diag['pr_number']}. "
f"Head ref={diag['head_ref']} sha={diag['head_sha']}. "
f"Missing trigger in: {missing}. "
f"This PR needs /qa-recheck + /security-recheck fallback, or a rebase onto current main."
)
@@ -2,6 +2,7 @@ import importlib.util
import sys
from pathlib import Path
SCRIPT = Path(__file__).resolve().parents[1] / "gitea-merge-queue.py"
spec = importlib.util.spec_from_file_location("gitea_merge_queue", SCRIPT)
mq = importlib.util.module_from_spec(spec)
@@ -15,6 +15,7 @@ Mirrors the pattern in scripts/ops/test_check_migration_collisions.py
from __future__ import annotations
import importlib.util
import os
import sys
import unittest
from pathlib import Path
@@ -1,283 +0,0 @@
import importlib.util
import sys
from pathlib import Path
from unittest.mock import patch, MagicMock
SCRIPT = Path(__file__).resolve().parents[1] / "main-red-watchdog.py"
spec = importlib.util.spec_from_file_location("main_red_watchdog", SCRIPT)
wd = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = wd
spec.loader.exec_module(wd)
# Module-level constants are loaded from env at import time; set them
# explicitly so unit tests can import without the full env contract.
wd.GITEA_TOKEN = "fake-token"
wd.GITEA_HOST = "git.example.com"
wd.REPO = "molecule-ai/molecule-core"
wd.OWNER = "molecule-ai"
wd.NAME = "molecule-core"
wd.WATCH_BRANCH = "main"
wd.RED_LABEL = "tier:high"
wd.API = "https://git.example.com/api/v1"
# ---------------------------------------------------------------------------
# _is_scheduled_context
# ---------------------------------------------------------------------------
def test_is_scheduled_context_matches_staging_saas_smoke():
assert wd._is_scheduled_context("Staging SaaS smoke") is True
def test_is_scheduled_context_matches_case_insensitive():
assert wd._is_scheduled_context("continuous synthetic e2e") is True
def test_is_scheduled_context_no_match_for_required_ci():
assert wd._is_scheduled_context("CI / all-required") is False
# ---------------------------------------------------------------------------
# _entry_state
# ---------------------------------------------------------------------------
def test_entry_state_prefers_status_over_state():
"""Gitea 1.22.6 per-entry key is `status`; `state` is fallback."""
assert wd._entry_state({"status": "failure", "state": "success"}) == "failure"
def test_entry_state_falls_back_to_state():
assert wd._entry_state({"state": "pending"}) == "pending"
def test_entry_state_empty_when_neither_key_present():
assert wd._entry_state({"context": "foo"}) == ""
# ---------------------------------------------------------------------------
# is_red
# ---------------------------------------------------------------------------
def test_is_red_combined_failure_no_statuses():
"""Combined failure with empty statuses[] still trips red."""
red, failed = wd.is_red({"state": "failure", "statuses": []})
assert red is True
assert failed == []
def test_is_red_cancel_cascade_filtered():
"""status=3 (cancelled) mapped to failure string must be filtered."""
status = {
"state": "failure",
"statuses": [
{"context": "CI / build", "status": "failure", "description": "Has been cancelled"},
],
}
red, failed = wd.is_red(status)
assert red is False
assert failed == []
def test_is_red_real_failure_not_filtered():
"""Real failures with different descriptions are kept."""
status = {
"state": "failure",
"statuses": [
{"context": "CI / build", "status": "failure", "description": "Failing after 12s"},
],
}
red, failed = wd.is_red(status)
assert red is True
assert len(failed) == 1
assert failed[0]["context"] == "CI / build"
def test_is_red_uses_entry_state_not_top_level_state():
"""Regression: per-entry key is `status`, not `state`."""
status = {
"state": "failure",
"statuses": [
# Only `status` present; pre-rev4 code read `state` and got None
{"context": "CI / test", "status": "failure"},
],
}
red, failed = wd.is_red(status)
assert red is True
assert len(failed) == 1
# ---------------------------------------------------------------------------
# list_open_red_issues — pagination (mc#1789)
# ---------------------------------------------------------------------------
def test_list_open_red_issues_exhausts_pagination():
"""Backlog can exceed 50 issues; all pages must be fetched."""
calls = []
def fake_api(method, path, **kwargs):
calls.append((method, path, kwargs))
query = (kwargs.get("query") or {})
page = int(query.get("page", "1"))
limit = int(query.get("limit", "50"))
# Page 1 returns full limit; page 2 returns partial → break
if page == 1:
return 200, [
{"title": f"[main-red] molecule-ai/molecule-core: sha{i:04d}"}
for i in range(limit)
]
if page == 2:
return 200, [
{"title": "[main-red] molecule-ai/molecule-core: extra1"},
{"title": "[main-red] molecule-ai/molecule-core: extra2"},
{"title": " unrelated issue "}, # filtered out
]
return 200, []
with patch.object(wd, "api", side_effect=fake_api):
issues = wd.list_open_red_issues()
assert len(issues) == 52 # 50 + 2 matched
titles = {i["title"] for i in issues}
assert "[main-red] molecule-ai/molecule-core: extra1" in titles
assert "[main-red] molecule-ai/molecule-core: extra2" in titles
def test_list_open_red_issues_single_page():
"""When results < limit, loop breaks after first page."""
def fake_api(method, path, **kwargs):
return 200, [
{"title": "[main-red] molecule-ai/molecule-core: abc123"},
]
with patch.object(wd, "api", side_effect=fake_api):
issues = wd.list_open_red_issues()
assert len(issues) == 1
# ---------------------------------------------------------------------------
# run_once — close logic (mc#1789)
# ---------------------------------------------------------------------------
def test_run_once_green_closes_stale_issues(monkeypatch):
"""Combined success → close stale issues."""
monkeypatch.setattr(wd, "get_head_sha", lambda b: "abc123")
monkeypatch.setattr(wd, "get_combined_status", lambda s: {"state": "success", "statuses": []})
monkeypatch.setattr(wd, "is_red", lambda s: (False, []))
closed = []
def capture_close(current_sha, *, dry_run=False, close_same_sha=False):
closed.append(current_sha)
return 1
monkeypatch.setattr(wd, "close_open_red_issues_for_other_shas", capture_close)
monkeypatch.setattr(wd, "emit_loki_event", lambda *a, **k: None)
assert wd.run_once(dry_run=True) == 0
assert closed == ["abc123"]
def test_run_once_pending_scheduled_only_closes_stale_issues(monkeypatch):
"""Combined pending, but only scheduled contexts pending → close stale."""
monkeypatch.setattr(wd, "get_head_sha", lambda b: "abc123")
monkeypatch.setattr(
wd, "get_combined_status",
lambda s: {
"state": "pending",
"statuses": [
{"context": "CI / all-required", "status": "success"},
{"context": "Staging SaaS smoke", "status": "pending"},
],
}
)
monkeypatch.setattr(wd, "is_red", lambda s: (False, []))
closed = []
def capture_close(current_sha, *, dry_run=False, close_same_sha=False):
closed.append(current_sha)
return 1
monkeypatch.setattr(wd, "close_open_red_issues_for_other_shas", capture_close)
monkeypatch.setattr(wd, "emit_loki_event", lambda *a, **k: None)
assert wd.run_once(dry_run=True) == 0
assert closed == ["abc123"]
def test_run_once_pending_required_does_not_close(monkeypatch):
"""Combined pending with a real required context still pending → no close."""
monkeypatch.setattr(wd, "get_head_sha", lambda b: "abc123")
monkeypatch.setattr(
wd, "get_combined_status",
lambda s: {
"state": "pending",
"statuses": [
{"context": "CI / all-required", "status": "pending"},
{"context": "Staging SaaS smoke", "status": "success"},
],
}
)
monkeypatch.setattr(wd, "is_red", lambda s: (False, []))
closed = []
def capture_close(current_sha, *, dry_run=False, close_same_sha=False):
closed.append(current_sha)
return 0
monkeypatch.setattr(wd, "close_open_red_issues_for_other_shas", capture_close)
monkeypatch.setattr(wd, "emit_loki_event", lambda *a, **k: None)
assert wd.run_once(dry_run=True) == 0
assert closed == []
def test_run_once_failure_does_not_close(monkeypatch):
"""Real failure in non-scheduled context → no close."""
monkeypatch.setattr(wd, "get_head_sha", lambda b: "abc123")
monkeypatch.setattr(
wd, "get_combined_status",
lambda s: {
"state": "failure",
"statuses": [
{"context": "CI / all-required", "status": "failure"},
],
}
)
# is_red will return True, so we enter the red path, not the green close path
monkeypatch.setattr(wd, "is_red", lambda s: (True, s.get("statuses", [])))
monkeypatch.setattr(wd, "time", MagicMock(sleep=lambda x: None))
monkeypatch.setattr(wd, "emit_loki_event", lambda *a, **k: None)
filed = []
def capture_file(sha, failed, debug, *, dry_run=False):
filed.append(sha)
monkeypatch.setattr(wd, "file_or_update_red", capture_file)
monkeypatch.setattr(wd, "close_open_red_issues_for_other_shas", lambda *a, **k: 0)
monkeypatch.setattr(wd, "close_stale_red_issues", lambda *a, **k: 0)
assert wd.run_once(dry_run=True) == 0
assert filed == ["abc123"]
# ---------------------------------------------------------------------------
# title_for / find_open_issue_for_sha
# ---------------------------------------------------------------------------
def test_title_for_uses_short_sha():
assert wd.title_for("abcdef123456") == "[main-red] molecule-ai/molecule-core: abcdef1234"
def test_find_open_issue_for_sha_matches_exact_title(monkeypatch):
fake_issue = {"title": "[main-red] molecule-ai/molecule-core: abc1234567", "number": 42}
monkeypatch.setattr(wd, "list_open_red_issues", lambda: [fake_issue])
assert wd.find_open_issue_for_sha("abc1234567") == fake_issue
def test_find_open_issue_for_sha_returns_none_when_no_match(monkeypatch):
monkeypatch.setattr(wd, "list_open_red_issues", lambda: [])
assert wd.find_open_issue_for_sha("abc123") is None
@@ -153,462 +153,3 @@ def test_default_required_contexts_delegate_path_gating_to_all_required():
"CI / all-required (push)",
"Secret scan / Scan diff for credential-shaped strings (push)",
]
def test_slugs_from_redeploy_response_uses_controlplane_plan_rows():
body = {
"results": [
{"slug": "hongming", "phase": "canary", "ssm_status": "DryRun"},
{"slug": "tenant-a", "phase": "batch-1", "ssm_status": "DryRun"},
{"slug": "", "phase": "batch-1", "ssm_status": "DryRun"},
{"phase": "batch-1", "ssm_status": "DryRun"},
]
}
assert prod.slugs_from_redeploy_response(body) == ["hongming", "tenant-a"]
def test_plan_rollout_slugs_asks_controlplane_for_dry_run_plan():
calls = []
def fake_redeploy(_cp_url, _token, body):
calls.append(body)
return 200, {
"ok": True,
"results": [
{"slug": "hongming", "phase": "canary", "ssm_status": "DryRun"},
{"slug": "tenant-a", "phase": "batch-1", "ssm_status": "DryRun"},
],
}
slugs = prod.plan_rollout_slugs(
"https://api.moleculesai.app",
"secret",
{
"target_tag": "staging-abcdef1",
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 3,
"dry_run": False,
"confirm": True,
},
redeploy=fake_redeploy,
)
assert slugs == ["hongming", "tenant-a"]
assert calls == [
{
"target_tag": "staging-abcdef1",
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 3,
"dry_run": True,
"confirm": True,
}
]
def test_scoped_redeploy_body_removes_canary_and_local_soak():
base = {
"target_tag": "staging-abcdef1",
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 3,
"dry_run": False,
"confirm": True,
}
scoped = prod.scoped_redeploy_body(base, ["tenant-a", "tenant-b"])
assert scoped == {
"target_tag": "staging-abcdef1",
"soak_seconds": 0,
"batch_size": 2,
"dry_run": False,
"confirm": True,
"only_slugs": ["tenant-a", "tenant-b"],
}
def test_plan_scoped_rollout_preserves_canary_then_batches():
calls, sleeps = [], []
def fake_list(_cp_url, _token, _body):
return ["tenant-a", "hongming", "tenant-b", "tenant-c"]
def fake_redeploy(_cp_url, _token, body):
calls.append(body)
return 200, {
"ok": True,
"results": [{"slug": slug, "healthz_ok": True} for slug in body["only_slugs"]],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-abcdef1",
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 2,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=fake_list,
redeploy=fake_redeploy,
sleep=sleeps.append,
)
assert [call["only_slugs"] for call in calls] == [
["hongming"],
["tenant-a", "tenant-b"],
["tenant-c"],
]
assert sleeps == [60]
assert aggregate["ok"] is True
assert [result["slug"] for result in aggregate["results"]] == [
"hongming",
"tenant-a",
"tenant-b",
"tenant-c",
]
def test_scoped_rollout_halts_after_failed_canary():
calls = []
def fake_redeploy(_cp_url, _token, body):
calls.append(body)
return 200, {"ok": False, "results": [{"slug": body["only_slugs"][0], "error": "bad"}]}
try:
prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-abcdef1",
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 2,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _cp_url, _token, _body: ["hongming", "tenant-a"],
redeploy=fake_redeploy,
sleep=lambda _seconds: None,
)
except prod.RolloutFailed as exc:
assert "redeploy scoped call failed" in str(exc)
assert exc.response["ok"] is False
assert exc.response["results"] == [{"slug": "hongming", "error": "bad"}]
else:
raise AssertionError("expected failed canary to halt rollout")
assert [call["only_slugs"] for call in calls] == [["hongming"]]
def test_rollout_from_plan_file_writes_partial_response_on_failure(tmp_path):
plan_path = tmp_path / "plan.json"
response_path = tmp_path / "response.json"
plan_path.write_text(
"""
{
"enabled": true,
"cp_url": "https://api.moleculesai.app",
"body": {"target_tag": "staging-abcdef1", "confirm": true}
}
""",
encoding="utf-8",
)
original = prod.execute_scoped_rollout
def fake_execute(_plan, _token):
raise prod.RolloutFailed(
"redeploy scoped call failed for hongming: HTTP 500, ok=false",
{
"ok": False,
"error": "redeploy scoped call failed for hongming: HTTP 500, ok=false",
"results": [{"slug": "hongming", "error": "bad"}],
},
)
prod.execute_scoped_rollout = fake_execute
try:
try:
prod.rollout_from_plan_file(
str(plan_path),
str(response_path),
{"CP_ADMIN_API_TOKEN": "secret"},
)
except prod.RolloutFailed:
pass
else:
raise AssertionError("expected rollout failure")
finally:
prod.execute_scoped_rollout = original
assert response_path.read_text(encoding="utf-8").strip()
assert '"ok": false' in response_path.read_text(encoding="utf-8")
assert '"slug": "hongming"' in response_path.read_text(encoding="utf-8")
# ──────────────────────────────────────────────────────────────────────
# No-silent-skip coverage gate (internal#724)
# ──────────────────────────────────────────────────────────────────────
def test_rollout_stragglers_flags_tenant_not_on_target():
# b SSM-succeeded but its container is on the old tag → straggler.
stragglers = prod.rollout_stragglers(
["a", "b", "c"],
[
{"slug": "a", "verified_on_target": True},
{"slug": "b", "verified_on_target": False, "running_image": "platform-tenant:staging-old"},
{"slug": "c", "verified_on_target": True},
],
)
assert stragglers == ["b"]
def test_rollout_stragglers_flags_enumerated_tenant_with_no_result():
# agents-team class: enumerated but no batch ever produced a row for it.
stragglers = prod.rollout_stragglers(
["a", "agents-team"],
[{"slug": "a", "verified_on_target": True}],
)
assert stragglers == ["agents-team"]
def test_rollout_stragglers_missing_key_is_backward_compatible():
# Older CP without verified_on_target → treat as verified (no spurious fail).
stragglers = prod.rollout_stragglers(
["a", "b"],
[{"slug": "a", "healthz_ok": True}, {"slug": "b", "healthz_ok": True}],
)
assert stragglers == []
def test_rollout_stragglers_ignores_dry_run_rows():
stragglers = prod.rollout_stragglers(
["a"], [{"slug": "a", "ssm_status": "DryRun"}]
)
# dry-run row is skipped, so "a" has no verifying row → straggler.
assert stragglers == ["a"]
def test_scoped_rollout_fails_when_a_tenant_stays_on_old_tag():
# Every per-tenant call returns ok=True, but agents-team is NOT
# verified_on_target. The rollout must still fail loudly — this is
# the exact "reported success, one tenant silently skipped" bug.
def fake_redeploy(_cp_url, _token, body):
rows = []
for slug in body["only_slugs"]:
rows.append({"slug": slug, "verified_on_target": slug != "agents-team"})
return 200, {"ok": True, "results": rows}
try:
prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
except prod.RolloutFailed as exc:
assert "incomplete rollout" in str(exc)
assert exc.response["stragglers"] == ["agents-team"]
assert exc.response["ok"] is False
else:
raise AssertionError("expected an incomplete rollout to fail loudly")
def test_scoped_rollout_passes_when_all_tenants_verified_on_target():
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [{"slug": s, "verified_on_target": True} for s in body["only_slugs"]],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
assert aggregate["ok"] is True
assert "stragglers" not in aggregate
def test_scoped_rollout_dry_run_does_not_assert_coverage():
# A dry run proves nothing landed; coverage must NOT be asserted or
# every plan would fail.
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [{"slug": s, "ssm_status": "DryRun"} for s in body["only_slugs"]],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": True,
"confirm": True,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["a", "b"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
assert aggregate["ok"] is True
# --- Superseded-deploy guard (false-stale fix) -----------------------------
#
# Scenario this fixes: no `concurrency:` on the prod-deploy workflow means two
# close main pushes run BOTH deploy-production jobs. eb31bcf (Fix A) and 286338
# (Fix C) merge back-to-back; the 286338 job rolls the fleet to staging-2863380
# first; the OLDER eb31bcf job's strict verify then sees tenants on 2863380 and
# false-reds "stale" though the fleet is AHEAD. superseded_by detects that main's
# head is no longer eb31bcf and lets the older job succeed without weakening the
# behind-tenant signal for whichever job IS the latest.
def test_superseded_by_returns_newer_head_when_main_moved_ahead(monkeypatch):
# eb31bcf job: main head is now 2863380 -> superseded, return the newer head.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
newer = prod.superseded_by({"GITHUB_SHA": "eb31bcffullhash"})
assert newer == "2863380fullhash"
def test_superseded_by_none_when_this_job_is_still_head(monkeypatch):
# 2863380 job (the latest): head == our SHA -> NOT superseded -> strict verify
# runs, so a genuinely-behind tenant still fails loudly.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
def test_superseded_by_matches_on_short_vs_full_sha_prefix(monkeypatch):
# GITHUB_SHA is full; Gitea may return a different-length id. Equal prefixes
# must NOT count as superseded (avoid false-skipping the real latest job).
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380FULLHASH")
assert prod.superseded_by({"GITHUB_SHA": "2863380fullhash"}) is None
def test_superseded_by_fail_safe_returns_none_when_head_unreadable(monkeypatch):
# Fail-safe: unreadable head (no token / API error) must NOT be treated as
# superseded, so the strict verify still runs and never silently greens.
monkeypatch.setattr(prod, "current_branch_head", lambda _env: None)
assert prod.superseded_by({"GITHUB_SHA": "eb31bcffullhash"}) is None
def test_superseded_by_none_without_github_sha(monkeypatch):
monkeypatch.setattr(prod, "current_branch_head", lambda _env: "2863380fullhash")
assert prod.superseded_by({}) is None
def test_current_branch_head_parses_gitea_branch_commit_id(monkeypatch):
captured = {}
def fake_optional(url, _token):
captured["url"] = url
return 200, {"name": "main", "commit": {"id": "2863380fullhash"}}
monkeypatch.setattr(prod, "_api_json_optional", fake_optional)
head = prod.current_branch_head(
{"GITEA_TOKEN": "secret", "GITHUB_REPOSITORY": "molecule-ai/molecule-core"}
)
assert head == "2863380fullhash"
assert captured["url"].endswith("/repos/molecule-ai/molecule-core/branches/main")
def test_current_branch_head_uses_ref_name_branch(monkeypatch):
captured = {}
def fake_optional(url, _token):
captured["url"] = url
return 200, {"commit": {"sha": "deadbeef"}}
monkeypatch.setattr(prod, "_api_json_optional", fake_optional)
head = prod.current_branch_head(
{"GITEA_TOKEN": "secret", "GITHUB_REF_NAME": "release"}
)
assert head == "deadbeef"
assert captured["url"].endswith("/branches/release")
def test_current_branch_head_none_without_token():
assert prod.current_branch_head({}) is None
def test_current_branch_head_none_on_non_200(monkeypatch):
monkeypatch.setattr(prod, "_api_json_optional", lambda _u, _t: (500, None))
assert prod.current_branch_head({"GITEA_TOKEN": "secret"}) is None
# --- #2213: superseded check must fire BEFORE production side effects ----------
#
# Real incident shape: two main pushes land ~2 min apart. The OLDER deploy job
# (GITHUB_SHA=7a72516, target staging-7a72516) started LATE — main head was
# already 7f25373. The #2194 guard only protected the *verify* step, so the
# older job still:
# 1. rolled the canary (hongming) BACKWARD to staging-7a72516 (the #2213 red,
# seen as the newer job's verify reading hongming on the old SHA), then
# 2. promoted :latest backward to the older image,
# before finally skipping verify. The workflow now calls this same superseded
# check BEFORE the redeploy + promote steps and gates both off when it fires.
# These tests pin the contract that check-superseded relies on for the exact
# incident shape.
def test_superseded_by_fires_for_older_job_when_newer_already_head(monkeypatch):
# Older job (7a72516) re-checks the head just before rollout and finds the
# newer merge (7f25373) already owns main -> superseded -> skip side effects.
monkeypatch.setattr(
prod, "current_branch_head", lambda _env: "7f25373309eca54a36f08c371ff783c3a47c3f8d"
)
newer = prod.superseded_by(
{"GITHUB_SHA": "7a72516f7e7ba1a710c4f393fef08be8d22e1866"}
)
assert newer == "7f25373309eca54a36f08c371ff783c3a47c3f8d"
def test_superseded_by_none_for_latest_job_so_it_still_rolls(monkeypatch):
# The newer job (7f25373) IS the head -> NOT superseded -> it proceeds to
# roll the fleet and verify, so a genuinely-behind tenant still fails loud.
monkeypatch.setattr(
prod, "current_branch_head", lambda _env: "7f25373309eca54a36f08c371ff783c3a47c3f8d"
)
assert (
prod.superseded_by(
{"GITHUB_SHA": "7f25373309eca54a36f08c371ff783c3a47c3f8d"}
)
is None
)
+2 -23
View File
@@ -205,8 +205,6 @@ chmod +x "$FIXTURE_DIR/bin/curl"
# Helper: run the script with fixture environment
run_review_check() {
local scenario="$1"
local team="${2:-qa}"
local team_id="${3:-20}"
echo "$scenario" >"$FIX_STATE_DIR/scenario"
local out
set +e
@@ -217,8 +215,8 @@ run_review_check() {
REPO="molecule-ai/molecule-core" \
PR_NUMBER="999" \
DEFAULT_BRANCH="main" \
TEAM="$team" \
TEAM_ID="$team_id" \
TEAM="qa" \
TEAM_ID="20" \
REVIEW_CHECK_DEBUG="0" \
REVIEW_CHECK_STRICT="0" \
bash "$SCRIPT" 2>&1
@@ -374,25 +372,6 @@ assert_eq "T18 exit code 0 (comment approval still considered)" "0" "$T18_RC"
assert_contains "T18 comment candidate notice" "comment-based approval" "$T18_OUT"
assert_contains "T18 comment approver accepted" "APPROVED by core-qa-agent" "$T18_OUT"
# T19 — ai-sop-ack member APPROVED review must NOT count toward qa-review
# or security-review (R1 hardening refinement, msg 1388c76f).
echo
echo "== T19 ai-sop-ack APPROVED review excluded from qa-review gate =="
T19_OUT=$(run_review_check "T19_ai_sop_ack_approved" "qa" "20")
T19_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T19 exit code 1 (ai-sop-ack not in qa team)" "1" "$T19_RC"
assert_contains "T19 ai-reviewer excluded from qa" "candidates: ai-reviewer" "$T19_OUT"
assert_contains "T19 none are in qa team" "none are in team" "$T19_OUT"
# T20 — same ai-sop-ack member must also be excluded from security-review gate.
echo
echo "== T20 ai-sop-ack APPROVED review excluded from security-review gate =="
T20_OUT=$(run_review_check "T19_ai_sop_ack_approved" "security" "21")
T20_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T20 exit code 1 (ai-sop-ack not in security team)" "1" "$T20_RC"
assert_contains "T20 ai-reviewer excluded from security" "candidates: ai-reviewer" "$T20_OUT"
assert_contains "T20 none are in security team" "none are in team" "$T20_OUT"
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL"
+1 -296
View File
@@ -22,6 +22,7 @@ from __future__ import annotations
import os
import sys
import tempfile
import unittest
# Resolve sibling script regardless of where pytest is invoked from.
@@ -1003,299 +1004,3 @@ class TestComputeNaStateAcceptsGateNotInItems(unittest.TestCase):
comments, "alice", na_gates, lambda *_: ["alice"]
)
self.assertFalse(na_state["security-review"]["declared"])
# ---------------------------------------------------------------------------
# internal#760 ceremony — ai-sop-ack team + ai_ack_eligible per-item flag
# ---------------------------------------------------------------------------
class TestAIAckEligibleConfig(unittest.TestCase):
"""CTO-controlled allowlist (msg 1388c76f):
ai_ack_eligible: comprehensive-testing, local-postgres-e2e, staging-smoke,
five-axis-review, memory-consulted
human-only: root-cause, no-backwards-compat
"""
def test_ai_ack_eligible_items(self):
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
eligible = {
"comprehensive-testing",
"local-postgres-e2e",
"staging-smoke",
"five-axis-review",
"memory-consulted",
}
for slug in eligible:
self.assertTrue(
items_by_slug[slug].get("ai_ack_eligible"),
f"{slug} must be ai_ack_eligible",
)
def test_human_only_items(self):
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
human_only = {"root-cause", "no-backwards-compat"}
for slug in human_only:
self.assertFalse(
items_by_slug[slug].get("ai_ack_eligible", False),
f"{slug} must NOT be ai_ack_eligible (human-only)",
)
def test_testing_class_slugs_constant(self):
"""_TESTING_CLASS_SLUGS must match the three testing items."""
self.assertEqual(
sop._TESTING_CLASS_SLUGS,
{"comprehensive-testing", "local-postgres-e2e", "staging-smoke"},
)
def test_human_only_slugs_constant(self):
"""_HUMAN_ONLY_SLUGS encodes the migration/schema carve-out.
If this set changes, the CTO must approve the widening.
"""
self.assertEqual(
sop._HUMAN_ONLY_SLUGS,
{"root-cause", "no-backwards-compat", "migration", "schema"},
)
def test_human_only_invariant_enforced_in_code_and_config(self):
"""Every config-present slug in _HUMAN_ONLY_SLUGS must be human-only.
This test fails if a migration/schema-class item accidentally
acquires ai_ack_eligible via config drift. migration/schema are
future-proofing slugs not yet in the live config; they are checked
by the production probe closure but skipped here.
"""
cfg = sop.load_config(CONFIG_PATH)
items_by_slug = {it["slug"]: it for it in cfg["items"]}
for slug in sop._HUMAN_ONLY_SLUGS:
if slug not in items_by_slug:
# Future-proofing slug (e.g. migration, schema) — not yet
# in config, but the code guard still rejects AI acks.
continue
self.assertFalse(
items_by_slug[slug].get("ai_ack_eligible", False),
f"{slug} is in _HUMAN_ONLY_SLUGS and must NEVER be ai_ack_eligible",
)
class TestAIAckEligibilityProbe(unittest.TestCase):
"""The probe closure in main() delegates to compute_ack_state.
We simulate the AI-ack path by injecting a probe that behaves like
the production probe (human team first, then ai-sop-ack fallback).
"""
def setUp(self):
self.items = _items_by_slug()
self.aliases = _numeric_aliases()
def _probe_human_then_ai(self, human_users, ai_users):
"""Return users in human_users immediately; users in ai_users only
if the item is ai_ack_eligible."""
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u in human_users:
approved.append(u)
elif u in ai_users and item.get("ai_ack_eligible"):
approved.append(u)
return approved
return probe
def test_ai_ack_passes_for_eligible_item(self):
comments = [_comment("ai-bot", "/sop-ack five-axis-review")]
probe = self._probe_human_then_ai(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["five-axis-review"]["ackers"], ["ai-bot"])
def test_ai_ack_rejected_for_human_only_item(self):
comments = [_comment("ai-bot", "/sop-ack root-cause")]
probe = self._probe_human_then_ai(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["root-cause"]["ackers"], [])
self.assertIn("ai-bot", state["root-cause"]["rejected"]["not_in_team"])
def test_human_ack_still_works_for_ai_eligible_item(self):
comments = [_comment("bob", "/sop-ack comprehensive-testing")]
probe = self._probe_human_then_ai(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], ["bob"])
def test_ai_ack_rejected_for_testing_item_when_ci_red(self):
# Simulate the production probe that checks CI status for testing items.
# When CI is not green, ai-sop-ack member is rejected.
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u == "ai-bot" and item.get("ai_ack_eligible"):
# Testing items require CI green; simulate CI red.
if slug in sop._TESTING_CLASS_SLUGS:
continue # rejected: CI not green
approved.append(u)
return approved
comments = [_comment("ai-bot", "/sop-ack comprehensive-testing")]
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], [])
def test_ai_ack_passes_for_testing_item_when_ci_green(self):
# Simulate CI green → AI ack passes.
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u == "ai-bot" and item.get("ai_ack_eligible"):
if slug in sop._TESTING_CLASS_SLUGS:
# CI is green → allow
pass
approved.append(u)
return approved
comments = [_comment("ai-bot", "/sop-ack comprehensive-testing")]
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["comprehensive-testing"]["ackers"], ["ai-bot"])
class TestAIAckHumanOnlyMigrationSchema(unittest.TestCase):
"""RC 8322: migration and schema items are human-only regardless of
any future config that might accidentally mark them ai_ack_eligible.
These slugs are not yet in the live config items list; the tests use
synthetic items so the production guard can be exercised directly.
"""
def setUp(self):
# Synthetic items — if live config ever adds migration/schema,
# they MUST stay human-only. The probe below mirrors the actual
# production closure logic (human team first, then AI fallback
# with _HUMAN_ONLY_SLUGS guard).
self.items = {
"migration": {
"slug": "migration",
"ai_ack_eligible": True,
"required_teams": ["engineers"],
},
"schema": {
"slug": "schema",
"ai_ack_eligible": True,
"required_teams": ["engineers"],
},
}
self.aliases = {}
def _production_like_probe(self, human_users, ai_users):
"""Return a probe that mirrors the production closure's guard."""
def probe(slug, users):
item = self.items.get(slug, {})
approved = []
for u in users:
if u in human_users:
approved.append(u)
elif u in ai_users:
# Production guard: _HUMAN_ONLY_SLUGS rejects AI acks
# regardless of the ai_ack_eligible flag.
if slug in sop._HUMAN_ONLY_SLUGS:
continue
if item.get("ai_ack_eligible"):
approved.append(u)
return approved
return probe
def test_ai_ack_rejected_for_migration(self):
comments = [_comment("ai-bot", "/sop-ack migration")]
probe = self._production_like_probe(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["migration"]["ackers"], [])
self.assertIn("ai-bot", state["migration"]["rejected"]["not_in_team"])
def test_ai_ack_rejected_for_schema(self):
comments = [_comment("ai-bot", "/sop-ack schema")]
probe = self._production_like_probe(human_users=set(), ai_users={"ai-bot"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["schema"]["ackers"], [])
self.assertIn("ai-bot", state["schema"]["rejected"]["not_in_team"])
def test_human_ack_still_works_for_migration(self):
# Human team member acking migration/schema is unaffected.
comments = [_comment("bob", "/sop-ack migration")]
probe = self._production_like_probe(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["migration"]["ackers"], ["bob"])
def test_human_ack_still_works_for_schema(self):
comments = [_comment("bob", "/sop-ack schema")]
probe = self._production_like_probe(human_users={"bob"}, ai_users=set())
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe
)
self.assertEqual(state["schema"]["ackers"], ["bob"])
class TestGetCIStatus(unittest.TestCase):
"""Verify get_ci_status reads the correct context from commit statuses."""
def _client_with_statuses(self, statuses):
client = sop.GiteaClient("git.example.com", "tok")
def fake_req(method, path, body=None, ok_codes=(200, 201, 204)):
return 200, statuses
client._req = fake_req # type: ignore[method-assign]
return client
def test_ci_green_returns_success(self):
client = self._client_with_statuses([
{"context": "CI / all-required (pull_request)", "state": "success"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "success"
)
def test_ci_red_returns_failure(self):
client = self._client_with_statuses([
{"context": "CI / all-required (pull_request)", "state": "failure"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "failure"
)
def test_missing_context_returns_missing(self):
client = self._client_with_statuses([
{"context": "some-other-context", "state": "success"},
])
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "missing"
)
def test_api_error_returns_unknown(self):
client = sop.GiteaClient("git.example.com", "tok")
def fake_req(method, path, body=None, ok_codes=(200, 201, 204)):
return 500, {"error": "boom"}
client._req = fake_req # type: ignore[method-assign]
self.assertEqual(
sop.get_ci_status(client, "o", "r", "sha1"), "unknown"
)
+1 -29
View File
@@ -32,26 +32,6 @@
# AUTHOR SELF-ACK IS FORBIDDEN regardless of which team contains them
# — the gate script enforces commenter != PR author before checking
# team membership.
#
# AI-SOP-ACK TEAM (internal#760 ceremony design, CTO-approved):
# The `ai-sop-ack` team contains AI agent identities that can ack
# SOP-checklist items ON BEHALF OF automated evidence. An AI ack is
# only valid when:
# 1. the item has `ai_ack_eligible: true`
# 2. the item is NOT in the human-only carve-out (migration/schema)
# 3. for testing-class items, CI / all-required (pull_request) is
# green on the current head SHA
#
# AI acks NEVER count toward qa-review or security-review gates —
# those remain human-team-only (enforced by review-check.sh team
# probe against TEAM_ID 20/21).
#
# INITIAL ai_ack_eligible allowlist (CTO-controlled, msg 1388c76f):
# comprehensive-testing, local-postgres-e2e, staging-smoke,
# five-axis-review, memory-consulted
# HUMAN-ONLY carve-out:
# root-cause, no-backwards-compat
# Any widening requires an explicit config change reviewed by CTO.
version: 1
@@ -103,31 +83,25 @@ items:
numeric_alias: 1
pr_section_marker: "Comprehensive testing performed"
required_teams: [qa, engineers]
ai_ack_eligible: true
description: >-
What was tested, how, edge cases covered. Ack from any qa-team
member (or engineers fallback while qa is small). AI ack valid
only when CI / all-required (pull_request) is green.
member (or engineers fallback while qa is small).
- slug: local-postgres-e2e
numeric_alias: 2
pr_section_marker: "Local-postgres E2E run"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Link to local CI artifact, or "N/A: pure-frontend change". Ack
from any engineer who can verify the local DB test actually ran.
AI ack valid only when CI / all-required (pull_request) is green.
- slug: staging-smoke
numeric_alias: 3
pr_section_marker: "Staging-smoke verified or pending"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Link to canary run, or "scheduled post-merge". Ack from any
engineer (core-devops/infra-sre are members of engineers team).
AI ack valid only when CI / all-required (pull_request) is green.
- slug: root-cause
numeric_alias: 4
@@ -146,7 +120,6 @@ items:
numeric_alias: 5
pr_section_marker: "Five-Axis review walked"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
Correctness / readability / architecture / security / performance.
Ack from any non-author engineer.
@@ -167,7 +140,6 @@ items:
numeric_alias: 7
pr_section_marker: "Memory/saved-feedback consulted"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
List of feedback memories applicable to this change. Ack from
any engineer who has the same memory access.
+5 -18
View File
@@ -47,25 +47,12 @@ jobs:
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
# Required-status-check contexts to evaluate at merge time.
# Branch-aware JSON dict: keys are protected branch names,
# values are arrays of context names that branch protection
# requires for that branch. Mirror this against branch
# protection (settings → branches → protected branch →
# required checks) for each branch listed here.
#
# Newline-separated. Mirror this against branch protection
# (settings → branches → protected branch → required checks).
# Declared here rather than fetched from /branch_protections
# because that endpoint requires admin write — sop-tier-bot is
# read-only by design (least-privilege).
REQUIRED_CHECKS_JSON: |
{
"main": [
"CI / all-required (pull_request)",
"E2E API Smoke Test / E2E API Smoke Test (pull_request)",
"Handlers Postgres Integration / Handlers Postgres Integration (pull_request)"
],
"staging": [
"CI / all-required (pull_request)",
"sop-checklist / all-items-acked (pull_request)"
]
}
REQUIRED_CHECKS: |
CI / all-required (pull_request)
sop-checklist / all-items-acked (pull_request)
run: bash .gitea/scripts/audit-force-merge.sh
+1 -1
View File
@@ -37,7 +37,7 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -42,9 +42,11 @@ jobs:
check:
name: Migration version collision check
runs-on: ubuntu-latest
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port.
# mc#1982 mask removed — no surfaced defects in this lane.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 5
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+1 -2
View File
@@ -96,13 +96,12 @@ env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: advisory arm64 pilot, non-gating by design (internal#418).
fast-checks:
name: fast-checks
# AND-set: only the Mac arm64 runner advertises macos-self-hosted.
# See "RUNNER TARGETING" header note for why bare self-hosted is unsafe.
runs-on: [self-hosted, macos-self-hosted]
# ADVISORY: never blocks. See safety contract point 3. mc#1982
# ADVISORY: never blocks. See safety contract point 3. mc#774
# internal#418 — tracked: arm64 advisory pilot, non-gating by design.
continue-on-error: true
# event_name gate: functional (only meaningful on push/PR) AND keeps
+129 -107
View File
@@ -106,7 +106,7 @@ jobs:
name: Platform (Go)
needs: changes
runs-on: ubuntu-latest
# mc#1982 (closed 2026-05-14): Phase 4 flip of the platform-build job.
# mc#774 (closed 2026-05-14): Phase 4 flip of the platform-build job.
# Phase 4 (#656) originally flipped this to continue-on-error: false based on
# Phase-3-masked "green on main 2026-05-12". Two failure classes then surfaced:
# (1) 4x delegation_test.go sqlmock gaps (PR #669 / #634 fix-forward, closed).
@@ -161,23 +161,15 @@ jobs:
echo "::group::pendinguploads exit=$pu_exit (last 100 lines)"
tail -100 /tmp/test-pu.log
echo "::endgroup::"
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- if: ${{ needs.changes.outputs.platform == 'true' }}
name: Run tests with coverage (blocking gate)
# Removed -race from the blocking gate per #1184: cold runners
# take 13-25 min to compile with race instrumentation, exceeding
# the 10m step timeout and causing false failures. Race detection
# now runs as a non-blocking advisory step below.
run: go test -timeout 10m -coverprofile=coverage.out ./...
- if: ${{ needs.changes.outputs.platform == 'true' }}
name: Race detection (advisory, non-blocking)
# mc#1184: runs race detector as an advisory check so cold-runner
# compile-time spikes don't block merges. Failures here surface in
# the run log but do not fail the build.
run: go test -race -timeout 10m ./...
continue-on-error: true
name: Run tests with race detection and coverage
# Explicit timeout: cold runner cache causes OOM kills at ~4m39s on the
# full ./... suite with race detection + coverage. A 10m per-step timeout
# lets the suite complete on cold cache (~5-7m) while failing cleanly
# instead of OOM-killing. The job-level timeout (15m) is a backstop.
run: go test -race -timeout 10m -coverprofile=coverage.out ./...
- if: ${{ needs.changes.outputs.platform == 'true' }}
name: Per-file coverage report
@@ -357,14 +349,6 @@ jobs:
name: Run E2E bash unit tests (no live infra)
run: |
bash tests/e2e/test_model_slug.sh
# molecule-core#1995 (#1994 follow-on): fail-direction proof for
# the A2A real-completion + byok-routing assertion helpers
# (lib/completion_assert.sh). Offline (no LLM, no network): it
# asserts an error-as-text payload FAILS the real-completion gate
# — the exact trap the historical shape-only `"kind":"text"`
# check missed. If a refactor weakens the gate to a shape check,
# this step goes red on every PR.
bash tests/e2e/test_completion_assert_unit.sh
- if: ${{ needs.changes.outputs.scripts == 'true' }}
name: Test ECR promote-tenant-image script (mock-driven, no live infra)
@@ -392,7 +376,7 @@ jobs:
canvas-deploy-reminder:
name: Canvas Deploy Reminder
runs-on: docker-host
# mc#1982 root-fix: added job-level `if:` so ci-required-drift.py's
# mc#774 root-fix: added job-level `if:` so ci-required-drift.py's
# ci_job_names() detects this as github.ref-gated and skips it from F1.
# The step-level exit 0 handles the "not main push" case; the job-level
# `if:` makes the gating explicit so the drift script sees it.
@@ -475,10 +459,10 @@ jobs:
#
# Emits `CI / all-required (<event>)` where <event> is the workflow trigger
# (e.g. `CI / all-required (pull_request)`, `CI / all-required (push)`).
# Branch protection requires the event-suffixed name —
# Branch protection MUST be updated to require the event-suffixed name —
# requiring `CI / all-required` (bare, no suffix) silently blocks all merges
# because Gitea treats absent status contexts as pending (not skipped), and
# no workflow emits the bare name. BP requires
# no workflow emits the bare name. Fixed: BP now requires
# `CI / all-required (pull_request)` per issue #1473.
#
# Closes the failure mode where status_check_contexts on molecule-core/main
@@ -487,91 +471,129 @@ jobs:
# red silently merged through. See internal#286 for the three concrete
# tonight-of-2026-05-11 incidents that prompted the emergency bump.
#
# ── 2026-06-01 CI-scheduler-overload fix (fix/ci-scheduler-fanout) ──
# PREVIOUS shape: a poll-gate that ran detect-changes then LOOPED on
# `GET /commits/{sha}/statuses` every 15s for up to 40 min, occupying a
# `ci-meta` executor slot the entire time it waited for upstream jobs.
# With only 2 ci-meta runners, that poll-loop squatted half the lane on
# every PR — a confirmed throughput sink in the live RCA (two concurrent
# `JOB-all-required` containers observed pinning the lane). The polling
# design existed only to dodge the Gitea `needs:` + `if: always()` bug,
# where an always()-guarded sentinel could be marked skipped before
# upstream jobs settled (leaving BP pending forever).
# This job deliberately has no `needs:`. Gitea 1.22/act_runner can mark a
# job-level `if: always()` + `needs:` sentinel as skipped before upstream
# jobs settle, leaving branch protection with a permanent pending
# `CI / all-required` context. Instead, this independent sentinel polls the
# required commit-status contexts for this SHA and fails if any fail, skip,
# or never emit. It runs the same path detector as `changes` and only waits
# for path-relevant jobs; Gitea can otherwise leave needs/output-skipped
# jobs permanently pending with "Blocked by required conditions". It runs on
# the dedicated `ci-meta` lane so the poller does not occupy the same
# general runner pool as the jobs it is waiting for.
#
# NEW shape: a plain `needs:` aggregator with NO polling loop. This is
# safe here — and was NOT safe at the time the poller was written —
# because every aggregated CI job now gates its real work PER-STEP
# (`if: needs.changes.outputs.* != 'true'`) rather than at the JOB level.
# A per-step-gated job always reaches a terminal SUCCESS (it no-ops its
# expensive steps but the job itself still completes), so it is never
# `skipped`. Plain `needs:` (WITHOUT `if: always()`) works correctly on
# Gitea 1.22.6 / act_runner v0.6.1 — only `needs:` + `if: always()` is
# broken (feedback_gitea_needs_works_only_ifalways_broken). We therefore
# use plain `needs:` + an explicit per-need result check (NOT
# `if: always()`); if any need fails/errors, Gitea never starts this job
# and BP sees `CI / all-required` go red via the failed dependency
# propagation — exactly the gate we want, with zero runner-squat.
# canvas-deploy-reminder is intentionally NOT included in all-required.needs.
# It is an informational main-push reminder, not a PR quality gate. Keeping
# it in this dependency list lets a skipped reminder skip the required
# sentinel before the `always()` guard can emit a branch-protection status.
#
# The `needs:` list MUST stay in lockstep with ci-required-drift.py's
# F1 check (`ci_job_names()` = every job MINUS the sentinel MINUS jobs
# whose `if:` gates on github.event_name/github.ref). canvas-deploy-
# reminder is event-gated (`if: github.ref == refs/heads/{main,staging}`)
# so it is intentionally EXCLUDED — it skips on PRs and a `needs:` on a
# skipped job would never let the sentinel run. If a new always-running
# CI job is added, add it here too or ci-required-drift F1 will flag it.
#
# Stays on the dedicated `ci-meta` lane (no docker work, so the
# docker-host-pin lint does not apply), but now the job is sub-second:
# it only inspects already-settled `needs.*.result` values, so it frees
# the slot immediately instead of holding it for the whole CI duration.
#
needs:
- changes
- platform-build
- canvas-build
- shellcheck
- python-lint
continue-on-error: false
runs-on: ci-meta
timeout-minutes: 5
timeout-minutes: 45
steps:
- name: Verify all aggregated CI jobs succeeded
# NO polling, NO API call, NO checkout. Because this job lists the
# aggregated jobs under `needs:` (without `if: always()`), Gitea only
# starts it once every need has reached SUCCESS — a failed/errored
# need short-circuits the job and propagates red to the
# `CI / all-required` context. This explicit check is a
# belt-and-suspenders assertion + a readable run summary; the real
# gating is the `needs:` edge itself.
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- id: check
env:
CHANGES_RESULT: ${{ needs.changes.result }}
PLATFORM_RESULT: ${{ needs.platform-build.result }}
CANVAS_RESULT: ${{ needs.canvas-build.result }}
SHELLCHECK_RESULT: ${{ needs.shellcheck.result }}
PYTHON_LINT_RESULT: ${{ needs.python-lint.result }}
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
PR_BASE_REF: ${{ github.event.pull_request.base.ref }}
PUSH_BEFORE: ${{ github.event.before }}
run: |
python3 .gitea/scripts/detect-changes.py \
--profile ci \
--event-name "${{ github.event_name }}" \
--pr-base-sha "$PR_BASE_SHA" \
--base-ref "$PR_BASE_REF" \
--push-before "${GITHUB_EVENT_BEFORE:-$PUSH_BEFORE}"
- name: Wait for required CI contexts
env:
GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}
API_ROOT: ${{ github.server_url }}/api/v1
REPOSITORY: ${{ github.repository }}
COMMIT_SHA: ${{ github.sha }}
EVENT_NAME: ${{ github.event_name }}
REQUIRE_PLATFORM: ${{ steps.check.outputs.platform }}
REQUIRE_CANVAS: ${{ steps.check.outputs.canvas }}
REQUIRE_SCRIPTS: ${{ steps.check.outputs.scripts }}
run: |
set -euo pipefail
fail=0
check() {
name="$1"; result="$2"
printf 'CI / %s = %s\n' "$name" "$result"
# `success` is the only green terminal state we accept. A plain
# `needs:` job is only started when all needs succeed, so reaching
# this step already implies success — but assert explicitly so a
# future `if: always()` reintroduction (which WOULD let non-success
# through) fails loudly instead of silently passing the gate.
if [ "$result" != "success" ]; then
echo "::error::aggregated CI job '${name}' did not succeed (result=${result})"
fail=1
fi
}
check "Detect changes" "$CHANGES_RESULT"
check "Platform (Go)" "$PLATFORM_RESULT"
check "Canvas (Next.js)" "$CANVAS_RESULT"
check "Shellcheck (E2E scripts)" "$SHELLCHECK_RESULT"
check "Python Lint & Test" "$PYTHON_LINT_RESULT"
if [ "$fail" -ne 0 ]; then
echo "::error::all-required: one or more aggregated CI jobs did not succeed"
exit 1
fi
echo "OK: all aggregated CI jobs succeeded — CI / all-required green."
python3 - <<'PY'
import json
import os
import sys
import time
import urllib.error
import urllib.request
token = os.environ["GITEA_TOKEN"]
api_root = os.environ["API_ROOT"].rstrip("/")
repo = os.environ["REPOSITORY"]
sha = os.environ["COMMIT_SHA"]
event = os.environ["EVENT_NAME"]
required = [
f"CI / Detect changes ({event})",
f"CI / Python Lint & Test ({event})",
]
if os.environ.get("REQUIRE_PLATFORM") == "true":
required.append(f"CI / Platform (Go) ({event})")
if os.environ.get("REQUIRE_CANVAS") == "true":
required.append(f"CI / Canvas (Next.js) ({event})")
if os.environ.get("REQUIRE_SCRIPTS") == "true":
required.append(f"CI / Shellcheck (E2E scripts) ({event})")
terminal_bad = {"failure", "error"}
deadline = time.time() + 40 * 60
last_summary = None
def fetch_statuses():
statuses = []
for page in range(1, 6):
url = f"{api_root}/repos/{repo}/commits/{sha}/statuses?page={page}&limit=100"
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
with urllib.request.urlopen(req, timeout=10) as resp:
chunk = json.load(resp)
if not chunk:
break
statuses.extend(chunk)
latest = {}
for item in statuses:
ctx = item.get("context")
if not ctx:
continue
prev = latest.get(ctx)
if prev is None or (item.get("updated_at") or item.get("created_at") or "") >= (prev.get("updated_at") or prev.get("created_at") or ""):
latest[ctx] = item
return latest
while True:
try:
latest = fetch_statuses()
except (TimeoutError, OSError, urllib.error.URLError) as exc:
if time.time() >= deadline:
print(f"FAIL: status polling did not recover before deadline: {exc}", file=sys.stderr)
sys.exit(1)
print(f"WARN: status poll failed, retrying: {exc}", flush=True)
time.sleep(15)
continue
states = {ctx: (latest.get(ctx) or {}).get("status") or (latest.get(ctx) or {}).get("state") or "missing" for ctx in required}
summary = ", ".join(f"{ctx}={state}" for ctx, state in states.items())
if summary != last_summary:
print(summary, flush=True)
last_summary = summary
bad = {ctx: state for ctx, state in states.items() if state in terminal_bad}
if bad:
print("FAIL: required CI context failed:", file=sys.stderr)
for ctx, state in bad.items():
desc = (latest.get(ctx) or {}).get("description") or ""
print(f" - {ctx}: {state} {desc}", file=sys.stderr)
sys.exit(1)
if all(state == "success" for state in states.values()):
print(f"OK: all {len(required)} required CI contexts succeeded")
sys.exit(0)
if time.time() >= deadline:
print("FAIL: timed out waiting for required CI contexts:", file=sys.stderr)
for ctx, state in states.items():
print(f" - {ctx}: {state}", file=sys.stderr)
sys.exit(1)
time.sleep(15)
PY
+1 -9
View File
@@ -102,7 +102,7 @@ jobs:
name: Synthetic E2E against staging
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# Bumped from 12 → 20 (2026-05-04). Tenant user-data install phase
# (apt-get update + install docker.io/jq/awscli/caddy + snap install
@@ -166,10 +166,6 @@ jobs:
# canary path. The script picks the right blob shape based on
# which key is non-empty.
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
# google-adk canary path — AI-Studio key (config model
# google_genai:gemini-2.5-pro). PROD disallows API keys (Vertex+ADC);
# the keyed path is CI-only. Dispatch with E2E_RUNTIME=google-adk.
E2E_GOOGLE_API_KEY: ${{ secrets.MOLECULE_STAGING_GOOGLE_API_KEY }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -221,10 +217,6 @@ jobs:
required_secret_name="MOLECULE_STAGING_OPENAI_API_KEY"
required_secret_value="${E2E_OPENAI_API_KEY:-}"
;;
google-adk)
required_secret_name="MOLECULE_STAGING_GOOGLE_API_KEY"
required_secret_value="${E2E_GOOGLE_API_KEY:-}"
;;
*)
echo "::warning::Unknown E2E_RUNTIME='${E2E_RUNTIME}' — skipping LLM-key check"
required_secret_name=""
+9 -35
View File
@@ -123,9 +123,8 @@ jobs:
# integration). See internal#512 for the class defect.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
api: ${{ steps.decide.outputs.api }}
steps:
@@ -161,9 +160,8 @@ jobs:
# detect-changes for the full rationale.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 15
env:
# Unique per-run container names so concurrent runs on the host-
@@ -330,40 +328,16 @@ jobs:
- name: Wait for /health
if: needs.detect-changes.outputs.api == 'true'
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction (it WILL be exceeded as migrations accrue).
# Use a generous wall-clock budget that comfortably exceeds
# cold-start + full-migration time, polling fast. This is robust to a
# growing chain WITHOUT masking a genuinely dead platform: if the
# background platform-server process has exited (e.g. a broken
# migration crashed it), we stop and fail loudly at once instead of
# waiting out the whole budget.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
for i in $(seq 1 30); do
if curl -sf "$BASE/health" > /dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
echo "Platform up after ${i}s"
exit 0
fi
# Fast-fail: if the platform process died, /health will never come.
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true
exit 1
- name: Assert migrations applied
if: needs.detect-changes.outputs.api == 'true'
run: |
+15 -57
View File
@@ -48,7 +48,7 @@ jobs:
# defect.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
chat: ${{ steps.decide.outputs.chat }}
@@ -112,7 +112,7 @@ jobs:
# Must land on operator-host Linux (docker-host).
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 15
env:
@@ -242,36 +242,16 @@ jobs:
- name: Wait for /health
if: needs.detect-changes.outputs.chat == 'true'
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
for i in $(seq 1 30); do
if curl -sf "http://127.0.0.1:${PLATFORM_PORT}/health" > /dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
echo "Platform up after ${i}s"
exit 0
fi
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true
exit 1
- name: Install canvas dependencies
if: needs.detect-changes.outputs.chat == 'true'
@@ -298,38 +278,16 @@ jobs:
export NEXT_PUBLIC_WS_URL="ws://127.0.0.1:${PLATFORM_PORT}/ws"
npx next dev --turbopack -p "${CANVAS_PORT}" > canvas.log 2>&1 &
echo $! > canvas.pid
# Readiness must wait for the actual chat route to *compile*, not
# just for the dev server to bind the port. `next dev --turbopack`
# accepts the TCP connection well before it has compiled a route
# on first request, so a bare `curl /` can 200 (or hang) while the
# page the tests load is still building. We therefore probe the
# real route the specs navigate to (`/?m=chat`) and require a 2xx,
# which only happens once Turbopack has finished the first
# compile. The previous 30s budget was also too tight for a cold
# Turbopack first-compile on a loaded operator-host runner — the
# `Canvas did not start in 30s` flake. Raise to 120s (job
# timeout-minutes is 15, so this is comfortably bounded) and probe
# every 2s.
READY=""
for i in $(seq 1 60); do
# Tempfile-routed -w + set +e/-e prevents curl-exit-code
# pollution of the captured status (lint-curl-status-capture.yml).
set +e
curl -s -o /dev/null -w '%{http_code}' "http://localhost:${CANVAS_PORT}/?m=chat" > /tmp/canvas-ready.code
set -e
CODE=$(cat /tmp/canvas-ready.code 2>/dev/null || echo "000")
if [ "$CODE" -ge 200 ] && [ "$CODE" -lt 400 ]; then
echo "Canvas (chat route compiled) up after ~$((i*2))s (HTTP ${CODE})"
READY=1
break
for i in $(seq 1 30); do
if curl -sf "http://localhost:${CANVAS_PORT}" > /dev/null 2>&1; then
echo "Canvas up after ${i}s"
exit 0
fi
sleep 2
sleep 1
done
if [ -z "$READY" ]; then
echo "::error::Canvas chat route did not compile in 120s (last HTTP ${CODE})"
cat canvas.log || true
exit 1
fi
echo "::error::Canvas did not start in 30s"
cat canvas.log || true
exit 1
- name: Run Playwright E2E tests
if: needs.detect-changes.outputs.chat == 'true'
-266
View File
@@ -1,266 +0,0 @@
name: E2E Legacy Advisory
# Advisory lane for older/manual E2E scripts that are too broad or
# environment-dependent for required PR CI. This intentionally does not run on
# pull_request or push so it cannot block merges/deploys; scheduled/manual reds
# still surface drift in scripts that would otherwise only be shellchecked.
#
# Gitea 1.22.6 rejects workflow_dispatch.inputs, so keep dispatch input-free.
on:
schedule:
# Stagger after the staging smoke/canvas morning lanes.
- cron: '15 9 * * *'
workflow_dispatch:
concurrency:
group: e2e-legacy-advisory
cancel-in-progress: false
permissions:
contents: read
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
legacy-local-platform:
name: Legacy local-platform E2E
runs-on: docker-host
timeout-minutes: 45
env:
PG_CONTAINER: pg-e2e-legacy-${{ github.run_id }}-${{ github.run_attempt }}
REDIS_CONTAINER: redis-e2e-legacy-${{ github.run_id }}-${{ github.run_attempt }}
MOLECULE_ENV: development
BIND_ADDR: 127.0.0.1
MOLECULE_IN_DOCKER: "false"
A2A_TIMEOUT: "30"
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Prepare local platform dependencies
run: |
set -euo pipefail
docker pull postgres:16 >/dev/null
docker pull redis:7 >/dev/null
docker pull alpine:latest >/dev/null
docker network create molecule-core-net >/dev/null 2>&1 || true
- name: Start Postgres
run: |
set -euo pipefail
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker run -d --name "$PG_CONTAINER" \
-e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule \
-p 0:5432 postgres:16 >/dev/null
PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
if [ -z "$PG_PORT" ]; then
PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | head -1 | awk -F: '{print $NF}')
fi
if [ -z "$PG_PORT" ]; then
echo "::error::Could not resolve host port for $PG_CONTAINER"
docker port "$PG_CONTAINER" 5432/tcp || true
docker logs "$PG_CONTAINER" || true
exit 1
fi
echo "DATABASE_URL=postgres://dev:dev@127.0.0.1:${PG_PORT}/molecule?sslmode=disable" >> "$GITHUB_ENV"
for i in $(seq 1 30); do
docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1 && exit 0
sleep 1
done
docker logs "$PG_CONTAINER" || true
exit 1
- name: Start Redis
run: |
set -euo pipefail
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
docker run -d --name "$REDIS_CONTAINER" -p 0:6379 redis:7 >/dev/null
REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
if [ -z "$REDIS_PORT" ]; then
REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | head -1 | awk -F: '{print $NF}')
fi
if [ -z "$REDIS_PORT" ]; then
echo "::error::Could not resolve host port for $REDIS_CONTAINER"
docker port "$REDIS_CONTAINER" 6379/tcp || true
docker logs "$REDIS_CONTAINER" || true
exit 1
fi
echo "REDIS_URL=redis://127.0.0.1:${REDIS_PORT}" >> "$GITHUB_ENV"
for i in $(seq 1 15); do
docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG && exit 0
sleep 1
done
docker logs "$REDIS_CONTAINER" || true
exit 1
- name: Pick platform port
run: |
set -euo pipefail
PLATFORM_PORT=$(python3 - <<'PY'
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.bind(("127.0.0.1", 0))
print(s.getsockname()[1])
PY
)
echo "PORT=${PLATFORM_PORT}" >> "$GITHUB_ENV"
echo "BASE=http://127.0.0.1:${PLATFORM_PORT}" >> "$GITHUB_ENV"
- name: Build platform
working-directory: workspace-server
run: go build -o platform-server ./cmd/server
- name: Populate template manifests for dev-mode E2E
run: |
set -euo pipefail
if command -v jq >/dev/null 2>&1; then
bash scripts/clone-manifest.sh manifest.json workspace-configs-templates org-templates plugins
else
echo "::warning::jq unavailable; dev-mode template assertion may fail if templates are absent"
fi
- name: Start platform
run: |
set -euo pipefail
./workspace-server/platform-server > workspace-server/platform.log 2>&1 &
PLATFORM_PID=$!
echo "$PLATFORM_PID" > workspace-server/platform.pid
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
start=$(date +%s)
while :; do
if curl -sf "$BASE/health" >/dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
exit 0
fi
if ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
sleep 1
done
- name: Run comprehensive E2E
run: bash tests/e2e/test_comprehensive_e2e.sh
- name: Run workspace abilities E2E
run: bash tests/e2e/test_workspace_abilities_e2e.sh
- name: Run dev-mode E2E
run: bash tests/e2e/test_dev_mode.sh
- name: Start stub A2A agents
run: |
set -euo pipefail
cat > /tmp/molecule-stub-a2a.py <<'PY'
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
class Handler(BaseHTTPRequestHandler):
def do_POST(self):
length = int(self.headers.get("content-length", "0"))
raw = self.rfile.read(length) if length else b"{}"
try:
req = json.loads(raw)
except Exception:
req = {}
method = req.get("method")
if method not in ("message/send", None):
body = {"jsonrpc": "2.0", "id": req.get("id"), "error": {"code": -32601, "message": "method not found"}}
else:
body = {
"jsonrpc": "2.0",
"id": req.get("id", "stub"),
"result": {
"role": "agent",
"parts": [{"kind": "text", "type": "text", "text": "stub agent response"}],
},
}
data = json.dumps(body, separators=(",", ":")).encode()
self.send_response(200)
self.send_header("content-type", "application/json")
self.send_header("content-length", str(len(data)))
self.end_headers()
self.wfile.write(data)
def log_message(self, *_):
return
HTTPServer(("127.0.0.1", 18080), Handler).serve_forever()
PY
python3 /tmp/molecule-stub-a2a.py > /tmp/molecule-stub-a2a.log 2>&1 &
echo $! > /tmp/molecule-stub-a2a.pid
- name: Seed external agents for legacy A2A/activity scripts
run: |
set -euo pipefail
create_agent() {
local name="$1" role="$2"
curl -sS -X POST "$BASE/workspaces" \
-H "Content-Type: application/json" \
-d "{\"name\":\"${name}\",\"role\":\"${role}\",\"tier\":1,\"runtime\":\"external\",\"external\":true,\"url\":\"http://127.0.0.1:18080\"}" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['id'])"
}
ECHO_ID=$(create_agent "Echo Agent" "Echo")
SEO_ID=$(create_agent "SEO Agent" "SEO")
curl -sS -X POST "$BASE/registry/register" -H "Content-Type: application/json" \
-d "{\"id\":\"$ECHO_ID\",\"url\":\"http://127.0.0.1:18080\",\"agent_card\":{\"name\":\"Echo Agent\",\"skills\":[{\"id\":\"echo\",\"name\":\"Echo\"}]}}" >/dev/null
curl -sS -X POST "$BASE/registry/register" -H "Content-Type: application/json" \
-d "{\"id\":\"$SEO_ID\",\"url\":\"http://127.0.0.1:18080\",\"agent_card\":{\"name\":\"SEO Agent\",\"skills\":[{\"id\":\"seo\",\"name\":\"SEO\"}]}}" >/dev/null
- name: Run activity E2E
run: bash tests/e2e/test_activity_e2e.sh
- name: Run A2A E2E
run: bash tests/e2e/test_a2a_e2e.sh
- name: Runtime-dependent legacy E2E preflight
run: |
set -euo pipefail
if [ -f workspace-configs-templates/claude-code-default/.auth-token ] && docker image inspect workspace:latest >/dev/null 2>&1; then
bash tests/e2e/test_claude_code_e2e.sh
bash tests/e2e/test_chat_upload_e2e.sh
else
echo "::notice::Skipping test_claude_code_e2e.sh and test_chat_upload_e2e.sh: require workspace:latest plus workspace-configs-templates/claude-code-default/.auth-token"
fi
- name: Dump platform log on failure
if: failure()
run: cat workspace-server/platform.log || true
- name: Stop platform and stub agents
if: always()
run: |
if [ -f workspace-server/platform.pid ]; then
kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
fi
if [ -f /tmp/molecule-stub-a2a.pid ]; then
kill "$(cat /tmp/molecule-stub-a2a.pid)" 2>/dev/null || true
fi
- name: Stop service containers
if: always()
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+4 -29
View File
@@ -126,7 +126,6 @@ jobs:
# push/dispatch/cron only (30+ min). This is NOT a fake-green mask of
# the real assertion — it validates the driving script's bash syntax
# and inline-python so a broken test script fails at PR time.
# bp-required: pending #1296 — PR emitter, not yet required (tracked in #1296).
pr-validate:
name: E2E Peer Visibility
runs-on: ubuntu-latest
@@ -268,36 +267,12 @@ jobs:
echo $! > platform.pid
- name: Wait for /health
run: |
# Readiness signal: the platform binds /health only AFTER the full
# migration chain has been applied on cold start (it prints
# "Platform starting on :PORT" at that point). So a 200 from /health
# is the real "migrations done + server listening" signal.
#
# The migration chain grows every release, so a fixed ~30s budget is
# brittle by construction. Use a generous wall-clock budget that
# comfortably exceeds cold-start + full-migration time, polling fast.
# Robust to a growing chain WITHOUT masking a dead platform: if the
# background platform-server process has exited, fail loudly at once.
DEADLINE_SECS=180 # cold-start + full migration chain headroom
PLATFORM_PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"
start=$(date +%s)
while :; do
if curl -sf "$BASE/health" > /dev/null; then
echo "Platform healthy after $(( $(date +%s) - start ))s"
exit 0
fi
if [ -n "$PLATFORM_PID" ] && ! kill -0 "$PLATFORM_PID" 2>/dev/null; then
echo "::error::platform-server (pid ${PLATFORM_PID}) exited before /health became reachable — see log below"
cat workspace-server/platform.log || true
exit 1
fi
if [ "$(( $(date +%s) - start ))" -ge "$DEADLINE_SECS" ]; then
echo "::error::Platform did not become healthy within ${DEADLINE_SECS}s — see log below"
cat workspace-server/platform.log || true
exit 1
fi
for i in $(seq 1 30); do
curl -sf "$BASE/health" > /dev/null && { echo "Platform up after ${i}s"; exit 0; }
sleep 1
done
echo "::error::Platform did not become healthy in 30s"
cat workspace-server/platform.log || true; exit 1
- name: Run LOCAL fresh-provision peer-visibility E2E (literal MCP list_peers)
# HONEST gate — NO continue-on-error. The local backend uses
# external-mode workspaces so this context tests the literal MCP
+2 -2
View File
@@ -71,7 +71,7 @@ jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
canvas: ${{ steps.decide.outputs.canvas }}
@@ -140,7 +140,7 @@ jobs:
name: Canvas tabs E2E
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 40
+1 -1
View File
@@ -84,7 +84,7 @@ jobs:
name: E2E Staging External Runtime
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
+5 -165
View File
@@ -48,10 +48,7 @@ on:
- 'workspace-server/internal/handlers/a2a_proxy.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/provisioner/**'
- 'workspace-server/internal/providers/providers.yaml'
- 'tests/e2e/test_staging_full_saas.sh'
- 'tests/e2e/lib/completion_assert.sh'
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- '.gitea/workflows/e2e-staging-saas.yml'
@@ -63,10 +60,7 @@ on:
- 'workspace-server/internal/handlers/a2a_proxy.go'
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/provisioner/**'
- 'workspace-server/internal/providers/providers.yaml'
- 'tests/e2e/test_staging_full_saas.sh'
- 'tests/e2e/lib/completion_assert.sh'
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- '.gitea/workflows/e2e-staging-saas.yml'
@@ -98,20 +92,20 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
- name: YAML validation (best-effort)
run: |
echo "e2e-staging-saas.yml — PR validation: workflow YAML is valid."
echo "E2E step runs only when provisioning-critical files change."
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# Actual E2E: runs on trunk pushes and PRs that touch provisioning-critical
@@ -122,7 +116,7 @@ jobs:
name: E2E Staging SaaS
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 45
permissions:
@@ -161,18 +155,13 @@ jobs:
# E2E_RUNTIME=hermes or =codex via workflow_dispatch can still
# exercise the OpenAI path.
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
# google-adk (operator-dispatched only) auths Gemini with an
# AI-Studio key. Org policy disallows API keys in PROD (Vertex+ADC
# there); CI uses the keyed AI-Studio path with config model
# google_genai:gemini-2.5-pro. Vertex remains the supported prod path.
E2E_GOOGLE_API_KEY: ${{ secrets.MOLECULE_STAGING_GOOGLE_API_KEY }}
E2E_RUNTIME: ${{ github.event.inputs.runtime || 'claude-code' }}
# Pin the model when running on the default claude-code path —
# the per-runtime default ("sonnet") routes to direct Anthropic
# and defeats the cost saving. Operators can override via the
# workflow_dispatch flow (no input wired here yet — runtime
# override is enough for ad-hoc).
E2E_MODEL_SLUG: ${{ github.event.inputs.runtime == 'hermes' && 'openai/gpt-4o' || github.event.inputs.runtime == 'codex' && 'openai/gpt-4o' || github.event.inputs.runtime == 'google-adk' && 'google_genai:gemini-2.5-pro' || 'MiniMax-M2' }}
E2E_MODEL_SLUG: ${{ github.event.inputs.runtime == 'hermes' && 'openai/gpt-4o' || github.event.inputs.runtime == 'codex' && 'openai/gpt-4o' || 'MiniMax-M2' }}
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
@@ -221,10 +210,6 @@ jobs:
required_secret_name="MOLECULE_STAGING_OPENAI_API_KEY"
required_secret_value="${E2E_OPENAI_API_KEY:-}"
;;
google-adk)
required_secret_name="MOLECULE_STAGING_GOOGLE_API_KEY"
required_secret_value="${E2E_GOOGLE_API_KEY:-}"
;;
*)
echo "::warning::Unknown E2E_RUNTIME='${E2E_RUNTIME}' — skipping LLM-key check"
required_secret_name=""
@@ -319,148 +304,3 @@ jobs:
echo "::warning::saas teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
# ── PLATFORM-MANAGED BOOT REGRESSION (moonshot/kimi NOT_CONFIGURED) ──────────
#
# The REAL-boot complement to the deterministic unit suite
# (workspace_provision_platform_boot_test.go). Provisions a REAL staging
# claude-code workspace on the PLATFORM-managed path — provider=platform,
# model=moonshot/kimi-k2.6, NO tenant LLM key — and asserts it reaches
# status=online (NOT not_configured) and a completion returns 200, via the same
# online-wait + completion-assert the BYOK job uses.
#
# Why a SEPARATE job (not a matrix leg of e2e-staging-saas): the platform path
# injects NO secret and pins a different model, so its env block diverges from
# the BYOK job's. A dedicated job keeps each path's "verify key present" preflight
# honest (BYOK requires a key; platform requires its ABSENCE not to matter) and
# gives the regression its own named commit-status for branch protection.
#
# Add `E2E Staging Platform Boot` to branch protection after 3 consecutive
# green runs on main (de-flake window; this path shares the cp#245
# boot-timeout flake surface the BYOK job has, so it must prove stable before
# it can BLOCK — see the gate-making plan in the PR body).
# bp-required: pending #2187
e2e-staging-platform-boot:
name: E2E Staging Platform Boot
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface without blocking until the de-flake window
# closes. mc#1982: do NOT renew this mask silently — the gate-making plan
# tracks the flip to false under #2187.
continue-on-error: true
timeout-minutes: 45
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
# The regression combo: claude-code + platform-managed + moonshot/kimi-k2.6.
# NO E2E_*_API_KEY is set — platform-managed billing is owned by Molecule via
# the CP LLM proxy. The harness's E2E_LLM_PATH=platform branch sends empty
# secrets and pin-selects the platform model.
E2E_RUNTIME: claude-code
E2E_LLM_PATH: platform
# Smoke mode: a single parent workspace is enough to prove online +
# completion for the platform path (the A2A/delegation matrix is the BYOK
# job's job). Override E2E_DEFAULT_PLATFORM_MODEL via workflow_dispatch to
# exercise another platform model id.
E2E_MODE: smoke
E2E_RUN_ID: "platform-${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — EC2 leak verification cannot run"
exit 2
fi
done
echo "Admin token present ✓"
- name: Assert NO BYOK key leaks into the platform run
run: |
# The whole point of this job is the platform-managed path. A stray
# E2E_*_API_KEY in the runner env would (via the harness) still be
# skipped by the E2E_LLM_PATH=platform branch — but assert their
# absence loudly here so a future env edit can't silently convert this
# into a masked BYOK run that no longer exercises the regression.
for var in E2E_MINIMAX_API_KEY E2E_ANTHROPIC_API_KEY E2E_OPENAI_API_KEY; do
if [ -n "${!var:-}" ]; then
echo "::warning::$var is set in this platform-boot job's env — the harness ignores it on E2E_LLM_PATH=platform, but it should not be wired here."
fi
done
echo "Platform-managed path: no tenant LLM key required ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Skipping — not a workspace bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run platform-managed boot E2E (online + completion)
id: e2e
run: bash tests/e2e/test_staging_full_saas.sh
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
# smoke mode slugs are e2e-smoke-YYYYMMDD-platform-<run_id>-...
if run_id:
prefixes = tuple(f'e2e-smoke-{d}-platform-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-smoke-{d}-platform-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/plat-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/plat-cleanup.code
set -e
code=$(cat /tmp/plat-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::platform-boot teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/plat-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::platform-boot teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
+1 -1
View File
@@ -37,7 +37,7 @@ jobs:
name: Intentional-failure teardown sanity
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 20
+2 -3
View File
@@ -7,11 +7,10 @@
# PR_NUMBER — set via ${{ github.event.pull_request.number }} from the trigger
# POST_COMMENT — "true" to post/update comment on PR
#
# Gating logic (MVP signals 1,2,3,4,6):
# Gating logic (MVP signals 1,2,3,6):
# 1. Author-aware agent-tag comment scan
# 2. REQUEST_CHANGES reviews state machine
# 3. Staleness detection (SOP-12: review.commit_id != PR.head_sha + >1 working day)
# 4. Branch divergence / scope-creep guard (base-sha vs target HEAD; mc#365)
# 6. CI required-checks awareness
#
# Exit code: 0=CLEAR, 1=BLOCKED, 2=ERROR
@@ -66,7 +65,7 @@ jobs:
# bp-exempt: PR advisory bot; merge blocking is enforced by CI status and branch protection.
gate-check:
runs-on: ubuntu-latest
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # Never block on our own detector failing
steps:
- name: Check out BASE ref (never PR-head under pull_request_target)
@@ -87,10 +87,9 @@ jobs:
# both jobs on the same label avoids workspace-volume cross-host
# surprises and keeps the routing rule discoverable in one place.
runs-on: docker-host
# mc#1982 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#774 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
handlers: ${{ steps.filter.outputs.handlers }}
steps:
@@ -119,10 +118,9 @@ jobs:
# mc#1529 §1: must run on operator-host (where `molecule-core-net`
# exists). See detect-changes for the full routing rationale.
runs-on: docker-host
# mc#1982 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#1982: mask removed. If regressions appear, root-fix the underlying
# test — do NOT renew the mask silently.
continue-on-error: false
# mc#774 Phase 3 (RFC §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
# Unique name per run so concurrent jobs don't collide on the
# bridge network. ${RUN_ID}-${RUN_ATTEMPT} is unique even across
@@ -253,19 +251,6 @@ jobs:
echo "✓ $tbl table present"
done
- if: needs.detect-changes.outputs.handlers == 'true'
name: Preflight — INTEGRATION_DB_URL must be present
run: |
# Belt-and-suspenders: if the postgres-start step failed to
# export INTEGRATION_DB_URL, fail loud BEFORE go test can
# t.Skip its way to a green build. Closes the workflow-level
# fail-open gap identified in PR #2166 blocker #2.
if [ -z "${INTEGRATION_DB_URL:-}" ]; then
echo "::error::INTEGRATION_DB_URL is empty — postgres-start step did not export the connection string"
exit 1
fi
echo "INTEGRATION_DB_URL is set"
- if: needs.detect-changes.outputs.handlers == 'true'
name: Run integration tests
run: |
+2 -2
View File
@@ -70,7 +70,7 @@ jobs:
# of mc#1543; see internal#512 for class defect.
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
run: ${{ steps.decide.outputs.run }}
@@ -172,7 +172,7 @@ jobs:
# beta containers. Must run on operator-host Linux (docker-host).
runs-on: docker-host
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 30
steps:
@@ -1,6 +1,6 @@
name: lint-bp-context-emit-match
# Tier 2f scheduled lint (per mc#1982) — detects drift between
# Tier 2f scheduled lint (per mc#774) — detects drift between
# `branch_protections/<branch>.status_check_contexts` and the set of
# contexts emitted by `.gitea/workflows/*.yml`.
#
@@ -60,7 +60,7 @@ name: lint-bp-context-emit-match
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - internal#349 (cross-repo BP sweep)
# - feedback_phantom_required_check_after_gitea_migration
# - feedback_tier_label_ids_are_per_repo
@@ -91,10 +91,10 @@ jobs:
name: lint-bp-context-emit-match
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port,
# well past the 7-clean-run threshold. Scheduled failure is now
# a hard CI signal.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface drift without blocking. After 7
# clean scheduled runs on main, flip to false so a scheduled
# failure is a hard CI signal.
continue-on-error: true # mc#774 Phase 3 — flip to false after 7 clean main runs
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
@@ -1,6 +1,6 @@
name: lint-continue-on-error-tracking
# Tier 2e hard-gate lint (per mc#1982) — every
# Tier 2e hard-gate lint (per mc#774) — every
# `continue-on-error: true` in `.gitea/workflows/*.yml` must carry a
# `# mc#NNNN` or `# internal#NNNN` tracker comment within 2 lines,
# the referenced issue must be OPEN, and ≤14 days old.
@@ -8,7 +8,7 @@ name: lint-continue-on-error-tracking
# Why this exists
# ---------------
# `continue-on-error: true` on `platform-build` had been hiding
# mc#1982-class regressions for ~3 weeks before #656 surfaced them on
# mc#774-class regressions for ~3 weeks before #656 surfaced them on
# 2026-05-12. A 14-day cap on tracker age forces a review cycle and
# surfaces mask-drift within at most 14 days of the original defect.
# Each `continue-on-error: true` gets a paper trail — close or renew.
@@ -45,12 +45,12 @@ name: lint-continue-on-error-tracking
# close-and-flip, or document the deliberate keep-mask in a fresh
# 14-day-renewable tracker. After main is clean for 3 days,
# follow-up PR flips this workflow's continue-on-error to false.
# Tracking: mc#1982.
# Tracking: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#1982 (the empirical masked-3-weeks case)
# - mc#774 (the RFC that specs this lint)
# - mc#774 (the empirical masked-3-weeks case)
# - feedback_chained_defects_in_never_tested_workflows
# - feedback_behavior_based_ast_gates
# - feedback_strict_root_only_after_class_a
@@ -97,9 +97,9 @@ jobs:
# Phase 3 (RFC #219 §1): surface masked defects without blocking
# PRs. Pre-existing continue-on-error: true directives on main
# all violate this lint at first — intentional. Flip to false
# follow-up after main is clean for 3 days. mc#1982.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # mc#1982 Phase 3 mask — 14d forced-renewal cadence
# follow-up after main is clean for 3 days. mc#774.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # mc#774 Phase 3 mask — 14d forced-renewal cadence
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
@@ -48,9 +48,11 @@ jobs:
scan:
name: Scan workflows for curl status-capture pollution
runs-on: ubuntu-latest
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port.
# mc#1982 mask removed — no surfaced defects in this lane.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Find curl ... -w '%{http_code}' ... || echo "000" subshells
@@ -25,21 +25,6 @@ name: Lint forbidden tenant-env keys
# feedback_path_filtered_workflow_cant_be_required). The scan itself
# targets workspace_secrets-writer paths via grep -r; it's fast
# (sub-second) so unconditional run is fine.
#
# ── 2026-06-01 CI-scheduler-fanout consolidation (fix/ci-scheduler-fanout) ──
# The RFC#523 sibling lint formerly in its own file
# `lint-no-tenant-gitea-token.yml` (the broader "no repo-host token into
# any tenant-writer surface" scan) is now a SECOND job in THIS workflow
# (`scan-tenant-token-write`). Both are sub-second Go-source greps that
# fired as two separate workflow runs on every PR — pure scheduler
# fan-out. Folding the sibling in here drops one workflow run + one
# checkout per PR while keeping BOTH scans firing unconditionally on
# every PR (the no-paths discipline above is preserved — neither job is
# paths-filtered). The moved job keeps its exact `name:` so its emitted
# status context is unchanged in substance; its `# bp-exempt:` directive
# moves with it (Tier 2g). The old `Lint no tenant GITEA or GITHUB token
# write / …` context is retired (a disappearing context needs no
# directive; only NEW emitters do).
on:
pull_request:
@@ -181,126 +166,3 @@ jobs:
fi
echo "OK No forbidden operator-scope env key names hardcoded in writer paths."
# bp-exempt: advisory RFC#523 lint; PR review gate is review-driven, not BP-driven.
# (Carried with the workflow-name rename in PR mc#1593 so the renamed
# context emission satisfies lint_required_context_exists_in_bp Tier 2g.)
scan-tenant-token-write:
name: Scan for repo-host token write into tenant workspace surface
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Find Go files referencing a tenant-writer surface AND a repo-host token
run: |
set -euo pipefail
# Repo-host token NAMES — the threat-model subset. Operator-fleet
# tokens (CP_ADMIN_API_TOKEN, RAILWAY_TOKEN, INFISICAL_*) are
# caught by lint-forbidden-env-keys.yml's broader deny set; this
# lint focuses on the git-host class so a single co-occurrence
# match has a low false-positive rate.
FORBIDDEN_KEYS=(
"GITEA_TOKEN"
"GITEA_PAT"
"GITHUB_TOKEN"
"GITHUB_PAT"
"GH_TOKEN"
)
# Tenant-writer surface markers. A file matches the surface set
# if it references ANY of these strings. This is the "is this
# code path writing into a tenant workspace?" heuristic.
# Curated to catch the actual code shapes used in this repo
# (verified by grep against current main 2026-05-19):
# - "workspace_secrets" / "global_secrets" → DB table writes
# - "seedAllowList" → CP-side seed table
# - "/settings/secrets" → tenant HTTP API write
# - "envVars[" → in-memory env map write
# - "containerEnv" → docker-run env-set
# - "userData" → EC2 user-data script
# - "provisionPayload" / "provisionContext" → provision-request shape
SURFACE_PATTERN='workspace_secrets|global_secrets|seedAllowList|/settings/secrets|envVars\[|containerEnv|userData|provisionPayload|provisionContext'
# Files that legitimately reference these names AND a surface
# marker, but do so for guard / strip / test / doc-comment
# reasons. New entries require reviewer signoff and a one-line
# justification in the diff.
EXEMPT_FILES=(
# RFC#523 L1 deny-set source-of-truth + tests
"workspace-server/internal/handlers/workspace_provision_forbidden_env.go"
"workspace-server/internal/handlers/workspace_provision_forbidden_env_test.go"
# Forensic-#145 silent-strip denylist (defense-in-depth, by design lists the names)
"workspace-server/internal/provisioner/provisioner.go"
"workspace-server/internal/provisioner/provisioner_test.go"
# Pre-RFC#523 persona-fallback / org-helper paths. The L1
# fail-closed runs BEFORE these writers; downstream silent-strip
# also covers them. See applyAgentGitHTTPCreds doc-comment.
"workspace-server/internal/handlers/agent_git_identity.go"
"workspace-server/internal/handlers/org_helpers.go"
"workspace-server/internal/handlers/org.go"
# CP→platform admin auth (NOT a tenant env write).
"workspace-server/internal/provisioner/cp_provisioner.go"
)
# Build an extended-regex alternation of forbidden keys.
KEY_ALT="$(IFS='|'; echo "${FORBIDDEN_KEYS[*]}")"
# Find candidate files: Go non-test sources that contain a
# tenant-writer surface marker.
mapfile -t CANDIDATES < <(
grep -rlE --include='*.go' --exclude='*_test.go' \
"${SURFACE_PATTERN}" . 2>/dev/null \
| sed 's|^\./||' \
| sort -u
)
if [ "${#CANDIDATES[@]}" -eq 0 ]; then
echo "OK No tenant-writer-surface files found in tree (unexpected, but not a lint failure)."
exit 0
fi
HITS=""
for f in "${CANDIDATES[@]}"; do
# Skip exempt files.
skip=0
for ex in "${EXEMPT_FILES[@]}"; do
if [ "$f" = "$ex" ]; then skip=1; break; fi
done
[ "$skip" = "1" ] && continue
# File contains a surface marker; now grep for a forbidden
# key NAME. We require a QUOTED-literal match to avoid
# firing on a comment like "// also handle GITEA_TOKEN".
#
# The literal form catches:
# - os.Getenv("GITEA_TOKEN")
# - envVars["GITEA_TOKEN"] = ...
# - {envKey: "GITEA_TOKEN", tenantKey: "GITEA_TOKEN"}
# but not:
# - // see GITEA_TOKEN below (no quotes)
found=$(grep -nE "\"(${KEY_ALT})\"" "$f" 2>/dev/null || true)
if [ -n "$found" ]; then
HITS="${HITS}--- ${f} ---\n${found}\n"
fi
done
if [ -n "$HITS" ]; then
echo "::error::Task #146 lint: repo-host token name(s) quoted in a tenant-writer-surface file:"
printf "$HITS"
echo ""
echo "These files reference a tenant-writer surface (workspace_secrets,"
echo "seedAllowList, /settings/secrets, containerEnv, userData, etc.)"
echo "AND quote a repo-host token name (GITEA_TOKEN/GITHUB_TOKEN/…)."
echo "Per RFC#523 threat model, tenant workspaces MUST NOT receive"
echo "operator-scope repo-host tokens. If your code legitimately needs"
echo "to reference one of these names in a tenant-writer file (e.g."
echo "a deny-set definition or silent-strip list), add the file to"
echo "EXEMPT_FILES with a one-line justification — reviewer signoff"
echo "required."
exit 1
fi
echo "OK No tenant-writer-surface file co-mentions a repo-host token literal."
+6 -6
View File
@@ -1,6 +1,6 @@
name: lint-mask-pr-atomicity
# Tier 2d hard-gate lint (per mc#1982) — blocks PRs that touch
# Tier 2d hard-gate lint (per mc#774) — blocks PRs that touch
# `.gitea/workflows/ci.yml` and modify ONLY ONE of {continue-on-error,
# all-required.sentinel.needs} without a `Paired: #NNN` reference in
# the PR body or in a commit message.
@@ -37,13 +37,13 @@ name: lint-mask-pr-atomicity
# This workflow lands at `continue-on-error: true` (Phase 3 — surface
# regressions without blocking PRs while the rule beds in).
# Follow-up PR flips to `false` once we have ≥3 days of clean runs on
# `main` and no false-positives. Tracking issue: mc#1982.
# `main` and no false-positives. Tracking issue: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - PR#665 / PR#668 (the empirical split-pair)
# - mc#1982 (the main-red incident the split caused)
# - mc#774 (the main-red incident the split caused)
# - feedback_strict_root_only_after_class_a
# - feedback_behavior_based_ast_gates
#
@@ -92,8 +92,8 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken shapes without blocking
# PRs. Follow-up PR flips this to `false` once recent runs on main
# are confirmed clean (eat-our-own-dogfood discipline mirrors
# PR#673's same-shape comment). Tracking: mc#1982.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# PR#673's same-shape comment). Tracking: mc#774.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- name: Check out PR head with full history (need base SHA blobs)
@@ -0,0 +1,182 @@
name: Lint no tenant GITEA or GITHUB token write
# Task #146 — CI guardrail companion to RFC#523's `lint-forbidden-env-keys.yml`.
#
# `lint-forbidden-env-keys.yml` (Layer 3) catches code that hardcodes a
# forbidden env-var key NAME as a quoted literal in workspace_secrets
# writer paths under workspace-server/internal/.
#
# This workflow catches a BROADER class: any code path that reads a
# repo-host token (GITEA_TOKEN / GITHUB_TOKEN / GH_TOKEN) and then writes
# it into a TENANT WORKSPACE's env, secret store, user-data, or
# provision payload. This is the actual RFC#523 threat-model statement —
# the goal is "no tenant workspace ever receives an operator-scope repo
# token," not just "no _quoted_ literal `GITEA_TOKEN`." A future writer
# could route the value via a variable, a struct field, or a config key
# and slip past the existing literal scan; this lint catches those
# routing patterns at PR review time.
#
# Scope
# Scans the WHOLE repo's Go sources (not just workspace-server/) for
# co-occurrences of:
# - a repo-host token NAME (GITEA_TOKEN / GITHUB_TOKEN / GH_TOKEN /
# GITEA_PAT / GITHUB_PAT) used as os.Getenv argument or string
# literal
# - within a file that ALSO references a tenant-writer surface
# (`tenant`, `workspace_secrets`, `global_secrets`, `seedAllowList`,
# `/settings/secrets`, `userData`, `provisionPayload`,
# `envVars[`, `containerEnv`).
#
# Co-occurrence (not single-line) is the false-positive control: a
# file that just LOGS the variable name (e.g. "missing GITEA_TOKEN")
# without touching any tenant surface won't fire.
#
# Drift contract with lint-forbidden-env-keys.yml
# Both lints share the same FORBIDDEN_KEYS list (a subset — only the
# repo-host tokens, since this lint's threat model is "tenant gets
# write access to operator's git host"). If RFC#523's deny set grows,
# update BOTH this file AND lint-forbidden-env-keys.yml AND the Go
# source-of-truth in
# workspace-server/internal/handlers/workspace_provision_forbidden_env.go.
#
# Open-source-template-friendly
# The patterns scanned are generic (no MOLECULE_-prefix literals).
# A fork can copy this workflow as-is and adjust FORBIDDEN_KEYS.
#
# Path-filter discipline
# No `paths:` filter — required-status workflows must run on every PR
# per `feedback_path_filtered_workflow_cant_be_required`. Scan is
# sub-second.
on:
pull_request:
types: [opened, synchronize, reopened]
push:
branches: [main, staging]
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: advisory RFC#523 lint; PR review gate is review-driven, not BP-driven.
# (Carried with the workflow-name rename in PR mc#1593 so the renamed
# context emission satisfies lint_required_context_exists_in_bp Tier 2g.)
scan:
name: Scan for repo-host token write into tenant workspace surface
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Find Go files referencing a tenant-writer surface AND a repo-host token
run: |
set -euo pipefail
# Repo-host token NAMES — the threat-model subset. Operator-fleet
# tokens (CP_ADMIN_API_TOKEN, RAILWAY_TOKEN, INFISICAL_*) are
# caught by lint-forbidden-env-keys.yml's broader deny set; this
# lint focuses on the git-host class so a single co-occurrence
# match has a low false-positive rate.
FORBIDDEN_KEYS=(
"GITEA_TOKEN"
"GITEA_PAT"
"GITHUB_TOKEN"
"GITHUB_PAT"
"GH_TOKEN"
)
# Tenant-writer surface markers. A file matches the surface set
# if it references ANY of these strings. This is the "is this
# code path writing into a tenant workspace?" heuristic.
# Curated to catch the actual code shapes used in this repo
# (verified by grep against current main 2026-05-19):
# - "workspace_secrets" / "global_secrets" → DB table writes
# - "seedAllowList" → CP-side seed table
# - "/settings/secrets" → tenant HTTP API write
# - "envVars[" → in-memory env map write
# - "containerEnv" → docker-run env-set
# - "userData" → EC2 user-data script
# - "provisionPayload" / "provisionContext" → provision-request shape
SURFACE_PATTERN='workspace_secrets|global_secrets|seedAllowList|/settings/secrets|envVars\[|containerEnv|userData|provisionPayload|provisionContext'
# Files that legitimately reference these names AND a surface
# marker, but do so for guard / strip / test / doc-comment
# reasons. New entries require reviewer signoff and a one-line
# justification in the diff.
EXEMPT_FILES=(
# RFC#523 L1 deny-set source-of-truth + tests
"workspace-server/internal/handlers/workspace_provision_forbidden_env.go"
"workspace-server/internal/handlers/workspace_provision_forbidden_env_test.go"
# Forensic-#145 silent-strip denylist (defense-in-depth, by design lists the names)
"workspace-server/internal/provisioner/provisioner.go"
"workspace-server/internal/provisioner/provisioner_test.go"
# Pre-RFC#523 persona-fallback / org-helper paths. The L1
# fail-closed runs BEFORE these writers; downstream silent-strip
# also covers them. See applyAgentGitHTTPCreds doc-comment.
"workspace-server/internal/handlers/agent_git_identity.go"
"workspace-server/internal/handlers/org_helpers.go"
"workspace-server/internal/handlers/org.go"
# CP→platform admin auth (NOT a tenant env write).
"workspace-server/internal/provisioner/cp_provisioner.go"
)
# Build an extended-regex alternation of forbidden keys.
KEY_ALT="$(IFS='|'; echo "${FORBIDDEN_KEYS[*]}")"
# Find candidate files: Go non-test sources that contain a
# tenant-writer surface marker.
mapfile -t CANDIDATES < <(
grep -rlE --include='*.go' --exclude='*_test.go' \
"${SURFACE_PATTERN}" . 2>/dev/null \
| sed 's|^\./||' \
| sort -u
)
if [ "${#CANDIDATES[@]}" -eq 0 ]; then
echo "OK No tenant-writer-surface files found in tree (unexpected, but not a lint failure)."
exit 0
fi
HITS=""
for f in "${CANDIDATES[@]}"; do
# Skip exempt files.
skip=0
for ex in "${EXEMPT_FILES[@]}"; do
if [ "$f" = "$ex" ]; then skip=1; break; fi
done
[ "$skip" = "1" ] && continue
# File contains a surface marker; now grep for a forbidden
# key NAME. We require a QUOTED-literal match to avoid
# firing on a comment like "// also handle GITEA_TOKEN".
#
# The literal form catches:
# - os.Getenv("GITEA_TOKEN")
# - envVars["GITEA_TOKEN"] = ...
# - {envKey: "GITEA_TOKEN", tenantKey: "GITEA_TOKEN"}
# but not:
# - // see GITEA_TOKEN below (no quotes)
found=$(grep -nE "\"(${KEY_ALT})\"" "$f" 2>/dev/null || true)
if [ -n "$found" ]; then
HITS="${HITS}--- ${f} ---\n${found}\n"
fi
done
if [ -n "$HITS" ]; then
echo "::error::Task #146 lint: repo-host token name(s) quoted in a tenant-writer-surface file:"
printf "$HITS"
echo ""
echo "These files reference a tenant-writer surface (workspace_secrets,"
echo "seedAllowList, /settings/secrets, containerEnv, userData, etc.)"
echo "AND quote a repo-host token name (GITEA_TOKEN/GITHUB_TOKEN/…)."
echo "Per RFC#523 threat model, tenant workspaces MUST NOT receive"
echo "operator-scope repo-host tokens. If your code legitimately needs"
echo "to reference one of these names in a tenant-writer file (e.g."
echo "a deny-set definition or silent-strip list), add the file to"
echo "EXEMPT_FILES with a one-line justification — reviewer signoff"
echo "required."
exit 1
fi
echo "OK No tenant-writer-surface file co-mentions a repo-host token literal."
@@ -4,7 +4,7 @@ name: Lint pre-flip continue-on-error
# on any job in `.gitea/workflows/*.yml` WITHOUT proof that the affected
# job's recent runs on the target branch (PR base) are actually green.
#
# Empirical class: PR #656 / mc#1982. PR #656 (RFC internal#219 Phase 4)
# Empirical class: PR #656 / mc#774. PR #656 (RFC internal#219 Phase 4)
# flipped 5 platform-build-class jobs `continue-on-error: true → false`
# on the basis of a "verified green on main via combined-status check".
# But that "green" was the LIE the prior `continue-on-error: true`
@@ -13,7 +13,7 @@ name: Lint pre-flip continue-on-error
# job-level status. The precondition the PR claimed to verify was
# structurally fooled by the bug being flipped.
#
# mc#1982 captured the surfaced defects (2 mutually-masked regressions):
# mc#774 captured the surfaced defects (2 mutually-masked regressions):
# - Class 1: sqlmock helper drift since 2f36bb9a (24 days old)
# - Class 2: OFFSEC-001 contract collision since 7d1a189f (1 day old)
#
@@ -55,7 +55,7 @@ name: Lint pre-flip continue-on-error
# - YAML parse error in one of the workflow files: warn-only,
# don't block — the YAML lint workflows catch this separately.
#
# Cross-links: PR#656, mc#1982, PR#665 (interim re-mask),
# Cross-links: PR#656, mc#774, PR#665 (interim re-mask),
# Quirk #10 (internal#342 + dup #287), hongming-pc2 charter
# §SOP-N rule (e), feedback_strict_root_only_after_class_a,
# feedback_no_shared_persona_token_use.
@@ -99,8 +99,8 @@ jobs:
timeout-minutes: 8
# Phase 3 (RFC internal#219 §1): surface broken flips without blocking
# the PR yet. Follow-up flips this to `false` once the workflow itself
# has clean recent runs on main. mc#1982 interim — remove when CoE→false.
continue-on-error: true # mc#1982
# has clean recent runs on main. mc#774 interim — remove when CoE→false.
continue-on-error: true # mc#774
steps:
- name: Check out PR head (full history for base-SHA access)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -1,6 +1,6 @@
name: lint-required-context-exists-in-bp
# Tier 2g hard-gate lint (per mc#1982) — diff-based PR-time
# Tier 2g hard-gate lint (per mc#774) — diff-based PR-time
# check. When a PR adds a NEW commit-status emission (workflow YAML
# `name:` + job `name:`-or-key + on:-event), the workflow file must
# carry one of three directives adjacent to the new job:
@@ -16,7 +16,7 @@ name: lint-required-context-exists-in-bp
# PR#656 added `CI / all-required (pull_request)` as a sentinel
# context that workflows emit, but BP did NOT list it. When
# platform-build failed, all-required failed, but BP let the PR
# merge anyway → cascade to mc#1982. With this lint, PR#656 would
# merge anyway → cascade to mc#774. With this lint, PR#656 would
# have been blocked until either the BP PATCH ran alongside OR
# the author added a `bp-required: pending` directive.
#
@@ -27,7 +27,7 @@ name: lint-required-context-exists-in-bp
# share the workflow-context enumeration helpers
# (`_event_map`, `workflow_contexts`, `_job_display`) but the
# semantics are intentionally distinct so they're separate scripts.
# Co-design is documented in mc#1982.
# Co-design is documented in mc#774.
#
# Directive comment lives in the workflow file (NOT PR body)
# ----------------------------------------------------------
@@ -42,13 +42,13 @@ name: lint-required-context-exists-in-bp
# Lands at `continue-on-error: true` (Phase 3 — surface the
# pattern without blocking PRs while the directive convention
# beds in). After 7 days of clean runs on `main` with no false
# positives, follow-up flips to `false`. Tracking: mc#1982.
# positives, follow-up flips to `false`. Tracking: mc#774.
#
# Cross-links
# -----------
# - mc#1982 (the RFC that specs this lint)
# - mc#774 (the RFC that specs this lint)
# - PR#656 (the empirical case)
# - mc#1982 (the surfaced cascade)
# - mc#774 (the surfaced cascade)
# - feedback_phantom_required_check_after_gitea_migration (Tier 2f cousin)
# - feedback_behavior_based_ast_gates
#
@@ -81,10 +81,10 @@ jobs:
name: lint-required-context-exists-in-bp
runs-on: ubuntu-latest
timeout-minutes: 5
# Phase 4 (RFC #219 §1): 22 days green since 2026-05-11 port,
# well past the 7-clean-day threshold. PR-time failure is now
# a hard CI signal.
continue-on-error: false
# Phase 3 (RFC #219 §1): surface the pattern without blocking PRs
# while the directive convention beds in. Follow-up flip to false
# after 7 clean days on main. mc#774.
continue-on-error: true # mc#774 Phase 3 — flip to false after 7 clean main runs
steps:
- name: Check out PR head with full history (need base SHA blobs)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -3,26 +3,11 @@ name: Lint shellcheck (arm64 pilot)
# Mac-CI dual-track pilot (#233). ADDITIVE / NOT REQUIRED.
#
# Validates the arm64 self-hosted lane (no docker.sock, no privileged
# ops) before any required gate moves onto it.
# ops) before any required gate moves onto it. Until a Mac arm64 runner
# is registered with the `arm64` label, this workflow sits PENDING —
# that is FINE: `arm64` is NOT in branch_protections required contexts.
#
# Runner label mapping (2026-05-22 fix): the actual Mac mini runner
# registered in this Gitea ships labels
# ["self-hosted","macos-self-hosted-arm64","arm64-darwin"]
# — no plain `arm64`. The earlier `runs-on: [self-hosted, arm64]`
# could not match any registered runner so every fire of this workflow
# was assigned task_id=0 / runner_id=NULL → Gitea cancelled it. The
# rows showed up as Cancelled in the action status feed (not Failed)
# but the lane never actually ran. Workflow now selects on
# `arm64-darwin` which is the canonical Mac-arm64 label per the
# Mac mini's registration (per internal#494 capability-honest labels).
#
# If we later want to add a Linux-arm64 runner to the same lane, add
# both labels to that runner's registration AND broaden the selector
# here — don't rename `arm64-darwin` (it's Mac-specific by design and
# `feedback_pc2_runner_labels_must_stay_narrow` rule applies).
#
# Pairs with internal#543 (RFC: Mac arm64 multi-arch runner-base) and
# internal#494 (multi-arch runner-base capability-honest labels).
# Pairs with internal#543 (RFC: Mac arm64 multi-arch runner-base).
# No paths: filter on purpose (feedback_path_filtered_workflow_cant_be_required).
on:
@@ -49,56 +34,37 @@ jobs:
GITHUB_SERVER_URL: https://git.moleculesai.app
steps:
- name: Identify runner
id: identify
continue-on-error: true
run: |
set -eu
echo "arch=$(uname -m)"
echo "kernel=$(uname -sr)"
echo "shell=$BASH_VERSION"
# Sanity: must actually be arm64. If amd64 sneaks in here,
# the job skips gracefully rather than hard-failing, because
# a mislabelled runner is an ops concern, not a code defect.
# Pilot lane must not make main red (#2146).
# fail fast — that means the label routing is wrong.
case "$(uname -m)" in
aarch64|arm64)
echo "arm64 confirmed"
echo "arm64=true" >> "$GITHUB_OUTPUT"
;;
*)
echo "ERROR: expected arm64, got $(uname -m) — label routing may be wrong"
echo "arm64=false" >> "$GITHUB_OUTPUT"
exit 1
;;
aarch64|arm64) echo "arm64 confirmed" ;;
*) echo "ERROR: expected arm64, got $(uname -m)"; exit 1 ;;
esac
- name: Checkout
if: steps.identify.outputs.arm64 == 'true'
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Install shellcheck (arm64)
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
if command -v shellcheck >/dev/null 2>&1; then
echo "shellcheck already present: $(shellcheck --version | head -1)"
else
# Prefer apt if the runner base ships it; else download the
# correct platform binary (darwin vs linux).
# Prefer apt if the runner base ships it; else download arm64 binary.
if command -v apt-get >/dev/null 2>&1; then
sudo apt-get update -qq
sudo apt-get install -y --no-install-recommends shellcheck
else
SC_VER=v0.10.0
if [ "$(uname -s)" = "Darwin" ]; then
SC_PKG="shellcheck-${SC_VER}.darwin.aarch64.tar.xz"
else
SC_PKG="shellcheck-${SC_VER}.linux.aarch64.tar.xz"
fi
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/${SC_PKG}" \
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/shellcheck-${SC_VER}.linux.aarch64.tar.xz" \
| tar -xJf - --strip-components=1
sudo mv shellcheck /usr/local/bin/
fi
@@ -106,26 +72,17 @@ jobs:
shellcheck --version | head -2
- name: Run shellcheck on .gitea/scripts/*.sh
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
# Only the scripts we control under .gitea/scripts. Pilot
# scope is intentionally narrow — broaden in a follow-up
# once the lane is proven.
if ! command -v shellcheck >/dev/null 2>&1 || ! shellcheck --version >/dev/null 2>&1; then
echo "WARN: shellcheck not functional — skipping (pilot mode)"
if ! command -v shellcheck >/dev/null 2>&1; then
echo "WARN: shellcheck binary not found — skipping (pilot mode)"
exit 0
fi
# NOTE: macOS ships Bash 3.2 (Apple license), no `mapfile`
# (Bash 4+ builtin). Mac mini runner empirically failed at
# `mapfile: command not found` (run 79275 / task 145654).
# Use the portable `while read` pattern instead — works on
# both Bash 3.2 (macOS) and Bash 4+ (Linux).
TARGETS=()
while IFS= read -r f; do
TARGETS+=("$f")
done < <(find .gitea/scripts -maxdepth 2 -type f -name '*.sh' | sort)
mapfile -t TARGETS < <(find .gitea/scripts -maxdepth 2 -type f -name '*.sh' | sort)
if [ "${#TARGETS[@]}" -eq 0 ]; then
echo "No .sh files found under .gitea/scripts — nothing to check"
exit 0
+1 -1
View File
@@ -55,7 +55,7 @@ jobs:
# Phase 3 (RFC #219 §1): surface broken shapes without blocking PRs.
# Follow-up PR flips this off after the 4 existing-on-main rule-2
# (workflow_run) violations are migrated to a supported trigger.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+1 -1
View File
@@ -67,7 +67,7 @@ jobs:
# in this rollout (internal#462) so the precondition holds.
runs-on: publish
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- name: Checkout
@@ -16,24 +16,14 @@ name: publish-workspace-server-image
#
# Image tags produced:
# :staging-<sha> — per-commit digest, stable for canary verify
# :staging-latest — tracks most recent BUILD on this branch (set by the
# build job, last-writer-wins, NOT prod-gated)
# :latest — tracks the most recent PROD-PROMOTED build. Re-pointed by the
# deploy-production job ONLY after green main CI + canary +
# fleet rollout + /buildinfo verification pass. So :latest ==
# "current prod image", never the raw build. (Added 2026-06-03
# after a stale :latest — last moved 2026-05-10 — reverted a
# production tenant on a no-arg redeploy.)
# :staging-latest — tracks most recent build on this branch
#
# Production auto-deploy:
# After both platform and tenant images are pushed, deploy-production waits
# for strict required push contexts on the same SHA to go green, then
# calls the production CP redeploy-fleet endpoint with target_tag=
# staging-<sha>. On success (rollout + buildinfo verified) it re-points
# :latest to the same SHA. Set repo variable or secret
# PROD_AUTO_DEPLOY_DISABLED=true to stop production rollout while keeping
# image publishing enabled — in which case :latest is NOT advanced either
# (correct: an unpromoted build must not become :latest).
# staging-<sha>. Set repo variable or secret PROD_AUTO_DEPLOY_DISABLED=true
# to stop production rollout while keeping image publishing enabled.
#
# Primary ECR target: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*
# Optional staging tenant mirror target:
@@ -244,43 +234,24 @@ jobs:
name: Production auto-deploy
needs: build-and-push
if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
# Side-effect deploy only; image publish success is the durable artifact. mc#1982
# Side-effect deploy only; image publish success is the durable artifact. mc#774
continue-on-error: true
# Publish/release lane (internal#462) — production deploy of a merged
# fix; reserved capacity, never queued behind PR-CI.
runs-on: publish
timeout-minutes: 90
timeout-minutes: 75
env:
CP_URL: ${{ vars.PROD_CP_URL || 'https://api.moleculesai.app' }}
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
GITEA_HOST: git.moleculesai.app
GITEA_TOKEN: ${{ secrets.PROD_AUTO_DEPLOY_CONTROL_TOKEN || secrets.AUTO_SYNC_TOKEN }}
CI_STATUS_TIMEOUT_SECONDS: "3600"
PROD_AUTO_DEPLOY_DISABLED: ${{ vars.PROD_AUTO_DEPLOY_DISABLED || secrets.PROD_AUTO_DEPLOY_DISABLED || '' }}
PROD_AUTO_DEPLOY_CANARY_SLUG: ${{ vars.PROD_AUTO_DEPLOY_CANARY_SLUG || 'hongming' }}
PROD_AUTO_DEPLOY_SOAK_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_SOAK_SECONDS || '60' }}
PROD_AUTO_DEPLOY_BATCH_SIZE: ${{ vars.PROD_AUTO_DEPLOY_BATCH_SIZE || '3' }}
PROD_AUTO_DEPLOY_DRY_RUN: ${{ vars.PROD_AUTO_DEPLOY_DRY_RUN || '' }}
PROD_ALLOW_NON_PROD_CP_URL: ${{ vars.PROD_ALLOW_NON_PROD_CP_URL || '' }}
# #2213: per-tenant /buildinfo settle budget. A freshly-swapped tenant can
# keep serving the old image at the edge for a short drain window; the
# verify step polls each tenant up to this budget before declaring it stale.
PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS || '240' }}
PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS || '20' }}
steps:
# The publish runner's default HOME (/home/hongming) is not writable, so
# git/docker credential saves fail (`Error saving credentials: mkdir
# /home/hongming: permission denied`) and halt the production rollout
# (#2193). Point HOME + DOCKER_CONFIG at the writable job temp dir —
# mirrors build-and-push's "Prepare writable Docker config" fix above.
- name: Prepare writable HOME + Docker config
run: |
set -euo pipefail
H="$RUNNER_TEMP/auto-deploy-home"
mkdir -p "$H/.docker"
echo "HOME=$H" >> "$GITHUB_ENV"
echo "DOCKER_CONFIG=$H/.docker" >> "$GITHUB_ENV"
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -325,68 +296,33 @@ jobs:
set -euo pipefail
python3 .gitea/scripts/prod-auto-deploy.py wait-ci
# Superseded-job guard — BEFORE any production side effect (#2213).
#
# This workflow has no `concurrency:` (see header: Gitea 1.22.6 cancels
# queued prod deploys). So two close main pushes run BOTH deploy-production
# jobs. The verify step already skips its strict /buildinfo check when this
# job is superseded (#2194) — but that guard was AFTER the redeploy and the
# :latest promote, so an OLDER job that started late still:
# 1. rolled the whole fleet BACKWARD to its older tag (canary hongming
# was reverted from the newer SHA — the #2213 red), then
# 2. promoted :latest backward to the older image,
# and only THEN skipped verify and exited green. A superseded job must do
# NEITHER. We re-check the branch head here, immediately before the rollout,
# and skip every side effect when a newer commit already owns main.
#
# exit 0 + non-empty stdout => superseded (newer head printed); the redeploy
# and promote steps are gated off via this output. exit 10 => this job is
# still the latest, proceed to roll the fleet. Fail-safe: a head that can't
# be read returns NOT-superseded (exit 10), so a genuine deploy is never
# silently skipped. (Re-checked again at verify time to catch a newer job
# that lands DURING this rollout.)
- name: Check superseded before production side effects
id: supersede
if: ${{ steps.plan.outputs.enabled == 'true' }}
run: |
set -euo pipefail
set +e
NEWER_HEAD="$(python3 .gitea/scripts/prod-auto-deploy.py check-superseded)"
SUPERSEDED_EXIT=$?
set -e
if [ "$SUPERSEDED_EXIT" -eq 0 ] && [ -n "$NEWER_HEAD" ]; then
echo "superseded=true" >> "$GITHUB_OUTPUT"
echo "::notice::Superseded before rollout: main head is now ${NEWER_HEAD:0:7} (this job deploys ${GITHUB_SHA:0:7}). Skipping redeploy + :latest promote so an older job never rolls the fleet backward."
{
echo "## Production auto-deploy skipped — superseded before rollout"
echo ""
echo "This deploy job's SHA \`${GITHUB_SHA:0:7}\` is no longer the head of \`main\` (now \`${NEWER_HEAD:0:7}\`)."
echo "A newer deploy job owns the fleet; rolling it backward to this older build would revert tenants and \`:latest\`. No side effects performed."
} >> "$GITHUB_STEP_SUMMARY"
else
echo "superseded=false" >> "$GITHUB_OUTPUT"
fi
- name: Call production CP redeploy-fleet
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
if: ${{ steps.plan.outputs.enabled == 'true' }}
run: |
set -euo pipefail
python3 .gitea/scripts/prod-auto-deploy.py assert-enabled
PLAN="$RUNNER_TEMP/prod-auto-deploy-plan.json"
TARGET_TAG="$(jq -r '.target_tag' "$PLAN")"
BODY="$(jq -c '.body' "$PLAN")"
echo "POST $CP_URL/cp/admin/tenants/redeploy-fleet"
echo " target_tag: $TARGET_TAG"
echo " body: $BODY"
HTTP_RESPONSE="$RUNNER_TEMP/prod-redeploy-response.json"
HTTP_CODE_FILE="$RUNNER_TEMP/prod-redeploy-http-code.txt"
set +e
python3 .gitea/scripts/prod-auto-deploy.py rollout \
--plan "$PLAN" \
--response "$HTTP_RESPONSE"
ROLLOUT_EXIT=$?
curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
-m 1200 \
-H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
-H "Content-Type: application/json" \
-X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
-d "$BODY" > "$HTTP_CODE_FILE"
set -e
if [ ! -s "$HTTP_RESPONSE" ]; then
jq -nc --arg error "rollout command exited $ROLLOUT_EXIT before writing a response" \
'{ok:false, results:[], error:$error}' > "$HTTP_RESPONSE"
fi
HTTP_CODE="$(cat "$HTTP_CODE_FILE" 2>/dev/null || echo "000")"
[ -z "$HTTP_CODE" ] && HTTP_CODE="000"
echo "HTTP $HTTP_CODE"
jq '{ok, result_count: (.results // [] | length)}' "$HTTP_RESPONSE" || true
{
@@ -394,99 +330,38 @@ jobs:
echo ""
echo "**Commit:** \`${GITHUB_SHA:0:7}\`"
echo "**Target tag:** \`$TARGET_TAG\`"
echo "**HTTP:** $HTTP_CODE"
echo ""
echo "### Per-tenant result"
echo ""
echo "| Slug | Phase | SSM Status | Exit | Healthz | On target | Error present |"
echo "|------|-------|------------|------|---------|-----------|---------------|"
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \(.verified_on_target) | \((.error // "") != "") |"' "$HTTP_RESPONSE" || true
# internal#724: stragglers are tenants enumerated but not proven
# on the target build. Surface them loudly — a non-empty list
# means the rollout did NOT fully land.
STRAGGLERS="$(jq -r '(.stragglers // []) | join(", ")' "$HTTP_RESPONSE")"
if [ -n "$STRAGGLERS" ]; then
echo ""
echo "### ⚠ Stragglers (NOT on target tag \`$TARGET_TAG\`)"
echo ""
echo "\`$STRAGGLERS\`"
fi
echo "| Slug | Phase | SSM Status | Exit | Healthz | Error present |"
echo "|------|-------|------------|------|---------|---------------|"
jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \((.error // "") != "") |"' "$HTTP_RESPONSE" || true
} >> "$GITHUB_STEP_SUMMARY"
if [ "$HTTP_CODE" != "200" ]; then
echo "::error::redeploy-fleet returned HTTP $HTTP_CODE"
exit 1
fi
OK="$(jq -r '.ok' "$HTTP_RESPONSE")"
if [ "$OK" != "true" ]; then
STRAGGLERS="$(jq -r '(.stragglers // []) | join(", ")' "$HTTP_RESPONSE")"
if [ -n "$STRAGGLERS" ]; then
echo "::error::incomplete rollout — tenants not on target tag $TARGET_TAG: $STRAGGLERS"
fi
echo "::error::redeploy-fleet reported ok=false; production rollout halted."
exit 1
fi
if [ "$ROLLOUT_EXIT" -ne 0 ]; then
echo "::error::redeploy-fleet rollout failed with exit code $ROLLOUT_EXIT."
exit "$ROLLOUT_EXIT"
fi
- name: Verify reachable tenants report this SHA
# Skip when superseded BEFORE rollout: the redeploy step did not run, so
# there is no redeploy-fleet response to verify against and the newer job
# owns verification (#2213). The in-step guard below still catches the
# case where a newer job lands DURING this job's rollout.
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
if: ${{ steps.plan.outputs.enabled == 'true' }}
env:
TENANT_DOMAIN: moleculesai.app
run: |
set -euo pipefail
RESP="$RUNNER_TEMP/prod-redeploy-response.json"
# Superseded-job guard. This workflow has no `concurrency:` (header
# explains why: Gitea 1.22.6 cancels queued prod deploys). So two
# close main pushes run BOTH deploy-production jobs. The newer one
# rolls the fleet to its (newer) build first; this older job's strict
# equality check below would then see tenants on the NEWER SHA and
# false-red "$slug is stale" even though the fleet is AHEAD, not
# behind (git SHAs aren't ordered; /buildinfo exposes only git_sha).
#
# If main's current head is no longer THIS job's SHA, a newer commit
# has landed and this deploy is superseded — the newest job's verify
# is authoritative. Skip strict verify and succeed. exit 0 => newer
# head printed (superseded); exit 10 => still the latest, proceed to
# the strict verify so a genuinely-behind tenant still fails loudly.
set +e
NEWER_HEAD="$(python3 .gitea/scripts/prod-auto-deploy.py check-superseded)"
SUPERSEDED_EXIT=$?
set -e
if [ "$SUPERSEDED_EXIT" -eq 0 ] && [ -n "$NEWER_HEAD" ]; then
echo "::notice::Superseded deploy: main head is now ${NEWER_HEAD:0:7} (this job deployed ${GITHUB_SHA:0:7}). The fleet is at or ahead of this build; the newer deploy job's verify is authoritative. Skipping strict SHA verify."
{
echo ""
echo "### Buildinfo verification skipped — superseded deploy"
echo ""
echo "This deploy job's SHA \`${GITHUB_SHA:0:7}\` is no longer the head of \`main\` (now \`${NEWER_HEAD:0:7}\`)."
echo "A newer deploy job is rolling the fleet forward; its verify is authoritative."
} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
mapfile -t SLUGS < <(jq -r '.results[]? | .slug' "$RESP")
if [ ${#SLUGS[@]} -eq 0 ]; then
echo "::error::No tenants returned from redeploy-fleet; refusing to mark production deploy verified."
exit 1
fi
# Per-tenant settle/retry budget (#2213). A tenant whose container the
# CP just swapped can keep serving the OLD image at the edge for a short
# window while the old container drains — /buildinfo returns HTTP 200
# with the previous SHA, which `curl --retry` does NOT retry (it only
# retries connection/5xx failures, not a stale-but-200 body). Without a
# settle window a still-rolling tenant false-reds "stale" on the very
# first poll. So poll each tenant's /buildinfo until it reports the
# target SHA or the budget is exhausted; only THEN declare it stale or
# unreachable. This never masks a genuinely stuck tenant — a tenant that
# never reaches the target within the budget still fails loud (and the
# superseded-job revert class is already blocked before rollout above).
SETTLE_BUDGET_SECONDS="${PROD_AUTO_DEPLOY_VERIFY_BUDGET_SECONDS:-240}"
SETTLE_INTERVAL_SECONDS="${PROD_AUTO_DEPLOY_VERIFY_INTERVAL_SECONDS:-20}"
STALE_COUNT=0
UNREACHABLE_COUNT=0
UNHEALTHY_COUNT=0
@@ -498,36 +373,18 @@ jobs:
continue
fi
url="https://${slug}.${TENANT_DOMAIN}/buildinfo"
deadline=$(( $(date +%s) + SETTLE_BUDGET_SECONDS ))
actual=""
last_actual=""
on_target=false
while :; do
body="$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$url" || true)"
actual="$(echo "$body" | jq -r '.git_sha // ""' 2>/dev/null || echo "")"
[ -n "$actual" ] && last_actual="$actual"
if [ "$actual" = "$GITHUB_SHA" ]; then
on_target=true
break
fi
now=$(date +%s)
if [ "$now" -ge "$deadline" ]; then
break
fi
# Still rolling (stale 200) or transiently unreachable — wait and
# re-poll within the settle budget rather than failing on first read.
remaining=$(( deadline - now ))
echo "$slug: waiting for target SHA (have '${actual:0:7}', want ${GITHUB_SHA:0:7}; ${remaining}s left)"
sleep "$SETTLE_INTERVAL_SECONDS"
done
if [ "$on_target" = true ]; then
echo "$slug: ${actual:0:7}"
elif [ -z "$last_actual" ]; then
echo "::error::$slug did not return /buildinfo after deploy (waited ${SETTLE_BUDGET_SECONDS}s)."
body="$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$url" || true)"
actual="$(echo "$body" | jq -r '.git_sha // ""' 2>/dev/null || echo "")"
if [ -z "$actual" ]; then
echo "::error::$slug did not return /buildinfo after deploy."
UNREACHABLE_COUNT=$((UNREACHABLE_COUNT + 1))
else
echo "::error::$slug is stale: actual=${last_actual:0:7}, expected=${GITHUB_SHA:0:7} (waited ${SETTLE_BUDGET_SECONDS}s)"
continue
fi
if [ "$actual" != "$GITHUB_SHA" ]; then
echo "::error::$slug is stale: actual=${actual:0:7}, expected=${GITHUB_SHA:0:7}"
STALE_COUNT=$((STALE_COUNT + 1))
else
echo "$slug: ${actual:0:7}"
fi
done
@@ -545,69 +402,3 @@ jobs:
if [ "$STALE_COUNT" -gt 0 ] || [ "$UNHEALTHY_COUNT" -gt 0 ] || [ "$UNREACHABLE_COUNT" -gt 0 ]; then
exit 1
fi
# Re-point :latest to the just-promoted image — ONLY after the
# production rollout + buildinfo verification above have passed.
#
# WHY HERE (promote point), not at build time:
# The platform-tenant ECR `:latest` tag was last moved 2026-05-10
# and went 3.5 weeks stale because the build step only pushes
# :staging-<sha> + :staging-latest and never re-points :latest. A
# no-arg POST /cp/admin/tenants/:slug/redeploy (whose default tag
# fell through to "latest") then pulled the 3.5-week-old image and
# REVERTED the tenant (incident: molecule-adk-demo, 2026-06-03).
#
# The defense-in-depth half of this fix changes that redeploy
# default to :staging-latest, but :latest itself must also be
# kept meaningful. We make :latest track the PROD-BLESSED build,
# not the raw build: by living at the end of deploy-production —
# after `wait-ci` (green main CI), the canary-first batched fleet
# rollout, AND the /buildinfo SHA verification — :latest only ever
# advances to a SHA that is actually green and confirmed running
# across the live fleet. So `:latest` == "current prod image",
# and any consumer that pulls :latest (legacy callers, manual
# `docker pull`, a redeploy that somehow still resolves "latest")
# gets the blessed image instead of whatever happened to build.
#
# Re-tag is digest-level (imagetools create), so no rebuild and
# :latest is byte-identical to :staging-<sha> for this commit.
# Gate on supersede: a superseded older job must NOT move :latest backward
# to its older image (#2213 — 275383 promoted :latest → the older
# staging-7a72516 after a newer job had already shipped). :latest must only
# ever advance under the job that owns main's head.
- name: Promote :latest to the verified prod image
if: ${{ steps.plan.outputs.enabled == 'true' && steps.supersede.outputs.superseded != 'true' }}
env:
TENANT_IMAGE_NAME: ${{ env.TENANT_IMAGE_NAME }}
STAGING_TENANT_IMAGE_NAME: ${{ env.STAGING_TENANT_IMAGE_NAME }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
run: |
set -euo pipefail
SHA_TAG="staging-${GITHUB_SHA::7}"
PROD_ECR_REGISTRY="${TENANT_IMAGE_NAME%%/*}"
STAGING_ECR_REGISTRY="${STAGING_TENANT_IMAGE_NAME%%/*}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${PROD_ECR_REGISTRY}"
aws ecr get-login-password --region us-east-2 | \
docker login --username AWS --password-stdin "${STAGING_ECR_REGISTRY}"
# imagetools create copies the source manifest to the new tag by
# digest (no pull/rebuild). :latest now points at the exact image
# that just passed the prod gate.
docker buildx imagetools create \
--tag "${TENANT_IMAGE_NAME}:latest" \
"${TENANT_IMAGE_NAME}:${SHA_TAG}"
docker buildx imagetools create \
--tag "${STAGING_TENANT_IMAGE_NAME}:latest" \
"${STAGING_TENANT_IMAGE_NAME}:${SHA_TAG}"
{
echo ""
echo "### :latest promoted"
echo ""
echo "Re-pointed \`platform-tenant:latest\` → \`${SHA_TAG}\` (prod + staging ECR)."
echo ":latest now tracks the prod-blessed, fleet-verified image."
} >> "$GITHUB_STEP_SUMMARY"
+8 -89
View File
@@ -9,22 +9,10 @@
# Triggers on:
# - `pull_request_target`: opened, synchronize, reopened
# → initial status posts when PR opens / re-pushes
# - `pull_request_review` types: [submitted]
# → re-evaluate when a team member submits an APPROVE review so
# the gate flips immediately (no wait for the next push or
# slash-command). Verified live: sop-tier-check.yml uses this
# same event and provably fires (produces
# `sop-tier-check / tier-check (pull_request_review)` contexts).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so
# only APPROVE reviews run the evaluator; COMMENT and
# REQUEST_CHANGES are skipped at the job level.
# Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS
# the required context via the API. Trust boundary preserved
# (BASE ref, no PR-head).
# - comment refires are handled by `sop-checklist.yml` review-refire job
# → `/qa-recheck` slash-command re-evaluates this gate.
# - comment refires are handled by `review-refire-comments.yml`
# → a single issue_comment dispatcher prevents every SOP/review
# comment from enqueueing separate qa/security/tier jobs on
# Gitea 1.22.6 before job-level `if:` can skip them.
# Workflow name = `qa-review` ; job name = `approved`.
# The job's own pass/fail conclusion publishes the status context
# `qa-review / approved (<event>)` — NO `POST /statuses` call → NO
@@ -97,26 +85,21 @@ name: qa-review
on:
pull_request_target:
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
permissions:
contents: read
pull-requests: read
statuses: write
secrets: read
jobs:
# bp-exempt: PR review bot signal; required merge state is enforced by CI / all-required.
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
# Comment-triggered refires live in review-refire-comments.yml. Keeping
# this workflow PR-only avoids comment-triggered queue storms.
if: |
github.event_name == 'pull_request_target' ||
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
github.event_name == 'pull_request_target'
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -160,7 +143,6 @@ jobs:
ref: ${{ github.event.repository.default_branch }}
- name: Evaluate qa-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -175,66 +157,3 @@ jobs:
REVIEW_CHECK_DEBUG: '0'
REVIEW_CHECK_STRICT: '0'
run: bash .gitea/scripts/review-check.sh
- name: Post required status context on pull_request_review
# Gitea Actions auto-publishes (pull_request_review) context
# for this event, but branch-protection requires (pull_request_target).
# We explicitly POST the BP-required context so the gate flips.
# Trust boundary: same BASE-ref script result, no PR-head code.
#
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
GITEA_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
EVAL_OUTCOME: ${{ steps.eval.outcome }}
run: |
set -euo pipefail
authfile=$(mktemp)
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
prfile=$(mktemp)
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/pulls/${PR_NUMBER}")
if [ "$code" != "200" ]; then
echo "::error::GET /pulls/${PR_NUMBER} returned HTTP ${code}"
rm -f "$prfile" "$authfile"
exit 1
fi
head_sha=$(jq -r '.head.sha // ""' "$prfile")
rm -f "$prfile"
if [ "$EVAL_OUTCOME" = "success" ]; then
status_state="success"
description="Approved via pull_request_review trigger"
else
status_state="failure"
description="Review check failed via pull_request_review trigger"
fi
body=$(jq -nc \
--arg state "$status_state" \
--arg context "qa-review / approved (pull_request_target)" \
--arg description "$description" \
'{state:$state, context:$context, description:$description}')
post_code=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/statuses/${head_sha}")
rm -f "$authfile"
if [ "$post_code" != "200" ] && [ "$post_code" != "201" ]; then
echo "::error::POST /statuses/${head_sha} returned HTTP ${post_code}"
exit 1
fi
echo "::notice::posted ${status_state} for context=\"qa-review / approved (pull_request_target)\" on sha=${head_sha}"
+1 -1
View File
@@ -51,7 +51,7 @@ jobs:
name: Audit Railway env vars for drift-prone pins
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 10
@@ -73,7 +73,7 @@ jobs:
# it never queues behind PR-CI. `publish` -> molecule-runner-publish-*.
runs-on: publish
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
env:
@@ -80,7 +80,7 @@ jobs:
# `publish` -> molecule-runner-publish-* sub-pool.
runs-on: publish
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 25
steps:
+1 -1
View File
@@ -54,7 +54,7 @@ jobs:
# runners with internet access to package mirrors). Falls back to GitHub
# binary download. GitHub releases may be blocked on some runner networks
# (infra#241 follow-up).
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
run: |
if apt-get update -qq && apt-get install -y -qq jq; then
+1 -1
View File
@@ -57,7 +57,7 @@ jobs:
name: Detect SECRET_PATTERNS drift
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
timeout-minutes: 5
steps:
+4 -87
View File
@@ -6,44 +6,25 @@
#
# See `qa-review.yml` header for the full A1-α / A1.1 / A4 / A5 design
# rationale; everything below is identical in shape.
#
# A1-α addendum (internal#760): review-event trigger added so the security
# gate flips immediately when a team member submits an APPROVE review.
# Uses `pull_request_review` types: [submitted] — verified live via
# sop-tier-check.yml which provably fires this event (produces
# `sop-tier-check / tier-check (pull_request_review)` contexts).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so only APPROVE
# reviews run the evaluator; COMMENT and REQUEST_CHANGES are skipped at
# the job level. Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS the required
# context via the API. Trust boundary preserved (BASE ref, no PR-head).
name: security-review
on:
pull_request_target:
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
permissions:
contents: read
pull-requests: read
statuses: write
secrets: read
jobs:
# bp-exempt: PR security review bot signal; required merge state is enforced by CI / all-required.
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
# Comment-triggered refires live in review-refire-comments.yml. Keeping
# this workflow PR-only avoids comment-triggered queue storms.
if: |
github.event_name == 'pull_request_target' ||
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
github.event_name == 'pull_request_target'
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -76,7 +57,6 @@ jobs:
ref: ${{ github.event.repository.default_branch }}
- name: Evaluate security-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -88,66 +68,3 @@ jobs:
REVIEW_CHECK_DEBUG: '0'
REVIEW_CHECK_STRICT: '0'
run: bash .gitea/scripts/review-check.sh
- name: Post required status context on pull_request_review
# Gitea Actions auto-publishes (pull_request_review) context
# for this event, but branch-protection requires (pull_request_target).
# We explicitly POST the BP-required context so the gate flips.
# Trust boundary: same BASE-ref script result, no PR-head code.
#
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
GITEA_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
EVAL_OUTCOME: ${{ steps.eval.outcome }}
run: |
set -euo pipefail
authfile=$(mktemp)
chmod 600 "$authfile"
printf 'header = "Authorization: token %s"\n' "$GITEA_TOKEN" > "$authfile"
prfile=$(mktemp)
code=$(curl -sS -o "$prfile" -w '%{http_code}' -K "$authfile" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/pulls/${PR_NUMBER}")
if [ "$code" != "200" ]; then
echo "::error::GET /pulls/${PR_NUMBER} returned HTTP ${code}"
rm -f "$prfile" "$authfile"
exit 1
fi
head_sha=$(jq -r '.head.sha // ""' "$prfile")
rm -f "$prfile"
if [ "$EVAL_OUTCOME" = "success" ]; then
status_state="success"
description="Approved via pull_request_review trigger"
else
status_state="failure"
description="Review check failed via pull_request_review trigger"
fi
body=$(jq -nc \
--arg state "$status_state" \
--arg context "security-review / approved (pull_request_target)" \
--arg description "$description" \
'{state:$state, context:$context, description:$description}')
post_code=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
-K "$authfile" -H "Content-Type: application/json" \
-d "$body" \
"https://${GITEA_HOST}/api/v1/repos/${REPO}/statuses/${head_sha}")
rm -f "$authfile"
if [ "$post_code" != "200" ] && [ "$post_code" != "201" ]; then
echo "::error::POST /statuses/${head_sha} returned HTTP ${post_code}"
exit 1
fi
echo "::notice::posted ${status_state} for context=\"security-review / approved (pull_request_target)\" on sha=${head_sha}"
+6 -6
View File
@@ -179,10 +179,10 @@ jobs:
- name: Refire qa-review status
if: steps.classify.outputs.run_qa == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
# RFC_324_TEAM_READ_TOKEN is read-only (team membership read scope only).
# review-refire-status.sh POSTs to /statuses — requires write scope.
# SOP_TIER_CHECK_TOKEN carries write:repository + write:issue + read:organization.
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
@@ -198,10 +198,10 @@ jobs:
- name: Refire security-review status
if: steps.classify.outputs.run_security == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
# RFC_324_TEAM_READ_TOKEN is read-only (team membership read scope only).
# review-refire-status.sh POSTs to /statuses — requires write scope.
# SOP_TIER_CHECK_TOKEN carries write:repository + write:issue + read:organization.
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
+3 -3
View File
@@ -36,7 +36,7 @@
# window closed. continue-on-error: true has been removed from the
# tier-check job; AND-composition is now fully enforced. If you need
# to temporarily re-introduce a mask, file a tracker and follow the
# mc#1982 protocol (Tier 2e lint requires a current tracker within
# mc#774 protocol (Tier 2e lint requires a current tracker within
# 2 lines of any continue-on-error: true).
name: sop-tier-check
@@ -92,7 +92,7 @@ jobs:
# runners). The sop-tier-check script has its own fallback as a
# third line of defense. continue-on-error: true ensures this step
# failing does not block the job.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
run: |
# apt-get is the primary method — Ubuntu package mirrors are reliably
@@ -113,7 +113,7 @@ jobs:
# continue-on-error: true at step level — job-level is ignored by Gitea
# Actions (quirk #10, internal runbooks). Belt-and-suspenders with
# SOP_FAIL_OPEN=1 + || true below.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
+2 -2
View File
@@ -90,7 +90,7 @@ jobs:
staging-smoke:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
sha: ${{ steps.compute.outputs.sha }}
@@ -212,7 +212,7 @@ jobs:
if: ${{ needs.staging-smoke.result == 'success' && needs.staging-smoke.outputs.smoke_ran == 'true' }}
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
env:
SHA: ${{ needs.staging-smoke.outputs.sha }}
+1 -1
View File
@@ -71,7 +71,7 @@ jobs:
name: Sweep CF orphans
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 3 min surfaces hangs (CF API stall, AWS describe-instances stuck)
# within one cron interval instead of burning a full tick. Realistic
+1 -1
View File
@@ -55,7 +55,7 @@ jobs:
name: Sweep CF tunnels
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 30 min cap. Was 5 min on the theory that the only thing that
# could take >5min is a CF-API hang — but on 2026-05-02 a backlog
-100
View File
@@ -1,100 +0,0 @@
name: sync-providers-yaml
# Cross-repo canonical↔synced-copy drift gate (internal#718 P2-A, CTO
# 2026-05-27 "Distribution = SDK via codegen + verify-CI", multi-repo branch:
# "codegen-checked-into-each-repo + verify-CI").
#
# The canonical provider-registry SSOT is molecule-controlplane
# internal/providers/providers.yaml. molecule-core has NO Go module dependency
# on controlplane, so instead of importing it we carry a SYNCED COPY at
# workspace-server/internal/providers/providers.yaml and gate it.
#
# This workflow fetches the canonical providers.yaml from controlplane (via the
# Gitea raw endpoint, read-only) and byte-compares it against core's synced
# copy. RED if they differ — meaning the canonical moved and core's copy must be
# re-synced (copy verbatim + `go generate ./...` + bump
# canonicalProvidersYAMLSHA256 in sync_canonical_test.go).
#
# Pairs with:
# * sync_canonical_test.go — hermetic sha pin (catches a hand-edit of core's
# copy even with no network); runs in the normal `go test ./...`.
# * verify-providers-gen.yml — artifact ↔ synced-copy drift.
#
# ENFORCEMENT GATING: standalone workflow, NOT a job in ci.yml and NOT in
# branch protection (same soak-then-promote posture as verify-providers-gen).
# It is intentionally absent from ci.yml's job set so the ci-required-drift
# sentinel does not fire on it.
#
# AUTH: uses AUTO_SYNC_TOKEN (the existing cross-repo read token used to sync
# template/provider content from sibling repos). If the secret is absent the
# job emits a clear ::warning:: and exits 0 — the hermetic sha pin in
# sync_canonical_test.go is the always-on backstop, so a missing cross-repo
# token degrades to "hand-edit still caught, live canonical drift not caught"
# rather than a hard red that blocks unrelated PRs.
on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- 'workspace-server/internal/providers/providers.yaml'
- '.gitea/workflows/sync-providers-yaml.yml'
push:
branches: [main, staging]
paths:
- 'workspace-server/internal/providers/providers.yaml'
- '.gitea/workflows/sync-providers-yaml.yml'
schedule:
# Daily at :23 — catch a canonical change in controlplane that landed
# without a paired core re-sync PR (off-zero to spread cron load).
- cron: '23 4 * * *'
workflow_dispatch:
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
concurrency:
group: sync-providers-yaml-${{ github.ref }}
cancel-in-progress: true
jobs:
# bp-required: pending #718 — soak-then-promote, not in BP yet.
compare:
name: Compare synced providers.yaml against controlplane canonical
runs-on: ubuntu-latest
timeout-minutes: 6
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Fetch canonical providers.yaml from controlplane and byte-compare
env:
AUTO_SYNC_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
API_ROOT: ${{ github.server_url }}/api/v1
run: |
set -euo pipefail
if [ -z "${AUTO_SYNC_TOKEN:-}" ]; then
echo "::warning::AUTO_SYNC_TOKEN secret missing — skipping the live cross-repo compare."
echo "The hermetic sha pin (sync_canonical_test.go) still gates hand-edits of core's copy."
echo "Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane) to enable live canonical-drift detection."
exit 0
fi
CANON_URL="${API_ROOT}/repos/molecule-ai/molecule-controlplane/raw/internal/providers/providers.yaml?ref=main"
# Use the /raw endpoint: it returns the file bytes directly. (The
# /contents endpoint ignores Accept: application/vnd.gitea.raw on
# Gitea 1.22.6 and returns the JSON+base64 envelope, which made this
# diff a permanent false RED.)
curl -fsS \
-H "Authorization: token ${AUTO_SYNC_TOKEN}" \
"${CANON_URL}" -o /tmp/canonical-providers.yaml
LOCAL=workspace-server/internal/providers/providers.yaml
if diff -u /tmp/canonical-providers.yaml "$LOCAL"; then
echo "OK — core's synced providers.yaml is byte-identical to the controlplane canonical."
else
echo "::error::core's synced providers.yaml DRIFTED from the controlplane canonical (SSOT)."
echo "Re-sync: copy controlplane internal/providers/providers.yaml verbatim over"
echo " $LOCAL, run 'go generate ./...' in workspace-server/, and bump"
echo " canonicalProvidersYAMLSHA256 in internal/providers/sync_canonical_test.go."
exit 1
fi
+1 -1
View File
@@ -49,7 +49,7 @@ jobs:
name: Ops scripts (unittest)
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-108
View File
@@ -1,108 +0,0 @@
name: verify-providers-gen
# Provider-registry SSOT enforcement gate — molecule-core side (internal#718
# P2-A, CTO 2026-05-27 "Distribution = SDK via codegen + verify-CI").
#
# The canonical schema SSOT is molecule-controlplane
# internal/providers/providers.yaml. molecule-core carries a SYNCED COPY at
# workspace-server/internal/providers/providers.yaml (kept in sync by the
# companion sync-providers-yaml.yml gate), and cmd/gen-providers emits the
# checked-in Go projection workspace-server/internal/providers/gen/registry_gen.go.
#
# This workflow regenerates the artifact into the working tree and fails RED if
# it differs from what is committed — catching BOTH:
# * a providers.yaml (synced-copy) change that wasn't followed by `go generate ./...`, and
# * a hand-edit of the generated artifact (it carries a DO NOT EDIT header).
#
# It is the molecule-core mirror of molecule-controlplane's verify-providers-gen
# workflow. Together with sync-providers-yaml (canonical↔synced-copy drift) it
# closes the codegen-checked-into-each-repo + verify-CI loop the RFC mandates.
#
# ENFORCEMENT GATING (deliberate, per dev-SOP "implementation gating"):
# this is a STANDALONE workflow, NOT a job inside ci.yml, and is NOT yet in any
# branch-protection status_check_contexts. Rationale (identical to the CP P0
# rollout):
# * It runs + reports RED on every PR/push immediately (visible signal).
# * It is intentionally absent from ci.yml's job set so the ci-required-drift
# sentinel (jobs ↔ branch-protection ↔ audit-env) does NOT fire on it, and
# from branch protection (turning it into a hard merge gate has blast radius
# — operator GO required, same pattern as sop-tier-check / verify-providers-gen
# on controlplane). Promote it into branch protection in a follow-up once
# P2 has soaked.
# Until then it behaves like secret-scan / block-internal-paths: a standalone
# advisory-to-hard gate the author is expected to keep green.
on:
pull_request:
types: [opened, synchronize, reopened]
# CI-scheduler-overload fix (fix/ci-scheduler-fanout, 2026-06-01):
# this gate only verifies that the generated providers artifact is in
# sync with the schema SSOT. Its verdict can ONLY change when one of
# the codegen inputs/outputs changes, so firing the Go toolchain on
# every unrelated PR (docs, canvas, scripts) is pure fan-out cost.
# Scoped to the codegen surface. SAFE because this workflow is NOT a
# branch-protection status_check_context (see header §ENFORCEMENT
# GATING) — lint-required-no-paths only forbids paths filters on
# REQUIRED workflows; this is advisory, so a paths filter is allowed.
# Mirrors the sibling sync-providers-yaml.yml scoping convention.
paths:
- 'workspace-server/internal/providers/**'
- 'workspace-server/cmd/gen-providers/**'
- '.gitea/workflows/verify-providers-gen.yml'
push:
branches: [main, staging]
paths:
- 'workspace-server/internal/providers/**'
- 'workspace-server/cmd/gen-providers/**'
- '.gitea/workflows/verify-providers-gen.yml'
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
concurrency:
group: verify-providers-gen-${{ github.ref }}
cancel-in-progress: true
jobs:
# bp-required: pending #718 — soak-then-promote, not in BP yet.
verify:
name: Regenerate providers artifact and fail on drift
runs-on: ubuntu-latest
timeout-minutes: 8
defaults:
run:
working-directory: workspace-server
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Verify generated artifact is in sync with providers.yaml
run: |
set -euo pipefail
# -check regenerates in memory and byte-compares against the
# checked-in artifact; exit 1 (RED) on any drift. This is the
# single source of the gate's verdict — the same code path
# `go test ./cmd/gen-providers` exercises.
go run ./cmd/gen-providers -check
- name: Belt-and-braces — regenerate in place and assert clean tree
run: |
set -euo pipefail
# Independent confirmation that does not trust the -check path:
# actually write the artifact and assert git sees no change. If
# this and the step above ever disagree, the gate is suspect.
go generate ./...
if ! git diff --quiet -- internal/providers/gen/registry_gen.go; then
echo "::error::workspace-server/internal/providers/gen/registry_gen.go drifted from providers.yaml."
echo "Run 'go generate ./...' (or 'go run ./cmd/gen-providers') in workspace-server/ and commit the result."
git --no-pager diff -- internal/providers/gen/registry_gen.go | head -80
exit 1
fi
echo "OK — generated providers artifact is in sync with the schema SSOT."
+1 -1
View File
@@ -31,7 +31,7 @@ jobs:
name: Weekly Platform-Go Surface
runs-on: ubuntu-latest
# continue-on-error: surface only, never block
# mc#1982: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
defaults:
run:
-12
View File
@@ -46,18 +46,6 @@
---
## Quick Start
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
cd molecule-core
./scripts/dev-start.sh
```
Then open [http://localhost:3000](http://localhost:3000), add your model API key in **Config → Secrets & API Keys → Global**, and create a workspace from a template.
See the full [Quickstart Guide](./docs/quickstart.md) for prerequisites, manual setup, and troubleshooting.
## The Pitch
Molecule AI is the most powerful way to govern an AI agent organization in production.
+2 -17
View File
@@ -60,26 +60,11 @@ test.describe("MobileChat", () => {
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible({ timeout: 15_000 });
// Reload and deterministically wait for the chat-history GET that
// rehydrates the transcript to come back 2xx, rather than racing a
// fixed-timeout render assertion against an in-flight fetch. The
// server now persists the a2a_receive row SYNCHRONOUSLY before the
// send's 200 (workspace-server logA2ASuccess), so the row is
// guaranteed present by the time this GET runs — the wait is for
// hydration latency, not for a still-racing write.
const historyResponse = page.waitForResponse(
(resp) =>
resp.url().includes("/chat-history") &&
resp.request().method() === "GET" &&
resp.status() === 200,
{ timeout: 15_000 },
);
await page.reload();
await page.waitForSelector("[data-testid='chat-panel']", { timeout: 10_000 });
await historyResponse;
await expect(page.getByText("Mobile persistence", { exact: true })).toBeVisible();
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible();
await expect(page.getByText("Mobile persistence", { exact: true })).toBeVisible({ timeout: 5_000 });
await expect(page.getByText("Echo: Mobile persistence")).toBeVisible({ timeout: 5_000 });
});
test("composer auto-grows with multi-line text", async ({ page }) => {
+4 -53
View File
@@ -250,38 +250,7 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
const workspaceId = ws.body.id as string;
console.log(`[staging-setup] Workspace created: ${workspaceId}`);
// 6. Wait for workspace RENDERABLE.
//
// This harness exists to verify the canvas *tab UI* renders (staging-
// tabs.spec.ts: open each of the 13 workspace-panel tabs, assert no hard
// crash / no "Failed to load" toast). It does NOT exercise the agent —
// no LLM call is made, the spec even mocks /cp/auth/me and 401→200. All
// it needs is a workspace ROW that the canvas lists so the node renders
// and the side-panel tabs open. A fully-`online` agent is NOT required.
//
// That distinction became load-bearing on 2026-06-03: workspace-server
// #2162 (fix(provision): platform-managed workspace must fail-closed when
// CP proxy env absent) made a platform_managed workspace ABORT AT BOOT
// with MISSING_PLATFORM_PROXY when MOLECULE_LLM_BASE_URL /
// MOLECULE_LLM_USAGE_TOKEN are not present in the tenant's env. The
// canvas E2E creates a bare hermes/gpt-4o workspace, which defaults
// closed to platform_managed (workspace_provision.go:~1009), and the
// staging tenant does not carry the CP proxy env — so the agent never
// starts. Pre-#2162 this same workspace booted credential-less (the bug
// #2162 fixed) and the tabs rendered fine; #2162 is a correct production
// safety fix, but it surfaced here as `status:"failed", uptime_seconds:0,
// last_sample_error:null` — the pre-start credential-abort shape — and the
// old hard-throw turned a UI-irrelevant boot skip into a main-red
// (core#2199). The agent boot stage is simply not what this test gates.
//
// So: online is the happy path. A `failed` row that is the PRE-START
// credential-abort shape (the agent process never ran: uptime_seconds==0
// AND no last_sample_error) is treated as RENDERABLE — the row exists,
// the node + tabs render, proceed. We do NOT mask a real boot regression:
// any `failed` carrying a last_sample_error, OR a non-zero uptime (the
// agent started then crashed — image pull, panic, PYTHONPATH, etc.),
// still hard-throws. Genuine *infra* provision failure is already caught
// loud one step earlier at the org level (instance_status === "failed").
// 6. Wait for workspace online
await waitFor<boolean>(
async () => {
const r = await jsonFetch(`${tenantURL}/workspaces/${workspaceId}`, {
@@ -290,24 +259,6 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
if (r.status !== 200) return null;
if (r.body?.status === "online") return true;
if (r.body?.status === "failed") {
const uptime = Number(r.body?.uptime_seconds ?? 0);
const sampleErr = r.body?.last_sample_error;
const preStartCredentialAbort = uptime === 0 && !sampleErr;
if (preStartCredentialAbort) {
// Agent never started (no LLM cred on this staging tenant — the
// expected #2162 platform-proxy gap). The workspace row still
// renders, which is all the tab-UI test needs. Proceed, but log
// loudly so a real "agent never booted because of something else"
// is not silently normalized.
console.warn(
`[staging-setup] workspace ${workspaceId} is 'failed' with the pre-start ` +
`credential-abort shape (uptime_seconds=0, no last_sample_error) — agent did ` +
`not boot (expected on staging without CP LLM proxy env, post workspace-server ` +
`#2162). The tab-UI test does not exercise the agent; proceeding with the ` +
`workspace row, which renders regardless. full body: ${JSON.stringify(r.body)}`,
);
return true;
}
// last_sample_error is often empty when the failure happens before
// the agent emits a sample (e.g. boot crash, image pull error,
// missing PYTHONPATH, OpenAI quota at startup). Dumping the full
@@ -315,8 +266,8 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
// needs without a second probe. Otherwise this propagates as a
// bare "Workspace failed: " — the exact useless message that
// sent #2632 to the issue tracker.
const detail = sampleErr
? sampleErr
const detail = r.body.last_sample_error
? r.body.last_sample_error
: `(no last_sample_error) full body: ${JSON.stringify(r.body)}`;
throw new Error(`Workspace failed: ${detail}`);
}
@@ -326,7 +277,7 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
10_000,
"workspace online",
);
console.log(`[staging-setup] Workspace renderable`);
console.log(`[staging-setup] Workspace online`);
// 7. Hand state off to tests + teardown — overwrite the slug-only
// bootstrap state with the full state spec tests need.
-6
View File
@@ -41,12 +41,6 @@ describe("buildCsp — production", () => {
expect(csp).toContain("object-src 'none'");
});
it("allows blob: in frame-src for authenticated PDF previews", () => {
const frameSrc = csp.match(/frame-src[^;]*/)?.[0] ?? "";
expect(frameSrc).toContain("'self'");
expect(frameSrc).toContain("blob:");
});
it("locks base-uri to 'self' (prevents base-tag injection)", () => {
expect(csp).toContain("base-uri 'self'");
});
+1 -1
View File
@@ -41,7 +41,7 @@ export default function PricingPage() {
<p className="mt-2 text-ink-mid">
We publish the{" "}
<a
href="https://git.moleculesai.app/molecule-ai/molecule-core"
href="https://git.moleculesai.app/molecule-ai/molecule-monorepo"
className="text-accent underline hover:text-accent"
>
full source on GitHub
+368 -195
View File
@@ -5,16 +5,6 @@ import * as Dialog from "@radix-ui/react-dialog";
import { api } from "@/lib/api";
import { isSaaSTenant } from "@/lib/tenant";
import { ExternalConnectModal, type ExternalConnectionInfo } from "./ExternalConnectModal";
import {
ProviderModelSelector,
buildProviderCatalog,
buildProviderCatalogFromRegistry,
findProviderForModel,
type SelectorModel,
type SelectorValue,
type RegistryProvider,
type RegistryModel,
} from "./ProviderModelSelector";
interface WorkspaceOption {
id: string;
@@ -32,40 +22,96 @@ interface TemplateSpec {
id: string;
name?: string;
runtime?: string;
model?: string;
models?: SelectorModel[];
providers?: string[];
// internal#718 P3 registry-served fields (additive; absent on older
// backends and for non-registry runtimes). When registry_backed is true the
// provider→model catalog is built from registry_providers/registry_models so
// each model's DERIVED provider (e.g. moonshot/kimi-k2.6 → "platform") drives
// the dropdown bucket and the create payload's llm_provider — instead of the
// legacy inferVendor heuristic that slash-splits the id into "moonshot".
// Mirrors ConfigTab's RuntimeOption loader (RFC#340 Fix C).
registry_backed?: boolean;
registry_providers?: RegistryProvider[];
registry_models?: RegistryModel[];
}
const DEFAULT_RUNTIME = "claude-code";
const RUNTIME_OPTIONS = [
{ value: "claude-code", label: "Claude Code" },
{ value: "codex", label: "OpenAI Codex CLI" },
{ value: "google-adk", label: "Google ADK" },
{ value: "hermes", label: "Hermes" },
{ value: "openclaw", label: "OpenClaw" },
interface HermesProvider {
id: string;
label: string;
envVar: string;
defaultModel: string;
models: string[];
}
type LLMAuthMode = "platform" | "api_key" | "oauth";
interface NativeLLMProvider {
id: string;
label: string;
envVar?: string;
defaultModel: string;
models: string[];
authModes: LLMAuthMode[];
}
export const NATIVE_LLM_PROVIDERS: NativeLLMProvider[] = [
{
id: "minimax",
label: "MiniMax",
envVar: "MINIMAX_API_KEY",
defaultModel: "MiniMax-M2.7",
models: ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5"],
authModes: ["platform", "api_key"],
},
{
id: "kimi-coding",
label: "Kimi",
envVar: "KIMI_API_KEY",
defaultModel: "kimi-for-coding",
models: ["kimi-for-coding", "kimi-k2.5", "kimi-k2"],
authModes: ["platform", "api_key"],
},
{
id: "anthropic",
label: "Anthropic",
envVar: "ANTHROPIC_API_KEY",
defaultModel: "claude-sonnet-4-6",
models: ["claude-sonnet-4-6", "claude-opus-4-7", "claude-haiku-4-5"],
authModes: ["platform", "api_key"],
},
{
id: "anthropic-oauth",
label: "Claude OAuth",
envVar: "CLAUDE_CODE_OAUTH_TOKEN",
defaultModel: "sonnet",
models: ["sonnet", "opus", "haiku"],
authModes: ["oauth"],
},
];
const BASE_RUNTIME_TEMPLATE_IDS = new Set(["claude-code-default", "codex", "google-adk", "hermes", "openclaw"]);
const DEFAULT_HEADLESS_INSTANCE_TYPE = "t3.medium";
const DEFAULT_HEADLESS_ROOT_GB = 30;
const DEFAULT_DISPLAY_INSTANCE_TYPE = "t3.xlarge";
const DEFAULT_DISPLAY_ROOT_GB = 80;
// All providers supported by Hermes runtime via providers.resolve_provider().
// `defaultModel` is the slug injected into the workspace provision request
// when the user picks this provider — template-hermes's derive-provider.sh
// maps the prefix back to the provider name at install time, so this is
// the canonical handshake. `models` are additional suggestions surfaced in
// the datalist so the user can pick a different size without typing the
// whole slug.
export const HERMES_PROVIDERS: HermesProvider[] = [
{ id: "anthropic", label: "Anthropic (Claude)", envVar: "ANTHROPIC_API_KEY", defaultModel: "anthropic/claude-sonnet-4-5", models: ["anthropic/claude-opus-4-5", "anthropic/claude-sonnet-4-5", "anthropic/claude-haiku-4-5"] },
{ id: "openai", label: "OpenAI", envVar: "OPENAI_API_KEY", defaultModel: "openai/gpt-4o", models: ["openai/gpt-4o", "openai/gpt-4o-mini", "openai/o3-mini"] },
{ id: "openrouter", label: "OpenRouter", envVar: "OPENROUTER_API_KEY", defaultModel: "openrouter/auto", models: ["openrouter/auto", "openrouter/anthropic/claude-sonnet-4", "openrouter/meta-llama/llama-3.3-70b"] },
{ id: "xai", label: "xAI (Grok)", envVar: "XAI_API_KEY", defaultModel: "xai/grok-4", models: ["xai/grok-4", "xai/grok-4-mini"] },
{ id: "gemini", label: "Google Gemini", envVar: "GEMINI_API_KEY", defaultModel: "gemini/gemini-2.5-pro", models: ["gemini/gemini-2.5-pro", "gemini/gemini-2.5-flash"] },
{ id: "qwen", label: "Qwen (Alibaba)", envVar: "QWEN_API_KEY", defaultModel: "alibaba/qwen3-max", models: ["alibaba/qwen3-max", "alibaba/qwen3-coder"] },
{ id: "glm", label: "GLM (Zhipu AI)", envVar: "GLM_API_KEY", defaultModel: "zai/glm-4.6", models: ["zai/glm-4.6", "zai/glm-4.5-air"] },
{ id: "kimi", label: "Kimi (Moonshot)", envVar: "KIMI_API_KEY", defaultModel: "kimi-coding/kimi-k2", models: ["kimi-coding/kimi-k2", "kimi-coding/kimi-k1.5"] },
{ id: "minimax", label: "MiniMax", envVar: "MINIMAX_API_KEY", defaultModel: "minimax/MiniMax-M2.7", models: ["minimax/MiniMax-M2.7", "minimax/MiniMax-M2.7-highspeed", "minimax/MiniMax-M1"] },
{ id: "deepseek", label: "DeepSeek", envVar: "DEEPSEEK_API_KEY", defaultModel: "deepseek/deepseek-chat", models: ["deepseek/deepseek-chat", "deepseek/deepseek-reasoner"] },
{ id: "groq", label: "Groq", envVar: "GROQ_API_KEY", defaultModel: "openrouter/groq/llama-3.3-70b", models: ["openrouter/groq/llama-3.3-70b"] },
{ id: "mistral", label: "Mistral", envVar: "MISTRAL_API_KEY", defaultModel: "openrouter/mistralai/mistral-large", models: ["openrouter/mistralai/mistral-large"] },
{ id: "together", label: "Together AI", envVar: "TOGETHER_API_KEY", defaultModel: "openrouter/meta-llama/llama-3.3-70b", models: ["openrouter/meta-llama/llama-3.3-70b"] },
{ id: "fireworks", label: "Fireworks AI", envVar: "FIREWORKS_API_KEY", defaultModel: "openrouter/meta-llama/llama-3.3-70b", models: ["openrouter/meta-llama/llama-3.3-70b"] },
{ id: "hermes", label: "Hermes / Nous (legacy)", envVar: "HERMES_API_KEY", defaultModel: "nousresearch/Hermes-3-Llama-3.1-405B", models: ["nousresearch/Hermes-3-Llama-3.1-405B", "nousresearch/Hermes-4-14B"] },
];
export function CreateWorkspaceButton() {
const [open, setOpen] = useState(false);
const [name, setName] = useState("");
const [role, setRole] = useState("");
const [runtime, setRuntime] = useState(DEFAULT_RUNTIME);
const [template, setTemplate] = useState("");
const [parentId, setParentId] = useState("");
const [budgetLimit, setBudgetLimit] = useState("");
@@ -80,22 +126,32 @@ export function CreateWorkspaceButton() {
// filter below. Same data source ConfigTab uses (PR #2454). When the
// selected template declares `runtime_config.providers` in its
// config.yaml, the modal surfaces only those providers in the
// <select>. Provider/model options are derived from template models.
// <select>. Empty/missing list falls back to the full HERMES_PROVIDERS
// catalog so older templates without the field keep working.
const [templateSpecs, setTemplateSpecs] = useState<TemplateSpec[]>([]);
// External-runtime path: skip docker provision, mint a workspace_auth_token,
// and surface the connection snippet in a modal after create. When
// isExternal is true the template and model fields are hidden (they're
// meaningless for BYO-compute agents).
// isExternal is true the template / model / hermes-provider fields are
// hidden (they're meaningless for BYO-compute agents).
const [isExternal, setIsExternal] = useState(false);
const [externalRuntime, setExternalRuntime] = useState("external");
const [externalConnection, setExternalConnection] =
useState<ExternalConnectionInfo | null>(null);
const [llmSelection, setLLMSelection] = useState<SelectorValue>({
providerId: "",
model: "",
envVars: [],
});
// Hermes-specific state
const [hermesProvider, setHermesProvider] = useState("anthropic");
const [hermesApiKey, setHermesApiKey] = useState("");
// Model slug is sent to CP as `model` and plumbed to the workspace EC2
// as HERMES_DEFAULT_MODEL env var. template-hermes's derive-provider.sh
// reads the prefix (`minimax/…`, `anthropic/…`) to set
// HERMES_INFERENCE_PROVIDER at install time. Missing model → provider
// falls back to "auto" and hermes picks its compiled-in default
// (Anthropic), which 401s if the user's key is for a different
// provider. Hence: require model when template=hermes.
const [hermesModel, setHermesModel] = useState("");
const [llmAuthMode, setLLMAuthMode] = useState<LLMAuthMode>("platform");
const [llmProvider, setLLMProvider] = useState("minimax");
const [llmModel, setLLMModel] = useState("MiniMax-M2.7");
const [llmSecret, setLLMSecret] = useState("");
// Tier picker: on SaaS every workspace gets its own EC2 VM (Full Access
@@ -152,103 +208,93 @@ export function CreateWorkspaceButton() {
[]
);
const handleRuntimeChange = useCallback((nextRuntime: string) => {
setRuntime(nextRuntime);
setTemplate("");
setLLMSelection({ providerId: "", model: "", envVars: [] });
setLLMSecret("");
}, []);
const isHermes = template.trim().toLowerCase() === "hermes";
const nativeLLMProviders = useMemo(
() => NATIVE_LLM_PROVIDERS.filter((p) => p.authModes.includes(llmAuthMode)),
[llmAuthMode],
);
const selectedNativeProvider = useMemo(
() => nativeLLMProviders.find((p) => p.id === llmProvider) ?? nativeLLMProviders[0],
[llmProvider, nativeLLMProviders],
);
// Resolve the selected workspace template from /templates. Runtime is
// deliberately separate: "SEO Agent" is a workspace template, not a
// runtime, so it must never appear in the runtime selector.
// Resolve the selected template's spec from the /templates response.
// The `template` input is free-text; templates can be matched by id,
// name, or runtime so any of those work. Lower-cased compare keeps
// "Hermes" / "hermes" / "HERMES" interchangeable.
const selectedTemplateSpec = useMemo<TemplateSpec | null>(() => {
if (!template) return null;
return templateSpecs.find((s) => s.id === template) ?? null;
const t = template.trim().toLowerCase();
if (!t) return null;
return (
templateSpecs.find(
(s) =>
(s.id || "").toLowerCase() === t ||
(s.name || "").toLowerCase() === t ||
(s.runtime || "").toLowerCase() === t,
) ?? null
);
}, [template, templateSpecs]);
const selectedRuntimeTemplateSpec = useMemo<TemplateSpec | null>(() => (
templateSpecs.find((s) => {
if (!BASE_RUNTIME_TEMPLATE_IDS.has(s.id)) return false;
const specRuntime = (s.runtime ?? s.id).trim().toLowerCase();
return s.id === runtime || specRuntime === runtime;
}) ?? null
), [runtime, templateSpecs]);
const visibleTemplateSpecs = useMemo(
() => templateSpecs.filter((spec) => {
if (BASE_RUNTIME_TEMPLATE_IDS.has(spec.id)) return false;
const specRuntime = (spec.runtime ?? DEFAULT_RUNTIME).trim().toLowerCase();
return specRuntime === runtime;
}),
[runtime, templateSpecs],
);
// The /templates row backing the LLM picker: an explicitly-selected
// workspace template wins, else the base runtime template row.
const llmSourceSpec = useMemo<TemplateSpec | null>(
() => selectedTemplateSpec ?? selectedRuntimeTemplateSpec,
[selectedRuntimeTemplateSpec, selectedTemplateSpec],
);
// internal#718 P3 / RFC#340 Fix C: a runtime is registry-backed when the
// /templates row says so AND it served a non-empty registry_models set.
// Mirrors ConfigTab's `registryBacked` derivation exactly.
const registryBacked = useMemo(
() =>
llmSourceSpec?.registry_backed === true &&
(llmSourceSpec.registry_models?.length ?? 0) > 0,
[llmSourceSpec],
);
// Models fed to the selector dropdown. For a registry-backed runtime use the
// registry-served native set, carrying each model's DERIVED provider so the
// selector buckets it correctly (moonshot/kimi-k2.6 → "platform", not the
// inferVendor "moonshot"). Otherwise fall back to the template-served
// models[] + the legacy heuristic — same fallback ConfigTab keeps.
const llmModels = useMemo<SelectorModel[]>(
() => {
if (registryBacked) {
return (llmSourceSpec?.registry_models ?? []).map((m) => ({
id: m.id,
name: m.name,
...(m.provider ? { provider: m.provider } : {}),
}));
}
return llmSourceSpec?.models?.length ? llmSourceSpec.models : [];
},
[registryBacked, llmSourceSpec],
);
// Registry-backed path: build the catalog from registry_providers/
// registry_models so dropdown labels + billing + the derived provider come
// from the provider-registry SSOT (restores the "Platform" bucket). Legacy
// path: re-infer from models[] via buildProviderCatalog (inferVendor).
const llmCatalog = useMemo(
() =>
registryBacked
? buildProviderCatalogFromRegistry(
llmSourceSpec?.registry_providers ?? [],
llmSourceSpec?.registry_models ?? [],
)
: buildProviderCatalog(llmModels),
[registryBacked, llmSourceSpec, llmModels],
);
const selectedLLMProvider = useMemo(
() => llmCatalog.find((p) => p.id === llmSelection.providerId) ?? llmCatalog[0],
[llmCatalog, llmSelection.providerId],
);
// Filter HERMES_PROVIDERS by what the template declares it supports.
// Empty/missing declared list → fall back to the full catalog so
// templates that haven't migrated to the explicit `providers:` field
// (and self-hosted setups without /templates) keep working unchanged.
const availableProviders = useMemo<HermesProvider[]>(() => {
const declared = selectedTemplateSpec?.providers;
if (!declared || declared.length === 0) return HERMES_PROVIDERS;
const allowed = new Set(declared.map((p) => p.toLowerCase()));
const filtered = HERMES_PROVIDERS.filter((p) => allowed.has(p.id.toLowerCase()));
// Defensive: if the template's declared list doesn't match anything
// in our static catalog (e.g. brand-new provider id we don't have
// metadata for yet), fall back to the full list rather than render
// an empty <select>. Better to over-show than to lock the user out.
return filtered.length > 0 ? filtered : HERMES_PROVIDERS;
}, [selectedTemplateSpec]);
// If the currently-selected provider is filtered out by a template
// change, snap back to the first available. Without this, the
// hermesProvider state could refer to a provider not in the dropdown
// — confusing UI + the API key field's envVar would be wrong.
useEffect(() => {
if (!isHermes) return;
if (availableProviders.length === 0) return;
if (!availableProviders.some((p) => p.id === hermesProvider)) {
setHermesProvider(availableProviders[0].id);
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [availableProviders, isHermes]);
useEffect(() => {
if (llmCatalog.length === 0) return;
const sourceDefault = llmSourceSpec?.model?.trim();
const platformProvider = llmCatalog.find((p) => p.vendor === "platform");
const matched = sourceDefault ? findProviderForModel(llmCatalog, sourceDefault) : null;
const next = platformProvider ?? matched ?? llmCatalog[0];
const defaultModel = next.models.find((model) => model.id === sourceDefault)?.id
?? next.models[0]?.id
?? "";
setLLMSelection({
providerId: next.id,
model: next.wildcard ? "" : defaultModel,
envVars: next.envVars,
});
setLLMSecret("");
}, [llmCatalog, llmSourceSpec]);
if (isHermes) return;
if (nativeLLMProviders.length === 0) return;
if (!nativeLLMProviders.some((p) => p.id === llmProvider)) {
setLLMProvider(nativeLLMProviders[0].id);
setLLMModel(nativeLLMProviders[0].defaultModel);
}
}, [isHermes, llmProvider, nativeLLMProviders]);
useEffect(() => {
if (isHermes || !selectedNativeProvider) return;
if (!selectedNativeProvider.models.includes(llmModel)) {
setLLMModel(selectedNativeProvider.defaultModel);
}
}, [isHermes, llmModel, selectedNativeProvider]);
// Auto-fill hermesModel with the provider's defaultModel whenever the
// provider changes, but only if the user hasn't already typed their own
// slug. Prevents the empty-model → "auto" → Anthropic-default 401 trap.
useEffect(() => {
if (!isHermes) return;
const p = HERMES_PROVIDERS.find((x) => x.id === hermesProvider);
if (!p) return;
// Replace model only if current value matches another provider's
// default (user hasn't customized it) OR is empty.
const isUntouched =
hermesModel === "" ||
HERMES_PROVIDERS.some((x) => x.defaultModel === hermesModel);
if (isUntouched) setHermesModel(p.defaultModel);
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [hermesProvider, isHermes]);
// Reset form and load workspaces whenever dialog opens
useEffect(() => {
@@ -256,7 +302,6 @@ export function CreateWorkspaceButton() {
setName("");
setRole("");
setTier(defaultTier);
setRuntime(DEFAULT_RUNTIME);
setTemplate("");
setParentId("");
setBudgetLimit("");
@@ -265,8 +310,13 @@ export function CreateWorkspaceButton() {
setDisplayInstanceType(DEFAULT_DISPLAY_INSTANCE_TYPE);
setDisplayRootGB(String(DEFAULT_DISPLAY_ROOT_GB));
setDisplayResolution("1920x1080");
setHermesProvider("anthropic");
setExternalRuntime("external");
setLLMSelection({ providerId: "", model: "", envVars: [] });
setHermesApiKey("");
setHermesModel("");
setLLMAuthMode("platform");
setLLMProvider("minimax");
setLLMModel("MiniMax-M2.7");
setLLMSecret("");
api
.get<WorkspaceOption[]>("/workspaces")
@@ -275,7 +325,7 @@ export function CreateWorkspaceButton() {
api
.get<TemplateSpec[]>("/templates")
.then((rows) => setTemplateSpecs(Array.isArray(rows) ? rows : []))
.catch(() => { /* keep empty; create stays blocked until the catalog loads */ });
.catch(() => { /* keep empty — HERMES_PROVIDERS fallback below */ });
// defaultTier is stable for the session (derived from window.location),
// safe to omit from deps.
// eslint-disable-next-line react-hooks/exhaustive-deps
@@ -286,18 +336,29 @@ export function CreateWorkspaceButton() {
setError("Name is required");
return;
}
if (!isExternal && !llmSelection.model.trim()) {
if (isHermes && !hermesApiKey.trim()) {
setError("API key is required for Hermes workspaces");
return;
}
if (isHermes && !hermesModel.trim()) {
setError("Model is required for Hermes workspaces — provider routing depends on the model slug prefix");
return;
}
if (!isExternal && !isHermes && !llmModel.trim()) {
setError("Model is required");
return;
}
if (!isExternal && selectedLLMProvider?.envVars.length && !llmSecret.trim()) {
setError("Provider credential is required");
if (!isExternal && !isHermes && llmAuthMode !== "platform" && !llmSecret.trim()) {
setError(llmAuthMode === "oauth" ? "Claude OAuth token is required" : "API key is required");
return;
}
setCreating(true);
setError(null);
const nativeProvider = selectedLLMProvider;
const provider = isHermes
? HERMES_PROVIDERS.find((p) => p.id === hermesProvider)
: undefined;
const nativeProvider = !isHermes ? selectedNativeProvider : undefined;
try {
const parsedBudget = budgetLimit.trim()
@@ -321,12 +382,12 @@ export function CreateWorkspaceButton() {
tier,
parent_id: parentId || undefined,
budget_limit: parsedBudget,
...(!isExternal && nativeProvider
...(!isExternal && !isHermes && nativeProvider
? {
model: llmSelection.model.trim(),
llm_provider: nativeProvider.vendor,
...(nativeProvider.envVars.length > 0
? { secrets: { [nativeProvider.envVars[0]]: llmSecret.trim() } }
model: llmModel.trim(),
llm_provider: nativeProvider.id,
...(llmAuthMode !== "platform" && nativeProvider.envVar
? { secrets: { [nativeProvider.envVar]: llmSecret.trim() } }
: {}),
}
: {}),
@@ -354,7 +415,13 @@ export function CreateWorkspaceButton() {
// Runtime=external flips the backend into awaiting-agent mode:
// no container provisioning, token minted, connection payload
// returned in the response for the modal below.
...(isExternal ? { runtime: externalRuntime } : { runtime }),
...(isExternal ? { runtime: externalRuntime } : {}),
...(!isExternal && isHermes && provider
? {
secrets: { [provider.envVar]: hermesApiKey.trim() },
model: hermesModel.trim(),
}
: {}),
});
// External path: keep the create dialog open just long enough to
// hand control to the connect modal, then close. The connect
@@ -466,65 +533,77 @@ export function CreateWorkspaceButton() {
)}
{!isExternal && (
<div className="space-y-3">
<div>
<label htmlFor="runtime-select" className="text-[11px] text-ink-mid block mb-1">
Runtime
</label>
<select
id="runtime-select"
value={runtime}
onChange={(e) => handleRuntimeChange(e.target.value)}
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors"
>
{RUNTIME_OPTIONS.map((option) => (
<option key={option.value} value={option.value}>
{option.label}
</option>
))}
</select>
</div>
<div>
<label htmlFor="workspace-template-select" className="text-[11px] text-ink-mid block mb-1">
Workspace Template
</label>
<select
id="workspace-template-select"
value={template}
onChange={(e) => setTemplate(e.target.value)}
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors"
>
<option value="">Blank workspace</option>
{visibleTemplateSpecs.map((spec) => (
<option key={spec.id} value={spec.id}>
{spec.name || spec.id}
</option>
))}
</select>
</div>
</div>
<InputField
label="Template"
value={template}
onChange={setTemplate}
placeholder="e.g. seo-agent (from workspace-configs-templates/)"
mono
/>
)}
{!isExternal && selectedLLMProvider && (
{!isExternal && !isHermes && selectedNativeProvider && (
<div className="rounded-lg border border-line/50 bg-surface-card/40 p-3 space-y-3">
<div className="text-[11px] font-medium text-ink-mid">
LLM
</div>
<ProviderModelSelector
models={llmModels}
catalog={registryBacked ? llmCatalog : undefined}
value={llmSelection}
onChange={(next) => {
setLLMSelection(next);
setLLMSecret("");
}}
idPrefix="create-workspace-llm"
variant="stack"
/>
{selectedLLMProvider.envVars.length > 0 && (
<div>
<label htmlFor="llm-auth-mode" className="text-[11px] text-ink-mid block mb-1">
Auth Mode
</label>
<select
id="llm-auth-mode"
value={llmAuthMode}
onChange={(e) => setLLMAuthMode(e.target.value as LLMAuthMode)}
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors"
>
<option value="platform">Platform provided</option>
<option value="api_key">API key</option>
<option value="oauth">Claude OAuth</option>
</select>
</div>
<div>
<label htmlFor="llm-provider-select" className="text-[11px] text-ink-mid block mb-1">
Provider
</label>
<select
id="llm-provider-select"
value={selectedNativeProvider.id}
onChange={(e) => {
const next = nativeLLMProviders.find((p) => p.id === e.target.value);
setLLMProvider(e.target.value);
if (next) setLLMModel(next.defaultModel);
}}
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors"
>
{nativeLLMProviders.map((p) => (
<option key={p.id} value={p.id}>
{p.label}
</option>
))}
</select>
</div>
<div>
<label htmlFor="llm-model-input" className="text-[11px] text-ink-mid block mb-1">
Model
</label>
<input
id="llm-model-input"
type="text"
value={llmModel}
onChange={(e) => setLLMModel(e.target.value)}
list="llm-model-suggestions"
spellCheck={false}
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors font-mono"
/>
<datalist id="llm-model-suggestions">
{selectedNativeProvider.models.map((m) => <option key={m} value={m} />)}
</datalist>
</div>
{llmAuthMode !== "platform" && (
<div>
<label htmlFor="llm-secret-input" className="text-[11px] text-ink-mid block mb-1">
{selectedLLMProvider.envVars[0]}
{llmAuthMode === "oauth" ? "OAuth Token" : "API Key"}
</label>
<input
id="llm-secret-input"
@@ -662,6 +741,100 @@ export function CreateWorkspaceButton() {
</div>
</div>
{/* Hermes provider configuration — shown only when template === "hermes" */}
{isHermes && (
<div
className="mt-4 rounded-xl border border-violet-700/40 bg-violet-950/20 p-4 space-y-3"
data-testid="hermes-provider-section"
>
<p className="text-[11px] font-semibold text-violet-400 uppercase tracking-wide">
Hermes Provider
</p>
<p className="text-[11px] text-ink-mid -mt-1">
Choose the AI provider and paste your API key. The key is
stored as an encrypted workspace secret.
</p>
<div>
<label
htmlFor="hermes-provider-select"
className="text-[11px] text-ink-mid block mb-1"
>
Provider
</label>
<select
id="hermes-provider-select"
value={hermesProvider}
onChange={(e) => setHermesProvider(e.target.value)}
aria-label="Hermes provider"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink focus:outline-none focus:border-violet-500/60 focus:ring-1 focus:ring-violet-500/20 transition-colors"
>
{availableProviders.map((p) => (
<option key={p.id} value={p.id}>
{p.label}
</option>
))}
</select>
</div>
<div>
<label
htmlFor="hermes-api-key-input"
className="text-[11px] text-ink-mid block mb-1"
>
API Key{" "}
<span aria-hidden="true" className="text-bad">
*
</span>
<span className="sr-only"> (required)</span>
</label>
<input
id="hermes-api-key-input"
type="password"
value={hermesApiKey}
onChange={(e) => setHermesApiKey(e.target.value)}
placeholder="sk-…"
aria-label="Hermes API key"
autoComplete="off"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-violet-500/60 focus:ring-1 focus:ring-violet-500/20 transition-colors font-mono"
/>
</div>
<div>
<label
htmlFor="hermes-model-input"
className="text-[11px] text-ink-mid block mb-1"
>
Model{" "}
<span aria-hidden="true" className="text-bad">
*
</span>
<span className="sr-only"> (required)</span>
</label>
<input
id="hermes-model-input"
type="text"
value={hermesModel}
onChange={(e) => setHermesModel(e.target.value)}
placeholder="e.g. minimax/MiniMax-M2.7"
aria-label="Hermes model slug"
autoComplete="off"
spellCheck={false}
list="hermes-model-suggestions"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-violet-500/60 focus:ring-1 focus:ring-violet-500/20 transition-colors font-mono"
/>
<datalist id="hermes-model-suggestions">
{HERMES_PROVIDERS.find((p) => p.id === hermesProvider)?.models.map(
(m) => <option key={m} value={m} />,
)}
</datalist>
<p className="text-[10px] text-ink-mid mt-1">
Slug determines which provider hermes routes to at install time.
</p>
</div>
</div>
)}
{error && (
<div
role="alert"
+2 -2
View File
@@ -4,7 +4,7 @@ import { useState, useEffect, useCallback } from "react";
import { api } from "@/lib/api";
import { useCanvasStore } from "@/store/canvas";
import { OrgTemplatesSection } from "./TemplatePalette";
import { isUserVisibleWorkspaceTemplate, type Template } from "@/lib/deploy-preflight";
import { type Template } from "@/lib/deploy-preflight";
import { useTemplateDeploy } from "@/hooks/useTemplateDeploy";
import { Spinner } from "./Spinner";
import { TIER_CONFIG } from "@/lib/design-tokens";
@@ -18,7 +18,7 @@ export function EmptyState() {
useEffect(() => {
api
.get<Template[]>("/templates")
.then((t) => setTemplates(t.filter(isUserVisibleWorkspaceTemplate)))
.then((t) => setTemplates(t))
.catch(() => setTemplates([]))
.finally(() => setLoading(false));
}, []);
+18 -240
View File
@@ -23,8 +23,6 @@ interface Props {
/** Grouped provider options derived from the template's models[] /
* required_env. When length ≥ 2 the modal shows a radio picker. */
providers?: ProviderChoice[];
/** Optional keys to offer in the deploy modal without blocking Deploy. */
optionalKeys?: string[];
/** Runtime slug — used only for the "The <runtime> runtime …"
* headline; behavior is driven by providers/missingKeys. */
runtime: string;
@@ -96,13 +94,13 @@ export function MissingKeysModal({
open,
missingKeys,
providers,
optionalKeys,
runtime,
onKeysAdded,
onCancel,
onOpenSettings,
workspaceId,
configuredKeys,
modelSuggestions,
models,
initialModel,
title,
@@ -116,13 +114,13 @@ export function MissingKeysModal({
<ProviderPickerModal
open={open}
providers={pickerProviders}
optionalKeys={optionalKeys ?? []}
runtime={runtime}
onKeysAdded={onKeysAdded}
onCancel={onCancel}
onOpenSettings={onOpenSettings}
workspaceId={workspaceId}
configuredKeys={configuredKeys}
modelSuggestions={modelSuggestions}
models={models}
initialModel={initialModel}
title={title}
@@ -140,15 +138,11 @@ export function MissingKeysModal({
<AllKeysModal
open={open}
missingKeys={keys}
optionalKeys={optionalKeys ?? []}
runtime={runtime}
onKeysAdded={onKeysAdded}
onCancel={onCancel}
onOpenSettings={onOpenSettings}
workspaceId={workspaceId}
configuredKeys={configuredKeys}
title={title}
description={description}
/>
);
}
@@ -176,13 +170,13 @@ export function providerIdForModel(
function ProviderPickerModal({
open,
providers,
optionalKeys,
runtime,
onKeysAdded,
onCancel,
onOpenSettings,
workspaceId,
configuredKeys,
modelSuggestions,
models,
initialModel,
title,
@@ -190,13 +184,13 @@ function ProviderPickerModal({
}: {
open: boolean;
providers: ProviderChoice[];
optionalKeys: string[];
runtime: string;
onKeysAdded: (model?: string) => void;
onCancel: () => void;
onOpenSettings?: () => void;
workspaceId?: string;
configuredKeys?: Set<string>;
modelSuggestions?: string[];
models?: ModelSpec[];
initialModel?: string;
title?: string;
@@ -256,9 +250,16 @@ function ProviderPickerModal({
const [selectorValue, setSelectorValue] = useState<SelectorValue>(initial);
const [entries, setEntries] = useState<KeyEntry[]>([]);
const [optionalEntries, setOptionalEntries] = useState<KeyEntry[]>([]);
const firstInputRef = useRef<HTMLInputElement>(null);
// Legacy compat: map the selector value back into the old `selected`/
// `model` shape for the rest of the modal body (footer copy, etc.).
const selected = useMemo(
() =>
providers.find((p) => p.id === selectorValue.providerId) ??
providers[0],
[providers, selectorValue.providerId],
);
const model = selectorValue.model;
const showModelInput = catalog.length > 0;
@@ -281,18 +282,7 @@ function ProviderPickerModal({
error: null,
})),
);
setOptionalEntries(
optionalKeys
.filter((key) => !selectorValue.envVars.includes(key))
.map((key) => ({
key,
value: "",
saved: configuredKeys?.has(key) ?? false,
saving: false,
error: null,
})),
);
}, [open, selectorValue.envVars, configuredKeys, optionalKeys]);
}, [open, selectorValue.envVars, configuredKeys]);
useEffect(() => {
if (!open) return;
@@ -346,43 +336,6 @@ function ProviderPickerModal({
[entries, updateEntry, workspaceId],
);
const updateOptionalEntry = useCallback(
(index: number, updates: Partial<KeyEntry>) => {
setOptionalEntries((prev) =>
prev.map((e, i) => (i === index ? { ...e, ...updates } : e)),
);
},
[],
);
const handleSaveOptionalKey = useCallback(
async (index: number) => {
const entry = optionalEntries[index];
if (!entry.value.trim()) return;
updateOptionalEntry(index, { saving: true, error: null });
try {
if (workspaceId) {
await api.put(`/workspaces/${workspaceId}/secrets`, {
key: entry.key,
value: entry.value.trim(),
});
} else {
await api.put("/settings/secrets", {
key: entry.key,
value: entry.value.trim(),
});
}
updateOptionalEntry(index, { saved: true, saving: false });
} catch (e) {
updateOptionalEntry(index, {
saving: false,
error: e instanceof Error ? e.message : "Failed to save",
});
}
},
[optionalEntries, updateOptionalEntry, workspaceId],
);
if (!open) return null;
// Portal to document.body for the same reason as
// OrgImportPreflightModal — several callers (TemplatePalette,
@@ -512,62 +465,6 @@ function ProviderPickerModal({
</div>
))}
</div>
{optionalEntries.length > 0 && (
<div className="space-y-2">
<div className="text-[10px] uppercase tracking-wide text-ink-mid font-semibold">
Optional
</div>
{optionalEntries.map((entry, index) => (
<div
key={entry.key}
className="bg-surface-card/30 rounded-lg px-3 py-2.5 border border-line/40"
>
<div className="flex items-center justify-between mb-1.5">
<div>
<div className="text-[11px] text-ink-mid font-medium">
{getKeyLabel(entry.key)}
</div>
<div className="text-[9px] font-mono text-ink-mid">{entry.key}</div>
</div>
{entry.saved && (
<span className="text-[9px] text-good bg-emerald-900/30 px-1.5 py-0.5 rounded flex items-center gap-1">
Saved
</span>
)}
</div>
{!entry.saved && (
<div className="flex gap-2 mt-2">
<input
value={entry.value}
onChange={(e) => updateOptionalEntry(index, { value: e.target.value.trimStart() })}
placeholder={entry.key.includes("API_KEY") ? "sk-..." : "Enter value"}
type="password"
aria-label={`Optional value for ${entry.key}`}
onKeyDown={(e) => {
if (e.key === "Enter" && entry.value.trim()) {
handleSaveOptionalKey(index);
}
}}
className="flex-1 bg-surface-sunken border border-line rounded px-2 py-1.5 text-[11px] text-ink font-mono focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/20 transition-colors"
/>
<button
type="button"
onClick={() => handleSaveOptionalKey(index)}
disabled={!entry.value.trim() || entry.saving}
className="px-3 py-1.5 bg-surface-card hover:bg-surface-card/80 text-[11px] rounded text-ink border border-line disabled:opacity-30 transition-colors shrink-0 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
>
{entry.saving ? "..." : "Save"}
</button>
</div>
)}
{entry.error && (
<div role="alert" aria-live="assertive" className="mt-1.5 text-[10px] text-bad">{entry.error}</div>
)}
</div>
))}
</div>
)}
</div>
<div className="px-5 py-3 border-t border-line bg-surface/50 flex items-center justify-between gap-2">
@@ -615,30 +512,21 @@ function ProviderPickerModal({
function AllKeysModal({
open,
missingKeys,
optionalKeys,
runtime,
onKeysAdded,
onCancel,
onOpenSettings,
workspaceId,
configuredKeys,
title,
description,
}: {
open: boolean;
missingKeys: string[];
optionalKeys: string[];
runtime: string;
onKeysAdded: () => void;
onCancel: () => void;
onOpenSettings?: () => void;
workspaceId?: string;
configuredKeys?: Set<string>;
title?: string;
description?: string;
}) {
const [entries, setEntries] = useState<KeyEntry[]>([]);
const [optionalEntries, setOptionalEntries] = useState<KeyEntry[]>([]);
const [globalError, setGlobalError] = useState<string | null>(null);
useEffect(() => {
@@ -647,24 +535,13 @@ function AllKeysModal({
missingKeys.map((key) => ({
key,
value: "",
saved: configuredKeys?.has(key) ?? false,
saved: false,
saving: false,
error: null,
})),
);
setOptionalEntries(
optionalKeys
.filter((key) => !missingKeys.includes(key))
.map((key) => ({
key,
value: "",
saved: configuredKeys?.has(key) ?? false,
saving: false,
error: null,
})),
);
setGlobalError(null);
}, [open, missingKeys, optionalKeys, configuredKeys]);
}, [open, missingKeys]);
useEffect(() => {
if (!open) return;
@@ -714,45 +591,6 @@ function AllKeysModal({
[entries, updateEntry, workspaceId],
);
const updateOptionalEntry = useCallback(
(index: number, updates: Partial<KeyEntry>) => {
setOptionalEntries((prev) =>
prev.map((entry, i) => (i === index ? { ...entry, ...updates } : entry)),
);
},
[],
);
const handleSaveOptionalKey = useCallback(
async (index: number) => {
const entry = optionalEntries[index];
if (!entry.value.trim()) return;
updateOptionalEntry(index, { saving: true, error: null });
try {
if (workspaceId) {
await api.put(`/workspaces/${workspaceId}/secrets`, {
key: entry.key,
value: entry.value.trim(),
});
} else {
await api.put("/settings/secrets", {
key: entry.key,
value: entry.value.trim(),
});
}
updateOptionalEntry(index, { saved: true, saving: false });
} catch (e) {
updateOptionalEntry(index, {
saving: false,
error: e instanceof Error ? e.message : "Failed to save",
});
}
},
[optionalEntries, updateOptionalEntry, workspaceId],
);
const handleAddKeysAndDeploy = useCallback(() => {
const anySaving = entries.some((e) => e.saving);
if (anySaving) {
@@ -818,16 +656,12 @@ function AllKeysModal({
</svg>
</div>
<h3 id="missing-keys-title" className="text-sm font-semibold text-ink">
{title ?? "Missing API Keys"}
Missing API Keys
</h3>
</div>
<p className="text-[12px] text-ink-mid leading-relaxed">
{description ?? (
<>
The <span className="text-warm font-medium">{runtimeLabel}</span>{" "}
runtime requires the following keys to be configured before deploying.
</>
)}
The <span className="text-warm font-medium">{runtimeLabel}</span>{" "}
runtime requires the following keys to be configured before deploying.
</p>
</div>
@@ -885,62 +719,6 @@ function AllKeysModal({
</div>
))}
{optionalEntries.length > 0 && (
<div className="space-y-2">
<div className="text-[10px] uppercase tracking-wide text-ink-mid font-semibold">
Optional
</div>
{optionalEntries.map((entry, index) => (
<div
key={entry.key}
className="bg-surface-card/30 rounded-lg px-3 py-2.5 border border-line/40"
>
<div className="flex items-center justify-between mb-1">
<div>
<div className="text-[11px] text-ink-mid font-medium">
{getKeyLabel(entry.key)}
</div>
<div className="text-[9px] font-mono text-ink-mid">{entry.key}</div>
</div>
{entry.saved && (
<span className="text-[9px] text-good bg-emerald-900/30 px-1.5 py-0.5 rounded">
Saved
</span>
)}
</div>
{!entry.saved && (
<div className="flex gap-2 mt-2">
<input
value={entry.value}
onChange={(e) => updateOptionalEntry(index, { value: e.target.value.trimStart() })}
placeholder={entry.key.includes("API_KEY") ? "sk-..." : "Enter value"}
type="password"
aria-label={`Optional value for ${entry.key}`}
onKeyDown={(e) => {
if (e.key === "Enter" && entry.value.trim()) {
handleSaveOptionalKey(index);
}
}}
className="flex-1 bg-surface-sunken border border-line rounded px-2 py-1.5 text-[11px] text-ink font-mono focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/20 transition-colors"
/>
<button
type="button"
onClick={() => handleSaveOptionalKey(index)}
disabled={!entry.value.trim() || entry.saving}
className="px-3 py-1.5 bg-surface-card hover:bg-surface-card/80 text-[11px] rounded text-ink border border-line disabled:opacity-30 transition-colors shrink-0 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
>
{entry.saving ? "..." : "Save"}
</button>
</div>
)}
{entry.error && <div className="mt-1.5 text-[10px] text-bad">{entry.error}</div>}
</div>
))}
</div>
)}
{globalError && (
<div role="alert" aria-live="assertive" className="px-3 py-2 bg-red-950/40 border border-red-800/50 rounded-lg text-[11px] text-bad">
{globalError}
+1 -108
View File
@@ -28,7 +28,6 @@ import { useId, useMemo } from "react";
export interface SelectorModel {
id: string;
name?: string;
provider?: string;
required_env?: string[];
}
@@ -49,33 +48,6 @@ export interface ProviderEntry {
wildcard: boolean;
/** Optional tooltip text (rendered as native title=). */
tooltip?: string;
/** Billing mode the DERIVED provider implies, when this entry came from the
* registry-backed payload (internal#718 P3): "platform_managed" | "byok".
* Undefined for entries built by the legacy inferVendor heuristic. */
billingMode?: "platform_managed" | "byok";
}
/** RegistryProvider mirrors one entry of GET /templates `registry_providers`
* (workspace-server registryProviderView): the registry's native provider for
* a runtime, with its display label, auth-env NAMES, and billing mode. This is
* the SSOT the dropdown labels come from — the canvas drops VENDOR_LABELS for
* registry-backed runtimes (internal#718 P3, retire-list #4). */
export interface RegistryProvider {
name: string;
display_name?: string;
auth_env?: string[];
billing_mode?: "platform_managed" | "byok";
deprecated?: boolean;
}
/** RegistryModel mirrors one entry of GET /templates `registry_models`: a
* native model id annotated with its DERIVED provider (registry name) and the
* billing_mode that provider implies. */
export interface RegistryModel {
id: string;
name?: string;
provider?: string;
billing_mode?: "platform_managed" | "byok";
}
export interface SelectorValue {
@@ -95,13 +67,6 @@ interface Props {
models: SelectorModel[];
value: SelectorValue;
onChange: (next: SelectorValue) => void;
/** Optional pre-built provider catalog. When provided, the selector uses it
* verbatim instead of re-inferring one from `models` via
* buildProviderCatalog — the registry-backed path (internal#718 P3), where
* the parent builds the catalog from the registry-served providers/models
* so dropdown labels + billing come from the provider-registry SSOT rather
* than the inferVendor heuristic. Omitted = legacy heuristic over `models`. */
catalog?: ProviderEntry[];
/** Display variant. "grid" = label+control side-by-side (used in ConfigTab
* Runtime section). "stack" = vertical (used in MissingKeysModal). */
variant?: "grid" | "stack";
@@ -123,7 +88,6 @@ interface Props {
/** Vendor keys → human label. Add new vendors here when templates pick
* up new model families. */
const VENDOR_LABELS: Record<string, string> = {
"platform": "Platform",
"anthropic-oauth": "Claude Code subscription",
anthropic: "Anthropic API",
minimax: "MiniMax",
@@ -154,8 +118,6 @@ const VENDOR_LABELS: Record<string, string> = {
/** Optional per-vendor tooltip shown on hover. */
const VENDOR_TOOLTIPS: Record<string, string> = {
"platform":
"Use the Molecule platform-managed LLM proxy. No vendor API key is required.",
"anthropic-oauth":
"Use your Claude.ai (Pro/Max/Team) subscription via OAuth. Run `claude login` in the workspace terminal to mint the token, then paste it here. No API spend.",
anthropic:
@@ -203,9 +165,6 @@ const BARE_VENDOR_PATTERNS: Array<{ test: (id: string) => boolean; vendor: strin
/** Infer a vendor key from a model spec. Combines id-prefix and env
* signals. Exported for tests. */
export function inferVendor(model: SelectorModel): string {
const explicitProvider = model.provider?.trim().toLowerCase();
if (explicitProvider) return explicitProvider;
const id = model.id || "";
const envSet = new Set(model.required_env ?? []);
@@ -285,66 +244,6 @@ export function buildProviderCatalog(models: SelectorModel[]): ProviderEntry[] {
return Array.from(buckets.values());
}
/** Build the provider catalog from a REGISTRY-BACKED GET /templates payload
* (registry_providers + registry_models) — internal#718 P3, retire-list #4.
*
* Unlike buildProviderCatalog (which RE-INFERS vendor from model-id prefixes
* + env via inferVendor/VENDOR_LABELS/BARE_VENDOR_PATTERNS), this trusts the
* registry: each model carries its DERIVED `provider` (a registry provider
* name) and the dropdown label/billing/auth come from the matching
* `registry_providers` entry. The canvas can render no provider/model the
* registry did not serve ("only registered selectable"), and the billing-mode
* shown reflects the derived provider rather than a hardcoded rule.
*
* A provider with no served model is omitted (no empty buckets). Models whose
* `provider` doesn't match a registry_providers entry still get a bucket
* keyed by the raw provider name (defensive — should not happen for a
* well-formed registry payload), so a model is never silently dropped. */
export function buildProviderCatalogFromRegistry(
registryProviders: RegistryProvider[],
registryModels: RegistryModel[],
): ProviderEntry[] {
const byName = new Map<string, RegistryProvider>();
for (const p of registryProviders) byName.set(p.name, p);
// Bucket models by their derived provider name, preserving registry order.
const buckets = new Map<string, ProviderEntry>();
for (const m of registryModels) {
const vendor = (m.provider ?? "").trim();
if (!vendor) continue; // un-annotated registry model — skip from the
// provider cascade (selectable elsewhere via free-text); it has no
// derived provider to bucket under.
const meta = byName.get(vendor);
const wildcard = m.id.includes("*");
let entry = buckets.get(vendor);
if (!entry) {
entry = {
id: `registry|${vendor}`,
vendor,
label: meta?.display_name || vendor,
envVars: meta?.auth_env ?? [],
models: [],
wildcard,
billingMode: meta?.billing_mode ?? m.billing_mode,
tooltip: VENDOR_TOOLTIPS[vendor],
};
buckets.set(vendor, entry);
}
entry.models.push({ id: m.id, name: m.name, provider: vendor });
entry.wildcard = entry.wildcard || wildcard;
}
// Decorate label with model-count when ≥2 concrete models share the bucket,
// matching buildProviderCatalog's UX.
for (const e of buckets.values()) {
if (!e.wildcard && e.models.length > 1) {
e.label = `${e.label} (${e.models.length} models)`;
}
}
return Array.from(buckets.values());
}
/** Find the provider entry that contains a given model id. Used by
* callers to back-derive the provider when only the model is known
* (e.g. ConfigTab loading from saved state). */
@@ -377,7 +276,6 @@ export function ProviderModelSelector({
models,
value,
onChange,
catalog: catalogProp,
variant = "stack",
allowCustomModelEscape = false,
disabled = false,
@@ -388,12 +286,7 @@ export function ProviderModelSelector({
const providerSelectId = `${baseId}-provider`;
const modelSelectId = `${baseId}-model`;
// Registry-backed path (internal#718 P3): use the parent-supplied catalog
// verbatim; otherwise re-infer one from `models` via the legacy heuristic.
const catalog = useMemo(
() => catalogProp ?? buildProviderCatalog(models),
[catalogProp, models],
);
const catalog = useMemo(() => buildProviderCatalog(models), [models]);
const selected = useMemo(
() => catalog.find((p) => p.id === value.providerId) ?? null,
[catalog, value.providerId],
+2 -2
View File
@@ -5,7 +5,7 @@ import { flushSync } from "react-dom";
import { api } from "@/lib/api";
import { useCanvasStore } from "@/store/canvas";
import type { WorkspaceData } from "@/store/socket";
import { isUserVisibleWorkspaceTemplate, type Template } from "@/lib/deploy-preflight";
import { type Template } from "@/lib/deploy-preflight";
import { useTemplateDeploy } from "@/hooks/useTemplateDeploy";
import {
OrgImportPreflightModal,
@@ -446,7 +446,7 @@ export function TemplatePalette() {
setLoading(true);
try {
const data = await api.get<Template[]>("/templates");
setTemplates(data.filter(isUserVisibleWorkspaceTemplate));
setTemplates(data);
} catch {
setTemplates([]);
} finally {
+2 -4
View File
@@ -224,14 +224,12 @@ export function Toolbar() {
useEffect(() => {
const handler = (e: KeyboardEvent) => {
if (e.key !== "?") return;
const target = e.target as HTMLElement;
if (target.closest?.('[data-display-stream="true"]')) return;
const tag = target.tagName;
const tag = (e.target as HTMLElement).tagName;
const inInput =
tag === "INPUT" ||
tag === "TEXTAREA" ||
tag === "SELECT" ||
target.isContentEditable;
(e.target as HTMLElement).isContentEditable;
if (inInput) return;
// Don't fire when a modal/dialog is already mounted (canvas modals,
// side panel, etc. use z-50 or above).
@@ -1,82 +1,411 @@
// @vitest-environment jsdom
/**
* Focused tests for BudgetSection's PER-PERIOD progress-bar math + aria (#49).
* Tests for BudgetSection (issue #541).
*
* Behavioral coverage (loading, save, 402 banners, USD formatting, legacy
* back-compat) lives in tabs/__tests__/BudgetSection.test.tsx — this file
* deliberately covers only the per-period progress percentage + aria-valuenow
* + the over-budget colouring, which that suite doesn't assert in detail. Kept
* separate to avoid duplicating the behavioral suite (one component, no
* parallel/identical suites).
* Covers:
* - Loading state
* - Stats row: used / limit, "Unlimited" when null
* - Progress bar: correct percentage, capped at 100%, absent when no limit
* - Budget remaining text
* - Input pre-fill (existing limit / blank when null)
* - Save: PATCH with number, PATCH with null (blank input)
* - 402 on GET → exceeded banner, no fetch-error text
* - 402 on PATCH → exceeded banner
* - Non-402 fetch error → error text
* - Non-402 save error → save error alert
* - Section header and subheading
* - Fetch error does not show stats
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, waitFor, cleanup } from "@testing-library/react";
import {
render,
screen,
fireEvent,
waitFor,
cleanup,
act,
} from "@testing-library/react";
// ── Mock api ──────────────────────────────────────────────────────────────────
vi.mock("@/lib/api", () => ({
api: { get: vi.fn(), patch: vi.fn() },
api: {
get: vi.fn(),
patch: vi.fn(),
},
}));
import { api } from "@/lib/api";
import { BudgetSection } from "../tabs/BudgetSection";
const mockGet = vi.mocked(api.get);
const mockPatch = vi.mocked(api.patch);
type P = { limit: number | null; spend: number; remaining: number | null };
// ── Helpers ───────────────────────────────────────────────────────────────────
// Build a periods response where the named period has the given limit/spend.
function withMonthly(limit: number | null, spend: number) {
const blank: P = { limit: null, spend: 0, remaining: null };
const monthly: P = { limit, spend, remaining: limit == null ? null : limit - spend };
function budgetResponse(overrides: Partial<{
budget_limit: number | null;
budget_used: number;
budget_remaining: number | null;
}> = {}) {
return {
periods: { hourly: blank, daily: blank, weekly: blank, monthly },
budget_limit: limit,
monthly_spend: spend,
budget_remaining: monthly.remaining,
budget_limit: 1000,
budget_used: 250,
budget_remaining: 750,
...overrides,
};
}
beforeEach(() => vi.clearAllMocks());
afterEach(() => cleanup());
function make402Error(): Error {
return new Error("API GET /workspaces/ws-1/budget: 402 Payment Required");
}
async function renderLoaded(data: unknown) {
function make402PatchError(): Error {
return new Error("API PATCH /workspaces/ws-1/budget: 402 Payment Required");
}
function makeGenericError(msg = "network timeout"): Error {
return new Error(`API GET /workspaces/ws-1/budget: 500 ${msg}`);
}
beforeEach(() => {
vi.clearAllMocks();
});
afterEach(() => {
cleanup();
});
// ── Rendering helpers ─────────────────────────────────────────────────────────
async function renderLoaded(budgetData = budgetResponse()) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(data as any);
mockGet.mockResolvedValueOnce(budgetData as any);
render(<BudgetSection workspaceId="ws-1" />);
// Wait for loading to finish
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
}
describe("BudgetSection — per-period progress bar", () => {
it("renders the bar for a limited period and omits it for an unlimited one", async () => {
await renderLoaded(withMonthly(1000, 250));
expect(screen.getByTestId("budget-monthly-fill")).toBeTruthy();
expect(screen.queryByTestId("budget-hourly-fill")).toBeNull(); // hourly unlimited
// ── Loading state ─────────────────────────────────────────────────────────────
describe("BudgetSection — loading state", () => {
it("shows loading indicator while fetch is in flight", () => {
// Never resolve
mockGet.mockReturnValue(new Promise(() => {}));
render(<BudgetSection workspaceId="ws-1" />);
expect(screen.getByTestId("budget-loading")).toBeTruthy();
expect(screen.getByText("Loading…")).toBeTruthy();
});
it("fills to 25%", async () => {
await renderLoaded(withMonthly(1000, 250));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("25%");
});
it("fills to 50%", async () => {
await renderLoaded(withMonthly(1000, 500));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("50%");
});
it("caps fill at 100% when spend exceeds limit", async () => {
await renderLoaded(withMonthly(1000, 4000));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("100%");
});
it("sets aria-valuenow to the computed percentage on the progressbar", async () => {
await renderLoaded(withMonthly(1000, 250));
const bars = screen.getAllByRole("progressbar");
// the monthly bar is the only one rendered (others unlimited)
expect(bars).toHaveLength(1);
expect(bars[0].getAttribute("aria-valuenow")).toBe("25");
});
it("shows a 0% bar when spend is 0 against a set limit", async () => {
await renderLoaded(withMonthly(1000, 0));
expect((screen.getByTestId("budget-monthly-fill") as HTMLElement).style.width).toBe("0%");
it("hides loading indicator after fetch resolves", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(budgetResponse() as any);
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
});
});
// ── Section header ────────────────────────────────────────────────────────────
describe("BudgetSection — header and subheading", () => {
it("renders 'Budget' as the section heading", async () => {
await renderLoaded();
expect(screen.getByText("Budget")).toBeTruthy();
});
it("renders the subheading 'Limit total message credits for this workspace'", async () => {
await renderLoaded();
expect(
screen.getByText("Limit total message credits for this workspace")
).toBeTruthy();
});
it("renders 'Budget limit (credits)' label for the input", async () => {
await renderLoaded();
expect(screen.getByText("Budget limit (credits)")).toBeTruthy();
});
});
// ── Stats row ─────────────────────────────────────────────────────────────────
describe("BudgetSection — stats row", () => {
it("shows budget_used in the stats row", async () => {
await renderLoaded(budgetResponse({ budget_used: 350, budget_limit: 1000 }));
expect(screen.getByTestId("budget-used-value").textContent).toBe("350");
});
it("shows budget_limit in the stats row", async () => {
await renderLoaded(budgetResponse({ budget_used: 100, budget_limit: 500 }));
expect(screen.getByTestId("budget-limit-value").textContent).toBe("500");
});
it("shows 'Unlimited' when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
expect(screen.getByTestId("budget-limit-value").textContent).toBe("Unlimited");
});
it("shows budget_remaining when present", async () => {
await renderLoaded(budgetResponse({ budget_remaining: 750 }));
expect(screen.getByTestId("budget-remaining").textContent).toContain("750");
expect(screen.getByTestId("budget-remaining").textContent).toContain("credits remaining");
});
it("hides budget_remaining row when null", async () => {
await renderLoaded(budgetResponse({ budget_remaining: null }));
expect(screen.queryByTestId("budget-remaining")).toBeNull();
});
it("does not crash when budget_used is missing from the response", async () => {
// Backend for a provisioning-stuck workspace may return a partial
// shape. Regression: previously this threw
// "Cannot read properties of undefined (reading 'toLocaleString')"
// and crashed the whole Details tab.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await renderLoaded({ budget_limit: 1000, budget_remaining: null } as any);
expect(screen.getByTestId("budget-used-value").textContent).toBe("0");
});
});
// ── Progress bar ──────────────────────────────────────────────────────────────
describe("BudgetSection — progress bar", () => {
it("renders the progress bar when budget_limit is set", async () => {
await renderLoaded(budgetResponse({ budget_used: 250, budget_limit: 1000 }));
expect(screen.getByRole("progressbar")).toBeTruthy();
});
it("does NOT render progress bar when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
expect(screen.queryByRole("progressbar")).toBeNull();
});
it("fills to the correct percentage (25%)", async () => {
await renderLoaded(budgetResponse({ budget_used: 250, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("25%");
});
it("fills to the correct percentage (50%)", async () => {
await renderLoaded(budgetResponse({ budget_used: 500, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("50%");
});
it("caps fill at 100% when budget_used exceeds budget_limit", async () => {
await renderLoaded(budgetResponse({ budget_used: 1500, budget_limit: 1000 }));
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("100%");
});
it("progress bar has aria-valuenow equal to the calculated percentage", async () => {
await renderLoaded(budgetResponse({ budget_used: 300, budget_limit: 1000 }));
const bar = screen.getByRole("progressbar");
expect(bar.getAttribute("aria-valuenow")).toBe("30");
});
it("shows 0% progress bar when budget_used is absent from the response", async () => {
// Regression: budget_used is optional (provisioning-stuck workspaces return
// partial shapes). Without the `?? 0` guard the progressPct calculation
// throws a TypeScript strict-null error and the build fails.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await renderLoaded({ budget_limit: 1000, budget_remaining: null } as any);
const bar = screen.getByRole("progressbar");
expect(bar.getAttribute("aria-valuenow")).toBe("0");
const fill = screen.getByTestId("budget-progress-fill") as HTMLDivElement;
expect(fill.style.width).toBe("0%");
});
});
// ── Input pre-fill ────────────────────────────────────────────────────────────
describe("BudgetSection — input pre-fill", () => {
it("pre-fills input with existing budget_limit", async () => {
await renderLoaded(budgetResponse({ budget_limit: 500 }));
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
expect(input.value).toBe("500");
});
it("leaves input empty when budget_limit is null", async () => {
await renderLoaded(budgetResponse({ budget_limit: null, budget_remaining: null }));
const input = screen.getByTestId("budget-limit-input") as HTMLInputElement;
expect(input.value).toBe("");
});
});
// ── Save — PATCH calls ────────────────────────────────────────────────────────
describe("BudgetSection — save", () => {
it("calls PATCH /workspaces/:id/budget with budget_limit as integer", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: 800 }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "800" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
expect(mockPatch.mock.calls[0][0]).toBe("/workspaces/ws-1/budget");
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBe(800);
});
it("sends budget_limit: 0 (not null) when input is '0' — zero-credit budget", async () => {
// Regression for QA bug report: `parseInt("0") || null` would yield null.
// The correct form `raw !== "" ? parseInt(raw, 10) : null` must return 0.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: 0, budget_used: 0, budget_remaining: 0 }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "0" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBe(0);
expect(body.budget_limit).not.toBeNull();
});
it("sends budget_limit: null when input is blank", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(budgetResponse({ budget_limit: null, budget_remaining: null }) as any);
await renderLoaded(budgetResponse({ budget_limit: 1000 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() => expect(mockPatch).toHaveBeenCalled());
const body = mockPatch.mock.calls[0][1] as Record<string, unknown>;
expect(body.budget_limit).toBeNull();
});
it("updates displayed stats after successful save", async () => {
const updated = budgetResponse({ budget_limit: 2000, budget_used: 500, budget_remaining: 1500 });
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(updated as any);
await renderLoaded(budgetResponse({ budget_limit: 1000, budget_used: 250 }));
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "2000" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-limit-value").textContent).toBe("2,000")
);
});
it("shows save error message on non-402 PATCH failure", async () => {
mockPatch.mockRejectedValueOnce(
new Error("API PATCH /workspaces/ws-1/budget: 500 server error")
);
await renderLoaded();
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-save-error")).toBeTruthy()
);
expect(screen.getByTestId("budget-save-error").textContent).toContain("500");
});
});
// ── 402 handling ──────────────────────────────────────────────────────────────
describe("BudgetSection — 402 handling", () => {
it("shows exceeded banner when GET returns 402", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
expect(screen.getByText("Budget exceeded — messages blocked")).toBeTruthy();
});
it("does NOT show fetch error text when GET returns 402 (only banner)", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.queryByTestId("budget-loading")).toBeNull()
);
expect(screen.queryByTestId("budget-fetch-error")).toBeNull();
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy();
});
it("shows exceeded banner when PATCH returns 402", async () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValueOnce(budgetResponse() as any);
mockPatch.mockRejectedValueOnce(make402PatchError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
fireEvent.click(screen.getByTestId("budget-save-btn"));
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
// Should NOT also show the save-error alert
expect(screen.queryByTestId("budget-save-error")).toBeNull();
});
it("clears exceeded banner after a successful save", async () => {
mockGet.mockRejectedValueOnce(make402Error());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-exceeded-banner")).toBeTruthy()
);
// Now a successful PATCH (limit was raised)
const updated = budgetResponse({ budget_limit: 5000, budget_used: 250, budget_remaining: 4750 });
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPatch.mockResolvedValueOnce(updated as any);
await act(async () => {
fireEvent.change(screen.getByTestId("budget-limit-input"), {
target: { value: "5000" },
});
fireEvent.click(screen.getByTestId("budget-save-btn"));
});
await waitFor(() =>
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull()
);
});
});
// ── Non-402 fetch error ───────────────────────────────────────────────────────
describe("BudgetSection — non-402 fetch errors", () => {
it("shows fetch error text on non-402 GET failure", async () => {
mockGet.mockRejectedValueOnce(makeGenericError("internal server error"));
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() =>
expect(screen.getByTestId("budget-fetch-error")).toBeTruthy()
);
expect(screen.getByTestId("budget-fetch-error").textContent).toContain("500");
});
it("does NOT show stats row on fetch error", async () => {
mockGet.mockRejectedValueOnce(makeGenericError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
expect(screen.queryByTestId("budget-stats-row")).toBeNull();
});
it("does NOT show exceeded banner on non-402 fetch error", async () => {
mockGet.mockRejectedValueOnce(makeGenericError());
render(<BudgetSection workspaceId="ws-1" />);
await waitFor(() => expect(screen.queryByTestId("budget-loading")).toBeNull());
expect(screen.queryByTestId("budget-exceeded-banner")).toBeNull();
});
});
@@ -201,13 +201,15 @@ describe("CreateWorkspaceDialog — WCAG SC 1.3.1 label/input association", () =
expect(label?.textContent).toContain("Budget limit");
});
it("Workspace Template select has a <label> whose htmlFor matches the select id", async () => {
it("Template input has a <label> whose htmlFor matches the input id", async () => {
await openDialog();
const templateSelect = screen.getByLabelText("Workspace Template") as HTMLSelectElement;
expect(templateSelect.id).toBeTruthy();
const label = document.querySelector(`label[for="${templateSelect.id}"]`);
const templateInput = screen.getByPlaceholderText(
"e.g. seo-agent (from workspace-configs-templates/)"
) as HTMLInputElement;
expect(templateInput.id).toBeTruthy();
const label = document.querySelector(`label[for="${templateInput.id}"]`);
expect(label).toBeTruthy();
expect(label?.textContent).toContain("Workspace Template");
expect(label?.textContent).toContain("Template");
});
it("each InputField generates a distinct id (no id collisions)", async () => {
@@ -216,16 +218,13 @@ describe("CreateWorkspaceDialog — WCAG SC 1.3.1 label/input association", () =
screen.getByPlaceholderText("e.g. SEO Agent"),
screen.getByPlaceholderText("e.g. SEO Specialist"),
screen.getByPlaceholderText("e.g. 100"),
screen.getByPlaceholderText("e.g. seo-agent (from workspace-configs-templates/)"),
] as HTMLInputElement[];
const selects = [
screen.getByLabelText("Runtime"),
screen.getByLabelText("Workspace Template"),
] as HTMLSelectElement[];
const ids = [...inputs, ...selects].map((i) => i.id).filter(Boolean);
const ids = inputs.map((i) => i.id).filter(Boolean);
const unique = new Set(ids);
expect(unique.size).toBe(ids.length); // no duplicates
expect(ids.length).toBe(5);
expect(ids.length).toBe(4);
});
it("Name label text contains the required asterisk indicator", async () => {
@@ -1,7 +1,7 @@
// @vitest-environment jsdom
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, fireEvent, waitFor, cleanup } from "@testing-library/react";
import { CreateWorkspaceButton } from "../CreateWorkspaceDialog";
import { CreateWorkspaceButton, HERMES_PROVIDERS } from "../CreateWorkspaceDialog";
vi.mock("@/lib/api", () => ({
api: {
@@ -20,63 +20,10 @@ const SAMPLE_WORKSPACES = [
{ id: "ws-2", name: "Research Agent", tier: 2 },
];
const SAMPLE_TEMPLATES = [
{
id: "claude-code-default",
name: "Claude Code Agent",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
providers: ["platform", "minimax", "kimi-coding", "anthropic", "anthropic-oauth"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: [] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
{ id: "kimi-k2-turbo-preview", name: "Kimi K2 Turbo Preview", required_env: ["KIMI_API_KEY"] },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", required_env: ["ANTHROPIC_API_KEY"] },
{ id: "sonnet", name: "Claude Sonnet", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
{ id: "opus", name: "Claude Opus", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
{ id: "haiku", name: "Claude Haiku", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
],
},
{
id: "seo-agent",
name: "SEO Agent",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
providers: ["platform", "minimax", "kimi-coding", "anthropic", "anthropic-oauth"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: [] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
{ id: "kimi-k2-turbo-preview", name: "Kimi K2 Turbo Preview", required_env: ["KIMI_API_KEY"] },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", required_env: ["ANTHROPIC_API_KEY"] },
{ id: "sonnet", name: "Claude Sonnet", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
{ id: "opus", name: "Claude Opus", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
{ id: "haiku", name: "Claude Haiku", required_env: ["CLAUDE_CODE_OAUTH_TOKEN"] },
],
},
{
id: "hermes",
name: "Hermes",
runtime: "hermes",
model: "openai/gpt-4o",
providers: ["openai", "anthropic", "platform"],
models: [
{ id: "openai/gpt-4o", name: "GPT-4o", required_env: ["OPENAI_API_KEY"] },
{ id: "anthropic/claude-sonnet-4-5", name: "Claude Sonnet 4.5", required_env: ["ANTHROPIC_API_KEY"] },
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: [] },
],
},
];
beforeEach(() => {
vi.clearAllMocks();
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_TEMPLATES as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockGet.mockResolvedValue(SAMPLE_WORKSPACES as any);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
mockPost.mockResolvedValue({} as any);
});
@@ -95,14 +42,7 @@ async function openDialog() {
async function setTemplate(value: string) {
fireEvent.change(
screen.getByLabelText("Workspace Template"),
{ target: { value } }
);
}
async function setRuntime(value: string) {
fireEvent.change(
screen.getByLabelText("Runtime"),
screen.getByPlaceholderText("e.g. seo-agent (from workspace-configs-templates/)"),
{ target: { value } }
);
}
@@ -199,34 +139,11 @@ describe("CreateWorkspaceDialog", () => {
volume: { root_gb: 30 },
display: { mode: "none" },
});
expect(body.model).toBe("moonshot/kimi-k2.6");
expect(body.llm_provider).toBe("platform");
expect(body.runtime).toBe("claude-code");
expect(body.model).toBe("MiniMax-M2.7");
expect(body.llm_provider).toBe("minimax");
expect(body.secrets).toBeUndefined();
});
it("keeps runtime and workspace template as separate selectors", async () => {
await openDialog();
const runtimeSelect = screen.getByLabelText("Runtime") as HTMLSelectElement;
const runtimeTexts = Array.from(runtimeSelect.options).map((o) => o.text.trim());
expect(runtimeTexts).toEqual([
"Claude Code",
"OpenAI Codex CLI",
"Google ADK",
"Hermes",
"OpenClaw",
]);
expect(runtimeTexts).not.toContain("SEO Agent");
await waitFor(() => {
const templateSelect = screen.getByLabelText("Workspace Template") as HTMLSelectElement;
const templateTexts = Array.from(templateSelect.options).map((o) => o.text.trim());
expect(templateTexts).toContain("SEO Agent");
expect(templateTexts).not.toContain("Hermes");
});
});
it("does not send managed compute for external agents", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
@@ -255,8 +172,8 @@ describe("CreateWorkspaceDialog", () => {
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.model).toBe("moonshot/kimi-k2.6");
expect(body.llm_provider).toBe("platform");
expect(body.model).toBe("MiniMax-M2.7");
expect(body.llm_provider).toBe("minimax");
expect(body.compute).toEqual({
instance_type: "t3.xlarge",
volume: { root_gb: 80 },
@@ -274,8 +191,8 @@ describe("CreateWorkspaceDialog", () => {
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "BYOK Agent" },
});
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "minimax|MINIMAX_API_KEY" },
fireEvent.change(document.getElementById("llm-auth-mode") as HTMLSelectElement, {
target: { value: "api_key" },
});
fireEvent.change(document.getElementById("llm-secret-input") as HTMLInputElement, {
target: { value: "sk-minimax-test" },
@@ -296,11 +213,8 @@ describe("CreateWorkspaceDialog", () => {
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "OAuth Agent" },
});
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "anthropic-oauth|CLAUDE_CODE_OAUTH_TOKEN" },
});
fireEvent.change(document.querySelector("[data-testid='model-select']") as HTMLSelectElement, {
target: { value: "sonnet" },
fireEvent.change(document.getElementById("llm-auth-mode") as HTMLSelectElement, {
target: { value: "oauth" },
});
fireEvent.change(document.getElementById("llm-secret-input") as HTMLInputElement, {
target: { value: "oauth-token" },
@@ -316,18 +230,6 @@ describe("CreateWorkspaceDialog", () => {
expect(body.secrets).toEqual({ CLAUDE_CODE_OAUTH_TOKEN: "oauth-token" });
});
it("lists all Claude Code subscription aliases for blank workspaces", async () => {
await openDialog();
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "anthropic-oauth|CLAUDE_CODE_OAUTH_TOKEN" },
});
const modelSelect = document.querySelector("[data-testid='model-select']") as HTMLSelectElement;
const optionValues = Array.from(modelSelect.options).map((option) => option.value);
expect(optionValues).toEqual(expect.arrayContaining(["sonnet", "opus", "haiku"]));
});
it("renders gracefully when GET /workspaces fails", async () => {
mockGet.mockRejectedValueOnce(new Error("Network error"));
await openDialog();
@@ -342,103 +244,225 @@ describe("CreateWorkspaceDialog", () => {
});
// ---------------------------------------------------------------------------
// Dynamic runtime provider picker tests
// Hermes provider picker tests
// ---------------------------------------------------------------------------
describe("CreateWorkspaceDialog — dynamic runtime provider picker", () => {
it("does not render the old Hermes-only provider section", async () => {
describe("CreateWorkspaceDialog — Hermes provider picker", () => {
it("does NOT show hermes provider section for non-hermes templates", async () => {
await openDialog();
await setRuntime("hermes");
await setTemplate("seo-agent");
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeNull();
});
it("derives Hermes provider and model options from the /templates runtime row", async () => {
it("shows hermes provider section when template is 'hermes'", async () => {
await openDialog();
await setRuntime("hermes");
const providerSelect = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
await waitFor(() => expect(providerSelect.options.length).toBe(4));
const providerValues = Array.from(providerSelect.options).map((option) => option.value);
expect(providerValues).toEqual(expect.arrayContaining([
"platform|",
"openai|OPENAI_API_KEY",
"anthropic|ANTHROPIC_API_KEY",
]));
expect(providerValues).not.toContain("gemini|GEMINI_API_KEY");
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
});
it("uses the template-declared default provider/model for Hermes", async () => {
it("shows hermes provider section for template 'HERMES' (case-insensitive)", async () => {
await openDialog();
await setRuntime("hermes");
await waitFor(() => {
const providerSelect = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(providerSelect.value).toBe("platform|");
});
const modelSelect = document.querySelector("[data-testid='model-select']") as HTMLSelectElement;
expect(modelSelect.value).toBe("moonshot/kimi-k2.6");
await setTemplate("HERMES");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
});
it("prompts for the provider credential required by the selected Hermes model", async () => {
it("hermes provider dropdown defaults to 'anthropic'", async () => {
await openDialog();
await setRuntime("hermes");
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
expect(providerSelect).toBeTruthy();
expect(providerSelect.value).toBe("anthropic");
});
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "openai|OPENAI_API_KEY" },
it("hermes provider dropdown lists all 15 providers", async () => {
await openDialog();
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
expect(providerSelect.options.length).toBe(HERMES_PROVIDERS.length);
const ids = Array.from(providerSelect.options).map((o) => o.value);
expect(ids).toContain("anthropic");
expect(ids).toContain("openai");
expect(ids).toContain("gemini");
expect(ids).toContain("deepseek");
expect(ids).toContain("hermes");
});
// Pins the dynamic-providers behavior: when the matched template's
// /templates row declares `providers`, the dropdown filters to that
// subset instead of showing the full HERMES_PROVIDERS catalog. Same
// data source ConfigTab uses (PR #2454) — keeps the modal and the
// settings tab honest about which providers a template supports.
it("hermes provider dropdown filters to template-declared providers when /templates ships them", async () => {
// Per-URL mock: /workspaces returns the existing fixture, /templates
// returns a hermes row that only allows anthropic + minimax + openai.
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
return [
{ id: "hermes", name: "Hermes", runtime: "hermes", providers: ["anthropic", "minimax", "openai"] },
// eslint-disable-next-line @typescript-eslint/no-explicit-any
] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
const keyInput = document.getElementById("llm-secret-input") as HTMLInputElement;
await openDialog();
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
// Filtered list arrives async after /templates fetch resolves —
// keep waiting until the dropdown shrinks below the full catalog.
await waitFor(() => expect(providerSelect.options.length).toBe(3));
const ids = Array.from(providerSelect.options).map((o) => o.value);
expect(ids).toEqual(expect.arrayContaining(["anthropic", "minimax", "openai"]));
expect(ids).not.toContain("gemini");
expect(ids).not.toContain("deepseek");
});
// Back-compat: a template that hasn't migrated to runtime_config.providers
// (older templates, self-hosted setups without /templates server) keeps
// showing the full provider catalog. Operators picking from those
// templates can't be locked out of providers we know hermes supports.
it("hermes provider dropdown falls back to all providers when template declares no providers list", async () => {
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// No `providers` field — empty/missing → fall back to full catalog.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return [{ id: "hermes", name: "Hermes", runtime: "hermes" }] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
await openDialog();
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
expect(providerSelect.options.length).toBe(HERMES_PROVIDERS.length);
});
// Defensive: a template's declared list with NO matches against our
// static catalog (e.g. a brand-new provider id we don't have label/
// envVar metadata for yet) must not render an empty <select> — the
// operator can't pick a provider, the form locks. Component falls
// back to the full catalog so the user can still proceed.
it("hermes provider dropdown falls back to all providers when template declares only unknown providers", async () => {
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
return [
{ id: "hermes", name: "Hermes", runtime: "hermes", providers: ["totally-new-provider-2030"] },
// eslint-disable-next-line @typescript-eslint/no-explicit-any
] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
await openDialog();
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
// Stays at full catalog length — no flapping to 0 then back.
expect(providerSelect.options.length).toBe(HERMES_PROVIDERS.length);
});
it("hermes API key field is a password input (masked)", async () => {
await openDialog();
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
const keyInput = document.getElementById("hermes-api-key-input") as HTMLInputElement;
expect(keyInput).toBeTruthy();
expect(keyInput.type).toBe("password");
});
it("shows an error if the selected runtime provider requires a credential", async () => {
it("shows an error if hermes template is set but API key is empty on submit", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Hermes Agent" },
});
await setRuntime("hermes");
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "openai|OPENAI_API_KEY" },
});
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
// Submit without API key
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => {
const alert = screen.getByRole("alert");
expect(alert.textContent).toContain("Provider credential");
expect(alert.textContent).toContain("API key");
});
expect(mockPost).not.toHaveBeenCalled();
});
it("includes runtime-derived provider/model/secrets in POST body", async () => {
it("includes secrets in POST body with correct env var for selected provider", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Hermes OpenAI" },
});
await setRuntime("hermes");
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "openai|OPENAI_API_KEY" },
});
fireEvent.change(document.getElementById("llm-secret-input") as HTMLInputElement, {
target: { value: "sk-openai-test" },
target: { value: "Hermes Agent" },
});
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
// Fill in the API key
const keyInput = document.getElementById("hermes-api-key-input") as HTMLInputElement;
fireEvent.change(keyInput, { target: { value: "sk-test-anthropic-key" } });
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.secrets).toEqual({ ANTHROPIC_API_KEY: "sk-test-anthropic-key" });
expect(body.template).toBe("hermes");
});
it("uses the correct env var when a non-default provider is selected", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Hermes OpenAI" },
});
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
// Switch to openai
const providerSelect = document.getElementById("hermes-provider-select") as HTMLSelectElement;
fireEvent.change(providerSelect, { target: { value: "openai" } });
const keyInput = document.getElementById("hermes-api-key-input") as HTMLInputElement;
fireEvent.change(keyInput, { target: { value: "sk-openai-test" } });
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.runtime).toBe("hermes");
expect(body.template).toBeUndefined();
expect(body.model).toBe("openai/gpt-4o");
expect(body.llm_provider).toBe("openai");
expect(body.secrets).toEqual({ OPENAI_API_KEY: "sk-openai-test" });
});
it("does NOT include secrets field when provider is platform-managed", async () => {
it("does NOT include secrets field when template is not hermes", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Normal Agent" },
@@ -452,127 +476,19 @@ describe("CreateWorkspaceDialog — dynamic runtime provider picker", () => {
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.secrets).toBeUndefined();
});
});
// ---------------------------------------------------------------------------
// Registry-backed provider catalog (RFC#340 Fix C)
//
// Regression guard for the mis-bucketing bug: when a registry-backed
// claude-code template serves `moonshot/kimi-k2.6` whose DERIVED provider is
// `platform`, the dialog must build the dropdown from registry_providers/
// registry_models (buildProviderCatalogFromRegistry) — NOT the legacy
// inferVendor heuristic which slash-splits the id into "moonshot". The
// distinguishing trait of this fixture: the plain `models[]` array does NOT
// carry an explicit `provider` field, so the LEGACY path would bucket the
// model under "moonshot" and send llm_provider:"moonshot". Only the
// registry-backed path yields the Platform bucket + llm_provider:"platform".
// ---------------------------------------------------------------------------
// claude-code template whose plain models[] is UN-annotated (no explicit
// provider). The derived-provider annotation lives ONLY in registry_models.
const REGISTRY_TEMPLATE = {
id: "claude-code-default",
name: "Claude Code Agent",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
// Legacy fields — note: NO explicit provider on the platform model, so the
// legacy inferVendor path would slash-split it into "moonshot".
providers: ["platform", "minimax", "anthropic"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", required_env: [] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", required_env: ["ANTHROPIC_API_KEY"] },
],
// Registry-served SSOT (internal#718 P3). DeriveProvider resolved
// moonshot/kimi-k2.6 → "platform"; MiniMax-M2.7 → "minimax".
registry_backed: true,
registry_providers: [
{ name: "platform", display_name: "Platform", auth_env: [], billing_mode: "platform_managed" },
{ name: "minimax", display_name: "MiniMax", auth_env: ["MINIMAX_API_KEY"], billing_mode: "byok" },
{ name: "anthropic", display_name: "Anthropic API", auth_env: ["ANTHROPIC_API_KEY"], billing_mode: "byok" },
],
registry_models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", billing_mode: "platform_managed" },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", provider: "minimax", billing_mode: "byok" },
{ id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", provider: "anthropic", billing_mode: "byok" },
],
};
describe("CreateWorkspaceDialog — registry-backed provider catalog (RFC#340 Fix C)", () => {
beforeEach(() => {
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return [REGISTRY_TEMPLATE] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
});
it("shows the Platform provider bucket for the registry-backed claude-code runtime", async () => {
it("hides hermes section and resets state when template is cleared", async () => {
await openDialog();
const providerSelect = await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(sel).toBeTruthy();
return sel;
});
const labels = Array.from(providerSelect.options).map((o) => o.text.trim());
// Registry display_name "Platform" appears — NOT "moonshot" from the
// legacy slash-split heuristic.
expect(labels).toContain("Platform");
expect(labels).not.toContain("moonshot");
// Bucket id is the registry-keyed id, vendor is the bare provider name.
const values = Array.from(providerSelect.options).map((o) => o.value);
expect(values).toContain("registry|platform");
});
await setTemplate("hermes");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeTruthy()
);
it("sends llm_provider: platform (not moonshot) for moonshot/kimi-k2.6", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Kimi Agent" },
});
// Wait for the registry default to settle on the Platform bucket + model.
await waitFor(() => {
const modelSelect = document.querySelector("[data-testid='model-select']") as HTMLSelectElement;
expect(modelSelect?.value).toBe("moonshot/kimi-k2.6");
});
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.model).toBe("moonshot/kimi-k2.6");
expect(body.llm_provider).toBe("platform");
// Platform is auth-env-free → no BYOK secret.
expect(body.secrets).toBeUndefined();
});
it("buckets MiniMax-M2.7 under its derived provider and sends llm_provider: minimax", async () => {
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "MiniMax Agent" },
});
await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(Array.from(sel.options).map((o) => o.value)).toContain("registry|minimax");
});
fireEvent.change(document.querySelector("[data-testid='provider-select']") as HTMLSelectElement, {
target: { value: "registry|minimax" },
});
fireEvent.change(document.getElementById("llm-secret-input") as HTMLInputElement, {
target: { value: "sk-minimax-test" },
});
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.model).toBe("MiniMax-M2.7");
expect(body.llm_provider).toBe("minimax");
expect(body.secrets).toEqual({ MINIMAX_API_KEY: "sk-minimax-test" });
// Clear template
await setTemplate("");
await waitFor(() =>
expect(document.querySelector("[data-testid='hermes-provider-section']")).toBeNull()
);
});
});
@@ -96,12 +96,12 @@ vi.mock("@/lib/design-tokens", () => ({
// ─── Fixtures ─────────────────────────────────────────────────────────────────
const TEMPLATE = {
id: "seo-agent",
name: "SEO Agent",
description: "SEO workspace template",
id: "tpl-1",
name: "Claude Code Agent",
description: "A general-purpose coding assistant",
tier: 2,
skill_count: 3,
model: "MiniMax-M2.7",
model: "claude-opus-4-5",
};
function template(overrides: Partial<typeof TEMPLATE> = {}): typeof TEMPLATE {
@@ -159,7 +159,7 @@ describe("EmptyState — loading", () => {
it("does not render template buttons while loading", async () => {
renderEmpty();
await flush();
expect(screen.queryByText("SEO Agent")).toBeNull();
expect(screen.queryByText("Claude Code Agent")).toBeNull();
});
});
@@ -183,8 +183,8 @@ describe("EmptyState — templates", () => {
it("renders template buttons with name and description", async () => {
renderEmpty();
await flush();
expect(screen.getByText("SEO Agent")).toBeTruthy();
expect(screen.getByText("SEO workspace template")).toBeTruthy();
expect(screen.getByText("Claude Code Agent")).toBeTruthy();
expect(screen.getByText("A general-purpose coding assistant")).toBeTruthy();
});
it("renders tier badge and skill count", async () => {
@@ -198,42 +198,25 @@ describe("EmptyState — templates", () => {
it("renders model name when present", async () => {
renderEmpty();
await flush();
expect(screen.getByText(/MiniMax-M2.7/i)).toBeTruthy();
expect(screen.getByText(/claude-opus/i)).toBeTruthy();
});
it("calls deploy with the template on click", async () => {
renderEmpty();
await flush();
fireEvent.click(screen.getByText("SEO Agent"));
fireEvent.click(screen.getByText("Claude Code Agent"));
expect(_deploy.deployFn).toHaveBeenCalledWith(template());
});
it("hides runtime-default templates from the product template grid", async () => {
mockApiGet.mockResolvedValue([
template({ id: "claude-code-default", name: "Claude Code Agent" }),
template({ id: "codex", name: "OpenAI Codex CLI" }),
template({ id: "hermes", name: "Hermes Agent" }),
template({ id: "openclaw", name: "OpenClaw Agent" }),
template(),
]);
renderEmpty();
await flush();
expect(screen.getByText("SEO Agent")).toBeTruthy();
expect(screen.queryByText("Claude Code Agent")).toBeNull();
expect(screen.queryByText("OpenAI Codex CLI")).toBeNull();
expect(screen.queryByText("Hermes Agent")).toBeNull();
expect(screen.queryByText("OpenClaw Agent")).toBeNull();
});
it("shows 'Deploying...' on the button of the template being deployed", async () => {
_deploy.deploying = "seo-agent";
_deploy.deploying = "tpl-1";
renderEmpty();
await flush();
expect(screen.getByText("Deploying...")).toBeTruthy();
});
it("disables the template button of the deploying template", async () => {
_deploy.deploying = "seo-agent";
_deploy.deploying = "tpl-1";
renderEmpty();
await flush();
const btn = screen.getByText("Deploying...").closest("button") as HTMLButtonElement;
@@ -241,7 +224,7 @@ describe("EmptyState — templates", () => {
});
it("disables 'create blank' while a template is deploying", async () => {
_deploy.deploying = "seo-agent";
_deploy.deploying = "tpl-1";
renderEmpty();
await flush();
expect(screen.getByRole("button", { name: "+ Create blank workspace" }).disabled).toBe(true);
@@ -262,7 +245,7 @@ describe("EmptyState — fetch failure / empty templates", () => {
it("does not render template grid when GET /templates returns []", async () => {
renderEmpty();
await flush();
expect(screen.queryByText("SEO Agent")).toBeNull();
expect(screen.queryByText("Claude Code Agent")).toBeNull();
});
it("renders 'create blank' button when templates list is empty", async () => {
@@ -275,7 +258,7 @@ describe("EmptyState — fetch failure / empty templates", () => {
mockApiGet.mockReset().mockRejectedValue(new Error("Network failure"));
renderEmpty();
await flush();
expect(screen.queryByText("SEO Agent")).toBeNull();
expect(screen.queryByText("Claude Code Agent")).toBeNull();
});
});
@@ -333,7 +316,7 @@ describe("EmptyState — create blank", () => {
await flush();
fireEvent.click(screen.getByRole("button", { name: "+ Create blank workspace" }));
await act(async () => { await Promise.resolve(); });
expect((screen.getByText("SEO Agent").closest("button") as HTMLButtonElement).disabled).toBe(true);
expect((screen.getByText("Claude Code Agent").closest("button") as HTMLButtonElement).disabled).toBe(true);
});
it("shows error banner when POST /workspaces fails", async () => {
@@ -402,31 +402,6 @@ describe("MissingKeysModal — add keys and deploy", () => {
expect(onKeysAdded).toHaveBeenCalled();
});
it("shows optional keys without blocking deploy", () => {
const onKeysAdded = vi.fn();
render(
<MissingKeysModal
open={true}
missingKeys={[]}
optionalKeys={["GOOGLE_GSC_SITE"]}
runtime="claude-code"
title="Configure Workspace"
onKeysAdded={onKeysAdded}
onCancel={vi.fn()}
/>
);
expect(screen.getByText("Optional")).toBeTruthy();
expect(screen.getAllByText("GOOGLE_GSC_SITE").length).toBeGreaterThan(0);
const deployBtn = Array.from(document.querySelectorAll("button")).find(
(b) => b.textContent?.trim() === "Deploy",
);
expect(deployBtn).toBeTruthy();
expect(deployBtn!.disabled).toBe(false);
act(() => { fireEvent.click(deployBtn!); });
expect(onKeysAdded).toHaveBeenCalled();
});
it("shows global error when not all keys saved", async () => {
const onKeysAdded = vi.fn();
render(
@@ -554,4 +529,4 @@ describe("MissingKeysModal — cancel and settings", () => {
);
expect(screen.queryByRole("button", { name: /open settings/i })).toBeNull();
});
});
});
@@ -1,110 +0,0 @@
// @vitest-environment jsdom
//
// internal#718 P3 (retire-list #4) — when GET /templates serves a
// registry-backed selectable list (registry_providers + registry_models with
// display_name / billing_mode / derived provider), the canvas builds the
// provider catalog FROM that registry data instead of re-inferring vendor
// from model-id prefixes (VENDOR_LABELS / BARE_VENDOR_PATTERNS / inferVendor).
// The heuristic path stays only as the fallback for non-registry runtimes /
// older backends.
import { describe, it, expect } from "vitest";
import {
buildProviderCatalogFromRegistry,
type RegistryProvider,
type RegistryModel,
} from "../ProviderModelSelector";
// Mirrors the registry-served claude-code payload from GET /templates
// (registry_providers / registry_models). display_name + billing_mode come
// from the registry, NOT from the canvas VENDOR_LABELS map.
const CLAUDE_CODE_REGISTRY_PROVIDERS: RegistryProvider[] = [
{
name: "anthropic-oauth",
display_name: "Claude Code subscription",
auth_env: ["CLAUDE_CODE_OAUTH_TOKEN"],
billing_mode: "byok",
},
{
name: "anthropic-api",
display_name: "Anthropic API",
auth_env: ["ANTHROPIC_API_KEY"],
billing_mode: "byok",
},
{
name: "platform",
display_name: "Platform",
auth_env: ["ANTHROPIC_API_KEY", "MOLECULE_LLM_USAGE_TOKEN"],
billing_mode: "platform_managed",
},
];
const CLAUDE_CODE_REGISTRY_MODELS: RegistryModel[] = [
{ id: "sonnet", provider: "anthropic-oauth", billing_mode: "byok" },
{ id: "opus", provider: "anthropic-oauth", billing_mode: "byok" },
{ id: "claude-opus-4-7", provider: "anthropic-api", billing_mode: "byok" },
{ id: "anthropic/claude-opus-4-7", provider: "platform", billing_mode: "platform_managed" },
];
describe("buildProviderCatalogFromRegistry", () => {
it("buckets models by their DERIVED registry provider, not by inferred vendor", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
const byVendor = new Map(catalog.map((p) => [p.vendor, p]));
// anthropic-oauth bucket holds the two OAuth-derived models.
const oauth = byVendor.get("anthropic-oauth");
expect(oauth).toBeDefined();
expect(oauth!.models.map((m) => m.id).sort()).toEqual(["opus", "sonnet"]);
// platform bucket holds the platform-namespaced model.
const platform = byVendor.get("platform");
expect(platform).toBeDefined();
expect(platform!.models.map((m) => m.id)).toEqual(["anthropic/claude-opus-4-7"]);
});
it("labels providers from the registry display_name, not VENDOR_LABELS", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
const oauth = catalog.find((p) => p.vendor === "anthropic-oauth");
// Registry display_name "Claude Code subscription" (decorated with the
// model count by the catalog builder is acceptable; assert it carries the
// registry label, not an inferred one).
expect(oauth!.label).toContain("Claude Code subscription");
});
it("carries the registry billing_mode per provider", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
expect(catalog.find((p) => p.vendor === "anthropic-oauth")!.billingMode).toBe("byok");
expect(catalog.find((p) => p.vendor === "platform")!.billingMode).toBe("platform_managed");
});
it("surfaces the registry auth_env on the provider entry", () => {
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
CLAUDE_CODE_REGISTRY_MODELS,
);
expect(catalog.find((p) => p.vendor === "anthropic-oauth")!.envVars).toEqual([
"CLAUDE_CODE_OAUTH_TOKEN",
]);
});
it("only includes providers that actually have at least one served model", () => {
// anthropic-api is a registry provider but has no model in this slice →
// it should not appear as an empty bucket.
const models: RegistryModel[] = [
{ id: "sonnet", provider: "anthropic-oauth", billing_mode: "byok" },
];
const catalog = buildProviderCatalogFromRegistry(
CLAUDE_CODE_REGISTRY_PROVIDERS,
models,
);
expect(catalog.map((p) => p.vendor)).toEqual(["anthropic-oauth"]);
});
});
@@ -44,14 +44,6 @@ const HERMES_MODELS: SelectorModel[] = [
];
describe("inferVendor", () => {
it("uses explicit provider metadata before slug heuristics", () => {
expect(inferVendor({
id: "moonshot/kimi-k2.6",
provider: "platform",
required_env: [],
})).toBe("platform");
});
it("uses slash prefix when present", () => {
expect(inferVendor({ id: "nousresearch/hermes-4-70b", required_env: ["HERMES_API_KEY"] }))
.toBe("nousresearch");
@@ -113,22 +105,6 @@ describe("buildProviderCatalog", () => {
expect(oauth!.models.map((m) => m.id).sort()).toEqual(["haiku", "opus", "sonnet"]);
});
it("labels explicit platform-managed providers", () => {
const catalog = buildProviderCatalog([
{
id: "moonshot/kimi-k2.6",
name: "Kimi K2.6",
provider: "platform",
required_env: [],
},
]);
expect(catalog[0]).toMatchObject({
vendor: "platform",
label: "Platform",
envVars: [],
});
});
it("flags wildcard providers", () => {
const catalog = buildProviderCatalog(HERMES_MODELS);
const hf = catalog.find((p) => p.vendor === "huggingface");
@@ -189,23 +189,6 @@ describe("TemplatePalette — sidebar", () => {
expect(screen.getByText("Researcher")).toBeTruthy();
});
it("hides runtime-default templates from the deployable product template list", async () => {
mockGet.mockResolvedValue([
{ id: "claude-code-default", name: "Claude Code Agent", description: "", tier: 4, skills: [] },
{ id: "codex", name: "OpenAI Codex CLI", description: "", tier: 4, skills: [] },
{ id: "hermes", name: "Hermes Agent", description: "", tier: 4, skills: [] },
{ id: "openclaw", name: "OpenClaw Agent", description: "", tier: 4, skills: [] },
{ id: "seo-agent", name: "SEO Agent", description: "SEO workspace template", tier: 4, skills: ["seo"] },
]);
render(<TemplatePalette />);
await openSidebar();
expect(screen.getByText("SEO Agent")).toBeTruthy();
expect(screen.queryByText("Claude Code Agent")).toBeNull();
expect(screen.queryByText("OpenAI Codex CLI")).toBeNull();
expect(screen.queryByText("Hermes Agent")).toBeNull();
expect(screen.queryByText("OpenClaw Agent")).toBeNull();
});
it("shows template description", async () => {
mockGet.mockResolvedValue(MOCK_TEMPLATES);
render(<TemplatePalette />);
@@ -68,11 +68,7 @@ afterEach(() => {
function ShortcutTestComponent() {
useKeyboardShortcuts();
return (
<div data-testid="canvas-root">
<div data-testid="display-stream" data-display-stream="true" />
</div>
);
return <div data-testid="canvas-root" />;
}
function renderWithProvider() {
@@ -82,13 +78,6 @@ function renderWithProvider() {
// ─── Tests ───────────────────────────────────────────────────────────────────
describe("Esc — deselect / close context menu", () => {
it("does not handle keys targeted at the display stream", () => {
mockStoreState.contextMenu = { x: 100, y: 100, nodeId: "n1" };
const { getByTestId } = renderWithProvider();
fireEvent.keyDown(getByTestId("display-stream"), { key: "Escape" });
expect(mockStoreState.closeContextMenu).not.toHaveBeenCalled();
});
it("closes the context menu when one is open", () => {
mockStoreState.contextMenu = { x: 100, y: 100, nodeId: "n1" };
renderWithProvider();
@@ -28,14 +28,12 @@ function hasChildren(nodeId: string, nodes: Node<WorkspaceNodeData>[]): boolean
export function useKeyboardShortcuts() {
useEffect(() => {
const handler = (e: KeyboardEvent) => {
const target = e.target as HTMLElement;
if (target.closest?.('[data-display-stream="true"]')) return;
const tag = target.tagName;
const tag = (e.target as HTMLElement).tagName;
const inInput =
tag === "INPUT" ||
tag === "TEXTAREA" ||
tag === "SELECT" ||
target.isContentEditable;
(e.target as HTMLElement).isContentEditable;
if (e.key === "Escape") {
const state = useCanvasStore.getState();
@@ -131,7 +131,7 @@ export function OrgTokensTab() {
<button
onClick={handleCreate}
disabled={creating}
className="px-3 py-1.5 bg-accent-strong/20 hover:bg-accent-strong/30 border border-accent/30 rounded-lg text-[11px] text-accent font-medium transition-colors disabled:opacity-50 disabled:cursor-not-allowed flex items-center gap-1.5 focus:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
className="px-3 py-1.5 bg-accent-strong/20 hover:bg-accent-strong/30 border border-accent/30 rounded-lg text-[11px] text-accent font-medium transition-colors disabled:opacity-50 disabled:cursor-not-allowed flex items-center gap-1.5"
>
{creating ? (
<>
@@ -175,7 +175,7 @@ export function OrgTokensTab() {
)}
{error && (
<div role="alert" aria-live="assertive" className="px-3 py-2 bg-red-950/40 border border-red-800/50 rounded-lg text-[10px] text-bad">
<div className="px-3 py-2 bg-red-950/40 border border-red-800/50 rounded-lg text-[10px] text-bad">
{error}
</div>
)}
+2 -2
View File
@@ -152,7 +152,7 @@ export function SecretRow({ secret, workspaceId }: SecretRowProps) {
className="secret-row__action-btn"
title="Edit"
>
<span aria-hidden="true"></span>
</button>
<button
type="button"
@@ -161,7 +161,7 @@ export function SecretRow({ secret, workspaceId }: SecretRowProps) {
className="secret-row__action-btn secret-row__action-btn--delete"
title="Delete"
>
<span aria-hidden="true">🗑</span>
🗑
</button>
</div>
</div>
+2 -2
View File
@@ -121,7 +121,7 @@ function WorkspaceTokensTab({ workspaceId }: TokensTabProps) {
<button
onClick={handleCreate}
disabled={creating}
className="px-3 py-1.5 bg-accent-strong/20 hover:bg-accent-strong/30 border border-accent/30 rounded-lg text-[11px] text-accent font-medium transition-colors disabled:opacity-50 disabled:cursor-not-allowed flex items-center gap-1.5 focus:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
className="px-3 py-1.5 bg-accent-strong/20 hover:bg-accent-strong/30 border border-accent/30 rounded-lg text-[11px] text-accent font-medium transition-colors disabled:opacity-50 disabled:cursor-not-allowed flex items-center gap-1.5"
>
{creating ? <><Spinner size="sm" /> Creating...</> : '+ New Token'}
</button>
@@ -155,7 +155,7 @@ function WorkspaceTokensTab({ workspaceId }: TokensTabProps) {
)}
{error && (
<div role="alert" aria-live="assertive" className="px-3 py-2 bg-red-950/40 border border-red-800/50 rounded-lg text-[10px] text-bad">
<div className="px-3 py-2 bg-red-950/40 border border-red-800/50 rounded-lg text-[10px] text-bad">
{error}
</div>
)}
+137 -158
View File
@@ -7,28 +7,10 @@ import { api } from "@/lib/api";
// Types
// ---------------------------------------------------------------------------
// Period keys MUST match the server SSOT (workspace-server budget_periods.go).
type BudgetPeriod = "hourly" | "daily" | "weekly" | "monthly";
const PERIODS: { key: BudgetPeriod; label: string }[] = [
{ key: "hourly", label: "Hourly" },
{ key: "daily", label: "Daily" },
{ key: "weekly", label: "Weekly" },
{ key: "monthly", label: "Monthly" },
];
interface PeriodBudget {
limit: number | null; // USD cents; null = no limit
spend: number; // rolling-window spend, USD cents
remaining: number | null; // null when no limit
}
interface BudgetData {
periods?: Partial<Record<BudgetPeriod, PeriodBudget>>;
// legacy fields (pre-multi-period server) — tolerated for back-compat
budget_limit?: number | null;
monthly_spend?: number;
budget_remaining?: number | null;
budget_limit: number | null;
budget_used?: number; // optional — provisioning-stuck workspaces return partial shapes
budget_remaining: number | null;
}
interface Props {
@@ -44,71 +26,31 @@ function isApiError402(e: unknown): boolean {
return e instanceof Error && /: 402( |$)/.test(e.message);
}
/** USD cents → "$X.XX". */
function fmtUSD(cents: number): string {
return `$${(cents / 100).toLocaleString(undefined, { minimumFractionDigits: 2, maximumFractionDigits: 2 })}`;
}
/** Normalize the server payload (multi-period or legacy) into a period map. */
function periodsFrom(data: BudgetData | null): Record<BudgetPeriod, PeriodBudget> {
const base: Record<BudgetPeriod, PeriodBudget> = {
hourly: { limit: null, spend: 0, remaining: null },
daily: { limit: null, spend: 0, remaining: null },
weekly: { limit: null, spend: 0, remaining: null },
monthly: { limit: null, spend: 0, remaining: null },
};
if (!data) return base;
if (data.periods) {
for (const { key } of PERIODS) {
const p = data.periods[key];
if (p) base[key] = { limit: p.limit ?? null, spend: p.spend ?? 0, remaining: p.remaining ?? null };
}
return base;
}
// legacy: map the single monthly limit/spend
base.monthly = {
limit: data.budget_limit ?? null,
spend: data.monthly_spend ?? 0,
remaining: data.budget_remaining ?? null,
};
return base;
}
// ---------------------------------------------------------------------------
// Component
// ---------------------------------------------------------------------------
/**
* BudgetSection — per-workspace LLM budget, four independent rolling windows
* (hourly / daily / weekly / monthly). Each period has its own ceiling (USD);
* spend is the rolling-window LLM cost. Crossing ANY period blocks new work
* (server returns 402). Sends PATCH {budget_limits:{period:cents|null}}.
* BudgetSection — dedicated "Budget" section in the workspace details panel.
*
* - Fetches GET /workspaces/:id/budget on mount for live usage stats
* - Shows a progress bar (budget_used / budget_limit, blue-500, capped 100%)
* - Allows updating budget_limit via PATCH /workspaces/:id/budget
* - Shows a 402-specific "Budget exceeded" amber banner for any blocked state
*/
export function BudgetSection({ workspaceId }: Props) {
const [budget, setBudget] = useState<BudgetData | null>(null);
const [loading, setLoading] = useState(true);
const [fetchError, setFetchError] = useState<string | null>(null);
// One input per period, in USD cents (string for controlled inputs).
const [limitInputs, setLimitInputs] = useState<Record<BudgetPeriod, string>>({
hourly: "",
daily: "",
weekly: "",
monthly: "",
});
const [limitInput, setLimitInput] = useState("");
const [saving, setSaving] = useState(false);
const [saveError, setSaveError] = useState<string | null>(null);
/** True when a 402 has been seen from any API call in this section. */
const [budgetExceeded, setBudgetExceeded] = useState(false);
const syncInputs = useCallback((data: BudgetData | null) => {
const p = periodsFrom(data);
setLimitInputs({
hourly: p.hourly.limit != null ? String(p.hourly.limit) : "",
daily: p.daily.limit != null ? String(p.daily.limit) : "",
weekly: p.weekly.limit != null ? String(p.weekly.limit) : "",
monthly: p.monthly.limit != null ? String(p.monthly.limit) : "",
});
}, []);
// ── Fetch current budget data ─────────────────────────────────────────────
const loadBudget = useCallback(async () => {
setLoading(true);
@@ -116,7 +58,7 @@ export function BudgetSection({ workspaceId }: Props) {
try {
const data = await api.get<BudgetData>(`/workspaces/${workspaceId}/budget`);
setBudget(data);
syncInputs(data);
setLimitInput(data.budget_limit != null ? String(data.budget_limit) : "");
} catch (e) {
if (isApiError402(e)) {
setBudgetExceeded(true);
@@ -126,30 +68,29 @@ export function BudgetSection({ workspaceId }: Props) {
} finally {
setLoading(false);
}
}, [workspaceId, syncInputs]);
}, [workspaceId]);
useEffect(() => {
loadBudget();
}, [loadBudget]);
// ── Save handler ──────────────────────────────────────────────────────────
const handleSave = async () => {
setSaving(true);
setSaveError(null);
// Build the per-period map: blank → null (clear); a number → that ceiling.
const budget_limits: Record<BudgetPeriod, number | null> = {
hourly: null,
daily: null,
weekly: null,
monthly: null,
};
for (const { key } of PERIODS) {
const raw = limitInputs[key].trim();
budget_limits[key] = raw !== "" ? parseInt(raw, 10) : null;
}
const raw = limitInput.trim();
// Use explicit empty-string check (not falsy check) so that a
// user-entered "0" is sent as budget_limit: 0, not null (unlimited).
const parsedLimit = raw !== "" ? parseInt(raw, 10) : null;
try {
const updated = await api.patch<BudgetData>(`/workspaces/${workspaceId}/budget`, { budget_limits });
const updated = await api.patch<BudgetData>(`/workspaces/${workspaceId}/budget`, {
budget_limit: parsedLimit,
});
setBudget(updated);
syncInputs(updated);
setLimitInput(updated.budget_limit != null ? String(updated.budget_limit) : "");
// Clear exceeded state if the save succeeded (limit was raised or removed)
setBudgetExceeded(false);
} catch (e) {
if (isApiError402(e)) {
@@ -162,15 +103,24 @@ export function BudgetSection({ workspaceId }: Props) {
}
};
const periods = periodsFrom(budget);
// ── Progress calculation ──────────────────────────────────────────────────
const progressPct =
budget && budget.budget_limit != null && budget.budget_limit > 0
? Math.min(100, Math.round(((budget.budget_used ?? 0) / budget.budget_limit) * 100))
: 0;
// ── Render ────────────────────────────────────────────────────────────────
return (
<div className="space-y-3" data-testid="budget-section">
{/* Section header */}
<div>
<h3 className="text-xs font-semibold text-ink-mid uppercase tracking-wider">Budget</h3>
<h3 className="text-xs font-semibold text-ink-mid uppercase tracking-wider">
Budget
</h3>
<p className="text-[11px] text-ink-mid mt-0.5">
Cap LLM spend for this workspace per period crossing any limit pauses new work
Limit total message credits for this workspace
</p>
</div>
@@ -181,14 +131,32 @@ export function BudgetSection({ workspaceId }: Props) {
data-testid="budget-exceeded-banner"
className="flex items-center gap-2 px-3 py-2 rounded-lg bg-surface border border-amber-700/50 text-warm text-xs font-medium"
>
<svg width="13" height="13" viewBox="0 0 13 13" fill="none" aria-hidden="true" className="shrink-0">
<path d="M6.5 1.5L11.5 10.5H1.5L6.5 1.5Z" stroke="currentColor" strokeWidth="1.4" strokeLinejoin="round" />
<path d="M6.5 5.5V7.5M6.5 9.5h.01" stroke="currentColor" strokeWidth="1.4" strokeLinecap="round" />
<svg
width="13"
height="13"
viewBox="0 0 13 13"
fill="none"
aria-hidden="true"
className="shrink-0"
>
<path
d="M6.5 1.5L11.5 10.5H1.5L6.5 1.5Z"
stroke="currentColor"
strokeWidth="1.4"
strokeLinejoin="round"
/>
<path
d="M6.5 5.5V7.5M6.5 9.5h.01"
stroke="currentColor"
strokeWidth="1.4"
strokeLinecap="round"
/>
</svg>
Budget exceeded new work paused
Budget exceeded messages blocked
</div>
)}
{/* Usage stats */}
{loading ? (
<p className="text-xs text-ink-mid" data-testid="budget-loading">
Loading
@@ -197,78 +165,89 @@ export function BudgetSection({ workspaceId }: Props) {
<p className="text-xs text-bad" data-testid="budget-fetch-error">
{fetchError}
</p>
) : (
<div className="space-y-3">
{PERIODS.map(({ key, label }) => {
const p = periods[key];
const pct =
p.limit != null && p.limit > 0 ? Math.min(100, Math.round((p.spend / p.limit) * 100)) : 0;
const over = p.limit != null && p.spend >= p.limit;
return (
<div key={key} className="space-y-1" data-testid={`budget-period-${key}`}>
<div className="flex items-baseline justify-between">
<label htmlFor={`budget-${key}-${workspaceId}`} className="text-xs text-ink-mid">
{label}
</label>
<span className="text-[11px] font-mono text-ink-mid">
<span data-testid={`budget-${key}-spend`}>{fmtUSD(p.spend)}</span>
<span className="mx-1">/</span>
<span data-testid={`budget-${key}-limit`}>{p.limit != null ? fmtUSD(p.limit) : "∞"}</span>
</span>
</div>
{p.limit != null && (
<div
role="progressbar"
aria-label={`${label} budget usage`}
aria-valuenow={pct}
aria-valuemin={0}
aria-valuemax={100}
className="h-1.5 w-full rounded-full bg-surface-card overflow-hidden"
>
<div
data-testid={`budget-${key}-fill`}
className={`h-full rounded-full transition-all duration-300 ${over ? "bg-bad" : "bg-accent"}`}
style={{ width: `${pct}%` }}
/>
</div>
)}
<input
id={`budget-${key}-${workspaceId}`}
type="number"
min="0"
step="1"
value={limitInputs[key]}
onChange={(e) => setLimitInputs((s) => ({ ...s, [key]: e.target.value }))}
placeholder="USD cents — blank for unlimited"
data-testid={`budget-${key}-input`}
className="w-full bg-surface-card border border-line rounded-lg px-3 py-1.5 text-xs text-ink-mid placeholder-zinc-500 focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/30 transition-colors"
/>
</div>
);
})}
) : budget ? (
<div className="space-y-2">
{/* Stats row */}
<div className="flex items-baseline justify-between" data-testid="budget-stats-row">
<span className="text-xs text-ink-mid">Credits used</span>
<span className="text-xs font-mono text-ink-mid">
<span data-testid="budget-used-value">{(budget.budget_used ?? 0).toLocaleString()}</span>
<span className="text-ink-mid mx-1">/</span>
<span data-testid="budget-limit-value">
{budget.budget_limit != null
? budget.budget_limit.toLocaleString()
: "Unlimited"}
</span>
</span>
</div>
<p className="text-[11px] text-ink-mid">Limits are USD cents (e.g. 500 = $5.00). Blank = unlimited.</p>
{saveError && (
{/* Progress bar (only when limit is set) */}
{budget.budget_limit != null && (
<div
role="alert"
data-testid="budget-save-error"
className="px-3 py-1.5 rounded-lg bg-red-950/40 border border-red-800/50 text-xs text-bad"
role="progressbar"
aria-label="Budget usage"
aria-valuenow={progressPct}
aria-valuemin={0}
aria-valuemax={100}
className="h-1.5 w-full rounded-full bg-surface-card overflow-hidden"
>
{saveError}
<div
data-testid="budget-progress-fill"
className="h-full rounded-full bg-accent transition-all duration-300"
style={{ width: `${progressPct}%` }}
/>
</div>
)}
<button
onClick={handleSave}
disabled={saving}
data-testid="budget-save-btn"
className="px-4 py-1.5 bg-accent-strong hover:bg-accent active:bg-accent-strong rounded-lg text-xs font-medium text-white disabled:opacity-50 transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-900"
>
{saving ? "Saving…" : "Save"}
</button>
{/* Remaining credits */}
{budget.budget_remaining != null && (
<p className="text-[11px] text-ink-mid" data-testid="budget-remaining">
{budget.budget_remaining.toLocaleString()} credits remaining
</p>
)}
</div>
)}
) : null}
{/* Input + Save */}
<div className="space-y-1.5 pt-1">
<label
htmlFor={`budget-limit-input-${workspaceId}`}
className="text-[11px] text-ink-mid block"
>
Budget limit (credits)
</label>
<input
id={`budget-limit-input-${workspaceId}`}
type="number"
min="0"
step="1"
value={limitInput}
onChange={(e) => setLimitInput(e.target.value)}
placeholder="e.g. 1000 — blank for unlimited"
data-testid="budget-limit-input"
className="w-full bg-surface-card border border-line rounded-lg px-3 py-2 text-sm text-ink-mid placeholder-zinc-500 focus:outline-none focus:border-accent focus:ring-1 focus:ring-accent/30 transition-colors"
/>
<p className="text-xs text-ink-mid">Leave blank for unlimited</p>
{saveError && (
<div
role="alert"
data-testid="budget-save-error"
className="px-3 py-1.5 rounded-lg bg-red-950/40 border border-red-800/50 text-xs text-bad"
>
{saveError}
</div>
)}
<button
onClick={handleSave}
disabled={saving}
data-testid="budget-save-btn"
className="px-4 py-1.5 bg-accent-strong hover:bg-accent active:bg-accent-strong rounded-lg text-xs font-medium text-white disabled:opacity-50 transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-900"
>
{saving ? "Saving…" : "Save"}
</button>
</div>
</div>
);
}
+67 -227
View File
@@ -6,17 +6,12 @@ import { useCanvasStore } from "@/store/canvas";
import { type ConfigData, DEFAULT_CONFIG, TextInput, NumberInput, Toggle, TagList, Section } from "./config/form-inputs";
import { parseYaml, toYaml } from "./config/yaml-utils";
import { SecretsSection } from "./config/secrets-section";
import { LLMBillingSection } from "./config/llm-billing-section";
import { ExternalConnectionSection } from "./ExternalConnectionSection";
import {
ProviderModelSelector,
buildProviderCatalog,
buildProviderCatalogFromRegistry,
findProviderForModel,
type SelectorValue,
type ProviderEntry,
type RegistryProvider,
type RegistryModel,
} from "../ProviderModelSelector";
import { isExternalLikeRuntime } from "@/lib/externalRuntimes";
@@ -262,17 +257,6 @@ interface RuntimeOption {
// canvas falls back to deriving unique vendor prefixes from
// models[].id (still adapter-driven, just inferred).
providers: string[];
// registryBacked / registryProviders / registryModels come from the
// registry-served GET /templates fields (internal#718 P3). When
// registryBacked is true, the selectable provider+model list is built from
// the registry (registryProviders/registryModels) — display labels +
// billing mode + derived provider come from the provider-registry SSOT, not
// the canvas VENDOR_LABELS / billingModeForProvider vocabularies. When
// false (non-registry runtime / older backend), the canvas falls back to
// the template-served models[] + its inferVendor heuristic.
registryBacked: boolean;
registryProviders: RegistryProvider[];
registryModels: RegistryModel[];
}
// deriveProvidersFromModels — when a template doesn't ship an explicit
@@ -303,66 +287,6 @@ export function deriveProvidersFromModels(models: ModelSpec[]): string[] {
return out;
}
// billingModeForProvider — maps a selected PROVIDER (vendor key) to the
// LLM billing_mode it implies (internal#703 Gap 2).
//
// Today, picking a non-Platform provider in the Config tab writes the
// credential env (CLAUDE_CODE_OAUTH_TOKEN / vendor key) but leaves
// llm_billing_mode at its resolved default (`platform_managed`). The CP
// tenant_config endpoint then keeps injecting the platform proxy base
// URLs, so the OAuth token / vendor key is never actually used — BYOK
// silently no-ops (the live SEO-Agent symptom in #703). The workspace-
// server even hard-blocks vendor-key writes on platform_managed
// workspaces (secrets.go:87), pointing the user at this exact billing-
// mode switch. Wiring the provider change to also set billing_mode is
// the UI half that makes BYOK take (the CP/workspace-server backend half
// is being fixed in parallel — internal#703 Gap 1).
//
// Mapping:
// - "platform" (the Platform-managed proxy) OR "" (no explicit
// provider override → inherit, defaults to platform) → "platform_managed".
// - any other vendor key ("anthropic-oauth" = Claude Code subscription
// OAuth, "anthropic" = Anthropic API key, "minimax", "openrouter",
// etc.) → "byok".
//
// Returns the billing_mode string the PUT body should carry. The valid
// set is fixed by workspace-server's recognizer (platform_managed | byok
// | disabled); "disabled" is never auto-selected by a provider choice —
// it's an explicit operator action via the LLM Billing section.
export type LLMBillingMode = "platform_managed" | "byok";
export function billingModeForProvider(provider: string): LLMBillingMode {
const v = provider.trim().toLowerCase();
if (v === "" || v === "platform") return "platform_managed";
return "byok";
}
// billingModeForSelectedProvider — internal#718 P3 (retire-list #5): the
// billing mode the Config tab shows/sends for the selected PROVIDER, sourced
// from the registry-served catalog when available rather than the hardcoded
// billingModeForProvider rule.
//
// When the runtime is registry-backed, GET /templates serves each provider's
// DERIVED billing_mode (platform_managed for the closed platform provider,
// byok otherwise) on the ProviderEntry. We read it off the catalog so the UI
// reflects the registry SSOT — the same predicate billing/credential emission
// keys off the derived provider.
//
// Falls back to billingModeForProvider when: no catalog (non-registry runtime
// / older backend), or the provider string isn't carried by the catalog
// (e.g. a stale saved value). The fallback keeps the legacy behavior intact
// for everything the registry doesn't yet speak to.
export function billingModeForSelectedProvider(
provider: string,
catalog?: ProviderEntry[],
): LLMBillingMode {
if (catalog && catalog.length > 0) {
const entry = catalog.find((p) => p.vendor === provider.trim());
if (entry?.billingMode) return entry.billingMode;
}
return billingModeForProvider(provider);
}
// Fallback used when /templates can't be fetched (offline, older backend).
// Keep in sync with manifest.json workspace_templates as a defensive default.
// Model + env suggestions only flow when the backend is reachable.
@@ -377,20 +301,13 @@ export function billingModeForSelectedProvider(
// config.yaml` on the container is a separate runtime-internal file,
// not this one.
const RUNTIMES_WITH_OWN_CONFIG = new Set<string>(["external", "kimi", "kimi-cli", "openclaw"]);
// The runtime picker is SSOT-driven: options come from GET /templates,
// which workspace-server already gates to the manifest.json maintained set
// (loadRuntimesFromManifest). A hand-maintained frontend allowlist silently
// dropped runtimes the backend added (google-adk shipped in manifest but was
// filtered out, so its workspaces rendered the wrong default option). A
// template may still opt OUT of the picker via `displayable: false` on its
// /templates row. See project_canvas_runtime_dropdown_ssot_fix.
const SUPPORTED_RUNTIME_VALUES = new Set(["claude-code", "codex", "openclaw", "hermes"]);
const FALLBACK_RUNTIME_OPTIONS: RuntimeOption[] = [
{ value: "claude-code", label: "Claude Code", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "codex", label: "Codex", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "google-adk", label: "Google ADK", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "openclaw", label: "OpenClaw", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "hermes", label: "Hermes", models: [], providers: [], registryBacked: false, registryProviders: [], registryModels: [] },
{ value: "claude-code", label: "Claude Code", models: [], providers: [] },
{ value: "codex", label: "Codex", models: [], providers: [] },
{ value: "openclaw", label: "OpenClaw", models: [], providers: [] },
{ value: "hermes", label: "Hermes", models: [], providers: [] },
];
export function ConfigTab({ workspaceId }: Props) {
@@ -403,24 +320,15 @@ export function ConfigTab({ workspaceId }: Props) {
const [rawMode, setRawMode] = useState(false);
const [rawDraft, setRawDraft] = useState("");
const [runtimeOptions, setRuntimeOptions] = useState<RuntimeOption[]>(FALLBACK_RUNTIME_OPTIONS);
// internal#718 P4 closure: the explicit provider override
// (LLM_PROVIDER workspace_secret, surfaced via GET/PUT
// /workspaces/:id/provider) has been RETIRED. The provider is
// derived at every decision point from (runtime, model) via the
// registry — no stored row remains. The `provider` / `originalProvider`
// state and the provider dropdown survive in this component for
// backwards-compat (display only) but are no longer persisted:
// - loadConfig no longer GETs /workspaces/:id/provider (the
// endpoint returns 410 Gone). The state initializes to ""
// and stays there.
// - handleSave no longer PUTs /workspaces/:id/provider.
// - The dropdown still updates the local `provider` state so the
// user can preview the derived value; the value never leaves
// the browser.
// This is the canvas-side complement to the backend retirement of
// SetProvider/GetProvider/setProviderSecret. Older canvases that
// still call PUT /provider hit the 410 Gone with a structured
// PROVIDER_ENDPOINT_RETIRED code — loud failure, no silent miss.
// Provider override (Option B PR-5): stored separately from config.yaml
// because the value lives in workspace_secrets (encrypted), not in the
// platform-managed config.yaml. The two endpoints are GET/PUT
// /workspaces/:id/provider on workspace-server (handlers/secrets.go).
// Empty = "auto-derive from model slug prefix" — pre-Option-B behavior
// and what most users want. Setting to a non-empty value writes
// LLM_PROVIDER into workspace_secrets and triggers an auto-restart so
// the workspace boots with the new provider in env (and via CP user-
// data, written into /configs/config.yaml on next provision too).
const [provider, setProvider] = useState("");
const [originalProvider, setOriginalProvider] = useState("");
// Track the model the form first rendered, so handleSave can detect
@@ -471,23 +379,26 @@ export function ConfigTab({ workspaceId }: Props) {
//
// See GH #1894 for the workspace-row-as-source-of-truth rationale
// that motivated splitting from a single config.yaml read.
// internal#718 P4 closure: the GET /workspaces/:id/provider leg is
// RETIRED — the endpoint returns 410 Gone. Provider is now derived
// from (runtime, model) via the registry; no stored value exists
// to load. Always seed the local state to "" so the dropdown
// initializes to "auto-derive".
const [wsRes, modelRes] = await Promise.all([
const [wsRes, modelRes, providerRes] = await Promise.all([
api.get<{ runtime?: string; tier?: number }>(`/workspaces/${workspaceId}`)
.catch(() => ({} as { runtime?: string; tier?: number })),
api.get<{ model?: string }>(`/workspaces/${workspaceId}/model`)
.catch(() => ({} as { model?: string })),
api.get<{ provider?: string }>(`/workspaces/${workspaceId}/provider`)
.catch(() => null),
]);
const wsMetadataRuntime = (wsRes.runtime || "").trim();
const wsMetadataModel = (modelRes.model || "").trim();
const wsMetadataTier: number | null =
typeof wsRes.tier === "number" ? wsRes.tier : null;
setProvider("");
setOriginalProvider("");
if (providerRes !== null) {
const loadedProvider = (providerRes.provider || "").trim();
setProvider(loadedProvider);
setOriginalProvider(loadedProvider);
} else {
setProvider("");
setOriginalProvider("");
}
// originalModel is set further down once the YAML has been parsed —
// we want it to reflect what the form ACTUALLY rendered, which may
// be the YAML's runtime_config.model fallback when MODEL_PROVIDER
@@ -581,49 +492,20 @@ export function ConfigTab({ workspaceId }: Props) {
useEffect(() => {
let cancelled = false;
api.get<Array<{
id: string;
name?: string;
runtime?: string;
models?: ModelSpec[];
providers?: string[];
// internal#718 P3 registry-served fields (additive; absent on older
// backends and for non-registry runtimes).
registry_backed?: boolean;
registry_providers?: RegistryProvider[];
registry_models?: RegistryModel[];
displayable?: boolean;
}>>("/templates")
api.get<Array<{ id: string; name?: string; runtime?: string; models?: ModelSpec[]; providers?: string[] }>>("/templates")
.then((rows) => {
if (cancelled || !Array.isArray(rows)) return;
const byRuntime = new Map<string, RuntimeOption>();
for (const r of rows) {
const v = (r.runtime || "").trim();
if (!v) continue;
// Honor an explicit opt-out; absent/true means show it.
if (r.displayable === false) continue;
if (!SUPPORTED_RUNTIME_VALUES.has(v)) continue;
// Last template wins if two templates share a runtime — rare, and the
// one with the richer models list is probably newer.
const existing = byRuntime.get(v);
const models = Array.isArray(r.models) ? r.models : [];
const providers = Array.isArray(r.providers) ? r.providers : [];
const registryProviders = Array.isArray(r.registry_providers) ? r.registry_providers : [];
const registryModels = Array.isArray(r.registry_models) ? r.registry_models : [];
const registryBacked = r.registry_backed === true && registryModels.length > 0;
// Prefer the richer payload: a registry-backed entry, then more
// template models. Keeps the "last/richer template wins" intent.
const score = (o: RuntimeOption) => (o.registryBacked ? 1000 : 0) + o.models.length;
const candidate: RuntimeOption = {
value: v,
label: r.name || v,
models,
providers,
registryBacked,
registryProviders,
registryModels,
};
if (!existing || score(candidate) > score(existing)) {
byRuntime.set(v, candidate);
if (!existing || models.length > existing.models.length) {
byRuntime.set(v, { value: v, label: r.name || v, models, providers });
}
}
if (byRuntime.size > 0) setRuntimeOptions(Array.from(byRuntime.values()));
@@ -634,13 +516,7 @@ export function ConfigTab({ workspaceId }: Props) {
// Models + env hints for the currently-selected runtime.
const selectedRuntime = runtimeOptions.find((o) => o.value === (config.runtime || "")) ?? null;
// Memoised so its identity is stable across renders — it feeds several
// useMemo dependency arrays below (registry/legacy catalog, selector models)
// and a fresh `[]` literal each render would defeat their memoisation.
const availableModels: ModelSpec[] = useMemo(
() => selectedRuntime?.models ?? [],
[selectedRuntime?.models],
);
const availableModels: ModelSpec[] = selectedRuntime?.models ?? [];
// Provider suggestions for the legacy free-text input fallback (used
// when /templates returned no models for this runtime, e.g. hermes
// workspaces). Prefer the runtime's declarative providers list,
@@ -654,37 +530,9 @@ export function ConfigTab({ workspaceId }: Props) {
// Vendor-aware catalog shared with the selector. Memoised so the
// catalog identity is stable across renders (selector relies on it).
//
// internal#718 P3: when the runtime is registry-backed, build the catalog
// FROM the registry-served providers/models (display labels + billing +
// derived provider from the provider-registry SSOT) instead of re-inferring
// vendor from model-id prefixes. Falls back to the inferVendor heuristic
// for non-registry runtimes / older backends.
const registryBacked = selectedRuntime?.registryBacked ?? false;
const providerCatalog = useMemo(
() =>
registryBacked
? buildProviderCatalogFromRegistry(
selectedRuntime?.registryProviders ?? [],
selectedRuntime?.registryModels ?? [],
)
: buildProviderCatalog(availableModels),
[registryBacked, selectedRuntime?.registryProviders, selectedRuntime?.registryModels, availableModels],
);
// Models fed to the selector dropdown: the registry-served native set for a
// registry-backed runtime (so the dropdown can render no unregistered
// option), else the template-served models.
const selectorModels: ModelSpec[] = useMemo(
() =>
registryBacked
? (selectedRuntime?.registryModels ?? []).map((m) => ({
id: m.id,
name: m.name,
// carry the derived provider so the selector buckets correctly
...(m.provider ? { provider: m.provider } : {}),
}))
: availableModels,
[registryBacked, selectedRuntime?.registryModels, availableModels],
() => buildProviderCatalog(availableModels),
[availableModels],
);
// Derive the selector's current value from the form state. Provider
@@ -835,27 +683,23 @@ export function ConfigTab({ workspaceId }: Props) {
}
}
// internal#718 P4 closure: provider override save is RETIRED. The
// /workspaces/:id/provider endpoint returns 410 Gone; the provider
// is derived from (runtime, model) at every decision point via the
// registry. The local dropdown state still updates so the user can
// see the predicted provider, but it never round-trips to the
// server. Variables retained as locals (set to constants) so the
// downstream restart-suppress logic below has clear semantics
// and the diff against the prior shape stays small.
const providerSaveError: string | null = null;
const providerChanged = false;
// internal#718 P4 closure: provider → billing_mode linkage is also
// RETIRED. P2-B (#1972) moved the billing decision to
// ResolveLLMBillingModeDerived, which DERIVES the provider from
// (runtime, model) at every read. The canvas can no longer
// override it via a separate PUT, by design — the runtime+model
// selection IS the billing-mode selection. The
// /admin/workspaces/:id/llm-billing-mode endpoint still exists
// as the operator override surface (workspaces.llm_billing_mode
// column); it is no longer driven by the provider dropdown.
const billingModeSaveError: string | null = null;
// Provider override save (Option B PR-5). PUT only when the user
// changed the dropdown — otherwise an unrelated Save (e.g. tier
// edit) would re-write the provider unchanged and the server-
// side auto-restart would fire on every Save, costing the user a
// ~30s reboot for a no-op change. Server endpoint accepts an
// empty string to clear the override (deletes the
// workspace_secrets row); we forward whatever the form holds.
let providerSaveError: string | null = null;
const providerChanged = provider !== originalProvider;
if (providerChanged) {
try {
await api.put(`/workspaces/${workspaceId}/provider`, { provider });
setOriginalProvider(provider);
} catch (e) {
providerSaveError = e instanceof Error ? e.message : "Provider update was rejected";
}
}
setOriginalYaml(content);
if (rawMode) {
@@ -864,29 +708,28 @@ export function ConfigTab({ workspaceId }: Props) {
} else {
setRawDraft(content);
}
// internal#718 P4 closure: providerWillAutoRestart is always
// false now (provider PUT is retired; no server-side auto-restart
// can fire). Save+Restart flows through the canvas store
// restart path the same way it did pre-#718 for non-provider
// edits.
const providerWillAutoRestart = providerChanged && !providerSaveError
// SetProvider on the server already triggers an auto-restart for
// the workspace whenever the value actually changed (see
// workspace-server/internal/handlers/secrets.go:SetProvider). If
// the user also clicked Save+Restart we'd kick off a SECOND
// restart here and the two would race in the canvas store —
// suppress the redundant call and rely on the server-side one.
const providerWillAutoRestart = providerChanged && !providerSaveError;
if (restart && !providerWillAutoRestart) {
await useCanvasStore.getState().restartWorkspace(workspaceId);
} else if (!restart) {
useCanvasStore.getState().updateNodeData(workspaceId, { needsRestart: !providerWillAutoRestart });
}
// Aggregate partial-save errors. With provider+billing-mode PUTs
// retired, only modelSaveError can fire from the secret-mint side
// — the provider/billing branches are dead code retained as
// constant nils to keep the diff small. They are surfaced
// defensively in case a future re-enablement needs the wiring.
// Aggregate partial-save errors. Both modelSaveError and
// providerSaveError describe rejected updates from independent
// endpoints — show whichever fired so the user knows which
// field reverts on next reload (otherwise they'd see "Saved" and
// be confused why Provider snapped back).
const partialError = providerSaveError
? `Other fields saved, but provider update failed: ${providerSaveError}`
: billingModeSaveError
? `Provider saved, but switching billing mode failed — your own provider key/OAuth may not take effect until billing mode is set: ${billingModeSaveError}`
: modelSaveError
? `Other fields saved, but model update failed: ${modelSaveError}`
: null;
: modelSaveError
? `Other fields saved, but model update failed: ${modelSaveError}`
: null;
if (partialError) {
setError(partialError);
} else {
@@ -1004,10 +847,9 @@ export function ConfigTab({ workspaceId }: Props) {
— empty = "auto-derive from model slug" was the pre-PR-5
behavior; selecting any provider here writes LLM_PROVIDER
and triggers an auto-restart. */}
{selectorModels.length > 0 ? (
{availableModels.length > 0 ? (
<ProviderModelSelector
models={selectorModels}
catalog={registryBacked ? providerCatalog : undefined}
models={availableModels}
value={selectorValue}
onChange={(next) => {
setSelectorValue(next);
@@ -1020,7 +862,7 @@ export function ConfigTab({ workspaceId }: Props) {
setConfig((prev) => {
const v = next.model;
const prevModelId = prev.runtime_config?.model || prev.model || "";
const prevSpec = selectorModels.find((m) => m.id === prevModelId) ?? null;
const prevSpec = availableModels.find((m) => m.id === prevModelId) ?? null;
const prevRequired = prev.runtime_config?.required_env ?? [];
const wasTemplateDriven =
prevRequired.length === 0 ||
@@ -1266,8 +1108,6 @@ export function ConfigTab({ workspaceId }: Props) {
</div>
</Section>
<LLMBillingSection workspaceId={workspaceId} />
<SecretsSection
workspaceId={workspaceId}
requiredEnv={config.runtime_config?.required_env}
@@ -29,15 +29,8 @@ type FormState = {
displayMode: string;
displayProtocol: string;
resolution: string;
dataPersistence: string; // "" (auto) | "persist" | "ephemeral" — internal#734
};
// internal#734: per-workspace durable-data choice. "" = auto (desktop-control
// keeps data, others follow the org default). Human labels for the selector.
const DATA_PERSISTENCE_OPTIONS = ["", "persist", "ephemeral"];
const dataPersistenceLabel = (v: string): string =>
v === "persist" ? "Always keep (persist)" : v === "ephemeral" ? "Don't keep (ephemeral)" : "Auto";
export function ContainerConfigTab({ workspaceId, data }: Props) {
const runtime = data.runtime;
const instanceType = data.compute?.instance_type;
@@ -46,10 +39,9 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
const displayProtocol = data.compute?.display?.protocol;
const displayWidth = data.compute?.display?.width;
const displayHeight = data.compute?.display?.height;
const dataPersistence = data.compute?.data_persistence;
const initial = useMemo(
() => formFromData({ runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight, dataPersistence }),
[runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight, dataPersistence],
() => formFromData({ runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight }),
[runtime, instanceType, rootGB, displayMode, displayProtocol, displayWidth, displayHeight],
);
const [form, setForm] = useState<FormState>(initial);
const [saving, setSaving] = useState(false);
@@ -92,8 +84,6 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
display: form.displayEnabled
? { mode: form.displayMode, protocol: form.displayProtocol, width, height }
: { mode: "none" },
// internal#734: omit when "auto" so the wire/default behavior is unchanged.
...(form.dataPersistence ? { data_persistence: form.dataPersistence } : {}),
};
const resp = await api.patch<{ needs_restart?: boolean }>(`/workspaces/${workspaceId}`, {
@@ -186,18 +176,6 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
onChange={(resolution) => setForm((s) => ({ ...s, resolution }))}
/>
)}
<SelectField
id="data-persistence"
label="Saved data (cookies, downloads, memory)"
value={form.dataPersistence}
options={DATA_PERSISTENCE_OPTIONS}
optionLabel={dataPersistenceLabel}
onChange={(dataPersistence) => setForm((s) => ({ ...s, dataPersistence }))}
/>
<p className="-mt-1 text-[10px] leading-snug text-ink-soft">
Whether this workspace&apos;s data survives a restart/recreate. Auto keeps it for
browser (desktop) workspaces; Ephemeral never keeps it (privacy).
</p>
</div>
<div className="mt-4 flex items-center justify-end gap-2">
@@ -253,7 +231,6 @@ function formFromData(data: {
displayProtocol?: string;
displayWidth?: number;
displayHeight?: number;
dataPersistence?: string;
}): FormState {
const width = data.displayWidth ?? 1920;
const height = data.displayHeight ?? 1080;
@@ -266,7 +243,6 @@ function formFromData(data: {
displayMode: data.displayMode && data.displayMode !== "none" ? data.displayMode : "desktop-control",
displayProtocol: data.displayProtocol || "novnc",
resolution,
dataPersistence: data.dataPersistence || "",
};
}

Some files were not shown because too many files have changed in this diff Show More