Compare commits

..

1 Commits

Author SHA1 Message Date
Molecule AI Dev Engineer B (MiniMax) 1e1790c98b fix(observability): enrich server log on CommitMemory plugin error
The POST /workspaces/:id/memories handler returns a generic
HTTP 500 'failed to store memory' when the underlying v2
memory plugin's CommitMemory call errors. The current
log.Printf('Commit memory error (plugin): %v', err) emits
only the error — operators have no workspace, scope, or
namespace context to diagnose recurring main incidents
(continuous-synth E2E + HMA memory-commit both currently
fail with this 500; the backend error is swallowed).

Fix: enrich the server-side log line with workspaceID, the
requested scope, the resolved v2 namespace, a structured
err_class=<type> field (for log-aggregator filtering),
and the quoted err text (preserves trailing whitespace /
special chars that %v would munge).

Hard constraint (same discipline as the #2392 leak fix):
the underlying err stays server-log-only. The HTTP
response body is UNCHANGED — still 500 'failed to store
memory' with no plugin error leaked to the client.

No behavior change to the write path itself. The change
is one log.Printf line + a 9-line comment explaining the
no-leak discipline. The new log line is:

  log.Printf('Commit memory plugin error: workspace=%s scope=%s namespace=%s err_class=%T err=%q',
      workspaceID, body.Scope, nsName, err, err)

Unblocks operator diagnosis of the memory-v2 backend
without changing the client surface or weakening the
server's error-disclosure posture.
2026-06-08 03:27:25 +00:00
140 changed files with 927 additions and 13140 deletions
-247
View File
@@ -1,247 +0,0 @@
#!/usr/bin/env python3
"""
SSOT fail-closed approval validator (SEV-1 internal#812).
This module is the SINGLE source of truth for whether a Gitea review counts
as a "genuine" approval. Both consumers must call into it — they MUST NOT
duplicate the predicate:
- .gitea/scripts/gitea-merge-queue.py (Python) — imports directly.
- .gitea/scripts/review-check.sh (bash, jq) — calls the Python helper
at .gitea/scripts/_review_check_filter.py, which in turn calls this
module. There is no separate jq / bash copy of the predicate; a
reviewer who wants to weaken the gate has to weaken this one file.
# The fail-closed contract
A review counts as a GENUINE APPROVED on the current head ONLY IF ALL hold:
1. state == "APPROVED"
2. official == true
3. dismissed != true
4. stale != true
5. commit_id is present and equals the PR's current head SHA
ANY failure of any of the above → REJECT.
# The bug this fixes
The previous gitea-merge-queue.py predicate had a `if isinstance(commit_id,
str) and commit_id and headsha:` guard that *skipped* the commit_id check
when the review carried no commit_id. The previous review-check.sh jq
filter required `commit_id == $head`, which is also implicitly fail-closed
on missing commit_id (null != head), but only one of the two consumers
behaved correctly — a code-drift trap.
Both behaviors are now defined here, as a single fail-closed predicate.
A MISSING commit_id is the Gitea row signature of a spoofed or pre-commit
review: a real reviewer cannot have submitted against a commit that
doesn't exist. Accepting these is exactly the fail-open that SEV-1
internal#812 describes and the re-opened path that closed #843 (with CR2
+ Researcher both flagging it) addresses.
# Mutation-resistance
The unit tests in tests/test_approval_validator.py assert rejection
explicitly for each fail-closed case (missing commit_id, stale head,
non-official, dismissed, etc.). A reviewer who tries to weaken the
predicate by removing the commit_id check, by re-introducing the
"no commit_id is accepted" escape hatch, or by changing `!=` to `==`
in the head comparison will trip those tests in CI.
"""
from __future__ import annotations
from typing import Iterable, Optional, Tuple
# ---------------------------------------------------------------------------
# Canonical Gitea review-state enum (EXACT match -- no case coercion).
# ---------------------------------------------------------------------------
#
# Gitea's reviews API emits review.state as one of a fixed set of
# UPPERCASE string constants: "APPROVED", "REQUEST_CHANGES",
# "REQUEST_REVIEW", "COMMENT", "PENDING", "DISMISSED" (verified
# against the live API across real molecule-core PRs). They are ALWAYS
# uppercase on the wire.
#
# FAIL-CLOSED: we compare review.state to these constants with EXACT
# equality. The previous code used str(state or "").upper(), which
# coerced a lowercase/mixed-case "approved" or "request_changes" into
# the canonical value and ACCEPTED it. A real Gitea row never carries a
# lowercase state, so a case-variant value is the signature of a
# hand-forged / spoofed row, not a legitimate review. Coercing it was a
# residual fail-open (SEV-1 internal#812, RCs 9849/9851/9852). We reject
# anything that is not byte-for-byte the canonical constant.
STATE_APPROVED = "APPROVED"
STATE_REQUEST_CHANGES = "REQUEST_CHANGES"
# ---------------------------------------------------------------------------
# Shared predicate — fail-closed on every condition
# ---------------------------------------------------------------------------
def is_official_current_head(review: object, headsha: object) -> bool:
"""Common predicate: review is official, not dismissed, not stale, and
bound to the PR's current head SHA. EVERY condition is mandatory and
fail-closed. Both is_genuine_approval and is_open_request_changes build
on this so the rule cannot drift between the two cases.
`official` is checked with `is not True` (NOT `not review.get("official")`).
The latter is truthy on the string "false" or the integer 1, which is
exactly the fail-open surface we are closing here — a non-boolean
pass-through is treated as official. Gitea emits a real boolean, so
the stricter check rejects anything that isn't literally True.
"""
if not isinstance(review, dict):
return False
if review.get("official") is not True:
return False
if review.get("dismissed"):
return False
if review.get("stale"):
return False
commit_id = review.get("commit_id")
# FAIL-CLOSED: a missing/empty/non-string commit_id is REJECTED. The
# previous code had `if isinstance(commit_id, str) and commit_id and
# headsha:` which SKIPPED the check when the review carried no
# commit_id. That was the spoof-bug surface.
if not isinstance(commit_id, str) or not commit_id:
return False
# FAIL-CLOSED: a present-but-wrong commit_id is also REJECTED. Stale
# reviews (on a previous head) cannot count.
if not isinstance(headsha, str) or not headsha or commit_id != headsha:
return False
return True
# ---------------------------------------------------------------------------
# Per-verdict predicates
# ---------------------------------------------------------------------------
def is_genuine_approval(
review: object,
*,
headsha: str,
reviewer_set: Optional[Iterable[str]] = None,
) -> bool:
"""Return True iff `review` is a genuine APPROVED on the current head.
When `reviewer_set` is provided, the review's `user.login` must be in
the set (the merge-queue uses this to count only "recognised"
reviewers for the 2-genuine floor; review-check.sh applies its own
team-membership probe separately and so does not pass a set).
"""
if not isinstance(review, dict):
return False
# EXACT-ENUM (fail-closed): no .upper()/.strip() coercion. A
# case-variant or whitespace-padded state is a forged row and is
# rejected, not normalised into APPROVED.
if review.get("state") != STATE_APPROVED:
return False
if not is_official_current_head(review, headsha):
return False
if reviewer_set is not None:
user = (review.get("user") or {}).get("login")
if not isinstance(user, str) or user not in set(reviewer_set):
return False
return True
def is_open_request_changes(review: object, *, headsha: str) -> bool:
"""Return True iff `review` is an open official REQUEST_CHANGES on the
current head. Same fail-closed contract as is_genuine_approval —
a missing commit_id is REJECTED, not silently treated as 'still
blocking the merge from an old head'.
"""
if not isinstance(review, dict):
return False
# EXACT-ENUM (fail-closed): same contract as is_genuine_approval. A
# lowercase/mixed-case "request_changes" must NOT be coerced into a
# block-erasing match; an exact REQUEST_CHANGES is required.
if review.get("state") != STATE_REQUEST_CHANGES:
return False
if not is_official_current_head(review, headsha):
return False
return True
# ---------------------------------------------------------------------------
# Consumer-facing reducer (returns the two call sites need)
# ---------------------------------------------------------------------------
def classify_reviews(
reviews: Iterable[object],
*,
headsha: str,
reviewer_set: Optional[Iterable[str]] = None,
) -> Tuple[set[str], list[str]]:
"""Reduce a PR's reviews to (approvers, request_changes) on the CURRENT head.
approvers: distinct logins whose LATEST official review on the current
head is APPROVED.
request_changes: distinct logins whose LATEST official review on the
current head is REQUEST_CHANGES.
Gitea returns reviews oldest-first. We keep the latest *VALID*
submission per user (later VALID entries overwrite earlier ones; an
invalid later row — a COMMENT, or a review with a null/old commit_id —
is ignored and can NOT overwrite or erase a genuine review). See the
inline VALIDATE-BEFORE-REDUCE note below for the exploit this closes.
"""
reviewer_set_set = set(reviewer_set) if reviewer_set is not None else None
# VALIDATE-BEFORE-REDUCE (SEV-1 internal#812 follow-up).
#
# The earlier implementation reduced FIRST (latest row per user, keyed
# only on state in {APPROVED, REQUEST_CHANGES}) and validated the single
# surviving row AFTER. That is reduce-before-validate, and it is
# exploitable: a user posts a genuine current-head APPROVED (or
# REQUEST_CHANGES), then posts a LATER row that fails the fail-closed
# predicate (a COMMENT, or an APPROVED with a null/old commit_id). The
# later INVALID row overwrote the genuine one in latest_by_user, so a
# real approval was masked, and — worse — a real current-head
# REQUEST_CHANGES could be erased and the block silently evaporate.
#
# The fix: filter to VALID reviews FIRST (each row must pass
# is_official_current_head AND carry an APPROVED/REQUEST_CHANGES state),
# and only then reduce to the latest VALID review per user. An invalid
# later row is never eligible to become a user's "latest" state, so it
# cannot overwrite or erase a genuine review. A user's verdict is the
# state of their latest VALID (official, current-head, non-dismissed,
# non-stale, commit_id-present-and-matching) review.
latest_valid_by_user: dict = {}
for review in reviews:
if not isinstance(review, dict):
continue
user = (review.get("user") or {}).get("login")
if not isinstance(user, str):
continue
if reviewer_set_set is not None and user not in reviewer_set_set:
continue
# EXACT-ENUM (fail-closed): exact constants only, no coercion. A
# case-coerced row must not become eligible to overwrite/erase a
# genuine per-user verdict in the reduce below.
state = review.get("state")
if state not in (STATE_APPROVED, STATE_REQUEST_CHANGES):
continue
# Fail-closed predicate BEFORE the reduce: official, not dismissed,
# not stale, commit_id present AND == head. Invalid rows are dropped
# here and so can never become the per-user "latest".
if not is_official_current_head(review, headsha):
continue
latest_valid_by_user[user] = review
approvers: set[str] = set()
request_changes: list[str] = []
for user, review in latest_valid_by_user.items():
# Each surviving review already passed is_official_current_head, so
# the state alone determines the verdict. We still go through the
# per-verdict SSOT predicates so the rule cannot drift.
if is_genuine_approval(review, headsha=headsha, reviewer_set=None):
approvers.add(user)
elif is_open_request_changes(review, headsha=headsha):
request_changes.append(user)
return approvers, request_changes
-74
View File
@@ -1,74 +0,0 @@
#!/usr/bin/env python3
"""
Helper for review-check.sh: applies the SSOT approval predicate to a
PR's reviews and prints the candidate approver logins on stdout (one per
line, de-duplicated, author excluded).
review-check.sh uses this in place of its previous inline jq filter so the
predicate is single-sourced. The jq filter is gone; if you want to change
the predicate, edit .gitea/scripts/_approval_validator.py, not this file.
Usage:
python3 _review_check_filter.py <reviews.json> <head-sha> <author-login>
Output:
- Candidate approver logins, one per line, de-duplicated, sorted.
- Excludes `author-login` (the PR author cannot approve their own PR).
- Empty output → review-check.sh interprets as "no candidates" and exits 1
after the team-membership probe.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
# Same-dir import — script lives next to _approval_validator.py
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _approval_validator import is_genuine_approval # noqa: E402
def main(argv: list[str]) -> int:
if len(argv) != 4:
print(
f"usage: {argv[0] if argv else '_review_check_filter.py'} "
"<reviews.json> <head-sha> <author-login>",
file=sys.stderr,
)
return 2
reviews_path = Path(argv[1])
headsha = argv[2]
author = argv[3]
try:
reviews = json.loads(reviews_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError) as exc:
print(f"::error::could not read reviews JSON: {exc}", file=sys.stderr)
return 2
if not isinstance(reviews, list):
print("::error::reviews JSON was not a list", file=sys.stderr)
return 2
candidates: set[str] = set()
for review in reviews:
# We pass reviewer_set=None here because review-check.sh applies its
# own team-membership probe (CURL_AUTH_FILE + 200/204/403/404 logic)
# separately. The SSOT predicate enforces only the fail-closed
# commit_id / state / official / dismissed / stale contract here.
if not is_genuine_approval(review, headsha=headsha, reviewer_set=None):
continue
user = (review.get("user") or {}).get("login")
if not isinstance(user, str) or not user:
continue
if user == author:
continue
candidates.add(user)
for user in sorted(candidates):
print(user)
return 0
if __name__ == "__main__":
sys.exit(main(sys.argv))
+52 -42
View File
@@ -105,12 +105,6 @@ import urllib.parse
import urllib.request
from typing import Any
# SSOT fail-closed approval predicate (SEV-1 internal#812). review-check.sh
# consumes the same module via _review_check_filter.py — do NOT duplicate
# the predicate here. See _approval_validator.py for the fail-closed contract.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from _approval_validator import classify_reviews as _classify_reviews_ssot # noqa: E402
def _env(key: str, *, default: str = "") -> str:
return os.environ.get(key, default)
@@ -154,23 +148,10 @@ OPT_OUT_LABELS = {
# branch-protection configuration. These are the uniform-gate checks that
# must pass before any PR can merge (SOP tier removal makes them mandatory
# for all PRs, not just tier:medium/tier:high).
#
# Context names use the (pull_request_target) suffix (not pull_request)
# to match the workflow event_type that actually emits them — verified
# live against PR#2419/#2331/etc.: the qa-review/security-review
# workflows run on pull_request_target (their `on:` block uses
# pull_request_target, not pull_request), and sop-checklist's
# all-items-acked job also uses pull_request_target. The previous
# (pull_request) suffix never matched the live emitted contexts,
# which is what was painting ~16 ready PRs red (gate appeared
# "missing" qa-review/security-review even after both passed).
# Verified against the lint-bp-context-emit-match test which already
# asserts (pull_request_target) for these names. No requirement
# dropped; just a name correction.
GOVERNANCE_REQUIRED_CONTEXTS = [
"qa-review / approved (pull_request_target)",
"security-review / approved (pull_request_target)",
"sop-checklist / all-items-acked (pull_request_target)",
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
]
REQUIRED_CONTEXTS_RAW = _env(
"REQUIRED_CONTEXTS",
@@ -430,26 +411,57 @@ def get_branch_protection(branch: str) -> BranchProtection:
def genuine_approvals(
reviews: list[dict],
*,
headsha: str,
head_sha: str,
reviewer_set: set[str],
) -> tuple[set[str], list[str]]:
"""Thin wrapper over the SSOT predicate in _approval_validator.py.
"""Reduce a PR's reviews to genuine official approvals on the CURRENT head.
All logic — the per-review commit_id / state / official / dismissed /
stale contract — lives in _approval_validator.classify_reviews. This
wrapper exists only to keep the call site (and external readers of
the symbol) stable. Do NOT add any per-review logic here; if you need
to change the predicate, edit _approval_validator.py.
Returns (approvers, request_changes) where:
- approvers is the set of distinct logins (in reviewer_set) whose LATEST
review on the current head is an official, non-stale, non-dismissed
APPROVED, and
- request_changes is the list of logins (in reviewer_set) whose latest
official review on the current head is REQUEST_CHANGES.
See _approval_validator.py for the full fail-closed contract
(SEV-1 internal#812). The previous inline implementation had a
`if isinstance(commit_id, str) and commit_id and headsha:` guard that
silently accepted reviews with no commit_id; that fail-open surface is
now closed at the SSOT.
"Current head" is enforced two ways, because Gitea exposes both signals:
a review must be `official` and NOT `stale`/`dismissed`, AND when the
review carries a commit_id it must equal head_sha. A review with no
commit_id but stale=False/dismissed=False is accepted (older Gitea rows).
We take each reviewer's LATEST submission (reviews arrive oldest-first), so
a later REQUEST_CHANGES correctly supersedes an earlier APPROVED and vice
versa.
"""
return _classify_reviews_ssot(
reviews, headsha=headsha, reviewer_set=reviewer_set
)
latest_by_user: dict[str, dict] = {}
for review in reviews:
if not isinstance(review, dict):
continue
user = (review.get("user") or {}).get("login")
if not isinstance(user, str) or user not in reviewer_set:
continue
state = str(review.get("state") or "").upper()
if state not in {"APPROVED", "REQUEST_CHANGES"}:
continue # ignore COMMENT/PENDING/DISMISSED-state rows
# reviews are returned oldest-first; later entries overwrite → latest wins
latest_by_user[user] = review
approvers: set[str] = set()
request_changes: list[str] = []
for user, review in latest_by_user.items():
if not review.get("official"):
continue
if review.get("stale") or review.get("dismissed"):
continue
commit_id = review.get("commit_id")
if isinstance(commit_id, str) and commit_id and head_sha:
if commit_id != head_sha:
continue # review was on a previous head
state = str(review.get("state") or "").upper()
if state == "APPROVED":
approvers.add(user)
elif state == "REQUEST_CHANGES":
request_changes.append(user)
return approvers, request_changes
def get_pull_reviews(pr_number: int) -> list[dict]:
_, body = api("GET", f"/repos/{OWNER}/{NAME}/pulls/{pr_number}/reviews")
@@ -754,7 +766,7 @@ def list_queued_issues() -> list[dict]:
query={
"state": "open",
"type": "pulls",
"label": QUEUE_LABEL,
"labels": QUEUE_LABEL,
},
)
@@ -1122,7 +1134,7 @@ def _evaluate_candidate(
reviews = get_pull_reviews(pr_number)
approvers, request_changes = genuine_approvals(
reviews, headsha=head_sha, reviewer_set=REVIEWER_SET
reviews, head_sha=head_sha, reviewer_set=REVIEWER_SET
)
decision = evaluate_merge_readiness(
@@ -1158,9 +1170,7 @@ def enumerate_readiness(*, dry_run: bool = False) -> list[ReadinessEntry]:
post-batch summary can be printed.
"""
bp = get_branch_protection(WATCH_BRANCH)
# Uniform gate: governance checks are ALWAYS required, even if branch
# protection does not enumerate them. Deduplicate against BP list.
contexts = list(dict.fromkeys(bp.required_contexts + GOVERNANCE_REQUIRED_CONTEXTS))
contexts = bp.required_contexts
required_approvals = bp.required_approvals
main_sha = get_branch_head(WATCH_BRANCH)
+1 -1
View File
@@ -165,7 +165,7 @@ def api(
# Format: "<workflow_name> / <job_name_or_key> (<event>)"
# Examples observed on molecule-core/main:
# "Secret scan / Scan diff for credential-shaped strings (pull_request)"
# "sop-checklist / all-items-acked (pull_request)"
# " / tier-check (pull_request)"
#
# Split strategy: peel off the trailing ` (<event>)` first, then split
# the leading `<workflow> / <rest>` on the FIRST ` / ` (workflow names
+11 -7
View File
@@ -197,13 +197,17 @@ if [ "$HTTP_CODE" != "200" ]; then
exit 1
fi
# Filter via the SSOT fail-closed predicate in _approval_validator.py
# (same module gitea-merge-queue.py imports). The jq filter is gone
# entirely — any change to the predicate must be made in
# _approval_validator.py. See SEV-1 internal#812 for the fail-closed
# contract this closes.
SCRIPT_DIR_HERE="$(cd "$(dirname "$0")" && pwd)"
REVIEW_CANDIDATES=$(python3 "$SCRIPT_DIR_HERE/_review_check_filter.py" "$REVIEWS_JSON" "$PR_HEAD_SHA" "$PR_AUTHOR")
# Filter: state=APPROVED, official=true, not-dismissed, non-author,
# commit_id matches current PR head. All conditions are mandatory.
JQ_FILTER='.[]
| select(.state == "APPROVED")
| select(.official == true)
| select(.dismissed != true)
| select(.user.login != $author)
| select(.commit_id == $head)
| .user.login'
REVIEW_CANDIDATES=$(jq -r --arg author "$PR_AUTHOR" --arg head "$PR_HEAD_SHA" "$JQ_FILTER" "$REVIEWS_JSON" | sort -u)
debug "candidate non-author approvers: $(echo "$REVIEW_CANDIDATES" | tr '\n' ' ')"
if [ -z "$REVIEW_CANDIDATES" ]; then
@@ -134,14 +134,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "user": {"login": "core-devops"}, "commit_id": "deadbeef0000111122223333444455556666"},
])
if sc == "T23_missing_commit_id":
# APPROVED review with NO commit_id field — the SEV-1
# internal#812 / closed-#843 spoof-bug signature. The
# fail-closed SSOT must REJECT (not silently accept as
# "older Gitea row" the way the old pre-fix code did).
return self._json(200, [
{"state": "APPROVED", "official": True, "dismissed": False, "user": {"login": "core-devops"}},
])
# Default: one non-author APPROVED (current head, official)
return self._json(200, [
{"state": "APPROVED", "dismissed": False, "official": True, "user": {"login": "core-devops"}, "commit_id": "deadbeef0000111122223333444455556666"},
@@ -1,610 +0,0 @@
#!/usr/bin/env python3
"""
Mutation-verified unit tests for the SSOT fail-closed approval predicate
in _approval_validator.py (SEV-1 internal#812).
Each test asserts REJECTION explicitly. A reviewer who weakens the
predicate — e.g., by removing the commit_id check, by reintroducing the
"no commit_id is accepted" escape hatch, by changing `!=` to `==` in the
head comparison, or by allowing official == false — will trip these
tests in CI.
Run:
cd .gitea/scripts
python3 -m unittest tests.test_approval_validator -v
# or
python3 tests/test_approval_validator.py
"""
from __future__ import annotations
import os
import sys
import unittest
# Same-dir import — test lives next to _approval_validator.py
sys.path.insert(
0,
os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
)
from _approval_validator import ( # noqa: E402
classify_reviews,
is_genuine_approval,
is_official_current_head,
is_open_request_changes,
)
HEAD = "0123456789abcdef0123456789abcdef01234567"
OTHER_HEAD = "fedcba9876543210fedcba9876543210fedcba98"
def _review(
*,
state: str = "APPROVED",
official: bool = True,
dismissed: bool = False,
stale: bool = False,
commit_id: object = HEAD,
user: str = "reviewer-1",
body: str = "",
) -> dict:
"""Build a minimal review row shaped like the Gitea reviews API."""
return {
"id": 1,
"user": {"login": user, "id": 1},
"body": body,
"state": state,
"official": official,
"dismissed": dismissed,
"stale": stale,
"commit_id": commit_id,
}
# ---------------------------------------------------------------------------
# Hard contract: every fail-closed branch must reject
# ---------------------------------------------------------------------------
class IsOfficialCurrentHeadFailClosed(unittest.TestCase):
"""is_official_current_head is the common predicate. EVERY condition
is mandatory. The tests below assert REJECTION for every possible
failure of any condition."""
def test_accepts_canonical_review(self):
self.assertTrue(is_official_current_head(_review(), HEAD))
def test_rejects_non_dict(self):
for bad in [None, "string", 42, [], (), object()]:
with self.subTest(bad=bad):
self.assertFalse(is_official_current_head(bad, HEAD))
def test_rejects_when_official_is_false(self):
for v in [False, None, 0, "false"]:
with self.subTest(v=v):
self.assertFalse(
is_official_current_head(_review(official=v), HEAD)
)
def test_rejects_when_dismissed(self):
for v in [True, "true", 1]:
with self.subTest(v=v):
self.assertFalse(
is_official_current_head(_review(dismissed=v), HEAD)
)
def test_rejects_when_stale(self):
for v in [True, "true", 1]:
with self.subTest(v=v):
self.assertFalse(
is_official_current_head(_review(stale=v), HEAD)
)
def test_rejects_when_commit_id_missing(self):
"""FAIL-CLOSED #1: missing commit_id is REJECTED.
This is the spoof signature that closed #843 (with CR2 + Researcher
both flagging it)."""
for bad in [None, "", 0, False, [], {}, ()]:
with self.subTest(commit_id=bad):
self.assertFalse(
is_official_current_head(_review(commit_id=bad), HEAD),
f"commit_id={bad!r} must reject (fail-closed)",
)
def test_rejects_when_commit_id_wrong_type(self):
for bad in [123, 1.5, True, ["abc"], {"sha": HEAD}, ("tuple",)]:
with self.subTest(commit_id=bad):
self.assertFalse(
is_official_current_head(_review(commit_id=bad), HEAD)
)
def test_rejects_when_commit_id_stale(self):
"""FAIL-CLOSED #2: present-but-wrong commit_id is REJECTED. Stale
reviews on a previous head cannot count."""
self.assertFalse(
is_official_current_head(_review(commit_id=OTHER_HEAD), HEAD)
)
def test_rejects_when_head_missing(self):
for bad in [None, "", 0, False]:
with self.subTest(head=bad):
self.assertFalse(
is_official_current_head(_review(), bad)
)
def test_rejects_when_head_wrong_type(self):
self.assertFalse(is_official_current_head(_review(), 123))
self.assertFalse(is_official_current_head(_review(), ["x"]))
# ---------------------------------------------------------------------------
# is_genuine_approval
# ---------------------------------------------------------------------------
class IsGenuineApprovalContract(unittest.TestCase):
def test_accepts_canonical_approval(self):
self.assertTrue(
is_genuine_approval(_review(state="APPROVED"), headsha=HEAD)
)
def test_rejects_non_approved_states(self):
for state in ("REQUEST_CHANGES", "COMMENT", "PENDING", "DISMISSED", "approve", "", "bogus"):
with self.subTest(state=state):
self.assertFalse(
is_genuine_approval(_review(state=state), headsha=HEAD)
)
def test_rejects_case_coerced_approved_states(self):
"""EXACT-ENUM fail-closed (RCs 9849/9851/9852): Gitea always emits
the canonical UPPERCASE "APPROVED". A lowercase/mixed-case/padded
value is the signature of a forged row and MUST be rejected, not
coerced via .upper() into an accepted APPROVED. Each of these was
ACCEPTED before the exact-enum fix."""
for state in (
"approved", "Approved", "ApProVeD", "APPROVED ", " APPROVED",
"approved\n", "\tAPPROVED",
):
with self.subTest(state=state):
self.assertFalse(
is_genuine_approval(_review(state=state), headsha=HEAD),
f"case-coerced/padded state {state!r} must NOT count as "
"a genuine approval",
)
def test_rejects_non_official_approval(self):
"""Comment-based / non-official 'APPROVED' is REJECTED.
PM: 'reject comment-based / non-official reviews'."""
self.assertFalse(
is_genuine_approval(
_review(state="APPROVED", official=False), headsha=HEAD
)
)
def test_rejects_dismissed_approval(self):
self.assertFalse(
is_genuine_approval(
_review(state="APPROVED", dismissed=True), headsha=HEAD
)
)
def test_rejects_stale_head_approval(self):
"""commit_id != head is REJECTED. Stale-on-old-head approvals cannot
count, even if they were official and not dismissed."""
self.assertFalse(
is_genuine_approval(
_review(state="APPROVED", commit_id=OTHER_HEAD), headsha=HEAD
)
)
def test_rejects_missing_commit_id_approval(self):
"""FAIL-CLOSED #3: the SEV-1 case. A APPROVED review with NO
commit_id is the spoof-bug signature. Reject."""
for bad in [None, "", 0, False]:
with self.subTest(commit_id=bad):
self.assertFalse(
is_genuine_approval(
_review(state="APPROVED", commit_id=bad), headsha=HEAD
),
f"missing commit_id={bad!r} must reject",
)
def test_reviewer_set_filters_users(self):
self.assertTrue(
is_genuine_approval(
_review(user="alice"),
headsha=HEAD,
reviewer_set={"alice", "bob"},
)
)
self.assertFalse(
is_genuine_approval(
_review(user="carol"),
headsha=HEAD,
reviewer_set={"alice", "bob"},
)
)
def test_reviewer_set_none_skips_check(self):
# None means "no team filter at this layer" (e.g., review-check.sh
# applies its own team-membership probe separately).
self.assertTrue(
is_genuine_approval(
_review(user="anyone"),
headsha=HEAD,
reviewer_set=None,
)
)
# ---------------------------------------------------------------------------
# is_open_request_changes
# ---------------------------------------------------------------------------
class IsOpenRequestChangesContract(unittest.TestCase):
def test_accepts_canonical_request_changes(self):
self.assertTrue(
is_open_request_changes(
_review(state="REQUEST_CHANGES"), headsha=HEAD
)
)
def test_rejects_non_request_changes_states(self):
for state in ("APPROVED", "COMMENT", "PENDING", "DISMISSED"):
with self.subTest(state=state):
self.assertFalse(
is_open_request_changes(
_review(state=state), headsha=HEAD
)
)
def test_rejects_case_coerced_request_changes_states(self):
"""EXACT-ENUM fail-closed: a lowercase/mixed-case "request_changes"
must NOT be coerced into an open-block match. Before the exact-enum
fix, .upper() accepted these as REQUEST_CHANGES."""
for state in (
"request_changes", "Request_Changes", "REQUEST_CHANGES ",
" REQUEST_CHANGES", "request_changes\n",
):
with self.subTest(state=state):
self.assertFalse(
is_open_request_changes(
_review(state=state), headsha=HEAD
),
f"case-coerced/padded state {state!r} must NOT count as "
"an open REQUEST_CHANGES",
)
def test_rejects_when_dismissed(self):
self.assertFalse(
is_open_request_changes(
_review(state="REQUEST_CHANGES", dismissed=True), headsha=HEAD
)
)
def test_rejects_when_stale_head(self):
self.assertFalse(
is_open_request_changes(
_review(state="REQUEST_CHANGES", commit_id=OTHER_HEAD),
headsha=HEAD,
)
)
def test_rejects_when_missing_commit_id(self):
for bad in [None, "", 0]:
with self.subTest(commit_id=bad):
self.assertFalse(
is_open_request_changes(
_review(state="REQUEST_CHANGES", commit_id=bad),
headsha=HEAD,
)
)
# ---------------------------------------------------------------------------
# classify_reviews — the merge-queue consumer
# ---------------------------------------------------------------------------
class ClassifyReviewsContract(unittest.TestCase):
def test_basic_approvers_and_request_changes(self):
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="bob", state="REQUEST_CHANGES", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, {"alice"})
self.assertEqual(request_changes, ["bob"])
def test_reviewer_set_filters_early(self):
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="carol", state="APPROVED", commit_id=HEAD),
]
approvers, _ = classify_reviews(
reviews, headsha=HEAD, reviewer_set={"alice"}
)
self.assertEqual(approvers, {"alice"})
def test_latest_review_per_user_wins(self):
# alice's REQUEST_CHANGES (latest) supersedes her earlier APPROVED.
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="alice", state="REQUEST_CHANGES", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertNotIn("alice", approvers)
self.assertIn("alice", request_changes)
def test_stale_head_approval_excluded(self):
reviews = [
_review(user="alice", state="APPROVED", commit_id=OTHER_HEAD),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, set())
def test_missing_commit_id_approval_excluded(self):
"""The SEV-1 fail-open surface. APPROVED + no commit_id → must NOT
count toward approvers, even with stale=False/dismissed=False."""
reviews = [
_review(user="alice", state="APPROVED", commit_id=None),
_review(user="bob", state="APPROVED", commit_id=""),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, set())
def test_dismissed_approval_excluded(self):
reviews = [
_review(user="alice", state="APPROVED", dismissed=True, commit_id=HEAD),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, set())
def test_non_official_approval_excluded(self):
reviews = [
_review(user="alice", state="APPROVED", official=False, commit_id=HEAD),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, set())
def test_comment_state_excluded(self):
reviews = [
_review(user="alice", state="COMMENT", commit_id=HEAD),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, set())
def test_case_coerced_approved_not_counted(self):
"""EXACT-ENUM via the reducer: a lowercase 'approved' (otherwise
valid official current-head row) must NOT be counted as an approver.
Before the fix, classify_reviews coerced it via .upper()."""
for state in ("approved", "Approved", "APPROVED "):
with self.subTest(state=state):
reviews = [
_review(user="alice", state=state, commit_id=HEAD),
]
approvers, request_changes = classify_reviews(
reviews, headsha=HEAD
)
self.assertEqual(approvers, set())
self.assertEqual(request_changes, [])
def test_case_coerced_request_changes_not_silently_dropped(self):
"""EXACT-ENUM via the reducer: a lowercase 'request_changes' must be
rejected (not coerced into a block). Crucially, it must NOT silently
erase a SAME-USER genuine current-head REQUEST_CHANGES posted
earlier — the case-variant later row is invalid and is ignored, so
the genuine block stands."""
reviews = [
_review(user="bob", state="REQUEST_CHANGES", commit_id=HEAD),
_review(user="bob", state="request_changes", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn("bob", request_changes)
self.assertNotIn("bob", approvers)
def test_stale_head_request_changes_excluded(self):
# A REQUEST_CHANGES on a previous head must NOT block the current head.
reviews = [
_review(user="bob", state="REQUEST_CHANGES", commit_id=OTHER_HEAD),
]
_, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(request_changes, [])
# -----------------------------------------------------------------
# VALIDATE-BEFORE-REDUCE regression tests (SEV-1 internal#812 follow-up).
#
# The bug: classify_reviews reduced to the LATEST row per user FIRST and
# validated AFTER. A later INVALID row (a COMMENT, or APPROVED/
# REQUEST_CHANGES with a null/old commit_id) from the same user could
# overwrite a genuine current-head review — masking an approval or
# ERASING a REQUEST_CHANGES block. The fix validates before the reduce,
# so an invalid later row is never eligible to be a user's "latest".
# -----------------------------------------------------------------
def test_genuine_approval_not_masked_by_later_comment(self):
"""A genuine current-head APPROVED followed by a LATER COMMENT from
the SAME user must STILL count as an approval. A later non-
APPROVED/RC row (COMMENT) must not erase the approval. This is the
reduce-before-validate masking bug."""
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="alice", state="COMMENT", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn("alice", approvers)
self.assertEqual(request_changes, [])
def test_genuine_approval_not_masked_by_later_null_commit_id(self):
"""A genuine current-head APPROVED followed by a LATER APPROVED with
a null commit_id (the spoof/invalid signature) from the SAME user
must STILL count. The invalid later row must be ignored, not allowed
to overwrite the valid earlier approval."""
for bad in [None, ""]:
with self.subTest(commit_id=bad):
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="alice", state="APPROVED", commit_id=bad),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertIn(
"alice", approvers,
f"later invalid commit_id={bad!r} must not mask the "
"genuine current-head approval",
)
def test_genuine_approval_not_masked_by_later_stale_commit_id(self):
"""A genuine current-head APPROVED followed by a LATER APPROVED on a
STALE (old) head from the SAME user must STILL count toward
approvers — the stale later row is invalid and must be ignored."""
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="alice", state="APPROVED", commit_id=OTHER_HEAD),
]
approvers, _ = classify_reviews(reviews, headsha=HEAD)
self.assertIn("alice", approvers)
def test_request_changes_not_erased_by_later_comment(self):
"""A genuine current-head REQUEST_CHANGES followed by a LATER COMMENT
from the SAME user must STILL block. The later invalid row must not
erase the REQUEST_CHANGES — this is the worse, silently-evaporating-
block variant of the bug."""
reviews = [
_review(user="bob", state="REQUEST_CHANGES", commit_id=HEAD),
_review(user="bob", state="COMMENT", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn("bob", request_changes)
self.assertNotIn("bob", approvers)
def test_request_changes_not_erased_by_later_null_commit_id(self):
"""A genuine current-head REQUEST_CHANGES followed by a LATER
REQUEST_CHANGES with a null/old commit_id from the SAME user must
STILL block. The invalid later row must be ignored, not allowed to
relocate the user's verdict off the current head."""
for bad in [None, "", OTHER_HEAD]:
with self.subTest(commit_id=bad):
reviews = [
_review(user="bob", state="REQUEST_CHANGES", commit_id=HEAD),
_review(user="bob", state="REQUEST_CHANGES", commit_id=bad),
]
_, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn(
"bob", request_changes,
f"later invalid commit_id={bad!r} must not erase the "
"genuine current-head REQUEST_CHANGES block",
)
def test_request_changes_not_erased_by_later_approved_invalid(self):
"""A genuine current-head REQUEST_CHANGES followed by a LATER
INVALID APPROVED (null commit_id) from the SAME user must STILL
block AND must NOT count the user as an approver. The invalid
approval must not flip a real block into a pass."""
reviews = [
_review(user="bob", state="REQUEST_CHANGES", commit_id=HEAD),
_review(user="bob", state="APPROVED", commit_id=None),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn("bob", request_changes)
self.assertNotIn("bob", approvers)
def test_genuine_request_changes_still_supersedes_genuine_approval(self):
"""Sanity: a genuine LATER current-head REQUEST_CHANGES still
supersedes an earlier genuine APPROVED from the same user (the
valid-row supersession we MUST preserve — only INVALID later rows
are ignored). Guards against an over-correction that ignores all
later rows."""
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="alice", state="REQUEST_CHANGES", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertNotIn("alice", approvers)
self.assertIn("alice", request_changes)
def test_genuine_approval_still_supersedes_genuine_request_changes(self):
"""Sanity: a genuine LATER current-head APPROVED supersedes an
earlier genuine REQUEST_CHANGES from the same user."""
reviews = [
_review(user="alice", state="REQUEST_CHANGES", commit_id=HEAD),
_review(user="alice", state="APPROVED", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertIn("alice", approvers)
self.assertEqual(request_changes, [])
def test_two_valid_approvers_plus_one_invalid_later_row(self):
"""Two distinct users with valid current-head approvals + a third
user whose ONLY genuine approval is followed by an invalid later
row → all three real approvers are counted; the invalid later row
does not drop the third user."""
reviews = [
_review(user="alice", state="APPROVED", commit_id=HEAD),
_review(user="bob", state="APPROVED", commit_id=HEAD),
_review(user="carol", state="APPROVED", commit_id=HEAD),
_review(user="carol", state="COMMENT", commit_id=HEAD),
]
approvers, request_changes = classify_reviews(reviews, headsha=HEAD)
self.assertEqual(approvers, {"alice", "bob", "carol"})
self.assertEqual(request_changes, [])
# ---------------------------------------------------------------------------
# Mutation-resistance smoke checks
#
# These tests document the mutations a reviewer would have to apply to
# weaken the gate. They are not synthetic; they verify that the
# predicate is structured so each known-softening mutation would also
# fail at least one other test in this file. We can't actually mutate
# the source in CI, but these tests are explicit about the mutations
# that would slip through, and the suite is dense enough that any
# loosening of the predicate will fail multiple cases.
# ---------------------------------------------------------------------------
class MutationResistance(unittest.TestCase):
def test_documented_mutation_remove_commit_id_check_fails(self):
"""If a reviewer removes the commit_id check (e.g., reverts to
the pre-fix `if isinstance(commit_id, str) and commit_id and
headsha:` guard, or replaces `commit_id != headsha` with True),
the missing-commit_id tests above (test_rejects_when_commit_id_missing
in IsOfficialCurrentHeadFailClosed, test_rejects_missing_commit_id_approval
in IsGenuineApprovalContract, test_missing_commit_id_approval_excluded
in ClassifyReviewsContract) would all fail. The reviewer would have
to weaken all three test categories to slip the SEV-1 surface in."""
# Sanity: every missing-commit_id case is a False today.
for bad in [None, "", 0, False]:
with self.subTest(commit_id=bad):
self.assertFalse(
is_official_current_head(_review(commit_id=bad), HEAD)
)
self.assertFalse(
is_genuine_approval(
_review(commit_id=bad), headsha=HEAD
)
)
def test_documented_mutation_change_neq_to_eq_fails(self):
"""If a reviewer changes `commit_id != headsha` to `commit_id == headsha`
in the head comparison (inverting the check), the stale-head tests
(test_rejects_when_commit_id_stale, test_stale_head_approval_excluded)
would fail because the wrong head would now match."""
self.assertFalse(
is_official_current_head(_review(commit_id=OTHER_HEAD), HEAD)
)
def test_documented_mutation_drop_official_check_fails(self):
"""If a reviewer drops the `if not review.get('official')` check, the
non-official tests (test_rejects_when_official_is_false,
test_rejects_non_official_approval, test_non_official_approval_excluded)
would all fail."""
self.assertFalse(
is_genuine_approval(
_review(state="APPROVED", official=False), headsha=HEAD
)
)
if __name__ == "__main__":
unittest.main()
@@ -50,15 +50,15 @@ class TestQaReviewDirectTrigger:
"pull_request_review must include 'submitted' type"
)
def test_job_guard_has_no_review_state_check(self):
def test_job_guard_requires_approved_state(self):
wf = load_workflow("qa-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state" not in guard, (
"job guard must NOT check review.state (#2159: Gitea 1.22.6 payload unreliable); "
"evaluator (review-check.sh) verifies actual APPROVE via API"
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
assert "github.event_name == 'pull_request_target'" in guard
assert "github.event_name == 'pull_request_review'" in guard
def test_post_step_uses_status_post_token(self):
wf = load_workflow("qa-review.yml")
@@ -91,15 +91,15 @@ class TestSecurityReviewDirectTrigger:
"pull_request_review must include 'submitted' type"
)
def test_job_guard_has_no_review_state_check(self):
def test_job_guard_requires_approved_state(self):
wf = load_workflow("security-review.yml")
guard = _job_guard_string(wf)
assert "github.event.review.state" not in guard, (
"job guard must NOT check review.state (#2159: Gitea 1.22.6 payload unreliable); "
"evaluator (review-check.sh) verifies actual APPROVE via API"
assert "github.event.review.state == 'APPROVED'" in guard, (
"job guard must check review.state for 'APPROVED'"
)
assert "github.event.review.state == 'approved'" in guard, (
"job guard must check review.state for 'approved' (case fallback per #2135)"
)
assert "github.event_name == 'pull_request_target'" in guard
assert "github.event_name == 'pull_request_review'" in guard
def test_post_step_uses_status_post_token(self):
wf = load_workflow("security-review.yml")
@@ -153,7 +153,7 @@ class TestRefireTokenSeparation:
"qa refire must receive STATUS_POST_TOKEN env var"
)
# Evaluator stays on read token
assert "SOP_CHECKLIST_GATE_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"qa refire evaluator must stay on read-scoped token"
)
@@ -163,6 +163,6 @@ class TestRefireTokenSeparation:
assert env.get("STATUS_POST_TOKEN") == "${{ secrets.STATUS_POST_TOKEN }}", (
"security refire must receive STATUS_POST_TOKEN env var"
)
assert "SOP_CHECKLIST_GATE_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
assert "SOP_TIER_CHECK_TOKEN" in env.get("GITEA_TOKEN", "") or "GITHUB_TOKEN" in env.get("GITEA_TOKEN", ""), (
"security refire evaluator must stay on read-scoped token"
)
+42 -63
View File
@@ -14,35 +14,35 @@ spec.loader.exec_module(mq)
def test_latest_statuses_dedupes_by_context_newest_first():
statuses = [
{"context": "CI / all-required (pull_request)", "status": "failure"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "state": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
{"context": "CI / all-required (pull_request)", "status": "success"},
]
latest = mq.latest_statuses_by_context(statuses)
assert latest["CI / all-required (pull_request)"]["status"] == "failure"
assert latest["sop-checklist / all-items-acked (pull_request_target)"]["state"] == "success"
assert latest["sop-checklist / all-items-acked (pull_request)"]["state"] == "success"
def test_required_contexts_green_rejects_missing_and_pending():
latest = mq.latest_statuses_by_context([
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "pending"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "pending"},
])
ok, missing_or_bad = mq.required_contexts_green(
latest,
[
"CI / all-required (pull_request)",
"sop-checklist / all-items-acked (pull_request_target)",
"qa-review / approved (pull_request_target)",
"sop-checklist / all-items-acked (pull_request)",
"qa-review / approved (pull_request)",
],
)
assert ok is False
assert missing_or_bad == [
"sop-checklist / all-items-acked (pull_request_target)=pending",
"qa-review / approved (pull_request_target)=missing",
"sop-checklist / all-items-acked (pull_request)=pending",
"qa-review / approved (pull_request)=missing",
]
@@ -56,7 +56,7 @@ def test_required_contexts_green_rejects_volume_skipped():
latest = mq.latest_statuses_by_context([
{"context": "CI / all-required (pull_request)", "status": "success"},
{
"context": "sop-checklist / all-items-acked (pull_request_target)",
"context": "sop-checklist / all-items-acked (pull_request)",
"status": "pending",
"description": "[volume-skipped] comment-cap=1000 hit; please file ...",
},
@@ -66,12 +66,12 @@ def test_required_contexts_green_rejects_volume_skipped():
latest,
[
"CI / all-required (pull_request)",
"sop-checklist / all-items-acked (pull_request_target)",
"sop-checklist / all-items-acked (pull_request)",
],
)
assert ok is False
assert "sop-checklist / all-items-acked (pull_request_target)=pending" in missing_or_bad
assert "sop-checklist / all-items-acked (pull_request)=pending" in missing_or_bad
def test_choose_next_pr_sorts_by_queue_label_timestamp_then_number():
@@ -129,16 +129,16 @@ def _ready_kwargs(**overrides):
"state": "success",
"statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
],
},
required_contexts=[
"CI / all-required (pull_request)",
"qa-review / approved (pull_request_target)",
"security-review / approved (pull_request_target)",
"sop-checklist / all-items-acked (pull_request_target)",
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
],
required_approvals=2,
approvers={"agent-reviewer-cr2", "agent-researcher"},
@@ -248,7 +248,7 @@ def test_genuine_approvals_counts_two_distinct_on_current_head():
{"state": "APPROVED", "user": {"login": "agent-reviewer-cr2"},
"official": True, "stale": False, "dismissed": False, "commit_id": "HEAD"},
]
approvers, rc = mq.genuine_approvals(reviews, headsha="HEAD", reviewer_set=REVIEWERS)
approvers, rc = mq.genuine_approvals(reviews, head_sha="HEAD", reviewer_set=REVIEWERS)
assert approvers == {"agent-researcher", "agent-reviewer-cr2"}
assert rc == []
@@ -265,7 +265,7 @@ def test_genuine_approvals_ignores_stale_dismissed_and_wrong_head():
{"state": "APPROVED", "user": {"login": "agent-reviewer"},
"official": True, "stale": False, "dismissed": False, "commit_id": "OLD"},
]
approvers, rc = mq.genuine_approvals(reviews, headsha="HEAD", reviewer_set=REVIEWERS)
approvers, rc = mq.genuine_approvals(reviews, head_sha="HEAD", reviewer_set=REVIEWERS)
assert approvers == set()
assert rc == []
@@ -279,7 +279,7 @@ def test_genuine_approvals_ignores_unofficial_and_outsiders():
{"state": "APPROVED", "user": {"login": "hongming-codex-laptop"},
"official": True, "stale": False, "dismissed": False, "commit_id": "HEAD"},
]
approvers, rc = mq.genuine_approvals(reviews, headsha="HEAD", reviewer_set=REVIEWERS)
approvers, rc = mq.genuine_approvals(reviews, head_sha="HEAD", reviewer_set=REVIEWERS)
assert approvers == set()
@@ -291,7 +291,7 @@ def test_genuine_approvals_latest_review_supersedes_earlier():
{"state": "REQUEST_CHANGES", "user": {"login": "agent-reviewer-cr2"},
"official": True, "stale": False, "dismissed": False, "commit_id": "HEAD"},
]
approvers, rc = mq.genuine_approvals(reviews, headsha="HEAD", reviewer_set=REVIEWERS)
approvers, rc = mq.genuine_approvals(reviews, head_sha="HEAD", reviewer_set=REVIEWERS)
assert approvers == set()
assert rc == ["agent-reviewer-cr2"]
@@ -317,26 +317,6 @@ def test_merge_blocked_when_insufficient_genuine_approvals():
def test_governance_red_blocks_merge():
# Uniform gate: qa-review, security-review, sop-checklist are ALWAYS
# required. If any of them fail/pending, the PR is blocked.
pr_status = {
"state": "failure",
"statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "failure"},
{"context": "security-review / approved (pull_request_target)", "status": "pending"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "failure"},
{"context": "Staging SaaS / e2e (pull_request)", "status": "failure"},
],
}
decision = mq.evaluate_merge_readiness(**_ready_kwargs(pr_status=pr_status))
assert decision.ready is False
assert decision.action == "wait"
assert "required contexts not green" in decision.reason
def test_non_required_red_does_not_block_merge():
# Uniform gate flip (CTO #2407): qa-review, security-review, sop-checklist
# are REQUIRED for ALL PRs. A PR with these failing/pending must NOT be
# force-mergeable, even if BP-required CI is green and approvals are genuine.
pr_status = {
"state": "failure",
"statuses": [
@@ -351,7 +331,6 @@ def test_non_required_red_does_not_block_merge():
assert decision.ready is False
assert decision.action == "wait"
assert "required contexts not green" in decision.reason
assert decision.force is False
def test_non_required_advisory_red_does_not_block_merge():
@@ -361,9 +340,9 @@ def test_non_required_advisory_red_does_not_block_merge():
"state": "failure", # combined polluted by advisory non-required reds
"statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
{"context": "Staging SaaS / e2e (pull_request)", "status": "failure"},
],
}
@@ -471,9 +450,9 @@ def test_process_once_holds_pr_on_permanent_merge_error(monkeypatch):
return {"state": "success", "statuses": [{"context": "CI / all-required (push)", "status": "success"}]}
return {"state": "success", "statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
@@ -544,9 +523,9 @@ def _fully_ready_process_once_monkeypatch(monkeypatch, mergeable, calls):
return {"state": "success", "statuses": [{"context": "CI / all-required (push)", "status": "success"}]}
return {"state": "success", "statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
@@ -955,9 +934,9 @@ def _stale_pr_update_409_monkeypatch(monkeypatch, queued_issues, calls):
return {"state": "success", "statuses": [{"context": "CI / all-required (push)", "status": "success"}]}
return {"state": "success", "statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
@@ -1203,7 +1182,7 @@ def test_list_candidate_issues_omits_label_filter_when_auto_discover(monkeypatch
assert captured["query"].get("type") == "pulls"
mq.list_candidate_issues(auto_discover=False)
assert captured["query"].get("label") == "merge-queue"
assert captured["query"].get("labels") == "merge-queue"
def _wire_ready_process_once(monkeypatch, *, issues, pr_payload, calls):
@@ -1232,9 +1211,9 @@ def _wire_ready_process_once(monkeypatch, *, issues, pr_payload, calls):
]}
return {"state": "success", "statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
monkeypatch.setattr(mq, "list_candidate_issues", lambda *, auto_discover: issues)
@@ -1420,9 +1399,9 @@ def _wire_multi_candidate_process_once(monkeypatch, *, issues, pulls, reviews, c
return {"state": "success", "statuses": [{"context": "CI / all-required (push)", "status": "success"}]}
return {"state": "success", "statuses": [
{"context": "CI / all-required (pull_request)", "status": "success"},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
@@ -1557,9 +1536,9 @@ def test_hol_unready_red_required_ci_is_skipped_for_ready_pr(monkeypatch):
return {"state": state,
"statuses": [
{"context": "CI / all-required (pull_request)", "status": state},
{"context": "qa-review / approved (pull_request_target)", "status": "success"},
{"context": "security-review / approved (pull_request_target)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request_target)", "status": "success"},
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
]}
monkeypatch.setattr(mq, "get_combined_status", fake_combined)
@@ -35,33 +35,11 @@ if grep -q '_is_tier_low_pending_ok' .gitea/scripts/gitea-merge-queue.py; then
fi
# 5. No sop-tier-check context references in workflow YAML
if grep -rI --exclude-dir='__pycache__' 'sop-tier-check' .gitea/workflows/; then
if grep -r 'sop-tier-check' .gitea/workflows/; then
echo "FAIL: sop-tier-check context reappeared in workflows" >&2
fail=1
fi
# 6. No SOP_TIER_CHECK_TOKEN references in workflow YAML or scripts
if grep -rI --exclude-dir='__pycache__' --exclude='test_no_tier_regression.sh' 'SOP_TIER_CHECK_TOKEN' .gitea/workflows/ .gitea/scripts/; then
echo "FAIL: SOP_TIER_CHECK_TOKEN reference reappeared (use SOP_CHECKLIST_GATE_TOKEN)" >&2
fail=1
fi
# 7. qa-review and security-review must have labeled/unlabeled triggers (#2139)
for f in .gitea/workflows/qa-review.yml .gitea/workflows/security-review.yml; do
if ! grep -q 'labeled, unlabeled' "$f"; then
echo "FAIL: $f missing labeled/unlabeled triggers (#2139)" >&2
fail=1
fi
done
# 8. qa-review and security-review must NOT have review.state guard (#2159)
for f in .gitea/workflows/qa-review.yml .gitea/workflows/security-review.yml; do
if grep -q 'github.event.review.state' "$f"; then
echo "FAIL: $f has review.state guard reappeared (#2159)" >&2
fail=1
fi
done
if [ "$fail" -eq 1 ]; then
echo "TIER_REGRESSION_DETECTED" >&2
exit 1
-21
View File
@@ -25,11 +25,6 @@
# T20 — ai-sop-ack APPROVED review excluded from security-review gate
# T21 — stale-head APPROVED review → exit 1 (commit_id mismatch)
# T22 — missing/non-official APPROVED review → exit 1 (official != true)
# T23 — missing-commit_id APPROVED review → exit 1 (SEV-1 internal#812
# fail-closed contract: a missing/empty commit_id is REJECTED, not
# silently accepted as "older Gitea row" the way the pre-fix
# gitea-merge-queue.py did. Closes the spoof-bug surface that
# #843 had.)
#
# Hostile-self-review (per feedback_assert_exact_not_substring):
# this test MUST FAIL if the script is absent. Verified by running
@@ -432,22 +427,6 @@ T22_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T22 exit code 1 (missing official rejected)" "1" "$T22_RC"
assert_contains "T22 no candidates error" "no candidates from reviews API or issue comments" "$T22_OUT"
# T23 — missing-commit_id APPROVED review must be rejected.
# SEV-1 internal#812 (supersedes closed internal#843). A review with NO
# commit_id field is the spoof-bug signature: a real reviewer cannot
# have submitted against a commit that doesn't exist. The fail-closed
# SSOT must REJECT — the pre-fix gitea-merge-queue.py silently accepted
# these (the "older Gitea row" escape hatch), which is the exact surface
# that closed #843 had. The Python unit tests in
# test_approval_validator.py cover the predicate at the unit level;
# this T23 covers the bash + jq pipeline end-to-end.
echo
echo "== T23 missing commit_id APPROVED review rejected (SEV-1 fail-closed) =="
T23_OUT=$(run_review_check "T23_missing_commit_id")
T23_RC=$(cat "$FIX_STATE_DIR/last_rc")
assert_eq "T23 exit code 1 (missing commit_id rejected)" "1" "$T23_RC"
assert_contains "T23 no candidates error" "no candidates from reviews API or issue comments" "$T23_OUT"
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL"
+1 -5
View File
@@ -149,11 +149,7 @@ items:
- slug: memory-consulted
numeric_alias: 7
# #1973: normalize marker so it matches the slug. Previously the
# slash produced a checklist status that never resolved because
# normalize_slug() collapses / to - and the Gitea PR body parser
# would not find the expected heading.
pr_section_marker: "Memory consulted"
pr_section_marker: "Memory/saved-feedback consulted"
required_teams: [engineers]
ai_ack_eligible: true
description: >-
+1 -1
View File
@@ -42,7 +42,7 @@ jobs:
- name: Detect force-merge + emit audit event
env:
# Same org-level secret the sop-checklist workflow uses.
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
+1 -1
View File
@@ -81,7 +81,7 @@ jobs:
# Gitea persona whose ONLY job is reading branch_protections
# and posting the [ci-drift] tracking issue. The endpoint
# `GET /repos/.../branch_protections/{branch}` requires
# repo-ADMIN role (Gitea 1.22.6) — the default GITHUB_TOKEN and the
# repo-ADMIN role (Gitea 1.22.6) — SOP_TIER_CHECK_TOKEN and the
# auto-injected GITHUB_TOKEN do NOT have it (read-only / write
# without admin), so the previous fallback chain 403'd.
# Mirrors the controlplane fix landed in CP PR#134.
-10
View File
@@ -148,11 +148,6 @@ jobs:
run: $(go env GOPATH)/bin/golangci-lint run --timeout 3m ./...
- if: ${{ needs.changes.outputs.platform == 'true' }}
name: Diagnostic — per-package verbose 60s
# DIAGNOSTIC ONLY (continue-on-error below): this step exists to dump
# verbose per-package output for triage, NOT to gate. The blocking gate
# is "Run tests with coverage (blocking gate)" immediately below. The
# `set +e` / swallowed exits here are intentional — do not "fix" them
# like a gate; the real gate is the next step.
run: |
set +e
go test -race -v -timeout 60s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
@@ -314,11 +309,6 @@ jobs:
# #1815 — wires coverage into CI so we get a baseline visible on
# every PR. No threshold gate yet; thresholds dial in (Step 3, also
# tracked in #1815) after the team sees what current coverage is.
# Memory: the full vitest+v8-coverage process tree peaks at ~1.33 GB
# (measured 2026-06-08), comfortably within the runner — so this single
# run is BOTH the pass/fail gate and the coverage artifact (one SSOT, no
# split). The earlier intermittent red here was a DisplayTab paste-race
# (fixed in this PR), NOT a coverage OOM.
run: npx vitest run --coverage
- name: Upload coverage summary as artifact
if: ${{ needs.changes.outputs.canvas == 'true' }}
-3
View File
@@ -429,9 +429,6 @@ jobs:
# round-trip is covered by the priority-runtimes `mock` arm, not here.
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_keyless_feature_contracts_e2e.sh
- name: Run user_tasks E2E (REST + MCP — agent→user action requests)
if: needs.detect-changes.outputs.api == 'true'
run: bash tests/e2e/test_user_tasks_e2e.sh
- name: Run secrets-dispatch contract test (keyless SECRETS_JSON branch order)
# Previously orphaned (no workflow referenced it). Hermetic unit-style
# contract over test_staging_full_saas.sh's LLM-key branch precedence —
-352
View File
@@ -54,13 +54,6 @@ on:
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- 'tests/e2e/test_staging_concierge_e2e.sh'
- 'tests/e2e/test_staging_concierge_creates_workspace_e2e.sh'
- 'workspace-server/internal/staginge2e/**'
- 'workspace-server/internal/handlers/platform_agent.go'
- 'workspace-server/internal/handlers/user_tasks.go'
- 'workspace-server/internal/handlers/llm_billing_mode_handler.go'
- 'workspace-server/internal/handlers/discovery.go'
- '.gitea/workflows/e2e-staging-saas.yml'
pull_request:
branches: [main]
@@ -76,13 +69,6 @@ on:
- 'tests/e2e/lib/model_slug.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/test_aws_leak_check.sh'
- 'tests/e2e/test_staging_concierge_e2e.sh'
- 'tests/e2e/test_staging_concierge_creates_workspace_e2e.sh'
- 'workspace-server/internal/staginge2e/**'
- 'workspace-server/internal/handlers/platform_agent.go'
- 'workspace-server/internal/handlers/user_tasks.go'
- 'workspace-server/internal/handlers/llm_billing_mode_handler.go'
- 'workspace-server/internal/handlers/discovery.go'
- '.gitea/workflows/e2e-staging-saas.yml'
workflow_dispatch:
schedule:
@@ -510,341 +496,3 @@ jobs:
echo "::warning::platform-boot teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
# ── CONCIERGE user_tasks PRIMITIVE (Feature 3) — real-staging REST+MCP+authz ──
#
# Drives tests/e2e/test_staging_concierge_e2e.sh against a fresh throwaway
# tenant: the full agent→user "ask" contract over BOTH surfaces (REST +
# the MCP tools/call envelope a canvas concierge agent uses) PLUS the
# cross-workspace authz scoping (ws-B can't touch ws-A's task). Reuses the
# same CP-admin org-provision/teardown scaffolding + _lib.sh + AWS-leak-check
# lib as the full-SaaS harness (the script SOURCEs them — no duplication).
#
# GATING (no continue-on-error): user_tasks is a pure DB/handler primitive
# with NO LLM container dependency (workspaces are created 'external' — row
# only, no EC2), so this is fast (~provision + TLS, no 10-min cold boot) and
# NOT subject to the cp#245 boot-timeout flake the full-SaaS job carries. It
# therefore has no honest reason to be masked. Runs on push-to-main /
# workflow_dispatch / cron only (needs live staging infra — never on PR, where
# the pr-validate job above already posts the workflow's PR status).
# bp-required: pending #2430
e2e-staging-concierge-user-tasks:
name: E2E Staging Concierge user_tasks
runs-on: ubuntu-latest
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
timeout-minutes: 30
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
- name: Verify admin token + AWS creds present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — EC2 leak verification cannot run"
exit 2
fi
done
echo "Admin token + AWS creds present ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Skipping — not a workspace bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run concierge user_tasks E2E
run: bash tests/e2e/test_staging_concierge_e2e.sh
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
# Sweep any e2e-cncrg-YYYYMMDD-<run_id>-* org this run created if the
# script died before its EXIT trap fired. Run-id scoped so it never
# stomps a concurrent run's fresh tenant (see the saas job's note).
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
if run_id:
prefixes = tuple(f'e2e-cncrg-{d}-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-cncrg-{d}-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/cncrg-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/cncrg-cleanup.code
set -e
code=$(cat /tmp/cncrg-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::concierge teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/cncrg-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::concierge teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
# ── CONCIERGE FUNCTIONAL: it ACTUALLY CREATES A WORKSPACE (real-LLM) ─────────
#
# Drives tests/e2e/test_staging_concierge_creates_workspace_e2e.sh — the
# RFC docs/design/rfc-platform-agent.md §11.4 "Reach" check turned into a gate:
# send the org concierge a natural-language A2A message ("create a workspace
# named e2e-cncrg-worker-<runid> with role engineer") and assert the
# DETERMINISTIC SIDE EFFECT — that named workspace now EXISTS in GET /workspaces
# — which can only happen if the concierge's LLM really invoked the
# create_workspace platform-MCP tool (a real org mutation), NOT just that a REST
# API returned 200.
#
# GATING (no continue-on-error), but FALSE-GREEN-PROOF via E2E_REQUIRE_LIVE=1:
# this is a REAL-LLM, REAL-tool test, so it depends on the concierge being
# provisioned on the DEDICATED platform-agent image (Dockerfile.platform-agent,
# ships /opt/molecule-mcp-server — the ONLY image where create_workspace lights
# up; see platform_agent.go's SELF-HOST CAVEAT). A parallel agent is wiring that
# image into the staging provision path. The script SKIPs LOUD when the
# concierge is absent / not online / not on the platform-agent image — but with
# E2E_REQUIRE_LIVE=1 the harness converts that skip into a HARD FAIL (exit 5) so
# a silently-missing platform-agent image can NEVER false-green this gate. Runs
# on push-to-main / workflow_dispatch / cron only (needs live staging infra +
# a model — never on PR, where pr-validate posts the workflow's PR status).
# bp-required: pending #2430
e2e-staging-concierge-creates-workspace:
name: E2E Staging Concierge Creates Workspace
runs-on: ubuntu-latest
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
timeout-minutes: 45
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
# The concierge is platform_managed on SaaS (the CP-exported LLM proxy
# supplies its model — no BYOK key needed for the concierge itself). The
# MiniMax key is wired anyway so a staging image that boots the concierge
# BYOK-MiniMax (parallel-agent image work) still has a model; harmless when
# the concierge is platform-managed.
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
# False-green guard: a concierge that is absent / not on the platform-agent
# image / never online must FAIL this gate (exit 5), not silently skip.
E2E_REQUIRE_LIVE: '1'
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
- name: Verify admin token + AWS creds present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — EC2 leak verification cannot run"
exit 2
fi
done
echo "Admin token + AWS creds present ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Skipping — not a workspace bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run concierge-creates-workspace functional E2E
run: bash tests/e2e/test_staging_concierge_creates_workspace_e2e.sh
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
# Sweep any e2e-cncrg-mk-YYYYMMDD-<run_id>-* org this run created if the
# script died before its EXIT trap fired. Run-id scoped so it never
# stomps a concurrent run's fresh tenant.
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
if run_id:
prefixes = tuple(f'e2e-cncrg-mk-{d}-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-cncrg-mk-{d}-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
set +e
curl -sS -o /tmp/cncrg-mk-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/cncrg-mk-cleanup.code
set -e
code=$(cat /tmp/cncrg-mk-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::concierge-mk teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/cncrg-mk-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::concierge-mk teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
# ── CONCIERGE / PLATFORM-AGENT Go staginge2e (Features 1,2,4,5,6) ────────────
#
# Drives TestConciergePlatformAgent_Staging (workspace-server/internal/
# staginge2e/concierge_platform_test.go), which REUSES the lifecycle suite's
# harness (requireStagingEnv / adminCreateOrg / tenantAdminToken /
# tenantCreateWorkspace / doTenantJSON / jsonField) to assert, against a real
# tenant: platform-agent install + /org/identity (1), kind on the workspace
# API (2), discovery peers admin-auth regression guard (4), BYOK billing-mode
# round-trip (5), and the concierge config-tab auth sweep (6). It asserts
# OBSERVABLE state (sole root re-parenting, kind discriminator, resolved_mode,
# non-401 tabs) — not just HTTP 200.
#
# Two jobs, mirroring e2e-workspace-lifecycle.yml's honest pattern:
# • concierge-compile-skip (every push/PR/dispatch): proves the staginge2e
# suite still COMPILES under -tags=staging_e2e and SKIPs LOUD without
# creds. GATING (no mask) — a broken test file fails at PR time.
# • concierge-staging (push-to-main/dispatch/cron): the real live run with
# staging creds + t.Cleanup teardown.
# bp-exempt: PR-time compile-only check (build the concierge e2e test, then
# skip execution — no staging creds on PR). pr-validate posts the workflow's
# PR status; this job is not itself a branch-protection gate.
e2e-staging-concierge-compile-skip:
name: E2E Staging Concierge (compile+skip)
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: go vet (staging_e2e tag)
working-directory: workspace-server
run: go vet -tags staging_e2e ./internal/staginge2e/...
- name: Compile + skip-run (must SKIP LOUD without STAGING_E2E)
working-directory: workspace-server
run: |
# No STAGING_E2E / creds → the suite MUST skip (not pass-with-zero-
# assertions). go test exit 0 with a SKIP line is the contract.
out=$(go test -tags staging_e2e ./internal/staginge2e/ -run TestConciergePlatformAgent -count=1 -v 2>&1)
echo "$out"
echo "$out" | grep -q "SKIP: TestConciergePlatformAgent_Staging" \
|| { echo "::error::expected a LOUD skip of TestConciergePlatformAgent_Staging without creds"; exit 1; }
# bp-required: pending #2430
e2e-staging-concierge-platform:
name: E2E Staging Concierge Platform Agent
runs-on: ubuntu-latest
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
timeout-minutes: 40
permissions:
contents: read
env:
CP_BASE_URL: https://staging-api.moleculesai.app
CP_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
STAGING_E2E: '1'
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Verify admin token present
run: |
if [ -z "$CP_ADMIN_API_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$CP_BASE_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (HTTP $code) — infra, not a concierge bug."
exit 1
fi
echo "Staging CP healthy"
- name: Run concierge/platform-agent staginge2e
working-directory: workspace-server
run: go test -tags staging_e2e ./internal/staginge2e/ -run TestConciergePlatformAgent_Staging -count=1 -v -timeout 35m
# Teardown: the test installs a t.Cleanup admin-DELETE of its own tenant
# (e2e-cncrg-* slug), running even on a t.Fatal. The age-guarded
# sweep-stale-e2e-orgs workflow (30-min floor, e2e- prefix) is the final
# net for a tenant orphaned by a hard runner cancel.
+2 -2
View File
@@ -82,7 +82,7 @@ jobs:
- name: Run gate-check-v3 (single PR mode)
if: github.event_name == 'pull_request_target' || github.event.inputs.pr_number != ''
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.inputs.pr_number }}
POST_COMMENT: ${{ github.event.inputs.post_comment || 'true' }}
@@ -97,7 +97,7 @@ jobs:
- name: Run gate-check-v3 (all open PRs — cron mode)
if: github.event_name == 'schedule'
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
REPO: ${{ github.repository }}
run: |
@@ -244,12 +244,7 @@ jobs:
# fail if any didn't land — that would be a real regression we
# want loud.
# workspace_schedules added for the #2149 scheduler integration tests.
# workspace_auth_tokens + org_api_tokens added for the #2156
# registry-auth TestIntegration_ suite (#2148). Without this
# guard, a silently-skipped migration 020 (workspace_auth_tokens)
# or 035 (org_api_tokens) would let the auth tests run against
# missing tables and falsely green.
for tbl in delegations workspaces activity_logs pending_uploads workspace_schedules workspace_auth_tokens org_api_tokens; do
for tbl in delegations workspaces activity_logs pending_uploads workspace_schedules; do
if ! psql -h "${PG_HOST}" -U postgres -d molecule -tA \
-c "SELECT 1 FROM information_schema.tables WHERE table_name = '$tbl'" \
| grep -q 1; then
+1 -1
View File
@@ -19,7 +19,7 @@
# Forward-compat scope:
# Today (2026-05-11) molecule-core/main protects 3 contexts:
# - "Secret scan / Scan diff for credential-shaped strings (pull_request)"
# - "sop-checklist / all-items-acked (pull_request)"
# - "sop-checklist / tier-check (pull_request)"
# - "CI / all-required (pull_request)"
# Per RFC#324 Step 2 the required-list expands to ~5 contexts
# (qa-review, security-review added). Each new required context's
-395
View File
@@ -1,395 +0,0 @@
name: Local Provision Lifecycle E2E
# MANDATORY coverage for the LOCAL Docker provisioner (MOLECULE_ENV=development,
# docker.sock) — the path self-hosters + dev runs use. Every OTHER e2e exercises
# the SaaS/EC2 (control-plane) provisioner; nothing mandatory drove the local
# Docker path, which is why a config-volume restart-survival bug went undetected.
# This workflow provisions a REAL workspace via the local Docker provisioner and
# asserts the full lifecycle, INCLUDING the restart-survival assertion.
#
# Two jobs:
# * lifecycle-stub (REQUIRED gate) — builds the tiny stub runtime image, tags
# it to the provisioner's RegistryModeLocal cache tag, and runs the full
# lifecycle e2e (provision -> online -> restart-survive -> proxy-reach). Fast
# (seconds of agent boot, no LLM, no 2.5GB image).
# * lifecycle-real (ADVISORY, continue-on-error) — runs the SAME script against
# the real claude-code template image with a REAL MiniMax BYOK credential
# (LIFECYCLE_LLM=minimax). The proxy-reach step asserts an ACTUAL model reply
# (real round-trip through the ws-<id>:8000 proxy), not just reachability.
# MiniMax is the cheapest LLM the platform offers, and its `minimax` provider
# dials api.minimax.io directly (no CP proxy needed on this local stack).
# Heavy + network-dependent (pulls/builds the template + a real LLM call), so
# it is non-blocking. Needs the MOLECULE_STAGING_MINIMAX_API_KEY CI secret:
# when ABSENT the script SKIPS loud (exit 0) — it never reds on a missing
# secret (serving-e2e skip-if-absent pattern).
#
# SUBSTRATE REQUIREMENT (read before wiring into branch protection)
# -----------------------------------------------------------------
# This workflow provisions SIBLING docker containers from a HOST Go binary via
# the runner's docker.sock — exactly like e2e-api.yml, which already provisions
# the `mock` + `priority-runtimes` arms on `docker-host`. So the docker-in-runner
# capability IS available on the molecule-runner-* (docker-host) lane. If the
# operator ever moves these to a runner WITHOUT docker.sock access for the
# platform binary, this lane will red — keep it on `docker-host`.
#
# Both jobs pin `runs-on: docker-host` (Linux operator-host runners with the
# molecule-core-net bridge + a working docker.sock). The bare `ubuntu-latest`
# label is also advertised by the Windows act_runner, where docker.sock-bound
# steps fail non-deterministically — see lint-required-workflows-docker-host-
# pinned.yml + internal#512.
on:
push:
branches: [main, staging]
pull_request:
branches: [main, staging]
concurrency:
# Per-SHA grouping (mirrors e2e-api.yml). cancel-in-progress:false so a queued
# run for an older SHA isn't cancelled by a newer push (auto-promote brittleness).
group: local-provision-e2e-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# ===========================================================================
# REQUIRED gate — stub runtime, fast. This IS meant to be a required merge gate
# (the only mandatory coverage for the LOCAL Docker provisioner), but the new
# context is not yet in branch_protections/main — wire it in once the operator
# confirms the docker-host runners reliably provision sibling containers from
# the host platform binary for this lane (see SUBSTRATE REQUIREMENT above), then
# flip the directive below to `# bp-required: yes`. Until then it runs gating
# locally (continue-on-error: false) but un-wired in BP, an acknowledged
# asymmetry tracked for follow-up. (Earlier this block read `# bp-exempt`, which
# contradicted "REQUIRED gate" and tripped lint-required-context-exists-in-bp.)
# bp-required: pending #2409
# ===========================================================================
lifecycle-stub:
name: Local Provision Lifecycle E2E (stub)
runs-on: docker-host
continue-on-error: false
timeout-minutes: 15
env:
PG_CONTAINER: pg-lpe2e-${{ github.run_id }}-${{ github.run_attempt }}
REDIS_CONTAINER: redis-lpe2e-${{ github.run_id }}-${{ github.run_attempt }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Ensure provisioner network + pre-pull alpine
run: |
# The local provisioner attaches workspace containers to
# molecule-core-net and seeds /configs via an alpine helper; the
# lifecycle script also uses alpine to seed config.yaml into the
# named config volume. Pre-pull + ensure the bridge (idempotent).
docker pull alpine:3 >/dev/null
docker network create molecule-core-net >/dev/null 2>&1 || true
echo "alpine:3 pre-pulled; molecule-core-net ensured."
- name: Start Postgres (docker, ephemeral host port)
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker run -d --name "$PG_CONTAINER" \
-e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule \
-p 0:5432 postgres:16 >/dev/null
PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -z "$PG_PORT" ] && PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$PG_PORT" ]; then echo "::error::no host port for $PG_CONTAINER"; docker logs "$PG_CONTAINER" || true; exit 1; fi
echo "DATABASE_URL=postgres://dev:dev@127.0.0.1:${PG_PORT}/molecule?sslmode=disable" >> "$GITHUB_ENV"
for i in $(seq 1 30); do
docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1 && { echo "pg ready ${i}s"; exit 0; }
sleep 1
done
echo "::error::Postgres not ready in 30s"; docker logs "$PG_CONTAINER" || true; exit 1
- name: Start Redis (docker, ephemeral host port)
run: |
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
docker run -d --name "$REDIS_CONTAINER" -p 0:6379 redis:7 >/dev/null
REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -z "$REDIS_PORT" ] && REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$REDIS_PORT" ]; then echo "::error::no host port for $REDIS_CONTAINER"; docker logs "$REDIS_CONTAINER" || true; exit 1; fi
echo "REDIS_URL=redis://127.0.0.1:${REDIS_PORT}" >> "$GITHUB_ENV"
for i in $(seq 1 15); do
docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG && { echo "redis ready ${i}s"; exit 0; }
sleep 1
done
echo "::error::Redis not ready in 15s"; docker logs "$REDIS_CONTAINER" || true; exit 1
- name: Configure platform env (admin token + local Docker provisioner)
run: |
# Deterministic admin token: the script sends MOLECULE_ADMIN_TOKEN as the
# bearer; the platform checks ADMIN_TOKEN. Set both to the same value.
T="lpe2e-admin-${{ github.run_id }}-${{ github.run_attempt }}"
echo "ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "MOLECULE_ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "BASE=http://localhost:8080" >> "$GITHUB_ENV"
# MOLECULE_ENV=development: dev posture. MOLECULE_ORG_ID is left UNSET so
# main.go wires the LOCAL Docker provisioner (not the CP provisioner), and
# MOLECULE_IMAGE_REGISTRY is left UNSET so image resolution uses
# RegistryModeLocal (the dockerHasTag cache-check the stub pre-tags into).
echo "MOLECULE_ENV=development" >> "$GITHUB_ENV"
echo "SECRETS_ENCRYPTION_KEY=lpe2e-test-encryption-key-32bytes!!" >> "$GITHUB_ENV"
- name: Build platform
working-directory: workspace-server
run: go build -o platform-server ./cmd/server
- name: Kill stale platform-server before start (issue #1046)
run: |
# ROOT CAUSE of the stub-gate red on docker-host: both this gating job
# and the advisory lifecycle-real job bind the SAME fixed host port
# :8080 (PORT=8080 ./platform-server). On the small docker-host runner
# pool a prior cancelled/timeout run can leave a zombie platform-server
# on :8080 (a cancelled run never reaches "Stop platform"), and — until
# lifecycle-real was serialised behind this job via needs: — the two
# jobs could also co-schedule on one runner and contend for :8080. A
# second bind on :8080 is FATAL (the server exits), so "Wait for
# /health" times out at 300s and this REQUIRED gate reds. Free the port
# before binding — mirrors the e2e-api.yml #1046 fix for the identical
# fixed-port-on-shared-runner class.
#
# /proc scan — works on any Linux without pkill/lsof/ss. comm is
# truncated to 15 chars: "platform-serve" matches "platform-server".
# Verify via cmdline to avoid false positives.
killed=0
for pid in $(grep -l "platform-serve" /proc/[0-9]*/comm 2>/dev/null); do
kpid="${pid%/comm}"; kpid="${kpid##*/}"
cmdline=$(cat "/proc/${kpid}/cmdline" 2>/dev/null | tr '\0' ' ')
if echo "$cmdline" | grep -q "platform-server"; then
echo "Killing stale platform-server pid ${kpid}: ${cmdline}"
kill "$kpid" 2>/dev/null || true
killed=$((killed + 1))
fi
done
if [ "$killed" -gt 0 ]; then echo "Killed $killed stale platform-server process(es)."; else echo "No platform-server-named process found."; fi
# Belt-and-braces: also free :8080 from ANY holder regardless of process
# name. A differently-named squatter (e.g. a leftover Fastify dev server
# from another job) survives the comm-name scan above, makes our bind
# FATAL, and can false-positive the /health probe below (no-flakes RCA;
# tracked alongside #2430). fuser/lsof are present on the ubuntu runner;
# if neither exists the name-scan above is the floor.
if command -v fuser >/dev/null 2>&1; then fuser -k 8080/tcp 2>/dev/null || true; fi
if command -v lsof >/dev/null 2>&1; then lsof -ti tcp:8080 2>/dev/null | xargs -r kill -9 2>/dev/null || true; fi
sleep 2
echo ":8080 freed (comm-scan + port-scan swept any squatter)."
- name: Start platform (background)
working-directory: workspace-server
run: |
# Bind to :8080 (the script's BASE). DATABASE_URL/REDIS_URL/ADMIN_TOKEN/
# MOLECULE_ENV are inherited from $GITHUB_ENV.
PORT=8080 ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
run: |
DEADLINE=300; PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"; start=$(date +%s)
while :; do
# Verify OUR server owns :8080 BEFORE trusting /health. Our server binds
# :8080 or exits FATAL, so "our PID alive" <=> "we own :8080"; checking it
# first stops a squatter that answers /health on :8080 (our bind having
# failed) from false-positiving the gate (no-flakes RCA).
if [ -n "$PID" ] && ! kill -0 "$PID" 2>/dev/null; then
echo "::error::platform-server exited early (failed to bind :8080 or crashed)"; cat workspace-server/platform.log || true; exit 1
fi
if curl -sf "$BASE/health" >/dev/null; then
tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc \
"SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'" 2>/dev/null || echo 0)
[ "$tables" = "1" ] && { echo "healthy + migrated after $(( $(date +%s) - start ))s"; exit 0; }
fi
[ "$(( $(date +%s) - start ))" -ge "$DEADLINE" ] && { echo "::error::platform not healthy in ${DEADLINE}s"; cat workspace-server/platform.log || true; exit 1; }
sleep 1
done
- name: Run local-provision lifecycle E2E (stub — REQUIRED)
run: bash tests/e2e/test_local_provision_lifecycle_e2e.sh
- name: Dump platform log on failure
if: failure()
run: cat workspace-server/platform.log || true
- name: Stop platform
if: always()
run: |
[ -f workspace-server/platform.pid ] && kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
- name: Stop service containers
if: always()
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
# ===========================================================================
# ADVISORY — real claude-code image, lifecycle-only. Non-blocking. It pulls/
# builds the 2.5GB template image, makes a real (cheap) MiniMax LLM call, and is
# network-dependent, so a miss must not block. It proves the REAL runtime
# survives a restart AND serves a genuine LLM round-trip on the local
# provisioner (proxy-reach asserts a real MiniMax reply, not just reachability).
# ===========================================================================
# bp-exempt: advisory lane (continue-on-error: true) — informational, never a merge gate.
lifecycle-real:
name: Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory)
runs-on: docker-host
# Serialise behind the gating stub job: both jobs bind the SAME fixed host
# port :8080, so co-scheduling them on one docker-host runner makes the
# second platform-server fail to bind (fatal) and reds whichever lost the
# race. `needs:` forces this advisory job to start only AFTER lifecycle-stub
# finishes, so they never contend for :8080. continue-on-error keeps a real-
# job miss non-blocking; `needs:` does NOT gate on the stub's success (a
# failed required gate still lets this advisory dependent run).
needs: lifecycle-stub
if: ${{ always() }}
# Tracker for lint-continue-on-error-tracking (Tier 2e / internal#350): this
# mask has a forced 14-day renewal cycle. mc#2408 tracks promoting this
# advisory MiniMax round-trip to a gating job (then flip to false).
continue-on-error: true # mc#2408 — promote advisory MiniMax e2e to gating
timeout-minutes: 30
env:
PG_CONTAINER: pg-lpe2e-real-${{ github.run_id }}-${{ github.run_attempt }}
REDIS_CONTAINER: redis-lpe2e-real-${{ github.run_id }}-${{ github.run_attempt }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
cache: true
cache-dependency-path: workspace-server/go.sum
- name: Ensure provisioner network + pre-pull alpine
run: |
docker pull alpine:3 >/dev/null
docker network create molecule-core-net >/dev/null 2>&1 || true
- name: Start Postgres (docker, ephemeral host port)
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker run -d --name "$PG_CONTAINER" \
-e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule \
-p 0:5432 postgres:16 >/dev/null
PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -z "$PG_PORT" ] && PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$PG_PORT" ]; then echo "::error::no host port"; docker logs "$PG_CONTAINER" || true; exit 1; fi
echo "DATABASE_URL=postgres://dev:dev@127.0.0.1:${PG_PORT}/molecule?sslmode=disable" >> "$GITHUB_ENV"
for i in $(seq 1 30); do
docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1 && { echo "pg ready ${i}s"; exit 0; }
sleep 1
done
echo "::error::Postgres not ready"; docker logs "$PG_CONTAINER" || true; exit 1
- name: Start Redis (docker, ephemeral host port)
run: |
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
docker run -d --name "$REDIS_CONTAINER" -p 0:6379 redis:7 >/dev/null
REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
[ -z "$REDIS_PORT" ] && REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | head -1 | awk -F: '{print $NF}')
if [ -z "$REDIS_PORT" ]; then echo "::error::no host port"; docker logs "$REDIS_CONTAINER" || true; exit 1; fi
echo "REDIS_URL=redis://127.0.0.1:${REDIS_PORT}" >> "$GITHUB_ENV"
for i in $(seq 1 15); do
docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG && { echo "redis ready ${i}s"; exit 0; }
sleep 1
done
echo "::error::Redis not ready"; docker logs "$REDIS_CONTAINER" || true; exit 1
- name: Configure platform env
run: |
T="lpe2e-real-admin-${{ github.run_id }}-${{ github.run_attempt }}"
echo "ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "MOLECULE_ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "BASE=http://localhost:8080" >> "$GITHUB_ENV"
echo "MOLECULE_ENV=development" >> "$GITHUB_ENV"
echo "SECRETS_ENCRYPTION_KEY=lpe2e-test-encryption-key-32bytes!!" >> "$GITHUB_ENV"
- name: Build platform
working-directory: workspace-server
run: go build -o platform-server ./cmd/server
- name: Kill stale platform-server before start (issue #1046)
run: |
# Same fixed-:8080 hygiene as the stub job — free the port from any
# zombie left by a cancelled run before this job binds it.
killed=0
for pid in $(grep -l "platform-serve" /proc/[0-9]*/comm 2>/dev/null); do
kpid="${pid%/comm}"; kpid="${kpid##*/}"
cmdline=$(cat "/proc/${kpid}/cmdline" 2>/dev/null | tr '\0' ' ')
if echo "$cmdline" | grep -q "platform-server"; then
echo "Killing stale platform-server pid ${kpid}: ${cmdline}"
kill "$kpid" 2>/dev/null || true
killed=$((killed + 1))
fi
done
if [ "$killed" -gt 0 ]; then echo "Killed $killed stale platform-server process(es)."; else echo "No platform-server-named process found."; fi
# Belt-and-braces: free :8080 from ANY holder regardless of process name
# (a differently-named squatter survives the comm-name scan above, makes
# our bind FATAL, and can false-positive the /health probe). Mirrors the
# stub job's no-flakes fix (tracked alongside #2430).
if command -v fuser >/dev/null 2>&1; then fuser -k 8080/tcp 2>/dev/null || true; fi
if command -v lsof >/dev/null 2>&1; then lsof -ti tcp:8080 2>/dev/null | xargs -r kill -9 2>/dev/null || true; fi
sleep 2
echo ":8080 freed (comm-scan + port-scan swept any squatter)."
- name: Start platform (background)
working-directory: workspace-server
run: |
PORT=8080 ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
run: |
DEADLINE=300; PID="$(cat workspace-server/platform.pid 2>/dev/null || true)"; start=$(date +%s)
while :; do
# Verify OUR server owns :8080 before trusting /health (no-flakes RCA):
# our server binds :8080 or exits FATAL, so checking our PID first stops
# a squatter answering /health on :8080 from false-positiving the gate.
if [ -n "$PID" ] && ! kill -0 "$PID" 2>/dev/null; then
echo "::error::platform-server exited early (failed to bind :8080 or crashed)"; cat workspace-server/platform.log || true; exit 1
fi
if curl -sf "$BASE/health" >/dev/null; then
tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc \
"SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'" 2>/dev/null || echo 0)
[ "$tables" = "1" ] && { echo "healthy after $(( $(date +%s) - start ))s"; exit 0; }
fi
[ "$(( $(date +%s) - start ))" -ge "$DEADLINE" ] && { echo "::error::platform not healthy in ${DEADLINE}s"; cat workspace-server/platform.log || true; exit 1; }
sleep 1
done
- name: Run local-provision lifecycle E2E (real image + MiniMax LLM — ADVISORY)
env:
# LIFECYCLE_LLM=minimax: provision the REAL claude-code template image
# (the mode forces LIFECYCLE_PROVISIONER_BUILDS=1 — the provisioner
# clones + docker-builds the template from Gitea via RegistryModeLocal)
# with a real MiniMax BYOK credential, and assert an ACTUAL model reply
# at the proxy-reach step (a genuine round-trip through ws-<id>:8000).
# MiniMax is the cheapest LLM the platform offers; its `minimax`
# provider dials api.minimax.io directly, so no CP proxy env is needed.
#
# Key wiring (DO NOT hardcode): the script reads MINIMAX_API_KEY from
# the env; we feed it from the MOLECULE_STAGING_MINIMAX_API_KEY CI
# secret (the same secret the staging-smoke + e2e-api MiniMax arms use).
# When that secret is ABSENT, MINIMAX_API_KEY is empty and the script
# SKIPS loud (exit 0) — it never reds on a missing secret (serving-e2e
# skip-if-absent pattern). The advisory job stays green either way.
LIFECYCLE_LLM: minimax
MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
run: bash tests/e2e/test_local_provision_lifecycle_e2e.sh
- name: Dump platform log on failure
if: failure()
run: cat workspace-server/platform.log || true
- name: Stop platform
if: always()
run: |
[ -f workspace-server/platform.pid ] && kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
- name: Stop service containers
if: always()
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+14 -27
View File
@@ -7,25 +7,18 @@
#
# A1-α (refire mechanism):
# Triggers on:
# - `pull_request_target`: opened, synchronize, reopened, labeled, unlabeled
# → initial status posts when PR opens / re-pushes, and re-evaluates
# when labels change (e.g. risk-indicator labels).
# - `pull_request_target`: opened, synchronize, reopened
# → initial status posts when PR opens / re-pushes
# - `pull_request_review` types: [submitted]
# → re-evaluate when a team member submits an APPROVE review so
# the gate flips immediately (no wait for the next push or
# slash-command). Verified live: sop-checklist.yml uses this
# same event and provably fires (produces
# `sop-checklist / all-items-acked (pull_request_review)` contexts).
# The job-level `if:` does NOT guard on review.state (issue
# #2159): Gitea 1.22.6's payload shape for this event does not
# reliably expose the state field that the GitHub-style guard
# expects. The evaluator (review-check.sh) reads actual reviews
# from the API and checks for a real APPROVE, so running on
# COMMENT or REQUEST_CHANGES is harmless (read-only,
# idempotent). Branch-protection requires the
# `(pull_request_target)` context variant, so the review-event
# path EXPLICITLY POSTS the required context via the API. Trust
# boundary preserved (BASE ref, no PR-head).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so
# only APPROVE reviews run the evaluator; COMMENT and
# REQUEST_CHANGES are skipped at the job level.
# Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS
# the required context via the API. Trust boundary preserved
@@ -103,7 +96,7 @@ name: qa-review
on:
pull_request_target:
types: [opened, synchronize, reopened, labeled, unlabeled]
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
@@ -117,19 +110,13 @@ jobs:
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review events: always run. We do NOT guard on
# review.state here because Gitea 1.22.6's payload shape for this
# event does not reliably expose the state field (issue #2159).
# The evaluator (review-check.sh) reads actual reviews from the
# API and checks for a real APPROVE, so running on COMMENT or
# REQUEST_CHANGES is harmless (read-only, idempotent).
# - On labeled/unlabeled events: re-evaluate when labels change.
# This ensures qa-review flips when risk-indicator labels are
# added or removed.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
if: |
github.event_name == 'pull_request_target' ||
github.event_name == 'pull_request_review'
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -143,7 +130,7 @@ jobs:
# no comment.user.login so the step is a no-op skip there.
if: github.event_name == 'issue_comment'
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
login="${{ github.event.comment.user.login }}"
@@ -175,7 +162,7 @@ jobs:
- name: Evaluate qa-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
# PR number lives in different places per event:
@@ -198,7 +185,7 @@ jobs:
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_CHECKLIST_GATE_TOKEN (read-only) per deliberate security
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
-19
View File
@@ -21,21 +21,15 @@ on:
branches: [main, staging]
paths:
- '.gitea/scripts/review-check.sh'
- '.gitea/scripts/_approval_validator.py'
- '.gitea/scripts/_review_check_filter.py'
- '.gitea/scripts/tests/test_review_check.sh'
- '.gitea/scripts/tests/_review_check_fixture.py'
- '.gitea/scripts/tests/test_approval_validator.py'
- '.gitea/workflows/review-check-tests.yml'
pull_request:
branches: [main, staging]
paths:
- '.gitea/scripts/review-check.sh'
- '.gitea/scripts/_approval_validator.py'
- '.gitea/scripts/_review_check_filter.py'
- '.gitea/scripts/tests/test_review_check.sh'
- '.gitea/scripts/tests/_review_check_fixture.py'
- '.gitea/scripts/tests/test_approval_validator.py'
- '.gitea/workflows/review-check-tests.yml'
workflow_dispatch:
@@ -76,16 +70,3 @@ jobs:
- name: Run review-check.sh regression suite
run: bash .gitea/scripts/tests/test_review_check.sh
- name: SSOT approval-validator unit tests (SEV-1 internal#812)
# The Python unit tests for _approval_validator.py are
# mutation-verified — every fail-closed branch has an explicit
# REJECT assertion. A reviewer who weakens the predicate trips
# these in CI.
run: |
# The test file lives in .gitea/scripts/tests/ with no __init__.py,
# so `unittest discover -s .gitea/scripts` finds 0 tests (the SEV-1
# suite silently never ran — a CI gap fixed alongside internal#812).
# Run the file directly; it self-inserts its sys.path and calls
# unittest.main(), so a failing assertion exits non-zero and fails CI.
python3 .gitea/scripts/tests/test_approval_validator.py -v
+14 -23
View File
@@ -12,21 +12,18 @@
# Uses `pull_request_review` types: [submitted] — verified live via
# sop-checklist.yml which provably fires this event (produces
# `sop-checklist / all-items-acked (pull_request_review)` contexts).
# The job-level `if:` does NOT guard on review.state (issue #2159):
# Gitea 1.22.6's payload shape for this event does not reliably expose
# the state field that the GitHub-style guard expects. The evaluator
# (review-check.sh) reads actual reviews from the API and checks for a
# real APPROVE, so running on COMMENT or REQUEST_CHANGES is harmless
# (read-only, idempotent). Branch-protection requires the
# `(pull_request_target)` context variant, so the review-event path
# EXPLICITLY POSTS the required context via the API. Trust boundary
# preserved (BASE ref, no PR-head).
# The job-level `if:` guard checks
# `github.event.review.state == 'APPROVED' || 'approved'` so only APPROVE
# reviews run the evaluator; COMMENT and REQUEST_CHANGES are skipped at
# the job level. Branch-protection requires the `(pull_request_target)`
# context variant, so the review-event path EXPLICITLY POSTS the required
# context via the API. Trust boundary preserved (BASE ref, no PR-head).
name: security-review
on:
pull_request_target:
types: [opened, synchronize, reopened, labeled, unlabeled]
types: [opened, synchronize, reopened]
pull_request_review:
types: [submitted]
@@ -40,19 +37,13 @@ jobs:
approved:
# Gate the job:
# - On pull_request_target events: always run.
# - On pull_request_review events: always run. We do NOT guard on
# review.state here because Gitea 1.22.6's payload shape for this
# event does not reliably expose the state field (issue #2159).
# The evaluator (review-check.sh) reads actual reviews from the
# API and checks for a real APPROVE, so running on COMMENT or
# REQUEST_CHANGES is harmless (read-only, idempotent).
# - On labeled/unlabeled events: re-evaluate when labels change.
# This ensures security-review flips when risk-indicator labels
# are added or removed.
# - On pull_request_review_approved events: run so the gate flips
# immediately when a team member submits an APPROVE review.
# Comment-triggered refires live in sop-checklist.yml review-refire job.
if: |
github.event_name == 'pull_request_target' ||
github.event_name == 'pull_request_review'
(github.event_name == 'pull_request_review' &&
(github.event.review.state == 'APPROVED' || github.event.review.state == 'approved'))
runs-on: ubuntu-latest
steps:
- name: Privilege check (A1.1 — INFORMATIONAL log only, NOT a gate)
@@ -61,7 +52,7 @@ jobs:
# so re-running on a non-collaborator comment is harmless.
if: github.event_name == 'issue_comment'
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
login="${{ github.event.comment.user.login }}"
@@ -87,7 +78,7 @@ jobs:
- name: Evaluate security-review
id: eval
env:
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
@@ -107,7 +98,7 @@ jobs:
# TOKEN FIX (RC 8326): uses STATUS_POST_TOKEN (CTO-granted,
# msg d52cc72a). Dedicated narrow-scoped write:repository token
# for the explicit status POST. Evaluator step stays on
# SOP_CHECKLIST_GATE_TOKEN (read-only) per deliberate security
# SOP_TIER_CHECK_TOKEN (read-only) per deliberate security
# separation: eval computes, POST writes, never the same cred.
if: github.event_name == 'pull_request_review' && always()
env:
+2 -2
View File
@@ -167,7 +167,7 @@ jobs:
if: steps.classify.outputs.run_qa == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
@@ -186,7 +186,7 @@ jobs:
if: steps.classify.outputs.run_security == 'true'
env:
# Evaluator (review-check.sh + GET /pulls) stays on read-scoped token.
GITEA_TOKEN: ${{ secrets.SOP_CHECKLIST_GATE_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
# Explicit POST /statuses uses narrow-scoped write:repository token.
STATUS_POST_TOKEN: ${{ secrets.STATUS_POST_TOKEN }}
GITEA_HOST: git.moleculesai.app
+9 -38
View File
@@ -58,51 +58,22 @@ jobs:
python-version: '3.11'
- name: Install .gitea script test dependencies
run: python -m pip install --quiet 'pytest==9.0.2' 'PyYAML==6.0.2'
- name: Run scripts/ unittests (fail-closed on 0 collected)
- name: Run scripts/ unittests, if any
# Top-level scripts/ tests live alongside their target file. The
# runtime packaging tests moved to molecule-ai-workspace-runtime, so
# this pass may legitimately find NO test files today.
#
# Gate-integrity fix: the previous guard keyed off `rc==5` to detect
# "no tests collected", but Python 3.12's unittest exits 0 (not 5)
# when discovery finds 0 tests ("NO TESTS RAN"). The guard therefore
# never fired, so any test_*.py added here would silently run 0 tests
# while this step stayed GREEN. A green step that runs 0 tests is
# worse than a red one. We now fail-closed:
# - genuinely NO test_*.py present -> loud SKIP (legitimate no-op)
# - test_*.py present but 0 collected -> FAIL (broken import/empty)
# this pass may legitimately find no tests.
working-directory: scripts
run: |
set -euo pipefail
# Non-recursive count: scripts/ has no __init__.py, so unittest
# discover does not recurse into subdirs (ops/ is run separately
# below) — top-level files are the entire discovery scope here.
nfiles=$(find . -maxdepth 1 -name 'test_*.py' | wc -l | tr -d ' ')
if [ "$nfiles" -eq 0 ]; then
echo "SKIP: no top-level scripts/ test_*.py files present (genuine no-op)."
set +e
python -m unittest discover -t . -p 'test_*.py' -v
rc=$?
if [ "$rc" -eq 5 ]; then
echo "No top-level scripts/ unittest files found; skipping."
exit 0
fi
echo "Found $nfiles top-level scripts/ test_*.py file(s); asserting they collect >0 tests."
ncollected=$(python -c "import unittest; print(unittest.TestLoader().discover('.', pattern='test_*.py', top_level_dir='.').countTestCases())")
echo "Collected $ncollected test case(s)."
if [ "$ncollected" -eq 0 ]; then
echo "FAIL: test_*.py file(s) present but 0 tests collected (broken import / empty file / discovery error)."
exit 1
fi
python -m unittest discover -t . -p 'test_*.py' -v
exit "$rc"
- name: Run scripts/ops/ unittests (sweep_cf_decide, ...)
# Real gate: scripts/ops/ must always run tests. Assert >0 collected so
# deleting all test files (or breaking an import) can't pass GREEN by
# running 0 tests — same gate-integrity class as the scripts/ step.
working-directory: scripts/ops
run: |
set -euo pipefail
ncollected=$(python -c "import unittest; print(unittest.TestLoader().discover('.', pattern='test_*.py').countTestCases())")
echo "scripts/ops/ collected $ncollected test case(s)."
if [ "$ncollected" -eq 0 ]; then
echo "FAIL: scripts/ops/ collected 0 tests — this gate must run real tests (deleted/broken import?)."
exit 1
fi
python -m unittest discover -p 'test_*.py' -v
run: python -m unittest discover -p 'test_*.py' -v
- name: Run .gitea/scripts pytest suite
run: python -m pytest .gitea/scripts/tests -q
+1 -11
View File
@@ -4,7 +4,7 @@
# use this Makefile; CI calls docker compose / go test directly so the
# Makefile can evolve without breaking the build.
.PHONY: help dev up down logs build test e2e-peer-visibility e2e-concierge-creates-workspace openapi-spec openapi-spec-check gen gen-docker gen-check gen-check-docker
.PHONY: help dev up down logs build test e2e-peer-visibility openapi-spec openapi-spec-check gen gen-docker gen-check gen-check-docker
# ─── Provider-registry SSOT codegen (internal#718) ─────────────────────
# The Go module lives in workspace-server/. The checked-in artifact
@@ -57,16 +57,6 @@ test: ## Run Go unit tests in workspace-server/.
e2e-peer-visibility: ## Run the LOCAL peer-visibility MCP gate vs the running stack (needs `make up` first).
bash tests/e2e/test_peer_visibility_mcp_local.sh
# FUNCTIONAL local proof that the org concierge actually DOES org-management:
# send it a natural-language A2A request and assert it really CREATES a workspace
# via its platform MCP (create_workspace) — the deterministic side effect, not a
# REST 200. SKIPs LOUD (exit 0) unless the local concierge is seeded, online, and
# running on the platform-agent image (so create_workspace exists). To run it
# green locally: seed the concierge (MOLECULE_SEED_PLATFORM_AGENT=1) on the
# platform-agent image WITH a model key. See the script header for the contract.
e2e-concierge-creates-workspace: ## Prove the concierge actually creates a workspace via its platform MCP (skips loud if not runnable).
bash tests/e2e/test_concierge_creates_workspace_local.sh
# ─── OpenAPI spec generation (RFC #1706, Phase 1) ─────────────────────
# Regenerate workspace-server/docs/openapi/swagger.{yaml,json} from
# swaggo annotations on the gin handlers. Commit the output. CI runs
-10
View File
@@ -1,14 +1,7 @@
import { test, expect } from "@playwright/test";
import type { Page } from "@playwright/test";
import { startEchoRuntime } from "./fixtures/echo-runtime";
import { seedWorkspace, startHeartbeat, cleanupWorkspace } from "./fixtures/chat-seed";
/** Enter the Org-map view so the Canvas (React Flow graph) mounts. */
async function enterMapView(page: Page): Promise<void> {
const btn = page.getByTestId("nav-map");
await expect(btn, "rail button nav-map missing").toBeVisible({ timeout: 10_000 });
await btn.click();
}
test.describe("Desktop ChatTab", () => {
let cleanup: () => Promise<void> = async () => {};
@@ -36,7 +29,6 @@ test.describe("Desktop ChatTab", () => {
test.beforeEach(async ({ page }) => {
await page.setViewportSize({ width: 1280, height: 800 });
await page.goto("/");
await enterMapView(page);
await page.waitForSelector(".react-flow__node", { timeout: 10_000 });
// Dismiss onboarding guide if present.
const skipGuide = page.getByText("Skip guide");
@@ -75,7 +67,6 @@ test.describe("Desktop ChatTab", () => {
await expect(page.getByText("Echo: Persistence test")).toBeVisible({ timeout: 15_000 });
await page.reload();
await enterMapView(page);
await page.waitForSelector(".react-flow__node", { timeout: 10_000 });
await page.getByText(workspaceName, { exact: true }).first().click();
await page.locator('#tab-chat').click();
@@ -152,7 +143,6 @@ test.describe("Desktop ChatTab — Markdown rendering", () => {
test.beforeEach(async ({ page }) => {
await page.setViewportSize({ width: 1280, height: 800 });
await page.goto("/");
await enterMapView(page);
await page.waitForSelector(".react-flow__node", { timeout: 10_000 });
const skipGuide2 = page.getByText("Skip guide");
if (await skipGuide2.isVisible().catch(() => false)) {
-648
View File
@@ -1,648 +0,0 @@
/**
* Staging concierge canvas E2E — exercises the platform-agent CONCIERGE shell
* (canvas/src/components/concierge/ConciergeShell.tsx and the Settings split)
* against a fresh staging org provisioned by the shared global setup
* (e2e/staging-setup.ts). Each `test.describe` covers ONE concierge function
* and asserts the behaviour works — not merely that an element exists.
*
* Why this is a SEPARATE spec from staging-tabs.spec.ts (which drives the
* Org-map SidePanel tab UI): the two assert different surfaces of the same
* tenant. Both reuse the EXACT shared harness — same global setup (one
* provisioned org/workspace), same Playwright staging config (matched by the
* `staging-*.spec.ts` testMatch), same gated `Canvas tabs E2E` workflow check.
* No new harness, no new seeding mechanism.
*
* One extra precondition this spec needs that staging-tabs does NOT: a
* kind='platform' concierge ROW. The CI/SaaS tenant does not self-seed one
* (MOLECULE_SEED_PLATFORM_AGENT is unset on CI — workspace-server
* cmd/server/main.go), so without it the concierge shell falls back to
* roots[0] as a *pseudo*-platform surface and the platform-specific
* behaviours (root tag, hidden-from-map) can't be asserted. So this spec
* installs one via the SAME admin endpoint the control plane uses at
* org-provision time — POST /admin/org/platform-agent (AdminAuth, accepts the
* per-tenant admin bearer that global setup already exports). Installing it
* re-parents the provisioned hermes workspace UNDER the platform agent
* (handlers/platform_agent.go installPlatformAgent), giving us a real
* platform ROOT + a real child workspace — exactly the topology the concierge
* Home tree and Org-map filter are built to handle.
*
* This install mutates the shared tenant (re-parents the workspace). It is the
* LAST staging spec alphabetically among the topology-touching ones, and
* staging-tabs / staging-display read the workspace by id (not by root-ness),
* so the re-parent does not break them; Playwright runs workers=1 in file
* order, and the install is idempotent.
*
* Auth model is identical to staging-tabs.spec.ts: feed the per-tenant admin
* token as an Authorization: Bearer header on every browser request, mock
* /cp/auth/me so AuthGate resolves, and fall any non-auth 401 back to an
* empty 200 so a workspace-scoped 401 can't yank us to AuthKit.
*/
import { test, expect, type Page, type BrowserContext } from "@playwright/test";
const STAGING = process.env.CANVAS_E2E_STAGING === "1";
// Fail-closed, not skip-green (mirrors staging-tabs.spec.ts): a staging run
// that was REQUESTED (CANVAS_E2E_STAGING=1) but has no tenant state is a
// provisioning failure, asserted loudly inside the test body — not a skip.
// CANVAS_E2E_STAGING unset = operator did not request staging = clean skip.
test.skip(!STAGING, "CANVAS_E2E_STAGING not set — staging-only suite, not requested");
/** Resolve + validate the tenant handoff that global setup exported. */
function tenantEnv() {
const tenantURL = process.env.STAGING_TENANT_URL;
const tenantToken = process.env.STAGING_TENANT_TOKEN;
const workspaceId = process.env.STAGING_WORKSPACE_ID;
const orgID = process.env.STAGING_ORG_ID;
if (!tenantURL || !tenantToken || !workspaceId) {
throw new Error(
"staging-setup.ts did not export STAGING_TENANT_URL / " +
"STAGING_TENANT_TOKEN / STAGING_WORKSPACE_ID. CANVAS_E2E_STAGING=1 was " +
"set (staging WAS requested) but global setup produced no tenant — a " +
"provisioning failure, NOT a reason to skip. See the [staging-setup] " +
"log above.",
);
}
return { tenantURL, tenantToken, workspaceId, orgID };
}
// A fixed, valid uuid for the installed platform agent. Any valid uuid works
// (the install upserts on this id); reusing one constant keeps re-runs
// idempotent on the same row. Chosen out of the e2e namespace so it can't
// collide with a CP-derived org id.
const PLATFORM_AGENT_ID = "e2e0c1e2-0000-4000-a000-000000c0ce0e";
const PLATFORM_AGENT_NAME = "E2E Concierge";
/**
* Idempotently install the platform-agent (concierge) row on the shared
* tenant so the concierge shell resolves a REAL kind='platform' root. Uses
* the per-tenant admin bearer + org-id headers, same as staging-display.spec.
* Tolerant of a pre-existing install (the endpoint is idempotent) and of a
* backend that predates the endpoint (404/405) — in that degraded case the
* spec proceeds against the roots[0] fallback and the two platform-specific
* assertions self-document why they're loosened.
*/
async function installPlatformAgent(
page: Page,
tenantURL: string,
tenantToken: string,
orgID: string | undefined,
): Promise<{ installed: boolean }> {
const headers: Record<string, string> = {
Authorization: `Bearer ${tenantToken}`,
"Content-Type": "application/json",
};
if (orgID) headers["X-Molecule-Org-Id"] = orgID;
const resp = await page.request.post(`${tenantURL}/admin/org/platform-agent`, {
headers,
data: { id: PLATFORM_AGENT_ID, name: PLATFORM_AGENT_NAME },
});
const status = resp.status();
if (status >= 200 && status < 300) {
console.log(`[staging-concierge] platform agent installed (HTTP ${status})`);
return { installed: true };
}
// Endpoint absent on an older backend — proceed against the fallback root.
if (status === 404 || status === 405) {
console.warn(
`[staging-concierge] POST /admin/org/platform-agent returned ${status}` +
`backend predates the platform-agent endpoint. Proceeding against the ` +
`roots[0] concierge fallback; the platform-root / map-hidden assertions ` +
`are loosened accordingly.`,
);
return { installed: false };
}
throw new Error(
`POST /admin/org/platform-agent ${status}: ${await resp.text().catch(() => "")}`,
);
}
/**
* Wire the per-tenant bearer + the /cp/auth/me mock + the 401→empty-200
* fallback. Verbatim contract from staging-tabs.spec.ts so the concierge spec
* authenticates identically (no WorkOS session available to Playwright).
*/
async function authenticate(
context: BrowserContext,
tenantToken: string,
workspaceId: string,
): Promise<void> {
await context.setExtraHTTPHeaders({ Authorization: `Bearer ${tenantToken}` });
await context.route("**/cp/auth/me", (route) =>
route.fulfill({
status: 200,
contentType: "application/json",
body: JSON.stringify({
user_id: `e2e-test-user-${workspaceId}`,
org_id: "e2e-test-org",
email: "e2e@test.local",
}),
}),
);
await context.route("**", async (route, request) => {
if (request.resourceType() !== "fetch") return route.fallback();
if (request.url().includes("/cp/auth/me")) return route.fallback();
let resp;
try {
resp = await route.fetch();
} catch {
return route.fallback();
}
if (resp.status() !== 401) return route.fulfill({ response: resp });
const lastSeg =
new URL(request.url()).pathname.split("/").filter(Boolean).pop() || "";
const looksLikeList = !/^[0-9a-f-]{8,}$/.test(lastSeg);
await route.fulfill({
status: 200,
contentType: "application/json",
body: looksLikeList ? "[]" : "{}",
});
});
}
/**
* Load the concierge shell and wait for hydration. Returns once the icon rail
* (the concierge's left nav) is visible — the rail is the shell's outermost
* stable landmark and only renders after the canvas store has hydrated.
*/
async function loadConcierge(page: Page, tenantURL: string): Promise<void> {
page.on("console", (msg) => {
if (msg.type() === "error") console.log(`[e2e/console-error] ${msg.text()}`);
});
await page.goto(tenantURL, { waitUntil: "domcontentloaded" });
// The canvas store hydrates /workspaces before the desktop shell paints.
// Wait for the concierge nav rail OR the hydration-error banner — whichever
// wins. Don't wait on networkidle: the shell keeps a WS + polling open.
await page.waitForSelector(
'[data-testid="nav-home"], [data-testid="hydration-error"]',
{ timeout: 45_000 },
);
const hydrationErr = await page
.locator('[data-testid="hydration-error"]')
.count();
expect(
hydrationErr,
"canvas hydration failed — check staging CP + tenant reachability",
).toBe(0);
await expect(
page.getByText("Something went wrong", { exact: false }),
"app-level ErrorBoundary tripped during concierge hydration",
).toHaveCount(0);
}
/** Switch the concierge top-level view via the left rail. */
async function navTo(page: Page, view: "home" | "map" | "settings"): Promise<void> {
const btn = page.getByTestId(`nav-${view}`);
await expect(btn, `rail button nav-${view} missing`).toBeVisible({ timeout: 10_000 });
await btn.click();
}
// ── shared per-spec setup ──────────────────────────────────────────────────
// Each test gets a freshly-authenticated context + an installed platform
// agent. Install lives in beforeEach (idempotent) so any single test can run
// in isolation (`--grep`), not only in whole-file order.
let platformInstalled = false;
test.beforeEach(async ({ page, context }) => {
const { tenantURL, tenantToken, workspaceId, orgID } = tenantEnv();
await authenticate(context, tenantToken, workspaceId);
const { installed } = await installPlatformAgent(page, tenantURL, tenantToken, orgID);
platformInstalled = installed;
});
/* ───────────────────────── 1. Concierge shell / nav ──────────────────────── */
test.describe("concierge shell + nav", () => {
test("left rail switches Home / Org map / Settings; topbar shows the org name", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
// All three rail destinations are present.
for (const v of ["home", "map", "settings"] as const) {
await expect(page.getByTestId(`nav-${v}`)).toBeVisible();
}
// Topbar org name is dynamic from GET /org/identity. The endpoint returns
// MOLECULE_ORG_NAME (may be "" on a staging tenant), in which case the
// shell falls back to "Molecule AI". Either way it must render a
// non-empty name — assert the element resolves to real text.
const orgName = page.getByTestId("topbar-org-name");
await expect(orgName).toBeVisible();
await expect
.poll(async () => ((await orgName.innerText()) || "").trim().length, {
message: "topbar org name never resolved to non-empty text",
timeout: 10_000,
})
.toBeGreaterThan(0);
// Nav actually switches the active view. Home → Settings → Map → Home,
// asserting the destination rail button reflects active state each hop
// (the shell toggles the active class; we assert the view content too).
await navTo(page, "settings");
await expect(page.getByRole("heading", { name: "Settings" })).toBeVisible({
timeout: 10_000,
});
await navTo(page, "map");
await expect(page.locator('[aria-label="Agent canvas"]')).toBeVisible({
timeout: 15_000,
});
await navTo(page, "home");
// Home shows the agents/tasks/approvals sub-tab bar.
await expect(page.getByTestId("home-subtab-agents")).toBeVisible({
timeout: 10_000,
});
});
});
/* ─────────────────────────────── 2. Home ─────────────────────────────────── */
test.describe("concierge Home", () => {
test("renders the canonical ChatTab, Agents/Tasks/Approvals sub-tabs, and the platform agent as ROOT", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "home");
// (a) The Home chat panel reuses the EXACT canonical ChatTab — so it must
// expose the My Chat / Agent Comms sub-tabs, a message input, and the
// attachment affordance, exactly like the map SidePanel chat. The
// [data-testid="chat-panel"] root is ChatTab's own marker (canvas/src/
// components/tabs/ChatTab.tsx) — asserting it proves the canonical
// component is mounted, not a bespoke concierge re-implementation.
const chatPanel = page.getByTestId("chat-panel");
await expect(chatPanel, "Home did not mount the canonical ChatTab").toBeVisible({
timeout: 15_000,
});
await expect(chatPanel.locator("#chat-tab-my-chat")).toHaveText(/My Chat/);
await expect(chatPanel.locator("#chat-tab-agent-comms")).toHaveText(/Agent Comms/);
// Switching the chat sub-tab works (My Chat active by default → Agent Comms).
await chatPanel.locator("#chat-tab-agent-comms").click();
await expect(chatPanel.locator("#chat-tab-agent-comms")).toHaveAttribute(
"aria-selected",
"true",
);
await chatPanel.locator("#chat-tab-my-chat").click();
await expect(chatPanel.locator("#chat-tab-my-chat")).toHaveAttribute(
"aria-selected",
"true",
);
// Message input + attachment affordance (My Chat panel). The attach
// control is the labelled button (the underlying <input type=file> is
// aria-hidden); both are always present (disabled when the agent is
// unreachable), so assert presence, not enabled-state.
await expect(
chatPanel.locator('textarea[aria-label="Message to agent"]'),
"ChatTab message input missing",
).toHaveCount(1);
await expect(
chatPanel.locator('button[aria-label="Attach file"]'),
"ChatTab attachment affordance missing",
).toHaveCount(1);
// (b) Agents / Tasks / Approvals sub-tabs switch the Home sidebar pane.
await page.getByTestId("home-subtab-tasks").click();
await expect(page.getByTestId("home-subtab-tasks")).toHaveClass(/active/);
await page.getByTestId("home-subtab-approvals").click();
await expect(page.getByTestId("home-subtab-approvals")).toHaveClass(/active/);
await page.getByTestId("home-subtab-agents").click();
await expect(page.getByTestId("home-subtab-agents")).toHaveClass(/active/);
// (c) The agent tree shows the platform agent as ROOT. After install the
// platform agent is a kind='platform' root carrying the "root" tag, with
// the provisioned workspace re-parented under it (depth>0). When the
// backend predates the install endpoint, roots[0] is the pseudo-root and
// the "root" tag is absent (it only renders for a real kind='platform'
// root) — so we gate the strong assertion on a successful install.
const tree = page.getByTestId("agent-tree-node");
await expect(tree.first(), "agent tree rendered no nodes").toBeVisible({
timeout: 10_000,
});
if (platformInstalled) {
// The depth-0 node is the platform agent and it carries the root tag.
const rootNode = page
.locator('[data-testid="agent-tree-node"][data-depth="0"]')
.first();
await expect(rootNode).toHaveAttribute("data-platform", "true");
await expect(
rootNode.locator('[data-testid="agent-tree-root-tag"]'),
"platform root is missing the ROOT tag",
).toBeVisible();
// And the provisioned workspace is nested beneath it (a child node exists).
await expect(
page.locator('[data-testid="agent-tree-node"][data-depth="1"]'),
"the provisioned workspace did not re-parent under the platform root",
).toHaveCount(1, { timeout: 10_000 });
} else {
// Degraded backend: at least the tree renders a root-level node.
await expect(
page.locator('[data-testid="agent-tree-node"][data-depth="0"]'),
).not.toHaveCount(0);
}
});
});
/* ─────────────────────────────── 3. Org map ──────────────────────────────── */
test.describe("concierge Org map", () => {
test("hides the platform agent from the node graph; normal workspaces render", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "map");
// The React Flow canvas renders.
await expect(page.locator('[aria-label="Molecule AI workspace canvas"]')).toBeVisible({
timeout: 15_000,
});
// Normal workspaces render as map node cards (WorkspaceNode →
// data-testid="workspace-node"). The provisioned hermes workspace must
// appear. expect.poll lets React Flow finish its layout pass.
await expect
.poll(async () => page.locator('[data-testid="workspace-node"]').count(), {
message: "no workspace nodes rendered on the org map",
timeout: 15_000,
})
.toBeGreaterThan(0);
// The concierge (platform agent) is HIDDEN from the graph: no map node
// carries its name. WorkspaceNode's aria-label is "<name> workspace —
// <status>" — assert none matches the platform agent name. This is the
// real behaviour stripPlatformRootForMap implements (Canvas.tsx /
// canvas-topology.ts). Only meaningful when we actually installed one.
if (platformInstalled) {
const platformNode = page.locator(
`[data-testid="workspace-node"][aria-label^="${PLATFORM_AGENT_NAME} workspace"]`,
);
await expect(
platformNode,
"the platform agent (concierge) leaked into the org-map node graph — " +
"stripPlatformRootForMap should exclude it",
).toHaveCount(0);
}
});
});
/* ─────────────────────── 4. Settings — two tabs ──────────────────────────── */
test.describe("concierge Settings — two tabs", () => {
test("Platform-agent config and Org & canvas settings are separate panes; platform tab shows the full WorkspacePanelTabs defaulting to Config", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "settings");
const platformTab = page.getByTestId("settings-tab-platform");
const orgTab = page.getByTestId("settings-tab-org");
await expect(platformTab).toBeVisible({ timeout: 10_000 });
await expect(orgTab).toBeVisible();
// Platform tab is the default; its pane is shown and the org pane is not.
await expect(platformTab).toHaveAttribute("aria-selected", "true");
await expect(page.getByTestId("settings-pane-platform")).toBeVisible();
await expect(page.getByTestId("settings-pane-org")).toHaveCount(0);
// The platform pane embeds the FULL WorkspacePanelTabs (the SAME tablist
// the map SidePanel renders) and defaults to the Config tab. Assert the
// canonical workspace tablist is present, that Config is the active tab,
// and that the other signature tabs exist (Plugins, Container, Display,
// Details, Activity, Terminal, Channels, Schedule).
const wsTablist = page.getByRole("tablist", { name: "Workspace panel tabs" });
await expect(
wsTablist,
"platform-agent Settings tab did not embed WorkspacePanelTabs",
).toBeVisible({ timeout: 15_000 });
await expect(page.locator("#tab-config")).toHaveAttribute(
"aria-selected",
"true",
);
for (const id of [
"config",
"skills",
"container-config",
"display",
"details",
"activity",
"terminal",
"channels",
"schedule",
]) {
await expect(
page.locator(`#tab-${id}`),
`WorkspacePanelTabs is missing #tab-${id}`,
).toHaveCount(1);
}
// Clicking the OTHER settings tab switches panes (not just toggles a
// class): the org pane mounts and the platform pane unmounts.
await orgTab.click();
await expect(orgTab).toHaveAttribute("aria-selected", "true");
await expect(page.getByTestId("settings-pane-org")).toBeVisible();
await expect(page.getByTestId("settings-pane-platform")).toHaveCount(0);
// And back.
await platformTab.click();
await expect(page.getByTestId("settings-pane-platform")).toBeVisible();
await expect(page.getByTestId("settings-pane-org")).toHaveCount(0);
});
});
/* ─────────────────────── 5. Settings — Config tab ────────────────────────── */
test.describe("concierge Settings — Config tab dropdowns", () => {
test("runtime dropdown is SSOT-driven; provider hides Platform on self-host but lists BYOK; model follows provider", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "settings");
// Platform tab defaults to the Config tab — the runtime select is in the
// ConfigTab "Runtime" section (label "Runtime"). Wait for it to settle.
await expect(
page.getByRole("tablist", { name: "Workspace panel tabs" }),
).toBeVisible({ timeout: 15_000 });
// The runtime <select> sits under the "Runtime" label inside the Config
// panel. Use the label association for a stable hook.
const runtimeByLabel = page.locator('#panel-config').getByLabel("Runtime", {
exact: true,
});
await expect(
runtimeByLabel,
"ConfigTab runtime dropdown never rendered",
).toBeVisible({ timeout: 15_000 });
// (a) Runtime dropdown is SSOT-driven: the options come from GET
// /templates (loadRuntimesFromManifest), so the live tenant must serve a
// non-trivial set. Assert >= 1 runtime option AND that the provisioned
// workspace's runtime (hermes) is among them — proving the list reflects
// what /templates actually serves, not a stale hard-coded allowlist.
const runtimeOptionValues = await runtimeByLabel
.locator("option")
.evaluateAll((els) => els.map((e) => (e as HTMLOptionElement).value));
expect(
runtimeOptionValues.length,
"runtime dropdown rendered no options — SSOT /templates feed is empty",
).toBeGreaterThan(0);
expect(
runtimeOptionValues,
"runtime dropdown does not list the provisioned 'hermes' runtime — the " +
"SSOT /templates list has drifted",
).toContain("hermes");
// (b) Provider dropdown: on self-host (no platform proxy) it must NOT
// offer the "Platform" billing option but MUST list BYOK providers. The
// ProviderModelSelector exposes data-testid="provider-select". Read its
// option labels: none should be the "Platform" proxy entry, and the list
// must be non-empty (BYOK providers present). /org/identity's
// platform_managed_available=false on a staging tenant drives this.
const providerSelect = page.getByTestId("provider-select");
await expect(
providerSelect,
"ConfigTab provider dropdown (ProviderModelSelector) never rendered",
).toBeVisible({ timeout: 15_000 });
const providerLabels = await providerSelect
.locator("option")
.evaluateAll((els) =>
els
.map((e) => (e.textContent || "").trim())
.filter((t) => t && !t.startsWith("—")),
);
expect(
providerLabels.length,
"provider dropdown lists no BYOK providers",
).toBeGreaterThan(0);
expect(
providerLabels.map((l) => l.toLowerCase()),
'provider dropdown offered the "Platform" proxy option on a self-host / ' +
"no-proxy tenant (platform_managed_available should hide it)",
).not.toContain("platform");
// (c) Model dropdown follows the provider. The model control is
// data-testid="model-select" (dropdown) or model-input (free-text
// wildcard). Whichever renders, it must be present — proving the model
// control is wired to the provider selection.
const modelControl = page
.locator('[data-testid="model-select"], [data-testid="model-input"]')
.first();
await expect(
modelControl,
"model control did not follow the provider selection",
).toBeVisible({ timeout: 10_000 });
});
});
/* ────────────────── 6. Settings — Org & canvas settings ──────────────────── */
test.describe("concierge Settings — Org & canvas", () => {
test("Secrets / Workspace Tokens / Org API Keys / Organization sub-tabs render; Organization shows the org (no 404)", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "settings");
await page.getByTestId("settings-tab-org").click();
const orgPane = page.getByTestId("settings-pane-org");
await expect(orgPane).toBeVisible({ timeout: 10_000 });
// The four SettingsTabs (canvas/src/components/settings/SettingsTabs.tsx)
// render as a radix tablist labelled "Settings sections". Assert all four
// triggers are present.
const settingsTablist = orgPane.getByRole("tablist", {
name: "Settings sections",
});
await expect(settingsTablist).toBeVisible({ timeout: 10_000 });
for (const label of [
"Secrets",
"Workspace Tokens",
"Org API Keys",
"Organization",
]) {
await expect(
settingsTablist.getByRole("tab", { name: label }),
`Org & canvas settings is missing the "${label}" sub-tab`,
).toBeVisible();
}
// Click the Organization sub-tab — on self-host the canvas reads
// /org/identity (NOT the CP /cp/orgs endpoint), so it must render the org
// identity card and NOT a 404 / error state. Assert the pane settles to
// real, non-error content.
await settingsTablist.getByRole("tab", { name: "Organization" }).click();
const orgInfoPanel = orgPane.locator(
'[role="tabpanel"]:not([hidden])',
);
await expect(orgInfoPanel).toBeVisible({ timeout: 10_000 });
await expect
.poll(
async () => {
const text = ((await orgInfoPanel.innerText()) || "").trim();
return text.length > 0 && !/404|not found/i.test(text);
},
{
message:
"Organization sub-tab rendered empty or a 404/not-found — the " +
"self-host /org/identity path is broken",
timeout: 15_000,
},
)
.toBe(true);
// And no visible error alert inside the org settings pane.
await expect(orgPane.locator('[role="alert"]:visible')).toHaveCount(0);
});
});
/* ───────────────────────────── 7. Map toolbar ────────────────────────────── */
test.describe("concierge Org map toolbar", () => {
test("settings gear, theme toggle and legend are NOT on the map toolbar (moved to Settings/topbar)", async ({
page,
}) => {
const { tenantURL } = tenantEnv();
await loadConcierge(page, tenantURL);
await navTo(page, "map");
await expect(page.locator('[aria-label="Molecule AI workspace canvas"]')).toBeVisible({
timeout: 15_000,
});
// The map toolbar no longer carries a settings gear, a theme toggle, or a
// legend — those moved to the concierge Settings (left rail) + topbar
// (Toolbar.tsx: "Theme picker + settings gear removed from the map
// toolbar"). Assert the map view contains none of them.
//
// Scope to the map mount (<main aria-label="Agent canvas">, ConciergeShell)
// so the legitimate left-rail Settings button + the topbar theme toggle
// (which live OUTSIDE the map) are not counted.
const mapRegion = page.locator('[aria-label="Agent canvas"]');
await expect(mapRegion).toBeVisible({ timeout: 10_000 });
// No settings-gear control inside the map. The old gear used
// title="Settings" / aria-label "Settings".
await expect(
mapRegion.locator('button[title="Settings"], button[aria-label="Settings"]'),
"a settings gear is still on the map toolbar (should be moved to Settings)",
).toHaveCount(0);
// No theme toggle inside the map. The toggle's accessible name is
// "Toggle theme" — it now lives only in the topbar.
await expect(
mapRegion.locator('button[title="Toggle theme"], button[aria-label*="theme" i]'),
"a theme toggle is still on the map toolbar (should be in the topbar)",
).toHaveCount(0);
// No legend inside the map. The Legend component's controls have accessible
// names "Show legend" / "Hide legend" and the panel carries
// data-testid="legend-panel" (canvas/src/components/Legend.tsx). It is no
// longer mounted in Canvas/Toolbar at all — assert none of its surfaces.
await expect(
mapRegion.locator(
'[data-testid="legend-panel"], button[aria-label="Show legend"], button[aria-label="Hide legend"]',
),
"a legend is still on the map toolbar (should be removed)",
).toHaveCount(0);
});
});
+2 -6
View File
@@ -341,15 +341,11 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {
);
return true;
}
// #2032: tolerate transient 'failed' during boot — some runtimes
// briefly report failed before recovering to online (e.g. agent
// restart during init). Retry instead of hard-throwing; genuine
// terminal failures will still surface via waitFor timeout.
// Real boot regression — hard-throw immediately with full detail.
const detail = sampleErr
? sampleErr
: `(no last_sample_error) full body: ${JSON.stringify(r.body)}`;
console.warn(`[staging-setup] transient failed (retrying): ${detail}`);
return null;
throw new Error(`Workspace failed: ${detail}`);
}
return null;
},
+2 -4
View File
@@ -52,10 +52,8 @@ describe("prefers-reduced-motion compliance", () => {
expect(src).toContain("motion-safe:animate-pulse");
});
it("WorkspacePanelTabs.tsx uses motion-safe:animate-pulse", () => {
// The connection-status dot moved out of SidePanel.tsx into the extracted
// WorkspacePanelTabs.tsx; verify the reduced-motion guard followed it.
const src = readSrc("components/WorkspacePanelTabs.tsx");
it("SidePanel.tsx uses motion-safe:animate-pulse", () => {
const src = readSrc("components/SidePanel.tsx");
expect(src.includes("animate-pulse") && !src.includes("motion-safe:animate-pulse")).toBe(false);
expect(src).toContain("motion-safe:animate-pulse");
});
+1 -1
View File
@@ -10,7 +10,7 @@ import { describe, it, expect, vi } from "vitest";
// transform). We import layout.tsx only for its exported `metadata`
// constant — mock the font module to a constructor-returning stub.
vi.mock("next/font/google", () => ({
Hanken_Grotesk: () => ({ variable: "--font-hanken" }),
Inter: () => ({ variable: "--font-inter" }),
JetBrains_Mono: () => ({ variable: "--font-jetbrains" }),
}));
+38 -50
View File
@@ -42,52 +42,48 @@
* before paint to eliminate flash.
*/
@theme {
/* Org Concierge palette (RFC platform-agent / canvas redesign). Warm-paper
light theme + purple accent replacing the old blue brand. */
/* Surface — page / elevated card / sunken input / deep card */
--color-surface: #f1efe8;
--color-surface: #fafaf7;
--color-surface-elevated: #ffffff;
--color-surface-sunken: #f6f4ee;
--color-surface-card: #faf9f4;
--color-surface-sunken: #f3f1ec;
--color-surface-card: #efece4;
/* Borders */
--color-line: #ddd9cf;
--color-line-soft: #ebe8df;
--color-line: #e6e2d8;
--color-line-soft: #efece4;
/* Text */
--color-ink: #21201b;
--color-ink-mid: #5c5a52;
--color-ink-soft: #6f6c62;
--color-ink: #15181c;
--color-ink-mid: #5a5e66;
--color-ink-soft: #8b8e95;
/* Brand + state — purple accent (concept #7c3aed); light good/bad kept
slightly darker than the raw concept hues for WCAG AA on the paper tints. */
--color-accent: #7c3aed;
--color-accent-strong: #6d28d9;
--color-warm: #c47e12;
--color-good: #0c8a52;
--color-bad: #c2403c;
/* Brand + state */
--color-accent: #3b5bdb;
--color-accent-strong: #1a2f99;
--color-warm: #c0532b;
--color-good: #2f7a4d;
--color-bad: #b94e4a;
}
[data-theme="dark"] {
/* Org Concierge dark palette — near-black panels, bright purple accent. */
--color-surface: #08080a;
--color-surface-elevated: #16161d;
--color-surface-sunken: #0d0d11;
--color-surface-card: #1b1b23;
--color-surface: #0e1014;
--color-surface-elevated: #15181c;
--color-surface-sunken: #0a0b0e;
--color-surface-card: #1a1d23;
--color-line: #26262e;
--color-line-soft: #1b1b22;
--color-line: #2a2f3a;
--color-line-soft: #1f2329;
--color-ink: #ececf1;
--color-ink-mid: #9b9baa;
--color-ink-soft: #65656f;
--color-ink: #f4f1e9;
--color-ink-mid: #c8c2b4;
--color-ink-soft: #8d92a0;
/* Purple accent brightened for AA on the near-black surfaces. */
--color-accent: #a78bfa;
--color-accent-strong: #c4b5fd;
--color-warm: #fbbf24;
--color-good: #34d399;
--color-bad: #f87171;
/* Accents brighten slightly for AA contrast on dark backgrounds. */
--color-accent: #6883e8;
--color-accent-strong: #8aa1ee;
--color-warm: #d96f48;
--color-good: #4ca06e;
--color-bad: #d27773;
}
:root {
@@ -111,22 +107,15 @@
* component, not per theme.
*/
@theme {
/* Org Concierge canvas palette (near-black + purple). */
--color-bg: rgb(8 8 10); /* concept --bg #08080a */
--color-bg-elev: rgb(22 22 29); /* concept --card #16161d */
--color-bg-card: rgb(27 27 35); /* concept --card-2 #1b1b23 */
--color-line-strong: rgb(54 54 64);
--color-ink-mute: rgb(155 155 170); /* concept --tx-2 */
--color-ink-dim: rgb(101 101 111); /* concept --tx-3 */
--color-accent-dim: rgb(167 139 250);/* concept --accent-2 #a78bfa */
--color-plasma: rgb(139 92 246); /* concept --accent #8b5cf6 */
--color-bg: rgb(9 9 11); /* zinc-950 */
--color-bg-elev: rgb(24 24 27); /* zinc-900 */
--color-bg-card: rgb(39 39 42); /* zinc-800 */
--color-line-strong: rgb(63 63 70); /* zinc-700 */
--color-ink-mute: rgb(161 161 170); /* zinc-400 */
--color-ink-dim: rgb(113 113 122); /* zinc-500 */
--color-accent-dim: rgb(96 165 250);/* blue-400 */
--color-plasma: rgb(59 130 246); /* blue-500 */
--color-warn: rgb(251 191 36); /* amber-400 */
/* Typography — Org Concierge (Hanken Grotesk UI, JetBrains Mono code).
next/font variables are set on <html> in the canvas layout. */
--font-sans: var(--font-hanken), ui-sans-serif, system-ui, -apple-system,
"Segoe UI", Roboto, sans-serif;
--font-mono: var(--font-jetbrains), ui-monospace, "SF Mono", Menlo, monospace;
}
body {
@@ -135,8 +124,7 @@ body {
overflow: hidden;
background-color: var(--color-surface);
color: var(--color-ink);
font-family: var(--font-hanken), -apple-system, BlinkMacSystemFont, "Segoe UI",
Roboto, "Helvetica Neue", sans-serif;
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", sans-serif;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}
+3 -13
View File
@@ -1,5 +1,5 @@
import type { Metadata } from "next";
import { Hanken_Grotesk, JetBrains_Mono } from "next/font/google";
import { Inter, JetBrains_Mono } from "next/font/google";
import { cookies, headers } from "next/headers";
import "./globals.css";
@@ -7,13 +7,10 @@ import "./globals.css";
// because Next.js serves the .woff2 from /_next/static). Exposed as
// CSS variables so the mobile palette can reference them without
// importing this module.
// Org Concierge UI typeface (canvas redesign): Hanken Grotesk, exposed as
// --font-hanken and consumed by the --font-sans theme token in globals.css.
const interFont = Hanken_Grotesk({
const interFont = Inter({
subsets: ["latin"],
weight: ["400", "500", "600", "700"],
display: "swap",
variable: "--font-hanken",
variable: "--font-inter",
});
const monoFont = JetBrains_Mono({
subsets: ["latin"],
@@ -164,12 +161,6 @@ export default async function RootLayout({
*/}
<script
nonce={nonce}
// The browser strips the nonce attribute off <script> after applying
// CSP, so the hydrated DOM shows nonce="" while React's tree carries
// the real value — a benign, expected server/client diff. Suppress
// the hydration warning for this element (same rationale as the
// <html> suppressHydrationWarning above).
suppressHydrationWarning
dangerouslySetInnerHTML={{ __html: themeBootScript }}
/>
{/*
@@ -195,7 +186,6 @@ export default async function RootLayout({
<script
type="application/ld+json"
nonce={nonce}
suppressHydrationWarning
dangerouslySetInnerHTML={{
__html: JSON.stringify({
"@context": "https://schema.org",
+8 -2
View File
@@ -1,7 +1,9 @@
"use client";
import { useEffect, useState } from "react";
import { ConciergeShell } from "@/components/concierge/ConciergeShell";
import { Canvas } from "@/components/Canvas";
import { Legend } from "@/components/Legend";
import { CommunicationOverlay } from "@/components/CommunicationOverlay";
import { MobileApp } from "@/components/mobile/MobileApp";
import { Spinner } from "@/components/Spinner";
import { connectSocket, disconnectSocket } from "@/store/socket";
@@ -113,7 +115,11 @@ export default function Home() {
return (
<>
<ConciergeShell />
<main aria-label="Agent canvas">
<Canvas />
</main>
<Legend />
<CommunicationOverlay />
{hydrationError && (
<div
role="alert"
+6 -36
View File
@@ -13,11 +13,8 @@ import {
import "@xyflow/react/dist/style.css";
import { useCanvasStore } from "@/store/canvas";
import { WORKSPACE_KIND } from "@/lib/workspace-kind";
import { stripPlatformRootForMap } from "@/store/canvas-topology";
import { useTheme } from "@/lib/theme-provider";
import { A2ATopologyOverlay } from "./A2ATopologyOverlay";
import { MessageFlightLayer } from "./MessageFlightLayer";
import { WorkspaceNode } from "./WorkspaceNode";
import { SidePanel } from "./SidePanel";
import { CreateWorkspaceButton } from "./CreateWorkspaceDialog";
@@ -81,38 +78,15 @@ function CanvasInner() {
// half-themed page. Pull resolvedTheme so the canvas matches the user's
// selected mode (and the system preference when they pick "system").
const { resolvedTheme } = useTheme();
const storeNodes = useCanvasStore((s) => s.nodes);
const storeEdges = useCanvasStore((s) => s.edges);
const rawNodes = useCanvasStore((s) => s.nodes);
const edges = useCanvasStore((s) => s.edges);
const a2aEdges = useCanvasStore((s) => s.a2aEdges);
const showA2AEdges = useCanvasStore((s) => s.showA2AEdges);
const deletingIds = useCanvasStore((s) => s.deletingIds);
// Hide the org-level platform agent (the concierge) from the map graph: it is
// the undeletable org ROOT surfaced in the shell (topbar + Home tree), not a
// draggable/deletable map node. Its direct children are reparented to
// top-level and tree edges touching it are dropped. The store keeps the full
// node set, so the shell's Home agent tree still renders it as ROOT.
const { nodes: rawNodes, edges } = useMemo(
() => stripPlatformRootForMap(storeNodes, storeEdges),
[storeNodes, storeEdges],
const allEdges = useMemo(
() => (showA2AEdges ? [...edges, ...a2aEdges] : edges),
[edges, a2aEdges, showA2AEdges],
);
const platformIds = useMemo(
() =>
new Set(
storeNodes
.filter((n) => n.data.kind === WORKSPACE_KIND.Platform)
.map((n) => n.id),
),
[storeNodes],
);
const allEdges = useMemo(() => {
if (!showA2AEdges) return edges;
// Drop A2A edges that touch the hidden platform root so React Flow doesn't
// warn about an edge to a missing node.
const a2a = a2aEdges.filter(
(e) => !platformIds.has(e.source) && !platformIds.has(e.target),
);
return [...edges, ...a2a];
}, [edges, a2aEdges, showA2AEdges, platformIds]);
// Drag-lock during a system-owned operation (deploy OR delete).
// React Flow respects Node.draggable, which stops the gesture
// before it starts — preventDefault() on the drag-start callback
@@ -303,7 +277,7 @@ function CanvasInner() {
>
Skip to canvas
</a>
<main id="canvas-main" className="w-full h-full bg-surface">
<main id="canvas-main" className="w-screen h-screen bg-surface">
<ReactFlow
colorMode={resolvedTheme}
nodes={nodes}
@@ -372,10 +346,6 @@ function CanvasInner() {
nodeBorderRadius={4}
/>
<DropTargetBadge />
{/* Flies an envelope between agents on each delegate/message event.
Inside <ReactFlow> so its ViewportPortal renders in flow coords
and tracks pan/zoom. */}
<MessageFlightLayer />
</ReactFlow>
{/* Screen-reader live region — announces workspace count on initial load and
-84
View File
@@ -1,84 +0,0 @@
/** FlightEnvelope — a single envelope that animates from `from` to `to` and
* fades out, used by both the canvas (flow coords inside a ViewportPortal) and
* the concierge home (screen coords inside a fixed overlay). The parent owns
* the coordinate space; this component only animates the translate delta.
*
* Uses the Web Animations API so the from/to delta can be dynamic per flight
* (a static CSS @keyframes can't translate to a runtime-computed point). */
import { useEffect, useRef } from "react";
import { FLIGHT_DURATION_MS, type A2AFlightKind } from "@/hooks/useA2AFlights";
/** Stroke colour by activity kind — mirrors CommunicationOverlay's palette
* (send = cyan, receive = violet/accent, task = warm) so the two surfaces
* read as the same event. */
const KIND_COLOR: Record<A2AFlightKind, string> = {
send: "#22d3ee",
receive: "#8b5cf6",
task: "#f5a623",
};
export interface Point {
x: number;
y: number;
}
export function FlightEnvelope({
from,
to,
kind,
}: {
from: Point;
to: Point;
kind: A2AFlightKind;
}) {
const ref = useRef<HTMLDivElement>(null);
useEffect(() => {
const el = ref.current;
// Element.animate is unavailable in some test/SSR environments — degrade to
// a static (instantly-finished) envelope rather than throw.
if (!el || typeof el.animate !== "function") return;
const dx = to.x - from.x;
const dy = to.y - from.y;
const anim = el.animate(
[
{ transform: "translate(-50%,-50%) translate(0px,0px) scale(0.45)", opacity: 0 },
{ opacity: 1, offset: 0.16 },
{ opacity: 1, offset: 0.8 },
{ transform: `translate(-50%,-50%) translate(${dx}px,${dy}px) scale(1)`, opacity: 0 },
],
{ duration: FLIGHT_DURATION_MS, easing: "cubic-bezier(0.45, 0, 0.25, 1)", fill: "forwards" },
);
return () => anim.cancel();
}, [from.x, from.y, to.x, to.y]);
const color = KIND_COLOR[kind];
return (
<div
ref={ref}
data-testid="flight-envelope"
aria-hidden="true"
style={{
position: "absolute",
left: from.x,
top: from.y,
pointerEvents: "none",
willChange: "transform, opacity",
filter: "drop-shadow(0 1px 3px rgba(0,0,0,0.45))",
zIndex: 6,
}}
>
<svg width="22" height="22" viewBox="0 0 24 24" fill="none" aria-hidden="true">
<rect x="2.5" y="5.5" width="19" height="13" rx="2.5" fill="#0b0b0f" stroke={color} strokeWidth="1.6" />
<path
d="M3.5 7.5l8.5 6 8.5-6"
stroke={color}
strokeWidth="1.6"
fill="none"
strokeLinecap="round"
strokeLinejoin="round"
/>
</svg>
</div>
);
}
@@ -1,46 +0,0 @@
/** MessageFlightLayer — flies an envelope from the source agent to the target
* agent on the spatial canvas whenever a delegate / message event fires.
*
* Mounted INSIDE <ReactFlow> so its ViewportPortal places the envelope in flow
* coordinates; it therefore pans and zooms with the canvas for free. The
* flight lifecycle (which events become envelopes, reduced-motion opt-out,
* expiry) lives in useA2AFlights — this component only resolves node centres
* and renders. */
import { ViewportPortal, type Node } from "@xyflow/react";
import { useCanvasStore } from "@/store/canvas";
import { useA2AFlights } from "@/hooks/useA2AFlights";
import { FlightEnvelope, type Point } from "./FlightEnvelope";
import type { WorkspaceNodeData } from "@/store/canvas";
// Fallback node footprint when React Flow has not measured a node yet. Matches
// WorkspaceNode's leaf size (w-[300px] min-h-[176px]); a slightly-off centre
// for the first frame after mount is invisible at flight scale.
const DEFAULT_W = 300;
const DEFAULT_H = 176;
function nodeCenter(n: Node<WorkspaceNodeData>): Point {
const w = n.measured?.width ?? DEFAULT_W;
const h = n.measured?.height ?? DEFAULT_H;
return { x: n.position.x + w / 2, y: n.position.y + h / 2 };
}
export function MessageFlightLayer() {
const flights = useA2AFlights();
const nodes = useCanvasStore((s) => s.nodes);
if (flights.length === 0) return null;
return (
<ViewportPortal>
{flights.map((f) => {
const src = nodes.find((n) => n.id === f.sourceId);
const dst = nodes.find((n) => n.id === f.targetId);
// Both endpoints must be on-canvas to draw a path between them.
if (!src || !dst) return null;
return (
<FlightEnvelope key={f.key} from={nodeCenter(src)} to={nodeCenter(dst)} kind={f.kind} />
);
})}
</ViewportPortal>
);
}
+134 -8
View File
@@ -1,9 +1,25 @@
"use client";
import { useState, useCallback, useRef, useEffect } from "react";
import { useCanvasStore } from "@/store/canvas";
import { useCanvasStore, type PanelTab } from "@/store/canvas";
import { showToast } from "@/components/Toaster";
import { StatusDot } from "./StatusDot";
import { WorkspacePanelTabs } from "./WorkspacePanelTabs";
import { Tooltip } from "./Tooltip";
import { DetailsTab } from "./tabs/DetailsTab";
import { SkillsTab } from "./tabs/SkillsTab";
import { ChatTab } from "./tabs/ChatTab";
import { ConfigTab } from "./tabs/ConfigTab";
import { ContainerConfigTab } from "./tabs/ContainerConfigTab";
import { DisplayTab } from "./tabs/DisplayTab";
import { TerminalTab } from "./tabs/TerminalTab";
import { FilesTab } from "./tabs/FilesTab";
import { MemoryInspectorPanel } from "./MemoryInspectorPanel";
import { AuditTrailPanel } from "./AuditTrailPanel";
import { TracesTab } from "./tabs/TracesTab";
import { EventsTab } from "./tabs/EventsTab";
import { ActivityTab } from "./tabs/ActivityTab";
import { ScheduleTab } from "./tabs/ScheduleTab";
import { ChannelsTab } from "./tabs/ChannelsTab";
import { summarizeWorkspaceCapabilities } from "@/store/canvas";
const SIDEPANEL_WIDTH_KEY = "molecule:sidepanel-width";
@@ -11,6 +27,24 @@ const SIDEPANEL_DEFAULT_WIDTH = 480;
const SIDEPANEL_MIN_WIDTH = 320;
const SIDEPANEL_MAX_WIDTH = 800;
const TABS: { id: PanelTab; label: string; icon: string }[] = [
{ id: "chat", label: "Chat", icon: "◈" },
{ id: "activity", label: "Activity", icon: "⊙" },
{ id: "details", label: "Details", icon: "◉" },
{ id: "skills", label: "Plugins", icon: "✦" },
{ id: "terminal", label: "Terminal", icon: "▸" },
{ id: "display", label: "Display", icon: "▣" },
{ id: "container-config", label: "Container", icon: "▤" },
{ id: "config", label: "Config", icon: "⚙" },
{ id: "schedule", label: "Schedule", icon: "⏲" },
{ id: "channels", label: "Channels", icon: "⇌" },
{ id: "files", label: "Files", icon: "⊞" },
{ id: "memory", label: "Memory", icon: "◇" },
{ id: "traces", label: "Traces", icon: "◎" },
{ id: "events", label: "Events", icon: "◊" },
{ id: "audit", label: "Audit", icon: "⊟" },
];
export function SidePanel() {
const selectedNodeId = useCanvasStore((s) => s.selectedNodeId);
const panelTab = useCanvasStore((s) => s.panelTab);
@@ -185,12 +219,104 @@ export function SidePanel() {
</div>
</div>
{/* Tabs + tab content — extracted into WorkspacePanelTabs so the same
tab bar/body is reused verbatim by the concierge Settings page. The
map drawer stays store-driven: we thread the global panelTab /
setPanelTab through as the controlled active-tab pair, preserving the
existing selection + keyboard behaviour. */}
<WorkspacePanelTabs node={node} activeTab={panelTab} onTabChange={setPanelTab} />
{/* Tabs — relative wrapper lets the fade gradient position against the scroll container */}
<div className="relative border-b border-line/40">
{/* Right-edge fade: signals more tabs are hidden off-screen when the bar overflows */}
<div className="pointer-events-none absolute inset-y-0 right-0 w-8 bg-gradient-to-l from-surface to-transparent z-10" aria-hidden="true" />
<div
role="tablist"
aria-label="Workspace panel tabs"
className="flex overflow-x-auto bg-surface-sunken/20 px-1"
onKeyDown={(e) => {
const idx = TABS.findIndex((t) => t.id === panelTab);
let next: number | null = null;
if (e.key === "ArrowRight") { e.preventDefault(); next = (idx + 1) % TABS.length; }
else if (e.key === "ArrowLeft") { e.preventDefault(); next = (idx - 1 + TABS.length) % TABS.length; }
else if (e.key === "Home") { e.preventDefault(); next = 0; }
else if (e.key === "End") { e.preventDefault(); next = TABS.length - 1; }
if (next !== null) {
setPanelTab(TABS[next].id);
requestAnimationFrame(() => { const el = document.getElementById(`tab-${TABS[next!].id}`); el?.focus(); el?.scrollIntoView({ block: "nearest", inline: "nearest" }); });
}
}}
>
{TABS.map((tab) => (
<button
type="button"
key={tab.id}
id={`tab-${tab.id}`}
role="tab"
aria-selected={panelTab === tab.id}
aria-controls={`panel-${tab.id}`}
tabIndex={panelTab === tab.id ? 0 : -1}
onClick={() => setPanelTab(tab.id)}
className={`shrink-0 px-3 py-2.5 text-[10px] font-medium tracking-wide transition-all rounded-t-lg mx-0.5 focus:outline-none focus-visible:ring-2 focus-visible:ring-accent/70 ${
panelTab === tab.id
? "text-ink bg-surface-card border-b-2 border-accent"
: "text-ink-mid hover:text-ink hover:bg-surface-card/60"
}`}
>
<span className="mr-1 opacity-50" aria-hidden="true">{tab.icon}</span>
{tab.label}
</button>
))}
</div>
</div>
{/* Needs Restart Banner */}
{node.data.needsRestart && !node.data.currentTask && selectedNodeId && (
<div className="px-4 py-2 bg-sky-950/20 border-b border-sky-800/20 flex items-center justify-between">
<span className="text-[10px] text-sky-300/90">Config changed restart to apply</span>
<button
type="button"
onClick={() => {
useCanvasStore.getState().restartWorkspace(selectedNodeId).catch(() => showToast("Restart failed", "error"));
}}
className="text-[11px] px-2 py-1 bg-sky-800/40 hover:bg-sky-700/50 text-sky-200 rounded transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
>
Restart Now
</button>
</div>
)}
{/* Current Task Banner */}
{node.data.currentTask && (
<Tooltip text={node.data.currentTask as string}>
<div className="px-4 py-2 bg-amber-950/20 border-b border-amber-800/20 flex items-center gap-2 cursor-default">
<div className="w-1.5 h-1.5 rounded-full bg-amber-400 motion-safe:animate-pulse shrink-0" />
<span className="text-[10px] text-warm/90 truncate">
{node.data.currentTask}
</span>
</div>
</Tooltip>
)}
{/* Tab Content */}
<div
role="tabpanel"
id={`panel-${panelTab}`}
aria-labelledby={`tab-${panelTab}`}
tabIndex={0}
className="flex-1 overflow-y-auto focus:outline-none"
>
{panelTab === "details" && <DetailsTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />}
{panelTab === "skills" && <SkillsTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />}
{panelTab === "activity" && <ActivityTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "chat" && <ChatTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />}
{panelTab === "terminal" && <TerminalTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />}
{panelTab === "display" && <DisplayTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "container-config" && selectedNodeId && (
<ContainerConfigTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />
)}
{panelTab === "config" && <ConfigTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "schedule" && <ScheduleTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "channels" && <ChannelsTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "files" && <FilesTab key={selectedNodeId} workspaceId={selectedNodeId} data={node.data} />}
{panelTab === "memory" && <MemoryInspectorPanel key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "traces" && <TracesTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "events" && <EventsTab key={selectedNodeId} workspaceId={selectedNodeId} />}
{panelTab === "audit" && <AuditTrailPanel key={selectedNodeId} workspaceId={selectedNodeId} />}
</div>
{/* Footer — workspace ID */}
<div className="px-4 sm:px-5 py-2 border-t border-line/40 bg-surface-sunken/20">
+10 -8
View File
@@ -3,9 +3,11 @@
import { useMemo, useState, useCallback, useEffect, useRef } from "react";
import { api } from "@/lib/api";
import { useCanvasStore } from "@/store/canvas";
import { WORKSPACE_KIND } from "@/lib/workspace-kind";
import { SettingsButton } from "@/components/settings/SettingsButton";
import { settingsGearRef } from "@/components/settings/SettingsPanel";
import { ConfirmDialog } from "@/components/ConfirmDialog";
import { showToast } from "@/components/Toaster";
import { ThemeToggle } from "@/components/ThemeToggle";
import { statusDotClass } from "@/lib/design-tokens";
import { KeyboardShortcutsDialog } from "@/components/KeyboardShortcutsDialog";
@@ -53,11 +55,8 @@ export function Toolbar() {
}, [wsStatus]);
const counts = useMemo(() => {
// Exclude the org-level platform agent (the concierge) — it's the
// undeletable org root surfaced in the shell, not a counted map workspace.
const mapNodes = nodes.filter((n) => n.data.kind !== WORKSPACE_KIND.Platform);
const c = { total: mapNodes.length, roots: 0, children: 0, online: 0, offline: 0, failed: 0, provisioning: 0, activeTasks: 0 };
for (const n of mapNodes) {
const c = { total: nodes.length, roots: 0, children: 0, online: 0, offline: 0, failed: 0, provisioning: 0, activeTasks: 0 };
for (const n of nodes) {
if (n.data.parentId) c.children++; else c.roots++;
const s = n.data.status;
if (s === "online") c.online++;
@@ -461,8 +460,11 @@ export function Toolbar() {
)}
</div>
{/* Theme picker + settings gear removed from the map toolbar — both now
live in the concierge global Settings (left rail) + topbar. */}
{/* Theme picker — System / Light / Dark */}
<ThemeToggle />
{/* Settings gear icon */}
<SettingsButton ref={settingsGearRef} />
<ConfirmDialog
open={restartConfirmOpen}
+72 -81
View File
@@ -1,7 +1,7 @@
"use client";
import { useMemo, type KeyboardEvent } from "react";
import { Handle, Position, type NodeProps, type Node } from "@xyflow/react";
import { useCallback, useMemo, type KeyboardEvent } from "react";
import { Handle, NodeResizer, Position, type NodeProps, type Node } from "@xyflow/react";
import { useCanvasStore, type WorkspaceNodeData } from "@/store/canvas";
import { getConfigurationError, getConfigurationStatus } from "@/store/canvas-topology";
import { showToast } from "@/components/Toaster";
@@ -21,8 +21,7 @@ function useDescendantCount(nodeId: string): number {
return useMemo(() => countDescendants(nodeId, nodes), [nodeId, nodes]);
}
/** Boolean flag used to drive the container's system-controlled size
* (leaves render fixed-size; parents grow to fit children).
/** Boolean flag used to drive min-size and NodeResizer dimensions.
* Selecting `nodes` stably avoids re-render loops (same issue as
* useDescendantCount). */
function useHasChildren(nodeId: string): boolean {
@@ -88,9 +87,16 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
return (
<>
{/* Free-resize removed (was NodeResizer). Container size + shape are now
* system-controlled: leaf workspaces render at a fixed width; parent
* workspaces grow to fit their nested children (store grow logic). */}
{/* NodeResizer — visible only on the selected card. Lets the user
* drag any edge/corner to grow or shrink the workspace, which is
* useful on cards that contain nested child workspaces. */}
<NodeResizer
isVisible={isSelected}
minWidth={hasChildren ? 360 : 210}
minHeight={hasChildren ? 200 : 110}
lineClassName="!border-accent/40"
handleClassName="!w-2 !h-2 !bg-accent !border !border-blue-300"
/>
<div
role="button"
tabIndex={0}
@@ -155,22 +161,20 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
}
}}
className={`
group relative rounded-xl
${hasChildren && !data.collapsed
? "h-full w-full min-w-[420px] min-h-[240px]"
: "w-[300px] min-h-[176px]"}
group relative rounded-xl h-full w-full
${hasChildren && !data.collapsed ? "min-w-[360px] min-h-[200px]" : "min-w-[210px]"}
cursor-pointer overflow-hidden
transition-all duration-200 ease-out
${isDragTarget
? "bg-emerald-950/40 border-2 border-emerald-400/60 ring-2 ring-emerald-400/20 scale-[1.03]"
: isBatchSelected
? "bg-surface-sunken/95 border-2 border-accent/80 ring-2 ring-accent/30 shadow-lg shadow-accent/15"
? "bg-surface-sunken/95 border-2 border-accent/80 ring-2 ring-accent/30 shadow-lg shadow-blue-500/15"
: isSelected
? "bg-surface-sunken/95 border border-accent/70 ring-1 ring-accent/30 shadow-lg shadow-accent/10"
: "bg-surface-sunken/90 border border-line/80 hover:border-ink-soft/60 shadow-lg shadow-black/30 hover:shadow-xl hover:shadow-black/40"
? "bg-surface-sunken/95 border border-accent/70 ring-1 ring-accent/30 shadow-lg shadow-blue-500/10"
: "bg-surface-sunken/90 border border-line/80 hover:border-zinc-500/60 shadow-lg shadow-black/30 hover:shadow-xl hover:shadow-black/40"
}
backdrop-blur-sm
focus:outline-none focus-visible:ring-2 focus-visible:ring-accent/70 focus-visible:ring-offset-1 focus-visible:ring-offset-surface
focus:outline-none focus-visible:ring-2 focus-visible:ring-accent/70 focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-950
${deploy.isActivelyProvisioning ? "mol-deploy-shimmer" : ""}
${deploy.isLockedChild ? "mol-deploy-locked" : ""}
`}
@@ -208,45 +212,27 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
}
}
}}
className="!w-2.5 !h-1 !rounded-full !bg-surface-card/80 !border-0 !-top-0.5 hover:!bg-accent hover:!h-1.5 focus-visible:!bg-accent focus-visible:!h-1.5 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent/60 focus-visible:ring-offset-1 focus-visible:ring-offset-surface transition-all"
className="!w-2.5 !h-1 !rounded-full !bg-surface-card/80 !border-0 !-top-0.5 hover:!bg-blue-400 hover:!h-1.5 focus-visible:!bg-blue-400 focus-visible:!h-1.5 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-400/60 focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-950 transition-all"
/>
<div className="relative px-4 py-3.5">
<div className="relative px-3.5 py-2.5">
{/* Header row */}
<div className="flex items-center justify-between gap-2 mb-2.5">
<div className="flex items-center gap-2.5 min-w-0">
<div className={`w-2.5 h-2.5 rounded-full shrink-0 ${statusCfg.dot} ${statusCfg.glow} shadow-sm`} />
<span className="text-[15px] font-semibold text-ink truncate leading-tight">
<div className="flex items-center justify-between gap-2 mb-1">
<div className="flex items-center gap-2 min-w-0">
<div className={`w-2 h-2 rounded-full shrink-0 ${statusCfg.dot} ${statusCfg.glow} shadow-sm`} />
<span className="text-[13px] font-semibold text-ink truncate leading-tight">
{data.name}
</span>
</div>
<div className="flex items-center gap-1.5 shrink-0">
{/* Model pill (concept top-right). Shortens the agent_card model to
a family label (Opus/Sonnet/Haiku/Kimi); falls back to the raw
last segment, then to the tier badge when no model is known. */}
{(() => {
const m = (data.agentCard as Record<string, unknown> | null)?.model;
const model = typeof m === "string" && m ? m : null;
if (!model) {
return (
<span className={`text-[11px] font-mono px-2 py-1 rounded-md ${tierCfg.color}`}>
{tierCfg.label}
</span>
);
}
const label = /opus/i.test(model) ? "Opus"
: /sonnet/i.test(model) ? "Sonnet"
: /haiku/i.test(model) ? "Haiku"
: /kimi/i.test(model) ? "Kimi"
: /gpt|openai/i.test(model) ? "GPT"
: /gemini/i.test(model) ? "Gemini"
: (model.split(/[/:]/).pop() || model);
return (
<span className="text-[11px] font-mono px-2 py-1 rounded-md text-white bg-accent" title={model}>
{label}
</span>
);
})()}
{hasChildren && (
<span className="text-[10px] font-mono text-accent bg-accent/15 border border-accent/40 px-1.5 py-0.5 rounded-md">
{descendantCount} sub
</span>
)}
<span className={`text-[10px] font-mono px-1.5 py-0.5 rounded-md ${tierCfg.color}`}>
{tierCfg.label}
</span>
</div>
</div>
@@ -256,9 +242,6 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
We treat empty-string DB values as "missing" so an unbackfilled
row falls through to the agent-card value rather than rendering
a blank pill. */}
{/* Role pill (concept) — uppercase, accent-bordered. Platform root
shows "PLATFORM · ROOT"; Phase 30 external-runtime agents get the
REMOTE marker alongside. */}
{(() => {
const dbRuntime = typeof data.runtime === "string" && data.runtime !== ""
? data.runtime : null;
@@ -266,46 +249,32 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
? (data.agentCard as Record<string, string>).runtime
: null;
const runtime = dbRuntime ?? cardRuntime;
const isRemote = !!runtime && isExternalLikeRuntime(runtime);
const isPlatformRoot = !data.parentId && hasChildren;
const roleLabel = isPlatformRoot ? "PLATFORM · ROOT" : (data.role || null);
if (!roleLabel && !isRemote) return null;
if (!runtime) return null;
return (
<div className="mb-2.5 flex items-center gap-1.5">
{roleLabel && (
<span className="max-w-[220px] truncate text-[10px] font-mono uppercase tracking-[0.04em] px-2 py-1 rounded-md text-accent bg-accent/12 border border-accent/35">
{roleLabel}
</span>
)}
{isRemote && (
<div className="mb-1 flex items-center gap-1">
{isExternalLikeRuntime(runtime) ? (
<span
className="text-[10px] font-mono uppercase px-2 py-1 rounded-md text-white bg-violet-800 border border-violet-900"
className="text-[7px] font-mono px-1.5 py-0.5 rounded-md text-white bg-violet-800 border border-violet-900"
title="Phase 30 remote agent — runs outside this platform's Docker network. Lifecycle managed via heartbeat-based polling, not Docker exec."
>
REMOTE
</span>
) : (
<span className="text-[7px] font-mono px-1.5 py-0.5 rounded-md text-ink-mid bg-surface-card border border-line">
{runtime}
</span>
)}
</div>
);
})()}
{/* Status line (concept) — uppercase status, "· N AGENTS" for parents,
with a queued pill on the right. */}
<div className="mb-2 flex items-center justify-between gap-2">
<span className={`text-[11px] font-mono uppercase tracking-[0.04em] ${
isOnline ? "text-good"
: effectiveStatus === "failed" ? "text-bad"
: (effectiveStatus === "provisioning" || effectiveStatus === "degraded") ? "text-warm"
: "text-ink-soft"
}`}>
{statusCfg.label}{hasChildren ? ` · ${descendantCount} agents` : ""}
</span>
{data.activeTasks > 0 && (
<span className="shrink-0 text-[11px] font-mono px-2 py-1 rounded-md text-ink-mid bg-surface-card border border-line">
{data.activeTasks} queued
</span>
)}
</div>
{/* Role — clamp to 2 lines. Without this, a verbose role
* description (common on org-template imports) lets the card
* grow arbitrarily tall, which wrecks the grid-slot layout
* because siblings all plan for the same CHILD_DEFAULT_HEIGHT. */}
{data.role && (
<div className="text-[10px] text-ink-mid mb-1.5 leading-tight line-clamp-2">{data.role}</div>
)}
{/* Skills */}
{skills.length > 0 && (
@@ -359,7 +328,29 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
</button>
)}
{/* (status + queued now rendered above, concept-style) */}
{/* Bottom row: status / active tasks */}
<div className="flex items-center justify-between mt-0.5">
{effectiveStatus !== "online" ? (
<div className={`text-[10px] uppercase tracking-widest font-medium ${
effectiveStatus === "failed" ? "text-bad" :
effectiveStatus === "degraded" ? "text-warm" :
effectiveStatus === "not_configured" ? "text-warm" :
effectiveStatus === "provisioning" ? "text-accent" :
"text-ink-mid"
}`}>
{statusCfg.label}
</div>
) : <div />}
{data.activeTasks > 0 && (
<div className="flex items-center gap-1">
<div className="w-1 h-1 rounded-full bg-warm motion-safe:animate-pulse" />
<span className="text-[10px] text-warm tabular-nums">
{data.activeTasks} task{data.activeTasks > 1 ? "s" : ""}
</span>
</div>
)}
</div>
{/* Degraded error preview */}
{data.status === "degraded" && data.lastSampleError && (
@@ -404,7 +395,7 @@ export function WorkspaceNode({ id, data }: NodeProps<Node<WorkspaceNodeData>>)
}
}
}}
className="!w-2.5 !h-1 !rounded-full !bg-surface-card/80 !border-0 !-bottom-0.5 hover:!bg-accent hover:!h-1.5 focus-visible:!bg-accent focus-visible:!h-1.5 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent/60 focus-visible:ring-offset-1 focus-visible:ring-offset-surface transition-all"
className="!w-2.5 !h-1 !rounded-full !bg-surface-card/80 !border-0 !-bottom-0.5 hover:!bg-blue-400 hover:!h-1.5 focus-visible:!bg-blue-400 focus-visible:!h-1.5 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-400/60 focus-visible:ring-offset-1 focus-visible:ring-offset-zinc-950 transition-all"
/>
</div>
</>
@@ -1,195 +0,0 @@
"use client";
import { useState } from "react";
import type { Node } from "@xyflow/react";
import {
useCanvasStore,
type PanelTab,
type WorkspaceNodeData,
} from "@/store/canvas";
import { showToast } from "@/components/Toaster";
import { Tooltip } from "./Tooltip";
import { DetailsTab } from "./tabs/DetailsTab";
import { SkillsTab } from "./tabs/SkillsTab";
import { ChatTab } from "./tabs/ChatTab";
import { ConfigTab } from "./tabs/ConfigTab";
import { ContainerConfigTab } from "./tabs/ContainerConfigTab";
import { DisplayTab } from "./tabs/DisplayTab";
import { TerminalTab } from "./tabs/TerminalTab";
import { FilesTab } from "./tabs/FilesTab";
import { MemoryInspectorPanel } from "./MemoryInspectorPanel";
import { AuditTrailPanel } from "./AuditTrailPanel";
import { TracesTab } from "./tabs/TracesTab";
import { EventsTab } from "./tabs/EventsTab";
import { ActivityTab } from "./tabs/ActivityTab";
import { ScheduleTab } from "./tabs/ScheduleTab";
import { ChannelsTab } from "./tabs/ChannelsTab";
/**
* Canonical workspace tab set — the SAME ids/labels/icons the map's
* SidePanel has always rendered. Single source of truth so the map drawer
* and any other host (the concierge Settings page) can't drift.
*/
export const WORKSPACE_PANEL_TABS: { id: PanelTab; label: string; icon: string }[] = [
{ id: "chat", label: "Chat", icon: "◈" },
{ id: "activity", label: "Activity", icon: "⊙" },
{ id: "details", label: "Details", icon: "◉" },
{ id: "skills", label: "Plugins", icon: "✦" },
{ id: "terminal", label: "Terminal", icon: "▸" },
{ id: "display", label: "Display", icon: "▣" },
{ id: "container-config", label: "Container", icon: "▤" },
{ id: "config", label: "Config", icon: "⚙" },
{ id: "schedule", label: "Schedule", icon: "⏲" },
{ id: "channels", label: "Channels", icon: "⇌" },
{ id: "files", label: "Files", icon: "⊞" },
{ id: "memory", label: "Memory", icon: "◇" },
{ id: "traces", label: "Traces", icon: "◎" },
{ id: "events", label: "Events", icon: "◊" },
{ id: "audit", label: "Audit", icon: "⊟" },
];
interface Props {
/** The workspace node whose tabs to render (id + data blob). */
node: Node<WorkspaceNodeData>;
/**
* Controlled active tab. When provided together with `onTabChange`, the
* caller owns the active-tab state (the map's SidePanel threads the global
* `panelTab`/`setPanelTab` here so the store stays the source of truth and
* the existing keyboard/selection behaviour is preserved verbatim).
* When omitted, the component manages its OWN local active-tab state —
* which is what the concierge Settings page uses so the embedded tabs
* don't fight the map's selection.
*/
activeTab?: PanelTab;
onTabChange?: (tab: PanelTab) => void;
/** Initial tab for the uncontrolled (local-state) mode. Defaults to "chat". */
defaultTab?: PanelTab;
}
/**
* The workspace tab bar + tab body, extracted from SidePanel so it can be
* reused verbatim outside the map (e.g. the concierge Settings "Platform
* agent configuration" section). Renders the canonical ARIA tablist and the
* exact same tab content components keyed on the active tab.
*
* Does NOT render the workspace header / meta pills / resize handle / footer —
* those are host chrome and stay in the host (SidePanel for the map).
*/
export function WorkspacePanelTabs({ node, activeTab, onTabChange, defaultTab = "chat" }: Props) {
const restartWorkspace = useCanvasStore((s) => s.restartWorkspace);
// Controlled when both props are present; otherwise own the state locally.
const controlled = activeTab !== undefined && onTabChange !== undefined;
const [localTab, setLocalTab] = useState<PanelTab>(defaultTab);
const tab = controlled ? (activeTab as PanelTab) : localTab;
const setTab = (next: PanelTab) => {
if (controlled) onTabChange!(next);
else setLocalTab(next);
};
const workspaceId = node.id;
const data = node.data;
return (
<>
{/* Tabs — relative wrapper lets the fade gradient position against the scroll container */}
<div className="relative border-b border-line/40">
{/* Right-edge fade: signals more tabs are hidden off-screen when the bar overflows */}
<div className="pointer-events-none absolute inset-y-0 right-0 w-8 bg-gradient-to-l from-surface to-transparent z-10" aria-hidden="true" />
<div
role="tablist"
aria-label="Workspace panel tabs"
className="flex overflow-x-auto bg-surface-sunken/20 px-1"
onKeyDown={(e) => {
const idx = WORKSPACE_PANEL_TABS.findIndex((t) => t.id === tab);
let next: number | null = null;
if (e.key === "ArrowRight") { e.preventDefault(); next = (idx + 1) % WORKSPACE_PANEL_TABS.length; }
else if (e.key === "ArrowLeft") { e.preventDefault(); next = (idx - 1 + WORKSPACE_PANEL_TABS.length) % WORKSPACE_PANEL_TABS.length; }
else if (e.key === "Home") { e.preventDefault(); next = 0; }
else if (e.key === "End") { e.preventDefault(); next = WORKSPACE_PANEL_TABS.length - 1; }
if (next !== null) {
setTab(WORKSPACE_PANEL_TABS[next].id);
requestAnimationFrame(() => { const el = document.getElementById(`tab-${WORKSPACE_PANEL_TABS[next!].id}`); el?.focus(); el?.scrollIntoView({ block: "nearest", inline: "nearest" }); });
}
}}
>
{WORKSPACE_PANEL_TABS.map((t) => (
<button
type="button"
key={t.id}
id={`tab-${t.id}`}
role="tab"
aria-selected={tab === t.id}
aria-controls={`panel-${t.id}`}
tabIndex={tab === t.id ? 0 : -1}
onClick={() => setTab(t.id)}
className={`shrink-0 px-3 py-2.5 text-[10px] font-medium tracking-wide transition-all rounded-t-lg mx-0.5 focus:outline-none focus-visible:ring-2 focus-visible:ring-accent/70 ${
tab === t.id
? "text-ink bg-surface-card border-b-2 border-accent"
: "text-ink-mid hover:text-ink hover:bg-surface-card/60"
}`}
>
<span className="mr-1 opacity-50" aria-hidden="true">{t.icon}</span>
{t.label}
</button>
))}
</div>
</div>
{/* Needs Restart Banner */}
{data.needsRestart && !data.currentTask && (
<div className="px-4 py-2 bg-sky-950/20 border-b border-sky-800/20 flex items-center justify-between">
<span className="text-[10px] text-sky-300/90">Config changed restart to apply</span>
<button
type="button"
onClick={() => {
restartWorkspace(workspaceId).catch(() => showToast("Restart failed", "error"));
}}
className="text-[11px] px-2 py-1 bg-sky-800/40 hover:bg-sky-700/50 text-sky-200 rounded transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-accent focus-visible:ring-offset-1"
>
Restart Now
</button>
</div>
)}
{/* Current Task Banner */}
{data.currentTask && (
<Tooltip text={data.currentTask as string}>
<div className="px-4 py-2 bg-amber-950/20 border-b border-amber-800/20 flex items-center gap-2 cursor-default">
<div className="w-1.5 h-1.5 rounded-full bg-amber-400 motion-safe:animate-pulse shrink-0" />
<span className="text-[10px] text-warm/90 truncate">
{data.currentTask}
</span>
</div>
</Tooltip>
)}
{/* Tab Content */}
<div
role="tabpanel"
id={`panel-${tab}`}
aria-labelledby={`tab-${tab}`}
tabIndex={0}
className="flex-1 overflow-y-auto focus:outline-none"
>
{tab === "details" && <DetailsTab key={workspaceId} workspaceId={workspaceId} data={data} />}
{tab === "skills" && <SkillsTab key={workspaceId} workspaceId={workspaceId} data={data} />}
{tab === "activity" && <ActivityTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "chat" && <ChatTab key={workspaceId} workspaceId={workspaceId} data={data} />}
{tab === "terminal" && <TerminalTab key={workspaceId} workspaceId={workspaceId} data={data} />}
{tab === "display" && <DisplayTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "container-config" && (
<ContainerConfigTab key={workspaceId} workspaceId={workspaceId} data={data} />
)}
{tab === "config" && <ConfigTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "schedule" && <ScheduleTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "channels" && <ChannelsTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "files" && <FilesTab key={workspaceId} workspaceId={workspaceId} data={data} />}
{tab === "memory" && <MemoryInspectorPanel key={workspaceId} workspaceId={workspaceId} />}
{tab === "traces" && <TracesTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "events" && <EventsTab key={workspaceId} workspaceId={workspaceId} />}
{tab === "audit" && <AuditTrailPanel key={workspaceId} workspaceId={workspaceId} />}
</div>
</>
);
}
@@ -275,9 +275,9 @@ describe("WorkspaceNode — status states", () => {
expect(screen.getByText("STARTING")).toBeTruthy();
});
it("shows status label for online node (concept: status always visible)", () => {
it("suppresses status label for online node", () => {
renderNode({ status: "online" });
expect(screen.getByText("ONLINE")).toBeTruthy();
expect(screen.queryByText("ONLINE")).toBeNull();
});
it("shows degraded error preview when status is degraded and lastSampleError is set", () => {
@@ -404,18 +404,14 @@ describe("WorkspaceNode — double-click interactions", () => {
});
describe("WorkspaceNode — active tasks", () => {
it("shows the queued count when activeTasks > 0", () => {
it("shows active tasks badge when activeTasks > 0", () => {
renderNode({ activeTasks: 3 });
expect(
screen.getByText((_, el) => el?.tagName === "SPAN" && (el.textContent ?? "").includes("3 queued")),
).toBeTruthy();
expect(screen.getByText("3 tasks")).toBeTruthy();
});
it("shows the queued count for a single task", () => {
it("shows singular 'task' when activeTasks is 1", () => {
renderNode({ activeTasks: 1 });
expect(
screen.getByText((_, el) => el?.tagName === "SPAN" && (el.textContent ?? "").includes("1 queued")),
).toBeTruthy();
expect(screen.getByText("1 task")).toBeTruthy();
});
it("suppresses badge when no active tasks", () => {
@@ -475,15 +471,13 @@ describe("WorkspaceNode — needs restart", () => {
});
describe("WorkspaceNode — descendant badge", () => {
it("shows the agent count in the status line when node has children", () => {
it("shows descendant count badge when node has children in store", () => {
store().nodes = [
makeNode({ id: "ws-1" }),
{ id: "child-1", data: { ...makeNode({ id: "ws-1" }).data, parentId: "ws-1" } },
];
renderNode();
expect(
screen.getByText((_, el) => el?.tagName === "SPAN" && (el.textContent ?? "").includes("1 agents")),
).toBeTruthy();
expect(screen.getByText("1 sub")).toBeTruthy();
});
it("suppresses badge when node has no children", () => {
@@ -533,9 +527,9 @@ describe("WorkspaceNode — skills pills", () => {
});
describe("WorkspaceNode — runtime badge", () => {
it("shows the role pill (runtime pill replaced by role pill in the concept redesign)", () => {
renderNode({ role: "researcher" });
expect(screen.getByText("researcher")).toBeTruthy();
it("shows runtime badge when runtime is set", () => {
renderNode({ runtime: "hermes" });
expect(screen.getByText("hermes")).toBeTruthy();
});
it("shows REMOTE badge for external runtime", () => {
@@ -1,103 +0,0 @@
// @vitest-environment jsdom
import { describe, it, expect, vi, afterEach } from "vitest";
import { render, screen, fireEvent, cleanup } from "@testing-library/react";
afterEach(() => {
cleanup();
});
// ── Mock every tab content component to a sentinel so we can assert which
// body renders without dragging in API calls / heavy children. ───────────
vi.mock("../tabs/DetailsTab", () => ({ DetailsTab: () => <div data-testid="body-details" /> }));
vi.mock("../tabs/SkillsTab", () => ({ SkillsTab: () => <div data-testid="body-skills" /> }));
vi.mock("../tabs/ChatTab", () => ({ ChatTab: () => <div data-testid="body-chat" /> }));
vi.mock("../tabs/ConfigTab", () => ({ ConfigTab: () => <div data-testid="body-config" /> }));
vi.mock("../tabs/ContainerConfigTab", () => ({ ContainerConfigTab: () => <div data-testid="body-container" /> }));
vi.mock("../tabs/DisplayTab", () => ({ DisplayTab: () => <div data-testid="body-display" /> }));
vi.mock("../tabs/TerminalTab", () => ({ TerminalTab: () => <div data-testid="body-terminal" /> }));
vi.mock("../tabs/FilesTab", () => ({ FilesTab: () => <div data-testid="body-files" /> }));
vi.mock("../MemoryInspectorPanel", () => ({ MemoryInspectorPanel: () => <div data-testid="body-memory" /> }));
vi.mock("../tabs/TracesTab", () => ({ TracesTab: () => <div data-testid="body-traces" /> }));
vi.mock("../tabs/EventsTab", () => ({ EventsTab: () => <div data-testid="body-events" /> }));
vi.mock("../tabs/ActivityTab", () => ({ ActivityTab: () => <div data-testid="body-activity" /> }));
vi.mock("../tabs/ScheduleTab", () => ({ ScheduleTab: () => <div data-testid="body-schedule" /> }));
vi.mock("../tabs/ChannelsTab", () => ({ ChannelsTab: () => <div data-testid="body-channels" /> }));
vi.mock("../AuditTrailPanel", () => ({ AuditTrailPanel: () => <div data-testid="body-audit" /> }));
vi.mock("../Tooltip", () => ({
Tooltip: ({ children }: { children: React.ReactNode }) => <>{children}</>,
}));
vi.mock("@/components/Toaster", () => ({ showToast: vi.fn() }));
// The store is only consulted for restartWorkspace.
const mockRestart = vi.fn(() => Promise.resolve());
vi.mock("@/store/canvas", () => ({
useCanvasStore: vi.fn((selector: (s: { restartWorkspace: typeof mockRestart }) => unknown) =>
selector({ restartWorkspace: mockRestart })
),
}));
import { WorkspacePanelTabs, WORKSPACE_PANEL_TABS } from "../WorkspacePanelTabs";
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const node: any = {
id: "platform-1",
data: {
name: "Org Concierge",
status: "online",
tier: 0,
role: "platform",
parentId: null,
needsRestart: false,
currentTask: null,
agentCard: null,
},
};
describe("WorkspacePanelTabs — uncontrolled (Settings usage)", () => {
it("renders the canonical 15-tab tablist for an explicit node", () => {
render(<WorkspacePanelTabs node={node} />);
const tablist = screen.getByRole("tablist");
expect(tablist.getAttribute("aria-label")).toBe("Workspace panel tabs");
expect(screen.getAllByRole("tab").length).toBe(WORKSPACE_PANEL_TABS.length);
expect(WORKSPACE_PANEL_TABS.length).toBe(15);
});
it("defaults to the chat tab when no defaultTab is given", () => {
render(<WorkspacePanelTabs node={node} />);
expect(screen.getByTestId("body-chat")).toBeTruthy();
expect(document.getElementById("tab-chat")?.getAttribute("aria-selected")).toBe("true");
});
it("honours defaultTab='config' (the concierge Settings entry point)", () => {
render(<WorkspacePanelTabs node={node} defaultTab="config" />);
expect(screen.getByTestId("body-config")).toBeTruthy();
expect(document.getElementById("tab-config")?.getAttribute("aria-selected")).toBe("true");
});
it("clicking a tab swaps the body using local state (no store panelTab)", () => {
render(<WorkspacePanelTabs node={node} />);
fireEvent.click(document.getElementById("tab-channels")!);
expect(screen.getByTestId("body-channels")).toBeTruthy();
expect(document.getElementById("tab-channels")?.getAttribute("aria-selected")).toBe("true");
});
});
describe("WorkspacePanelTabs — controlled (SidePanel usage)", () => {
it("renders activeTab and calls onTabChange instead of local state", () => {
const onTabChange = vi.fn();
render(<WorkspacePanelTabs node={node} activeTab="details" onTabChange={onTabChange} />);
expect(screen.getByTestId("body-details")).toBeTruthy();
fireEvent.click(document.getElementById("tab-config")!);
expect(onTabChange).toHaveBeenCalledWith("config");
// Controlled: body does NOT change on its own (parent owns the state).
expect(screen.getByTestId("body-details")).toBeTruthy();
});
it("ArrowRight from chat calls onTabChange with the next tab", () => {
const onTabChange = vi.fn();
render(<WorkspacePanelTabs node={node} activeTab="chat" onTabChange={onTabChange} />);
fireEvent.keyDown(screen.getByRole("tablist"), { key: "ArrowRight" });
expect(onTabChange).toHaveBeenCalledWith("activity");
});
});
@@ -188,13 +188,11 @@ describe("DropTargetBadge — renders ghost slot + badge for valid drag target",
});
render(<DropTargetBadge />);
expect(screen.getByTestId("ghost-slot")).toBeTruthy();
// Ghost spans one default child slot at zoom 2: width = CHILD_DEFAULT_WIDTH
// (300) × 2 = 600; height = CHILD_DEFAULT_HEIGHT (176) × 2 = 352. left/top
// are the column-0/row-0 slot origin (unchanged by the card-size bump).
// Ghost uses slotBR from 3rd call: slotBR - slotTL = (712-232, 920-660)
expect(screen.getByTestId("ghost-slot").style.left).toBe("232px");
expect(screen.getByTestId("ghost-slot").style.top).toBe("660px");
expect(screen.getByTestId("ghost-slot").style.width).toBe("600px");
expect(screen.getByTestId("ghost-slot").style.height).toBe("352px");
expect(screen.getByTestId("ghost-slot").style.width).toBe("480px");
expect(screen.getByTestId("ghost-slot").style.height).toBe("260px");
});
it("ghost is hidden when slot falls entirely outside parent bounds", () => {
@@ -325,7 +325,7 @@ describe("all shortcuts respect inInput guard", () => {
});
});
describe("Cmd/Ctrl+Arrow — free-resize removed (system-controlled sizing)", () => {
describe("Cmd/Ctrl+Arrow — keyboard node resize", () => {
beforeEach(() => {
mockStoreState.nodes = [
{
@@ -340,15 +340,81 @@ describe("Cmd/Ctrl+Arrow — free-resize removed (system-controlled sizing)", ()
renderWithProvider();
});
it("no longer resizes the node on Cmd/Ctrl+Arrow (free-resize removed)", () => {
// Sizing is system-controlled now: leaves render fixed-size and parents
// grow to fit their children, so Cmd/Ctrl+Arrow must not emit a
// `dimensions` change anymore.
it("resizes height down (smaller) on Cmd/Ctrl+ArrowUp", () => {
// Node starts at minHeight=110 (no children). Shrinking clamps to min —
// height stays 110. Width is unchanged.
fireEvent.keyDown(window, { key: "ArrowUp", metaKey: true });
expect(mockStoreState.onNodesChange).toHaveBeenCalledWith([
expect.objectContaining({
type: "dimensions",
id: "n1",
dimensions: { width: 210, height: 110 },
}),
]);
});
it("resizes height up (larger) on Cmd/Ctrl+ArrowDown", () => {
fireEvent.keyDown(window, { key: "ArrowDown", ctrlKey: true });
expect(mockStoreState.onNodesChange).toHaveBeenCalledWith([
expect.objectContaining({
type: "dimensions",
id: "n1",
dimensions: { width: 210, height: 120 },
}),
]);
});
it("resizes width down (smaller) on Cmd/Ctrl+ArrowLeft", () => {
// Node starts at minWidth=210 (no children). Shrinking clamps to min —
// width stays 210. Height is unchanged.
fireEvent.keyDown(window, { key: "ArrowLeft", metaKey: true });
expect(mockStoreState.onNodesChange).toHaveBeenCalledWith([
expect.objectContaining({
type: "dimensions",
id: "n1",
dimensions: { width: 210, height: 110 },
}),
]);
});
it("resizes width up (larger) on Cmd/Ctrl+ArrowRight", () => {
fireEvent.keyDown(window, { key: "ArrowRight", ctrlKey: true });
expect(mockStoreState.onNodesChange).not.toHaveBeenCalled();
expect(mockStoreState.onNodesChange).toHaveBeenCalledWith([
expect.objectContaining({
type: "dimensions",
id: "n1",
dimensions: { width: 220, height: 110 },
}),
]);
});
it("uses 2px step with Shift held", () => {
// Step is 2px with Shift, but minHeight=110 clamps the result.
// 110 - 2 = 108, Math.max(110, 108) = 110. Width is unchanged.
fireEvent.keyDown(window, { key: "ArrowUp", metaKey: true, shiftKey: true });
expect(mockStoreState.onNodesChange).toHaveBeenCalledWith([
expect.objectContaining({
dimensions: { width: 210, height: 110 },
}),
]);
});
it("respects min-height constraint (no children)", () => {
fireEvent.keyDown(window, { key: "ArrowUp", metaKey: true });
fireEvent.keyDown(window, { key: "ArrowUp", metaKey: true });
// After shrinking from 110 to 100, another ArrowUp hits min-height of 110
// (110 - 10 = 100, but 100 < 110 so it should stay at 110)
// Actually: 110 -> 100 -> 110 (resets to min)
// Let me check: the hook does Math.max(minHeight, currentHeight - step)
// minHeight=110, step=10, so 110 - 10 = 100, but Math.max(110, 100) = 110
// So two ArrowUp calls should both result in height=100 then height=110?
// Wait: 110 - 10 = 100, Math.max(110, 100) = 110 (not 100)
// So the height never goes below 110. After first: 110 -> 100, but clamped to 110.
// Actually Math.max(110, 100) = 110, so the height never changes.
// The min constraint is respected — height stays at 110.
expect(mockStoreState.onNodesChange).toHaveBeenLastCalledWith([
expect.objectContaining({ dimensions: { width: 210, height: 110 } }),
]);
});
it("does NOT fire when no node is selected", () => {
@@ -2,6 +2,13 @@
import { useEffect } from "react";
import { useCanvasStore } from "@/store/canvas";
import { type NodeChange, type Node } from "@xyflow/react";
import type { WorkspaceNodeData } from "@/store/canvas";
/** Returns true if the node has any direct child in the node list. */
function hasChildren(nodeId: string, nodes: Node<WorkspaceNodeData>[]): boolean {
return nodes.some((n) => n.data.parentId === nodeId);
}
/**
* Canvas-wide keyboard shortcuts. All bound to the document window so
@@ -15,9 +22,8 @@ import { useCanvasStore } from "@/store/canvas";
* Cmd/Ctrl+[ bump selected node backward in z-order
* Z zoom-to-team if the selected node has children
* Arrow keys move selected node 10px (50px with Shift)
*
* Node resize shortcuts were removed: container size + shape are now
* system-controlled (leaves fixed-size, parents grow to fit children).
* Cmd/Ctrl+Arrow resize selected node ( height, width)
* Cmd/Ctrl+Shift+Arrow resize by 2px per press (fine control)
*/
export function useKeyboardShortcuts() {
useEffect(() => {
@@ -90,8 +96,8 @@ export function useKeyboardShortcuts() {
// Arrow-key node movement — Figma-style keyboard drag for keyboard users.
// 10 px per press, 50 px with Shift held. Only fires when a node
// is selected and the target isn't a form control. Skipped when a
// modifier key (Cmd/Ctrl/Alt) is held so those combos stay free for
// browser/OS shortcuts (node resize via Cmd+Arrow was removed).
// modifier key (Cmd/Ctrl/Alt) is held so those combos can be used
// for other shortcuts (e.g. Cmd+Arrow = resize).
if (
!inInput &&
!e.metaKey &&
@@ -119,9 +125,43 @@ export function useKeyboardShortcuts() {
state.moveNode(selectedId, dx, dy);
}
// Node resize (was Cmd/Ctrl+Arrow) removed — container size + shape are
// now system-controlled: leaves render at a fixed size and parents grow
// to fit their children, so there is no user-driven resize affordance.
// Cmd/Ctrl+Arrow — keyboard-accessible node resize.
// ↑/↓ resizes height, ←/→ resizes width.
// 10 px per press (2 px with Shift for fine control).
// Uses the same onNodesChange('dimensions') path that NodeResizer uses.
if (
!inInput &&
(e.metaKey || e.ctrlKey) &&
(e.key === "ArrowUp" ||
e.key === "ArrowDown" ||
e.key === "ArrowLeft" ||
e.key === "ArrowRight")
) {
const state = useCanvasStore.getState();
const selectedId = state.selectedNodeId;
if (!selectedId) return;
if (document.querySelector('[role="dialog"][aria-modal="true"]')) return;
e.preventDefault();
const step = e.shiftKey ? 2 : 10;
const node = state.nodes.find((n) => n.id === selectedId);
if (!node) return;
const currentWidth = (node.width ?? 210) as number;
const currentHeight = (node.height ?? 110) as number;
const minWidth = hasChildren(node.id, state.nodes) ? 360 : 210;
const minHeight = hasChildren(node.id, state.nodes) ? 200 : 110;
let newWidth = currentWidth;
let newHeight = currentHeight;
if (e.key === "ArrowUp") newHeight = Math.max(minHeight, currentHeight - step);
else if (e.key === "ArrowDown") newHeight = currentHeight + step;
else if (e.key === "ArrowLeft") newWidth = Math.max(minWidth, currentWidth - step);
else newWidth = currentWidth + step;
const change: NodeChange = {
type: "dimensions",
id: selectedId,
dimensions: { width: newWidth, height: newHeight },
};
state.onNodesChange([change]);
}
};
window.addEventListener("keydown", handler);
return () => window.removeEventListener("keydown", handler);
@@ -1,339 +0,0 @@
/* Faithful port of the Org Concierge concept (molecule-concierge-v1).
Scoped under .root so the concept's generic class names (.btn, .view,
.msg, .node ) cannot collide with the rest of the canvas app. Theme
tokens are redefined here (not the app tokens) so the port matches the
concept palette exactly; they key off the same [data-theme] on <html>. */
.root {
--mono: "JetBrains Mono", ui-monospace, monospace;
--sans: var(--font-hanken), "Hanken Grotesk", system-ui, sans-serif;
/* dark (default) */
--bg: #08080a; --panel: #0d0d11; --panel-2: #101015;
--card: #16161d; --card-2: #1b1b23; --card-hover: #1f1f28;
--hair: rgba(255,255,255,.07); --hair-2: rgba(255,255,255,.11);
--tx: #ececf1; --tx-2: #9b9baa; --tx-3: #65656f;
--accent: #8b5cf6; --accent-2: #a78bfa; --accent-soft: rgba(139,92,246,.14);
--green: #34d399; --green-soft: rgba(52,211,153,.13); --green-bd: rgba(52,211,153,.26);
--amber: #fbbf24; --grey: #6a6a78; --warn: #f5a623; --red: #f87171;
--dot: rgba(255,255,255,.06);
--shadow: 0 18px 50px rgba(0,0,0,.5);
--user-bubble-tx: #fff;
font-family: var(--sans);
background: var(--bg);
color: var(--tx);
font-size: 14px;
-webkit-font-smoothing: antialiased;
position: fixed;
inset: 0;
overflow: hidden;
}
:global([data-theme="light"]) .root {
--bg: #f1efe8; --panel: #fbfaf6; --panel-2: #f6f4ee;
--card: #ffffff; --card-2: #faf9f4; --card-hover: #f3f1ea;
--hair: rgba(20,18,12,.10); --hair-2: rgba(20,18,12,.16);
--tx: #21201b; --tx-2: #5c5a52; --tx-3: #8e8b81;
--accent: #7c3aed; --accent-2: #7c3aed; --accent-soft: rgba(124,58,237,.10);
--green: #0f9d63; --green-soft: rgba(15,157,99,.10); --green-bd: rgba(15,157,99,.24);
--amber: #c98a04; --grey: #a8a59b; --warn: #c47e12; --red: #dc4d4d;
--dot: rgba(20,18,12,.10);
--shadow: 0 18px 50px rgba(60,56,40,.14);
}
.root *, .root *::before, .root *::after { box-sizing: border-box; }
.root ::-webkit-scrollbar { width: 8px; height: 8px; }
.root ::-webkit-scrollbar-thumb { background: var(--hair-2); border-radius: 8px; }
.root ::-webkit-scrollbar-track { background: transparent; }
.app { display: flex; height: 100%; width: 100%; }
/* ===== ICON RAIL ===== */
.rail {
width: 52px; flex: 0 0 52px; background: var(--panel);
border-right: 1px solid var(--hair);
display: flex; flex-direction: column; padding: 12px 8px; gap: 3px;
transition: width .22s cubic-bezier(.4,0,.2,1), flex-basis .22s cubic-bezier(.4,0,.2,1);
overflow: hidden;
}
.app.railOpen .rail { width: 212px; flex-basis: 212px; }
.railTop { display: flex; align-items: center; gap: 8px; height: 36px; margin-bottom: 8px; }
.logo {
width: 36px; height: 36px; flex: 0 0 36px; border-radius: 10px; display: grid; place-items: center; cursor: pointer;
background: linear-gradient(150deg,#7c3aed,#a78bfa);
box-shadow: 0 4px 14px rgba(124,58,237,.45), inset 0 1px 0 rgba(255,255,255,.25);
}
.railWordmark { font-weight: 700; font-size: 14.5px; letter-spacing: -.01em; white-space: nowrap; opacity: 0; transition: opacity .16s; pointer-events: none; }
.app.railOpen .railWordmark { opacity: 1; transition: opacity .18s .08s; }
.railToggle { margin-left: auto; width: 30px; height: 30px; flex: 0 0 30px; border-radius: 8px; display: grid; place-items: center; color: var(--tx-3); cursor: pointer; transition: .16s; border: none; background: none; }
.railToggle:hover { color: var(--tx); background: var(--hair); }
.railToggle svg { width: 18px; height: 18px; }
.app:not(.railOpen) .railToggle { display: none; }
.navbtn { height: 40px; border-radius: 10px; color: var(--tx-3); cursor: pointer; position: relative; transition: .16s; display: flex; align-items: center; gap: 12px; padding: 0; justify-content: flex-start; width: 100%; background: none; border: none; }
.app.railOpen .navbtn { padding: 0 11px; }
.navbtn .ico { width: 36px; flex: 0 0 36px; display: grid; place-items: center; }
.app.railOpen .navbtn .ico { width: 20px; flex: 0 0 20px; }
.navbtn .lbl { font-size: 13.5px; font-weight: 500; white-space: nowrap; opacity: 0; transition: opacity .16s; pointer-events: none; }
.app.railOpen .navbtn .lbl { opacity: 1; transition: opacity .18s .08s; }
.navbtn:hover { color: var(--tx-2); background: var(--hair); }
.navbtn.active { color: var(--accent-2); background: var(--accent-soft); }
.navbtn.active::before { content: ""; position: absolute; left: -8px; top: 50%; transform: translateY(-50%); width: 3px; height: 18px; border-radius: 0 3px 3px 0; background: var(--accent-2); }
.navbtn svg { width: 20px; height: 20px; }
.spacer { flex: 1; }
/* ===== MAIN ===== */
.main { flex: 1; display: flex; flex-direction: column; min-width: 0; }
.topbar { height: 56px; flex: 0 0 56px; border-bottom: 1px solid var(--hair); background: var(--panel); display: flex; align-items: center; justify-content: space-between; padding: 0 18px 0 20px; }
.org { display: flex; align-items: center; gap: 10px; cursor: pointer; padding: 6px 10px; border-radius: 9px; transition: .16s; margin-left: -6px; }
.org:hover { background: var(--hair); }
.orgBadge { width: 24px; height: 24px; border-radius: 7px; display: grid; place-items: center; background: linear-gradient(150deg,#2d2d36,#3a3a46); font-size: 12px; font-weight: 700; color: #d8d8e2; border: 1px solid var(--hair-2); }
:global([data-theme="light"]) .orgBadge { background: linear-gradient(150deg,#7c3aed,#a78bfa); color: #fff; border: none; }
.orgName { font-weight: 600; font-size: 14.5px; letter-spacing: -.01em; }
.chev { color: var(--tx-3); display: flex; }
.chev svg { width: 15px; height: 15px; }
.topbarRight { display: flex; align-items: center; gap: 10px; }
.iconPill { width: 34px; height: 34px; border-radius: 9px; display: grid; place-items: center; color: var(--tx-3); cursor: pointer; transition: .16s; border: none; background: none; }
.iconPill:hover { color: var(--tx-2); background: var(--hair); }
.iconPill svg { width: 18px; height: 18px; }
.themeToggle { width: 34px; height: 34px; border-radius: 9px; display: grid; place-items: center; color: var(--tx-2); cursor: pointer; transition: .16s; border: 1px solid var(--hair); background: none; }
.themeToggle:hover { background: var(--hair); color: var(--tx); }
.themeToggle svg { width: 17px; height: 17px; }
.avatar { width: 32px; height: 32px; border-radius: 50%; background: linear-gradient(150deg,#f0a36b,#e8638a); display: grid; place-items: center; font-weight: 700; font-size: 12.5px; color: #1a0d12; cursor: pointer; border: 1px solid rgba(255,255,255,.16); box-shadow: 0 2px 8px rgba(0,0,0,.3); margin-left: 4px; }
/* ===== VIEWS ===== */
.viewArea { flex: 1; min-height: 0; position: relative; }
.view { position: absolute; inset: 0; display: none; }
.view.active { display: flex; }
/* A transform turns this into the containing block for its position:fixed
descendants so the canvas's own overlays (Toolbar, Legend, Communications,
New Workspace, minimap) anchor to THIS box (the map view area, right of the
rail and below the topbar) instead of the viewport, and stop overlapping the
shell chrome. */
.canvasMount { position: absolute; inset: 0; transform: translateZ(0); overflow: hidden; }
/* ===== HOME VIEW ===== */
.homeSidebar { flex: 0 0 296px; max-width: 296px; background: var(--panel-2); border-right: 1px solid var(--hair); display: flex; flex-direction: column; min-height: 0; }
.sbTabs { display: flex; gap: 2px; padding: 12px 12px 0; border-bottom: 1px solid var(--hair); }
.sbTab { flex: 1; text-align: center; padding: 9px 4px 11px; font-size: 12.5px; font-weight: 600; color: var(--tx-3); cursor: pointer; position: relative; transition: .14s; border-radius: 8px 8px 0 0; border: none; background: none; }
.sbTab:hover { color: var(--tx-2); }
.sbTab.active { color: var(--tx); }
.sbTab.active::after { content: ""; position: absolute; left: 8px; right: 8px; bottom: -1px; height: 2px; border-radius: 2px; background: var(--accent); }
.cnt { font-family: var(--mono); font-size: 10px; font-weight: 600; margin-left: 5px; background: var(--hair); color: var(--tx-2); padding: 1px 5px; border-radius: 10px; }
.sbTab.active .cnt { background: var(--accent-soft); color: var(--accent-2); }
.sbBody { flex: 1; overflow-y: auto; padding: 14px 12px; }
.wsList { display: flex; flex-direction: column; gap: 6px; }
.treeChildren { position: relative; padding-left: 22px; display: flex; flex-direction: column; gap: 6px; margin-top: 6px; }
.tnode { position: relative; display: flex; flex-direction: column; gap: 6px; }
.tnode::before { content: ""; position: absolute; left: -14px; top: -6px; width: 1.5px; height: calc(100% + 6px); background: var(--hair-2); }
.tnode.last::before { height: 33px; }
.tnode::after { content: ""; position: absolute; left: -14px; top: 27px; width: 14px; height: 1.5px; background: var(--hair-2); }
.ws { display: flex; align-items: center; gap: 11px; padding: 10px 11px; border-radius: 13px; cursor: pointer; border: 1px solid transparent; background: transparent; transition: .16s; position: relative; width: 100%; text-align: left; }
.ws:hover { background: var(--card); }
.ws.active { background: var(--accent-soft); border-color: rgba(139,92,246,.34); }
.wsAv { width: 34px; height: 34px; border-radius: 50%; flex: 0 0 34px; position: relative; display: grid; place-items: center; font-weight: 700; font-size: 12px; color: #0c0c10; box-shadow: inset 0 1px 0 rgba(255,255,255,.3); }
.wsAv .dot { position: absolute; right: -1px; bottom: -1px; width: 10px; height: 10px; border-radius: 50%; border: 2.5px solid var(--panel-2); }
.ws.active .wsAv .dot { border-color: var(--card); }
.wsMeta { min-width: 0; flex: 1; }
.wsName { font-weight: 600; font-size: 13.5px; letter-spacing: -.01em; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
.wsSub { display: flex; align-items: center; gap: 6px; margin-top: 1px; min-width: 0; }
.wsRole { font-family: var(--mono); font-size: 10.5px; color: var(--tx-3); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; min-width: 0; flex: 0 1 auto; }
.wsStatus { font-size: 10.5px; font-weight: 500; display: flex; align-items: center; gap: 4px; flex: 0 0 auto; }
.wsStatus .sdot { width: 6px; height: 6px; border-radius: 50%; }
.rootTag { margin-left: auto; font-family: var(--mono); font-size: 9px; letter-spacing: .1em; text-transform: uppercase; color: var(--accent-2); background: var(--accent-soft); padding: 3px 6px; border-radius: 6px; border: 1px solid rgba(139,92,246,.28); }
.wsQ { margin-left: auto; flex: 0 0 auto; font-family: var(--mono); font-size: 10px; font-weight: 700; color: var(--tx-2); background: var(--hair); border: 1px solid var(--hair-2); padding: 2px 7px; border-radius: 20px; display: inline-flex; align-items: center; gap: 4px; }
.wsQ svg { width: 9px; height: 9px; color: var(--tx-3); }
.wsQ.zero { color: var(--tx-3); opacity: .65; }
.wsCaret { flex: 0 0 auto; width: 20px; height: 20px; margin-left: 4px; border: none; background: none; color: var(--tx-3); cursor: pointer; display: grid; place-items: center; border-radius: 6px; transition: .14s; }
.wsCaret:hover { background: var(--hair); color: var(--tx); }
.wsCaret svg { width: 13px; height: 13px; }
.sbSection { font-size: 11px; font-weight: 600; letter-spacing: .12em; text-transform: uppercase; color: var(--tx-3); font-family: var(--mono); padding: 18px 4px 10px; }
/* tasks */
.task { display: flex; flex-direction: column; align-items: stretch; gap: 0; padding: 11px; border-radius: 12px; border: 1px solid var(--hair); background: var(--card); margin-bottom: 7px; }
.taskRow { display: flex; gap: 11px; }
.taskIc { width: 28px; height: 28px; border-radius: 8px; flex: 0 0 28px; display: grid; place-items: center; }
.taskIc svg { width: 15px; height: 15px; }
.taskIc.done { background: var(--green-soft); color: var(--green); border: 1px solid var(--green-bd); }
.taskIc.run { background: rgba(245,166,35,.12); color: var(--amber); border: 1px solid rgba(245,166,35,.28); }
.taskIc.sched { background: var(--accent-soft); color: var(--accent-2); border: 1px solid rgba(139,92,246,.26); }
.taskMeta { flex: 1; min-width: 0; }
.taskT { font-size: 13px; font-weight: 600; letter-spacing: -.01em; line-height: 1.35; }
.taskS { font-size: 11px; color: var(--tx-3); margin-top: 3px; display: flex; align-items: center; gap: 6px; }
.taskS .pip { width: 4px; height: 4px; border-radius: 50%; background: var(--tx-3); }
.taskActions { display: flex; gap: 7px; margin-top: 11px; padding-left: 39px; }
.tbtn { font-family: var(--sans); font-size: 11.5px; font-weight: 600; cursor: pointer; padding: 5px 12px; border-radius: 8px; border: 1px solid var(--hair-2); background: var(--card-2); color: var(--tx-2); transition: .14s; display: inline-flex; align-items: center; gap: 5px; }
.tbtn svg { width: 13px; height: 13px; }
.tbtn:hover { background: var(--card-hover); color: var(--tx); }
.tbtn.done { background: var(--green-soft); color: var(--green); border-color: var(--green-bd); }
.task.isDone .taskT { color: var(--tx-2); }
/* activity */
.act { display: flex; gap: 11px; padding: 6px 4px; }
.actTime { font-family: var(--mono); font-size: 10.5px; color: var(--tx-3); flex: 0 0 52px; padding-top: 1px; font-variant-numeric: tabular-nums; }
.actLine { position: relative; padding-left: 15px; flex: 1; }
.actLine::before { content: ""; position: absolute; left: 0; top: 6px; width: 6px; height: 6px; border-radius: 50%; background: var(--accent); }
.actLine.grn::before { background: var(--green); }
.actText { font-size: 12px; color: var(--tx-2); line-height: 1.45; }
.actText b { color: var(--tx); font-weight: 600; }
/* approvals */
.apprCard { background: var(--card); border: 1px solid var(--hair); border-radius: 14px; overflow: hidden; }
.apprRow { display: flex; align-items: flex-start; gap: 11px; padding: 13px; }
.apprIc { width: 30px; height: 30px; border-radius: 8px; flex: 0 0 30px; display: grid; place-items: center; background: rgba(239,68,68,.12); color: var(--red); border: 1px solid rgba(239,68,68,.22); }
.apprIc svg { width: 15px; height: 15px; }
.apprMeta { flex: 1; min-width: 0; }
.apprT { font-size: 13px; font-weight: 600; letter-spacing: -.01em; line-height: 1.35; }
.apprT code { font-family: var(--mono); font-size: 11px; color: var(--tx-2); background: var(--hair); padding: 1px 5px; border-radius: 5px; font-weight: 500; }
.apprS { font-size: 11px; color: var(--tx-3); margin-top: 3px; }
.apprActions { display: flex; gap: 7px; padding: 0 13px 13px; }
.empty { text-align: center; color: var(--tx-3); font-size: 12.5px; padding: 30px 16px; line-height: 1.6; }
.empty svg { width: 30px; height: 30px; margin-bottom: 10px; color: var(--tx-3); opacity: .6; }
/* buttons */
.btn { font-family: var(--sans); font-size: 12px; font-weight: 600; cursor: pointer; padding: 6px 13px; border-radius: 8px; border: 1px solid var(--hair-2); background: var(--card-2); color: var(--tx-2); transition: .14s; white-space: nowrap; }
.btn:hover { background: var(--card-hover); color: var(--tx); }
.btn.approve { background: var(--accent); color: #fff; border-color: transparent; box-shadow: 0 2px 10px rgba(124,58,237,.4); }
.btn.approve:hover { background: #9d6ef8; }
.btn.deny:hover { background: rgba(239,68,68,.14); color: var(--red); border-color: rgba(239,68,68,.3); }
.btn.flex { flex: 1; text-align: center; }
/* ===== CHAT ===== */
.chat { flex: 1; display: flex; flex-direction: column; min-width: 0; background: var(--bg); }
.chatHead { height: 56px; flex: 0 0 56px; border-bottom: 1px solid var(--hair); display: flex; align-items: center; gap: 12px; padding: 0 22px; background: var(--panel-2); }
.chAv { width: 30px; height: 30px; border-radius: 9px; display: grid; place-items: center; background: linear-gradient(150deg,#7c3aed,#a78bfa); color: #fff; box-shadow: 0 2px 8px rgba(124,58,237,.4); }
.chAv svg { width: 16px; height: 16px; }
.chMeta { flex: 1; }
.chTitle { font-size: 14.5px; font-weight: 600; letter-spacing: -.01em; }
.chSub { font-size: 11.5px; color: var(--tx-3); display: flex; align-items: center; gap: 6px; margin-top: 1px; }
.chSub .sdot { width: 6px; height: 6px; border-radius: 50%; background: var(--green); }
.chTools { display: flex; gap: 6px; }
.chatScroll { flex: 1; overflow-y: auto; padding: 30px 0; }
.chatInner { max-width: 720px; margin: 0 auto; padding: 0 28px; display: flex; flex-direction: column; gap: 22px; }
.msg { display: flex; gap: 13px; max-width: 100%; }
.msg.user { flex-direction: row-reverse; }
.msgAv { width: 30px; height: 30px; border-radius: 9px; flex: 0 0 30px; display: grid; place-items: center; font-weight: 700; font-size: 12px; }
.msg.user .msgAv { background: linear-gradient(150deg,#f0a36b,#e8638a); color: #1a0d12; }
.msg.bot .msgAv { background: linear-gradient(150deg,#7c3aed,#a78bfa); color: #fff; }
.msg.bot .msgAv svg { width: 16px; height: 16px; }
.bubbleWrap { display: flex; flex-direction: column; gap: 11px; min-width: 0; max-width: 560px; }
.msg.user .bubbleWrap { align-items: flex-end; }
.bubble { padding: 12px 15px; border-radius: 15px; font-size: 14px; line-height: 1.55; letter-spacing: -.005em; }
.msg.user .bubble { background: var(--accent); color: var(--user-bubble-tx); border-bottom-right-radius: 5px; box-shadow: 0 3px 14px rgba(124,58,237,.3); }
.msg.bot .bubble { background: var(--card); border: 1px solid var(--hair); border-bottom-left-radius: 5px; color: var(--tx); }
.bubble b { font-weight: 600; }
.actionCard { background: var(--card); border: 1px solid var(--hair); border-radius: 14px; padding: 13px 15px; display: flex; align-items: center; gap: 13px; width: 100%; }
.acIc { width: 34px; height: 34px; border-radius: 10px; flex: 0 0 34px; display: grid; place-items: center; background: var(--green-soft); border: 1px solid var(--green-bd); color: var(--green); }
.acIc svg { width: 18px; height: 18px; }
.acMeta { flex: 1; min-width: 0; }
.acLabel { font-family: var(--mono); font-size: 10px; letter-spacing: .1em; text-transform: uppercase; color: var(--tx-3); margin-bottom: 3px; }
.acTitle { font-size: 13.5px; font-weight: 600; letter-spacing: -.01em; display: flex; align-items: center; gap: 7px; flex-wrap: wrap; }
.acTitle .pill { font-family: var(--mono); font-size: 11px; font-weight: 500; color: var(--accent-2); white-space: nowrap; background: var(--accent-soft); padding: 2px 8px; border-radius: 6px; border: 1px solid rgba(139,92,246,.24); }
.acCheck { color: var(--green); display: flex; }
.acCheck svg { width: 18px; height: 18px; }
.reqCard { background: linear-gradient(180deg,rgba(245,166,35,.08),rgba(245,166,35,.02)); border: 1px solid rgba(245,166,35,.3); border-radius: 16px; padding: 16px; width: 100%; }
.reqTop { display: flex; align-items: flex-start; gap: 13px; }
.reqIc { width: 36px; height: 36px; border-radius: 10px; flex: 0 0 36px; display: grid; place-items: center; background: rgba(245,166,35,.15); border: 1px solid rgba(245,166,35,.34); color: var(--warn); }
.reqIc svg { width: 19px; height: 19px; }
.reqMeta { flex: 1; }
.reqLabel { font-family: var(--mono); font-size: 10px; letter-spacing: .1em; text-transform: uppercase; color: var(--warn); margin-bottom: 4px; font-weight: 600; }
.reqTitle { font-size: 14.5px; font-weight: 600; letter-spacing: -.01em; line-height: 1.4; }
.reqTitle code { font-family: var(--mono); font-size: 12.5px; color: var(--amber); background: rgba(245,166,35,.12); padding: 1px 6px; border-radius: 5px; font-weight: 500; }
.reqDesc { font-size: 12.5px; color: var(--tx-2); margin-top: 6px; line-height: 1.5; }
.reqActions { display: flex; gap: 9px; margin-top: 14px; padding-left: 49px; }
.reqActions .btn { padding: 8px 18px; font-size: 12.5px; }
.composer { padding: 14px 28px 20px; border-top: 1px solid var(--hair); background: var(--panel-2); }
.composerInner { max-width: 720px; margin: 0 auto; }
.inputBox { background: var(--card); border: 1px solid var(--hair-2); border-radius: 16px; padding: 12px 12px 10px 16px; transition: .16s; }
.inputBox:focus-within { border-color: rgba(139,92,246,.5); box-shadow: 0 0 0 3px rgba(139,92,246,.12); }
.inputTop { display: flex; align-items: flex-end; gap: 10px; }
.msgInput { flex: 1; background: none; border: none; outline: none; color: var(--tx); font-family: var(--sans); font-size: 14px; line-height: 1.5; resize: none; max-height: 120px; padding: 5px 0; }
.msgInput::placeholder { color: var(--tx-3); }
.send { width: 36px; height: 36px; flex: 0 0 36px; border-radius: 11px; border: none; cursor: pointer; background: var(--accent); color: #fff; display: grid; place-items: center; transition: .16s; box-shadow: 0 2px 10px rgba(124,58,237,.4); }
.send:hover { background: #9d6ef8; transform: translateY(-1px); }
.send svg { width: 17px; height: 17px; }
.inputBottom { display: flex; align-items: center; gap: 10px; margin-top: 8px; }
.hint { margin-left: auto; font-size: 11px; color: var(--tx-3); font-family: var(--mono); }
.hint kbd { background: var(--hair); border: 1px solid var(--hair); border-radius: 4px; padding: 1px 5px; font-family: var(--mono); font-size: 10px; }
/* greeting (empty chat state) */
.greetWrap { flex: 1; display: flex; flex-direction: column; align-items: center; justify-content: center; gap: 26px; padding: 0 28px; }
.greet { display: flex; align-items: center; gap: 14px; font-size: 34px; font-weight: 400; letter-spacing: -.02em; color: var(--tx); }
.greet .stamp { color: #f0a36b; }
.greetChips { display: flex; flex-wrap: wrap; gap: 10px; justify-content: center; }
.chip { display: inline-flex; align-items: center; gap: 7px; font-size: 13px; font-weight: 600; color: var(--tx-2); background: var(--card); border: 1px solid var(--hair); padding: 8px 13px; border-radius: 10px; cursor: pointer; transition: .14s; }
.chip:hover { background: var(--card-hover); color: var(--tx); border-color: var(--hair-2); }
/* placeholder (settings) */
.ph { flex: 1; display: flex; flex-direction: column; align-items: center; justify-content: center; gap: 14px; color: var(--tx-3); text-align: center; }
.ph svg { width: 42px; height: 42px; opacity: .5; }
.ph h2 { font-size: 18px; font-weight: 600; color: var(--tx-2); }
.ph p { font-size: 13.5px; max-width: 340px; line-height: 1.55; }
/* settings view */
.settingsScroll { flex: 1; min-height: 0; overflow-y: auto; padding: 28px 32px 60px; }
.settingsInner { max-width: 720px; margin: 0 auto; display: flex; flex-direction: column; gap: 26px; }
.settingsHead { display: flex; flex-direction: column; gap: 5px; }
.settingsHead h1 { font-size: 21px; font-weight: 600; letter-spacing: -.01em; color: var(--tx); }
.settingsHead p { font-size: 13px; color: var(--tx-3); line-height: 1.55; max-width: 540px; }
.scard { background: var(--card); border: 1px solid var(--hair); border-radius: 14px; padding: 18px 20px; display: flex; flex-direction: column; gap: 14px; }
.scardHead { display: flex; flex-direction: column; gap: 4px; }
.scardTitle { font-size: 14.5px; font-weight: 600; color: var(--tx); display: flex; align-items: center; gap: 9px; }
.scardDesc { font-size: 12.5px; color: var(--tx-3); line-height: 1.5; }
/* billing radio options */
.optList { display: flex; flex-direction: column; gap: 10px; }
.opt { display: flex; gap: 12px; padding: 13px 14px; border: 1px solid var(--hair); border-radius: 11px; cursor: pointer; transition: .14s; background: var(--card-2); align-items: flex-start; }
.opt:hover { border-color: var(--hair-2); background: var(--card-hover); }
.opt.optActive { border-color: rgba(139,92,246,.5); background: var(--accent-soft); }
.optRadio { width: 16px; height: 16px; flex: 0 0 16px; border-radius: 50%; border: 2px solid var(--hair-2); margin-top: 2px; position: relative; transition: .14s; }
.opt.optActive .optRadio { border-color: var(--accent); }
.opt.optActive .optRadio::after { content: ""; position: absolute; inset: 2px; border-radius: 50%; background: var(--accent); }
.optBody { display: flex; flex-direction: column; gap: 3px; min-width: 0; }
.optTitle { font-size: 13px; font-weight: 600; color: var(--tx); display: flex; align-items: center; gap: 8px; }
.optDesc { font-size: 12px; color: var(--tx-3); line-height: 1.5; }
.optTag { font-family: var(--mono); font-size: 9.5px; font-weight: 600; letter-spacing: .06em; text-transform: uppercase; color: var(--green); background: var(--green-soft); border: 1px solid var(--green-bd); padding: 1px 7px; border-radius: 20px; }
.optTagCur { color: var(--accent-2); background: var(--accent-soft); border-color: rgba(139,92,246,.3); }
/* byok key entry */
.keyRow { display: flex; flex-direction: column; gap: 9px; padding: 14px; border: 1px solid var(--hair); border-radius: 11px; background: var(--card-2); }
.keyLabel { font-size: 11px; font-weight: 600; letter-spacing: .04em; color: var(--tx-2); font-family: var(--mono); }
.keyInputRow { display: flex; gap: 9px; }
.keyInput { flex: 1; min-width: 0; background: var(--panel); border: 1px solid var(--hair-2); border-radius: 8px; padding: 8px 11px; font-family: var(--mono); font-size: 12px; color: var(--tx); outline: none; transition: .14s; }
.keyInput:focus { border-color: var(--accent); }
.keyInput::placeholder { color: var(--tx-3); }
.keyNote { font-size: 11.5px; color: var(--tx-3); line-height: 1.5; }
.keyNote code { font-family: var(--mono); font-size: 11px; color: var(--tx-2); background: var(--hair); padding: 1px 5px; border-radius: 4px; }
.sMsg { font-size: 12px; padding: 8px 11px; border-radius: 8px; line-height: 1.45; }
.sMsgErr { color: var(--red); background: rgba(239,68,68,.12); border: 1px solid rgba(239,68,68,.28); }
.sMsgOk { color: var(--green); background: var(--green-soft); border: 1px solid var(--green-bd); }
.btn.primary { background: var(--accent); color: #fff; border-color: transparent; box-shadow: 0 2px 10px rgba(124,58,237,.4); }
.btn.primary:hover { background: #9d6ef8; }
.btn.primary:disabled { opacity: .4; cursor: default; box-shadow: none; }
/* embedded canvas settings tabs */
.embedSettings { border: 1px solid var(--hair); border-radius: 14px; overflow: hidden; background: var(--card); }
/* embedded full workspace tab panel (the SAME WorkspacePanelTabs the Org-map
SidePanel renders), pointed at the platform agent. A bordered card with a
bounded height + flex column so the tab body's own overflow-y scroller works
inside it (mirrors .embedChat's min-height:0 trick). */
.embedPanel {
border: 1px solid var(--hair);
border-radius: 14px;
overflow: hidden;
background: var(--card);
display: flex;
flex-direction: column;
min-height: 0;
height: 70vh;
max-height: 760px;
}
/* embedded canonical ChatTab (shared with the Org-map SidePanel).
Fills the chat column below the concierge header; min-height:0 lets the
ChatTab's own overflow-y scroller work inside the flex column. */
.embedChat { flex: 1; min-height: 0; display: flex; flex-direction: column; }
@@ -1,604 +0,0 @@
"use client";
import { useCallback, useEffect, useMemo, useState } from "react";
import { useCanvasStore, type TopView } from "@/store/canvas";
import { WORKSPACE_KIND } from "@/lib/workspace-kind";
import { useTheme } from "@/lib/theme-provider";
import { api } from "@/lib/api";
import { showToast } from "@/components/Toaster";
import type { ActivityEntry } from "@/types/activity";
import { Canvas } from "@/components/Canvas";
import { CommunicationOverlay } from "@/components/CommunicationOverlay";
import { MessageFlightHome } from "./MessageFlightHome";
import { ChatTab } from "@/components/tabs/ChatTab";
import { WorkspacePanelTabs } from "@/components/WorkspacePanelTabs";
import { SettingsTabs } from "@/components/settings";
import s from "./Concierge.module.css";
import {
IcHome, IcOrgMap, IcSettings, IcSearch, IcBell, IcSun, IcMoon, IcChevDown,
IcQueue, IcCaret, IcMolecule, IcClock, IcCheck, IcTrash, IcChat,
} from "./icons";
/* ── status → concept palette ─────────────────────────────────────────── */
function statusInfo(status: string): { color: string; label: string } {
switch (status) {
case "online": return { color: "var(--green)", label: "online" };
case "provisioning":
case "starting": return { color: "var(--amber)", label: "starting" };
case "degraded": return { color: "var(--amber)", label: "degraded" };
case "building": return { color: "var(--amber)", label: "building" };
case "failed": return { color: "var(--red)", label: "failed" };
case "paused": return { color: "var(--accent-2)", label: "paused" };
default: return { color: "var(--grey)", label: status || "idle" };
}
}
const AV_GRADIENTS = [
"linear-gradient(150deg,#a78bfa,#7c3aed)",
"linear-gradient(150deg,#60a5fa,#3b82f6)",
"linear-gradient(150deg,#34d399,#10b981)",
"linear-gradient(150deg,#fbbf77,#f59e0b)",
"linear-gradient(150deg,#5eead4,#14b8a6)",
"linear-gradient(150deg,#f0a36b,#e8638a)",
];
function initials(name: string): string {
const parts = name.trim().split(/\s+/).filter(Boolean);
if (parts.length === 0) return "?";
if (parts.length === 1) return parts[0].slice(0, 2).toUpperCase();
return (parts[0][0] + parts[parts.length - 1][0]).toUpperCase();
}
function gradientFor(id: string): string {
let h = 0;
for (let i = 0; i < id.length; i++) h = (h * 31 + id.charCodeAt(i)) >>> 0;
return AV_GRADIENTS[h % AV_GRADIENTS.length];
}
type SbTab = "agents" | "tasks" | "approvals";
interface PendingApproval {
id: string;
workspace_id: string;
workspace_name: string;
action: string;
reason: string | null;
status: string;
created_at: string;
}
interface UserTask {
id: string;
workspace_id: string;
workspace_name: string;
title: string;
detail: string | null;
status: string;
created_at: string;
}
/** ISO timestamp → "9:05 PM" (local). Empty string on a bad/missing value. */
function clockTime(iso: string | null | undefined): string {
if (!iso) return "";
const d = new Date(iso);
if (Number.isNaN(d.getTime())) return "";
return d.toLocaleTimeString([], { hour: "numeric", minute: "2-digit" });
}
/** A human action label from an activity row. */
function activityText(a: ActivityEntry): string {
if (a.summary) return a.summary;
const verb = a.activity_type?.replace(/_/g, " ") ?? "activity";
return a.method ? `${verb} · ${a.method}` : verb;
}
export function ConciergeShell() {
const nodes = useCanvasStore((st) => st.nodes);
const topView = useCanvasStore((st) => st.topView);
const setTopView = useCanvasStore((st) => st.setTopView);
const selectNode = useCanvasStore((st) => st.selectNode);
const selectedNodeId = useCanvasStore((st) => st.selectedNodeId);
const { resolvedTheme, setTheme } = useTheme();
const [railOpen, setRailOpen] = useState(false);
const [sbTab, setSbTab] = useState<SbTab>("agents");
const [settingsTab, setSettingsTab] = useState<"platform" | "org">("platform");
const [collapsed, setCollapsed] = useState<Record<string, boolean>>({});
// Dynamic org name for the topbar. Sourced from GET /org/identity
// ({name} ← MOLECULE_ORG_NAME, added by a parallel backend change).
// Falls back to "Molecule AI" when the endpoint 404s / errors or
// returns an empty name, so the topbar never breaks before the backend
// lands.
const [orgName, setOrgName] = useState("Molecule AI");
useEffect(() => {
let cancelled = false;
api
.get<{ name?: string }>("/org/identity")
.then((r) => {
const name = (r?.name || "").trim();
if (!cancelled && name) setOrgName(name);
})
.catch(() => {
// No endpoint / not reachable — keep the "Molecule AI" fallback.
});
return () => {
cancelled = true;
};
}, []);
// Build the agent hierarchy from live nodes.
const { roots, childrenOf } = useMemo(() => {
const childrenOf = new Map<string, typeof nodes>();
const roots: typeof nodes = [];
for (const n of nodes) {
const p = n.data.parentId;
if (p) {
const arr = childrenOf.get(p) ?? [];
arr.push(n);
childrenOf.set(p, arr);
} else {
roots.push(n);
}
}
return { roots, childrenOf };
}, [nodes]);
const platformRoot = useMemo(
() =>
// Resolve the platform agent by the authoritative kind='platform' marker
// only — the backend in this branch always returns kind
// (COALESCE(w.kind,'workspace')) and the map-side filter
// (canvas-topology/Canvas/Toolbar) is kind-only, so the shell must not
// disagree via a name/role heuristic. Fall back to the first root only as
// graceful degradation if no node is tagged platform.
roots.find((r) => r.data.kind === WORKSPACE_KIND.Platform) ??
roots[0] ??
null,
[roots],
);
const platformId = platformRoot?.id ?? null;
// ── live data: approvals + user-tasks (org-wide), activity (platform agent) ──
const [approvals, setApprovals] = useState<PendingApproval[]>([]);
const [userTasks, setUserTasks] = useState<UserTask[]>([]);
const [activity, setActivity] = useState<ActivityEntry[]>([]);
const [deciding, setDeciding] = useState<string | null>(null);
const [resolving, setResolving] = useState<string | null>(null);
const loadApprovals = useCallback(() => {
api.get<PendingApproval[]>("/approvals/pending")
.then((r) => setApprovals(r ?? []))
.catch(() => setApprovals([]));
}, []);
const loadUserTasks = useCallback(() => {
api.get<UserTask[]>("/user-tasks/pending")
.then((r) => setUserTasks(r ?? []))
.catch(() => setUserTasks([]));
}, []);
useEffect(() => { loadApprovals(); loadUserTasks(); }, [loadApprovals, loadUserTasks]);
useEffect(() => {
if (!platformId) return;
let cancelled = false;
api.get<ActivityEntry[]>(`/workspaces/${platformId}/activity?limit=12`)
.then((r) => { if (!cancelled) setActivity(r ?? []); })
.catch(() => { if (!cancelled) setActivity([]); });
return () => { cancelled = true; };
}, [platformId]);
const decide = useCallback(async (a: PendingApproval, decision: "approved" | "denied") => {
if (deciding) return;
setDeciding(a.id);
try {
await api.post(`/workspaces/${a.workspace_id}/approvals/${a.id}/decide`, {
decision, decided_by: "human",
});
showToast(decision === "approved" ? "Approved" : "Denied", decision === "approved" ? "success" : "info");
setApprovals((prev) => prev.filter((x) => x.id !== a.id));
} catch {
showToast("Failed to record decision", "error");
} finally {
setDeciding(null);
}
}, [deciding]);
const resolveTask = useCallback(async (t: UserTask, status: "done" | "dismissed") => {
if (resolving) return;
setResolving(t.id);
try {
await api.post(`/workspaces/${t.workspace_id}/user-tasks/${t.id}/resolve`, {
status, resolved_by: "human",
});
showToast(status === "done" ? "Marked done" : "Dismissed", status === "done" ? "success" : "info");
setUserTasks((prev) => prev.filter((x) => x.id !== t.id));
} catch {
showToast("Failed to resolve task", "error");
} finally {
setResolving(null);
}
}, [resolving]);
const nav = (v: TopView) => setTopView(v);
/* ── agents tree (recursive) ──────────────────────────────────────── */
function renderNode(n: (typeof nodes)[number], depth: number) {
const kids = childrenOf.get(n.id) ?? [];
const hasKids = kids.length > 0;
const isCollapsed = collapsed[n.id];
const st = statusInfo(n.data.status);
const isRoot = depth === 0;
const isPlatform = n.id === platformRoot?.id;
const q = (n.data.activeTasks as number) ?? 0;
// Role can be a long descriptor (e.g. "Coding Executor (Kimi) — …"); render
// it compact (single-line, truncated by .wsRole) and surface the full text
// on hover via the native tooltip.
const roleLabel = isPlatform ? "platform" : n.data.role || "agent";
const row = (
<div
role="button"
tabIndex={0}
data-testid="agent-tree-node"
data-node-name={n.data.name}
data-ws-id={n.id}
data-platform={isPlatform ? "true" : "false"}
data-depth={depth}
className={`${s.ws} ${selectedNodeId === n.id ? s.active : ""}`}
onClick={() => selectNode(n.id)}
onKeyDown={(e) => {
if (e.key === "Enter" || e.key === " ") {
e.preventDefault();
selectNode(n.id);
}
}}
>
<div className={s.wsAv} style={{ background: gradientFor(n.id) }}>
{initials(n.data.name)}
<span className={s.dot} style={{ background: st.color }} />
</div>
<div className={s.wsMeta}>
<div className={s.wsName}>{n.data.name}</div>
<div className={s.wsSub}>
<span className={s.wsRole} title={roleLabel}>{roleLabel}</span>
<span className={s.wsStatus} style={{ color: st.color }}>
<span className={s.sdot} style={{ background: st.color }} />
{st.label}
</span>
</div>
</div>
{isRoot && isPlatform ? (
<span data-testid="agent-tree-root-tag" className={s.rootTag}>root</span>
) : (
<span className={`${s.wsQ} ${q === 0 ? s.zero : ""}`} title="Tasks in queue">
<IcQueue />
{q}
</span>
)}
{hasKids && (
<button
className={s.wsCaret}
title="Expand / collapse"
onClick={(e) => {
e.stopPropagation();
setCollapsed((c) => ({ ...c, [n.id]: !c[n.id] }));
}}
style={{ transform: isCollapsed ? "none" : "rotate(90deg)", transition: "transform .18s" }}
>
<IcCaret />
</button>
)}
</div>
);
return (
<div key={n.id} className={s.tnode}>
{row}
{hasKids && !isCollapsed && (
<div className={s.treeChildren}>
{kids.map((k) => renderNode(k, depth + 1))}
</div>
)}
</div>
);
}
return (
<div className={s.root}>
{/* Envelope flies between agent rows on each delegate/message event. */}
<MessageFlightHome />
<div className={`${s.app} ${railOpen ? s.railOpen : ""}`}>
{/* ICON RAIL */}
<nav className={s.rail}>
<div className={s.railTop}>
<div className={s.logo} title="Toggle sidebar" onClick={() => setRailOpen((o) => !o)}>
<IcMolecule />
</div>
<span className={s.railWordmark}>Molecule</span>
<button className={s.railToggle} title="Collapse sidebar" onClick={() => setRailOpen((o) => !o)}>
<IcOrgMap />
</button>
</div>
<button data-testid="nav-home" className={`${s.navbtn} ${topView === "home" ? s.active : ""}`} title="Home" onClick={() => nav("home")}>
<span className={s.ico}><IcHome /></span><span className={s.lbl}>Home</span>
</button>
<button data-testid="nav-map" className={`${s.navbtn} ${topView === "map" ? s.active : ""}`} title="Org map" onClick={() => nav("map")}>
<span className={s.ico}><IcOrgMap /></span><span className={s.lbl}>Org map</span>
</button>
<div className={s.spacer} />
<button data-testid="nav-settings" className={`${s.navbtn} ${topView === "settings" ? s.active : ""}`} title="Settings" onClick={() => nav("settings")}>
<span className={s.ico}><IcSettings /></span><span className={s.lbl}>Settings</span>
</button>
</nav>
<div className={s.main}>
{/* TOPBAR */}
<header className={s.topbar}>
<div className={s.org}>
<div className={s.orgBadge}>{initials(orgName).slice(0, 1)}</div>
<span data-testid="topbar-org-name" className={s.orgName}>{orgName}</span>
<span className={s.chev}><IcChevDown /></span>
</div>
<div className={s.topbarRight}>
<button className={s.iconPill} title="Search"><IcSearch /></button>
<button className={s.iconPill} title="Notifications"><IcBell /></button>
<button
className={s.themeToggle}
title="Toggle theme"
onClick={() => setTheme(resolvedTheme === "dark" ? "light" : "dark")}
>
{resolvedTheme === "dark" ? <IcMoon /> : <IcSun />}
</button>
<div className={s.avatar} title="You">HW</div>
</div>
</header>
<div className={s.viewArea}>
{/* HOME VIEW */}
<div className={`${s.view} ${topView === "home" ? s.active : ""}`}>
<aside className={s.homeSidebar}>
<div className={s.sbTabs}>
<button data-testid="home-subtab-agents" className={`${s.sbTab} ${sbTab === "agents" ? s.active : ""}`} onClick={() => setSbTab("agents")}>Agents</button>
<button data-testid="home-subtab-tasks" className={`${s.sbTab} ${sbTab === "tasks" ? s.active : ""}`} onClick={() => setSbTab("tasks")}>
Tasks{userTasks.length > 0 && <span className={s.cnt}>{userTasks.length}</span>}
</button>
<button data-testid="home-subtab-approvals" className={`${s.sbTab} ${sbTab === "approvals" ? s.active : ""}`} onClick={() => setSbTab("approvals")}>
Approvals{approvals.length > 0 && <span className={s.cnt}>{approvals.length}</span>}
</button>
</div>
<div className={s.sbBody}>
{sbTab === "agents" && (
<>
<div className={s.wsList}>
{roots.length === 0 && (
<div className={s.empty}>No agents yet. Ask the concierge to spin up a team.</div>
)}
{roots.map((r) => renderNode(r, 0))}
</div>
<div className={s.sbSection}>Recent activity</div>
<div>
{activity.length === 0 && (
<div className={s.empty}>No recent activity yet.</div>
)}
{activity.map((a) => {
const ok = a.status !== "error" && a.status !== "failed";
return (
<div key={a.id} className={s.act}>
<span className={s.actTime}>{clockTime(a.created_at)}</span>
<div className={`${s.actLine} ${ok ? s.grn : ""}`}>
<div className={s.actText}>{activityText(a)}</div>
</div>
</div>
);
})}
</div>
</>
)}
{sbTab === "tasks" && (
<>
{userTasks.length === 0 && (
<div className={s.empty}>Nothing needs you right now. When an agent needs you to do something, it shows up here.</div>
)}
{userTasks.map((t) => (
<div key={t.id} className={s.task}>
<div className={s.taskRow}>
<div className={`${s.taskIc} ${s.run}`}><IcClock /></div>
<div className={s.taskMeta}>
<div className={s.taskT}>{t.title}</div>
<div className={s.taskS}>
{t.workspace_name}<span className={s.pip} />asked {clockTime(t.created_at)}
</div>
{t.detail && (
<div style={{ fontSize: 12, color: "var(--tx-3)", marginTop: 6, lineHeight: 1.45 }}>
{t.detail}
</div>
)}
</div>
</div>
<div className={s.taskActions}>
<button className={`${s.tbtn} ${s.done}`} disabled={resolving === t.id} onClick={() => resolveTask(t, "done")}>
<IcCheck />Done
</button>
<button className={s.tbtn} disabled={resolving === t.id} onClick={() => resolveTask(t, "dismissed")}>
Dismiss
</button>
</div>
</div>
))}
</>
)}
{sbTab === "approvals" && (
<>
{approvals.length === 0 && (
<div className={s.empty}>No pending approvals. Destructive actions await sign-off here.</div>
)}
{approvals.map((a) => (
<div key={a.id} className={s.apprCard} style={{ marginBottom: 7 }}>
<div className={s.apprRow}>
<div className={s.apprIc}><IcTrash /></div>
<div className={s.apprMeta}>
<div className={s.apprT}>{a.action.replace(/_/g, " ")} <code>{a.workspace_name}</code></div>
<div className={s.apprS}>{a.reason || "destructive"}</div>
</div>
</div>
<div className={s.apprActions}>
<button className={`${s.btn} ${s.approve} ${s.flex}`} disabled={deciding === a.id} onClick={() => decide(a, "approved")}>
{deciding === a.id ? "…" : "Approve"}
</button>
<button className={`${s.btn} ${s.deny} ${s.flex}`} disabled={deciding === a.id} onClick={() => decide(a, "denied")}>
{deciding === a.id ? "…" : "Deny"}
</button>
</div>
</div>
))}
</>
)}
</div>
</aside>
{/* CHAT reuses the EXACT canonical chat the Org-map SidePanel
renders (My Chat / Agent Comms sub-tabs, attachments, history,
delivery-mode handling), pointed at the platform agent. A thin
concierge-styled header keeps the Home look; the ChatTab body
below is identical to the map path so features can't drift. */}
{platformId && platformRoot ? (
<section className={s.chat}>
<div className={s.chatHead}>
<div className={s.chAv}><IcChat /></div>
<div className={s.chMeta}>
<div className={s.chTitle}>{platformRoot.data.name ?? "Org Concierge"}</div>
<div className={s.chSub}>
{(() => {
const online =
platformRoot.data.status === "online" ||
platformRoot.data.status === "degraded";
return (
<>
<span
className={s.sdot}
style={{ background: online ? "var(--green)" : "var(--grey)" }}
/>
{online ? "online" : statusInfo(platformRoot.data.status ?? "").label} · platform agent
</>
);
})()}
</div>
</div>
</div>
<div className={s.embedChat}>
<ChatTab key={platformId} workspaceId={platformId} data={platformRoot.data} />
</div>
</section>
) : (
<section className={s.chat}>
<div className={s.greetWrap}>
<div className={s.greet}>
<span className={s.stamp}></span> No platform agent yet
</div>
</div>
</section>
)}
</div>
{/* ORG MAP VIEW — the live canvas */}
<div className={`${s.view} ${topView === "map" ? s.active : ""}`}>
{topView === "map" && (
<div className={s.canvasMount}>
<main aria-label="Agent canvas" style={{ position: "absolute", inset: 0 }}>
<Canvas />
</main>
<CommunicationOverlay />
</div>
)}
</div>
{/* SETTINGS VIEW */}
<div className={`${s.view} ${topView === "settings" ? s.active : ""}`}>
<div className={s.settingsScroll}>
<div className={s.settingsInner}>
<div className={s.settingsHead}>
<h1>Settings</h1>
<p>
Org-level settings for the platform concierge. Configure the
concierge exactly like any workspace config.yaml, plugins
and skills, container/compute, display, channels, schedule
and secrets plus how it pays for model usage and org
identity.
</p>
</div>
{/* Two tabs instead of one long sheet: Platform agent
configuration vs Org & canvas settings. Reuses the same
.sbTabs purple-underline tab style as the Home sub-tabs. */}
<div className={s.sbTabs} role="tablist" aria-label="Settings sections">
<button
type="button"
role="tab"
data-testid="settings-tab-platform"
aria-selected={settingsTab === "platform"}
className={`${s.sbTab} ${settingsTab === "platform" ? s.active : ""}`}
onClick={() => setSettingsTab("platform")}
>
Platform agent configuration
</button>
<button
type="button"
role="tab"
data-testid="settings-tab-org"
aria-selected={settingsTab === "org"}
className={`${s.sbTab} ${settingsTab === "org" ? s.active : ""}`}
onClick={() => setSettingsTab("org")}
>
Org &amp; canvas settings
</button>
</div>
{/* Platform agent configuration the FULL workspace tab UI
(Config, Plugins/Skills, Container, Display, Details,
Activity, Terminal, Channels, Schedule, Files, Memory,
Traces, Events, Audit), reusing the exact same
WorkspacePanelTabs the Org-map SidePanel renders so the two
surfaces can't drift. Pointed at the platform agent; the
panel owns its own local active-tab state so it doesn't
fight the map's node selection. */}
{settingsTab === "platform" && (
<div data-testid="settings-pane-platform" className={s.scard}>
<div className={s.scardHead}>
<div className={s.scardDesc}>
Update the concierge like any workspace: its config.yaml,
plugins &amp; skills, container/compute, display, channels,
schedule and more.
</div>
</div>
{platformRoot ? (
<div className={s.embedPanel}>
<WorkspacePanelTabs key={platformRoot.id} node={platformRoot} defaultTab="config" />
</div>
) : (
<div className={s.scardDesc}>
No platform agent yet. Spin one up from Home to configure it.
</div>
)}
</div>
)}
{settingsTab === "org" && (
<div data-testid="settings-pane-org" className={s.scard}>
<div className={s.scardHead}>
<div className={s.scardDesc}>
Secrets, workspace tokens, org API keys and organization
identity. These also live behind the gear in the top bar.
</div>
</div>
{platformId && (
<div className={s.embedSettings}>
<SettingsTabs workspaceId={platformId} />
</div>
)}
</div>
)}
</div>
</div>
</div>
</div>
</div>
</div>
</div>
);
}
@@ -1,50 +0,0 @@
/** MessageFlightHome the concierge-home counterpart of MessageFlightLayer.
* The home view is a vertical agent tree (not a spatial canvas), so an envelope
* flies between the source and target agent ROWS. It shares the exact same
* flight stream (useA2AFlights) as the canvas, and resolves endpoints from each
* row's DOM rect (rows carry data-ws-id). Reduced-motion is honoured by the
* shared hook (it emits no flights). */
import { useRef } from "react";
import { useA2AFlights, type A2AFlight } from "@/hooks/useA2AFlights";
import { FlightEnvelope, type Point } from "../FlightEnvelope";
function rowCenter(wsId: string): Point | null {
if (typeof document === "undefined") return null;
const sel =
typeof CSS !== "undefined" && typeof CSS.escape === "function"
? CSS.escape(wsId)
: wsId;
const el = document.querySelector<HTMLElement>(`[data-ws-id="${sel}"]`);
if (!el) return null;
const r = el.getBoundingClientRect();
return { x: r.left + r.width / 2, y: r.top + r.height / 2 };
}
/** One flight. Captures the source/target row rects ONCE on mount (a ref, not
* per-render) so a later re-render or scroll mid-flight does not restart the
* animation. */
function HomeFlight({ flight }: { flight: A2AFlight }) {
const pos = useRef<{ from: Point; to: Point } | null>(null);
if (pos.current === null) {
const from = rowCenter(flight.sourceId);
const to = rowCenter(flight.targetId);
if (from && to) pos.current = { from, to };
}
if (!pos.current) return null; // one or both agents not visible in the tree
return <FlightEnvelope from={pos.current.from} to={pos.current.to} kind={flight.kind} />;
}
export function MessageFlightHome() {
const flights = useA2AFlights();
if (flights.length === 0) return null;
return (
<div
aria-hidden="true"
style={{ position: "fixed", inset: 0, pointerEvents: "none", zIndex: 50 }}
>
{flights.map((f) => (
<HomeFlight key={f.key} flight={f} />
))}
</div>
);
}
-113
View File
@@ -1,113 +0,0 @@
/* Inline SVG icons lifted from the Org Concierge concept (molecule-concierge-v1).
Stroke icons inherit currentColor; size comes from the CSS (svg{width/height}). */
import type { SVGProps } from "react";
const stroke = {
fill: "none",
stroke: "currentColor",
strokeWidth: 1.8,
strokeLinecap: "round" as const,
strokeLinejoin: "round" as const,
};
export const IcMolecule = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" {...p}>
<circle cx="12" cy="5" r="2.4" fill="#fff" />
<circle cx="5.5" cy="16" r="2.4" fill="#fff" opacity=".85" />
<circle cx="18.5" cy="16" r="2.4" fill="#fff" opacity=".85" />
<path d="M12 7.4L6 14.2M12 7.4L18 14.2M7.6 16h8.8" stroke="#fff" strokeWidth="1.4" strokeLinecap="round" />
</svg>
);
export const IcChat = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" {...p}>
<circle cx="12" cy="5" r="2.2" fill="#fff" />
<circle cx="5.5" cy="16" r="2.2" fill="#fff" opacity=".85" />
<circle cx="18.5" cy="16" r="2.2" fill="#fff" opacity=".85" />
<path d="M12 7.2L6 14M12 7.2L18 14M7.6 16h8.8" stroke="#fff" strokeWidth="1.3" strokeLinecap="round" />
</svg>
);
export const IcHome = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M3 10.5L12 3l9 7.5" /><path d="M5 9.5V20h14V9.5" /></svg>
);
export const IcOrgMap = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}>
<rect x="8.5" y="3" width="7" height="6" rx="1.5" />
<rect x="2.5" y="15" width="6.5" height="6" rx="1.5" />
<rect x="15" y="15" width="6.5" height="6" rx="1.5" />
<path d="M12 9v3M12 12H5.75v3M12 12h6.25v3" />
</svg>
);
export const IcSettings = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}>
<circle cx="12" cy="12" r="3" />
<path d="M19.4 15a1.7 1.7 0 0 0 .34 1.87l.06.06a2 2 0 1 1-2.83 2.83l-.06-.06a1.7 1.7 0 0 0-1.87-.34 1.7 1.7 0 0 0-1.03 1.56V21a2 2 0 1 1-4 0v-.09A1.7 1.7 0 0 0 9 19.4a1.7 1.7 0 0 0-1.87.34l-.06.06a2 2 0 1 1-2.83-2.83l.06-.06A1.7 1.7 0 0 0 4.6 15a1.7 1.7 0 0 0-1.56-1.03H3a2 2 0 1 1 0-4h.09A1.7 1.7 0 0 0 4.6 9a1.7 1.7 0 0 0-.34-1.87l-.06-.06a2 2 0 1 1 2.83-2.83l.06.06A1.7 1.7 0 0 0 9 4.6a1.7 1.7 0 0 0 1.03-1.56V3a2 2 0 1 1 4 0v.09A1.7 1.7 0 0 0 15 4.6a1.7 1.7 0 0 0 1.87-.34l.06-.06a2 2 0 1 1 2.83 2.83l-.06.06A1.7 1.7 0 0 0 19.4 9c.13.31.4.55.73.66" />
</svg>
);
export const IcSearch = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="1.8" strokeLinecap="round" {...p}><circle cx="11" cy="11" r="7" /><path d="M20 20l-3.5-3.5" /></svg>
);
export const IcBell = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M18 8a6 6 0 1 0-12 0c0 7-3 9-3 9h18s-3-2-3-9" /><path d="M13.7 21a2 2 0 0 1-3.4 0" /></svg>
);
export const IcSun = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><circle cx="12" cy="12" r="4.2" /><path d="M12 2v2.5M12 19.5V22M2 12h2.5M19.5 12H22M4.9 4.9l1.8 1.8M17.3 17.3l1.8 1.8M19.1 4.9l-1.8 1.8M6.7 17.3l-1.8 1.8" /></svg>
);
export const IcMoon = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M21 12.8A9 9 0 1 1 11.2 3a7 7 0 0 0 9.8 9.8z" /></svg>
);
export const IcChevDown = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round" {...p}><path d="M6 9l6 6 6-6" /></svg>
);
export const IcCaret = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2.4" strokeLinecap="round" strokeLinejoin="round" {...p}><path d="M9 6l6 6-6 6" /></svg>
);
export const IcQueue = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2.2" strokeLinecap="round" {...p}><path d="M8 6h12M8 12h12M8 18h12M4 6h.01M4 12h.01M4 18h.01" /></svg>
);
export const IcCheck = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2.2" strokeLinecap="round" strokeLinejoin="round" {...p}><path d="M20 6L9 17l-5-5" /></svg>
);
export const IcSchedule = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><rect x="3.5" y="4.5" width="17" height="16" rx="2.5" /><path d="M3.5 9h17M8 3v3M16 3v3" /></svg>
);
export const IcWorkspace = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><rect x="3.5" y="3.5" width="7" height="7" rx="1.5" /><rect x="13.5" y="13.5" width="7" height="7" rx="1.5" /><path d="M13.5 7h7M7 13.5v7" /></svg>
);
export const IcWarn = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M12 9v4M12 17h.01" /><path d="M10.3 3.9 1.8 18a2 2 0 0 0 1.7 3h17a2 2 0 0 0 1.7-3L13.7 3.9a2 2 0 0 0-3.4 0Z" /></svg>
);
export const IcSend = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round" {...p}><path d="M5 12h14M13 6l6 6-6 6" /></svg>
);
export const IcHistory = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M3 12a9 9 0 1 0 3-6.7L3 8" /><path d="M3 4v4h4M12 8v4l3 2" /></svg>
);
export const IcDots = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" fill="currentColor" {...p}><circle cx="5" cy="12" r="1.6" /><circle cx="12" cy="12" r="1.6" /><circle cx="19" cy="12" r="1.6" /></svg>
);
export const IcClock = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M12 7v5l3 2" /><circle cx="12" cy="12" r="9" /></svg>
);
export const IcTrash = (p: SVGProps<SVGSVGElement>) => (
<svg viewBox="0 0 24 24" {...stroke} {...p}><path d="M3 6h18M8 6V4h8v2M19 6l-1 14H6L5 6M10 11v5M14 11v5" /></svg>
);
+1 -1
View File
@@ -120,7 +120,7 @@ export { usePalette } from "./palette-context";
// References the CSS variables that next/font/google emits in
// app/layout.tsx. Falls through to system fonts if the variable is
// undefined (e.g. in unit tests with no <body> font class).
export const MOBILE_FONT_SANS = "var(--font-hanken), 'Hanken Grotesk', ui-sans-serif, system-ui, sans-serif";
export const MOBILE_FONT_SANS = "var(--font-inter), 'Inter', ui-sans-serif, system-ui, sans-serif";
export const MOBILE_FONT_MONO = "var(--font-jetbrains), 'JetBrains Mono', ui-monospace, monospace";
// Status keys we surface in the mobile UI. Anything else from the
+13 -55
View File
@@ -15,21 +15,12 @@ import { Spinner } from '@/components/Spinner';
* currently-active org, plus a switcher list when the user belongs to
* multiple orgs.
*
* Data path (SaaS control plane present):
* Data path:
* 1. fetchSession() /cp/auth/me current org_id
* 2. api.get('/cp/orgs') list of all orgs the user belongs to
* 3. Match by id === session.org_id; fall back to host-slug match
* if the session probe loses the race.
*
* Data path (self-host NO control plane):
* /cp/orgs is a control-plane route that does not exist on a self-hosted
* stack, so it 404s. When that probe fails we fall back to the open
* GET /org/identity route (served by the tenant workspace-server in both
* modes) and render a single org card from name + slug + org_id. On a
* fresh self-host only `name` is populated (MOLECULE_ORG_SLUG /
* MOLECULE_ORG_ID are unset) the card omits the empty rows and shows
* no error and no "other organizations" list.
*
* Read-only this tab never mutates. Org creation/switching lives at
* /orgs (the post-signup landing page).
*/
@@ -45,50 +36,25 @@ interface Org {
// for the same defensive unwrap.
type OrgsResponse = Org[] | { orgs?: Org[] };
// GET /org/identity (self-host fallback) — open route on the tenant
// workspace-server. slug/org_id are "" on a fresh self-host.
interface OrgIdentity {
name?: string;
slug?: string;
org_id?: string;
}
export function OrgInfoTab() {
const [orgs, setOrgs] = useState<Org[] | null>(null);
const [session, setSession] = useState<Session | null>(null);
// selfHostOrg is set only when /cp/orgs is unavailable (self-host) and the
// /org/identity fallback yields an org. When non-null we render exactly one
// card from it and never show the "other organizations" list or an error.
const [selfHostOrg, setSelfHostOrg] = useState<Org | null>(null);
const [error, setError] = useState<string | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
let cancelled = false;
(async () => {
const sess = await fetchSession().catch(() => null);
if (cancelled) return;
setSession(sess);
try {
const body = await api.get<OrgsResponse>('/cp/orgs');
const [sess, body] = await Promise.all([
fetchSession().catch(() => null),
api.get<OrgsResponse>('/cp/orgs'),
]);
if (cancelled) return;
setSession(sess);
setOrgs(Array.isArray(body) ? body : body.orgs ?? []);
} catch {
// /cp/orgs is a control-plane route — absent on a self-hosted stack
// (404 / network error). Fall back to the open /org/identity route on
// the tenant server instead of surfacing a red error banner.
try {
const id = await api.get<OrgIdentity>('/org/identity');
if (cancelled) return;
setSelfHostOrg({
id: id.org_id ?? '',
slug: id.slug ?? '',
name: id.name ?? '',
});
} catch (e2) {
if (!cancelled)
setError(e2 instanceof Error ? e2.message : 'Failed to load org info');
}
} catch (e) {
if (!cancelled) setError(e instanceof Error ? e.message : 'Failed to load org info');
} finally {
if (!cancelled) setLoading(false);
}
@@ -100,14 +66,10 @@ export function OrgInfoTab() {
const tenantSlug = getTenantSlug();
const currentOrg =
selfHostOrg ??
orgs?.find((o) => session && o.id === session.org_id) ??
orgs?.find((o) => tenantSlug && o.slug === tenantSlug) ??
null;
// Self-host renders a single org only — no "other organizations" list.
const otherOrgs = selfHostOrg
? []
: orgs?.filter((o) => o.id !== currentOrg?.id) ?? [];
const otherOrgs = orgs?.filter((o) => o.id !== currentOrg?.id) ?? [];
if (loading) {
return (
@@ -165,25 +127,21 @@ export function OrgInfoTab() {
}
function OrgIdentityCard({ org, highlighted }: { org: Org; highlighted?: boolean }) {
// On self-host, slug / UUID may be unconfigured ("") — omit those rows
// gracefully rather than rendering an empty code box.
return (
<div
className={`rounded-lg border p-3 space-y-2 ${
highlighted ? 'border-accent/40 bg-accent-strong/5' : 'border-line/40 bg-surface-card/40'
}`}
data-testid={`org-card-${org.slug || org.id || 'self-host'}`}
data-testid={`org-card-${org.slug}`}
>
<div className="flex items-baseline justify-between gap-2">
<span className="text-[12px] font-medium text-ink truncate">
{org.name || 'This organization'}
</span>
<span className="text-[12px] font-medium text-ink truncate">{org.name}</span>
{org.status && (
<span className="text-[9px] text-ink-mid uppercase tracking-wider shrink-0">{org.status}</span>
)}
</div>
{org.slug && <IdentityRow label="Slug" value={org.slug} />}
{org.id && <IdentityRow label="UUID" value={org.id} mono />}
<IdentityRow label="Slug" value={org.slug} />
<IdentityRow label="UUID" value={org.id} mono />
</div>
);
}
@@ -2,9 +2,13 @@
import { createRef, useCallback, useEffect, useState } from 'react';
import * as Dialog from '@radix-ui/react-dialog';
import * as Tabs from '@radix-ui/react-tabs';
import { useSecretsStore } from '@/stores/secrets-store';
import { useKeyboardShortcut } from '@/hooks/use-keyboard-shortcut';
import { SettingsTabs } from './SettingsTabs';
import { SecretsTab } from './SecretsTab';
import { TokensTab } from './TokensTab';
import { OrgTokensTab } from './OrgTokensTab';
import { OrgInfoTab } from './OrgInfoTab';
import { UnsavedChangesGuard } from './UnsavedChangesGuard';
/** Module-level ref so TopBar's SettingsButton can receive focus back on close. */
@@ -102,7 +106,38 @@ export function SettingsPanel({ workspaceId }: SettingsPanelProps) {
</Dialog.Close>
</div>
<SettingsTabs workspaceId={workspaceId} />
<Tabs.Root defaultValue="api-keys">
<Tabs.List className="settings-panel__tabs" aria-label="Settings sections">
<Tabs.Trigger value="api-keys" className="settings-panel__tab">
Secrets
</Tabs.Trigger>
<Tabs.Trigger value="tokens" className="settings-panel__tab">
Workspace Tokens
</Tabs.Trigger>
<Tabs.Trigger value="org-tokens" className="settings-panel__tab">
Org API Keys
</Tabs.Trigger>
<Tabs.Trigger value="org-info" className="settings-panel__tab">
Organization
</Tabs.Trigger>
</Tabs.List>
<Tabs.Content value="api-keys" className="settings-panel__content">
<SecretsTab workspaceId={workspaceId} />
</Tabs.Content>
<Tabs.Content value="tokens" className="settings-panel__content">
<TokensTab workspaceId={workspaceId} />
</Tabs.Content>
<Tabs.Content value="org-tokens" className="settings-panel__content">
<OrgTokensTab />
</Tabs.Content>
<Tabs.Content value="org-info" className="settings-panel__content">
<OrgInfoTab />
</Tabs.Content>
</Tabs.Root>
<div className="settings-panel__footer">
<span className="settings-panel__shortcut-hint">
@@ -1,60 +0,0 @@
'use client';
import * as Tabs from '@radix-ui/react-tabs';
import { SecretsTab } from './SecretsTab';
import { TokensTab } from './TokensTab';
import { OrgTokensTab } from './OrgTokensTab';
import { OrgInfoTab } from './OrgInfoTab';
interface SettingsTabsProps {
workspaceId: string;
}
/**
* The tabbed body of the workspace settings surface Secrets, Workspace
* Tokens, Org API Keys, Organization.
*
* Extracted from SettingsPanel so the same content can render in two
* places without duplication:
* 1. The right-anchored slide-over drawer (the gear popover) SettingsPanel.
* 2. The concierge Settings view (embedded inline) ConciergeShell.
*
* Pure presentation of the four tabs; all dirty-form / unsaved-guard /
* keyboard-shortcut wiring stays in SettingsPanel where the popover owns it.
*/
export function SettingsTabs({ workspaceId }: SettingsTabsProps) {
return (
<Tabs.Root defaultValue="api-keys">
<Tabs.List className="settings-panel__tabs" aria-label="Settings sections">
<Tabs.Trigger value="api-keys" className="settings-panel__tab">
Secrets
</Tabs.Trigger>
<Tabs.Trigger value="tokens" className="settings-panel__tab">
Workspace Tokens
</Tabs.Trigger>
<Tabs.Trigger value="org-tokens" className="settings-panel__tab">
Org API Keys
</Tabs.Trigger>
<Tabs.Trigger value="org-info" className="settings-panel__tab">
Organization
</Tabs.Trigger>
</Tabs.List>
<Tabs.Content value="api-keys" className="settings-panel__content">
<SecretsTab workspaceId={workspaceId} />
</Tabs.Content>
<Tabs.Content value="tokens" className="settings-panel__content">
<TokensTab workspaceId={workspaceId} />
</Tabs.Content>
<Tabs.Content value="org-tokens" className="settings-panel__content">
<OrgTokensTab />
</Tabs.Content>
<Tabs.Content value="org-info" className="settings-panel__content">
<OrgInfoTab />
</Tabs.Content>
</Tabs.Root>
);
}
@@ -9,9 +9,7 @@
* - Copy button writes the UUID to navigator.clipboard
* - Falls back to host-slug match when session lookup fails
* - Lists other orgs when user belongs to multiple
* - Self-host fallback: /cp/orgs 404 /org/identity single-org card (no error)
* - Self-host fallback with only a name (slug/org_id unset) omits empty rows
* - Error banner only when BOTH /cp/orgs AND /org/identity fail
* - Error banner when /cp/orgs throws
* - Empty/no-match state renders the recovery hint, not a crash
*/
import React from "react";
@@ -182,69 +180,12 @@ describe("OrgInfoTab — fallbacks", () => {
});
});
// ─── Self-host fallback: /cp/orgs absent → /org/identity ─────────────────────
describe("OrgInfoTab — self-host fallback", () => {
it("renders a single org card from /org/identity when /cp/orgs 404s", async () => {
mockFetchSession.mockResolvedValue(null);
mockGet.mockImplementation((path: string) => {
if (path === "/cp/orgs")
return Promise.reject(new Error("API GET /cp/orgs: 404 page not found"));
if (path === "/org/identity")
return Promise.resolve({
name: "Molecule AI",
slug: "molecule-ai",
org_id: "abc-123",
});
return Promise.reject(new Error(`unexpected path ${path}`));
});
render(<OrgInfoTab />);
await flush();
await waitFor(() => screen.getByText("Current Organization"));
// Single card from /org/identity — name + slug + UUID, no error banner.
expect(screen.getByText("Molecule AI")).toBeTruthy();
expect(screen.getByText("molecule-ai")).toBeTruthy();
expect(screen.getByText("abc-123")).toBeTruthy();
// No "other organizations" list and no error.
expect(screen.queryByText(/Your other organizations/)).toBeNull();
expect(screen.queryByText(/404/)).toBeNull();
});
it("renders only the name when slug/org_id are unset (fresh self-host)", async () => {
mockFetchSession.mockResolvedValue(null);
mockGet.mockImplementation((path: string) => {
if (path === "/cp/orgs")
return Promise.reject(new Error("API GET /cp/orgs: 404 page not found"));
if (path === "/org/identity")
return Promise.resolve({ name: "Molecule AI", slug: "", org_id: "" });
return Promise.reject(new Error(`unexpected path ${path}`));
});
render(<OrgInfoTab />);
await flush();
await waitFor(() => screen.getByText("Current Organization"));
expect(screen.getByText("Molecule AI")).toBeTruthy();
// Empty slug/UUID rows omitted — no copy buttons rendered.
expect(screen.queryByRole("button", { name: /Copy Slug/i })).toBeNull();
expect(screen.queryByRole("button", { name: /Copy UUID/i })).toBeNull();
});
});
// ─── Error + empty handling ──────────────────────────────────────────────────
describe("OrgInfoTab — error + empty", () => {
it("renders an error banner only when BOTH /cp/orgs and /org/identity fail", async () => {
it("renders an error banner when /cp/orgs throws", async () => {
mockFetchSession.mockResolvedValue(null);
mockGet.mockImplementation((path: string) => {
if (path === "/cp/orgs")
return Promise.reject(new Error("API GET /cp/orgs: 404 page not found"));
if (path === "/org/identity")
return Promise.reject(new Error("API GET /org/identity: 500 boom"));
return Promise.reject(new Error(`unexpected path ${path}`));
});
mockGet.mockRejectedValue(new Error("API GET /cp/orgs: 500 boom"));
render(<OrgInfoTab />);
await flush();
@@ -252,14 +193,10 @@ describe("OrgInfoTab — error + empty", () => {
expect(screen.queryByText("Current Organization")).toBeNull();
});
it("renders the recovery hint when /cp/orgs returns an empty list (no crash)", async () => {
it("renders the recovery hint when no org matches (no crash)", async () => {
mockFetchSession.mockResolvedValue(null);
mockGetTenantSlug.mockReturnValue("");
mockGet.mockImplementation((path: string) =>
path === "/cp/orgs"
? Promise.resolve([])
: Promise.reject(new Error(`unexpected path ${path}`)),
);
mockGet.mockResolvedValue([]);
render(<OrgInfoTab />);
await flush();
-1
View File
@@ -1,5 +1,4 @@
export { SettingsPanel } from './SettingsPanel';
export { SettingsTabs } from './SettingsTabs';
export { SettingsButton } from './SettingsButton';
export { SecretsTab } from './SecretsTab';
export { SecretRow } from './SecretRow';
@@ -197,14 +197,6 @@ describe("DisplayTab", () => {
fireEvent.click(screen.getByRole("button", { name: "Take control" }));
const desktop = await screen.findByTitle("Workspace desktop");
// Wait for the RFB instance to actually connect before pasting. The component
// sets rfbRef.current synchronously right after `new RFB()` (which fires
// mockRFBConstructor) INSIDE the async connect() — but the "Workspace desktop"
// title renders before that await resolves. Firing paste immediately races
// rfbRef.current===null, so the window paste handler's
// `rfbRef.current?.clipboardPasteFrom(text)` no-ops (0 calls). This lost the
// race under CI runner load; waiting for the constructor makes it deterministic.
await waitFor(() => expect(mockRFBConstructor).toHaveBeenCalled());
fireEvent.paste(desktop, {
clipboardData: {
getData: (type: string) => (type === "text/plain" ? "Paste Me" : ""),
@@ -1,105 +0,0 @@
// @vitest-environment jsdom
/** Unit tests for useA2AFlights the eventflight lifecycle that drives the
* envelope animations on the canvas (MessageFlightLayer) and the concierge
* home (MessageFlightHome). useSocketEvent is mocked so we can drive the
* ACTIVITY_LOGGED handler directly. */
import { renderHook, act } from "@testing-library/react";
import { describe, it, expect, vi, beforeEach } from "vitest";
// Capture the handler the hook registers with the socket bus. vi.hoisted is
// required because vi.mock factories are hoisted above normal declarations and
// may only close over hoisted state.
const h = vi.hoisted(() => ({ captured: null as ((msg: unknown) => void) | null }));
vi.mock("@/hooks/useSocketEvent", () => ({
useSocketEvent: (cb: (msg: unknown) => void) => {
h.captured = cb;
},
}));
import { useA2AFlights, FLIGHT_DURATION_MS } from "@/hooks/useA2AFlights";
function setReducedMotion(reduce: boolean) {
window.matchMedia = vi.fn().mockImplementation((q: string) => ({
matches: reduce && q.includes("reduce"),
media: q,
onchange: null,
addEventListener: vi.fn(),
removeEventListener: vi.fn(),
addListener: vi.fn(),
removeListener: vi.fn(),
dispatchEvent: vi.fn(),
}));
}
const msg = (payload: Record<string, unknown>, event = "ACTIVITY_LOGGED") => ({
event,
workspace_id: "a",
timestamp: "2026-06-08T00:00:00Z",
payload,
});
const a2aSend = (over: Record<string, unknown> = {}) =>
msg({ activity_type: "a2a_send", source_id: "a", target_id: "b", ...over });
describe("useA2AFlights", () => {
beforeEach(() => {
h.captured = null;
vi.useRealTimers();
setReducedMotion(false);
});
it("emits a flight for an a2a_send between two distinct agents", () => {
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(a2aSend()));
expect(result.current).toHaveLength(1);
expect(result.current[0]).toMatchObject({ sourceId: "a", targetId: "b", kind: "send" });
});
it("maps a2a_receive / task_update to their kinds", () => {
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(a2aSend({ activity_type: "a2a_receive" })));
act(() => h.captured?.(a2aSend({ activity_type: "task_update" })));
const kinds = result.current.map((f) => f.kind);
expect(kinds).toContain("receive");
expect(kinds).toContain("task");
});
it("ignores non-A2A activity and non-ACTIVITY_LOGGED events", () => {
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(msg({ activity_type: "status_change", source_id: "a", target_id: "b" })));
act(() => h.captured?.(a2aSend({}, )));
act(() => h.captured?.({ event: "WORKSPACE_UPDATED", workspace_id: "a", payload: {} }));
expect(result.current.every((f) => f.kind === "send")).toBe(true);
expect(result.current).toHaveLength(1); // only the one valid a2aSend
});
it("skips self-loops and flights with no target", () => {
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(a2aSend({ target_id: "a" }))); // self-loop
act(() => h.captured?.(a2aSend({ target_id: "" }))); // missing target
expect(result.current).toHaveLength(0);
});
it("emits nothing when prefers-reduced-motion is set", () => {
setReducedMotion(true);
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(a2aSend()));
expect(result.current).toHaveLength(0);
});
it("emits nothing when disabled", () => {
const { result } = renderHook(() => useA2AFlights(false));
act(() => h.captured?.(a2aSend()));
expect(result.current).toHaveLength(0);
});
it("expires a flight after the TTL", () => {
vi.useFakeTimers();
const { result } = renderHook(() => useA2AFlights());
act(() => h.captured?.(a2aSend()));
expect(result.current).toHaveLength(1);
act(() => {
vi.advanceTimersByTime(FLIGHT_DURATION_MS + 300);
});
expect(result.current).toHaveLength(0);
});
});
-103
View File
@@ -1,103 +0,0 @@
/** useA2AFlights turns the org's live A2A activity stream into transient
* "flights" (one per delegate / message event, source target) that an
* overlay can animate as an envelope travelling between two agents.
*
* This hook owns ONLY the eventflight lifecycle: it subscribes to the same
* ACTIVITY_LOGGED WS bus the CommunicationOverlay uses, keeps a small bounded
* list of in-flight envelopes, and expires each after the animation window.
* The caller resolves positions and renders the envelope, so the exact same
* flight data drives both the spatial canvas (flow coords) and the concierge
* home (DOM row rects).
*
* Honours `prefers-reduced-motion`: when the user opts out of motion the hook
* emits no flights at all, so no envelope ever animates. */
import { useEffect, useRef, useState } from "react";
import { useSocketEvent } from "@/hooks/useSocketEvent";
export type A2AFlightKind = "send" | "receive" | "task";
export interface A2AFlight {
/** unique per flight instance (not per pair) so a burst renders distinct envelopes */
key: string;
sourceId: string;
targetId: string;
kind: A2AFlightKind;
}
/** Total time an envelope is alive (ms). Kept in sync with the overlay's
* Web-Animations duration; the extra tail gives the fade-out room to finish
* before the element unmounts. */
export const FLIGHT_DURATION_MS = 1200;
const FLIGHT_TTL_MS = FLIGHT_DURATION_MS + 120;
/** Cap concurrent envelopes so a delegation storm can't spawn unbounded DOM. */
const MAX_CONCURRENT = 12;
function reducedMotionNow(): boolean {
return (
typeof window !== "undefined" &&
typeof window.matchMedia === "function" &&
window.matchMedia("(prefers-reduced-motion: reduce)").matches
);
}
export function useA2AFlights(enabled = true): A2AFlight[] {
const [flights, setFlights] = useState<A2AFlight[]>([]);
const reduced = useRef<boolean>(reducedMotionNow());
const timers = useRef<number[]>([]);
// Track reduced-motion preference changes live (a user can toggle it mid-session).
useEffect(() => {
if (typeof window === "undefined" || typeof window.matchMedia !== "function") return;
const mq = window.matchMedia("(prefers-reduced-motion: reduce)");
const onChange = () => {
reduced.current = mq.matches;
if (mq.matches) setFlights([]); // drop any in-flight envelopes immediately
};
mq.addEventListener?.("change", onChange);
return () => mq.removeEventListener?.("change", onChange);
}, []);
// Clear pending expiry timers on unmount.
useEffect(() => {
const t = timers.current;
return () => {
t.forEach((id) => window.clearTimeout(id));
};
}, []);
useSocketEvent((msg) => {
if (!enabled || reduced.current) return;
if (msg.event !== "ACTIVITY_LOGGED") return;
const p = (msg.payload || {}) as {
activity_type?: string;
source_id?: string | null;
target_id?: string | null;
};
const t = p.activity_type;
if (t !== "a2a_send" && t !== "a2a_receive" && t !== "task_update") return;
const sourceId = p.source_id || msg.workspace_id;
const targetId = p.target_id || "";
// A flight needs two distinct endpoints; a self-loop or missing peer has
// nowhere to fly, so skip it.
if (!sourceId || !targetId || sourceId === targetId) return;
const kind: A2AFlightKind =
t === "a2a_receive" ? "receive" : t === "task_update" ? "task" : "send";
const key = `${msg.timestamp || Date.now()}:${sourceId}:${targetId}:${Math.random()
.toString(36)
.slice(2, 8)}`;
setFlights((prev) => [...prev.slice(-(MAX_CONCURRENT - 1)), { key, sourceId, targetId, kind }]);
const id = window.setTimeout(() => {
setFlights((prev) => prev.filter((f) => f.key !== key));
timers.current = timers.current.filter((x) => x !== id);
}, FLIGHT_TTL_MS);
timers.current.push(id);
});
return flights;
}
+7 -10
View File
@@ -1,5 +1,5 @@
import type { Secret } from '@/types/secrets';
import { platformAuthHeaders } from '@/lib/api';
import { getTenantSlug } from '../tenant';
const PLATFORM_URL = process.env.NEXT_PUBLIC_PLATFORM_URL ?? 'http://localhost:8080';
@@ -13,19 +13,16 @@ function apiUrl(workspaceId: string, path = ''): string {
}
async function request<T>(url: string, init?: RequestInit): Promise<T> {
// Auth pair (admin/org Bearer token + tenant slug) + JSON Content-Type come
// from the shared `platformAuthHeaders()` helper. This bespoke fetch
// previously hand-rolled only the slug + Content-Type and OMITTED the
// Authorization bearer — so against a workspace-server with ADMIN_TOKEN set
// (local dev, every SaaS tenant), WorkspaceAuth saw no bearer and no verified
// CP session and returned 401 "missing workspace auth token". That's exactly
// the #178 raw-fetch-forgets-a-header bug shape the helper exists to prevent.
// Match api.ts shape — slug header + cross-origin credentials so SaaS
// cross-subdomain fetches work. See lib/api.ts for the rationale.
const slug = getTenantSlug();
const saasHeaders: Record<string, string> = { 'Content-Type': 'application/json' };
if (slug) saasHeaders['X-Molecule-Org-Slug'] = slug;
const res = await fetch(url, {
...init,
credentials: 'include',
headers: {
'Content-Type': 'application/json',
...platformAuthHeaders(),
...saasHeaders,
...init?.headers,
},
});
-17
View File
@@ -1,17 +0,0 @@
/** Canonical workspace `kind` values the TS mirror of Go's models.Kind*
* constants (`models.KindPlatform` / `models.KindWorkspace`).
*
* Single source of truth for the `kind` magic strings used across the canvas
* (topology, map strip, toolbar, concierge shell). Kept in a leaf module so
* both `@/store/canvas` and `@/store/canvas-topology` can import it without a
* circular dependency. `WorkspaceNodeData.kind` stays a plain `string` these
* are the well-known values to compare against, not an exhaustive enum.
*
* - `Platform` = the org-level concierge (the undeletable org root, hidden
* from the map graph, surfaced as the shell's org root).
* - `Workspace` = an ordinary agent. Also the fallback for older ws-server
* builds that predate the `kind` column. */
export const WORKSPACE_KIND = {
Platform: "platform",
Workspace: "workspace",
} as const;
@@ -11,25 +11,7 @@ import {
childSlotInGrid,
parentMinSize,
parentMinSizeFromChildren,
CHILD_DEFAULT_WIDTH,
CHILD_DEFAULT_HEIGHT,
CHILD_GUTTER,
PARENT_SIDE_PADDING,
PARENT_HEADER_PADDING,
PARENT_BOTTOM_PADDING,
stripPlatformRootForMap,
} from "../canvas-topology";
import { WORKSPACE_KIND } from "../../lib/workspace-kind";
// Layout-math aliases so these assertions track the card-size constants
// instead of hard-coding pixel values (which drift when the card size
// changes — e.g. the 240×130 → 300×176 "bigger cards" redesign).
const W = CHILD_DEFAULT_WIDTH;
const H = CHILD_DEFAULT_HEIGHT;
const GUT = CHILD_GUTTER;
const SIDE = PARENT_SIDE_PADDING;
const HEAD = PARENT_HEADER_PADDING;
const BOTTOM = PARENT_BOTTOM_PADDING;
// ─── sortParentsBeforeChildren ─────────────────────────────────────────────────
@@ -133,34 +115,34 @@ describe("sortParentsBeforeChildren", () => {
// ─── defaultChildSlot ─────────────────────────────────────────────────────────
describe("defaultChildSlot — 2-column grid", () => {
describe("defaultChildSlot — 2-column grid (240×130 cards)", () => {
it("slot 0 → column 0, row 0", () => {
const s = defaultChildSlot(0);
expect(s).toEqual({ x: SIDE, y: HEAD });
expect(s).toEqual({ x: 16, y: 130 });
});
it("slot 1 → column 1, row 0", () => {
const s = defaultChildSlot(1);
expect(s.x).toBe(SIDE + W + GUT); // PARENT_SIDE_PADDING + CHILD_DEFAULT_WIDTH + CHILD_GUTTER
expect(s.y).toBe(HEAD);
expect(s.x).toBe(16 + 240 + 14); // PARENT_SIDE_PADDING + CHILD_DEFAULT_WIDTH + CHILD_GUTTER
expect(s.y).toBe(130);
});
it("slot 2 → column 0, row 1", () => {
const s = defaultChildSlot(2);
expect(s.x).toBe(SIDE);
expect(s.y).toBe(HEAD + H + GUT); // row 0 height + gutter
expect(s.x).toBe(16);
expect(s.y).toBe(130 + 130 + 14); // row 0 height + gutter
});
it("slot 3 → column 1, row 1", () => {
const s = defaultChildSlot(3);
expect(s.x).toBe(SIDE + W + GUT);
expect(s.y).toBe(HEAD + H + GUT);
expect(s.x).toBe(16 + 240 + 14);
expect(s.y).toBe(130 + 130 + 14);
});
it("slot 4 → column 0, row 2", () => {
const s = defaultChildSlot(4);
expect(s.x).toBe(SIDE);
expect(s.y).toBe(HEAD + (H + GUT) * 2); // row 1 end + gutter
expect(s.x).toBe(16);
expect(s.y).toBe(130 + (130 + 14) * 2); // row 1 end + gutter
});
});
@@ -212,35 +194,36 @@ describe("parentMinSize — uniform-size children", () => {
it("1 child → 1 col, 1 row", () => {
const s = parentMinSize(1);
// width = SIDE*2 + 1*W; height = HEAD + 1*H + BOTTOM
expect(s.width).toBe(SIDE * 2 + W);
expect(s.height).toBe(HEAD + H + BOTTOM);
// width = 16*2 + 1*240 + 0 = 272; height = 130 + 1*130 + 0 + 16 = 276
expect(s.width).toBe(16 * 2 + 240);
expect(s.height).toBe(130 + 130 + 16);
});
it("2 children → 2 cols, 1 row", () => {
const s = parentMinSize(2);
// width = SIDE*2 + 2*W + 1*GUT; height = HEAD + 1*H + BOTTOM
expect(s.width).toBe(SIDE * 2 + 2 * W + GUT);
expect(s.height).toBe(HEAD + H + BOTTOM);
// width = 16*2 + 2*240 + 1*14 = 526; height = 130 + 1*130 + 0 + 16 = 276
expect(s.width).toBe(16 * 2 + 2 * 240 + 14);
expect(s.height).toBe(130 + 130 + 16);
});
it("3 children → 2 cols, 2 rows", () => {
const s = parentMinSize(3);
expect(s.width).toBe(SIDE * 2 + 2 * W + GUT);
// height = HEAD + 2*H + 1*GUT + BOTTOM
expect(s.height).toBe(HEAD + 2 * H + GUT + BOTTOM);
// width = 16*2 + 2*240 + 1*14 = 526
expect(s.width).toBe(16 * 2 + 2 * 240 + 14);
// height = 130 + 2*130 + 1*14 + 16 = 416
expect(s.height).toBe(130 + 2 * 130 + 14 + 16);
});
it("4 children → 2 cols, 2 rows (full grid)", () => {
const s = parentMinSize(4);
expect(s.width).toBe(SIDE * 2 + 2 * W + GUT);
expect(s.height).toBe(HEAD + 2 * H + GUT + BOTTOM);
expect(s.width).toBe(16 * 2 + 2 * 240 + 14);
expect(s.height).toBe(130 + 2 * 130 + 14 + 16);
});
it("5 children → 2 cols, 3 rows", () => {
const s = parentMinSize(5);
expect(s.width).toBe(SIDE * 2 + 2 * W + GUT);
expect(s.height).toBe(HEAD + 3 * H + 2 * GUT + BOTTOM);
expect(s.width).toBe(16 * 2 + 2 * 240 + 14);
expect(s.height).toBe(130 + 3 * 130 + 2 * 14 + 16);
});
});
@@ -260,8 +243,8 @@ describe("parentMinSizeFromChildren — variable-size children", () => {
it("two equal-width children → same as parentMinSize(2)", () => {
const fromChildren = parentMinSizeFromChildren([
{ width: W, height: H },
{ width: W, height: H },
{ width: 240, height: 130 },
{ width: 240, height: 130 },
]);
expect(fromChildren.width).toBe(parentMinSize(2).width);
expect(fromChildren.height).toBe(parentMinSize(2).height);
@@ -279,74 +262,3 @@ describe("parentMinSizeFromChildren — variable-size children", () => {
expect(wide.width).toBeGreaterThan(narrow.width);
});
});
// ─── stripPlatformRootForMap ───────────────────────────────────────────────────
describe("stripPlatformRootForMap", () => {
// Minimal Node<WorkspaceNodeData> builder — only the fields the function reads.
const node = (
id: string,
opts: { kind?: string; parentId?: string; x?: number; y?: number } = {},
// eslint-disable-next-line @typescript-eslint/no-explicit-any
): any => ({
id,
position: { x: opts.x ?? 0, y: opts.y ?? 0 },
parentId: opts.parentId,
data: { kind: opts.kind ?? WORKSPACE_KIND.Workspace, parentId: opts.parentId ?? null },
});
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const edge = (source: string, target: string): any => ({ id: `${source}->${target}`, source, target });
it("returns input unchanged when there is no platform node", () => {
const nodes = [node("a"), node("b", { parentId: "a", x: 5, y: 5 })];
const edges = [edge("a", "b")];
const out = stripPlatformRootForMap(nodes, edges);
expect(out.nodes).toBe(nodes); // same reference — no work done
expect(out.edges).toBe(edges);
});
it("removes the platform root, promotes its direct children to absolute positions, and drops platform-touching edges", () => {
const platform = node("P", { kind: WORKSPACE_KIND.Platform, x: 100, y: 50 });
const child = node("c", { parentId: "P", x: 10, y: 20 }); // RF-relative to P
const grandchild = node("g", { parentId: "c", x: 5, y: 5 });
const out = stripPlatformRootForMap(
[platform, child, grandchild],
[edge("P", "c"), edge("c", "g")],
);
// Platform node is gone.
expect(out.nodes.find((n) => n.id === "P")).toBeUndefined();
// Direct child promoted to top-level with absolute position (parentPos + childPos).
const c = out.nodes.find((n) => n.id === "c")!;
expect(c.parentId).toBeUndefined();
expect(c.extent).toBeUndefined();
expect(c.position).toEqual({ x: 110, y: 70 });
expect(c.data.parentId).toBeNull();
// Grandchild (child of a non-platform node) is untouched.
const g = out.nodes.find((n) => n.id === "g")!;
expect(g.parentId).toBe("c");
expect(g.position).toEqual({ x: 5, y: 5 });
// Edge touching the platform node dropped; the other preserved.
expect(out.edges.map((e) => e.id)).toEqual(["c->g"]);
});
it("leaves children of an ordinary (non-platform) parent untouched", () => {
const platform = node("P", { kind: WORKSPACE_KIND.Platform });
const ordinaryParent = node("op", { parentId: "P", x: 200, y: 0 });
const grandchild = node("gc", { parentId: "op", x: 7, y: 9 });
const out = stripPlatformRootForMap([platform, ordinaryParent, grandchild], []);
// op is a direct child of platform → promoted (absolute = 200+0, 0+0).
const op = out.nodes.find((n) => n.id === "op")!;
expect(op.parentId).toBeUndefined();
expect(op.position).toEqual({ x: 200, y: 0 });
// gc's parent is the ordinary node, not platform → relative position preserved.
const gc = out.nodes.find((n) => n.id === "gc")!;
expect(gc.parentId).toBe("op");
expect(gc.position).toEqual({ x: 7, y: 9 });
});
});
+11 -66
View File
@@ -1,7 +1,6 @@
import type { Node, Edge } from "@xyflow/react";
import type { WorkspaceData } from "./socket";
import type { WorkspaceNodeData } from "./canvas";
import { WORKSPACE_KIND } from "@/lib/workspace-kind";
const H_SPACING = 320;
const V_SPACING = 200;
@@ -52,13 +51,13 @@ export function sortParentsBeforeChildren<T extends { id: string; parentId?: str
}
// Grid-slot defaults for children laid under a parent. The card
// component (WorkspaceNode.tsx) renders leaves at exactly w-[300px] /
// min-h-[176px], so a slot stride of CHILD_DEFAULT_WIDTH + CHILD_GUTTER
// guarantees cards never bleed into their neighbour's slot. Keep these
// in sync with the Go mirror in workspace-server/internal/handlers/org.go
// changing one without the other leads to import-time / runtime drift.
export const CHILD_DEFAULT_WIDTH = 300;
export const CHILD_DEFAULT_HEIGHT = 176;
// component (WorkspaceNode.tsx) sets `max-w-[240px]` on leaves, so a
// slot stride of CHILD_DEFAULT_WIDTH + CHILD_GUTTER guarantees cards
// never bleed into their neighbour's slot. Keep these in sync with
// the Go mirror in workspace-server/internal/handlers/org.go
// changing one without the other leads to import-time / runtime drift.
export const CHILD_DEFAULT_WIDTH = 240;
export const CHILD_DEFAULT_HEIGHT = 130;
// Parent header space — reserves room above the child grid so the
// parent's own name + runtime pill + clamped role + currentTask
// banner aren't covered by the first row of child cards. The
@@ -530,10 +529,6 @@ export function buildNodesAndEdges(
// — leave undefined so the chat UI's "?? 'push'" fallback applies.
deliveryMode: ws.delivery_mode,
compute: ws.compute,
// Org-level platform agent ('platform') vs ordinary workspace. The map
// view hides the platform root (it's the undeletable org anchor) via
// stripPlatformRootForMap; the shell home tree keeps it as ROOT.
kind: ws.kind ?? WORKSPACE_KIND.Workspace,
},
};
if (hasParent) {
@@ -558,10 +553,10 @@ export function buildNodesAndEdges(
// - Collapsed parents: leaf-sized (header-only card).
// - Leaves: leaf-sized — they land in their grid slot cleanly.
//
// Sizes are fully system-controlled (free-resize was removed): these
// initial values stand, and at runtime React Flow re-measures leaves
// from their fixed-size card CSS while parents grow to fit children
// (growParentsToFitChildren). Width/height are never persisted.
// NodeResizer still drives user-initiated growth at runtime; these
// are only the initial values, and React Flow updates them in place
// when the user drags a resize handle. A future hydrate() will
// reset to the default until we persist width/height server-side.
const kids = childCounts.get(ws.id) ?? 0;
if (kids > 0 && !ws.collapsed) {
const size = parentSize.get(ws.id)!;
@@ -630,53 +625,3 @@ export function getConfigurationError(
const raw = agentCard.configuration_error;
return typeof raw === "string" && raw.length > 0 ? raw : null;
}
/**
* Map-view filter: removes the org-level platform agent (the concierge) from
* the node graph. The platform agent is the undeletable org ROOT every other
* workspace hangs under it so it is surfaced as the shell's org anchor
* (topbar + Home tree), NOT as a draggable/deletable map node.
*
* Its direct children are promoted to top-level: React Flow stores child
* positions RELATIVE to the parent, so when the parent is dropped each child is
* converted back to an absolute position (parent.position + child.position) and
* its parent binding cleared. Edges touching the platform node are dropped.
*
* The store keeps the full node set (the shell's Home agent tree renders the
* platform as ROOT); only the map's React Flow input is stripped.
*/
export function stripPlatformRootForMap(
nodes: Node<WorkspaceNodeData>[],
edges: Edge[],
): { nodes: Node<WorkspaceNodeData>[]; edges: Edge[] } {
const platformIds = new Set(
nodes.filter((n) => n.data.kind === WORKSPACE_KIND.Platform).map((n) => n.id),
);
if (platformIds.size === 0) return { nodes, edges };
const posById = new Map(nodes.map((n) => [n.id, n.position]));
const outNodes = nodes
.filter((n) => !platformIds.has(n.id))
.map((n) => {
const pid = n.parentId;
if (pid && platformIds.has(pid)) {
const parentPos = posById.get(pid) ?? { x: 0, y: 0 };
return {
...n,
parentId: undefined,
extent: undefined,
position: {
x: parentPos.x + n.position.x,
y: parentPos.y + n.position.y,
},
data: { ...n.data, parentId: null },
} as Node<WorkspaceNodeData>;
}
return n;
});
const outEdges = edges.filter(
(e) => !platformIds.has(e.source) && !platformIds.has(e.target),
);
return { nodes: outNodes, edges: outEdges };
}
+4 -26
View File
@@ -25,8 +25,8 @@ import {
/**
* Walk every parent node and bump its width/height (if explicitly set)
* so the union of its children's relative bboxes plus padding fits. A
* parent's size never shrinks via this path only grows so a parent
* that expanded to fit children stays expanded as their layout settles.
* parent's size never shrinks via this path only grows because
* shrinking on resize would fight the user's own NodeResizer drag.
*/
function growParentsToFitChildren<T extends Record<string, unknown>>(
nodes: Node<T>[],
@@ -74,12 +74,6 @@ function growParentsToFitChildren<T extends Record<string, unknown>>(
export { summarizeWorkspaceCapabilities } from "./canvas-capabilities";
export type { WorkspaceCapabilitySummary } from "./canvas-capabilities";
/** Canonical workspace `kind` values the TS mirror of Go's models.Kind*
* constants. Defined in a leaf module (`@/lib/workspace-kind`) and re-exported
* here for convenience so consumers can keep importing from `@/store/canvas`.
* Use these instead of the bare "platform"/"workspace" string literals. */
export { WORKSPACE_KIND } from "@/lib/workspace-kind";
export interface WorkspaceNodeData extends Record<string, unknown> {
name: string;
status: string;
@@ -92,10 +86,6 @@ export interface WorkspaceNodeData extends Record<string, unknown> {
lastSampleError: string;
url: string;
parentId: string | null;
/** 'platform' = the org concierge (hidden from the map graph, surfaced as the
* shell's org root); 'workspace' = ordinary agent. Optional: absent on older
* ws-server builds / some event-constructed nodes treat absent as ordinary. */
kind?: string;
currentTask: string;
runtime: string;
workspaceAccess?: string | null;
@@ -152,12 +142,6 @@ export interface WorkspaceNodeData extends Record<string, unknown> {
export type PanelTab = "details" | "skills" | "chat" | "terminal" | "display" | "container-config" | "config" | "schedule" | "channels" | "files" | "memory" | "traces" | "events" | "activity" | "audit";
/**
* Top-level canvas view. "home" is the Org Concierge view (chat with the
* platform agent); "map" is the node-graph canvas (the original view).
*/
export type TopView = "home" | "map" | "settings";
export interface ContextMenuState {
x: number;
y: number;
@@ -170,8 +154,6 @@ interface CanvasState {
edges: Edge[];
selectedNodeId: string | null;
panelTab: PanelTab;
/** Top-level view: Org Concierge home (chat) vs the node-graph map. */
topView: TopView;
dragOverNodeId: string | null;
contextMenu: ContextMenuState | null;
// Live width of the SidePanel in pixels. Only meaningful when
@@ -192,7 +174,6 @@ interface CanvasState {
savePosition: (nodeId: string, x: number, y: number) => void;
selectNode: (id: string | null) => void;
setPanelTab: (tab: PanelTab) => void;
setTopView: (view: TopView) => void;
getSelectedNode: () => Node<WorkspaceNodeData> | null;
updateNodeData: (id: string, data: Partial<WorkspaceNodeData>) => void;
restartWorkspace: (id: string, options?: { applyTemplate?: boolean }) => Promise<void>;
@@ -302,7 +283,6 @@ export const useCanvasStore = create<CanvasState>((set, get) => ({
edges: [],
selectedNodeId: null,
panelTab: "chat",
topView: "home",
dragOverNodeId: null,
contextMenu: null,
sidePanelWidth: 480, // matches SIDEPANEL_DEFAULT_WIDTH in SidePanel.tsx
@@ -438,7 +418,6 @@ export const useCanvasStore = create<CanvasState>((set, get) => ({
}
},
setPanelTab: (tab) => set({ panelTab: tab }),
setTopView: (view) => set({ topView: view }),
setDragOverNode: (id) => set({ dragOverNodeId: id }),
batchNest: async (nodeIds, targetId) => {
@@ -972,9 +951,8 @@ export const useCanvasStore = create<CanvasState>((set, get) => ({
// response to the child near its edge, the child's relative
// position becomes valid again and the grow stops mid-drag, only to
// resume on the next tick. Commit-on-release: only run grow when a
// change set contains a `dimensions` change (React Flow's auto-measure
// of a card's fixed-size CSS), not on pure `position` changes. Drag-stop
// grow is handled
// change set contains a `dimensions` change (NodeResizer commit),
// not on pure `position` changes. Drag-stop grow is handled
// explicitly in Canvas.onNodeDragStop via growOnce().
const hasDimensionChange = changes.some((c) => c.type === "dimensions");
set({ nodes: hasDimensionChange ? growParentsToFitChildren(next) : next });
-5
View File
@@ -319,11 +319,6 @@ export interface WorkspaceData {
agent_card: Record<string, unknown> | null;
url: string;
parent_id: string | null;
/** Workspace kind: 'platform' = the org-level concierge (the undeletable org
* root, hidden from the map graph); 'workspace' = an ordinary agent. Absent
* on older ws-server builds that predate the kind column treat as
* 'workspace'. (migration 20260606000000_workspaces_kind) */
kind?: string;
active_tasks: number;
max_concurrent_tasks?: number | null;
last_error_rate: number;
+1 -39
View File
@@ -69,43 +69,10 @@ services:
# Override to "production" for SaaS/staged deploys; in those modes
# ADMIN_TOKEN must also be set or every request rejects.
MOLECULE_ENV: "${MOLECULE_ENV:-development}"
# Self-hosted: no control plane to install the org's platform agent
# (concierge), so the tenant server seeds it on boot. Idempotent; unset it
# if you don't want the auto-seeded Org Concierge root.
MOLECULE_SEED_PLATFORM_AGENT: "${MOLECULE_SEED_PLATFORM_AGENT:-true}"
# Org display name. Drives the platform-agent name ("<MOLECULE_ORG_NAME>
# Agent", e.g. "Molecule AI Agent") and the canvas topbar (via the open
# GET /org/identity route). Empty → legacy "Org Concierge" + no topbar name.
MOLECULE_ORG_NAME: "${MOLECULE_ORG_NAME:-Molecule AI}"
CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:${CANVAS_PUBLISH_PORT:-3000},http://127.0.0.1:${CANVAS_PUBLISH_PORT:-3000},http://localhost:3001}
RATE_LIMIT: "${RATE_LIMIT:-1000}"
CONFIGS_DIR: /configs
# Runtime/template SSOT parity with production. The image bakes the FULL
# template set (claude-code-default, codex, google-adk, hermes, openclaw,
# seo-agent) at /workspace-configs-templates, but the ./workspace-configs-
# templates:/configs mount below only carries claude-code-default on the
# host — so without this, GET /templates (the runtime-picker SSOT) listed
# only claude-code locally while production lists them all. Pointing the
# template cache-dir at the baked bundle makes the local runtime LIST match
# production. NOTE: the local Docker provisioner bind-mounts a template
# from CONFIGS_HOST_DIR (host path) at provision time, and the host dir
# only has claude-code-default — so the other runtimes are SELECTABLE but
# only claude-code is PROVISIONABLE locally (their images + host templates
# aren't present in this lightweight dev stack). Real provisioning of the
# other runtimes is covered by the staging e2e, which carries all images.
TEMPLATE_CACHE_DIR: "${TEMPLATE_CACHE_DIR:-/workspace-configs-templates}"
CONFIGS_HOST_DIR: "${CONFIGS_HOST_DIR:-${PWD}/workspace-configs-templates}"
# ORG-TEMPLATE SSOT parity — same shadowing fix as TEMPLATE_CACHE_DIR
# above, for ORG templates (the Home page's ORG TEMPLATES section). The
# image bakes the default org templates (molecule-dev,
# molecule-worker-gemini, ux-ab-lab) at /org-templates. Previously the
# `./org-templates:/org-templates:ro` mount bind-mounted an EMPTY host dir
# over that exact path, shadowing the baked defaults — so the Home page
# showed "No org templates in org-templates/" locally while production
# listed all three. The shadowing mount is removed below; this env points
# findOrgDir() at the baked bundle so the local listing matches production.
# Override to a populated host dir to develop your own org templates.
ORG_TEMPLATES_DIR: "${ORG_TEMPLATES_DIR:-/org-templates}"
PLUGINS_HOST_DIR: "${PLUGINS_HOST_DIR:-${PWD}/plugins}"
# github-app-auth plugin — injects GITHUB_TOKEN / GH_TOKEN into every
# workspace env from the App installation token. Remap the host-side
@@ -158,12 +125,7 @@ services:
IMAGE_AUTO_REFRESH: "${IMAGE_AUTO_REFRESH:-true}"
volumes:
- ./workspace-configs-templates:/configs
# NOTE: the empty host ./org-templates is intentionally NOT mounted over
# the baked /org-templates — that shadowed the image's default org
# templates and made the Home page show "No org templates". The platform
# reads org templates from ORG_TEMPLATES_DIR (set to the baked
# /org-templates above). To develop custom org templates, mount a
# POPULATED host dir at a different path and point ORG_TEMPLATES_DIR at it.
- ./org-templates:/org-templates:ro
- ./plugins:/plugins:ro
- /var/run/docker.sock:/var/run/docker.sock
# App private key — read-only bind-mount. The host-side path is
-109
View File
@@ -1,109 +0,0 @@
# RFC: User Tasks — agent→user action requests
**Status:** Draft — pre-implementation design SSOT. New primitive; normally
needs CTO sign-off before merge (authorized in-session by the CTO for the
concierge build).
**Author:** core-devops (canvas concierge work)
**Related:** RFC #2360 (platform agent / Org Concierge), PR #2385 (canvas redesign)
## Problem
The Org Concierge home has a **Tasks** tab. "Tasks" is meant to be **things an
agent asks the *user* to do** — e.g. "Review the launch draft", "Provide the
Stripe API key", "Confirm the publish date". Today there is **no backend** for
this: the only agent→user mechanisms are
- **Approvals** (`approval_requests`) — sign-off for *destructive* ops only, and
- **`send_message_to_user` / `notify_user`** — unstructured chat messages with no
state (you can't mark them done, and they don't form a worklist).
So the Tasks tab had to be wired to **schedules** as an interim stand-in, which
is the wrong concept.
## Design
A small structured primitive that mirrors the **approvals** subsystem (same
shape, minus the destructive-gating semantics).
### Data — `user_tasks`
```sql
CREATE TABLE user_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL, -- the agent that raised the ask
title TEXT NOT NULL, -- the ask, one line
detail TEXT, -- optional longer context
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','done','dismissed')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
resolved_at TIMESTAMPTZ,
resolved_by TEXT
);
CREATE INDEX idx_user_tasks_pending ON user_tasks (status, created_at DESC);
```
### Endpoints (mirror `approvals`)
| Method + path | Auth | Purpose |
|---|---|---|
| `POST /workspaces/:id/user-tasks` | WorkspaceAuth | Agent raises an ask `{title, detail?}``201 {user_task_id, status:"pending"}` |
| `GET /workspaces/:id/user-tasks` | WorkspaceAuth | A workspace **reads its own** tasks (any status) |
| `PATCH /workspaces/:id/user-tasks/:taskId` | WorkspaceAuth | A workspace **updates its own** task `{title?, detail?, status?}` (scoped by `workspace_id`) |
| `DELETE /workspaces/:id/user-tasks/:taskId` | WorkspaceAuth | A workspace **deletes its own** task (scoped by `workspace_id`) |
| `GET /user-tasks/pending` | AdminAuth | Cross-workspace pending list for the concierge Tasks tab → `[{id, workspace_id, workspace_name, title, detail, status, created_at}]` |
| `POST /workspaces/:id/user-tasks/:taskId/resolve` | WorkspaceAuth | User marks `{status:"done"|"dismissed", resolved_by?}``200` |
**Any** workspace (not just the platform agent) can create and manage its own
tasks; the `:id` workspace scope on update/delete means an agent can only touch
tasks it raised. The Home Tasks list (`/user-tasks/pending`) is org-wide, so
every workspace's asks surface in one place for the user.
`/user-tasks/pending` is AdminAuth + cross-workspace exactly like
`/approvals/pending` (an unauthenticated caller must not enumerate every org's
asks).
### MCP tool — `request_user_action`
Added to the **in-workspace `a2a` MCP** (same place as `send_message_to_user`)
so every agent can raise an ask:
```
request_user_action(title, detail?) → raise an ask (insert + USER_TASK_REQUESTED)
list_user_tasks() → read the asks this workspace raised + status
update_user_task(user_task_id, title?, detail?, status?) → edit own task
delete_user_task(user_task_id) → delete own task
```
So every agent (any workspace, via MCP) can create AND manage its own asks —
`request_user_action` is the create; `list_/update_/delete_user_task` are the
read/update/delete, all scoped to tasks the calling workspace raised. None are
gated behind `MOLECULE_MCP_ALLOW_SEND_MESSAGE` (that gate is specific to
`send_message_to_user`); raising/managing an ask is always allowed.
### Events
`USER_TASK_REQUESTED`, `USER_TASK_RESOLVED` — broadcast on the existing
Broadcaster so the canvas updates live (same pattern as `APPROVAL_*`).
### Canvas wiring (PR #2385)
The concierge **Tasks** tab fetches `GET /user-tasks/pending`, renders each as a
task card (title + detail + originating agent), with **Done** / **Dismiss**
buttons calling the resolve endpoint. The tab count badge reflects the pending
count. Replaces the interim schedules wiring.
## SSOT discipline / non-goals
- Reuses the approvals pattern, Broadcaster, and WorkspaceAuth/AdminAuth split —
no new auth path, no new event bus.
- **Not** an approval/gate: resolving a user-task has no server-side enforcement
effect; it's a worklist signal. (Destructive gating stays in `approvals`.)
- No `org_id` column; cross-workspace listing joins `workspaces` like approvals.
## Rollout
Phase 0 migration ships idempotently (`IF NOT EXISTS`). Backend + MCP tool +
canvas wiring land together behind the concierge Home (already gated as the new
UI). Full molecule-core SOP gate applies (tier label + qa-review +
security-review + green CI).
+2 -2
View File
@@ -278,7 +278,7 @@ receive **HTTP 401** on every API call. Affected workflows in molecule-core:
| Workflow | Symptom | Workaround |
|---|---|---|
| `gate-check-v3.yml` | Reports BLOCKED on every PR | Provision `SOP_CHECKLIST_GATE_TOKEN`; update workflow to use it |
| `gate-check-v3.yml` | Reports BLOCKED on every PR | Provision `SOP_TIER_CHECK_TOKEN`; update workflow to use it |
| `qa-review.yml` | Fails immediately on PR open | Same — needs named secret |
| `security-review.yml` | Fails immediately on PR open | Same — needs named secret |
@@ -313,7 +313,7 @@ dispatcher may fire **only 1 of N eligible workflows** on the initial
This was observed on molecule-core PR #558 (created 2026-05-11T19:54:10Z):
12+ workflows had no `paths:` filter and should have fired, but only
`sop-checklist.yml` dispatched.
`sop-tier-check.yml` dispatched.
Concurrent PRs created within the same minute received 1230 dispatches each,
confirming this is specific to the PR-create event dispatch, not a general
-10
View File
@@ -229,11 +229,6 @@ ssm_refresh_ecr_auth() {
# to guarantee correct string escaping (OFFSEC-001 / CWE-78 hardening).
# Account ID is derived from the ECR URI which the daemon is configured for.
local acct="${ECR_ACCOUNT_ID:-153263036946}"
# #676: validate account ID is exactly 12 digits (AWS account ID format).
if ! [[ "$acct" =~ ^[0-9]{12}$ ]]; then
err "invalid ECR_ACCOUNT_ID (must be 12 digits): $acct"
return 1
fi
local params
params=$(mktemp)
python3 -c "
@@ -295,11 +290,6 @@ validate_slug() {
preflight() {
log "preflight: source=$SOURCE_TAG dest=$DEST_TAG repo=$REPO region=$REGION"
# Region validation: reject obviously malformed input (CWE-78 / injection guard).
if ! [[ "$REGION" =~ ^[a-z][a-z0-9-]*[0-9]$ ]]; then
err "invalid AWS region: $REGION"
exit 64
fi
local src_manifest
src_manifest=$(aws_ecr_get_image "$SOURCE_TAG") || {
err "source tag '$SOURCE_TAG' not found in $REPO"
+4 -19
View File
@@ -311,22 +311,7 @@ for slug in $valid_slugs; do
fi
done
printf '\n== Test 11: region validation — malicious region rejected with exit 64 (#676) ==\n'
# Attack vectors: shell metacharacters, path traversal, command substitution.
_invalid_regions='us;rm -rf / $(whoami) us"east-1 ../etc/passwd `id` $HOME us/east-1'
for bad_region in $_invalid_regions; do
set +e
out=$(AWS_REGION="$bad_region" "$SCRIPT" --source-tag x --dest-tag y --tenants chloe-dong --mock-dir /nonexistent 2>&1); rc=$?
set -e
if [[ $rc -eq 64 ]] && printf '%s' "$out" | grep -q 'invalid AWS region'; then
PASS=$((PASS + 1)); printf ' ✓ region rejected: %s\n' "$(printf '%q' "$bad_region")"
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("region-reject:$bad_region")
printf ' ✗ region should be rejected: %s — got exit %s\n' "$(printf '%q' "$bad_region")" "$rc"
fi
done
printf '\n== Test 12: ROLLBACK_TAG follows YYYYMMDD via NOW_OVERRIDE_DATE ==\n'
printf '\n== Test 11: ROLLBACK_TAG follows YYYYMMDD via NOW_OVERRIDE_DATE ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{}' 0
mock_set "$m" aws_ecr_describe_image '' 1
@@ -348,7 +333,7 @@ fi
assert_calls_contain "rollback tag uses NOW_OVERRIDE_DATE (20260603)" "$m" 'aws_ecr_put_image b-prev-20260603'
rm -rf "$m"
printf '\n== Test 13: empty source manifest fails preflight ==\n'
printf '\n== Test 12: empty source manifest fails preflight ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '' 0 # rc=0 but empty body (the "None" case)
out=$(run_script "$m")
@@ -356,7 +341,7 @@ assert_exit "empty source manifest fails preflight" "$out" 1
assert_contains "empty manifest message" "$out" 'returned empty manifest'
rm -rf "$m"
printf '\n== Test 14: tenant_buildinfo failure during verify → rollback ==\n'
printf '\n== Test 13: tenant_buildinfo failure during verify → rollback ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
@@ -370,7 +355,7 @@ assert_contains "logs buildinfo failure" "$out" '/buildinfo failed for chloe-don
assert_contains "rollback fired after verify fail" "$out" 'ROLLBACK:'
rm -rf "$m"
printf '\n== Test 15: ssm_refresh_ecr_auth JSON escaping (CWE-78 / OFFSEC-001) ==\n'
printf '\n== Test 14: ssm_refresh_ecr_auth JSON escaping (CWE-78 / OFFSEC-001) ==\n'
# Verify the python3 snippet in ssm_refresh_ecr_auth produces valid JSON and
# correctly escapes shell-injection characters in region + account ID fields.
# The fix replaces unquoted shell-printf interpolation with json.dumps.
-33
View File
@@ -1,33 +0,0 @@
# Tiny stub runtime image for the local Docker-provisioner lifecycle e2e.
#
# It impersonates a real workspace runtime's platform contract (register +
# heartbeat + A2A message/send) with ZERO LLM/SDK weight so the lifecycle e2e
# (provision -> online -> restart-survive -> proxy-reach) runs in seconds and
# without the 2.5GB real claude-code image.
#
# Resolution trick (see tests/e2e/test_local_provision_lifecycle_e2e.sh):
# the local provisioner resolves runtime=claude-code via RegistryModeLocal,
# which is a `docker image inspect` cache-check on
# molecule-local/workspace-template-claude-code:<gitea-HEAD-sha12>
# BEFORE it clones+builds. Pre-tagging THIS image to that exact cache tag makes
# the provisioner cache-hit the stub instead of building the real template.
#
# linux/amd64: the provisioner forces --platform=linux/amd64 for every workspace
# container (defaultImagePlatform, #1875) for parity with the amd64-only prod
# images. Build the stub amd64 too so the platforms match and Docker doesn't
# refuse the create with a manifest mismatch.
FROM --platform=linux/amd64 python:3.12-alpine
# /configs is the named-volume mount point the provisioner attaches
# (ws-<id>-configs:/configs). The real entrypoint chowns it; the stub just
# needs the dir to exist so a missing-mount never trips it up.
RUN mkdir -p /configs /workspace
WORKDIR /app
COPY server.py /app/server.py
EXPOSE 8000
# No gosu/agent-uid drop here — the stub does no privileged work and the e2e
# only cares about the platform contract, not the agent-uid security posture.
ENTRYPOINT ["python3", "/app/server.py"]
-307
View File
@@ -1,307 +0,0 @@
#!/usr/bin/env python3
"""Minimal stub runtime for the local Docker-provisioner lifecycle e2e.
This is NOT a real agent it carries no LLM, no claude-code SDK, no plugin
host. Its only job is to satisfy the platform's runtime<->platform contract so
the `test_local_provision_lifecycle_e2e.sh` harness can prove the LOCAL Docker
provisioner can provision a workspace, bring it online, SURVIVE A RESTART
(reusing the config volume), and route an A2A `message/send` through the
platform proxy all WITHOUT building/booting the 2.5GB real claude-code image.
Contract it replicates (discovered from workspace-server):
* registration is done BY the runtime container on boot (NOT the provisioner).
The provisioner only sets status=provisioning + pre-stores the host URL; the
container must POST /registry/register itself, and the heartbeat loop is what
transitions provisioning -> online (registry.go evaluateStatus, #1784).
* env vars the real entrypoint reads, injected by buildContainerEnv():
WORKSPACE_ID - this workspace's UUID
PLATFORM_URL - canonical platform base URL (e.g. http://platform:8080)
We read exactly those (with WORKSPACE_CONFIG_PATH for the config.yaml probe).
* POST {PLATFORM_URL}/registry/register
body: {"id", "url", "agent_card":{"name","skills":[]}}
- url MUST be push-routable. The provisioner runs the platform inside
Docker, so it rewrites a stored http://127.0.0.1:<port> URL to the
container-DNS form http://ws-<id[:12]>:8000 before proxying
(a2a_proxy.go resolveAgentURL). We register our OWN container-DNS URL
(http://<hostname>:8000) so SSRF validation passes in SaaS mode AND the
proxy can reach us; in self-hosted (non-saas) mode RFC-1918 is blocked,
so we fall back to registering by the ws-<id> alias hostname which
resolves on molecule-core-net.
- first register returns {"auth_token": ...}; we keep it for heartbeats.
* POST {PLATFORM_URL}/registry/heartbeat (every ~10s)
header: Authorization: Bearer <auth_token>
body: {"workspace_id","error_rate","sample_error","active_tasks",
"uptime_seconds","current_task"}
This is what lifts the workspace provisioning -> online and keeps the
Redis liveness TTL fresh (so the restart re-online assertion can pass).
* listen on :8000 and answer the A2A JSON-RPC the proxy forwards:
POST / {"jsonrpc","id","method":"message/send","params":{...}}
-> 200 {"jsonrpc":"2.0","id":<echoed>,
"result":{"kind":"message","role":"agent",
"parts":[{"kind":"text","text":"STUB OK"}],
"messageId":<uuid>}}
The result envelope matches what test_a2a_e2e.sh asserts on
(result.parts[0].text, role=agent, kind=text). A health path (/health and
GET /) returns 200 so any probe sees the container as alive.
"""
import json
import os
import sys
import threading
import time
import urllib.request
import urllib.error
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
PORT = 8000
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "").strip()
PLATFORM_URL = (os.environ.get("PLATFORM_URL") or os.environ.get("MOLECULE_URL") or "").rstrip("/")
HOSTNAME = os.environ.get("HOSTNAME", "").strip() # docker sets this to the container id; ws-<id> alias also resolves
# URL we register with. Two hard constraints, discovered from workspace-server:
#
# * validateAgentURL (registry.go) blocks RFC-1918 ranges in NON-saas mode
# (this dev stack sets neither MOLECULE_DEPLOY_MODE=saas nor MOLECULE_ORG_ID
# -> strict mode). The molecule-core-net bridge is 172.18.0.0/16, INSIDE the
# blocked 172.16/12 — so registering our own ws-<id>:8000 DNS name (which
# resolves to a 172.18.x bridge IP) would be REJECTED and we'd never get an
# auth_token. "localhost" is explicitly allowed BY NAME (no DNS lookup).
#
# * the proxy doesn't use the URL we register anyway: the provisioner
# pre-stores http://127.0.0.1:<host-port>, the register upsert PRESERVES any
# existing 127.0.0.1 URL (CASE WHEN url LIKE 'http://127.0.0.1%'), and when
# the platform runs in Docker resolveAgentURL rewrites that to the container
# -DNS form http://ws-<id[:12]>:8000 before forwarding. So our listen
# address (0.0.0.0:8000, reachable as ws-<id>:8000 on the bridge) is what
# the proxy actually hits — independent of the URL string we register.
#
# Net: register a name-form localhost URL purely to satisfy push-mode's
# "url required + must pass SSRF check" and to get our auth_token. Routing is
# handled by the provisioner-stored 127.0.0.1 URL + the proxy rewrite.
_short = WORKSPACE_ID[:12] if len(WORKSPACE_ID) > 12 else WORKSPACE_ID
SELF_URL = os.environ.get("STUB_REGISTER_URL", f"http://localhost:{PORT}")
CONFIG_PATH = (os.environ.get("WORKSPACE_CONFIG_PATH") or "/configs").rstrip("/")
AUTH_TOKEN_FILE = f"{CONFIG_PATH}/.auth_token"
AUTH_TOKEN = None
_started = time.time()
def _log(msg):
print(f"[stub-runtime {_short}] {msg}", flush=True)
def read_volume_token():
"""The provisioner pre-writes the CURRENT workspace bearer to
/configs/.auth_token before every container start (issueAndInjectToken,
#1877), and ROTATES it on every (re)provision (RevokeAllForWorkspace +
IssueToken). So the volume file NOT the register-response token is the
authoritative, rotation-proof bearer. Reading it on each heartbeat means a
provision-time token rotation never wedges our heartbeat at 401 (which is
what kept the workspace stuck in 'provisioning' instead of flipping online).
"""
try:
with open(AUTH_TOKEN_FILE, "r") as f:
tok = f.read().strip()
return tok or None
except Exception:
return None
def _post_json(path, payload, token=None):
url = f"{PLATFORM_URL}{path}"
data = json.dumps(payload).encode()
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if token:
req.add_header("Authorization", f"Bearer {token}")
with urllib.request.urlopen(req, timeout=15) as resp:
body = resp.read().decode()
return resp.status, body
def register():
"""POST /registry/register. Returns the issued auth_token (first register).
C18 hijack guard: once the workspace has ANY live token on file (the
provisioner mints+injects one into /configs/.auth_token before start), a
register MUST carry that workspace's bearer or it 401s. So we send the
volume token (if present). First-ever boot has no live token yet bootstrap
register (no bearer) is allowed and returns the freshly-issued auth_token.
"""
global AUTH_TOKEN
payload = {
"id": WORKSPACE_ID,
"url": SELF_URL,
"delivery_mode": "push",
"agent_card": {
"name": WORKSPACE_ID,
"description": "stub runtime (e2e lifecycle)",
"skills": [],
},
}
status, body = _post_json("/registry/register", payload, token=read_volume_token())
_log(f"register -> {status} {body[:200]}")
try:
parsed = json.loads(body)
except Exception:
parsed = {}
tok = parsed.get("auth_token")
if tok:
AUTH_TOKEN = tok
_log("captured auth_token from register response")
return status
def current_token():
# Volume file is authoritative (rotation-proof); fall back to the token we
# captured from the register response if the file isn't there yet.
return read_volume_token() or AUTH_TOKEN
def heartbeat():
payload = {
"workspace_id": WORKSPACE_ID,
"error_rate": 0.0,
"sample_error": "",
"active_tasks": 0,
"uptime_seconds": int(time.time() - _started),
"current_task": "",
}
status, body = _post_json("/registry/heartbeat", payload, token=current_token())
return status, body
def register_with_retry():
# The platform may still be wiring the row when we boot; retry a few times.
# Register is best-effort for the e2e (heartbeat drives online); a sticky
# 401 just means the workspace already has a live token and our volume token
# is momentarily stale — the heartbeat path re-reads the volume each beat.
for attempt in range(1, 11):
try:
status = register()
if status == 200:
return True
_log(f"register attempt {attempt}: HTTP {status}, retrying")
except urllib.error.HTTPError as e:
_log(f"register attempt {attempt}: HTTPError {e.code} {e.read().decode()[:200]}")
except Exception as e:
_log(f"register attempt {attempt}: {e}")
time.sleep(2)
return False
def heartbeat_loop():
# Fire the FIRST heartbeat immediately (no initial 5s wait) — the
# provisioning->online transition is driven by the heartbeat handler
# (registry.go evaluateStatus, #1784), so an eager first beat minimises the
# provision->online latency the e2e polls on.
while True:
try:
status, body = heartbeat()
if status != 200:
_log(f"heartbeat -> {status} {body[:160]}")
# A 401 means our token was rotated (every provision rotates the
# workspace token, issueAndInjectToken -> RevokeAllForWorkspace).
# Re-register to mint a fresh one. This is what lets the SAME
# container process survive a platform-side token rotation.
if status == 401:
_log("heartbeat 401 — re-registering to refresh token")
register_with_retry()
except urllib.error.HTTPError as e:
if e.code == 401:
_log("heartbeat 401 (HTTPError) — re-registering")
register_with_retry()
else:
_log(f"heartbeat HTTPError {e.code}")
except Exception as e:
_log(f"heartbeat error: {e}")
time.sleep(5)
class Handler(BaseHTTPRequestHandler):
def log_message(self, *args): # silence default access logging
pass
def _send(self, code, obj):
body = json.dumps(obj).encode()
self.send_response(code)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(body)))
self.end_headers()
self.wfile.write(body)
def do_GET(self):
# Health: any GET returns 200 so probes see us as alive.
self._send(200, {"status": "ok", "stub": True, "workspace_id": WORKSPACE_ID})
def do_POST(self):
length = int(self.headers.get("Content-Length", "0") or "0")
raw = self.rfile.read(length) if length else b"{}"
try:
req = json.loads(raw or b"{}")
except Exception:
req = {}
method = req.get("method", "")
req_id = req.get("id", str(uuid.uuid4()))
if method and method != "message/send":
# Match the proxy's -32601 method-not-found contract for unknowns.
self._send(200, {
"jsonrpc": "2.0",
"id": req_id,
"error": {"code": -32601, "message": f"method not found: {method}"},
})
return
# Canned A2A reply — exact envelope the canvas/proxy + test_a2a_e2e.sh
# assert on: result.role=agent, result.parts[0].kind=text/text.
self._send(200, {
"jsonrpc": "2.0",
"id": req_id,
"result": {
"kind": "message",
"role": "agent",
"parts": [{"kind": "text", "text": "STUB OK"}],
"messageId": str(uuid.uuid4()),
},
})
def main():
if not WORKSPACE_ID or not PLATFORM_URL:
_log(f"FATAL: WORKSPACE_ID={WORKSPACE_ID!r} PLATFORM_URL={PLATFORM_URL!r} — both required")
sys.exit(1)
_log(f"booting: platform={PLATFORM_URL} self_url={SELF_URL} hostname={HOSTNAME}")
# Start the HTTP server FIRST so the platform can reach us the instant we
# register (avoids a race where the proxy forwards before we're listening).
server = ThreadingHTTPServer(("0.0.0.0", PORT), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
_log(f"listening on :{PORT}")
# Try to register, but do NOT make heartbeating contingent on it. The
# provisioning->online transition is driven by the HEARTBEAT handler
# (registry.go evaluateStatus, #1784), and heartbeats authenticate with the
# volume token (rotation-proof). If register transiently 401s (e.g. a token
# rotation mid-boot), we must still heartbeat so the workspace can come
# online — blocking the heartbeat loop on register success is exactly what
# kept the workspace stuck in 'provisioning'. register_with_retry runs in a
# background thread; the foreground heartbeat loop starts immediately.
threading.Thread(target=register_with_retry, daemon=True).start()
heartbeat_loop()
if __name__ == "__main__":
main()
@@ -1,255 +0,0 @@
#!/usr/bin/env bash
# LOCAL functional variant of the concierge-creates-a-workspace gate.
#
# Same proof as tests/e2e/test_staging_concierge_creates_workspace_e2e.sh but
# against the ALREADY-RUNNING local stack (BASE, default http://localhost:8080),
# so the "concierge actually invokes create_workspace via the platform MCP" claim
# can be demonstrated locally — far faster than provisioning an EC2 tenant.
#
# Drive the AGENT (not the REST API): send the concierge an A2A message/send
# ("create a workspace named e2e-cncrg-worker-<runid> with role engineer") and
# assert the DETERMINISTIC SIDE EFFECT — that named workspace now EXISTS in
# GET /workspaces — which can only happen if the concierge's LLM really invoked
# the create_workspace platform-MCP tool.
#
# SKIP-LOUD GATE (this is the whole point of the local variant). The platform MCP
# tools — incl. create_workspace — only light up on the DEDICATED platform-agent
# image (Dockerfile.platform-agent, ships /opt/molecule-mcp-server). The ordinary
# `claude-code` image the default local stack provisions the concierge on does
# NOT ship it (platform_agent.go SELF-HOST CAVEAT). So before driving the agent
# this script PROBES the concierge's own MCP tool list (POST /workspaces/:id/mcp
# tools/list) and SKIPs LOUD (exit 0) unless create_workspace is actually present.
# It also skips-loud when no concierge is seeded or it isn't online. That makes
# this runnable on any local stack: it only EXERCISES the path when the local
# stack can actually run it, and never false-reds when it can't.
#
# To make the local stack able to run this GREEN you need BOTH:
# 1. A concierge seeded as the kind='platform' root. The self-hosted compose
# sets MOLECULE_SEED_PLATFORM_AGENT=1 so the ws-server self-seeds it
# (EnsureSelfHostedPlatformAgent) + best-effort provisions it on boot
# (MaybeProvisionPlatformAgentOnBoot).
# 2. That concierge running on the platform-agent image (so create_workspace
# exists) WITH a working model key (e.g. MINIMAX_API_KEY / a BYOK key) so its
# LLM can run the tool. The default `claude-code` image will SKIP at the MCP
# probe — that's expected and honest, not a failure.
#
# Env contract:
# BASE default http://localhost:8080
# MOLECULE_ADMIN_TOKEN platform admin bearer IF the local stack sets
# ADMIN_TOKEN (devmode fail-open if unset). Used by
# _lib.sh helpers for admin-gated GET/DELETE.
# E2E_CONCIERGE_ONLINE_SECS default 300 (local boot budget)
# E2E_AGENT_ACT_SECS default 300 (LLM think+tool-call budget)
# E2E_RUN_ID slug/name suffix; default $$-based
#
# Exit codes:
# 0 concierge created the workspace, OR honest skip-loud (path not runnable)
# 1 generic / assertion failure (agent didn't act, or the tool failed)
set -euo pipefail
: "${BASE:=http://localhost:8080}"
export BASE
# shellcheck disable=SC1091
# shellcheck source=_lib.sh
source "$(dirname "$0")/_lib.sh"
# Error-as-text scanner so a concierge that surfaces a tool error AS its reply
# is distinguished from a clean "created it" reply.
# shellcheck disable=SC1091
# shellcheck source=lib/completion_assert.sh
source "$(dirname "$0")/lib/completion_assert.sh"
CONCIERGE_ONLINE_SECS="${E2E_CONCIERGE_ONLINE_SECS:-300}"
AGENT_ACT_SECS="${E2E_AGENT_ACT_SECS:-300}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
WORKER_NAME="e2e-cncrg-worker-${RUN_ID_SUFFIX}"
WORKER_NAME=$(echo "$WORKER_NAME" | tr -cd 'a-zA-Z0-9-' | head -c 48)
export WORKER_NAME
log() { echo "[$(date +%H:%M:%S)] $*"; }
fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
skip_loud() { echo "[$(date +%H:%M:%S)] ⏭️ SKIP (local path not runnable): $*" >&2; exit 0; }
# Admin-auth curl args (if the local stack set ADMIN_TOKEN; else empty / fail-open).
ADMIN_AUTH=()
e2e_admin_auth_args ADMIN_AUTH
WORKER_ID=""
cleanup() {
# Targeted delete of the worker the concierge created (best-effort). _lib.sh's
# helper sends the admin bearer + confirm header.
if [ -n "$WORKER_ID" ]; then
log "🧹 deleting concierge-created worker $WORKER_ID ($WORKER_NAME)..."
e2e_delete_workspace "$WORKER_ID" "$WORKER_NAME" || true
fi
}
trap cleanup EXIT INT TERM
list_ws() { curl -sS --max-time 15 "$BASE/workspaces" ${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"}; }
find_platform_root() {
list_ws | python3 -c "
import sys, json
try: rows = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
for w in rows if isinstance(rows, list) else []:
if w.get('kind') == 'platform' and not w.get('parent_id'):
print(w.get('id','')); break
else:
print('')"
}
ws_field() { # <id> <field>
curl -sS --max-time 15 "$BASE/workspaces/$1" ${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"} | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
print(d.get('$2','') if isinstance(d, dict) else '')"
}
find_worker_by_name() {
list_ws | python3 -c "
import sys, json, os
want = os.environ['WORKER_NAME']
try: rows = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
for w in rows if isinstance(rows, list) else []:
if w.get('name') == want:
print(w.get('id','')); break
else:
print('')"
}
# concierge_has_create_workspace_tool <id>: probe POST /workspaces/:id/mcp
# tools/list and echo "yes" iff create_workspace is in the advertised tool set.
# This is THE gate distinguishing the platform-agent image (has the tool) from
# the ordinary claude-code image (does not).
concierge_has_create_workspace_tool() { # <id>
local wid="$1" out
out=$(curl -sS --max-time 30 -X POST "$BASE/workspaces/$wid/mcp" \
${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"} \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' 2>/dev/null || echo '{}')
echo "$out" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print('no'); sys.exit(0)
tools = (d.get('result') or {}).get('tools', []) if isinstance(d, dict) else []
names = {t.get('name','') for t in tools if isinstance(t, dict)}
# Accept the bare name or any mcp_*_create_workspace alias the bridge may expose.
print('yes' if any(n == 'create_workspace' or n.endswith('create_workspace') for n in names) else 'no')"
}
# ─── 0. Preflight ────────────────────────────────────────────────────────────
log "═══ LOCAL concierge CREATES-A-WORKSPACE (real-LLM) E2E ═══ BASE=$BASE"
log " worker the concierge will be asked to create: name=$WORKER_NAME"
curl -sS --max-time 10 "$BASE/health" >/dev/null 2>&1 || skip_loud "local stack not reachable at $BASE/health — run \`make up\` first"
ok "Local stack reachable"
# ─── 1. Discover the concierge (kind='platform' root) ─────────────────────────
CONCIERGE_ID=$(find_platform_root)
if [ -z "$CONCIERGE_ID" ]; then
skip_loud "no kind='platform' concierge seeded on the local stack. Set MOLECULE_SEED_PLATFORM_AGENT=1 \
on the ws-server (self-hosted compose does this) so it self-seeds + provisions the concierge."
fi
ok "Concierge (platform root) = $CONCIERGE_ID"
# ─── 2. Ensure the concierge is online ────────────────────────────────────────
log "Waiting for the concierge to be online (up to ${CONCIERGE_ONLINE_SECS}s)..."
ONLINE_DEADLINE=$(( $(date +%s) + CONCIERGE_ONLINE_SECS ))
C_STATUS=""; LAST_C_STATUS=""
while true; do
C_STATUS=$(ws_field "$CONCIERGE_ID" status)
if [ "$C_STATUS" != "$LAST_C_STATUS" ]; then log " concierge → ${C_STATUS:-<none>}"; LAST_C_STATUS="$C_STATUS"; fi
[ "$C_STATUS" = "online" ] && break
if [ "$(date +%s)" -gt "$ONLINE_DEADLINE" ]; then
skip_loud "concierge $CONCIERGE_ID never reached online within ${CONCIERGE_ONLINE_SECS}s (last='${C_STATUS}'). \
On the default local stack the concierge needs a model key (e.g. MINIMAX_API_KEY) to boot — without one it stays failed."
fi
sleep 5
done
ok "Concierge online"
# ─── 3. Gate: the platform MCP create_workspace tool must actually be present ──
log "Probing the concierge's MCP tool set for create_workspace..."
HAS_TOOL=$(concierge_has_create_workspace_tool "$CONCIERGE_ID")
if [ "$HAS_TOOL" != "yes" ]; then
skip_loud "the concierge's platform MCP does NOT expose create_workspace — it is running on the ordinary \
claude-code image (no /opt/molecule-mcp-server), not the platform-agent image. Provision the concierge on \
Dockerfile.platform-agent to exercise this path locally. (This is the documented SELF-HOST CAVEAT, not a bug.)"
fi
ok "Concierge advertises create_workspace via its platform MCP"
# Pre-state: the worker must not already exist.
PRE_EXISTING=$(find_worker_by_name)
[ -n "$PRE_EXISTING" ] && fail "worker '$WORKER_NAME' already exists pre-test ($PRE_EXISTING) — cannot prove causality"
ok "Pre-state confirmed: '$WORKER_NAME' does not exist yet"
# ─── 4. Drive the AGENT via A2A message/send ──────────────────────────────────
log "Sending the concierge a natural-language create-workspace request..."
AGENT_PROMPT="Please create a new workspace in this org right now using your platform tools. \
Use the create_workspace tool with name exactly ${WORKER_NAME} (use that exact string, no quotes) and role engineer. \
Do not ask me any clarifying questions — the name and role are final. \
After the tool succeeds, reply with the new workspace id."
export AGENT_PROMPT
A2A_PAYLOAD=$(python3 -c "
import json, os, uuid
print(json.dumps({
'jsonrpc': '2.0',
'method': 'message/send',
'id': 'e2e-cncrg-mk-local-1',
'params': {
'message': {
'role': 'user',
'messageId': f'e2e-{uuid.uuid4().hex[:8]}',
'parts': [{'kind': 'text', 'text': os.environ['AGENT_PROMPT']}],
}
}
}))")
A2A_TMP=$(mktemp -t cncrg-mk-local-XXXXXX)
set +e
A2A_CODE=$(curl -sS --max-time "$AGENT_ACT_SECS" -X POST "$BASE/workspaces/$CONCIERGE_ID/a2a" \
${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"} \
-H "Content-Type: application/json" \
-d "$A2A_PAYLOAD" -o "$A2A_TMP" -w '%{http_code}' 2>/dev/null)
A2A_RC=$?
set -e
A2A_CODE=${A2A_CODE:-000}
A2A_RESP=$(cat "$A2A_TMP" 2>/dev/null || echo "")
rm -f "$A2A_TMP"
if [ "$A2A_RC" != "0" ] || [ "$A2A_CODE" -lt 200 ] || [ "$A2A_CODE" -ge 300 ]; then
fail "A2A POST /workspaces/$CONCIERGE_ID/a2a failed (curl_rc=$A2A_RC, http=$A2A_CODE): $(echo "$A2A_RESP" | head -c 400)"
fi
AGENT_TEXT=$(echo "$A2A_RESP" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
parts = (d.get('result') or {}).get('parts', []) if isinstance(d, dict) else []
print(parts[0].get('text','') if parts else '')" 2>/dev/null || echo "")
log " concierge replied (first 300 chars): $(echo "$AGENT_TEXT" | head -c 300)"
# ─── 5. ASSERT the deterministic side effect: the worker now EXISTS ───────────
log "Polling GET /workspaces for the worker the concierge was asked to create..."
ACT_DEADLINE=$(( $(date +%s) + AGENT_ACT_SECS ))
while true; do
WORKER_ID=$(find_worker_by_name)
[ -n "$WORKER_ID" ] && break
if [ "$(date +%s)" -gt "$ACT_DEADLINE" ]; then
if hit=$(a2a_completion_error_marker "$AGENT_TEXT"); then
fail "TOOL FAILED: concierge surfaced an error-as-text reply (matched '$hit') and no workspace '$WORKER_NAME' was created. Reply: $(echo "$AGENT_TEXT" | head -c 400)"
fi
fail "AGENT DID NOT ACT: concierge replied but no workspace named '$WORKER_NAME' exists after ${AGENT_ACT_SECS}s — its LLM did not invoke create_workspace. Reply: $(echo "$AGENT_TEXT" | head -c 400)"
fi
sleep 6
done
ok "DETERMINISTIC SIDE EFFECT CONFIRMED: workspace '$WORKER_NAME' now EXISTS (id=$WORKER_ID)"
WORKER_KIND=$(ws_field "$WORKER_ID" kind)
if [ -n "$WORKER_KIND" ] && [ "$WORKER_KIND" != "workspace" ]; then
fail "created node '$WORKER_NAME' has kind='$WORKER_KIND' (want 'workspace')"
fi
ok "Created node is a real kind='workspace' row"
ok "═══ LOCAL CONCIERGE CREATES-A-WORKSPACE E2E PASSED ═══"
log "Proven locally: a natural-language A2A request → the concierge's LLM invoked create_workspace via the platform MCP → real workspace '$WORKER_NAME' (id=$WORKER_ID). Teardown runs via EXIT trap."
@@ -1,570 +0,0 @@
#!/usr/bin/env bash
# MANDATORY local Docker-provisioner lifecycle e2e.
#
# Why this exists: every other e2e exercises the SaaS/EC2 (control-plane)
# provisioner. NOTHING mandatory exercises the LOCAL Docker provisioner
# (MOLECULE_ENV=development, docker.sock) — the path self-hosters and dev runs
# use. A config-volume bug where a restarted workspace couldn't find its
# config.yaml (and wedged in 'failed' with "config volume is empty") went
# undetected for exactly this reason. This test provisions a REAL workspace via
# the LOCAL provisioner and asserts the full lifecycle, INCLUDING the
# restart-survival assertion that would have caught that bug.
#
# Steps (each asserts loudly):
# 1. Build + tag the stub runtime image to the provisioner's RegistryModeLocal
# cache tag so runtime=claude-code resolves to the stub (cache-hit, no
# 2.5GB build).
# 2. POST /workspaces (runtime=claude-code) — capture id.
# 3. Poll GET /workspaces/{id} until status==online (<=90s); assert a ws-<id>
# container is running.
# 4. RESTART-SURVIVAL: POST /workspaces/{id}/restart, poll until online AGAIN
# (<=90s); assert the container is back and the workspace did NOT wedge in
# failed / "config volume is empty". <-- the key assertion.
# 5. PROXY REACH: POST an A2A message/send through the PLATFORM proxy
# (/workspaces/{id}/a2a); assert 200 + the stub's canned reply (proves the
# ws-<id>:8000 Docker-DNS rewrite path works end-to-end).
# 6. Cleanup: delete the workspace (trap removes its container + volumes).
#
# Parameterizable: LIFECYCLE_RUNTIME_IMAGE selects which image the provisioner
# resolves to. Default = the freshly-built stub. Point it at the real image
# (e.g. molecule-local/workspace-template-claude-code:2ac9678422a5) for an
# advisory lifecycle-only run (the proxy-reach step then asserts reachability,
# not the canned text — a real LLM-less runtime can't produce "STUB OK").
#
# Run (stub, default — fast, no LLM):
# BASE=http://localhost:8080 ADMIN_TOKEN=dev-local-admin-token \
# bash tests/e2e/test_local_provision_lifecycle_e2e.sh
#
# Run (REAL MiniMax LLM round-trip — cheapest real model; asserts a real reply):
# BASE=http://localhost:8080 ADMIN_TOKEN=dev-local-admin-token \
# LIFECYCLE_LLM=minimax MINIMAX_API_KEY=<key> \
# bash tests/e2e/test_local_provision_lifecycle_e2e.sh
# (MINIMAX_API_KEY missing => loud skip exit 0; key is only ever sent in the
# secret-write curl body, never echoed or written to disk.)
set -euo pipefail
source "$(dirname "$0")/_lib.sh" # sets BASE default + admin-auth + cleanup helpers
# ---- config -----------------------------------------------------------------
ADMIN_TOKEN="${ADMIN_TOKEN:-${MOLECULE_ADMIN_TOKEN:-}}"
export ADMIN_TOKEN MOLECULE_ADMIN_TOKEN="${ADMIN_TOKEN}"
# Was ONLINE_TIMEOUT set by the caller? Remember before we default it so the
# minimax mode (heavier real-template boot) can bump the default without
# clobbering an explicit operator/CI override.
ONLINE_TIMEOUT_EXPLICIT=0
[ -n "${ONLINE_TIMEOUT:-}" ] && ONLINE_TIMEOUT_EXPLICIT=1
ONLINE_TIMEOUT="${ONLINE_TIMEOUT:-90}" # seconds to wait for online
A2A_TIMEOUT="${A2A_TIMEOUT:-30}"
STUB_DIR="$(cd "$(dirname "$0")/stub-runtime" && pwd)"
RUNTIME="claude-code"
# The provisioner's RegistryModeLocal resolves runtime=claude-code by checking
# the local image store for molecule-local/workspace-template-claude-code:<sha12>
# (the Gitea HEAD sha12 of the template repo's `main` branch — see
# provisioner/localbuild.go EnsureLocalImage). If that tag is missing it
# clones+builds the real 2.5GB template (slow + can OOM-kill in CI). We pre-tag
# our chosen image to that EXACT cache tag so the cache-check (dockerHasTag)
# hits and resolves to our image with no clone/build.
#
# The sha MOVES as the template repo advances, so we DISCOVER it at runtime from
# the same Gitea branch API the provisioner uses (CACHE_SHA), and only fall back
# to a pinned default (or an explicit CACHE_TAG override) when Gitea is
# unreachable. This keeps the test correct without an annual sha bump.
CACHE_REPO="molecule-local/workspace-template-${RUNTIME}"
GITEA_BRANCH_API="${GITEA_BRANCH_API:-https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-ai-workspace-template-${RUNTIME}/branches/main}"
# Model + credential choice — three coupled constraints from workspace-server:
# * Create rejects a model NOT registered for the runtime
# (UNREGISTERED_MODEL_FOR_RUNTIME, provider-registry SSOT).
# * The SLASH form (anthropic/claude-opus-4-7) derives provider=platform =>
# platform_managed billing, which ABORTS provisioning in a dev stack with
# no CP proxy env (MISSING_PLATFORM_PROXY, #2162).
# * The BARE form (claude-opus-4-7) derives provider=anthropic-api => BYOK,
# which then FAILS CLOSED unless the workspace has a usable LLM credential
# (MISSING_BYOK_CREDENTIAL). anthropic-api's auth_env is
# [ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN] — so we pass a DUMMY
# ANTHROPIC_API_KEY secret. The stub never makes an LLM call, so the dummy
# value is fine; it only needs to exist so byok resolves with a usable cred.
# This keeps the test self-contained (no platform-proxy env required) — exactly
# the portable shape the CI required job needs.
LIFECYCLE_MODEL="${LIFECYCLE_MODEL:-claude-opus-4-7}"
LIFECYCLE_LLM_KEY="${LIFECYCLE_LLM_KEY:-ANTHROPIC_API_KEY}"
LIFECYCLE_LLM_VALUE="${LIFECYCLE_LLM_VALUE:-sk-ant-e2e-stub-dummy-not-a-real-key}"
LATEST_TAG="${CACHE_REPO}:latest"
# ---- LIFECYCLE_LLM: real-LLM round-trip mode -------------------------------
# Default "" = the existing behaviour (stub or LLM-less real image).
#
# LIFECYCLE_LLM=minimax — provision the REAL claude-code template image with a
# MiniMax BYOK credential and assert an ACTUAL model reply at the proxy-reach
# step (Step 5), proving a genuine round-trip through the ws-<id>:8000 proxy.
#
# Why MiniMax: it's the cheapest LLM the platform offers (the staging canaries'
# primary auth path post-2026-05-04). The claude-code adapter's `minimax`
# provider (providers.yaml:258) reads MINIMAX_API_KEY at boot and points
# ANTHROPIC_BASE_URL at api.minimax.io/anthropic — MiniMax's OWN API, NOT the
# molecule LLM proxy — so a BYOK MiniMax workspace reaches the model DIRECTLY
# and works on this local dev stack with no CP proxy env.
#
# The registered claude-code slug is the BARE id `MiniMax-M2.7` (derives
# provider=minimax => byok). The colon form `minimax:MiniMax-M2.7` is
# UNREGISTERED on claude-code (internal#718). auth_env for `minimax` accepts
# MINIMAX_API_KEY, which the adapter projects into ANTHROPIC_AUTH_TOKEN.
#
# The real key MUST be supplied via the MINIMAX_API_KEY env var (never echoed
# or written to disk by this script — it only travels in the secret-write curl
# body, exactly like the dummy ANTHROPIC_API_KEY does today). Missing key =>
# loud skip (exit 0), never a red fail (mirrors the serving-e2e pattern).
LIFECYCLE_LLM="${LIFECYCLE_LLM:-}"
if [ "$LIFECYCLE_LLM" = "minimax" ]; then
if [ -z "${MINIMAX_API_KEY:-}" ]; then
echo "SKIP: LIFECYCLE_LLM=minimax but MINIMAX_API_KEY is not set in the env."
echo " Provide a real MiniMax key (the advisory CI job reads it from a"
echo " CI secret) to run the real-LLM round-trip. Skipping (exit 0)."
exit 0
fi
# Real claude-code template build (provisioner resolves+builds via
# RegistryModeLocal — same path as the advisory lifecycle-real job).
LIFECYCLE_PROVISIONER_BUILDS="1"
# Registered BYOK MiniMax slug for claude-code (bare id => provider=minimax).
LIFECYCLE_MODEL="MiniMax-M2.7"
LIFECYCLE_LLM_KEY="MINIMAX_API_KEY"
LIFECYCLE_LLM_VALUE="${MINIMAX_API_KEY}"
# The real template boot is heavier than the stub; give it room (unless the
# caller pinned ONLINE_TIMEOUT explicitly).
[ "$ONLINE_TIMEOUT_EXPLICIT" -eq 0 ] && ONLINE_TIMEOUT=180
fi
# Image the provisioner should actually run. Default: build the stub. Override
# to a real image (a pre-built tag) for the advisory lifecycle-only run.
LIFECYCLE_RUNTIME_IMAGE="${LIFECYCLE_RUNTIME_IMAGE:-__BUILD_STUB__}"
# LIFECYCLE_PROVISIONER_BUILDS=1: do NOT pre-tag any image — let the provisioner
# resolve runtime=claude-code itself via RegistryModeLocal (clone + docker build
# the real template). This exercises the GENUINE local image-resolution path end
# to end. Used by the advisory CI job. Implies the real (LLM-less) runtime, so
# the proxy-reach step asserts reachability, not a canned reply.
LIFECYCLE_PROVISIONER_BUILDS="${LIFECYCLE_PROVISIONER_BUILDS:-0}"
# When NOT running the stub we cannot assert the canned "STUB OK" text (no LLM);
# we assert reachability/registration instead.
USING_STUB=1
[ "$LIFECYCLE_RUNTIME_IMAGE" != "__BUILD_STUB__" ] && USING_STUB=0
[ "$LIFECYCLE_PROVISIONER_BUILDS" = "1" ] && USING_STUB=0
PASS=0
FAIL=0
WSID=""
# May be pre-pinned via env; otherwise resolved from the Gitea HEAD sha in Step 1.
CACHE_TAG="${CACHE_TAG:-}"
# Remember the tags/images we mutated so the trap can restore the cache tag to
# the real image (so a stub run never leaves the real claude-code tag pointing
# at the lightweight stub for the next developer/CI job).
ORIG_CACHE_IMAGE_ID=""
check() {
local desc="$1" expected="$2" actual="$3"
if echo "$actual" | grep -qF -- "$expected"; then
echo "PASS: $desc"; PASS=$((PASS + 1))
else
echo "FAIL: $desc"
echo " expected to contain: $expected"
echo " got: $(echo "$actual" | head -5)"
FAIL=$((FAIL + 1))
fi
}
pass() { echo "PASS: $1"; PASS=$((PASS + 1)); }
fail() { echo "FAIL: $1"; [ -n "${2:-}" ] && echo " $2"; FAIL=$((FAIL + 1)); }
admin_curl() {
local _a=(); e2e_admin_auth_args _a
curl -s "${_a[@]+"${_a[@]}"}" "$@"
}
ws_field() { # ws_field <workspace-json> <field>
echo "$1" | python3 -c "import sys,json
try:
d=json.load(sys.stdin); print(d.get('$2',''))
except Exception:
print('')"
}
container_running() { # container_running <ws-id> -> echoes name if running
local short="${1:0:12}"
docker ps --filter "name=ws-${short}" --filter "status=running" --format '{{.Names}}' 2>/dev/null | head -1
}
cleanup() {
local rc=$?
echo ""
echo "--- cleanup ---"
if [ -n "$WSID" ]; then
# SCOPED teardown — only the workspace this test created. Never a blanket
# sweep (other dev workspaces may be live on this shared daemon).
e2e_delete_workspace "$WSID" "" >/dev/null 2>&1 || true
local short="${WSID:0:12}"
docker rm -f "ws-${short}" >/dev/null 2>&1 || true
# Volume naming is split in the provisioner: configs + claude-sessions use the
# 12-char short id (ConfigVolumeName/ClaudeSessionVolumeName), but the
# /workspace volume uses the FULL UUID (buildWorkspaceMount: ws-<id>-workspace).
# Remove BOTH forms so neither leaks.
docker volume rm -f \
"ws-${short}-configs" "ws-${short}-claude-sessions" \
"ws-${short}-workspace" "ws-${WSID}-workspace" >/dev/null 2>&1 || true
echo "cleaned workspace $WSID + ws-${short} container/volumes"
fi
# Restore the cache tag to whatever it pointed at before we retagged it, so a
# stub run doesn't leave the real claude-code tag aliased to the stub.
if [ -n "$ORIG_CACHE_IMAGE_ID" ]; then
docker tag "$ORIG_CACHE_IMAGE_ID" "$CACHE_TAG" >/dev/null 2>&1 || true
echo "restored $CACHE_TAG -> ${ORIG_CACHE_IMAGE_ID:0:19}"
fi
exit $rc
}
trap cleanup EXIT INT TERM
echo "=== Local Docker-Provisioner Lifecycle E2E ==="
echo "BASE=$BASE runtime=$RUNTIME using_stub=$USING_STUB llm=${LIFECYCLE_LLM:-none} model=$LIFECYCLE_MODEL cache_tag=${CACHE_TAG:-<resolve-in-step-1>}"
echo ""
# Preflight: docker must be reachable and the platform must be up.
if ! docker info >/dev/null 2>&1; then
echo "ERROR: docker daemon not reachable — this test provisions local containers."
exit 2
fi
if ! curl -s -m 5 "$BASE/workspaces" >/dev/null 2>&1; then
echo "ERROR: platform not reachable at $BASE"
exit 2
fi
# ----------------------------------------------------------------------------
# Step 1 — build/tag the image the provisioner will resolve to.
# ----------------------------------------------------------------------------
echo "--- Step 1: resolve runtime image to the chosen target ---"
# Resolve the EXACT cache tag the provisioner will look up: <repo>:<gitea-HEAD-
# sha12>. Discover the sha from the Gitea branch API (same source the provisioner
# uses). An explicit CACHE_TAG env overrides discovery; if Gitea is unreachable
# AND no override is set, bail loudly — silently tagging the wrong sha would let
# the provisioner clone+build the real 2.5GB template (slow / OOM).
if [ -n "${CACHE_TAG:-}" ]; then
echo "Using operator-pinned CACHE_TAG=$CACHE_TAG"
else
CACHE_SHA=$(curl -s -m 10 "$GITEA_BRANCH_API" 2>/dev/null \
| python3 -c "import sys,json
try:
print(json.load(sys.stdin)['commit']['id'][:12])
except Exception:
print('')" 2>/dev/null)
if [ -z "$CACHE_SHA" ]; then
echo "ERROR: could not resolve the template HEAD sha from $GITEA_BRANCH_API"
echo " set CACHE_TAG=$CACHE_REPO:<sha12> explicitly (the tag the provisioner expects)."
exit 2
fi
CACHE_TAG="${CACHE_REPO}:${CACHE_SHA}"
echo "Resolved provisioner cache tag: $CACHE_TAG (gitea HEAD sha)"
fi
# Record what the cache tag points at NOW (if anything) so cleanup can restore.
ORIG_CACHE_IMAGE_ID="$(docker image inspect --format '{{.Id}}' "$CACHE_TAG" 2>/dev/null || true)"
if [ "$LIFECYCLE_PROVISIONER_BUILDS" = "1" ]; then
# No pre-tag — the provisioner resolves + builds the real template itself via
# RegistryModeLocal. Disarm the cache-tag restore (we never touched it).
ORIG_CACHE_IMAGE_ID=""
pass "provisioner-builds mode: leaving image resolution to RegistryModeLocal (real template build)"
elif [ "$USING_STUB" -eq 1 ]; then
echo "Building stub image from $STUB_DIR ..."
if ! docker build --platform=linux/amd64 -t molecule-local/stub-runtime:latest "$STUB_DIR" >/tmp/stub_build.log 2>&1; then
echo "FAIL: stub image build failed"; tail -20 /tmp/stub_build.log; exit 1
fi
pass "stub image built"
TARGET_IMAGE="molecule-local/stub-runtime:latest"
# Point BOTH the sha-pinned cache tag and :latest at the stub so the
# provisioner's RegistryModeLocal cache-check (dockerHasTag) resolves to it
# instead of cloning+building the template.
docker tag "$TARGET_IMAGE" "$CACHE_TAG"
docker tag "$TARGET_IMAGE" "$LATEST_TAG"
pass "tagged $TARGET_IMAGE -> $CACHE_TAG (+ :latest)"
else
TARGET_IMAGE="$LIFECYCLE_RUNTIME_IMAGE"
if ! docker image inspect "$TARGET_IMAGE" >/dev/null 2>&1; then
echo "Real image $TARGET_IMAGE not present locally — pulling ..."
docker pull "$TARGET_IMAGE" >/dev/null 2>&1 || { echo "FAIL: cannot obtain $TARGET_IMAGE"; exit 1; }
fi
pass "using real runtime image $TARGET_IMAGE"
docker tag "$TARGET_IMAGE" "$CACHE_TAG"
docker tag "$TARGET_IMAGE" "$LATEST_TAG"
pass "tagged $TARGET_IMAGE -> $CACHE_TAG (+ :latest)"
fi
echo ""
# ----------------------------------------------------------------------------
# Step 2 — provision a workspace via the real create endpoint.
# ----------------------------------------------------------------------------
echo "--- Step 2: provision workspace (POST /workspaces) ---"
# Provision-time billing on this dev stack (no CP proxy env):
# * A claude-code workspace with a BARE model id derives provider=anthropic-api
# => BYOK, which FAILS CLOSED in prepare unless a usable LLM credential
# exists (MISSING_BYOK_CREDENTIAL).
# * The per-workspace secret-write guard blocks a vendor key while the
# workspace still resolves platform-managed (the MODEL secret isn't stored
# until AFTER payload.secrets are written at create time) — so we can't pass
# the key in the create payload.
# So: create WITHOUT secrets, flip the workspace to byok (explicit override wins
# in BOTH the guard's resolver and the provision resolver), then write the dummy
# vendor key — now permitted. We do NOT rely on Create's first provision to seed
# the config volume (it aborts byok-no-cred BEFORE Start, leaving the volume
# empty). Instead we SEED config.yaml directly into the named config volume and
# then trigger ONE clean provision via /restart. Seeding the volume is also what
# makes the restart-survival assertion meaningful: the restart path reuses the
# volume rather than any template.
CREATE_BODY=$(cat <<JSON
{"name":"Lifecycle E2E Stub","tier":2,"runtime":"$RUNTIME","model":"$LIFECYCLE_MODEL"}
JSON
)
RESP=$(admin_curl -X POST "$BASE/workspaces" -H "Content-Type: application/json" -d "$CREATE_BODY")
WSID=$(ws_field "$RESP" "id")
if [ -z "$WSID" ]; then
fail "create returned no workspace id" "$RESP"
echo "=== Results: $PASS passed, $((FAIL+1)) failed ==="
exit 1
fi
pass "workspace created: $WSID"
SHORT="${WSID:0:12}"
CONFIG_VOL="ws-${SHORT}-configs"
# Mint a workspace bearer for the WorkspaceAuth-gated secret + /restart calls.
WTOKEN=$(e2e_mint_workspace_token "$WSID" || true)
if [ -z "$WTOKEN" ]; then
fail "could not mint workspace token"
echo "=== Results: $PASS passed, $FAIL failed ==="; exit 1
fi
# Flip to byok BEFORE writing the vendor key (explicit override unblocks the
# secret-write guard AND makes the provision resolver pick byok).
BM=$(admin_curl -X PUT "$BASE/admin/workspaces/$WSID/llm-billing-mode" \
-H "Content-Type: application/json" -d '{"mode":"byok"}')
check "billing mode set to byok" "byok" "$BM"
# Write the dummy LLM credential (now allowed on a byok workspace). Inert — the
# stub never calls an LLM; it only needs to exist so byok has a usable cred.
SEC=$(curl -s -X POST "$BASE/workspaces/$WSID/secrets" \
-H "Authorization: Bearer $WTOKEN" -H "Content-Type: application/json" \
-d "{\"key\":\"$LIFECYCLE_LLM_KEY\",\"value\":\"$LIFECYCLE_LLM_VALUE\"}")
echo " secret write: $(echo "$SEC" | head -c 120)"
# In minimax mode also write MODEL_PROVIDER=minimax as a secret env. The
# claude-code adapter's _resolve_model_and_provider_from_env honours
# MODEL_PROVIDER ONLY when it matches a registered provider name (else it's
# treated as a legacy model-id), so a literal "minimax" routes the workspace to
# the `minimax` provider entry — projecting MINIMAX_API_KEY → ANTHROPIC_AUTH_TOKEN
# and setting ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic. workspace-
# server injects MODEL/MOLECULE_MODEL from the picked slug but NO LONGER emits
# MODEL_PROVIDER (applyRuntimeModelEnv, post-2026-05-19), so this secret-provided
# value survives into the container env. Without it a BARE `MiniMax-M2.7` derives
# no provider and falls through to the anthropic-api default (boot banner
# "provider=anthropic-api", base_url unset → AuthenticationError on the first
# call → the "Agent error" this mode exists to catch).
if [ "$LIFECYCLE_LLM" = "minimax" ]; then
SECP=$(curl -s -X POST "$BASE/workspaces/$WSID/secrets" \
-H "Authorization: Bearer $WTOKEN" -H "Content-Type: application/json" \
-d '{"key":"MODEL_PROVIDER","value":"minimax"}')
echo " secret write (MODEL_PROVIDER): $(echo "$SECP" | head -c 120)"
fi
# Seed config.yaml directly into the named config volume so the provision (and
# every later restart) has a config source. Create's byok-no-cred abort never
# wrote it, and this dev stack ships no claude-code template in the platform's
# configsDir for the empty-volume auto-recover to fall back to. The provisioner
# created the volume on its first (aborted) Start attempt; ensure it exists,
# then drop a minimal valid config.yaml in via a throwaway alpine container.
docker volume create "$CONFIG_VOL" >/dev/null 2>&1 || true
# In minimax mode the seeded config MUST carry an explicit `provider: minimax`.
# The claude-code adapter (and the molecule_runtime wheel's
# _derive_provider_from_model) only auto-derive a provider from a `vendor:model`
# or `vendor/model` slug — a BARE `MiniMax-M2.7` derives no provider and falls
# through to the anthropic-api default (boot banner: "provider=anthropic-api",
# ANTHROPIC_BASE_URL unset → the MiniMax key is never projected and the first
# LLM call fails with AuthenticationError). Naming the provider explicitly makes
# the adapter pick the `minimax` registry entry, project
# MINIMAX_API_KEY → ANTHROPIC_AUTH_TOKEN, and set
# ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic — a real round-trip.
LIFECYCLE_PROVIDER_LINE=""
[ "$LIFECYCLE_LLM" = "minimax" ] && LIFECYCLE_PROVIDER_LINE="provider: minimax"
CFG_YAML="name: ${WSID}
description: lifecycle e2e
version: 1.0.0
tier: 2
runtime: ${RUNTIME}
model: ${LIFECYCLE_MODEL}
runtime_config:
model: ${LIFECYCLE_MODEL}
${LIFECYCLE_PROVIDER_LINE}
timeout: 0
"
if docker run --rm -v "${CONFIG_VOL}:/configs" alpine:3 sh -c "cat > /configs/config.yaml" <<EOF >/dev/null 2>&1
${CFG_YAML}
EOF
then pass "seeded config.yaml into $CONFIG_VOL"; else fail "could not seed config.yaml into $CONFIG_VOL"; fi
echo ""
# ----------------------------------------------------------------------------
# Step 3 — provision (via restart) and wait for online; assert container.
# ----------------------------------------------------------------------------
echo "--- Step 3: provision + wait for first online (<=${ONLINE_TIMEOUT}s) ---"
# Kick ONE clean provision now that byok + cred + config.yaml are all in place.
curl -s -X POST "$BASE/workspaces/$WSID/restart" \
-H "Authorization: Bearer $WTOKEN" -H "Content-Type: application/json" -d '{}' >/dev/null
STATUS=""; LAST=""; failed_since=0
for _ in $(seq 1 "$ONLINE_TIMEOUT"); do
WS=$(admin_curl "$BASE/workspaces/$WSID")
STATUS=$(ws_field "$WS" "status")
LAST=$(ws_field "$WS" "last_sample_error")
if [ "$STATUS" = "online" ]; then break; fi
if [ "$STATUS" = "failed" ]; then
failed_since=$((failed_since + 1))
# A restart re-kicks provisioning; give the coalescing pipeline room to
# converge. Only bail if it stays failed for 20s straight.
if [ "$failed_since" -ge 20 ]; then
fail "workspace STUCK in 'failed' during initial provision" "last_sample_error: $LAST"
echo "=== Results: $PASS passed, $FAIL failed ==="; exit 1
fi
else
failed_since=0
fi
sleep 1
done
check "workspace reached online (status=$STATUS)" "online" "$STATUS"
RUN=$(container_running "$WSID")
if [ -n "$RUN" ]; then pass "container running: $RUN"; else fail "no running ws-${WSID:0:12} container" "docker ps shows none"; fi
echo ""
# ----------------------------------------------------------------------------
# Step 4 — RESTART-SURVIVAL (the assertion that would have caught the bug).
# ----------------------------------------------------------------------------
echo "--- Step 4: restart-survival (POST /workspaces/$WSID/restart) ---"
# Re-mint the workspace bearer: every (re)provision rotates the workspace token
# (issueAndInjectToken -> RevokeAllForWorkspace + IssueToken), so the Step-2
# token is now stale. /restart is WorkspaceAuth-gated, so mint a fresh one.
WTOKEN=$(e2e_mint_workspace_token "$WSID" || true)
if [ -z "$WTOKEN" ]; then
fail "could not mint fresh workspace token for restart"
else
RR=$(curl -s -X POST "$BASE/workspaces/$WSID/restart" \
-H "Authorization: Bearer $WTOKEN" -H "Content-Type: application/json" -d '{}')
check "restart accepted (provisioning)" "provisioning" "$RR"
# Poll until online AGAIN. Restart reuses the EXISTING config volume (no
# template/configFiles passed) — so this passes ONLY if the config volume
# survived the stop and still has config.yaml. A regression (volume reaped /
# emptied) surfaces as status=failed with the "config volume is empty" error.
STATUS=""; LAST=""
for _ in $(seq 1 "$ONLINE_TIMEOUT"); do
WS=$(admin_curl "$BASE/workspaces/$WSID")
STATUS=$(ws_field "$WS" "status")
LAST=$(ws_field "$WS" "last_sample_error")
case "$STATUS" in
online) break ;;
failed)
fail "workspace wedged in 'failed' AFTER restart (the config-volume bug class)" "last_sample_error: $LAST"
break ;;
esac
sleep 1
done
check "workspace back online after restart (status=$STATUS)" "online" "$STATUS"
# Explicit negative on the exact bug signature.
if echo "$LAST" | grep -qiF "config volume is empty"; then
fail "restart hit 'config volume is empty' — restart-survival REGRESSION" "$LAST"
else
pass "no 'config volume is empty' error after restart"
fi
RUN=$(container_running "$WSID")
if [ -n "$RUN" ]; then pass "container back after restart: $RUN"; else fail "container missing after restart"; fi
fi
echo ""
# ----------------------------------------------------------------------------
# Step 5 — proxy reach (ws-<id>:8000 Docker-DNS rewrite, end to end).
# ----------------------------------------------------------------------------
echo "--- Step 5: proxy reach (POST /workspaces/$WSID/a2a) ---"
# In minimax mode we send a DETERMINISTIC known-answer prompt and assert the
# model echoes the answer back — proving a real LLM round-trip, not just
# reachability. Otherwise a plain "ping".
if [ "$LIFECYCLE_LLM" = "minimax" ]; then
A2A_PROMPT="Reply with exactly the single word PONG and nothing else."
else
A2A_PROMPT="ping"
fi
A2A_BODY=$(python3 -c "
import json,sys
print(json.dumps({'method':'message/send','params':{'message':{'role':'user','parts':[{'type':'text','text':sys.argv[1]}]}}}))
" "$A2A_PROMPT")
# Real LLM cold-start (first turn boots the claude-code SDK + dials MiniMax) is
# slower than the stub; give the real-LLM call a longer ceiling.
A2A_CEIL="$A2A_TIMEOUT"
[ "$LIFECYCLE_LLM" = "minimax" ] && A2A_CEIL="${A2A_MINIMAX_TIMEOUT:-120}"
A2A=$(curl -s --max-time "$A2A_CEIL" -X POST "$BASE/workspaces/$WSID/a2a" \
-H "Content-Type: application/json" \
-d "$A2A_BODY")
# Extract the assistant text part once (shared by the minimax assertion +
# diagnostics). Tolerates result.parts[].text and result.message.parts[].text.
a2a_text() {
echo "$1" | python3 -c "import sys,json
try:
d=json.load(sys.stdin); r=d.get('result',d)
m=r.get('message',r)
parts=m.get('parts',[]) or r.get('parts',[])
print(' '.join(p.get('text','') for p in parts if isinstance(p,dict)))
except Exception:
print('')"
}
if [ "$LIFECYCLE_LLM" = "minimax" ]; then
# REAL round-trip assertion. The reply must be model-produced text — NOT a
# proxy-level unreachable, NOT an LLM-less "Agent error", NOT an empty
# completion. Then it must contain the known answer (PONG).
check "proxy returned a result envelope" '"result"' "$A2A"
AGENT_TEXT="$(a2a_text "$A2A")"
echo " MiniMax reply: $(echo "$AGENT_TEXT" | head -c 200)"
if echo "$A2A" | grep -qiE 'unreachable|workspace has no URL|restarting'; then
fail "MiniMax runtime not reachable through proxy" "$A2A"
elif echo "$AGENT_TEXT" | grep -qiF "message contained no text content"; then
fail "MiniMax returned an EMPTY completion (no text part) — backend/key issue, not a real round-trip" "$AGENT_TEXT"
elif echo "$AGENT_TEXT" | grep -qiE 'agent error|exception|invalid api key|insufficient_quota|exceeded your current quota'; then
fail "MiniMax round-trip returned an error-shaped reply (no real completion)" "$AGENT_TEXT"
elif echo "$AGENT_TEXT" | tr '[:lower:]' '[:upper:]' | grep -qF "PONG"; then
pass "REAL MiniMax round-trip: model replied with the known answer (PONG)"
else
# Non-error, non-empty, but didn't contain PONG — still a real reply (the
# model answered with its own words). Accept as a real round-trip but note it.
if [ -n "$AGENT_TEXT" ]; then
pass "REAL MiniMax round-trip: non-error model reply (did not contain PONG, but real text)"
else
fail "MiniMax round-trip produced no assertable text" "$A2A"
fi
fi
elif [ "$USING_STUB" -eq 1 ]; then
check "proxy returned a result envelope" '"result"' "$A2A"
check "proxy reached stub (canned reply)" 'STUB OK' "$A2A"
# Parse the envelope so whitespace/key-ordering doesn't break the assertion.
ROLE=$(echo "$A2A" | python3 -c "import sys,json
try:
print(json.load(sys.stdin).get('result',{}).get('role',''))
except Exception:
print('')")
check "reply has agent role" "agent" "$ROLE"
else
# Real LLM-less image: we can't get a canned text, but a reachable runtime
# must answer with EITHER a result OR a structured JSON-RPC error — NOT a
# proxy-level "workspace agent unreachable" / "no URL". Assert reachability.
if echo "$A2A" | grep -qiE 'unreachable|workspace has no URL|restarting'; then
fail "real runtime not reachable through proxy" "$A2A"
else
pass "real runtime reachable through proxy (got a JSON-RPC response)"
echo " response: $(echo "$A2A" | head -c 200)"
fi
fi
echo ""
echo "=== Results: $PASS passed, $FAIL failed ==="
exit "$FAIL"
@@ -1,459 +0,0 @@
#!/usr/bin/env bash
# FUNCTIONAL real-LLM E2E: prove the org concierge (the platform agent) can
# actually DO org-management work — send it a natural-language request and
# assert it REALLY CREATES a workspace via its platform MCP (87 org-admin tools,
# incl. create_workspace), NOT just that a REST API returned 200.
#
# This is the RFC docs/design/rfc-platform-agent.md §11.4 "Reach" check, made
# into a gating CI test:
#
# "chat the platform agent → it list_workspaces then create_workspace via the
# platform MCP and reports back via send_message_to_user."
#
# Unlike test_staging_concierge_e2e.sh (which drives the user_tasks REST+MCP
# primitive directly — a pure DB/handler contract with NO LLM), THIS test drives
# the AGENT: it sends an A2A message/send envelope (the user→concierge chat
# path) and asserts the DETERMINISTIC SIDE EFFECT — a workspace with the exact
# name we asked for now EXISTS in GET /workspaces — which can only happen if the
# concierge's LLM actually invoked the create_workspace platform-MCP tool.
#
# WHAT MUST BE LIVE for this to pass GREEN (else it SKIPs LOUD, never false-red):
# • The org's concierge must be installed as the kind='platform' root AND
# provisioned on the DEDICATED platform-agent image (Dockerfile.platform-agent),
# which ships /opt/molecule-mcp-server — the ONLY image where the platform MCP
# (create_workspace) lights up. On SaaS staging the CP installs + provisions it
# at org-provision time. (See platform_agent.go's SELF-HOST CAVEAT: the ordinary
# claude-code image does NOT ship the platform MCP, so create_workspace is a
# no-op there.) A parallel agent is wiring the platform-agent image into the
# staging provision path; until that lands, this test SKIPs LOUD with a clear
# "concierge not on platform-agent image" message rather than failing red.
# • A working model for the concierge. On SaaS the concierge is platform_managed
# (the CP-exported LLM proxy supplies the model) so no BYOK key is needed for
# the concierge itself.
#
# Env contract (same as test_staging_concierge_e2e.sh / test_staging_full_saas.sh):
# MOLECULE_CP_URL default: https://staging-api.moleculesai.app
# MOLECULE_ADMIN_TOKEN CP admin bearer — Railway staging CP_ADMIN_API_TOKEN
#
# Optional env:
# E2E_PROVISION_TIMEOUT_SECS default 900 (15 min cold tenant EC2 budget)
# E2E_CONCIERGE_ONLINE_SECS default 900 (concierge boot-to-online budget)
# E2E_AGENT_ACT_SECS default 420 (LLM think+tool-call budget after we
# send the message — generous for nondeterminism)
# E2E_KEEP_ORG 1 → skip teardown (debugging only)
# E2E_RUN_ID slug suffix; CI: ${GITHUB_RUN_ID}-${RUN_ATTEMPT}
# E2E_AWS_LEAK_CHECK auto (default) | required | off
# E2E_AWS_TERMINATE_LEAKS 1 → terminate slug-tagged leaked EC2 on exit
# E2E_REQUIRE_LIVE 1 → a SKIP for "no concierge on platform image"
# becomes a hard FAIL (CI sets this so a silently-
# missing platform-agent image can't false-green
# the gate). Default 0 (local: skip-loud).
#
# Exit codes:
# 0 happy path (concierge created the workspace) OR honest skip-loud
# 1 generic / assertion failure (agent didn't act, or tool failed)
# 2 missing required env
# 3 provisioning timed out
# 4 teardown left orphan resources
# 5 E2E_REQUIRE_LIVE=1 but the concierge could not be exercised (no
# platform-agent image / never came online) — false-green guard
set -euo pipefail
# shellcheck disable=SC1091
# shellcheck source=_lib.sh
source "$(dirname "$0")/_lib.sh"
# AWS-leak-check lib — same teardown leak assertion the full-SaaS harness uses.
# shellcheck disable=SC1091
# shellcheck source=lib/aws_leak_check.sh
source "$(dirname "$0")/lib/aws_leak_check.sh"
# Real-completion error-as-text scanner — used to detect the concierge
# surfacing its tool/LLM error AS a reply ("Agent error …") so a broken agent
# can't read as "asked but politely declined".
# shellcheck disable=SC1091
# shellcheck source=lib/completion_assert.sh
source "$(dirname "$0")/lib/completion_assert.sh"
CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
CONCIERGE_ONLINE_SECS="${E2E_CONCIERGE_ONLINE_SECS:-900}"
AGENT_ACT_SECS="${E2E_AGENT_ACT_SECS:-420}"
REQUIRE_LIVE="${E2E_REQUIRE_LIVE:-0}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
# Fixed e2e- prefix so sweep-stale-e2e-orgs.yml + lint_cleanup_traps.sh reap any
# orphan org. (The lint requires a quoted SLUG=... with a literal e2e-/rt-e2e-
# head.)
SLUG="e2e-cncrg-mk-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
SLUG=$(echo "$SLUG" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-' | head -c 32)
# The workspace name we will ask the concierge to create. The RUN_ID makes it
# unique per run so a poll for it can never collide with a sibling run's name.
WORKER_NAME="e2e-cncrg-worker-${RUN_ID_SUFFIX}"
WORKER_NAME=$(echo "$WORKER_NAME" | tr -cd 'a-zA-Z0-9-' | head -c 48)
# Exported so the find_worker_by_name python subshell (run in a pipe) reads it
# via os.environ — a bare shell var would not survive into the subprocess env.
export WORKER_NAME
log() { echo "[$(date +%H:%M:%S)] $*"; }
fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
# skip_loud <reason>: honest skip when the concierge can't be exercised. In CI
# (E2E_REQUIRE_LIVE=1) this is a HARD FAIL (exit 5) so a missing platform-agent
# image can't false-green the gate; locally it skips 0.
skip_loud() {
echo "[$(date +%H:%M:%S)] ⏭️ SKIP: $*" >&2
if [ "$REQUIRE_LIVE" = "1" ]; then
echo "[$(date +%H:%M:%S)] ❌ E2E_REQUIRE_LIVE=1 — a skip is a false-green guard breach here. Failing." >&2
exit 5
fi
exit 0
}
CURL_COMMON=(-sS --max-time 30)
TMPDIR_E2E=$(mktemp -d -t cncrg-mk-XXXXXX)
# ─── teardown trap (worker delete + org delete + leak check) ─────────────────
CLEANUP_DONE=0
WORKER_ID="" # set once the concierge creates it (for targeted delete)
TENANT_URL="" # set after provisioning
TENANT_TOKEN=""
ORG_ID=""
cleanup() {
local entry_rc=$?
[ "$CLEANUP_DONE" = "1" ] && return 0
CLEANUP_DONE=1
rm -rf "$TMPDIR_E2E" 2>/dev/null || true
# Best-effort targeted delete of the worker the concierge created, so the org
# delete below isn't the only thing reaping it (defensive — org delete cascades
# anyway). Only attempted if we resolved its id and have tenant creds.
if [ -n "$WORKER_ID" ] && [ -n "$TENANT_URL" ] && [ -n "$TENANT_TOKEN" ]; then
curl "${CURL_COMMON[@]}" -X DELETE "$TENANT_URL/workspaces/$WORKER_ID?confirm=true" \
-H "Authorization: Bearer $TENANT_TOKEN" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Origin: $TENANT_URL" \
-H "X-Confirm-Name: $WORKER_NAME" >/dev/null 2>&1 || true
fi
if [ "${E2E_KEEP_ORG:-0}" = "1" ]; then
log "E2E_KEEP_ORG=1 — skipping teardown. Manually delete $SLUG when done."
return 0
fi
log "🧹 Tearing down org $SLUG..."
if curl "${CURL_COMMON[@]}" --max-time 120 -X DELETE "$CP_URL/cp/admin/tenants/$SLUG" \
-H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
-d "{\"confirm\":\"$SLUG\"}" >/dev/null 2>&1; then
ok "Teardown request accepted"
else
log "Teardown returned non-2xx (may already be gone)"
fi
# Eventual-consistency wait: org row gone / purged.
local leak_count=1 elapsed=0
while [ "$elapsed" -lt 60 ]; do
leak_count=$(curl "${CURL_COMMON[@]}" "$CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for o in d.get('orgs', []) if o.get('slug')=='$SLUG' and o.get('status') != 'purged'))" \
2>/dev/null || echo 1)
[ "$leak_count" = "0" ] && break
sleep 5; elapsed=$((elapsed + 5))
done
if [ "$leak_count" != "0" ]; then
echo "⚠️ LEAK: org $SLUG still present post-teardown after ${elapsed}s (count=$leak_count)" >&2
exit 4
fi
local aws_leak_rc=0
e2e_verify_no_ec2_leaks_for_slug "$SLUG" || aws_leak_rc=$?
if [ "$aws_leak_rc" != "0" ]; then
case "$aws_leak_rc" in 2) exit 2 ;; *) exit 4 ;; esac
fi
ok "Teardown clean — no orphan org or EC2 resources for $SLUG (${elapsed}s)"
case "$entry_rc" in 0|1|2|3|4|5) ;; *) exit 1 ;; esac
}
trap cleanup EXIT INT TERM
admin_call() { # <method> <path> [curl args…]
local method="$1" path="$2"; shift 2
curl "${CURL_COMMON[@]}" -X "$method" "$CP_URL$path" \
-H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" "$@"
}
# tenant_call: Authorization (tenant admin token — also authenticates the
# concierge, which holds no per-workspace token: validateDiscoveryCaller's admin
# fallback) + X-Molecule-Org-Id (TenantGuard 404s without it) + Origin (edge WAF).
tenant_call() { # <method> <path> [curl args…]
local method="$1" path="$2"; shift 2
curl "${CURL_COMMON[@]}" -X "$method" "$TENANT_URL$path" \
-H "Authorization: Bearer $TENANT_TOKEN" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Origin: $TENANT_URL" "$@"
}
# list_workspaces_json: echo the raw GET /workspaces JSON array (tenant-scoped).
list_workspaces_json() { tenant_call GET /workspaces; }
# find_platform_root: echo the id of the kind='platform' parent_id-null root, or
# "" if none. This IS the concierge — the org's front-door agent.
find_platform_root() {
list_workspaces_json | python3 -c "
import sys, json
try: rows = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
for w in rows if isinstance(rows, list) else []:
if w.get('kind') == 'platform' and not w.get('parent_id'):
print(w.get('id','')); break
else:
print('')"
}
# workspace_field <id> <field>: echo a single field off GET /workspaces/:id.
workspace_field() { # <id> <field>
tenant_call GET "/workspaces/$1" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
print(d.get('$2','') if isinstance(d, dict) else '')"
}
# find_worker_by_name: echo the id of a workspace whose name == WORKER_NAME, or
# "" if not present. THIS is the deterministic side effect we assert on.
find_worker_by_name() {
list_workspaces_json | python3 -c "
import sys, json, os
want = os.environ['WORKER_NAME']
try: rows = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
for w in rows if isinstance(rows, list) else []:
if w.get('name') == want:
print(w.get('id','')); break
else:
print('')"
}
# ─── 0. Preflight ────────────────────────────────────────────────────────────
log "═══ Staging concierge CREATES-A-WORKSPACE (real-LLM) E2E ═══ CP=$CP_URL Slug=$SLUG"
log " worker the concierge will be asked to create: name=$WORKER_NAME"
curl "${CURL_COMMON[@]}" "$CP_URL/health" >/dev/null || fail "CP health check failed"
ok "CP reachable"
# ─── 1. Create org (CP installs + provisions the concierge as platform root) ──
log "1/6 Creating org $SLUG..."
CREATE_RESP=$(admin_call POST /cp/admin/orgs \
-d "{\"slug\":\"$SLUG\",\"name\":\"E2E $SLUG\",\"owner_user_id\":\"e2e-runner:$SLUG\"}")
echo "$CREATE_RESP" | python3 -m json.tool >/dev/null || fail "Org create non-JSON: $CREATE_RESP"
ORG_ID=$(echo "$CREATE_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))")
[ -z "$ORG_ID" ] && fail "Org create response missing 'id': $CREATE_RESP"
ok "Org created (id=$ORG_ID)"
# ─── 2. Wait for tenant provisioning ─────────────────────────────────────────
log "2/6 Waiting for tenant provisioning (up to ${PROVISION_TIMEOUT_SECS}s)..."
DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
LAST_STATUS=""
while true; do
[ "$(date +%s)" -gt "$DEADLINE" ] && exit 3
LIST_JSON=$(admin_call GET /cp/admin/orgs 2>/dev/null || echo '{"orgs":[]}')
STATUS=$(echo "$LIST_JSON" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for o in d.get('orgs', []):
if o.get('slug') == '$SLUG':
print(o.get('instance_status', '')); sys.exit(0)
print('')" 2>/dev/null || echo "")
if [ "$STATUS" != "$LAST_STATUS" ]; then log " status → $STATUS"; LAST_STATUS="$STATUS"; fi
case "$STATUS" in
running) break ;;
failed) fail "Tenant provisioning failed for $SLUG" ;;
*) sleep 15 ;;
esac
done
ok "Tenant provisioning complete"
# Derive tenant domain from CP hostname (prod vs staging).
CP_HOST=$(echo "$CP_URL" | sed -E 's#^https?://##; s#/.*$##')
case "$CP_HOST" in
api.*) DERIVED_DOMAIN="${CP_HOST#api.}" ;;
staging-api.*) DERIVED_DOMAIN="staging.${CP_HOST#staging-api.}" ;;
*) DERIVED_DOMAIN="$CP_HOST" ;;
esac
TENANT_DOMAIN="${MOLECULE_TENANT_DOMAIN:-$DERIVED_DOMAIN}"
TENANT_URL="https://$SLUG.$TENANT_DOMAIN"
log " TENANT_URL=$TENANT_URL"
# ─── 3. Per-tenant admin token + TLS readiness ───────────────────────────────
log "3/6 Fetching per-tenant admin token..."
TENANT_TOKEN=$(admin_call GET "/cp/admin/orgs/$SLUG/admin-token" \
| python3 -c "import json,sys; print(json.load(sys.stdin).get('admin_token',''))" 2>/dev/null || echo "")
[ -z "$TENANT_TOKEN" ] && fail "Could not retrieve per-tenant admin token for $SLUG"
ok "Tenant admin token retrieved (len=${#TENANT_TOKEN})"
log " Waiting for tenant TLS / DNS propagation..."
TLS_DEADLINE=$(( $(date +%s) + 15 * 60 ))
while true; do
curl -sSfk --max-time 5 "$TENANT_URL/health" >/dev/null 2>&1 && break
[ "$(date +%s)" -gt "$TLS_DEADLINE" ] && fail "Tenant /health never 2xx within 15m"
sleep 5
done
ok "Tenant reachable at $TENANT_URL"
# ─── 4. Discover the concierge (kind='platform' root) + ensure it can act ─────
log "4/6 Discovering the concierge (kind='platform' root)..."
# The CP installs the platform agent at org-provision; allow a short settle for
# the row + re-parent backfill to land.
CONCIERGE_ID=""
DISC_DEADLINE=$(( $(date +%s) + 180 ))
while true; do
CONCIERGE_ID=$(find_platform_root)
[ -n "$CONCIERGE_ID" ] && break
[ "$(date +%s)" -gt "$DISC_DEADLINE" ] && break
sleep 10
done
if [ -z "$CONCIERGE_ID" ]; then
skip_loud "no kind='platform' concierge root in this org — the platform agent was not installed at provision. \
This needs the CP platform-agent install (RFC §3) live on staging. Until then there is no agent to drive."
fi
ok "Concierge (platform root) = $CONCIERGE_ID"
# The concierge must be ONLINE + routable for its LLM to receive the A2A message
# and reach the platform MCP. Bounded poll — generous because a cold concierge
# boots its container + loads the platform MCP server before it is reachable.
log " Waiting for the concierge to be online (up to ${CONCIERGE_ONLINE_SECS}s)..."
ONLINE_DEADLINE=$(( $(date +%s) + CONCIERGE_ONLINE_SECS ))
C_STATUS=""; C_URL=""; LAST_C_STATUS=""
while true; do
C_STATUS=$(workspace_field "$CONCIERGE_ID" status)
C_URL=$(workspace_field "$CONCIERGE_ID" url)
if [ "$C_STATUS" != "$LAST_C_STATUS" ]; then log " concierge → ${C_STATUS:-<none>}"; LAST_C_STATUS="$C_STATUS"; fi
if [ "$C_STATUS" = "online" ] && [ -n "$C_URL" ]; then break; fi
if [ "$(date +%s)" -gt "$ONLINE_DEADLINE" ]; then
LAST_ERR=$(workspace_field "$CONCIERGE_ID" last_sample_error)
skip_loud "concierge $CONCIERGE_ID never reached online+routable within ${CONCIERGE_ONLINE_SECS}s \
(last status='${C_STATUS}', url='${C_URL}', err='${LAST_ERR}'). On a tenant where the concierge is NOT \
provisioned on the platform-agent image (no /opt/molecule-mcp-server, no model), it cannot run the \
create_workspace tool — that is the parallel-agent image work this gate depends on."
fi
sleep 10
done
ok "Concierge online + routable (url assigned)"
# Pre-state: the worker MUST NOT exist yet (so its later appearance is causally
# the concierge's doing, not a pre-existing row).
PRE_EXISTING=$(find_worker_by_name)
[ -n "$PRE_EXISTING" ] && fail "worker '$WORKER_NAME' already exists pre-test ($PRE_EXISTING) — name collision, cannot prove causality"
ok "Pre-state confirmed: '$WORKER_NAME' does not exist yet"
# ─── 5. Drive the AGENT: A2A message/send → it must create the workspace ──────
log "5/6 Sending the concierge a natural-language create-workspace request..."
# Imperative + explicit to defuse LLM nondeterminism: name the tool, the exact
# workspace NAME and ROLE, and tell it not to ask a clarifying question. The
# message/send envelope is the canvas user→agent chat path (handlers/a2a_proxy.go),
# identical to the shape test_a2a_e2e.sh / test_staging_full_saas.sh use.
AGENT_PROMPT="Please create a new workspace in this org right now using your platform tools. \
Use the create_workspace tool with name exactly \"${WORKER_NAME}\" and role \"engineer\". \
Do not ask me any clarifying questions — the name and role are final. \
After the tool succeeds, reply with the new workspace id."
A2A_PAYLOAD=$(WORKER_NAME="$WORKER_NAME" AGENT_PROMPT="$AGENT_PROMPT" python3 -c "
import json, os, uuid
print(json.dumps({
'jsonrpc': '2.0',
'method': 'message/send',
'id': 'e2e-cncrg-mk-1',
'params': {
'message': {
'role': 'user',
'messageId': f'e2e-{uuid.uuid4().hex[:8]}',
'parts': [{'kind': 'text', 'text': os.environ['AGENT_PROMPT']}],
}
}
}))")
# Cold concierge: first turn opens TLS to the LLM, loads the platform MCP, runs
# a tool call. Give it a wide per-call window AND retry on edge cold-start 5xx.
A2A_TMP="$TMPDIR_E2E/a2a_out"
AGENT_TEXT=""
A2A_OK=0
for A2A_ATTEMPT in $(seq 1 8); do
: >"$A2A_TMP"
set +e
A2A_CODE=$(tenant_call POST "/workspaces/$CONCIERGE_ID/a2a" \
--max-time "$AGENT_ACT_SECS" \
-H "Content-Type: application/json" \
-d "$A2A_PAYLOAD" \
-o "$A2A_TMP" -w '%{http_code}' 2>/dev/null)
A2A_RC=$?
set -e
A2A_CODE=${A2A_CODE:-000}
A2A_RESP=$(cat "$A2A_TMP" 2>/dev/null || echo "")
if [ "$A2A_RC" = "0" ] && [ "$A2A_CODE" -ge 200 ] && [ "$A2A_CODE" -lt 300 ]; then
A2A_OK=1
break
fi
if echo "$A2A_CODE" | grep -Eq '^(502|503|504)$'; then
log " A2A cold-start attempt $A2A_ATTEMPT/8 returned $A2A_CODE — retrying"
[ "$A2A_ATTEMPT" -lt 8 ] && { sleep 15; continue; }
fi
break
done
if [ "$A2A_OK" != "1" ]; then
# A non-2xx A2A POST is an INFRA/transport failure (agent unreachable), not an
# "agent declined" — distinct from the assertion below.
fail "A2A POST /workspaces/$CONCIERGE_ID/a2a failed (curl_rc=$A2A_RC, http=$A2A_CODE) after $A2A_ATTEMPT attempt(s): $(echo "$A2A_RESP" | head -c 400)"
fi
AGENT_TEXT=$(echo "$A2A_RESP" | python3 -c "
import sys, json
try: d = json.load(sys.stdin)
except Exception: print(''); sys.exit(0)
parts = (d.get('result') or {}).get('parts', []) if isinstance(d, dict) else []
print(parts[0].get('text','') if parts else '')" 2>/dev/null || echo "")
log " concierge replied (first 300 chars): $(echo "$AGENT_TEXT" | head -c 300)"
# ─── 6. ASSERT the deterministic side effect: the worker now EXISTS ───────────
log "6/6 Polling GET /workspaces for the worker the concierge was asked to create..."
# The create is the side effect; the LLM may take a few turns / a moment to flush
# the tool call. Poll the NAME (deterministic) — tolerant of when exactly the row
# lands, intolerant of it never landing.
ACT_DEADLINE=$(( $(date +%s) + AGENT_ACT_SECS ))
while true; do
WORKER_ID=$(find_worker_by_name)
[ -n "$WORKER_ID" ] && break
if [ "$(date +%s)" -gt "$ACT_DEADLINE" ]; then
# The agent answered but the workspace never appeared → the LLM did NOT call
# create_workspace (or the tool failed). Distinguish the two for the operator.
if hit=$(a2a_completion_error_marker "$AGENT_TEXT"); then
fail "TOOL FAILED: concierge surfaced an error-as-text reply (matched '$hit') and no workspace '$WORKER_NAME' was created. \
The platform MCP create_workspace tool errored. Reply: $(echo "$AGENT_TEXT" | head -c 400)"
fi
fail "AGENT DID NOT ACT: concierge replied but no workspace named '$WORKER_NAME' exists in GET /workspaces after ${AGENT_ACT_SECS}s. \
The concierge's LLM did not invoke the create_workspace platform-MCP tool. \
Reply: $(echo "$AGENT_TEXT" | head -c 400)"
fi
sleep 8
done
ok "DETERMINISTIC SIDE EFFECT CONFIRMED: workspace '$WORKER_NAME' now EXISTS (id=$WORKER_ID)"
# Confirm it is a real workspace row (kind='workspace') parented under the org —
# i.e. a genuine create, not a no-op echo. parent_id may be the concierge (the
# concierge creates children under itself by convention) or another node; we
# assert only that it's a non-platform workspace, which is what create_workspace
# yields.
WORKER_KIND=$(workspace_field "$WORKER_ID" kind)
if [ -n "$WORKER_KIND" ] && [ "$WORKER_KIND" != "workspace" ]; then
fail "created node '$WORKER_NAME' has kind='$WORKER_KIND' (want 'workspace') — not a real worker create"
fi
ok "Created node is a real kind='workspace' row"
# Soft confirmation: the concierge SHOULD report back. Non-fatal (the side
# effect above is the hard proof) — but a reply that is itself an error is a
# yellow flag worth logging even though the row landed.
if [ -n "$AGENT_TEXT" ]; then
if a2a_completion_error_marker "$AGENT_TEXT" >/dev/null; then
log " ⚠️ concierge reply looks like an error-as-text even though the workspace was created — investigate the tool result surfacing."
else
ok "Concierge replied confirming the action (non-error)"
fi
else
log " (concierge returned no text part — the row landing is the proof; reply is optional)"
fi
ok "═══ STAGING CONCIERGE CREATES-A-WORKSPACE E2E PASSED ═══"
log "Proven: a natural-language A2A request → the concierge's LLM invoked create_workspace via the platform MCP → real org mutation (workspace '$WORKER_NAME' id=$WORKER_ID). Teardown runs via EXIT trap."
-376
View File
@@ -1,376 +0,0 @@
#!/usr/bin/env bash
# Real-staging E2E for the concierge user_tasks primitive (Feature 3 of the
# concierge / platform-agent set). Exercises the FULL agent→user "ask" contract
# both surfaces expose, END-TO-END against a real EC2-backed staging tenant:
#
# REST (per-workspace, tenant-admin-token authenticated):
# POST /workspaces/:id/user-tasks create an ask
# GET /workspaces/:id/user-tasks this workspace's asks
# GET /user-tasks/pending (AdminAuth) org-wide pending asks
# PATCH /workspaces/:id/user-tasks/:taskId edit (scoped by ws id)
# DELETE /workspaces/:id/user-tasks/:taskId remove (scoped by ws id)
# POST /workspaces/:id/user-tasks/:taskId/resolve done|dismissed
#
# MCP a2a-bridge tools (POST /workspaces/:id/mcp, JSON-RPC tools/call):
# request_user_action(title, detail?) list_user_tasks()
# update_user_task(user_task_id, …) delete_user_task(user_task_id)
#
# Cross-workspace authz: workspace B cannot PATCH/DELETE workspace A's task
# (the user_tasks handler scopes every mutation by the URL :id, so a B-path
# call against an A-owned task 404s — the same scoping the local
# test_user_tasks_e2e.sh pins, here proven over the real tenant ws-server).
#
# Why a real-staging sibling to the LOCAL test_user_tasks_e2e.sh: the local one
# runs against a dev workspace-server with external/in-memory workspaces. This
# one provisions a REAL throwaway org + tenant (same CP-admin scaffolding as
# test_staging_full_saas.sh) and drives the user_tasks surfaces through the live
# tenant auth chain (TenantGuard + WorkspaceAuth + Cloudflare edge) — the exact
# path a canvas concierge agent hits in production. It REUSES the staging
# harness's env contract, org-provision/teardown shape, _lib.sh helpers, and the
# AWS-leak-check lib, so the org lifecycle scaffolding is shared, not duplicated.
#
# NOTE: user_tasks is a pure DB/handler primitive — no LLM container is needed.
# We DO NOT wait for any workspace to boot online (no MINIMAX/ANTHROPIC key
# required), which keeps this test fast and decoupled from EC2 cold-boot flake.
# Workspaces are created in 'external' mode so the tenant ws-server registers
# the row without provisioning an EC2 (no leak beyond the org teardown).
#
# Required env (same contract as test_staging_full_saas.sh):
# MOLECULE_CP_URL default: https://staging-api.moleculesai.app
# MOLECULE_ADMIN_TOKEN CP admin bearer — Railway staging CP_ADMIN_API_TOKEN
#
# Optional env:
# E2E_PROVISION_TIMEOUT_SECS default 900 (15 min cold tenant EC2 budget)
# E2E_KEEP_ORG 1 → skip teardown (debugging only)
# E2E_RUN_ID slug suffix; CI: ${GITHUB_RUN_ID}-${RUN_ATTEMPT}
# E2E_AWS_LEAK_CHECK auto (default) | required | off
# E2E_AWS_TERMINATE_LEAKS 1 → terminate slug-tagged leaked EC2 on exit
#
# Exit codes:
# 0 happy path
# 1 generic / assertion failure
# 2 missing required env
# 3 provisioning timed out
# 4 teardown left orphan resources
set -euo pipefail
# _lib.sh gives us sanitize/admin-auth conventions shared across the suite.
# shellcheck disable=SC1091
# shellcheck source=_lib.sh
source "$(dirname "$0")/_lib.sh"
# AWS-leak-check lib — same teardown leak assertion the full-SaaS harness uses.
# shellcheck disable=SC1091
# shellcheck source=lib/aws_leak_check.sh
source "$(dirname "$0")/lib/aws_leak_check.sh"
CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
# Fixed e2e- prefix so sweep-stale-e2e-orgs.yml + lint_cleanup_traps.sh reap any
# orphan. (The lint requires a quoted SLUG=... with a literal e2e-/rt-e2e- head.)
SLUG="e2e-cncrg-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
SLUG=$(echo "$SLUG" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-' | head -c 32)
log() { echo "[$(date +%H:%M:%S)] $*"; }
fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
PASS=0
FAIL=0
check() { # <desc> <expected-substr> <actual>
if echo "$3" | grep -qF -- "$2"; then echo " PASS: $1"; PASS=$((PASS + 1));
else echo " FAIL: $1"; echo " expected to contain: $2"; echo " got: $(echo "$3" | head -c 300)"; FAIL=$((FAIL + 1)); fi
}
check_not() { # <desc> <unexpected-substr> <actual>
if echo "$3" | grep -qF -- "$2"; then echo " FAIL: $1 (should NOT contain: $2)"; FAIL=$((FAIL + 1));
else echo " PASS: $1"; PASS=$((PASS + 1)); fi
}
check_code() { # <desc> <expected> <actual>
if [ "$3" = "$2" ]; then echo " PASS: $1 (HTTP $3)"; PASS=$((PASS + 1));
else echo " FAIL: $1 (expected HTTP $2, got HTTP $3)"; FAIL=$((FAIL + 1)); fi
}
CURL_COMMON=(-sS --max-time 30)
TMPDIR_E2E=$(mktemp -d -t cncrg-staging-XXXXXX)
# ─── teardown trap (org delete + leak check) ─────────────────────────────────
CLEANUP_DONE=0
cleanup_org() {
local entry_rc=$?
[ "$CLEANUP_DONE" = "1" ] && return 0
CLEANUP_DONE=1
rm -rf "$TMPDIR_E2E" 2>/dev/null || true
if [ "${E2E_KEEP_ORG:-0}" = "1" ]; then
log "E2E_KEEP_ORG=1 — skipping teardown. Manually delete $SLUG when done."
return 0
fi
log "🧹 Tearing down org $SLUG..."
if curl "${CURL_COMMON[@]}" --max-time 120 -X DELETE "$CP_URL/cp/admin/tenants/$SLUG" \
-H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
-d "{\"confirm\":\"$SLUG\"}" >/dev/null 2>&1; then
ok "Teardown request accepted"
else
log "Teardown returned non-2xx (may already be gone)"
fi
# Eventual-consistency wait: org row gone / purged.
local leak_count=1 elapsed=0
while [ "$elapsed" -lt 60 ]; do
leak_count=$(curl "${CURL_COMMON[@]}" "$CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for o in d.get('orgs', []) if o.get('slug')=='$SLUG' and o.get('status') != 'purged'))" \
2>/dev/null || echo 1)
[ "$leak_count" = "0" ] && break
sleep 5; elapsed=$((elapsed + 5))
done
if [ "$leak_count" != "0" ]; then
echo "⚠️ LEAK: org $SLUG still present post-teardown after ${elapsed}s (count=$leak_count)" >&2
exit 4
fi
local aws_leak_rc=0
e2e_verify_no_ec2_leaks_for_slug "$SLUG" || aws_leak_rc=$?
if [ "$aws_leak_rc" != "0" ]; then
case "$aws_leak_rc" in 2) exit 2 ;; *) exit 4 ;; esac
fi
ok "Teardown clean — no orphan org or EC2 resources for $SLUG (${elapsed}s)"
case "$entry_rc" in 0|1|2|3|4) ;; *) exit 1 ;; esac
}
trap cleanup_org EXIT INT TERM
admin_call() { # <method> <path> [curl args…]
local method="$1" path="$2"; shift 2
curl "${CURL_COMMON[@]}" -X "$method" "$CP_URL$path" \
-H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" "$@"
}
# ─── 0. Preflight ────────────────────────────────────────────────────────────
log "═══ Staging concierge user_tasks E2E ═══ CP=$CP_URL Slug=$SLUG"
curl "${CURL_COMMON[@]}" "$CP_URL/health" >/dev/null || fail "CP health check failed"
ok "CP reachable"
# ─── 1. Create org ───────────────────────────────────────────────────────────
log "1/6 Creating org $SLUG..."
CREATE_RESP=$(admin_call POST /cp/admin/orgs \
-d "{\"slug\":\"$SLUG\",\"name\":\"E2E $SLUG\",\"owner_user_id\":\"e2e-runner:$SLUG\"}")
echo "$CREATE_RESP" | python3 -m json.tool >/dev/null || fail "Org create non-JSON: $CREATE_RESP"
ORG_ID=$(echo "$CREATE_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))")
[ -z "$ORG_ID" ] && fail "Org create response missing 'id': $CREATE_RESP"
ok "Org created (id=$ORG_ID)"
# ─── 2. Wait for tenant provisioning ─────────────────────────────────────────
log "2/6 Waiting for tenant provisioning (up to ${PROVISION_TIMEOUT_SECS}s)..."
DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
LAST_STATUS=""
while true; do
[ "$(date +%s)" -gt "$DEADLINE" ] && exit 3
LIST_JSON=$(admin_call GET /cp/admin/orgs 2>/dev/null || echo '{"orgs":[]}')
STATUS=$(echo "$LIST_JSON" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for o in d.get('orgs', []):
if o.get('slug') == '$SLUG':
print(o.get('instance_status', '')); sys.exit(0)
print('')" 2>/dev/null || echo "")
if [ "$STATUS" != "$LAST_STATUS" ]; then log " status → $STATUS"; LAST_STATUS="$STATUS"; fi
case "$STATUS" in
running) break ;;
failed) fail "Tenant provisioning failed for $SLUG" ;;
*) sleep 15 ;;
esac
done
ok "Tenant provisioning complete"
# Derive tenant domain from CP hostname (prod vs staging).
CP_HOST=$(echo "$CP_URL" | sed -E 's#^https?://##; s#/.*$##')
case "$CP_HOST" in
api.*) DERIVED_DOMAIN="${CP_HOST#api.}" ;;
staging-api.*) DERIVED_DOMAIN="staging.${CP_HOST#staging-api.}" ;;
*) DERIVED_DOMAIN="$CP_HOST" ;;
esac
TENANT_DOMAIN="${MOLECULE_TENANT_DOMAIN:-$DERIVED_DOMAIN}"
TENANT_URL="https://$SLUG.$TENANT_DOMAIN"
log " TENANT_URL=$TENANT_URL"
# ─── 3. Per-tenant admin token + TLS readiness ───────────────────────────────
log "3/6 Fetching per-tenant admin token..."
TENANT_TOKEN=$(admin_call GET "/cp/admin/orgs/$SLUG/admin-token" \
| python3 -c "import json,sys; print(json.load(sys.stdin).get('admin_token',''))" 2>/dev/null || echo "")
[ -z "$TENANT_TOKEN" ] && fail "Could not retrieve per-tenant admin token for $SLUG"
ok "Tenant admin token retrieved (len=${#TENANT_TOKEN})"
log " Waiting for tenant TLS / DNS propagation..."
TLS_DEADLINE=$(( $(date +%s) + 15 * 60 ))
while true; do
curl -sSfk --max-time 5 "$TENANT_URL/health" >/dev/null 2>&1 && break
[ "$(date +%s)" -gt "$TLS_DEADLINE" ] && fail "Tenant /health never 2xx within 15m"
sleep 5
done
ok "Tenant reachable at $TENANT_URL"
# tenant_call: Authorization (tenant admin token, valid for every workspace) +
# X-Molecule-Org-Id (TenantGuard 404s without it) + Origin (Cloudflare edge).
tenant_call() { # <method> <path> [curl args…]
local method="$1" path="$2"; shift 2
curl "${CURL_COMMON[@]}" -X "$method" "$TENANT_URL$path" \
-H "Authorization: Bearer $TENANT_TOKEN" \
-H "X-Molecule-Org-Id: $ORG_ID" \
-H "Origin: $TENANT_URL" "$@"
}
# Create an external workspace (row only — no EC2). Echoes its id.
create_external_ws() { # <name>
local name="$1" resp
resp=$(tenant_call POST /workspaces -H "Content-Type: application/json" \
-d "{\"name\":\"$name\",\"tier\":1,\"runtime\":\"external\",\"external\":true}")
echo "$resp" | python3 -c "import sys,re
b=sys.stdin.read()
m=re.search(r'\"id\"\s*:\s*\"([^\"]+)\"', b)
print(m.group(1) if m else '')"
}
# MCP JSON-RPC tools/call against /workspaces/:id/mcp. Echoes the result text
# (result.content[].text). Persists HTTP code to a file (runs in $()).
MCP_CODE_FILE="$TMPDIR_E2E/mcp_code"
mcp_call() { # <wsid> <tool> <args-json>
local wsid="$1" tool="$2" args="$3" out code
out="$TMPDIR_E2E/mcp_out"
set +e
code=$(tenant_call POST "/workspaces/$wsid/mcp" -H "Content-Type: application/json" \
-d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"$tool\",\"arguments\":$args}}" \
-o "$out" -w "%{http_code}" 2>/dev/null)
set -e
printf '%s' "$code" > "$MCP_CODE_FILE"
python3 -c "
import sys, json
try: d = json.load(open('$out'))
except Exception: print(''); sys.exit(0)
res = d.get('result') if isinstance(d, dict) else None
print(''.join(c.get('text','') for c in res.get('content', [])) if isinstance(res, dict) else '')"
}
mcp_http_code() { cat "$MCP_CODE_FILE" 2>/dev/null || echo ''; }
# ─── 4. Provision two workspaces (A raises asks, B probes cross-ws authz) ─────
log "4/6 Creating two tenant workspaces (external rows — no EC2)..."
WS_A=$(create_external_ws "Concierge-UT-A-$$")
[ -n "$WS_A" ] || fail "ws-A create returned no id"
WS_B=$(create_external_ws "Concierge-UT-B-$$")
[ -n "$WS_B" ] || fail "ws-B create returned no id"
ok "ws-A=$WS_A ws-B=$WS_B"
# ─── 5. user_tasks REST + MCP + authz ────────────────────────────────────────
log "5/6 user_tasks contract (REST + MCP + cross-ws authz)..."
# 5.1 REST create → 201, status pending
R=$(tenant_call POST "/workspaces/$WS_A/user-tasks" -H "Content-Type: application/json" \
-d '{"title":"Review the Q3 draft","detail":"Need your sign-off before send"}' \
-o "$TMPDIR_E2E/c.json" -w "%{http_code}" 2>/dev/null || echo "000")
BODY=$(cat "$TMPDIR_E2E/c.json" 2>/dev/null || echo "")
check_code "REST create user-task" "201" "$R"
check "create returns status pending" '"status":"pending"' "$BODY"
TASK_ID=$(echo "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin).get('user_task_id',''))" 2>/dev/null || echo "")
[ -n "$TASK_ID" ] || fail "no user_task_id returned: $BODY"
log " TASK_ID=$TASK_ID"
# 5.2 REST read (this workspace + admin org-wide pending)
R=$(tenant_call GET "/workspaces/$WS_A/user-tasks")
check "GET ws-A user-tasks contains the task" "$TASK_ID" "$R"
check "GET ws-A user-tasks shows title" 'Review the Q3 draft' "$R"
R=$(tenant_call GET "/user-tasks/pending")
check "GET /user-tasks/pending (admin) contains the task" "$TASK_ID" "$R"
check "pending entry carries workspace_name" "Concierge-UT-A-$$" "$R"
# 5.3 REST PATCH title/detail → 200, applied
R=$(tenant_call PATCH "/workspaces/$WS_A/user-tasks/$TASK_ID" -H "Content-Type: application/json" \
-d '{"title":"Review the Q3 draft (URGENT)","detail":"Sign-off needed by EOD"}' \
-o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "REST PATCH user-task" "200" "$R"
R=$(tenant_call GET "/workspaces/$WS_A/user-tasks")
check "PATCH applied new title" '(URGENT)' "$R"
check "PATCH applied new detail" 'Sign-off needed by EOD' "$R"
# 5.4 REST resolve done → 200, gone from pending
R=$(tenant_call POST "/workspaces/$WS_A/user-tasks/$TASK_ID/resolve" -H "Content-Type: application/json" \
-d '{"status":"done","resolved_by":"cto"}' -o "$TMPDIR_E2E/r.json" -w "%{http_code}" 2>/dev/null || echo "000")
BODY=$(cat "$TMPDIR_E2E/r.json" 2>/dev/null || echo "")
check_code "REST resolve done" "200" "$R"
check "resolve echoes status done" '"status":"done"' "$BODY"
R=$(tenant_call GET "/user-tasks/pending")
check_not "resolved task no longer pending (admin feed)" "$TASK_ID" "$R"
# 5.5 MCP request_user_action → new pending task surfaces on the admin feed
TEXT=$(mcp_call "$WS_A" "request_user_action" '{"title":"Provide the staging API key","detail":"Blocked on it for the deploy"}')
check_code "MCP request_user_action HTTP" "200" "$(mcp_http_code)"
check "MCP request_user_action success text" 'Asked the user' "$TEXT"
R=$(tenant_call GET "/user-tasks/pending")
check "MCP-created ask appears in pending feed" 'Provide the staging API key' "$R"
MCP_TASK_ID=$(echo "$R" | python3 -c "
import sys, json
for t in json.load(sys.stdin):
if t.get('title') == 'Provide the staging API key':
print(t.get('id','')); break" 2>/dev/null || echo "")
log " MCP_TASK_ID=$MCP_TASK_ID"
# 5.6 MCP list_user_tasks returns ws-A's task(s)
TEXT=$(mcp_call "$WS_A" "list_user_tasks" '{}')
check_code "MCP list_user_tasks HTTP" "200" "$(mcp_http_code)"
check "list_user_tasks contains the MCP task" 'Provide the staging API key' "$TEXT"
check "list_user_tasks shows it pending" '"status":"pending"' "$TEXT"
# 5.7 MCP update_user_task changes it
if [ -n "$MCP_TASK_ID" ]; then
TEXT=$(mcp_call "$WS_A" "update_user_task" "{\"user_task_id\":\"$MCP_TASK_ID\",\"title\":\"Provide the PROD API key\"}")
check_code "MCP update_user_task HTTP" "200" "$(mcp_http_code)"
check "MCP update_user_task success text" 'User task updated' "$TEXT"
TEXT=$(mcp_call "$WS_A" "list_user_tasks" '{}')
check "update applied (new title)" 'Provide the PROD API key' "$TEXT"
check_not "update applied (old title gone)" 'staging API key' "$TEXT"
# 5.8 MCP delete_user_task → gone from list
TEXT=$(mcp_call "$WS_A" "delete_user_task" "{\"user_task_id\":\"$MCP_TASK_ID\"}")
check_code "MCP delete_user_task HTTP" "200" "$(mcp_http_code)"
check "MCP delete_user_task success text" 'User task deleted' "$TEXT"
TEXT=$(mcp_call "$WS_A" "list_user_tasks" '{}')
check_not "deleted task gone from list" 'Provide the PROD API key' "$TEXT"
else
echo " FAIL: could not resolve MCP_TASK_ID — MCP update/delete steps skipped"
FAIL=$((FAIL + 1))
fi
# 5.9 Cross-workspace authz: ws-B cannot mutate ws-A's task (scoped by URL :id)
SCOPE_ID=$(tenant_call POST "/workspaces/$WS_A/user-tasks" -H "Content-Type: application/json" \
-d '{"title":"Scope probe task"}' | python3 -c "import sys,json; print(json.load(sys.stdin).get('user_task_id',''))" 2>/dev/null || echo "")
[ -n "$SCOPE_ID" ] || fail "scope-probe task create failed"
log " SCOPE_ID=$SCOPE_ID (owned by ws-A)"
# ws-B PATCHes ws-A's task → 404 (workspace_id scope).
R=$(tenant_call PATCH "/workspaces/$WS_B/user-tasks/$SCOPE_ID" -H "Content-Type: application/json" \
-d '{"title":"hijack"}' -o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "ws-B PATCH of ws-A's task scoped out" "404" "$R"
# ws-B DELETEs ws-A's task → 404.
R=$(tenant_call DELETE "/workspaces/$WS_B/user-tasks/$SCOPE_ID" -o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "ws-B DELETE of ws-A's task scoped out" "404" "$R"
# Task survived unchanged on ws-A.
R=$(tenant_call GET "/workspaces/$WS_A/user-tasks")
check "ws-A's task survived cross-ws attempts" "$SCOPE_ID" "$R"
check_not "ws-A's task title was NOT hijacked" 'hijack' "$R"
# ws-B's own list must NOT see ws-A's task at all.
R=$(tenant_call GET "/workspaces/$WS_B/user-tasks")
check_not "ws-B list excludes ws-A's task (read isolation)" "$SCOPE_ID" "$R"
# 5.10 Validation contracts
R=$(tenant_call POST "/workspaces/$WS_A/user-tasks" -H "Content-Type: application/json" \
-d '{"detail":"no title here"}' -o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "create without title → 400" "400" "$R"
R=$(tenant_call POST "/workspaces/$WS_A/user-tasks/$SCOPE_ID/resolve" -H "Content-Type: application/json" \
-d '{"status":"banana"}' -o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "resolve with invalid status → 400" "400" "$R"
R=$(tenant_call PATCH "/workspaces/$WS_A/user-tasks/$SCOPE_ID" -H "Content-Type: application/json" \
-d '{"status":"banana"}' -o /dev/null -w "%{http_code}" 2>/dev/null || echo "000")
check_code "PATCH with invalid status → 400" "400" "$R"
# ─── 6. Results ──────────────────────────────────────────────────────────────
log "6/6 Results: $PASS passed, $FAIL failed (teardown runs via EXIT trap)"
[ "$FAIL" -eq 0 ] || fail "$FAIL user_tasks assertion(s) failed"
ok "═══ STAGING CONCIERGE user_tasks E2E PASSED ($PASS checks) ═══"
-351
View File
@@ -1,351 +0,0 @@
#!/usr/bin/env bash
# E2E tests for the user_tasks platform ability — agent → user action
# requests ("asks"). Exercises the FULL contract both surfaces expose:
#
# REST (WorkspaceAuth unless noted):
# POST /workspaces/:id/user-tasks create an ask
# GET /workspaces/:id/user-tasks this workspace's asks
# GET /user-tasks/pending (AdminAuth) org-wide pending asks
# PATCH /workspaces/:id/user-tasks/:taskId edit (scoped by ws id)
# DELETE /workspaces/:id/user-tasks/:taskId remove (scoped by ws id)
# POST /workspaces/:id/user-tasks/:taskId/resolve done|dismissed
#
# MCP a2a-bridge tools (POST /workspaces/:id/mcp, JSON-RPC tools/call):
# request_user_action(title, detail?) list_user_tasks()
# update_user_task(user_task_id, …) delete_user_task(user_task_id)
#
# The MCP arm is what proves the agent→user ability END-TO-END: it drives
# the literal `tools/call` envelope through the real WorkspaceAuth chain
# (the exact call a canvas agent makes), then asserts the new task surfaces
# on the admin-gated concierge feed (/user-tasks/pending).
#
# Requires: platform running on $BASE (default http://localhost:8080).
# Env contract (same as its siblings in this dir):
# BASE platform base URL (default http://localhost:8080)
# ADMIN_TOKEN / platform admin bearer; MOLECULE_ADMIN_TOKEN wins.
# MOLECULE_ADMIN_TOKEN Sent on AdminAuth routes (create/delete ws,
# /user-tasks/pending). Fail-open dev platform with
# no admin token still works (helpers send nothing).
set -euo pipefail
source "$(dirname "$0")/_lib.sh" # sets BASE default + admin-auth helpers
PASS=0
FAIL=0
check() {
local desc="$1"
local expected="$2"
local actual="$3"
if echo "$actual" | grep -qF -- "$expected"; then
echo "PASS: $desc"
PASS=$((PASS + 1))
else
echo "FAIL: $desc"
echo " expected to contain: $expected"
echo " got: $(echo "$actual" | head -5)"
FAIL=$((FAIL + 1))
fi
}
check_not() {
local desc="$1"
local unexpected="$2"
local actual="$3"
if echo "$actual" | grep -qF -- "$unexpected"; then
echo "FAIL: $desc"
echo " should NOT contain: $unexpected"
FAIL=$((FAIL + 1))
else
echo "PASS: $desc"
PASS=$((PASS + 1))
fi
}
# Assert an exact HTTP status. $1 desc, $2 expected code, $3 actual code.
check_code() {
local desc="$1"
local expected="$2"
local actual="$3"
if [ "$actual" = "$expected" ]; then
echo "PASS: $desc (HTTP $actual)"
PASS=$((PASS + 1))
else
echo "FAIL: $desc"
echo " expected HTTP $expected, got HTTP $actual"
FAIL=$((FAIL + 1))
fi
}
# Admin bearer for AdminAuth routes (create/delete workspace, pending feed).
ADMIN_AUTH=()
e2e_admin_auth_args ADMIN_AUTH
acurl() { curl -s ${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"} "$@"; }
# The local create-workspace response embeds a claude_code_channel_snippet
# whose raw newlines/escapes make the body un-loadable by strict json.load
# (the same reason _extract_token.py can emit empty here). So pull id +
# auth_token with tolerant regexes that don't parse the whole envelope.
extract_field_regex() { # <field> ; reads body on stdin
local field="$1"
python3 -c "
import sys, re
body = sys.stdin.read()
m = re.search(r'\"$field\"\s*:\s*\"([^\"]+)\"', body)
print(m.group(1) if m else '')
"
}
extract_ws_id() { extract_field_regex "id"; }
extract_ws_token() { extract_field_regex "auth_token"; }
# Create an external workspace; echo "<id>\t<token>". Caller registers ids
# in CREATED_WSIDS for the scoped teardown.
create_workspace() { # <name>
local name="$1" resp wid tok
resp=$(acurl -X POST "$BASE/workspaces" -H "Content-Type: application/json" \
-d "{\"name\":\"$name\",\"tier\":1,\"runtime\":\"external\",\"external\":true}")
wid=$(printf '%s' "$resp" | extract_ws_id)
tok=$(printf '%s' "$resp" | extract_ws_token)
if [ -z "$wid" ]; then
echo "FATAL: create workspace '$name' returned no id: $(printf '%s' "$resp" | head -c 200)" >&2
return 1
fi
if [ -z "$tok" ]; then
# External create did not echo a token — mint one via the admin endpoint.
tok=$(e2e_mint_workspace_token "$wid" 2>/dev/null || echo "")
fi
if [ -z "$tok" ]; then
echo "FATAL: no workspace bearer for '$name' ($wid)" >&2
return 1
fi
printf '%s\t%s\n' "$wid" "$tok"
}
# Issue a JSON-RPC tools/call to a workspace MCP endpoint. Echoes the raw
# HTTP body on stdout and persists the HTTP status to $MCP_CODE_FILE (mcp_call
# runs in a command substitution, so a plain var would be lost in the
# subshell — read the code back via mcp_http_code after the call).
# <wsid> <bearer> <tool> <args-json>
MCP_CODE_FILE="$(mktemp -t ut_mcp_code.XXXXXX)"
MCP_BODY_FILE="$(mktemp -t ut_mcp_body.XXXXXX)"
mcp_call() {
local wsid="$1" bearer="$2" tool="$3" args="$4" code
set +e
code=$(curl -sS -X POST "$BASE/workspaces/$wsid/mcp" \
-H "Authorization: Bearer $bearer" \
-H "Content-Type: application/json" \
-d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"$tool\",\"arguments\":$args}}" \
-o "$MCP_BODY_FILE" -w "%{http_code}" 2>/dev/null)
set -e
printf '%s' "$code" > "$MCP_CODE_FILE"
cat "$MCP_BODY_FILE" 2>/dev/null || echo ''
}
mcp_http_code() { cat "$MCP_CODE_FILE" 2>/dev/null || echo ''; }
# Extract the `result.content[].text` from an MCP tools/call response.
mcp_result_text() { # reads body on stdin
python3 -c "
import sys, json
try:
d = json.load(sys.stdin)
except Exception:
print(''); sys.exit(0)
res = d.get('result') if isinstance(d, dict) else None
if not isinstance(res, dict):
print(''); sys.exit(0)
print(''.join(c.get('text','') for c in res.get('content', []) if c.get('type') == 'text'))
"
}
# ─── Scoped teardown ───────────────────────────────────────────────────
# Deletes ONLY the workspaces THIS run created (CREATED_WSIDS). Deleting a
# workspace cascades its user_tasks rows, so no separate task cleanup is
# needed. NEVER a blanket sweep — a local stack can be shared with other
# concurrent E2E runs.
CREATED_WSIDS=()
teardown() {
local rc=$?
set +e
echo ""
echo "[teardown] deleting ${#CREATED_WSIDS[@]} workspace(s) this run created (scoped)"
for wid in ${CREATED_WSIDS[@]+"${CREATED_WSIDS[@]}"}; do
[ -n "$wid" ] || continue
e2e_delete_workspace "$wid" "" ${ADMIN_AUTH[@]+"${ADMIN_AUTH[@]}"}
done
exit $rc
}
trap teardown EXIT INT TERM
echo "=== user_tasks E2E (REST + MCP) ==="
echo ""
# ─── Setup: two sibling workspaces (A raises asks; B probes scoping) ────
IFS=$'\t' read -r WS_A A_TOK < <(create_workspace "UserTasks-A-$$") || true
[ -n "${WS_A:-}" ] || { echo "FATAL: ws-A setup failed"; exit 1; }
CREATED_WSIDS+=("$WS_A")
IFS=$'\t' read -r WS_B B_TOK < <(create_workspace "UserTasks-B-$$") || true
[ -n "${WS_B:-}" ] || { echo "FATAL: ws-B setup failed"; exit 1; }
CREATED_WSIDS+=("$WS_B")
echo "ws-A=$WS_A ws-B=$WS_B"
echo ""
# ─── 1. Create (REST) on ws-A → 201, status pending ────────────────────
echo "--- 1. Create (REST) ---"
R=$(curl -s -w "\n%{http_code}" -X POST "$BASE/workspaces/$WS_A/user-tasks" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" \
-d '{"title":"Review the Q3 draft","detail":"Need your sign-off before send"}')
CODE=$(printf '%s' "$R" | tail -n1)
BODY=$(printf '%s' "$R" | sed '$d')
check_code "POST create user-task" "201" "$CODE"
check "create returns status pending" '"status":"pending"' "$BODY"
TASK_ID=$(printf '%s' "$BODY" | python3 -c "import sys,json; print(json.load(sys.stdin)['user_task_id'])")
echo " TASK_ID=$TASK_ID"
[ -n "$TASK_ID" ] || { echo "FATAL: no user_task_id returned"; }
# ─── 2. Read (REST workspace + admin pending) ──────────────────────────
echo ""
echo "--- 2. Read ---"
R=$(curl -s "$BASE/workspaces/$WS_A/user-tasks" -H "Authorization: Bearer $A_TOK")
check "GET ws-A user-tasks contains the task id" "$TASK_ID" "$R"
check "GET ws-A user-tasks shows title" 'Review the Q3 draft' "$R"
R=$(acurl "$BASE/user-tasks/pending")
check "GET /user-tasks/pending (admin) contains the task" "$TASK_ID" "$R"
check "pending entry carries workspace_name" "UserTasks-A-$$" "$R"
# ─── 3. Update (REST) PATCH title/detail → 200, change applied ─────────
echo ""
echo "--- 3. Update (REST PATCH) ---"
R=$(curl -s -w "\n%{http_code}" -X PATCH "$BASE/workspaces/$WS_A/user-tasks/$TASK_ID" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" \
-d '{"title":"Review the Q3 draft (URGENT)","detail":"Sign-off needed by EOD"}')
CODE=$(printf '%s' "$R" | tail -n1)
check_code "PATCH update user-task" "200" "$CODE"
R=$(curl -s "$BASE/workspaces/$WS_A/user-tasks" -H "Authorization: Bearer $A_TOK")
check "PATCH applied new title" '(URGENT)' "$R"
check "PATCH applied new detail" 'Sign-off needed by EOD' "$R"
# ─── 4. Resolve (REST) done → 200, gone from pending ───────────────────
echo ""
echo "--- 4. Resolve (REST done) ---"
R=$(curl -s -w "\n%{http_code}" -X POST "$BASE/workspaces/$WS_A/user-tasks/$TASK_ID/resolve" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" \
-d '{"status":"done","resolved_by":"cto"}')
CODE=$(printf '%s' "$R" | tail -n1)
BODY=$(printf '%s' "$R" | sed '$d')
check_code "POST resolve done" "200" "$CODE"
check "resolve echoes status done" '"status":"done"' "$BODY"
R=$(acurl "$BASE/user-tasks/pending")
check_not "resolved task no longer pending (admin feed)" "$TASK_ID" "$R"
# ─── 5. Create via MCP tool request_user_action → new pending task ─────
# This is the agent→user ability proven end-to-end: the literal tools/call
# the canvas agent makes, surfacing on the admin concierge feed.
echo ""
echo "--- 5. Create via MCP (request_user_action) ---"
BODY=$(mcp_call "$WS_A" "$A_TOK" "request_user_action" '{"title":"Provide the staging API key","detail":"Blocked on it for the deploy"}')
check_code "MCP request_user_action HTTP" "200" "$(mcp_http_code)"
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check "MCP request_user_action success text" 'Asked the user' "$TEXT"
# A NEW pending task must appear on the admin feed.
R=$(acurl "$BASE/user-tasks/pending")
check "MCP-created ask appears in pending feed" 'Provide the staging API key' "$R"
MCP_TASK_ID=$(printf '%s' "$R" | python3 -c "
import sys, json
d = json.load(sys.stdin)
for t in d:
if t.get('title') == 'Provide the staging API key':
print(t['id']); break
")
echo " MCP_TASK_ID=$MCP_TASK_ID"
[ -n "$MCP_TASK_ID" ] || echo " (note: could not resolve MCP_TASK_ID — later MCP steps assert by title)"
# ─── 6. list_user_tasks (MCP) returns ws-A's task(s) ───────────────────
echo ""
echo "--- 6. list_user_tasks (MCP) ---"
BODY=$(mcp_call "$WS_A" "$A_TOK" "list_user_tasks" '{}')
check_code "MCP list_user_tasks HTTP" "200" "$(mcp_http_code)"
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check "list_user_tasks contains the MCP task" 'Provide the staging API key' "$TEXT"
check "list_user_tasks shows it pending" '"status":"pending"' "$TEXT"
# ─── 7. update_user_task (MCP) changes it → verify ─────────────────────
echo ""
echo "--- 7. update_user_task (MCP) ---"
BODY=$(mcp_call "$WS_A" "$A_TOK" "update_user_task" \
"{\"user_task_id\":\"$MCP_TASK_ID\",\"title\":\"Provide the PROD API key\"}")
check_code "MCP update_user_task HTTP" "200" "$(mcp_http_code)"
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check "MCP update_user_task success text" 'User task updated' "$TEXT"
BODY=$(mcp_call "$WS_A" "$A_TOK" "list_user_tasks" '{}')
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check "update applied (new title visible)" 'Provide the PROD API key' "$TEXT"
check_not "update applied (old title gone)" 'staging API key' "$TEXT"
# ─── 8. delete_user_task (MCP) → gone from list ────────────────────────
echo ""
echo "--- 8. delete_user_task (MCP) ---"
BODY=$(mcp_call "$WS_A" "$A_TOK" "delete_user_task" "{\"user_task_id\":\"$MCP_TASK_ID\"}")
check_code "MCP delete_user_task HTTP" "200" "$(mcp_http_code)"
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check "MCP delete_user_task success text" 'User task deleted' "$TEXT"
BODY=$(mcp_call "$WS_A" "$A_TOK" "list_user_tasks" '{}')
TEXT=$(printf '%s' "$BODY" | mcp_result_text)
check_not "deleted task gone from list" 'Provide the PROD API key' "$TEXT"
# ─── 9. Scoping / authz ────────────────────────────────────────────────
echo ""
echo "--- 9. Scoping / authz ---"
# A fresh ws-A task to attempt cross-workspace mutation against.
SCOPE_ID=$(curl -s -X POST "$BASE/workspaces/$WS_A/user-tasks" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" \
-d '{"title":"Scope probe task"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['user_task_id'])")
echo " SCOPE_ID=$SCOPE_ID (owned by ws-A)"
# ws-B PATCHes ws-A's task → 404 (workspace_id scope).
CODE=$(curl -s -o /dev/null -w "%{http_code}" -X PATCH "$BASE/workspaces/$WS_B/user-tasks/$SCOPE_ID" \
-H "Authorization: Bearer $B_TOK" -H "Content-Type: application/json" -d '{"title":"hijack"}')
check_code "ws-B PATCH of ws-A's task is scoped out" "404" "$CODE"
# ws-B DELETEs ws-A's task → 404.
CODE=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE "$BASE/workspaces/$WS_B/user-tasks/$SCOPE_ID" \
-H "Authorization: Bearer $B_TOK")
check_code "ws-B DELETE of ws-A's task is scoped out" "404" "$CODE"
# Task survived the cross-workspace attempts (still on ws-A, unchanged).
R=$(curl -s "$BASE/workspaces/$WS_A/user-tasks" -H "Authorization: Bearer $A_TOK")
check "ws-A's task survived cross-ws attempts" "$SCOPE_ID" "$R"
check_not "ws-A's task title was NOT hijacked" 'hijack' "$R"
# /user-tasks/pending is AdminAuth — a workspace bearer must be rejected.
CODE=$(curl -s -o /dev/null -w "%{http_code}" "$BASE/user-tasks/pending" -H "Authorization: Bearer $A_TOK")
if [ "$CODE" = "401" ] || [ "$CODE" = "403" ]; then
echo "PASS: /user-tasks/pending rejects a workspace token (HTTP $CODE)"
PASS=$((PASS + 1))
else
echo "FAIL: /user-tasks/pending should reject a workspace token, got HTTP $CODE"
FAIL=$((FAIL + 1))
fi
# …and reject no auth at all.
CODE=$(curl -s -o /dev/null -w "%{http_code}" "$BASE/user-tasks/pending")
if [ "$CODE" = "401" ] || [ "$CODE" = "403" ]; then
echo "PASS: /user-tasks/pending rejects an unauthenticated caller (HTTP $CODE)"
PASS=$((PASS + 1))
else
echo "FAIL: /user-tasks/pending should reject no auth, got HTTP $CODE"
FAIL=$((FAIL + 1))
fi
# ─── 10. Validation ────────────────────────────────────────────────────
echo ""
echo "--- 10. Validation ---"
# Missing title → 400.
CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$BASE/workspaces/$WS_A/user-tasks" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" -d '{"detail":"no title here"}')
check_code "create without title → 400" "400" "$CODE"
# Resolve with an invalid status → 400.
CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$BASE/workspaces/$WS_A/user-tasks/$SCOPE_ID/resolve" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" -d '{"status":"banana"}')
check_code "resolve with invalid status → 400" "400" "$CODE"
# PATCH with an invalid status → 400.
CODE=$(curl -s -o /dev/null -w "%{http_code}" -X PATCH "$BASE/workspaces/$WS_A/user-tasks/$SCOPE_ID" \
-H "Authorization: Bearer $A_TOK" -H "Content-Type: application/json" -d '{"status":"banana"}')
check_code "PATCH with invalid status → 400" "400" "$CODE"
echo ""
echo "=== Results: $PASS passed, $FAIL failed ==="
exit $FAIL
-14
View File
@@ -433,17 +433,6 @@ def signal_4_branch_divergence(
# ── Signal 6: CI required-checks awareness ───────────────────────────────────
# Governance checks that are ALWAYS required for every PR, regardless of
# branch-protection configuration. These are the uniform-gate checks that
# must pass before any PR can merge (SOP tier removal makes them mandatory
# for all PRs, not just tier:medium/tier:high).
GOVERNANCE_REQUIRED_CONTEXTS = [
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
]
def signal_6_ci(pr_number: int, repo: str, branch: str | None = None, pr_data: dict | None = None) -> dict:
"""
Query combined CI status for PR head commit.
@@ -481,9 +470,6 @@ def signal_6_ci(pr_number: int, repo: str, branch: str | None = None, pr_data: d
required_checks.append(check["context"])
except GiteaError:
pass # No protection or no read access
# Uniform gate: governance checks are ALWAYS required, even if branch
# protection does not enumerate them. Deduplicate against BP list.
required_checks = list(dict.fromkeys(required_checks + GOVERNANCE_REQUIRED_CONTEXTS))
failing_required = []
passing_required = []
-130
View File
@@ -354,133 +354,3 @@ def test_signal_4_branch_api_error_returns_na(monkeypatch):
assert result["verdict"] == "N/A"
assert "error" in result
# ── Signal 6: CI required checks ────────────────────────────────────────────
def _signal_6_api_get(required_checks, statuses):
"""Return a fake_api_get closure for signal_6 tests."""
def fake_api_get(path):
if path == "/repos/molecule-ai/molecule-core/pulls/200":
return {"base": {"sha": "base000", "ref": "main"}, "head": {"sha": "pr222"}}
if path == "/repos/molecule-ai/molecule-core/commits/pr222/status":
return {"state": "failure", "statuses": statuses}
if path == "/repos/molecule-ai/molecule-core/branches/main/protection":
return {"required_status_checks": {"checks": [{"context": c} for c in required_checks]}}
raise AssertionError(f"unexpected api_get: {path}")
return fake_api_get
def test_signal_6_missing_required_context_returns_ci_pending(monkeypatch):
"""A required check that is ABSENT from the status list is treated as missing,
which is fail-closed CI_PENDING (never ready-by-absence)."""
mod = load_gate_check()
monkeypatch.setattr(
mod, "api_get",
_signal_6_api_get(
required_checks=["qa-review / approved (pull_request)", "security-review / approved (pull_request)"],
statuses=[
{"context": "qa-review / approved (pull_request)", "status": "success"},
# security-review is completely missing
],
),
)
result = mod.signal_6_ci(200, "molecule-ai/molecule-core")
assert result["verdict"] == "CI_PENDING"
assert "security-review / approved (pull_request)" in result["pending_required"]
def test_signal_6_pending_required_context_returns_ci_pending(monkeypatch):
"""A required check with status 'pending' blocks the gate with CI_PENDING."""
mod = load_gate_check()
monkeypatch.setattr(
mod, "api_get",
_signal_6_api_get(
required_checks=[
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
],
statuses=[
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "pending"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
],
),
)
result = mod.signal_6_ci(200, "molecule-ai/molecule-core")
assert result["verdict"] == "CI_PENDING"
assert "security-review / approved (pull_request)" in result["pending_required"]
def test_signal_6_failing_required_context_returns_ci_fail(monkeypatch):
"""A required check with status 'failure' blocks the gate with CI_FAIL."""
mod = load_gate_check()
monkeypatch.setattr(
mod, "api_get",
_signal_6_api_get(
required_checks=[
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
"CI / all-required (pull_request)",
],
statuses=[
{"context": "qa-review / approved (pull_request)", "status": "failure"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
{"context": "CI / all-required (pull_request)", "status": "success"},
],
),
)
result = mod.signal_6_ci(200, "molecule-ai/molecule-core")
assert result["verdict"] == "CI_FAIL"
assert "qa-review / approved (pull_request)" in result["failing_required"]
def test_signal_6_all_required_green_returns_clear(monkeypatch):
"""When every required check is success/neutral, the gate is CLEAR."""
mod = load_gate_check()
monkeypatch.setattr(
mod, "api_get",
_signal_6_api_get(
required_checks=[
"qa-review / approved (pull_request)",
"security-review / approved (pull_request)",
"sop-checklist / all-items-acked (pull_request)",
"CI / all-required (pull_request)",
],
statuses=[
{"context": "qa-review / approved (pull_request)", "status": "success"},
{"context": "security-review / approved (pull_request)", "status": "success"},
{"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
{"context": "CI / all-required (pull_request)", "status": "success"},
],
),
)
result = mod.signal_6_ci(200, "molecule-ai/molecule-core")
assert result["verdict"] == "CLEAR"
assert result["pending_required"] == []
assert result["failing_required"] == []
def test_signal_6_governance_checks_always_required_even_when_bp_empty(monkeypatch):
"""Uniform gate: qa/security/sop are REQUIRED even if branch protection
does not enumerate them. A PR with only CI/all-required green but missing
governance contexts must be CI_PENDING (fail-closed)."""
mod = load_gate_check()
monkeypatch.setattr(
mod, "api_get",
_signal_6_api_get(
required_checks=[], # BP lists nothing
statuses=[
{"context": "CI / all-required (pull_request)", "status": "success"},
],
),
)
result = mod.signal_6_ci(200, "molecule-ai/molecule-core")
assert result["verdict"] == "CI_PENDING"
assert "qa-review / approved (pull_request)" in result["pending_required"]
assert "security-review / approved (pull_request)" in result["pending_required"]
assert "sop-checklist / all-items-acked (pull_request)" in result["pending_required"]
-31
View File
@@ -119,18 +119,6 @@ func main() {
}
}
// Self-hosted platform-agent seed. With no control plane present to install
// the org's concierge (SaaS leaves it to the CP at org-provision time), the
// tenant server seeds it itself when MOLECULE_SEED_PLATFORM_AGENT is set —
// the self-hosted docker-compose sets it, while CI harnesses + SaaS tenants
// leave it unset (so e2e empty-DB assertions and the CP path are unaffected).
// Idempotent + best-effort — never fatal.
if v := os.Getenv("MOLECULE_SEED_PLATFORM_AGENT"); v == "true" || v == "1" {
if err := handlers.EnsureSelfHostedPlatformAgent(context.Background(), db.DB); err != nil {
log.Printf("boot: platform-agent self-seed failed (non-fatal): %v", err)
}
}
// Redis
redisURL := envOr("REDIS_URL", "redis://localhost:6379")
if err := db.InitRedis(redisURL); err != nil {
@@ -249,25 +237,6 @@ func main() {
wh.SetCPProvisioner(cpProv)
}
// Self-hosted platform-agent boot-provision (Change 1). The line-128 seed
// only creates the concierge DB ROW; on a fresh self-host that leaves it
// with no container (status='failed'/'online' but nothing running). Now that
// the local Docker provisioner (prov) and WorkspaceHandler (RestartByID)
// exist, kick off a best-effort provision so a self-hosted concierge comes
// online automatically once LLM creds exist.
//
// Guarded to self-host ONLY: same MOLECULE_SEED_PLATFORM_AGENT flag as the
// seed AND prov != nil (local Docker active ⇒ MOLECULE_ORG_ID unset). The
// SaaS path (cpProv != nil ⇒ prov == nil) never triggers — the CP owns
// concierge provisioning there. Best-effort + non-fatal + runs once: on a
// fresh self-host with no creds the provision fails and the agent stays
// 'failed' until BYOK is configured via Settings; RestartByID is itself
// debounced so this can't loop. Runs in a goroutine inside the helper so a
// slow image pull never delays the HTTP server.
if v := os.Getenv("MOLECULE_SEED_PLATFORM_AGENT"); (v == "true" || v == "1") && prov != nil {
handlers.MaybeProvisionPlatformAgentOnBoot(context.Background(), db.DB, prov, wh.RestartByID)
}
// Memory v2 plugin (RFC #2728): build the dependency bundle once
// here so all three handlers (MCPHandler, AdminMemoriesHandler,
// WorkspaceHandler) get the same plugin/resolver pair. memBundle
+12 -498
View File
@@ -12,63 +12,12 @@
"host": "api.moleculesai.app",
"basePath": "/",
"paths": {
"/org/identity": {
"get": {
"produces": [
"application/json"
],
"tags": [
"org"
],
"summary": "Get the org's display name",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/handlers.OrgIdentityResponse"
}
}
}
}
},
"/user-tasks/pending": {
"get": {
"security": [
{
"BearerAuth": []
}
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "List pending user tasks across all workspaces",
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "array",
"items": {
"$ref": "#/definitions/handlers.PendingUserTask"
}
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
}
},
"/workspaces/{id}/schedules": {
"get": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"produces": [
@@ -108,7 +57,8 @@
"post": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"consumes": [
@@ -165,7 +115,8 @@
"delete": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"produces": [
@@ -215,7 +166,8 @@
"patch": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"consumes": [
@@ -285,7 +237,8 @@
"get": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"produces": [
@@ -334,7 +287,8 @@
"post": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
"BearerAuth": [],
"OrgSlugAuth": []
}
],
"produces": [
@@ -381,293 +335,6 @@
}
}
}
},
"/workspaces/{id}/user-tasks": {
"get": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
}
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "List a workspace's own user tasks",
"parameters": [
{
"type": "string",
"description": "Workspace ID",
"name": "id",
"in": "path",
"required": true
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "array",
"items": {
"$ref": "#/definitions/handlers.UserTask"
}
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
},
"post": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
}
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "Raise a user task",
"parameters": [
{
"type": "string",
"description": "Workspace ID",
"name": "id",
"in": "path",
"required": true
},
{
"description": "Task fields",
"name": "body",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/handlers.CreateUserTaskRequest"
}
}
],
"responses": {
"201": {
"description": "Created",
"schema": {
"$ref": "#/definitions/handlers.CreateUserTaskResponse"
}
},
"400": {
"description": "Bad Request",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
}
},
"/workspaces/{id}/user-tasks/{taskId}": {
"delete": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
}
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "Delete a workspace's own user task",
"parameters": [
{
"type": "string",
"description": "Workspace ID",
"name": "id",
"in": "path",
"required": true
},
{
"type": "string",
"description": "User task ID",
"name": "taskId",
"in": "path",
"required": true
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/handlers.UserTaskMutationResponse"
}
},
"404": {
"description": "Not Found",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
},
"patch": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
}
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "Update a workspace's own user task",
"parameters": [
{
"type": "string",
"description": "Workspace ID",
"name": "id",
"in": "path",
"required": true
},
{
"type": "string",
"description": "User task ID",
"name": "taskId",
"in": "path",
"required": true
},
{
"description": "Partial task fields (only provided keys are updated)",
"name": "body",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/handlers.UpdateUserTaskRequest"
}
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/handlers.UserTaskMutationResponse"
}
},
"400": {
"description": "Bad Request",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"404": {
"description": "Not Found",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
}
},
"/workspaces/{id}/user-tasks/{taskId}/resolve": {
"post": {
"security": [
{
"BearerAuth \u0026\u0026 OrgSlugAuth": []
}
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"tags": [
"user-tasks"
],
"summary": "Resolve a user task",
"parameters": [
{
"type": "string",
"description": "Workspace ID",
"name": "id",
"in": "path",
"required": true
},
{
"type": "string",
"description": "User task ID",
"name": "taskId",
"in": "path",
"required": true
},
{
"description": "Resolution",
"name": "body",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/handlers.ResolveUserTaskRequest"
}
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/handlers.ResolveUserTaskResponse"
}
},
"400": {
"description": "Bad Request",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"404": {
"description": "Not Found",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/handlers.ErrorResponse"
}
}
}
}
}
},
"definitions": {
@@ -709,31 +376,6 @@
}
}
},
"handlers.CreateUserTaskRequest": {
"type": "object",
"required": [
"title"
],
"properties": {
"detail": {
"type": "string"
},
"title": {
"type": "string"
}
}
},
"handlers.CreateUserTaskResponse": {
"type": "object",
"properties": {
"status": {
"type": "string"
},
"user_task_id": {
"type": "string"
}
}
},
"handlers.ErrorResponse": {
"type": "object",
"properties": {
@@ -762,73 +404,6 @@
}
}
},
"handlers.OrgIdentityResponse": {
"type": "object",
"properties": {
"name": {
"description": "Name is the org's display name (MOLECULE_ORG_NAME, \"\" when unset).",
"type": "string"
}
}
},
"handlers.PendingUserTask": {
"type": "object",
"properties": {
"created_at": {
"type": "string"
},
"detail": {
"type": "string"
},
"id": {
"type": "string"
},
"status": {
"type": "string",
"enum": [
"pending"
]
},
"title": {
"type": "string"
},
"workspace_id": {
"type": "string"
},
"workspace_name": {
"type": "string"
}
}
},
"handlers.ResolveUserTaskRequest": {
"type": "object",
"required": [
"status"
],
"properties": {
"resolved_by": {
"type": "string"
},
"status": {
"type": "string",
"enum": [
"done",
"dismissed"
]
}
}
},
"handlers.ResolveUserTaskResponse": {
"type": "object",
"properties": {
"status": {
"type": "string"
},
"user_task_id": {
"type": "string"
}
}
},
"handlers.RunNowResponse": {
"type": "object",
"properties": {
@@ -921,67 +496,6 @@
"type": "string"
}
}
},
"handlers.UpdateUserTaskRequest": {
"type": "object",
"properties": {
"detail": {
"type": "string"
},
"status": {
"type": "string",
"enum": [
"pending",
"done",
"dismissed"
]
},
"title": {
"type": "string"
}
}
},
"handlers.UserTask": {
"type": "object",
"properties": {
"created_at": {
"type": "string"
},
"detail": {
"type": "string"
},
"id": {
"type": "string"
},
"resolved_at": {
"type": "string"
},
"resolved_by": {
"type": "string"
},
"status": {
"type": "string",
"enum": [
"pending",
"done",
"dismissed"
]
},
"title": {
"type": "string"
}
}
},
"handlers.UserTaskMutationResponse": {
"type": "object",
"properties": {
"status": {
"type": "string"
},
"user_task_id": {
"type": "string"
}
}
}
},
"securityDefinitions": {
+12 -322
View File
@@ -25,22 +25,6 @@ definitions:
status:
type: string
type: object
handlers.CreateUserTaskRequest:
properties:
detail:
type: string
title:
type: string
required:
- title
type: object
handlers.CreateUserTaskResponse:
properties:
status:
type: string
user_task_id:
type: string
type: object
handlers.ErrorResponse:
properties:
error:
@@ -59,50 +43,6 @@ definitions:
timestamp:
type: string
type: object
handlers.OrgIdentityResponse:
properties:
name:
description: Name is the org's display name (MOLECULE_ORG_NAME, "" when unset).
type: string
type: object
handlers.PendingUserTask:
properties:
created_at:
type: string
detail:
type: string
id:
type: string
status:
enum:
- pending
type: string
title:
type: string
workspace_id:
type: string
workspace_name:
type: string
type: object
handlers.ResolveUserTaskRequest:
properties:
resolved_by:
type: string
status:
enum:
- done
- dismissed
type: string
required:
- status
type: object
handlers.ResolveUserTaskResponse:
properties:
status:
type: string
user_task_id:
type: string
type: object
handlers.RunNowResponse:
properties:
prompt:
@@ -165,47 +105,6 @@ definitions:
timezone:
type: string
type: object
handlers.UpdateUserTaskRequest:
properties:
detail:
type: string
status:
enum:
- pending
- done
- dismissed
type: string
title:
type: string
type: object
handlers.UserTask:
properties:
created_at:
type: string
detail:
type: string
id:
type: string
resolved_at:
type: string
resolved_by:
type: string
status:
enum:
- pending
- done
- dismissed
type: string
title:
type: string
type: object
handlers.UserTaskMutationResponse:
properties:
status:
type: string
user_task_id:
type: string
type: object
host: api.moleculesai.app
info:
contact: {}
@@ -216,38 +115,6 @@ info:
title: Molecule AI Workspace Server API
version: "1.0"
paths:
/org/identity:
get:
produces:
- application/json
responses:
"200":
description: OK
schema:
$ref: '#/definitions/handlers.OrgIdentityResponse'
summary: Get the org's display name
tags:
- org
/user-tasks/pending:
get:
produces:
- application/json
responses:
"200":
description: OK
schema:
items:
$ref: '#/definitions/handlers.PendingUserTask'
type: array
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth: []
summary: List pending user tasks across all workspaces
tags:
- user-tasks
/workspaces/{id}/schedules:
get:
parameters:
@@ -270,7 +137,8 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: List schedules for a workspace
tags:
- schedules
@@ -305,7 +173,8 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: Create a schedule
tags:
- schedules
@@ -338,7 +207,8 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: Delete a schedule
tags:
- schedules
@@ -382,7 +252,8 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: Update a schedule
tags:
- schedules
@@ -413,7 +284,8 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: Get past runs of a schedule
tags:
- schedules
@@ -446,193 +318,11 @@ paths:
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
- BearerAuth: []
OrgSlugAuth: []
summary: Fire a schedule manually
tags:
- schedules
/workspaces/{id}/user-tasks:
get:
parameters:
- description: Workspace ID
in: path
name: id
required: true
type: string
produces:
- application/json
responses:
"200":
description: OK
schema:
items:
$ref: '#/definitions/handlers.UserTask'
type: array
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
summary: List a workspace's own user tasks
tags:
- user-tasks
post:
consumes:
- application/json
parameters:
- description: Workspace ID
in: path
name: id
required: true
type: string
- description: Task fields
in: body
name: body
required: true
schema:
$ref: '#/definitions/handlers.CreateUserTaskRequest'
produces:
- application/json
responses:
"201":
description: Created
schema:
$ref: '#/definitions/handlers.CreateUserTaskResponse'
"400":
description: Bad Request
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
summary: Raise a user task
tags:
- user-tasks
/workspaces/{id}/user-tasks/{taskId}:
delete:
parameters:
- description: Workspace ID
in: path
name: id
required: true
type: string
- description: User task ID
in: path
name: taskId
required: true
type: string
produces:
- application/json
responses:
"200":
description: OK
schema:
$ref: '#/definitions/handlers.UserTaskMutationResponse'
"404":
description: Not Found
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
summary: Delete a workspace's own user task
tags:
- user-tasks
patch:
consumes:
- application/json
parameters:
- description: Workspace ID
in: path
name: id
required: true
type: string
- description: User task ID
in: path
name: taskId
required: true
type: string
- description: Partial task fields (only provided keys are updated)
in: body
name: body
required: true
schema:
$ref: '#/definitions/handlers.UpdateUserTaskRequest'
produces:
- application/json
responses:
"200":
description: OK
schema:
$ref: '#/definitions/handlers.UserTaskMutationResponse'
"400":
description: Bad Request
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"404":
description: Not Found
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
summary: Update a workspace's own user task
tags:
- user-tasks
/workspaces/{id}/user-tasks/{taskId}/resolve:
post:
consumes:
- application/json
parameters:
- description: Workspace ID
in: path
name: id
required: true
type: string
- description: User task ID
in: path
name: taskId
required: true
type: string
- description: Resolution
in: body
name: body
required: true
schema:
$ref: '#/definitions/handlers.ResolveUserTaskRequest'
produces:
- application/json
responses:
"200":
description: OK
schema:
$ref: '#/definitions/handlers.ResolveUserTaskResponse'
"400":
description: Bad Request
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"404":
description: Not Found
schema:
$ref: '#/definitions/handlers.ErrorResponse'
"500":
description: Internal Server Error
schema:
$ref: '#/definitions/handlers.ErrorResponse'
security:
- BearerAuth && OrgSlugAuth: []
summary: Resolve a user task
tags:
- user-tasks
schemes:
- https
securityDefinitions:
@@ -80,10 +80,6 @@ const (
EventApprovalRequested EventType = "APPROVAL_REQUESTED"
EventApprovalEscalated EventType = "APPROVAL_ESCALATED"
// User tasks (agent → user asks).
EventUserTaskRequested EventType = "USER_TASK_REQUESTED"
EventUserTaskResolved EventType = "USER_TASK_RESOLVED"
// Auth / credentials.
EventExternalCredentialsRotated EventType = "EXTERNAL_CREDENTIALS_ROTATED"
)
@@ -116,8 +112,6 @@ var AllEventTypes = []EventType{
EventDelegationStatus,
EventExternalCredentialsRotated,
EventTaskUpdated,
EventUserTaskRequested,
EventUserTaskResolved,
EventWorkspaceAwaitingAgent,
EventWorkspaceDegraded,
EventWorkspaceHeartbeat,
@@ -41,8 +41,6 @@ func TestAllEventTypes_IsSnapshot(t *testing.T) {
"DELEGATION_STATUS",
"EXTERNAL_CREDENTIALS_ROTATED",
"TASK_UPDATED",
"USER_TASK_REQUESTED",
"USER_TASK_RESOLVED",
"WORKSPACE_AWAITING_AGENT",
"WORKSPACE_DEGRADED",
"WORKSPACE_HEARTBEAT",
@@ -225,16 +225,6 @@ func (e *proxyA2AError) Error() string {
return "proxy a2a error"
}
// EnqueueA2A is a method wrapper around the package-level EnqueueA2A function so
// that *WorkspaceHandler satisfies the scheduler's A2AProxy interface. The
// scheduler cannot call the package function directly (it would have to import
// internal/handlers, but handlers already imports internal/scheduler → import
// cycle), so it goes through this method on the proxy it already holds. Used by
// the cron scheduler to durably buffer a tick when the target workspace is busy.
func (h *WorkspaceHandler) EnqueueA2A(ctx context.Context, workspaceID, callerID string, priority int, body []byte, method, idempotencyKey string, expiresAt *time.Time) (string, int, error) {
return EnqueueA2A(ctx, workspaceID, callerID, priority, body, method, idempotencyKey, expiresAt)
}
// ProxyA2ARequest is the public wrapper for proxyA2ARequest, used by the
// cron scheduler and other internal callers that need to send A2A messages
// to workspaces programmatically (not from an HTTP handler).
@@ -97,10 +97,10 @@ type QueuedItem struct {
// returns the new row ID + current queue depth. Caller MUST have already
// determined the target is busy — this function does not check.
//
// Idempotency: when idempotencyKey is non-empty, a duplicate active enqueue
// for the same (workspace, key) is collapsed rather than double-buffered. On
// a duplicate this returns the existing row's ID so the caller's log still
// points at the live queue entry.
// Idempotency: when idempotencyKey is non-empty, the partial unique index
// `idx_a2a_queue_idempotency` prevents duplicate active rows for the same
// (workspace_id, idempotency_key). On conflict this returns the existing
// row's ID so the caller's log still points at the live queue entry.
func EnqueueA2A(
ctx context.Context,
workspaceID, callerID string,
@@ -129,32 +129,6 @@ func EnqueueA2A(
expiresAtArg = *expiresAt
}
// Supersede any already-expired pending row for this same key before we
// insert. The drain path skips expired pending rows, so such a row never
// completes on its own — it lingers in the active set and would block the
// conflict check below, silently swallowing this fresh enqueue. Retiring
// it here (a) frees the active set so the insert below proceeds and (b)
// cleans the stale row up so expired rows don't accumulate. Scoped to the
// idempotency key so unrelated traffic is untouched.
if idempotencyKey != "" {
if _, supErr := db.DB.ExecContext(ctx, `
UPDATE a2a_queue
SET status = 'dropped',
last_error = 'superseded: expired before drain; replaced by a fresh enqueue'
WHERE workspace_id = $1
AND idempotency_key = $2
AND status = 'queued'
AND expires_at IS NOT NULL
AND expires_at <= now()
`, workspaceID, idempotencyKey); supErr != nil {
// Non-fatal: if the cleanup fails we still attempt the insert. Worst
// case the conflict path returns the (stale) existing row's id, which
// is the pre-fix behaviour — no new breakage introduced here.
log.Printf("A2AQueue: supersede-expired cleanup failed for workspace %s key %s: %v",
workspaceID, idempotencyKey, supErr)
}
}
// INSERT ... ON CONFLICT DO NOTHING RETURNING id. The conflict target
// must reference the partial unique INDEX columns + WHERE clause directly
// (Postgres can't reference partial unique indexes by name in
@@ -1,160 +0,0 @@
package handlers
// a2a_queue_enqueue_expired_test.go — regression for CR3 RC 9853.
//
// Bug: a pending buffered tick that expires before the drain reaches it is
// skipped by the drain (it filters out expired pending rows) yet still occupies
// the active set the idempotency check guards. A later tick for the SAME key
// would then collapse onto that dead row and be silently swallowed — the exact
// drop the busy-buffer path was built to prevent.
//
// Fix: EnqueueA2A retires any already-expired pending row for the key BEFORE the
// insert, so the fresh tick buffers (and the stale row is cleaned up) instead of
// being dropped.
//
// These tests use the QueryMatcherEqual mock (setupTestDBForQueueTests) so the
// SQL strings below must match the handler's queries verbatim.
import (
"context"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
)
const (
enqWorkspaceID = "ws-enq-expired"
enqKey = "sched-aaaa-bbbb" // schedule_id used as idempotency key
enqBody = `{"method":"message/send"}`
enqMethod = "message/send"
)
// expectSupersedeExpired registers the cleanup UPDATE EnqueueA2A issues before
// the insert when an idempotency key is present. rowsRetired is how many expired
// pending rows the UPDATE claims to have dropped.
func expectSupersedeExpired(mock sqlmock.Sqlmock, workspaceID, key string, rowsRetired int64) {
mock.ExpectExec(`
UPDATE a2a_queue
SET status = 'dropped',
last_error = 'superseded: expired before drain; replaced by a fresh enqueue'
WHERE workspace_id = $1
AND idempotency_key = $2
AND status = 'queued'
AND expires_at IS NOT NULL
AND expires_at <= now()
`).
WithArgs(workspaceID, key).
WillReturnResult(sqlmock.NewResult(0, rowsRetired))
}
// expectInsert registers the INSERT ... ON CONFLICT DO NOTHING RETURNING id.
// newID is the id the insert returns (non-conflict / fresh enqueue path).
func expectInsert(mock sqlmock.Sqlmock, newID string) {
mock.ExpectQuery(`
INSERT INTO a2a_queue (workspace_id, caller_id, priority, body, method, idempotency_key, expires_at)
VALUES ($1, $2, $3, $4::jsonb, $5, $6, $7)
ON CONFLICT (workspace_id, idempotency_key)
WHERE idempotency_key IS NOT NULL AND status IN ('queued','dispatched')
DO NOTHING
RETURNING id
`).WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow(newID))
}
// expectDepth registers the trailing queue-depth count query.
func expectDepth(mock sqlmock.Sqlmock, workspaceID string, depth int) {
mock.ExpectQuery(`
SELECT COUNT(*) FROM a2a_queue
WHERE workspace_id = $1 AND status = 'queued'
`).WithArgs(workspaceID).
WillReturnRows(sqlmock.NewRows([]string{"count"}).AddRow(depth))
}
// TestEnqueueA2A_ExpiredRowDoesNotBlockFreshTick is the core CR3 regression:
// an existing expired pending row for a schedule's key must NOT cause the next
// tick's enqueue to be dropped. The expired row is retired first, then the
// fresh tick inserts and returns a NEW id.
func TestEnqueueA2A_ExpiredRowDoesNotBlockFreshTick(t *testing.T) {
mock := setupTestDBForQueueTests(t)
// One expired pending row exists for this key and gets retired.
expectSupersedeExpired(mock, enqWorkspaceID, enqKey, 1)
// With the active set cleared, the insert proceeds (no conflict) → new id.
const freshID = "fresh-tick-id"
expectInsert(mock, freshID)
expectDepth(mock, enqWorkspaceID, 1)
nextRun := time.Now().Add(30 * time.Second)
id, depth, err := EnqueueA2A(
context.Background(), enqWorkspaceID, "", PriorityTask,
[]byte(enqBody), enqMethod, enqKey, &nextRun,
)
if err != nil {
t.Fatalf("EnqueueA2A returned error: %v", err)
}
if id != freshID {
t.Errorf("expected the fresh tick to enqueue with a new id %q, got %q "+
"(an expired row must not swallow the new tick)", freshID, id)
}
if depth != 1 {
t.Errorf("expected depth 1, got %d", depth)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestEnqueueA2A_NoExpiredRow_NormalEnqueue: when no expired row exists the
// supersede UPDATE simply affects zero rows and the enqueue proceeds normally.
func TestEnqueueA2A_NoExpiredRow_NormalEnqueue(t *testing.T) {
mock := setupTestDBForQueueTests(t)
expectSupersedeExpired(mock, enqWorkspaceID, enqKey, 0) // nothing to retire
const newID = "new-id"
expectInsert(mock, newID)
expectDepth(mock, enqWorkspaceID, 2)
nextRun := time.Now().Add(30 * time.Second)
id, depth, err := EnqueueA2A(
context.Background(), enqWorkspaceID, "", PriorityTask,
[]byte(enqBody), enqMethod, enqKey, &nextRun,
)
if err != nil {
t.Fatalf("EnqueueA2A returned error: %v", err)
}
if id != newID {
t.Errorf("expected id %q, got %q", newID, id)
}
if depth != 2 {
t.Errorf("expected depth 2, got %d", depth)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestEnqueueA2A_NoKey_SkipsSupersede: with no idempotency key there is no
// active-set conflict to guard, so the supersede cleanup is skipped entirely
// and only the insert + depth queries run.
func TestEnqueueA2A_NoKey_SkipsSupersede(t *testing.T) {
mock := setupTestDBForQueueTests(t)
// No expectSupersedeExpired — it must NOT be issued when key is empty.
const newID = "no-key-id"
expectInsert(mock, newID)
expectDepth(mock, enqWorkspaceID, 1)
id, _, err := EnqueueA2A(
context.Background(), enqWorkspaceID, "", PriorityTask,
[]byte(enqBody), enqMethod, "", nil,
)
if err != nil {
t.Fatalf("EnqueueA2A returned error: %v", err)
}
if id != newID {
t.Errorf("expected id %q, got %q", newID, id)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
+10 -1
View File
@@ -154,7 +154,16 @@ func (h *ChannelHandler) Create(c *gin.Context) {
}
// #319: encrypt sensitive fields (bot_token, webhook_secret) before
// persisting. Exactly one call here; duplicate removed in this PR.
// persisting so a DB read/backup leak can't recover the credentials.
// Validation above ran against plaintext; storage is ciphertext.
if err := channels.EncryptSensitiveFields(body.Config); err != nil {
log.Printf("Channels: encrypt config failed for workspace %s: %v", workspaceID, err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "encrypt failed"})
return
}
// #319: encrypt sensitive fields (bot_token, webhook_secret) before
// persisting so a DB read/backup leak can't recover the credentials.
// Validation above ran against plaintext; storage is ciphertext.
if err := channels.EncryptSensitiveFields(body.Config); err != nil {
log.Printf("Channels: encrypt config failed for workspace %s: %v", workspaceID, err)
@@ -5,21 +5,16 @@ import (
"context"
"crypto/ed25519"
"crypto/rand"
"database/sql/driver"
"encoding/base64"
"encoding/hex"
"encoding/json"
"errors"
"io"
"net/http"
"net/http/httptest"
"os"
"strings"
"testing"
sqlmock "github.com/DATA-DOG/go-sqlmock"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/channels"
channels_crypto "git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/crypto"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
"github.com/gin-gonic/gin"
)
@@ -171,42 +166,6 @@ func TestChannelHandler_List_InvalidJSON_FallsBack(t *testing.T) {
}
}
func TestChannelHandler_List_RowsErr_LogsError(t *testing.T) {
mock := setupTestDB(t)
handler := NewChannelHandler(newTestChannelManager())
rows := sqlmock.NewRows([]string{
"id", "workspace_id", "channel_type", "channel_config", "enabled",
"allowed_users", "last_message_at", "message_count", "created_at", "updated_at",
}).AddRow(
"ch-1", "ws-1", "telegram",
[]byte(`{"bot_token":"123:ABCDEFGHIJ","chat_id":"-100"}`),
true, []byte(`["user-1"]`), nil, 5, nil, nil,
).RowError(1, errors.New("storage engine fault"))
mock.ExpectQuery("SELECT .* FROM workspace_channels WHERE workspace_id").
WithArgs("ws-1").
WillReturnRows(rows)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request, _ = http.NewRequest("GET", "/workspaces/ws-1/channels", nil)
c.Params = gin.Params{{Key: "id", Value: "ws-1"}}
handler.List(c)
// rows.Err() is non-fatal — the handler logs and still returns the row
// that was successfully scanned before the iteration error.
if w.Code != 200 {
t.Errorf("expected 200, got %d", w.Code)
}
var result []map[string]interface{}
json.Unmarshal(w.Body.Bytes(), &result)
if len(result) != 1 {
t.Fatalf("expected 1 channel despite rows.Err, got %d", len(result))
}
}
// ==================== Create ====================
func TestChannelHandler_Create_Success(t *testing.T) {
@@ -244,66 +203,6 @@ func TestChannelHandler_Create_Success(t *testing.T) {
}
}
// encryptedConfigArg matches INSERT args where bot_token has the ec1: prefix.
type encryptedConfigArg struct{}
func (a encryptedConfigArg) Match(v driver.Value) bool {
s, ok := v.(string)
if !ok {
return false
}
var cfg map[string]interface{}
if err := json.Unmarshal([]byte(s), &cfg); err != nil {
return false
}
token, ok := cfg["bot_token"].(string)
if !ok {
return false
}
// #319: bot_token must be encrypted (ciphertextPrefix "ec1:")
// before persistence, NOT stored plaintext.
return strings.HasPrefix(token, "ec1:")
}
func TestChannelHandler_Create_EncryptsSensitiveFields(t *testing.T) {
// Enable encryption for this test so EncryptSensitiveFields actually transforms.
os.Setenv("SECRETS_ENCRYPTION_KEY", base64.StdEncoding.EncodeToString(make([]byte, 32)))
channels_crypto.ResetForTesting()
channels_crypto.Init()
defer func() {
os.Unsetenv("SECRETS_ENCRYPTION_KEY")
channels_crypto.ResetForTesting()
}()
mock := setupTestDB(t)
handler := NewChannelHandler(newTestChannelManager())
mock.ExpectQuery("INSERT INTO workspace_channels").
WithArgs("ws-1", "telegram", encryptedConfigArg{}, true, sqlmock.AnyArg()).
WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow("new-ch-id"))
// Reload query
mock.ExpectQuery("SELECT .* FROM workspace_channels").
WillReturnRows(sqlmock.NewRows([]string{"id", "workspace_id", "channel_type", "channel_config", "enabled", "allowed_users"}))
body, _ := json.Marshal(map[string]interface{}{
"channel_type": "telegram",
"config": map[string]interface{}{"bot_token": "123456789:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", "chat_id": "-100"},
"allowed_users": []string{"user-1"},
})
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request, _ = http.NewRequest("POST", "/workspaces/ws-1/channels", bytes.NewReader(body))
c.Request.Header.Set("Content-Type", "application/json")
c.Params = gin.Params{{Key: "id", Value: "ws-1"}}
handler.Create(c)
if w.Code != 201 {
t.Errorf("expected 201, got %d: %s", w.Code, w.Body.String())
}
}
func TestChannelHandler_Create_MissingType(t *testing.T) {
handler := NewChannelHandler(newTestChannelManager())
+33 -53
View File
@@ -2,18 +2,15 @@ package handlers
import (
"context"
"crypto/subtle"
"database/sql"
"encoding/json"
"errors"
"log"
"net/http"
"os"
"strings"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/middleware"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/orgtoken"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/provisioner"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/registry"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/wsauth"
@@ -453,58 +450,41 @@ func validateDiscoveryCaller(ctx context.Context, c *gin.Context, workspaceID st
// NEXT_PUBLIC_ADMIN_TOKEN (see scripts/dev-start.sh), so the Details
// tab loads peers with a real credential rather than via fail-open.
// Precedence MUST match middleware.WorkspaceAuth: try the bearer token
// first (admin → org → per-workspace), and only fall back to a verified
// CP-session cookie when no bearer is presented. Keeping the two auth
// surfaces in the same order means a credential that passes one passes
// the other — divergent precedence is how an admin/org bearer ended up
// 401'ing on one surface but not the other.
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
if tok != "" {
// Admin-token fallback — lets the canvas operator (dashboard /
// concierge Settings config tabs) read a workspace's peers with the
// single admin credential, mirroring middleware.WorkspaceAuth.
// Without this the operator's admin bearer fell through to the
// per-workspace ValidateToken below and 401'd for any workspace it
// doesn't personally hold a token for — e.g. the platform agent
// surfaced in the concierge config tabs.
if adminSecret := os.Getenv("ADMIN_TOKEN"); adminSecret != "" &&
subtle.ConstantTimeCompare([]byte(tok), []byte(adminSecret)) == 1 {
return nil
}
// Org-scoped API token — grants access to every workspace in the org
// (same product spec as WorkspaceAuth). Checked before the
// per-workspace token so an org-key presenter doesn't hit the
// narrower failure path.
if _, _, _, err := orgtoken.Validate(ctx, db.DB, tok); err == nil {
return nil
} else if !errors.Is(err, orgtoken.ErrInvalidToken) {
log.Printf("wsauth: discovery orgtoken.Validate(%s): datastore lookup failed (returning 503): %v", workspaceID, err)
c.JSON(http.StatusServiceUnavailable, gin.H{
"error": "platform datastore unavailable — retry shortly",
"code": "platform_unavailable",
})
return err
}
if err := wsauth.ValidateToken(ctx, db.DB, workspaceID, tok); err != nil {
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid workspace auth token"})
return err
}
return nil
}
// No bearer: SaaS-canvas path authenticates via a CP-session cookie.
// VerifiedCPSession returns (valid, presented):
// - (false, false) = no cookie, 401 (missing auth)
// Try session cookie auth first (SaaS canvas path).
// verifiedCPSession returns (valid, presented):
// - (false, false) = no cookie, fall through to bearer
// - (true, true) = valid session, allow
// - (false, true) = cookie presented but invalid, 401
if ok, presented := middleware.VerifiedCPSession(c.GetHeader("Cookie")); presented {
if ok {
return nil
if cookieHeader := c.GetHeader("Cookie"); cookieHeader != "" {
if ok, presented := middleware.VerifiedCPSession(cookieHeader); presented {
if ok {
return nil // session verified, allow
}
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid session"})
return errors.New("invalid session")
}
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid session"})
return errors.New("invalid session")
}
c.JSON(http.StatusUnauthorized, gin.H{"error": "missing workspace auth token"})
return errors.New("missing token")
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
if tok == "" {
// Canvas hits this endpoint via session cookie, not bearer token.
// verifiedCPSession returns (valid, presented):
// - (false, false) = no cookie, 401
// - (true, true) = valid session, allow
// - (false, true) = cookie presented but invalid, 401
if ok, presented := middleware.VerifiedCPSession(c.GetHeader("Cookie")); presented {
if ok {
return nil
}
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid session"})
return errors.New("invalid session")
}
c.JSON(http.StatusUnauthorized, gin.H{"error": "missing workspace auth token"})
return errors.New("missing token")
}
if err := wsauth.ValidateToken(ctx, db.DB, workspaceID, tok); err != nil {
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid workspace auth token"})
return err
}
return nil
}
@@ -277,52 +277,6 @@ func TestPeers_RootWorkspace_NoPeers(t *testing.T) {
}
}
// validateDiscoveryCaller must accept the org ADMIN_TOKEN (the canvas
// operator's credential) even when the workspace has its OWN live token — so
// the concierge config tabs (Details → peers) load for the platform agent,
// which the operator doesn't personally hold a per-workspace token for.
// Regression guard for the 401 the discovery routes returned before the
// admin/org-token fallback was added.
func TestPeers_AdminToken_Allowed(t *testing.T) {
mock := setupTestDB(t)
setupTestRedis(t)
handler := NewDiscoveryHandler()
const adminTok = "test-admin-token"
t.Setenv("ADMIN_TOKEN", adminTok)
// A live token EXISTS for the workspace (grandfather path NOT taken), so a
// valid credential is required. The operator presents ADMIN_TOKEN, not the
// workspace's own per-workspace token.
mock.ExpectQuery("SELECT COUNT.+workspace_auth_tokens").
WithArgs("ws-platform").
WillReturnRows(sqlmock.NewRows([]string{"count"}).AddRow(1))
// After the admin-token fallback allows, Peers runs its lookups (org root).
mock.ExpectQuery("SELECT parent_id FROM workspaces WHERE id =").
WithArgs("ws-platform").
WillReturnRows(sqlmock.NewRows([]string{"parent_id"}).AddRow(nil))
peerCols := []string{"id", "name", "role", "tier", "status", "agent_card", "url", "parent_id", "active_tasks"}
mock.ExpectQuery("SELECT w.id, w.name.*WHERE w.parent_id = \\$1 AND w.id != \\$2").
WithArgs("ws-platform", "ws-platform").
WillReturnRows(sqlmock.NewRows(peerCols))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ws-platform"}}
c.Request = httptest.NewRequest("GET", "/registry/ws-platform/peers", nil)
c.Request.Header.Set("Authorization", "Bearer "+adminTok)
handler.Peers(c)
if w.Code != http.StatusOK {
t.Errorf("admin token should be accepted; expected 200, got %d: %s", w.Code, w.Body.String())
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// ==================== Peers — ?q= filter (#1038) ====================
// peersFilterFixture mocks the 4 SQL reads (parent_id lookup + siblings +
@@ -244,13 +244,13 @@ func TestWorkspaceList_WithData(t *testing.T) {
"last_error_rate", "last_sample_error",
"uptime_seconds", "current_task", "runtime", "workspace_dir", "x", "y", "collapsed",
"budget_limit", "monthly_spend",
"broadcast_enabled", "talk_to_user_enabled", "compute", "kind",
"broadcast_enabled", "talk_to_user_enabled", "compute",
}
rows := sqlmock.NewRows(columns).
AddRow("ws-1", "Agent One", "worker", 1, "online", []byte(`{"name":"agent1"}`), "http://localhost:8001",
nil, 3, 1, 0.02, "", 7200, "processing", "claude-code", "", 10.0, 20.0, false, nil, int64(0), false, true, []byte(`{}`), "workspace").
nil, 3, 1, 0.02, "", 7200, "processing", "claude-code", "", 10.0, 20.0, false, nil, int64(0), false, true, []byte(`{}`)).
AddRow("ws-2", "Agent Two", "", 2, "degraded", []byte("null"), "",
nil, 0, 1, 0.6, "timeout", 100, "", "claude-code", "", 50.0, 60.0, true, nil, int64(0), false, true, []byte(`{}`), "workspace")
nil, 0, 1, 0.6, "timeout", 100, "", "claude-code", "", 50.0, 60.0, true, nil, int64(0), false, true, []byte(`{}`))
mock.ExpectQuery("SELECT w.id, w.name").
WillReturnRows(rows)
@@ -533,13 +533,13 @@ func TestWorkspaceList(t *testing.T) {
"last_error_rate", "last_sample_error",
"uptime_seconds", "current_task", "runtime", "workspace_dir", "x", "y", "collapsed",
"budget_limit", "monthly_spend",
"broadcast_enabled", "talk_to_user_enabled", "compute", "kind",
"broadcast_enabled", "talk_to_user_enabled", "compute",
}
rows := sqlmock.NewRows(columns).
AddRow("ws-1", "Agent One", "worker", 1, "online", []byte("null"), "http://localhost:8001",
nil, 0, 1, 0.0, "", 100, "", "claude-code", "", 10.0, 20.0, false, nil, int64(0), false, true, []byte(`{}`), "workspace").
nil, 0, 1, 0.0, "", 100, "", "claude-code", "", 10.0, 20.0, false, nil, int64(0), false, true, []byte(`{}`)).
AddRow("ws-2", "Agent Two", "manager", 2, "provisioning", []byte("null"), "",
nil, 0, 1, 0.0, "", 0, "", "claude-code", "", 50.0, 60.0, false, nil, int64(0), false, true, []byte(`{}`), "workspace")
nil, 0, 1, 0.0, "", 0, "", "claude-code", "", 50.0, 60.0, false, nil, int64(0), false, true, []byte(`{}`))
mock.ExpectQuery("SELECT w.id, w.name").
WillReturnRows(rows)
@@ -1253,14 +1253,14 @@ func TestWorkspaceGet_CurrentTask(t *testing.T) {
"parent_id", "active_tasks", "max_concurrent_tasks", "last_error_rate", "last_sample_error",
"uptime_seconds", "current_task", "runtime", "workspace_dir", "x", "y", "collapsed",
"budget_limit", "monthly_spend",
"broadcast_enabled", "talk_to_user_enabled", "compute", "kind",
"broadcast_enabled", "talk_to_user_enabled", "compute",
}
mock.ExpectQuery("SELECT w.id, w.name").
WithArgs("dddddddd-0004-0000-0000-000000000000").
WillReturnRows(sqlmock.NewRows(columns).AddRow(
"dddddddd-0004-0000-0000-000000000000", "Task Worker", "worker", 1, "online", []byte("null"), "http://localhost:9000",
nil, 2, 1, 0.0, "", 300, "Analyzing document", "claude-code", "", 10.0, 20.0, false,
nil, int64(0), false, true, []byte(`{}`), "workspace",
nil, int64(0), false, true, []byte(`{}`),
))
w := httptest.NewRecorder()
@@ -14,17 +14,13 @@
//
// Why this is NOT a sqlmock test
// ------------------------------
// Two DB-level invariants back the platform agent:
// - "a platform agent must be the org root (parent_id IS NULL)" — the
// workspaces_platform_root_check CHECK in migration 20260606000000.
// - "at most one platform agent per org" — the partial unique index
// uniq_workspaces_one_platform_root in migration 20260607000000. The CHECK
// does NOT bound the count (it permits multiple parentless platform rows);
// the unique index does. This closes a privilege-escalation path (a rogue
// second org root getting the org-admin token at provision time).
// sqlmock cannot execute DDL or evaluate these, so only a real Postgres can
// prove they fire. The Register handler's isPlatformRootViolation()/409 path
// depends on both constraints.
// The invariant "a platform agent must be the org root (parent_id IS NULL),
// which structurally also means at most one platform agent per org" is enforced
// by the workspaces_platform_root_check CHECK constraint in migration
// 20260606000000_workspaces_kind. sqlmock cannot execute DDL or evaluate a CHECK
// constraint, so only a real Postgres can prove the constraint actually rejects
// a non-root platform agent and accepts a root one. The Register handler's
// isPlatformRootViolation()/409 path depends on this constraint firing.
package handlers
@@ -124,64 +120,3 @@ func TestIntegration_PlatformKind_RootAllowed_NonRootRejected(t *testing.T) {
t.Fatalf("unknown kind wanted workspaces_kind_check rejection, got: %v", err)
}
}
// TestIntegration_PlatformKind_SecondRootRejected proves the privilege-escalation
// fix at the DB level: the workspaces_platform_root_check CHECK alone permits
// MULTIPLE parentless platform rows; the partial unique index
// uniq_workspaces_one_platform_root (migration 20260607000000) forbids a SECOND
// platform root. Without it, an ordinary in-VPC workspace could register a fresh
// UUID as kind='platform' and mint itself a second org root that then gets the
// org-admin token at provision time. This is what the per-row CHECK could not
// stop — only a real Postgres with the unique index proves it.
func TestIntegration_PlatformKind_SecondRootRejected(t *testing.T) {
conn := integrationDB_PlatformKind(t)
ctx := context.Background()
prefix := fmt.Sprintf("itest-2root-%s", uuid.New().String()[:8])
cleanup := func() {
if _, err := conn.ExecContext(ctx,
`DELETE FROM workspaces WHERE name LIKE $1`, prefix+"%"); err != nil {
t.Logf("cleanup (non-fatal): %v", err)
}
}
t.Cleanup(cleanup)
cleanup()
// NOTE: the shared integration DB is single-org by construction, but a stray
// platform row from another suite would make the FIRST insert below collide
// instead of the second. Guard by asserting we start from zero platform rows
// for our prefix and using a savepoint-free, prefix-scoped check.
first := uuid.New().String()
second := uuid.New().String()
// First parentless platform root: allowed.
if _, err := conn.ExecContext(ctx, `
INSERT INTO workspaces (id, name, kind, tier, runtime, status, parent_id)
VALUES ($1, $2, 'platform', 0, 'claude-code', 'online', NULL)
`, first, prefix+"-first"); err != nil {
// If this fails on the unique index, another platform root already exists
// in the shared DB — skip rather than false-fail this isolation-sensitive case.
if strings.Contains(err.Error(), "uniq_workspaces_one_platform_root") {
t.Skipf("shared integration DB already has a platform root; cannot isolate: %v", err)
}
t.Fatalf("first platform root insert: unexpected error: %v", err)
}
// Second parentless platform root: the per-row CHECK is satisfied
// (parent_id IS NULL), so ONLY the unique index can reject it.
_, err := conn.ExecContext(ctx, `
INSERT INTO workspaces (id, name, kind, tier, runtime, status, parent_id)
VALUES ($1, $2, 'platform', 0, 'claude-code', 'online', NULL)
`, second, prefix+"-second")
if err == nil {
t.Fatalf("second platform root accepted — uniq_workspaces_one_platform_root did not fire (privilege-escalation guard missing)")
}
if !strings.Contains(err.Error(), "uniq_workspaces_one_platform_root") {
t.Fatalf("second platform root rejection wanted uniq_workspaces_one_platform_root, got: %v", err)
}
// And isPlatformRootViolation maps it to the friendly 409 surface.
if !isPlatformRootViolation(err) {
t.Fatalf("isPlatformRootViolation should classify the unique-index violation as a platform-root 409, got false for: %v", err)
}
}
@@ -131,30 +131,6 @@ type BillingModeResolution struct {
ProviderSelection *string `json:"provider_selection"`
}
// defaultClosedBillingMode is the mode the resolver falls back to when it
// cannot DERIVE a provider (no model, unknown runtime, unregistered/ambiguous
// model, registry-load failure, or the pre-provision empty-id path).
//
// Historically this was an UNCONDITIONAL platform_managed ("unset → platform
// default", CTO 2026-05-27). That is correct on SaaS: an undecided workspace
// bills the platform proxy. But on a SELF-HOSTED stack there IS no Molecule
// proxy and no credit ledger (PlatformManagedProxyConfigured() == false), so a
// platform_managed default is unreachable — the provision path would inject no
// usable credential and fail closed (MISSING_PLATFORM_PROXY). On self-host the
// honest default is byok: the tenant must bring their own provider key, and the
// resolved mode should say so rather than advertise an impossible mode.
//
// Strictly gated on the no-proxy condition: when a proxy IS configured (SaaS),
// this returns platform_managed exactly as before — SaaS behavior is unchanged.
// This only changes the FALLBACK; an explicit operator override and a
// successfully-derived provider are decided before this is ever consulted.
func defaultClosedBillingMode() string {
if PlatformManagedProxyConfigured() {
return LLMBillingModePlatformManaged
}
return LLMBillingModeBYOK
}
// isKnownBillingMode is the enum-recognizer for the resolver's default-closed
// branch. Returning false for an unknown string forces the resolver to fall
// through to the next layer (or the constant fallback) — NEVER to honor a
@@ -236,7 +212,7 @@ func ResolveLLMBillingModeDerived(ctx context.Context, workspaceID, runtime, mod
// the no-id path historically does no DB work and the strip gate only runs
// post-create, so keep it a pure default to preserve that contract.)
if workspaceID == "" {
res.ResolvedMode = defaultClosedBillingMode()
res.ResolvedMode = LLMBillingModePlatformManaged
res.Source = BillingModeSourceDerivedDefault
return res, nil
}
@@ -259,8 +235,8 @@ func ResolveLLMBillingModeDerived(ctx context.Context, workspaceID, runtime, mod
manifest, mErr := providerRegistry()
if mErr != nil || manifest == nil {
// Registry unavailable (malformed embedded YAML — a build-time defect the
// gates catch). Default closed (byok on self-host where no proxy exists).
res.ResolvedMode = defaultClosedBillingMode()
// gates catch). Default closed.
res.ResolvedMode = LLMBillingModePlatformManaged
res.Source = BillingModeSourceDerivedDefault
return res, mErr
}
@@ -270,10 +246,8 @@ func ResolveLLMBillingModeDerived(ctx context.Context, workspaceID, runtime, mod
// NOT an error to the caller: an unregistered model is a legitimate
// "we can't say it's BYOK, so bill the platform default" outcome, and the
// only-registered gate at the create/config API is where an unregistered
// model is rejected loudly. Here we just fail closed for safety. On a
// self-hosted stack (no proxy configured) the safe default is byok, since
// platform_managed is unreachable there.
res.ResolvedMode = defaultClosedBillingMode()
// model is rejected loudly. Here we just fail closed for safety.
res.ResolvedMode = LLMBillingModePlatformManaged
res.Source = BillingModeSourceDerivedDefault
sel := model
if sel != "" {
@@ -36,18 +36,7 @@ func expectOverrideQuery(m sqlmock.Sqlmock, wsID, value string) {
WillReturnRows(rows)
}
// withProxyConfigured sets the Molecule LLM proxy env (base URL + usage token)
// for the duration of a test so PlatformManagedProxyConfigured() is true — i.e.
// the SaaS context, where the default-closed billing mode is platform_managed.
// Self-host (no proxy env) is covered separately by the *_SelfHost tests.
func withProxyConfigured(t *testing.T) {
t.Helper()
t.Setenv("MOLECULE_LLM_BASE_URL", "https://proxy.example/v1")
t.Setenv("MOLECULE_LLM_USAGE_TOKEN", "tok-test")
}
func TestResolveLLMBillingModeDerived_BehaviorDelta(t *testing.T) {
withProxyConfigured(t) // SaaS context: default-closed → platform_managed.
ctx := context.Background()
const wsID = "33333333-3333-3333-3333-333333333333"
@@ -204,9 +193,6 @@ func TestResolveLLMBillingModeDerived_BehaviorDelta(t *testing.T) {
// error reading the override column defaults closed to platform_managed and
// propagates the error — never silently flips a workspace off platform creds.
func TestResolveLLMBillingModeDerived_OverrideDBError_DefaultClosed(t *testing.T) {
// A transient DB error MUST default to platform_managed regardless of proxy
// config (it propagates an error; it is not the no-proxy decision path).
withProxyConfigured(t)
ctx := context.Background()
const wsID = "44444444-4444-4444-4444-444444444444"
@@ -231,7 +217,6 @@ func TestResolveLLMBillingModeDerived_OverrideDBError_DefaultClosed(t *testing.T
// pre-provision context (no workspace id, no override read) defaults to
// platform_managed without a DB query.
func TestResolveLLMBillingModeDerived_EmptyWorkspaceID_PlatformDefault(t *testing.T) {
withProxyConfigured(t) // SaaS context.
ctx := context.Background()
mock := setupTestDB(t) // no query expected
res, err := ResolveLLMBillingModeDerived(ctx, "", "claude-code", "kimi-for-coding", nil)
@@ -245,90 +230,3 @@ func TestResolveLLMBillingModeDerived_EmptyWorkspaceID_PlatformDefault(t *testin
t.Errorf("sqlmock expectations: %v", err)
}
}
// TestResolveLLMBillingModeDerived_SelfHost_DefaultsBYOK asserts the
// environment-aware default: on a SELF-HOSTED stack (no Molecule LLM proxy env
// configured) the default-closed branches resolve to byok instead of
// platform_managed (which is unreachable there). It covers all three derive-
// failure fallbacks: unset model, unregistered model, and the empty-workspace
// pre-provision path. A successfully-DERIVED provider and an explicit override
// are NOT affected by the no-proxy default (decided before the fallback).
func TestResolveLLMBillingModeDerived_SelfHost_DefaultsBYOK(t *testing.T) {
// Ensure no proxy env leaks in from the host.
t.Setenv("MOLECULE_LLM_BASE_URL", "")
t.Setenv("MOLECULE_LLM_USAGE_TOKEN", "")
t.Setenv("OPENAI_BASE_URL", "")
t.Setenv("OPENAI_API_KEY", "")
ctx := context.Background()
const wsID = "55555555-5555-5555-5555-555555555555"
t.Run("unset_model_defaults_byok_on_selfhost", func(t *testing.T) {
mock := setupTestDB(t)
expectOverrideQuery(mock, wsID, "") // NULL override
res, err := ResolveLLMBillingModeDerived(ctx, wsID, "claude-code", "", nil)
if err != nil {
t.Fatalf("unexpected err: %v", err)
}
if res.ResolvedMode != LLMBillingModeBYOK {
t.Errorf("self-host unset model: got %q want byok", res.ResolvedMode)
}
if res.Source != BillingModeSourceDerivedDefault {
t.Errorf("source: got %q want %q", res.Source, BillingModeSourceDerivedDefault)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
})
t.Run("unregistered_model_defaults_byok_on_selfhost", func(t *testing.T) {
mock := setupTestDB(t)
expectOverrideQuery(mock, wsID, "")
res, err := ResolveLLMBillingModeDerived(ctx, wsID, "claude-code", "totally-made-up-model-xyz", nil)
if err != nil {
t.Fatalf("unexpected err: %v", err)
}
if res.ResolvedMode != LLMBillingModeBYOK {
t.Errorf("self-host unregistered model: got %q want byok", res.ResolvedMode)
}
if res.Source != BillingModeSourceDerivedDefault {
t.Errorf("source: got %q want %q", res.Source, BillingModeSourceDerivedDefault)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
})
t.Run("empty_workspace_id_defaults_byok_on_selfhost", func(t *testing.T) {
mock := setupTestDB(t) // no query expected (pre-provision path)
res, err := ResolveLLMBillingModeDerived(ctx, "", "claude-code", "kimi-for-coding", nil)
if err != nil {
t.Fatalf("unexpected err: %v", err)
}
if res.ResolvedMode != LLMBillingModeBYOK {
t.Errorf("self-host empty workspace id: got %q want byok", res.ResolvedMode)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
})
t.Run("explicit_platform_override_still_wins_on_selfhost", func(t *testing.T) {
// An operator override is honored even on self-host (escape hatch); the
// no-proxy default only governs the derive-failure fallback.
mock := setupTestDB(t)
expectOverrideQuery(mock, wsID, LLMBillingModePlatformManaged)
res, err := ResolveLLMBillingModeDerived(ctx, wsID, "claude-code", "", nil)
if err != nil {
t.Fatalf("unexpected err: %v", err)
}
if res.ResolvedMode != LLMBillingModePlatformManaged {
t.Errorf("explicit override must win: got %q want platform_managed", res.ResolvedMode)
}
if res.Source != BillingModeSourceWorkspaceOverride {
t.Errorf("source: got %q want %q", res.Source, BillingModeSourceWorkspaceOverride)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
})
}
@@ -145,7 +145,6 @@ func TestPutWorkspaceLLMBillingMode_SetByok(t *testing.T) {
func TestPutWorkspaceLLMBillingMode_ExplicitNullClearsOverride(t *testing.T) {
t.Setenv("MOLECULE_LLM_BILLING_MODE", LLMBillingModePlatformManaged)
withProxyConfigured(t) // SaaS context: cleared override → derived_default → platform_managed.
mock := setupTestDB(t)
mock.ExpectExec(`UPDATE workspaces SET llm_billing_mode = NULL WHERE id = \$1`).
WithArgs(testWSID).
@@ -173,7 +173,6 @@ func TestApplyPlatformManagedLLMEnv_ReadProvisionParity(t *testing.T) {
// This mirrors the agents-team genuinely-platform case. Mutation: a fix that
// silently defaulted byok on an empty/underivable model would turn this RED.
func TestApplyPlatformManagedLLMEnv_DefaultPreservation(t *testing.T) {
withProxyConfigured(t) // SaaS context: no-model default stays platform_managed.
ctx := context.Background()
const wsID = "11111111-2222-3333-4444-555555555555"
@@ -46,7 +46,6 @@ func expectLegacyShimQueries(m sqlmock.Sqlmock, wsID, runtime, model string) {
}
func TestResolveLLMBillingMode_LegacyShimDerives(t *testing.T) {
withProxyConfigured(t) // SaaS context: default-closed → platform_managed.
ctx := context.Background()
const wsID = "11111111-1111-1111-1111-111111111111"
@@ -164,7 +163,6 @@ func TestResolveLLMBillingMode_LegacyShimDerives(t *testing.T) {
// (no workspace id) defaults closed with no DB read (org rung retired, so the
// old "org_only" behavior is gone — it's now the platform default).
func TestResolveLLMBillingMode_EmptyWorkspaceID_PlatformDefault(t *testing.T) {
withProxyConfigured(t) // SaaS context.
ctx := context.Background()
mock := setupTestDB(t) // no DB read expected
res, err := ResolveLLMBillingMode(ctx, "", LLMBillingModeBYOK)
@@ -184,7 +182,6 @@ func TestResolveLLMBillingMode_EmptyWorkspaceID_PlatformDefault(t *testing.T) {
// values. The strip gate downstream relies on this so it can switch on
// res.ResolvedMode without a separate is-valid check on every call site.
func TestResolveLLMBillingMode_ResolvedModeIsAlwaysValid(t *testing.T) {
withProxyConfigured(t) // SaaS context: default-closed → platform_managed.
ctx := context.Background()
const wsID = "22222222-2222-2222-2222-222222222222"

Some files were not shown because too many files have changed in this diff Show More