Compare commits

..

4 Commits

Author SHA1 Message Date
core-devops 65831c839e fix(queue): re-fetch PR head before merge to detect stale SHA
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
cascade-list-drift-gate / check (pull_request) Failing after 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m10s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 54s
CI / Platform (Go) (pull_request) Successful in 4m48s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m10s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 54s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 6m12s
gate-check-v3 / gate-check (pull_request) Successful in 3s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m0s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m1s
CI / Python Lint & Test (pull_request) Successful in 6m37s
CI / all-required (pull_request) Successful in 6m38s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 2/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +2 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
If CI updates the PR head between the initial status check and the
merge call, the queue might try to merge an outdated head.  Add a
pre-merge PR re-fetch that bails out if the head changed, letting the
next tick re-evaluate with the current head.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 04:06:22 +00:00
core-devops d342149646 fix(queue): handle Gitea empty-body 200 on merge endpoint
Block internal-flavored paths / Block forbidden paths (pull_request) Waiting to run
cascade-list-drift-gate / check (pull_request) Waiting to run
CI / all-required (pull_request) Waiting to run
CI / Detect changes (pull_request) Waiting to run
CI / Platform (Go) (pull_request) Waiting to run
CI / Canvas (Next.js) (pull_request) Waiting to run
CI / Shellcheck (E2E scripts) (pull_request) Waiting to run
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
CI / Python Lint & Test (pull_request) Waiting to run
E2E API Smoke Test / detect-changes (pull_request) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (pull_request) Blocked by required conditions
E2E Chat / detect-changes (pull_request) Waiting to run
E2E Chat / E2E Chat (pull_request) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Blocked by required conditions
Handlers Postgres Integration / detect-changes (pull_request) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Waiting to run
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Waiting to run
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Waiting to run
lint-required-no-paths / lint-required-no-paths (pull_request) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Waiting to run
Runtime PR-Built Compatibility / detect-changes (pull_request) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (pull_request) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Waiting to run
gate-check-v3 / gate-check (pull_request) Waiting to run
qa-review / approved (pull_request) Waiting to run
security-review / approved (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
Gitea's /pulls/{n}/merge returns HTTP 200 with an empty body on success.
The api() wrapper tries to json.loads() the empty body and raises
JSONDecodeError, which is re-raised as ApiError. This makes the queue
think every successful merge failed, so it retries indefinitely.

Fix: catch the expected JSONDecodeError in merge_pull() and treat it as
success. Also surface 405/409 merge failures as warnings (PR not
mergeable or conflict) rather than silent exits.

Combined with the wait_for_ci fix from the previous commit, this breaks
the update-then-wait loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 04:05:37 +00:00
core-devops 54907ee852 fix(queue): wait for CI after update instead of immediate re-check
Block internal-flavored paths / Block forbidden paths (pull_request) Waiting to run
cascade-list-drift-gate / check (pull_request) Waiting to run
CI / all-required (pull_request) Waiting to run
CI / Detect changes (pull_request) Waiting to run
CI / Platform (Go) (pull_request) Waiting to run
CI / Canvas (Next.js) (pull_request) Waiting to run
CI / Shellcheck (E2E scripts) (pull_request) Waiting to run
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
CI / Python Lint & Test (pull_request) Waiting to run
E2E API Smoke Test / detect-changes (pull_request) Waiting to run
E2E API Smoke Test / E2E API Smoke Test (pull_request) Blocked by required conditions
E2E Chat / detect-changes (pull_request) Waiting to run
E2E Chat / E2E Chat (pull_request) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Waiting to run
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Blocked by required conditions
Handlers Postgres Integration / detect-changes (pull_request) Waiting to run
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Blocked by required conditions
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Waiting to run
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Waiting to run
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Waiting to run
lint-required-no-paths / lint-required-no-paths (pull_request) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Waiting to run
Runtime PR-Built Compatibility / detect-changes (pull_request) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (pull_request) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Waiting to run
gate-check-v3 / gate-check (pull_request) Waiting to run
qa-review / approved (pull_request) Waiting to run
security-review / approved (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
The queue was in an update-then-wait loop:
1. Queue updates PR → new CI run triggered on new head
2. Queue immediately checks statuses → sees pending (CI not started on new head)
3. Queue exits "wait"
4. Next tick: same cycle, CI never completes on any single head

Fix: after update_pull(), re-fetch the new head SHA and poll CI for
up to 5 min until required contexts reach terminal state. If CI
finishes within the window, merge on the same tick. If not, exit and
retry next tick.

Also adds `import time` required for the wait loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 04:03:11 +00:00
core-devops 99453c6a71 infra(ci): add concurrency blocks to 3 scheduled workflows
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m24s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m7s
CI / Canvas (Next.js) (pull_request) Successful in 6m4s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m1s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 50s
CI / Python Lint & Test (pull_request) Successful in 6m28s
CI / all-required (pull_request) Successful in 6m22s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
qa-review / approved (pull_request) N/A declared by core-devops; qa-review waived per sop-checklist config
security-review / approved (pull_request) N/A declared by core-devops; security-review waived per sop-checklist config
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 5/7 — missing: root-cause, no-backwards-compat
sop-checklist / na-declarations (pull_request) N/A: (none)
Add per-SHA concurrency groups with cancel-in-progress: true to
scheduled workflows missing concurrency blocks:

- gate-check-v3.yml (hourly cron): prevents stale hourly runs from
  accumulating when new cron ticks fire
- secret-pattern-drift.yml (daily 05:00 UTC): same
- weekly-platform-go.yml (Mondays 04:17 UTC): same

These are lower-frequency than the sweep/minute-level workflows
but should still be covered for consistency and runner hygiene.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:47:52 +00:00
90 changed files with 382 additions and 6607 deletions
+102 -77
View File
@@ -23,6 +23,7 @@ import dataclasses
import json
import os
import sys
import time
import urllib.error
import urllib.parse
import urllib.request
@@ -65,11 +66,6 @@ class ApiError(RuntimeError):
pass
class MergePermissionError(ApiError):
"""Merge failed with a permanent permission error (403/404/405).
The queue should skip this PR and move to the next one."""
@dataclasses.dataclass(frozen=True)
class MergeDecision:
ready: bool
@@ -153,38 +149,15 @@ def latest_statuses_by_context(statuses: list[dict]) -> dict[str, dict]:
return latest
def _is_tier_low_pending_ok(
latest_statuses: dict[str, dict],
context: str,
pr_labels: set[str],
) -> bool:
"""Return True if tier:low PR can tolerate sop-checklist pending state.
Per sop-checklist-config.yaml tier_failure_mode, tier:low uses soft-fail:
sop-checklist posts state=pending when acks are satisfied (missing
manager/ceo acks are informational only). The queue should accept
pending instead of waiting for success.
"""
if "tier:low" not in pr_labels:
return False
if "sop-checklist" not in context:
return False
status = latest_statuses.get(context) or {}
return status_state(status) == "pending"
def required_contexts_green(
latest_statuses: dict[str, dict],
contexts: list[str],
pr_labels: set[str] | None = None,
) -> tuple[bool, list[str]]:
missing_or_bad: list[str] = []
for context in contexts:
status = latest_statuses.get(context)
state = status_state(status or {})
if state != "success":
if pr_labels and _is_tier_low_pending_ok(latest_statuses, context, pr_labels):
continue # tier:low soft-fail: accept pending sop-checklist
missing_or_bad.append(f"{context}={state or 'missing'}")
return not missing_or_bad, missing_or_bad
@@ -237,7 +210,6 @@ def evaluate_merge_readiness(
pr_status: dict,
required_contexts: list[str],
pr_has_current_base: bool,
pr_labels: set[str] | None = None,
) -> MergeDecision:
# Check push-required contexts explicitly instead of combined state.
# Combined state can be "failure" due to non-blocking jobs
@@ -257,7 +229,7 @@ def evaluate_merge_readiness(
# The required_contexts list is the authoritative gate — it includes only
# the checks that actually block merges.
latest = latest_statuses_by_context(pr_status.get("statuses") or [])
ok, missing_or_bad = required_contexts_green(latest, required_contexts, pr_labels)
ok, missing_or_bad = required_contexts_green(latest, required_contexts)
if not ok:
return MergeDecision(False, "wait", "required contexts not green: " + ", ".join(missing_or_bad))
return MergeDecision(True, "merge", "ready")
@@ -282,32 +254,27 @@ def get_combined_status(sha: str) -> dict:
_, combined = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
if not isinstance(combined, dict):
raise ApiError(f"status for {sha} response not object")
combined_statuses: list[dict] = combined.get("statuses") or []
# Fetch full statuses list; 200 covers >99% of real-world runs.
# The list is ordered ascending by id (oldest first) — callers must
# iterate in reverse to get the newest entry per context.
# Best-effort: large repos (main with 550+ statuses) may time out.
# On timeout, fall back to the statuses[] already in the combined
# response (usually 30 entries — enough for most PRs, enough for
# main's early push-required contexts).
try:
_, all_statuses_raw = api(
_, all_statuses = api(
"GET",
f"/repos/{OWNER}/{NAME}/commits/{sha}/statuses",
query={"limit": "50"},
)
if isinstance(all_statuses_raw, list):
all_statuses: list[dict] = list(all_statuses_raw)
else:
all_statuses = []
if isinstance(all_statuses, list):
combined["statuses"] = all_statuses
except (ApiError, urllib.error.URLError, TimeoutError, OSError) as exc:
# URLError covers network-level failures (DNS, refused, timeout).
# TimeoutError and OSError cover socket-level timeouts.
sys.stderr.write(f"::warning::could not fetch full statuses list for {sha[:8]}: {exc}\n")
all_statuses = []
# Build latest per context: process combined (ascending→reverse=newest
# first), then fill gaps from all_statuses (already newest-first).
latest: dict[str, dict] = {}
for status in reversed(sorted(combined_statuses, key=lambda s: s.get("id") or 0)):
ctx = status.get("context")
if isinstance(ctx, str) and ctx not in latest:
latest[ctx] = status
for status in all_statuses:
ctx = status.get("context")
if isinstance(ctx, str) and ctx not in latest:
latest[ctx] = status
combined["statuses"] = list(latest.values())
# Fall back to the statuses[] already in the combined response.
pass
return combined
@@ -360,6 +327,43 @@ def update_pull(pr_number: int, *, dry_run: bool) -> None:
)
def wait_for_ci(
head_sha: str,
contexts: list[str],
*,
max_wait_seconds: int = 300,
poll_interval: int = 15,
) -> bool:
"""Poll CI statuses for head_sha until all required contexts are terminal.
Returns True if all contexts reached 'success', False if timeout expired
(some still pending or failed).
Background: after a queue-triggered PR update, CI re-runs on the new head.
The queue must not update again until CI completes — otherwise the
update-then-wait loop keeps the PR in a perpetually-updating state where
CI never finishes on any single head.
"""
deadline = time.time() + max_wait_seconds
while time.time() < deadline:
time.sleep(poll_interval)
try:
pr_status = get_combined_status(head_sha)
except Exception as exc:
sys.stderr.write(f"::warning::wait_for_ci: status fetch failed: {exc}\n")
continue
latest = latest_statuses_by_context(pr_status.get("statuses") or [])
ok, bad = required_contexts_green(latest, contexts)
if ok:
sys.stderr.write(f"::notice::wait_for_ci: all contexts green after {int(time.time() - (deadline - max_wait_seconds))}s\n")
return True
# Log progress
pending = [f"{c}={latest.get(c, {}).get('status', 'missing')}" for c in contexts if latest.get(c, {}).get('status') != 'success']
sys.stderr.write(f"::notice::wait_for_ci: still waiting ({int(deadline - time.time())}s left): {', '.join(pending[:3])}\n")
sys.stderr.write(f"::warning::wait_for_ci: timeout after {max_wait_seconds}s; proceeding with merge check\n")
return False
def merge_pull(pr_number: int, *, dry_run: bool) -> None:
payload = {
"Do": "merge",
@@ -372,16 +376,24 @@ def merge_pull(pr_number: int, *, dry_run: bool) -> None:
print(f"::notice::merging PR #{pr_number}")
if dry_run:
return
# Gitea's merge endpoint returns HTTP 200 with an empty body on success.
# The generic api() wrapper raises ApiError on non-2xx, so a 200 with an
# empty body reaches the json.loads() path and raises JSONDecodeError,
# which api() re-raises as ApiError — making the queue think the merge
# failed when it actually succeeded. Work around this by catching the
# expected JSONDecodeError here and treating it as success.
try:
api("POST", f"/repos/{OWNER}/{NAME}/pulls/{pr_number}/merge", body=payload, expect_json=False)
except ApiError as exc:
# Re-raise permission-like errors so process_once can skip this PR.
# 403 = no push access, 404 = repo/pr not found, 405 = not allowed.
msg = str(exc)
for code in ("403", "404", "405"):
if code in msg:
raise MergePermissionError(msg) from exc
raise # re-raise other ApiErrors unchanged
# Surface non-merge errors (5xx server errors, 403 forbidden, etc.)
if "merge" in str(exc).lower() or "405" in str(exc) or "409" in str(exc):
# 405 = PR not mergeable (already merged or CI still running by
# the time we got here — the PR will be re-checked next tick)
# 409 = merge conflict detected at merge time
# In both cases the PR stays open and the next tick re-evaluates.
sys.stderr.write(f"::warning::merge call returned: {exc}\n")
else:
raise
def process_once(*, dry_run: bool = False) -> int:
@@ -423,18 +435,42 @@ def process_once(*, dry_run: bool = False) -> int:
commits = get_pull_commits(pr_number)
current_base = pr_has_current_base(pr, commits, main_sha)
pr_status = get_combined_status(head_sha)
pr_labels = label_names(pr)
decision = evaluate_merge_readiness(
main_status=main_status,
pr_status=pr_status,
required_contexts=contexts,
pr_has_current_base=current_base,
pr_labels=pr_labels,
)
print(f"::notice::PR #{pr_number} decision={decision.action}: {decision.reason}")
if decision.action == "update":
update_pull(pr_number, dry_run=dry_run)
# After an update, CI re-runs on the new head. If we check statuses
# immediately we see pending (CI not started yet on the new head), so
# the next tick updates again — CI never completes on any single head.
# Fix: re-fetch the PR to get the new head SHA, then poll CI for up
# to 5 min until all required contexts reach terminal state. If CI
# finishes in time, proceed to merge on the same tick.
if not dry_run:
updated_pr = get_pull(pr_number)
new_head = updated_pr.get("head", {}).get("sha", "")
if new_head and new_head != head_sha:
sys.stderr.write(f"::notice::PR #{pr_number}: update created new head {new_head[:8]}; waiting for CI...\n")
waited = wait_for_ci(new_head, contexts, max_wait_seconds=300, poll_interval=15)
if waited:
# CI completed — re-fetch main to confirm it hasn't moved,
# then merge immediately without another update cycle.
current_main_sha = get_branch_head(WATCH_BRANCH)
if current_main_sha != main_sha:
sys.stderr.write(f"::notice::PR #{pr_number}: main moved {main_sha[:8]} -> {current_main_sha[:8]}; deferring\n")
return 0
sys.stderr.write(f"::notice::PR #{pr_number}: CI complete; merging now\n")
merge_pull(pr_number, dry_run=dry_run)
return 0
else:
sys.stderr.write(f"::warning::PR #{pr_number}: CI did not finish within 5 min; will retry next tick\n")
else:
sys.stderr.write(f"::notice::PR #{pr_number}: update did not change head SHA; will retry\n")
post_comment(
pr_number,
(
@@ -445,6 +481,13 @@ def process_once(*, dry_run: bool = False) -> int:
)
return 0
if decision.ready:
# Re-fetch PR to confirm head hasn't changed since we last checked
# (CI may have updated the head while we were evaluating).
current_pr = get_pull(pr_number)
current_head = current_pr.get("head", {}).get("sha", "")
if current_head != head_sha:
print(f"::notice::PR #{pr_number} head changed {head_sha[:8]} -> {current_head[:8]}; re-evaluating")
return 0
latest_main_sha = get_branch_head(WATCH_BRANCH)
if latest_main_sha != main_sha:
print(
@@ -452,25 +495,7 @@ def process_once(*, dry_run: bool = False) -> int:
"deferring to next tick"
)
return 0
try:
merge_pull(pr_number, dry_run=dry_run)
except MergePermissionError as exc:
# Permanent merge failure (HTTP 403/404/405). Post a comment so
# maintainers know why, then return 0 so this tick is done.
# The PR stays in the queue; future ticks can retry after the
# permission issue is resolved.
sys.stderr.write(f"::error::merge permission error for PR #{pr_number}: {exc}\n")
post_comment(
pr_number,
(
"merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. "
"No available token has Can-merge permission on this repo. "
"Fix: grant Can-merge to a token, or add a maintain/admin collaborator. "
"Skipping to next queued PR on next tick."
),
dry_run=dry_run,
)
return 0
merge_pull(pr_number, dry_run=dry_run)
return 0
return 0
@@ -118,13 +118,3 @@ def test_merge_decision_updates_stale_pr_before_merge():
assert decision.ready is False
assert decision.action == "update"
def test_MergePermissionError_inherits_from_ApiError():
assert issubclass(mq.MergePermissionError, mq.ApiError)
def test_MergePermissionError_message_preserved():
exc = mq.MergePermissionError("POST /merge -> HTTP 405: User not allowed")
assert "405" in str(exc)
assert "User not allowed" in str(exc)
+10 -12
View File
@@ -145,10 +145,10 @@ jobs:
# the diagnostic step with its own continue-on-error: true (line 203).
# Flip confirmed by CI / Platform (Go) status = success on main HEAD 363905d3.
continue-on-error: false
# Job-level ceiling. The go test step below runs with a per-step 30m timeout;
# this cap catches any step that leaks past that. Set well above 30m so
# Job-level ceiling. The go test step below runs with a per-step 10m timeout;
# this cap catches any step that leaks past that. Set well above 10m so
# the per-step timeout is the active constraint.
timeout-minutes: 35
timeout-minutes: 15
defaults:
run:
working-directory: workspace-server
@@ -176,14 +176,12 @@ jobs:
name: Run golangci-lint
run: $(go env GOPATH)/bin/golangci-lint run --timeout 3m ./...
- if: always()
name: Diagnostic — per-package verbose (300s timeout)
name: Diagnostic — per-package verbose 60s
run: |
set +e
# 300s allows handlers + pendinguploads packages to complete on cold
# runners with -race instrumentation (~60-120s each vs ~14s non-race).
go test -race -v -timeout 300s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
go test -race -v -timeout 60s ./internal/handlers/... 2>&1 | tee /tmp/test-handlers.log
handlers_exit=$?
go test -race -v -timeout 300s ./internal/pendinguploads/... 2>&1 | tee /tmp/test-pu.log
go test -race -v -timeout 60s ./internal/pendinguploads/... 2>&1 | tee /tmp/test-pu.log
pu_exit=$?
echo "::group::handlers exit=$handlers_exit (last 100 lines)"
tail -100 /tmp/test-handlers.log
@@ -196,10 +194,10 @@ jobs:
- if: always()
name: Run tests with race detection and coverage
# Explicit timeout: cold runner cache causes OOM kills at ~4m39s on the
# full ./... suite with race detection + coverage. A 30m per-step timeout
# lets the suite complete on cold cache (~13-25m) while failing cleanly
# instead of OOM-killing. The job-level timeout (35m) is a backstop.
run: go test -race -timeout 30m -coverprofile=coverage.out ./...
# full ./... suite with race detection + coverage. A 10m per-step timeout
# lets the suite complete on cold cache (~5-7m) while failing cleanly
# instead of OOM-killing. The job-level timeout (15m) is a backstop.
run: go test -race -timeout 10m -coverprofile=coverage.out ./...
- if: always()
name: Per-file coverage report
+6
View File
@@ -32,6 +32,12 @@ on:
# iterating all open PRs when PR_NUMBER is empty.
workflow_dispatch:
# Cancel stale runs so the 8-runner pool stays available for PR jobs.
# Per-SHA group ensures push and cron runs at different SHAs don't cancel each other.
concurrency:
group: gate-check-v3-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: true
permissions:
# read: contents — for checkout (base ref, not PR head for security)
# read: pull-requests — for reading PR info via API
-1
View File
@@ -162,7 +162,6 @@ jobs:
exit 1
fi
python -m twine upload \
--verbose \
--repository pypi \
--username __token__ \
--password "$PYPI_TOKEN" \
@@ -44,6 +44,12 @@ on:
- ".github/scripts/lint_secret_pattern_drift.py"
- ".githooks/pre-commit"
# Cancel stale runs to keep the 8-runner pool available for PR jobs.
# Per-SHA group ensures push and scheduled runs at different SHAs don't cancel each other.
concurrency:
group: secret-pattern-drift-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
+5
View File
@@ -22,6 +22,11 @@ on:
- cron: '17 4 * * 1' # Mondays at 04:17 UTC
workflow_dispatch:
# Cancel stale runs to keep the 8-runner pool available for PR jobs.
concurrency:
group: weekly-platform-go-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: true
permissions:
contents: read
statuses: write
+6 -6
View File
@@ -53,7 +53,7 @@ Molecule AI is the most powerful way to govern an AI agent organization in produ
It combines the parts that are usually scattered across demos, internal glue code, and framework-specific tooling into one product:
- one org-native control plane for teams, roles, hierarchy, and lifecycle
- one runtime layer that lets **nine** agent runtimes — LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, **Hermes**, **Gemini CLI**, OpenClaw, and **Codex** — run side by side behind one workspace contract
- one runtime layer that lets **eight** agent runtimes — LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, **Hermes**, **Gemini CLI**, and OpenClaw — run side by side behind one workspace contract
- one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries (Memory v2 backed by pgvector for semantic recall)
- one operational surface for observing, pausing, restarting, inspecting, and improving live workspaces
@@ -75,7 +75,7 @@ You do not wire collaboration paths by hand. Hierarchy defines the default commu
### 3. Runtime choice stops being a dead-end decision
LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, Hermes, Gemini CLI, OpenClaw, and Codex can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.
LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, Hermes, Gemini CLI, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.
### 4. Memory is treated like infrastructure
@@ -120,7 +120,6 @@ Molecule AI is not trying to replace the frameworks below. It is the system that
| **Hermes 4** | Shipping on `main` | Hybrid reasoning, native tools, json_schema (NousResearch/hermes-agent) | Option B upstream hook, A2A bridge to OpenAI-compat API, multi-provider provider derivation |
| **Gemini CLI** | Shipping on `main` | Google Gemini CLI continuity | Workspace lifecycle, A2A, hierarchy-aware collaboration, shared ops plane |
| **OpenClaw** | Shipping on `main` | CLI-native runtime with its own session model | Workspace lifecycle, templates, activity logs, topology-aware collaboration |
| **Codex** | Shipping on `main` | OpenAI Codex CLI continuity | Workspace lifecycle, A2A, hierarchy-aware collaboration, shared ops plane |
| **NemoClaw** | WIP on `feat/nemoclaw-t4-docker` | NVIDIA-oriented runtime path | Planned to join the same abstraction once merged; not yet part of `main` |
This is the key idea: **many agent runtimes, one organizational operating system**.
@@ -210,7 +209,7 @@ The result is not just “an agent that learns.” It is **an organization that
### Runtime
- unified `workspace/` image; thin AMI in production (us-east-2)
- adapter-driven execution across **9 runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw, Codex)
- adapter-driven execution across **8 runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw)
- Agent Card registration
- awareness-backed memory integration; **Memory v2 backed by pgvector** for semantic recall
- plugin-mounted shared rules/skills
@@ -240,6 +239,7 @@ The result is not just “an agent that learns.” It is **an organization that
- no tunnel, no public endpoint — the plugin self-registers each watched workspace as `delivery_mode=poll` and long-polls `/activity?since_id=…`
- multi-tenant friendly: one plugin install can watch workspaces across multiple Molecule tenants (`MOLECULE_PLATFORM_URLS` per-workspace)
- install via the standard marketplace flow: `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel``/plugin install molecule-channel@molecule-mcp-claude-channel`
## Built For Teams That Need More Than A Demo
Molecule AI is especially strong when you need to run:
@@ -260,7 +260,7 @@ Canvas (Next.js 15, warm-paper :3000) <--HTTP / WS--> Platform (Go 1.25 :8080)
+------------------------- shows ------------------------> workspaces, teams, tasks, traces, events
Workspace Runtime (Python ≥3.11, image with adapters)
- 9 runtimes: LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / Hermes / Gemini CLI / OpenClaw / Codex
- 8 adapters: LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / Hermes / Gemini CLI / OpenClaw
- Agent Card + A2A server (typed-SSOT response path, RFC #2967)
- heartbeat + activity + awareness-backed memory (Memory v2 — pgvector semantic recall)
- skills + plugins + hot reload
@@ -328,7 +328,7 @@ Then open `http://localhost:3000`:
## Current Scope
The current `main` branch ships the core platform, Canvas v4 (warm-paper themed), Memory v2 (pgvector semantic recall), the typed-SSOT A2A response path (RFC #2967), **nine production runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw, Codex), skill lifecycle, and operational surfaces.
The current `main` branch ships the core platform, Canvas v4 (warm-paper themed), Memory v2 (pgvector semantic recall), the typed-SSOT A2A response path (RFC #2967), **eight production adapters** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw), skill lifecycle, and operational surfaces.
The companion private repo [`molecule-controlplane`](https://git.moleculesai.app/molecule-ai/molecule-controlplane) provides the SaaS surface — multi-tenant orchestration on EC2 + Neon + Cloudflare Tunnels, KMS envelope encryption, WorkOS auth, Stripe billing, and a `tenant_resources` audit table with a 30-min reconciler.
@@ -1,55 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for formatAuditRelativeTime exported from AuditTrailPanel.
*/
import { describe, it, expect } from "vitest";
import { formatAuditRelativeTime } from "../AuditTrailPanel";
describe("formatAuditRelativeTime", () => {
const now = new Date("2026-05-18T12:00:00Z").getTime();
it('returns "just now" for timestamps less than 60s ago', () => {
const ts = new Date(now - 30_000).toISOString(); // 30s ago
expect(formatAuditRelativeTime(ts, now)).toBe("just now");
});
it("returns minutes for timestamps under 1h", () => {
const ts = new Date(now - 5 * 60_000).toISOString(); // 5m ago
expect(formatAuditRelativeTime(ts, now)).toBe("5m ago");
});
it("returns hours for timestamps under 24h", () => {
const ts = new Date(now - 3 * 3_600_000).toISOString(); // 3h ago
expect(formatAuditRelativeTime(ts, now)).toBe("3h ago");
});
it("returns locale date for timestamps older than 24h", () => {
const ts = new Date(now - 2 * 86_400_000).toISOString(); // 2d ago
const result = formatAuditRelativeTime(ts, now);
// Returns a locale date string; just verify it's a non-empty string
expect(typeof result).toBe("string");
expect(result.length).toBeGreaterThan(0);
expect(result).not.toBe("just now");
expect(result).not.toMatch(/m ago$/);
expect(result).not.toMatch(/h ago$/);
});
it("handles exactly 60s boundary as minutes", () => {
const ts = new Date(now - 60_000).toISOString(); // exactly 1m ago
expect(formatAuditRelativeTime(ts, now)).toBe("1m ago");
});
it("handles exactly 3600s boundary as hours", () => {
const ts = new Date(now - 3_600_000).toISOString(); // exactly 1h ago
expect(formatAuditRelativeTime(ts, now)).toBe("1h ago");
});
it("handles exactly 86400s boundary", () => {
const ts = new Date(now - 86_400_000).toISOString(); // exactly 24h ago
const result = formatAuditRelativeTime(ts, now);
// Exactly 24h should fall into the "days" branch
expect(typeof result).toBe("string");
expect(result).not.toMatch(/m ago$/);
expect(result).not.toMatch(/h ago$/);
});
});
@@ -1,82 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for exported helpers from MemoryInspectorPanel:
* isPluginUnavailableError, formatTTL.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { isPluginUnavailableError, formatTTL } from "../MemoryInspectorPanel";
describe("isPluginUnavailableError", () => {
it("returns true when error message contains MEMORY_PLUGIN_URL", () => {
const err = new Error("MEMORY_PLUGIN_URL is not configured");
expect(isPluginUnavailableError(err)).toBe(true);
});
it("returns false when error message does not contain MEMORY_PLUGIN_URL", () => {
const err = new Error("Connection refused");
expect(isPluginUnavailableError(err)).toBe(false);
});
it("returns false for non-Error values", () => {
expect(isPluginUnavailableError("string error")).toBe(false);
expect(isPluginUnavailableError(null)).toBe(false);
expect(isPluginUnavailableError(undefined)).toBe(false);
expect(isPluginUnavailableError({})).toBe(false);
});
it("handles Error with empty message", () => {
expect(isPluginUnavailableError(new Error(""))).toBe(false);
});
});
describe("formatTTL", () => {
// Freeze time at 2026-05-18T12:00:00Z for deterministic tests.
beforeEach(() => {
vi.useFakeTimers();
vi.setSystemTime(new Date("2026-05-18T12:00:00Z"));
});
afterEach(() => {
vi.useRealTimers();
});
it("returns empty string for null", () => {
expect(formatTTL(null)).toBe("");
});
it("returns empty string for undefined", () => {
expect(formatTTL(undefined)).toBe("");
});
it("returns empty string for empty string", () => {
expect(formatTTL("")).toBe("");
});
it("returns 'expired' for past timestamps", () => {
const past = new Date(Date.now() - 60_000).toISOString();
expect(formatTTL(past)).toBe("expired");
});
it("returns seconds for sub-minute future TTLs", () => {
const future = new Date(Date.now() + 30_000).toISOString();
expect(formatTTL(future)).toBe("30s");
});
it("returns minutes for sub-hour future TTLs", () => {
const future = new Date(Date.now() + 5 * 60_000).toISOString();
expect(formatTTL(future)).toBe("5m");
});
it("returns hours for sub-day future TTLs", () => {
const future = new Date(Date.now() + 3 * 3_600_000).toISOString();
expect(formatTTL(future)).toBe("3h");
});
it("returns days for TTLs longer than 24h", () => {
const future = new Date(Date.now() + 2 * 86_400_000).toISOString();
expect(formatTTL(future)).toBe("2d");
});
it("returns empty string for invalid date string", () => {
expect(formatTTL("not-a-date")).toBe("");
});
});
@@ -11,21 +11,13 @@ import { render, screen, fireEvent, cleanup, act } from "@testing-library/react"
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { TestConnectionButton } from "../ui/TestConnectionButton";
import type { SecretGroup } from "@/types/secrets";
import { validateSecret, ApiError } from "@/lib/api/secrets";
import { validateSecret } from "@/lib/api/secrets";
// ─── Mock validateSecret ──────────────────────────────────────────────────────
// vi.mock is hoisted, so validateSecret (imported above) refers to the mocked
// namespace value once vi.mock runs. Use vi.mocked() to access it in tests.
vi.mock("@/lib/api/secrets", () => ({
validateSecret: vi.fn(),
ApiError: class ApiError extends Error {
status: number;
constructor(status: number, message: string) {
super(message);
this.name = "ApiError";
this.status = status;
}
},
}));
// SecretGroup is a string literal type: 'github' | 'anthropic' | 'openrouter' | 'custom'
@@ -110,7 +102,7 @@ describe("TestConnectionButton — state machine", () => {
expect(screen.getByText("Permission denied")).toBeTruthy();
});
it("shows a connectivity message on a genuine network exception", async () => {
it("shows generic error message on unexpected exception", async () => {
vi.mocked(validateSecret).mockRejectedValue(new Error("timeout"));
render(<TestConnectionButton provider={toGroup("anthropic")} secretValue="sk-..." />);
@@ -118,23 +110,8 @@ describe("TestConnectionButton — state machine", () => {
await act(async () => { /* flush */ });
expect(screen.getByRole("alert")).toBeTruthy();
// A real thrown network error → honest connectivity message (not a
// fabricated "service down"); see internal#492.
expect(document.body.querySelector('[role="alert"]')?.textContent).toMatch(
/could not reach the validation service/i,
);
});
it("does not claim a timeout when the validate endpoint 404s (internal#492)", async () => {
vi.mocked(validateSecret).mockRejectedValue(new ApiError(404, "Not Found"));
render(<TestConnectionButton provider={toGroup("anthropic")} secretValue="sk-..." />);
fireEvent.click(screen.getByRole("button"));
await act(async () => { /* flush */ });
const alert = document.body.querySelector('[role="alert"]')?.textContent ?? "";
expect(alert).not.toMatch(/timed out/i);
expect(alert).toMatch(/not available/i);
// The error detail is hardcoded to "Connection timed out. Service may be down."
expect(document.body.querySelector('[role="alert"]')?.textContent).toMatch(/timed out/i);
});
});
+18 -74
View File
@@ -2,11 +2,8 @@
// 04 · Chat — message thread + composer + sub-tabs.
// Wired to the same /workspaces/:id/a2a (method message/send) endpoint
// that the desktop ChatTab uses. Render parity with desktop ChatTab is
// achieved by reusing its renderers rather than forking a reduced
// mobile path: the Agent Comms sub-tab mounts the same AgentCommsPanel,
// and message attachments route through the same AttachmentPreview
// dispatch the desktop My-Chat bubble uses (#231/#232).
// that the desktop ChatTab uses, but with a slimmer surface: no
// attachments, no A2A topology overlay, no conversation tracing.
import { useEffect, useMemo, useRef, useState } from "react";
import ReactMarkdown from "react-markdown";
@@ -19,9 +16,6 @@ import {
useChatSend,
useChatSocket,
} from "@/components/tabs/chat/hooks";
import { AgentCommsPanel } from "@/components/tabs/chat/AgentCommsPanel";
import { AttachmentPreview } from "@/components/tabs/chat/AttachmentPreview";
import { downloadChatFile } from "@/components/tabs/chat/uploads";
import { toMobileAgent } from "./components";
import { MOBILE_FONT_MONO, MOBILE_FONT_SANS, usePalette } from "./palette";
@@ -242,8 +236,6 @@ export function MobileChat({
useChatSocket(agentId, {
onAgentMessage: appendMessageDeduped,
// Fan-out user's own outbound message to all sessions (issue #228).
onUserMessage: appendMessageDeduped,
onSendComplete: releaseSendGuards,
});
@@ -312,17 +304,6 @@ export function MobileChat({
const removePendingFile = (index: number) =>
setPendingFiles((prev) => prev.filter((_, i) => i !== index));
// Route attachment downloads through the same authenticated helper
// the desktop ChatTab uses (downloadChatFile) so platform-scheme
// URIs get a real Blob with auth headers instead of about:blank.
const downloadAttachment = (att: ChatAttachment) => {
downloadChatFile(agentId, att).catch(() => {
// AttachmentPreview's own error affordance covers the in-bubble
// failure state; matches ChatTab's behaviour of not double-
// reporting a download failure.
});
};
const send = async () => {
const text = draft.trim();
if ((!text && pendingFiles.length === 0) || sending || !reachable) return;
@@ -452,19 +433,7 @@ export function MobileChat({
</div>
</div>
{/* Agent Comms — reuse the desktop AgentCommsPanel verbatim so
mobile renders the identical peer/A2A + delegation feed
(history GET + live socket events) instead of a placeholder
(#231). The panel owns its own scroll/load/error/empty
states, matching ChatTab's agent-comms tabpanel. */}
{tab === "a2a" && (
<div style={{ flex: 1, minHeight: 0, overflow: "hidden" }}>
<AgentCommsPanel workspaceId={agentId} />
</div>
)}
{/* Messages */}
{tab === "my" && (
<div
ref={scrollRef}
style={{
@@ -476,6 +445,18 @@ export function MobileChat({
gap: 8,
}}
>
{tab === "a2a" && (
<div
style={{
padding: "20px 4px",
textAlign: "center",
color: p.text3,
fontSize: 13,
}}
>
Agent Comms peer-to-peer A2A traffic surfaces in the Comms tab.
</div>
)}
{tab === "my" && historyLoading && (
<div style={{ padding: "20px 4px", textAlign: "center", color: p.text3, fontSize: 13 }}>
Loading chat history
@@ -540,31 +521,9 @@ export function MobileChat({
overflowWrap: "anywhere",
}}
>
{m.content && (
<MarkdownBubble dark={dark} accent={p.accent}>
{m.content}
</MarkdownBubble>
)}
{m.attachments && m.attachments.length > 0 && (
<div
style={{
display: "flex",
flexWrap: "wrap",
gap: 4,
marginTop: m.content ? 6 : 0,
}}
>
{m.attachments.map((att, i) => (
<AttachmentPreview
key={`${m.id}-${i}`}
workspaceId={agentId}
attachment={att}
onDownload={downloadAttachment}
tone={mine ? "user" : "agent"}
/>
))}
</div>
)}
<MarkdownBubble dark={dark} accent={p.accent}>
{m.content}
</MarkdownBubble>
<div
style={{
fontSize: 10,
@@ -595,13 +554,7 @@ export function MobileChat({
</div>
)}
</div>
)}
{/* Footer ID + composer belong to My Chat only. The Agent Comms
tab is a read-only peer/A2A feed (parity with desktop
ChatTab, where the agent-comms tabpanel has no composer). */}
{tab === "my" && (
<>
{/* Footer ID */}
<div
style={{
@@ -750,14 +703,7 @@ export function MobileChat({
border: "none",
outline: "none",
background: "transparent",
// 16px floor: iOS Safari/WebKit auto-zooms the viewport on
// focus when a focused field's font-size is < 16px. Anything
// below this re-introduces the tap-to-zoom layout jump on the
// mobile PWA. Do NOT lower this without also adding a
// maximum-scale/user-scalable viewport lock — and that lock
// breaks pinch-to-zoom accessibility, so 16px here is the
// correct trade.
fontSize: 16,
fontSize: 14.5,
lineHeight: 1.4,
color: p.text,
padding: "6px 0",
@@ -800,8 +746,6 @@ export function MobileChat({
</button>
</div>
</div>
</>
)}
</div>
);
}
@@ -21,14 +21,6 @@ import { MobileChat } from "../MobileChat";
vi.mock("@/lib/api");
import { api } from "@/lib/api";
// AgentCommsPanel (mounted by the Agent Comms sub-tab, #231) subscribes
// to the global socket via useSocketEvent. Stub it to a no-op so the
// panel mounts without the real ReconnectingSocket — the parity tests
// only assert the panel renders (vs the old static placeholder).
vi.mock("@/hooks/useSocketEvent", () => ({
useSocketEvent: vi.fn(),
}));
// ─── Mock store ───────────────────────────────────────────────────────────────
const mockAgentId = "ws-chat-test";
@@ -163,12 +155,6 @@ beforeEach(() => {
mockOnBack.mockClear();
mockStoreState.nodes = [];
mockStoreState.agentMessages = {};
// jsdom doesn't implement scrollIntoView. The Agent Comms tab now
// mounts AgentCommsPanel (#231), which scrolls its feed to bottom on
// arrival; a no-op stub keeps the panel from throwing under jsdom
// (same stub AgentCommsPanel's own render test installs).
Element.prototype.scrollIntoView =
vi.fn() as unknown as Element["scrollIntoView"];
// Set up spies on the real api methods. Tests override these per-call.
const getSpy = vi.spyOn(api, "get");
const postSpy = vi.spyOn(api, "post");
@@ -263,20 +249,6 @@ describe("MobileChat — composer", () => {
const sendBtn = container.querySelector('[aria-label="Send"]') as HTMLButtonElement;
expect(sendBtn.disabled).toBe(true);
});
// iOS Safari/WebKit auto-zooms the viewport on focus when a focused
// <input>/<textarea> has an effective font-size below 16px. On the
// mobile PWA this made the whole layout scale up the moment the user
// tapped into the chat box. Keeping the composer font ≥16px is the
// root-cause fix — it suppresses the focus-zoom WITHOUT disabling
// pinch-to-zoom (which a maximum-scale/user-scalable viewport hack
// would have done at the cost of accessibility).
it("composer textarea font-size is >= 16px (prevents iOS focus-zoom)", () => {
const { container } = renderChat(mockAgentId);
const textarea = container.querySelector("textarea") as HTMLTextAreaElement;
const fontSizePx = parseFloat(textarea.style.fontSize);
expect(fontSizePx).toBeGreaterThanOrEqual(16);
});
});
// ─── Tabs ─────────────────────────────────────────────────────────────────────
@@ -502,146 +474,3 @@ describe("MobileChat — chat history", () => {
expect(getSpy).toHaveBeenCalledTimes(2);
});
});
// ─── #232 · Attachment render parity with desktop ChatTab ────────────────────
//
// Regression for the CTO-reported mobile bug: MobileChat used to render
// only m.content (no attachment surface), so files sent/received in a
// conversation were invisible on mobile while desktop showed them. The
// fix routes m.attachments through the same AttachmentPreview the
// desktop ChatTab bubble uses.
describe("MobileChat — attachment render parity (#232)", () => {
beforeEach(() => {
mockStoreState.nodes = [onlineNode];
});
it("renders an attachment from a history message via AttachmentPreview", async () => {
const getSpy = vi.spyOn(api, "get");
// useChatHistory reads { messages, reached_end }.
getSpy.mockResolvedValueOnce({
messages: [
{
id: "m-att-1",
role: "agent",
content: "Here is the report",
attachments: [
{
name: "report.csv",
uri: "workspace://out/report.csv",
mimeType: "text/csv",
size: 2048,
},
],
timestamp: new Date().toISOString(),
},
],
reached_end: true,
});
let rr: ReturnType<typeof renderChat>;
await act(async () => {
rr = renderChat(mockAgentId);
});
const { container } = rr!;
// A non-image attachment renders the AttachmentChip download button
// with title="Download <name>" — same component the desktop bubble
// dispatches through AttachmentPreview.
await waitFor(() => {
const chip = container.querySelector('[title="Download report.csv"]');
expect(chip).toBeTruthy();
});
expect(container.textContent ?? "").toContain("report.csv");
});
});
// ─── #231 · Agent Comms (A2A/peer) render parity with desktop ChatTab ────────
//
// Regression for the CTO-reported mobile bug: the Agent Comms sub-tab
// rendered a static placeholder string ("peer-to-peer A2A traffic
// surfaces in the Comms tab") instead of the real feed. The fix mounts
// the same AgentCommsPanel the desktop ChatTab agent-comms tabpanel
// uses, so peer/A2A + delegation activity is visible on mobile.
describe("MobileChat — Agent Comms render parity (#231)", () => {
beforeEach(() => {
mockStoreState.nodes = [onlineNode];
});
it("mounts AgentCommsPanel on the Agent Comms tab (not the old placeholder)", async () => {
const getSpy = vi.spyOn(api, "get");
// 1st GET: useChatHistory (My Chat) on mount.
getSpy.mockResolvedValueOnce({ messages: [], reached_end: true });
// 2nd GET: AgentCommsPanel's activity load when the tab is shown.
// Empty list → panel renders its own empty state, which still
// proves AgentCommsPanel mounted (vs. the removed placeholder).
getSpy.mockResolvedValueOnce([]);
let rr: ReturnType<typeof renderChat>;
await act(async () => {
rr = renderChat(mockAgentId);
});
const { container } = rr!;
const commsTab = Array.from(container.querySelectorAll("button")).find(
(b) => b.textContent?.trim() === "Agent Comms",
);
expect(commsTab).toBeTruthy();
await act(async () => {
commsTab!.click();
});
await waitFor(() => {
const text = container.textContent ?? "";
// The panel's empty state — proves AgentCommsPanel mounted.
expect(text).toContain("No agent-to-agent communications yet.");
});
// The old hard-coded placeholder must be gone.
expect(container.textContent ?? "").not.toContain(
"peer-to-peer A2A traffic surfaces in the Comms tab",
);
// The panel hit its activity endpoint.
expect(getSpy).toHaveBeenCalledWith(
expect.stringContaining(`/workspaces/${mockAgentId}/activity`),
);
});
it("renders a peer message on the Agent Comms tab", async () => {
const getSpy = vi.spyOn(api, "get");
getSpy.mockResolvedValueOnce({ messages: [], reached_end: true });
// a2a_receive from a peer → AgentCommsPanel.toCommMessage maps it
// to an inbound bubble with the request text.
getSpy.mockResolvedValueOnce([
{
id: "act-1",
activity_type: "a2a_receive",
source_id: "peer-ws-uuid",
target_id: mockAgentId,
method: "message/send",
summary: "peer asked something",
request_body: { task: "Please review PR 42" },
response_body: null,
status: "ok",
created_at: new Date().toISOString(),
},
]);
let rr: ReturnType<typeof renderChat>;
await act(async () => {
rr = renderChat(mockAgentId);
});
const { container } = rr!;
const commsTab = Array.from(container.querySelectorAll("button")).find(
(b) => b.textContent?.trim() === "Agent Comms",
);
await act(async () => {
commsTab!.click();
});
await waitFor(() => {
expect(container.textContent ?? "").toContain("Please review PR 42");
});
});
});
+23 -19
View File
@@ -3,24 +3,16 @@ import { useState, useCallback, useRef, useEffect } from 'react';
import type { Secret, SecretGroup } from '@/types/secrets';
import { useSecretsStore } from '@/stores/secrets-store';
import { StatusBadge } from '@/components/ui/StatusBadge';
import { RevealToggle } from '@/components/ui/RevealToggle';
import { KeyValueField } from '@/components/ui/KeyValueField';
import { ValidationHint } from '@/components/ui/ValidationHint';
import { TestConnectionButton } from '@/components/ui/TestConnectionButton';
import { validateSecretValue } from '@/lib/validation/secret-formats';
import { SERVICES } from '@/lib/services';
const AUTO_HIDE_MS = 30_000;
const VALIDATION_DEBOUNCE_MS = 400;
// Secret values are write-only from the browser: the server List endpoint
// "Never exposes values", there is no per-secret decrypt route, and the
// only decrypted path (GET /secrets/values) is bulk + token-gated for
// remote agents. The old eye/RevealToggle was a dead affordance — it
// flipped its own icon but could never reveal anything, which read as
// "this doesn't work" (esp. once clicked → eye-with-slash). We show an
// honest static indicator instead; rotation is via Edit.
const WRITE_ONLY_TITLE =
'Value is write-only and cannot be revealed — use Edit to replace/rotate it';
interface SecretRowProps {
secret: Secret;
workspaceId: string;
@@ -39,12 +31,28 @@ export function SecretRow({ secret, workspaceId }: SecretRowProps) {
const setSecretStatus = useSecretsStore((s) => s.setSecretStatus);
const isEditing = editingKey === secret.name;
const [revealed, setRevealed] = useState(false);
const [editValue, setEditValue] = useState('');
const [validationError, setValidationError] = useState<string | null>(null);
const [isSaving, setIsSaving] = useState(false);
const [saveError, setSaveError] = useState<string | null>(null);
const debounceRef = useRef<ReturnType<typeof setTimeout>>(undefined);
const editBtnRef = useRef<HTMLButtonElement>(null);
const revealTimerRef = useRef<ReturnType<typeof setTimeout>>(undefined);
// Auto-hide revealed value after 30s
useEffect(() => {
if (revealed) {
clearTimeout(revealTimerRef.current);
revealTimerRef.current = setTimeout(() => setRevealed(false), AUTO_HIDE_MS);
return () => clearTimeout(revealTimerRef.current);
}
}, [revealed]);
// Reset revealed state when panel closes (session-only)
useEffect(() => {
return () => setRevealed(false);
}, []);
// Debounced validation
useEffect(() => {
@@ -125,15 +133,11 @@ export function SecretRow({ secret, workspaceId }: SecretRowProps) {
{secret.masked_value}
</span>
<div className="secret-row__actions">
<span
data-testid="write-only-indicator"
className="secret-row__write-only"
role="img"
aria-label={`${secret.name} value is write-only and cannot be revealed; use Edit to replace it`}
title={WRITE_ONLY_TITLE}
>
🔒
</span>
<RevealToggle
revealed={revealed}
onToggle={() => setRevealed((r) => !r)}
label={`Toggle reveal ${secret.name}`}
/>
<StatusBadge status={secret.status} />
<button
type="button"
@@ -16,40 +16,7 @@ interface TokensTabProps {
workspaceId: string;
}
// The settings panel passes the literal sentinel "global" when no canvas
// node is selected. Workspace tokens are inherently per-workspace — there
// is no /workspaces/global/tokens endpoint (querying the uuid column with
// "global" 500s on Postgres). The org-wide equivalent lives in the
// separate "Org API Keys" tab. Mirrors the sentinel-awareness that
// api/secrets.ts already has (workspaceId === 'global' → /settings/secrets).
const GLOBAL_WORKSPACE_ID = 'global';
export function TokensTab({ workspaceId }: TokensTabProps) {
if (workspaceId === GLOBAL_WORKSPACE_ID) {
return (
<div className="p-4 space-y-4">
<div>
<h3 className="text-sm font-semibold text-ink">API Tokens</h3>
<p className="text-[10px] text-ink-mid mt-0.5">
Bearer tokens for authenticating API calls to this workspace.
</p>
</div>
<div className="text-center py-6">
<p className="text-xs text-ink-mid">Select a workspace node first</p>
<p className="text-[10px] text-ink-mid mt-1">
Workspace tokens are scoped to a single workspace. Select a node
on the canvas to manage its tokens, or use the{' '}
<span className="text-accent font-medium">Org API Keys</span> tab
for org-wide API keys.
</p>
</div>
</div>
);
}
return <WorkspaceTokensTab workspaceId={workspaceId} />;
}
function WorkspaceTokensTab({ workspaceId }: TokensTabProps) {
const [tokens, setTokens] = useState<Token[]>([]);
const [loading, setLoading] = useState(true);
const [creating, setCreating] = useState(false);
@@ -138,54 +138,14 @@ describe("SecretRow — display mode", () => {
expect(document.querySelector('[role="row"]')).toBeTruthy();
});
it("has Copy, Edit, Delete buttons", () => {
it("has Reveal, Copy, Edit, Delete buttons", () => {
render(<SecretRow secret={GITHUB_SECRET} workspaceId="ws-1" />);
expect(screen.getByTestId("reveal-toggle")).toBeTruthy();
expect(screen.getByRole("button", { name: /copy/i })).toBeTruthy();
expect(screen.getByRole("button", { name: /edit/i })).toBeTruthy();
expect(screen.getByRole("button", { name: /delete/i })).toBeTruthy();
});
// Regression: the reveal/eye control was a dead affordance. Clicking it
// flipped its own icon (eye → eye-with-slash) but never revealed the value,
// because secret values are write-only from the browser (server List
// "Never exposes values"; there is no per-secret decrypt endpoint and the
// client has no plaintext-fetch function). The honest fix removes the
// toggle and shows a static "write-only / cannot be revealed" indicator.
// See internal tracking issue + internal#210/#211.
it("does NOT render a reveal/eye toggle (values are write-only)", () => {
render(<SecretRow secret={GITHUB_SECRET} workspaceId="ws-1" />);
expect(screen.queryByTestId("reveal-toggle")).toBeNull();
expect(
screen.queryByRole("button", { name: /toggle reveal/i }),
).toBeNull();
});
it("shows a write-only indicator explaining the value cannot be revealed", () => {
render(<SecretRow secret={ANTHROPIC_SECRET} workspaceId="ws-1" />);
const indicator = screen.getByTestId("write-only-indicator");
expect(indicator).toBeTruthy();
// Affordance must be honest: explain it cannot be revealed and that
// Edit is the rotate path. It must not be a clickable button.
const title = indicator.getAttribute("title") ?? "";
expect(title.toLowerCase()).toMatch(/write-only|cannot be revealed/);
expect(indicator.tagName).not.toBe("BUTTON");
});
it("write-only indicator is present for the Anthropic/OAuth-token row too", () => {
// The reported bug singled out CLAUDE_CODE_OAUTH_TOKEN (anthropic group);
// the fix is group-agnostic — every row gets the same honest affordance.
const OAUTH_SECRET = {
name: "CLAUDE_CODE_OAUTH_TOKEN",
masked_value: "••••••••••••••••9d2a",
group: "anthropic" as const,
status: "unverified" as const,
updated_at: "2024-01-04",
};
render(<SecretRow secret={OAUTH_SECRET} workspaceId="ws-1" />);
expect(screen.queryByTestId("reveal-toggle")).toBeNull();
expect(screen.getByTestId("write-only-indicator")).toBeTruthy();
});
it("shows invalid status correctly", () => {
render(<SecretRow secret={CUSTOM_SECRET} workspaceId="ws-1" />);
expect(screen.getByTestId("status-badge").getAttribute("data-status")).toBe("invalid");
@@ -302,35 +302,3 @@ describe("TokensTab — error", () => {
expect(document.querySelector('[role="status"]')).toBeNull();
});
});
// ─── "global" sentinel (no node selected) ────────────────────────────────────
//
// Regression: SettingsPanel passes the literal "global" when no canvas
// node is selected. workspace tokens are per-workspace and there is no
// /workspaces/global/tokens endpoint — calling it 500'd
// ("invalid input syntax for type uuid: global"). The tab must NOT call
// the API in that state and must point the user at the Org API Keys tab.
describe("TokensTab — global sentinel (no node selected)", () => {
beforeEach(() => {
mockApiGet.mockReset();
mockApiPost.mockReset();
mockApiGet.mockRejectedValue(new Error("should not be called"));
});
it("does not call the API and shows a pointer to Org API Keys", async () => {
render(<TokensTab workspaceId="global" />);
await flush();
expect(mockApiGet).not.toHaveBeenCalled();
expect(mockApiPost).not.toHaveBeenCalled();
expect(document.body.textContent).toContain("Select a workspace node");
expect(document.body.textContent).toContain("Org API Keys");
// No error banner, no scary 500 surfacing.
expect(document.querySelector(".text-bad")).toBeNull();
});
it("has no create button in the global state", async () => {
render(<TokensTab workspaceId="global" />);
await flush();
expect(document.body.textContent).not.toContain("New Token");
});
});
-6
View File
@@ -143,12 +143,6 @@ function MyChatPanel({ workspaceId, data }: Props) {
releaseSendGuards();
}
},
// Fan-out of user's own outbound message to all sessions (issue #228).
// Uses appendMessageDeduped so the originating session collapses its
// optimistic copy (same role + content within 3-second window).
onUserMessage: (msg) => {
history.setMessages((prev) => appendMessageDeduped(prev, msg));
},
onActivityLog: (entry) => {
if (!sending) return;
setActivityLog((prev) => appendActivityLine(prev, entry));
+3 -46
View File
@@ -45,54 +45,11 @@ export function FilesTab({ workspaceId, data }: Props) {
if (data && isExternalLikeRuntime(data.runtime)) {
return <NotAvailablePanel runtime={data.runtime} />;
}
return <PlatformOwnedFilesTab workspaceId={workspaceId} runtime={data?.runtime} />;
return <PlatformOwnedFilesTab workspaceId={workspaceId} />;
}
/** Picks the initial root for the FilesTab dropdown based on the
* workspace's runtime. Decision: per-runtime default (Hongming
* 2026-05-15, internal#425 Decisions §2).
*
* - openclaw → `/agent-home` (the agent's identity/state — the
* user-facing interesting files for that runtime live in
* `~/.openclaw/` inside the container, which `/agent-home` maps to
* via the Phase 2b docker-exec backend).
* - everything else (claude-code, hermes, external-like, undefined)
* → `/configs` (the legacy default — managed config that flows
* through the per-runtime indirection in
* workspace-server/internal/handlers/template_files_eic.go).
*
* When the runtime is undefined (legacy callers that don't thread
* `data` through, or a workspace whose runtime field hasn't loaded
* yet) the default is `/configs` — matches today's behaviour, no
* surprise.
*
* Note on `/agent-home` pre-Phase-2b: the backend short-circuits
* with HTTP 501 and the canonical "implementation pending" body.
* The tab renders empty + the error banner explains. This is by
* design — lets us land the canvas UX before the backend ships,
* per the RFC's phased rollout. The 501 is graceful: it doesn't
* poison error toasts or generate "workspace not found" noise.
*
* Adding a new runtime that should default to `/agent-home`: add it
* to the agentHomeDefaultRuntimes set below. Adding a runtime that
* should default to a different root: extend this function. */
const agentHomeDefaultRuntimes = new Set(["openclaw"]);
function defaultRootForRuntime(runtime: string | undefined): string {
if (runtime && agentHomeDefaultRuntimes.has(runtime)) {
return "/agent-home";
}
return "/configs";
}
function PlatformOwnedFilesTab({
workspaceId,
runtime,
}: {
workspaceId: string;
runtime?: string;
}) {
const [root, setRoot] = useState(() => defaultRootForRuntime(runtime));
function PlatformOwnedFilesTab({ workspaceId }: { workspaceId: string }) {
const [root, setRoot] = useState("/configs");
const [selectedFile, setSelectedFile] = useState<string | null>(null);
const [fileContent, setFileContent] = useState("");
const [editContent, setEditContent] = useState("");
@@ -3,22 +3,6 @@
import { useRef } from "react";
import { getIcon } from "./tree";
// secretShapeMarker is the canonical body the workspace-server Files
// API returns when a file's path OR content matched a credential
// regex (internal#425 RFC, Phase 2b — backed by
// workspace-server/internal/secrets.ScanBytes). The marker is a
// fixed prefix so the canvas can detect it without parsing JSON and
// without round-tripping the matched bytes through the editor (which
// would defeat the purpose — clipboard, browser history, log
// surfaces would all see them).
//
// Today (Phase 1 / before 2b ships) the backend returns 501 for the
// only root that uses this path, so the marker is dead code until
// 2b lands. Wiring it in now keeps the canvas + backend contracts
// aligned in one PR rather than a follow-up. The constant is
// importable so a future test can pin the exact string.
export const SECRET_SHAPE_DENIED_MARKER = "<denied: secret-shape>";
interface Props {
selectedFile: string | null;
fileContent: string;
@@ -47,22 +31,6 @@ export function FileEditor({
const editorRef = useRef<HTMLTextAreaElement>(null);
const isDirty = editContent !== fileContent;
// internal#425 Phase 3: detect the secret-shape denial marker and
// render a placeholder instead of the editor. The marker comes
// from workspace-server Phase 2b (secrets.ScanBytes) which refuses
// to surface the file's bytes. We deliberately don't expose
// the matched pattern's Name here — the canvas just shows the
// generic denial. The Files API log surface has the Pattern.Name
// for operators who need to debug a false positive.
const isSecretShapeDenied = fileContent === SECRET_SHAPE_DENIED_MARKER;
// /agent-home is read-only from the canvas (Phase 2b ships read +
// delete; Phase-2b-followup may add write). Edits to /configs are
// unchanged. Until 2b ships, /agent-home returns 501 so this
// read-only gate is also dead code, but wiring it in now keeps
// the UI honest the moment 2b lands without a follow-up canvas PR.
const isReadOnlyRoot = root !== "/configs";
if (!selectedFile) {
return (
<div className="flex-1 flex items-center justify-center">
@@ -107,42 +75,11 @@ export function FileEditor({
{/* Editor area */}
{loadingFile ? (
<div className="p-4 text-xs text-ink-mid">Loading...</div>
) : isSecretShapeDenied ? (
// Files API refused to surface this file's bytes because its
// path or content matched a credential regex
// (workspace-server/internal/secrets, internal#425 Phase 2b).
// We render a placeholder INSTEAD OF the textarea so the
// matched bytes never enter the DOM. Clipboard / view-source
// / element-inspector all see the placeholder, not the
// credential.
<div
role="region"
aria-label="File content denied"
className="flex-1 flex items-center justify-center p-6 bg-surface"
>
<div className="max-w-md text-center space-y-2">
<div className="text-2xl opacity-40">🛡</div>
<p className="text-[11px] font-mono text-warm">
{SECRET_SHAPE_DENIED_MARKER}
</p>
<p className="text-[10px] text-ink-mid leading-relaxed">
The platform refused to surface this file because its
path or content matched a credential-shape pattern.
The bytes never left the workspace container.
</p>
<p className="text-[10px] text-ink-mid leading-relaxed">
If this is a false positive (test fixture, docs example,
or content that happens to share a credential's shape),
rename the file or adjust the content via the workspace
terminal so the regex no longer matches, then refresh.
</p>
</div>
</div>
) : (
<textarea
ref={editorRef}
value={editContent}
readOnly={isReadOnlyRoot}
readOnly={root !== "/configs"}
onChange={(e) => setEditContent(e.target.value)}
onKeyDown={(e) => {
if ((e.metaKey || e.ctrlKey) && e.key === "s") {
@@ -38,15 +38,6 @@ export function FilesToolbar({
<option value="/home">/home</option>
<option value="/workspace">/workspace</option>
<option value="/plugins">/plugins</option>
{/* internal#425 Phase 1+3: container-internal $HOME root.
Backend lands the docker-exec dispatch in Phase 2b. Until
then the stub returns 501 with a canonical
"implementation pending" message — the dropdown renders
the option so the canvas affordance is design-frozen
even before the backend ships.
Runtime-default selection logic in FilesTab.tsx picks
this as the initial value for openclaw workspaces. */}
<option value="/agent-home">/agent-home</option>
</select>
<span className="text-[10px] text-ink-mid">{fileCount} files</span>
</div>
@@ -1,181 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for the /agent-home root selector + per-runtime default-root
* + secret-shape denial placeholder (internal#425 Phase 3).
*
* Separate file so the diff is reviewable as a unit and the existing
* FilesToolbar / FileEditor / FilesTab tests don't have to grow
* agent-home-specific cases. Once Phase 2b lands, the read-only +
* 501-stub assertions here can be tightened (or moved into the main
* test file as the agent-home root becomes a first-class affordance).
*/
import React from "react";
import { render, screen, cleanup } from "@testing-library/react";
import { afterEach, describe, expect, it, vi } from "vitest";
import { FilesToolbar } from "../FilesToolbar";
import {
FileEditor,
SECRET_SHAPE_DENIED_MARKER,
} from "../FileEditor";
afterEach(cleanup);
describe("internal#425 Phase 3 — /agent-home root selector", () => {
it("dropdown includes /agent-home as an option", () => {
// Pins the affordance is in the DOM even pre-Phase-2b — the
// canvas design freezes today, the backend lands the dispatch
// later. Without this, a future refactor that drops the option
// would silently regress the RFC's Phase 1 contract (canvas
// visibility) without breaking any other test.
render(
<FilesToolbar
root="/configs"
setRoot={vi.fn()}
fileCount={0}
onNewFile={vi.fn()}
onUpload={vi.fn()}
onDownloadAll={vi.fn()}
onClearAll={vi.fn()}
onRefresh={vi.fn()}
/>,
);
const select = screen.getByRole("combobox", {
name: /file root directory/i,
}) as HTMLSelectElement;
const values = Array.from(select.options).map((o) => o.value);
expect(values).toContain("/agent-home");
});
it("dropdown shows /agent-home as the SELECTED root when prop is /agent-home", () => {
render(
<FilesToolbar
root="/agent-home"
setRoot={vi.fn()}
fileCount={0}
onNewFile={vi.fn()}
onUpload={vi.fn()}
onDownloadAll={vi.fn()}
onClearAll={vi.fn()}
onRefresh={vi.fn()}
/>,
);
const select = screen.getByRole("combobox", {
name: /file root directory/i,
}) as HTMLSelectElement;
expect(select.value).toBe("/agent-home");
});
});
describe("internal#425 Phase 3 — secret-shape denial placeholder", () => {
// Files API Phase 2b returns SECRET_SHAPE_DENIED_MARKER as the file
// body when the file's path or content matched a credential regex.
// The editor MUST render the marker as a placeholder, not pump it
// through the textarea — that would put the marker (and any future
// matched bytes if the backend contract changes) into the DOM
// value, clipboard, and inspector.
it("renders the denial placeholder INSTEAD of the textarea when fileContent is the marker", () => {
render(
<FileEditor
selectedFile="agent/.openclaw/secrets.env"
fileContent={SECRET_SHAPE_DENIED_MARKER}
editContent={SECRET_SHAPE_DENIED_MARKER}
setEditContent={vi.fn()}
loadingFile={false}
saving={false}
success={null}
root="/agent-home"
onSave={vi.fn()}
onDownload={vi.fn()}
/>,
);
// Placeholder region present
expect(
screen.getByRole("region", { name: /file content denied/i }),
).toBeTruthy();
// Marker text visible (so a debugging operator sees the canonical
// contract string without having to dig into the source).
expect(screen.getByText(SECRET_SHAPE_DENIED_MARKER)).toBeTruthy();
// Critically: NO textarea — the bytes never reach a controlled
// input. A regression that re-introduces the textarea path would
// make the matched marker (and any future content) selectable +
// copyable.
expect(screen.queryByRole("textbox")).toBeNull();
});
it("renders the textarea normally when fileContent is regular content", () => {
render(
<FileEditor
selectedFile="config.yaml"
fileContent="name: openclaw\n"
editContent="name: openclaw\n"
setEditContent={vi.fn()}
loadingFile={false}
saving={false}
success={null}
root="/configs"
onSave={vi.fn()}
onDownload={vi.fn()}
/>,
);
expect(screen.getByRole("textbox")).toBeTruthy();
expect(screen.queryByRole("region", { name: /file content denied/i }))
.toBeNull();
});
it("/agent-home renders textarea READ-ONLY for non-denied content", () => {
// Phase 2b ships read + delete on /agent-home; write semantics
// are decided later. Until then, the canvas presents the editor
// as read-only so a user can't type into a buffer that the
// backend will refuse to PUT. Without this gate, the user would
// edit, hit Save, get a 501, and lose their context for why.
render(
<FileEditor
selectedFile=".openclaw/agent-card.json"
fileContent='{"name":"openclaw"}'
editContent='{"name":"openclaw"}'
setEditContent={vi.fn()}
loadingFile={false}
saving={false}
success={null}
root="/agent-home"
onSave={vi.fn()}
onDownload={vi.fn()}
/>,
);
const textarea = screen.getByRole("textbox") as HTMLTextAreaElement;
expect(textarea.readOnly).toBe(true);
});
it("/configs renders textarea WRITABLE (regression guard for the read-only gate)", () => {
render(
<FileEditor
selectedFile="config.yaml"
fileContent="name: x\n"
editContent="name: x\n"
setEditContent={vi.fn()}
loadingFile={false}
saving={false}
success={null}
root="/configs"
onSave={vi.fn()}
onDownload={vi.fn()}
/>,
);
const textarea = screen.getByRole("textbox") as HTMLTextAreaElement;
expect(textarea.readOnly).toBe(false);
});
});
describe("internal#425 Phase 3 — marker constant is the canonical string", () => {
// The marker string is part of the canvas <-> workspace-server
// contract. The workspace-server emits this exact body; the canvas
// detects it by exact-equality. A typo on either side would
// silently break detection — the canvas would render the literal
// string in the textarea instead of the placeholder. Pin the
// contract value here.
it("matches the contract value '<denied: secret-shape>'", () => {
expect(SECRET_SHAPE_DENIED_MARKER).toBe("<denied: secret-shape>");
});
});
@@ -64,66 +64,4 @@ describe("inferA2AErrorHint", () => {
expect(hint).toMatch(/Claude Code SDK/);
expect(hint).not.toMatch(/proxy timeout/);
});
// ---- P1 #348: poll-mode timeout-class detection ----
it("routes poll-mode budget exhaustion to its specific actionable hint", () => {
// a2a_tools_delegation.py emits this exact shape after the 600s
// budget. The user must NOT be told to restart — the work is
// still in flight on the platform side.
const hint = inferA2AErrorHint(
"polling timeout after 600s (delegation_id=abc, last_status=processing); the platform is still working on it — call check_task_status('abc') to retrieve later",
);
expect(hint).toMatch(/Do NOT restart/);
expect(hint).toMatch(/check_task_status/);
});
it("matches the check_task_status hint clue even without the 'polling timeout' phrase", () => {
const hint = inferA2AErrorHint(
"platform busy — call check_task_status('xyz')",
);
expect(hint).toMatch(/check_task_status/);
});
it("poll-mode hint wins over the generic timeout bucket", () => {
// The string contains both "polling timeout after" and "timeout"
// — the more-specific poll-mode hint must win so users don't get
// the generic "restart" advice for a still-in-flight task.
const hint = inferA2AErrorHint("polling timeout after 600s ...");
expect(hint).toMatch(/Do NOT restart/);
expect(hint).not.toMatch(/restart the workspace if this repeats/);
});
// ---- P1 #348: codex-aware specialization ----
it("specialises the empty-detail hint for codex callees", () => {
// Per feedback_surface_actionable_failure_reason_to_user: opaque
// restart prompts are the anti-pattern. With peerKind=codex the
// hint explicitly de-recommends restart.
const hint = inferA2AErrorHint("", { peerKind: "codex" });
expect(hint).toMatch(/codex/);
expect(hint).toMatch(/check its Activity tab/i);
expect(hint).not.toMatch(/A workspace restart is the safe first move/);
});
it("specialises generic-timeout hint for codex callees", () => {
const hint = inferA2AErrorHint("ReadTimeout", { peerKind: "codex" });
expect(hint).toMatch(/codex/);
expect(hint).toMatch(/600s/);
});
it("falls back to the non-codex generic timeout hint when no peerKind given", () => {
const hint = inferA2AErrorHint("ReadTimeout");
expect(hint).toMatch(/proxy timeout/);
expect(hint).not.toMatch(/600s sync-proxy/);
});
it("preserves existing empty-detail wording when no peer context provided", () => {
const hint = inferA2AErrorHint("");
expect(hint).toMatch(/no error detail/);
// Updated wording: must NOT be the bare "restart is the safe
// first move" line — that violates surface-actionable-reason.
expect(hint).not.toMatch(/safe first move/);
expect(hint).toMatch(/Activity tab/);
});
});
@@ -248,88 +248,6 @@ describe("extractResponseText", () => {
});
});
describe("extractAgentText", () => {
it("extracts text from top-level parts", () => {
const task = {
parts: [{ kind: "text", text: "Agent said hello" }],
};
expect(extractAgentText(task)).toBe("Agent said hello");
});
it("extracts from artifacts[0].parts when top-level parts absent", () => {
const task = {
artifacts: [
{ parts: [{ kind: "text", text: "From artifact block" }] },
],
};
expect(extractAgentText(task)).toBe("From artifact block");
});
it("extracts from status.message.parts as fallback", () => {
const task = {
status: {
message: { parts: [{ kind: "text", text: "Status text" }] },
},
};
expect(extractAgentText(task)).toBe("Status text");
});
it("prefers top-level parts over artifacts", () => {
const task = {
parts: [{ kind: "text", text: "top-level wins" }],
artifacts: [
{ parts: [{ kind: "text", text: "artifact text" }] },
],
};
expect(extractAgentText(task)).toBe("top-level wins");
});
it("prefers top-level parts over status.message", () => {
const task = {
parts: [{ kind: "text", text: "parts wins" }],
status: {
message: { parts: [{ kind: "text", text: "status text" }] },
},
};
expect(extractAgentText(task)).toBe("parts wins");
});
it("returns string identity when task itself is a string", () => {
expect(extractAgentText("plain string task" as unknown as Record<string, unknown>)).toBe(
"plain string task",
);
});
it("returns fallback when task is an empty object", () => {
expect(extractAgentText({})).toBe("(Could not extract response text)");
});
it("returns fallback when task has no extractable text", () => {
expect(
extractAgentText({ status: "running", other: "fields" }),
).toBe("(Could not extract response text)");
});
it("tolerates malformed nested shapes without throwing", () => {
const task = {
parts: null,
artifacts: "not an array",
status: { message: 42 },
};
expect(extractAgentText(task)).toBe("(Could not extract response text)");
});
it("joins multiple text parts with newline", () => {
const task = {
parts: [
{ kind: "text", text: "Line one" },
{ kind: "text", text: "Line two" },
],
};
expect(extractAgentText(task)).toBe("Line one\nLine two");
});
});
describe("extractTextsFromParts", () => {
it("extracts text parts with kind=text", () => {
const parts = [
@@ -1,102 +0,0 @@
import { describe, it, expect, beforeEach } from "vitest";
import { useCanvasStore } from "@/store/canvas";
import { resolveWorkspaceName } from "../hooks/resolveWorkspaceName";
beforeEach(() => {
// Reset store to a clean slate between tests so node lookup is deterministic.
useCanvasStore.setState({ nodes: [] });
});
describe("resolveWorkspaceName", () => {
it("returns the workspace name when a node with that ID exists", () => {
useCanvasStore.setState({
nodes: [
{
id: "ws-alpha-001",
type: "workspace",
data: { name: "Alpha Agent" },
position: { x: 0, y: 0 },
},
],
});
expect(resolveWorkspaceName("ws-alpha-001")).toBe("Alpha Agent");
});
it("falls back to the first 8 chars of the ID when no matching node exists", () => {
expect(resolveWorkspaceName("ws-zzz-not-found")).toBe("ws-zzz-n");
});
it("falls back to the first 8 chars when the node exists but has no name", () => {
useCanvasStore.setState({
nodes: [
{
id: "ws-no-name",
type: "workspace",
// data.name is deliberately absent
data: {},
position: { x: 0, y: 0 },
},
],
});
expect(resolveWorkspaceName("ws-no-name")).toBe("ws-no-na");
});
it("returns the first 8 chars for a very short ID", () => {
expect(resolveWorkspaceName("ab")).toBe("ab");
});
it("returns the first 8 chars when the ID is exactly 8 characters", () => {
// slice(0,8) of an 8-char string is the full string
const id = "12345678";
expect(resolveWorkspaceName(id)).toBe(id);
});
it("picks the right node when multiple workspaces share a prefix", () => {
useCanvasStore.setState({
nodes: [
{
id: "00000000-0000-0000-0000-000000000001",
type: "workspace",
data: { name: "Backend Agent" },
position: { x: 0, y: 0 },
},
{
id: "00000000-0000-0000-0000-000000000002",
type: "workspace",
data: { name: "Frontend Agent" },
position: { x: 100, y: 0 },
},
],
});
expect(resolveWorkspaceName("00000000-0000-0000-0000-000000000002")).toBe(
"Frontend Agent"
);
expect(resolveWorkspaceName("00000000-0000-0000-0000-000000000001")).toBe(
"Backend Agent"
);
});
it("does not mutate store state between calls", () => {
useCanvasStore.setState({
nodes: [
{
id: "stable-id",
type: "workspace",
data: { name: "Stable Workspace" },
position: { x: 0, y: 0 },
},
],
});
resolveWorkspaceName("stable-id");
resolveWorkspaceName("unknown-id");
// Store nodes must be unchanged — resolveWorkspaceName is read-only.
const nodes = useCanvasStore.getState().nodes;
expect(nodes).toHaveLength(1);
expect((nodes[0] as { id: string }).id).toBe("stable-id");
});
});
@@ -1,216 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for USER_MESSAGE event handling in useChatSocket.
*
* Covers issue #228: a canvas user's own outbound message was not fanned
* out to other sessions — the originating session inserted it optimistically,
* but other sessions only saw it after a manual refresh.
*
* The server now broadcasts USER_MESSAGE on canvas message/send. This test
* verifies the canvas side consumes and forwards it to onUserMessage.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { renderHook, act } from "@testing-library/react";
import React from "react";
import { useChatSocket, type UseChatSocketCallbacks } from "../hooks/useChatSocket";
import { emitSocketEvent, _resetSocketEventListenersForTests } from "@/store/socket-events";
import type { WSMessage } from "@/store/socket";
// Silence React StrictMode double-invoke noise — we care about final state.
const WARN = console.warn;
beforeEach(() => { console.warn = () => {}; });
afterEach(() => { console.warn = WARN; });
beforeEach(() => {
_resetSocketEventListenersForTests();
vi.useFakeTimers();
vi.setSystemTime(new Date("2026-05-18T10:00:00Z"));
});
afterEach(() => {
vi.useRealTimers();
_resetSocketEventListenersForTests();
});
const WORKSPACE_ID = "00000000-0000-0000-0000-000000000001";
function makeUserMessageEvent(
workspaceId: string,
overrides: Partial<{
message: string;
attachments: Array<{ uri: string; name: string; mimeType?: string; size?: number }>;
messageId: string;
}> = {},
): WSMessage {
const { message = "Hello, agent!", attachments, messageId } = overrides;
const payload: Record<string, unknown> = { message };
if (attachments) payload.attachments = attachments;
if (messageId) payload.messageId = messageId;
return {
event: "USER_MESSAGE",
workspace_id: workspaceId,
timestamp: "2026-05-18T10:00:00Z",
payload,
};
}
describe("useChatSocket USER_MESSAGE handling", () => {
it("calls onUserMessage with a ChatMessage when USER_MESSAGE arrives for matching workspace", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
const { result } = renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(makeUserMessageEvent(WORKSPACE_ID, { message: "Hello!" }));
});
expect(onUserMessage).toHaveBeenCalledTimes(1);
const msg = onUserMessage.mock.calls[0][0];
expect(msg.role).toBe("user");
expect(msg.content).toBe("Hello!");
expect(typeof msg.id).toBe("string");
expect(msg.timestamp).toBe("2026-05-18T10:00:00.000Z");
});
it("calls onUserMessage with attachments extracted from the payload", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(
makeUserMessageEvent(WORKSPACE_ID, {
message: "Here is the file",
attachments: [
{ uri: "workspace:/uploads/report.pdf", name: "report.pdf", mimeType: "application/pdf", size: 4096 },
],
}),
);
});
expect(onUserMessage).toHaveBeenCalledTimes(1);
const msg = onUserMessage.mock.calls[0][0];
expect(msg.role).toBe("user");
expect(msg.content).toBe("Here is the file");
expect(msg.attachments).toHaveLength(1);
expect(msg.attachments![0].uri).toBe("workspace:/uploads/report.pdf");
expect(msg.attachments![0].name).toBe("report.pdf");
expect(msg.attachments![0].mimeType).toBe("application/pdf");
expect(msg.attachments![0].size).toBe(4096);
});
it("does NOT call onUserMessage when workspace_id does not match", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(
makeUserMessageEvent("00000000-0000-0000-0000-000000000099", { message: "wrong workspace" }),
);
});
expect(onUserMessage).not.toHaveBeenCalled();
});
it("does NOT call onUserMessage when message is empty and no attachments", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(makeUserMessageEvent(WORKSPACE_ID, { message: "" }));
});
expect(onUserMessage).not.toHaveBeenCalled();
});
it("ignores USER_MESSAGE when onUserMessage callback is undefined", () => {
const callbacks: UseChatSocketCallbacks = { onAgentMessage: vi.fn() };
// Should not throw — undefined callback is guarded
expect(() =>
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks)),
).not.toThrow();
const { result } = renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(makeUserMessageEvent(WORKSPACE_ID, { message: "Hello" }));
});
// No error thrown even without onUserMessage
});
it("other event types do NOT trigger onUserMessage", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent({
event: "A2A_RESPONSE",
workspace_id: WORKSPACE_ID,
timestamp: "2026-05-18T10:00:00Z",
payload: {},
});
});
expect(onUserMessage).not.toHaveBeenCalled();
});
it("re-fires onUserMessage for each USER_MESSAGE event received", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(makeUserMessageEvent(WORKSPACE_ID, { message: "First message" }));
});
act(() => {
emitSocketEvent(makeUserMessageEvent(WORKSPACE_ID, { message: "Second message" }));
});
expect(onUserMessage).toHaveBeenCalledTimes(2);
expect(onUserMessage.mock.calls[0][0].content).toBe("First message");
expect(onUserMessage.mock.calls[1][0].content).toBe("Second message");
});
it("handles USER_MESSAGE with messageId in payload", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(
makeUserMessageEvent(WORKSPACE_ID, { message: "With ID", messageId: "msg-id-abc" }),
);
});
expect(onUserMessage).toHaveBeenCalledTimes(1);
const msg = onUserMessage.mock.calls[0][0];
expect(msg.content).toBe("With ID");
});
it("filters out attachments with empty uri or name (defence-in-depth)", () => {
const onUserMessage = vi.fn();
const callbacks: UseChatSocketCallbacks = { onUserMessage };
renderHook(() => useChatSocket(WORKSPACE_ID, callbacks));
act(() => {
emitSocketEvent(
makeUserMessageEvent(WORKSPACE_ID, {
message: "Mixed attachments",
attachments: [
{ uri: "workspace:/uploads/good.pdf", name: "good.pdf" },
{ uri: "", name: "bad.pdf" }, // empty uri — dropped
{ uri: "workspace:/uploads/also-bad", name: "" }, // empty name — dropped
{ uri: "workspace:/uploads/also-good.txt", name: "also-good.txt" },
],
}),
);
});
expect(onUserMessage).toHaveBeenCalledTimes(1);
const msg = onUserMessage.mock.calls[0][0];
expect(msg.attachments).toHaveLength(2);
expect(msg.attachments![0].name).toBe("good.pdf");
expect(msg.attachments![1].name).toBe("also-good.txt");
});
});
@@ -10,37 +10,10 @@
* had already drifted (Activity tab gained `not found`/`offline`
* cases AgentCommsPanel never picked up) — this module is the merged
* superset and the only place hint text should change.
*
* Optional `context.peerKind` lets callers signal "the callee was a
* codex-runtime task" so the timeout-class hints can be more specific
* about expected long completion times (PM-coordinating-Researcher is
* the canonical case where the 600s sync-proxy budget is too tight).
*/
export interface A2AErrorContext {
/** Runtime of the callee, when known. e.g. "codex", "claude-code". */
peerKind?: string;
}
export function inferA2AErrorHint(
detail: string,
context?: A2AErrorContext,
): string {
export function inferA2AErrorHint(detail: string): string {
const t = detail.toLowerCase();
// Poll-mode budget exhaustion (a2a_tools_delegation.py emits
// "polling timeout after Ns ... call check_task_status(...) to
// retrieve later"). This is NOT a delivery failure — the work is
// still in flight on the platform side. Route to a specific hint
// BEFORE the generic timeout bucket so the user gets the actionable
// "wait + check_task_status" guidance instead of the misleading
// "restart the workspace" anti-pattern.
if (
t.includes("polling timeout after") ||
t.includes("call check_task_status")
) {
return "The 600s sync-polling budget expired but the platform is still working on the delegation. Do NOT restart — the work isn't lost. Wait, then call check_task_status with the delegation_id to retrieve the result. If the callee is a long-running codex task, this is expected.";
}
// "control request timeout" is the specific Claude Code SDK init
// wedge symptom. Pattern on the full phrase, not bare "initialize"
// — a user task containing "failed to initialize database" would
@@ -54,13 +27,6 @@ export function inferA2AErrorHint(
t.includes("deadline exceeded") ||
t.includes("timeout")
) {
// For codex callees, a 600s sync-proxy timeout is the EXPECTED
// shape when the task is genuinely long-running. Calling out the
// workspace-restart anti-pattern explicitly per
// `feedback_surface_actionable_failure_reason_to_user`.
if ((context?.peerKind || "").toLowerCase() === "codex") {
return "The codex remote agent didn't respond within the 600s sync-proxy timeout. Codex tasks can legitimately run longer than this — check the callee's Activity tab; the work may still be progressing. Restart only if the container is genuinely stuck (no activity for several minutes).";
}
return "The remote agent didn't respond within the proxy timeout. It may be busy with a long task, or the runtime is stuck — restart the workspace if this repeats.";
}
if (
@@ -82,16 +48,7 @@ export function inferA2AErrorHint(
return "The remote workspace can't be reached — it may be stopped, removed, or outside the access control list. Verify the peer is online before retrying.";
}
if (detail === "") {
// Per `feedback_surface_actionable_failure_reason_to_user`: a bare
// "restart the workspace" prompt is the anti-pattern when the
// underlying failure was a silent timeout against a long-running
// remote (codex Researcher being coordinated by PM is the
// canonical case). If the caller knows the peer is codex, route
// to the more specific hint that explicitly de-recommends restart.
if ((context?.peerKind || "").toLowerCase() === "codex") {
return "The codex remote agent returned no error detail — most often the 600s sync-proxy budget expired before the task finished. The work may still be progressing on the callee side; check its Activity tab before restarting.";
}
return "The remote agent returned no error detail (the underlying httpx exception had an empty message — typically a connection-reset or silent timeout). Check the callee's Activity tab to see if work is still in flight before restarting.";
return "The remote agent returned no error detail (the underlying httpx exception had an empty message — typically a connection-reset or silent timeout). A workspace restart is the safe first move.";
}
return "The remote agent reported a delivery failure. Check the workspace logs or try restarting.";
}
@@ -1,209 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for useChatSend — the canvas user→agent send hook.
*
* Behavioural focus: the poll-mode ("queued") path. When the target
* workspace is an external / MCP-registered agent (delivery_mode=poll,
* e.g. an operator laptop running the molecule MCP channel), the
* platform's POST /workspaces/:id/a2a returns a synthetic
* {status:"queued", delivery_mode:"poll"} envelope IMMEDIATELY with no
* reply — the real reply arrives later over the AGENT_MESSAGE
* WebSocket push.
*
* Pre-fix the hook treated that synthetic envelope as a terminal
* response and called releaseSendGuards() → `sending` went false the
* instant the POST returned → the "agent is working" indicator
* vanished and the external turn looked dead. This suite pins the
* fixed contract:
*
* - a real reply still clears `sending` (regression guard)
* - a poll "queued" envelope KEEPS `sending` true (no terminal
* clear) so the existing thinking indicator persists
* - the eventual reply path (releaseSendGuards, the same call the
* AGENT_MESSAGE WS push makes via useChatSocket) clears it
* - an offline poll agent that never replies eventually surfaces an
* honest error instead of an infinite spinner
*
* Plus pure-function coverage for the poll-envelope detector.
*
* Root cause: workspace-server a2a_proxy.go:402 poll-mode
* short-circuit returns {status:"queued"} synchronously.
*/
import {
describe,
it,
expect,
vi,
beforeEach,
afterEach,
type Mock,
} from "vitest";
import { act, renderHook, cleanup } from "@testing-library/react";
const { mockApiPost } = vi.hoisted(() => ({ mockApiPost: vi.fn() }));
vi.mock("@/lib/api", () => ({
api: { post: mockApiPost },
}));
vi.mock("../uploads", () => ({
uploadChatFiles: vi.fn(),
}));
// Import AFTER mocks.
import {
useChatSend,
isPollQueuedResponse,
extractReplyText,
POLL_QUEUED_REPLY_TIMEOUT_MS,
} from "../useChatSend";
const flush = () => act(async () => { await Promise.resolve(); });
describe("isPollQueuedResponse", () => {
it("is true only for the synthetic poll-mode queued envelope", () => {
expect(isPollQueuedResponse({ status: "queued", delivery_mode: "poll" })).toBe(true);
});
it("is false for a real agent reply", () => {
expect(
isPollQueuedResponse({ result: { parts: [{ kind: "text", text: "hi" }] } }),
).toBe(false);
});
it("is false for null / undefined / partial shapes", () => {
expect(isPollQueuedResponse(null)).toBe(false);
expect(isPollQueuedResponse(undefined)).toBe(false);
// status=queued without delivery_mode=poll is NOT the poll envelope
// — don't accidentally swallow a real reply that happens to carry
// an unrelated status field.
expect(isPollQueuedResponse({ status: "queued" })).toBe(false);
expect(isPollQueuedResponse({ delivery_mode: "poll" })).toBe(false);
});
});
describe("extractReplyText (regression guard — unchanged by fix)", () => {
it("collects text parts from result", () => {
expect(
extractReplyText({ result: { parts: [{ kind: "text", text: "hello" }] } }),
).toBe("hello");
});
it("returns empty for the poll-queued envelope", () => {
expect(extractReplyText({ status: "queued", delivery_mode: "poll" })).toBe("");
});
});
describe("useChatSend — poll-mode in-progress state", () => {
beforeEach(() => {
vi.useFakeTimers();
mockApiPost.mockReset();
});
afterEach(() => {
vi.runOnlyPendingTimers();
vi.useRealTimers();
cleanup();
});
const setup = () => {
const onUserMessage = vi.fn();
const onAgentMessage = vi.fn();
const { result } = renderHook(() =>
useChatSend("ws-ext-1", {
getHistoryMessages: () => [],
onUserMessage,
onAgentMessage,
}),
);
return { result, onUserMessage, onAgentMessage };
};
it("a real reply clears `sending` (regression guard)", async () => {
mockApiPost.mockResolvedValue({
result: { parts: [{ kind: "text", text: "real reply" }] },
});
const { result, onAgentMessage } = setup();
await act(async () => {
void result.current.sendMessage("hi");
});
await flush();
expect(onAgentMessage).toHaveBeenCalledTimes(1);
expect(result.current.sending).toBe(false);
});
it("keeps `sending` true on a poll 'queued' envelope (no terminal clear)", async () => {
mockApiPost.mockResolvedValue({ status: "queued", delivery_mode: "poll" });
const { result, onAgentMessage } = setup();
await act(async () => {
void result.current.sendMessage("hi external agent");
});
await flush();
// The POST resolved, but it was only a queued ack — the indicator
// must stay up and no agent bubble should be rendered yet.
expect(result.current.sending).toBe(true);
expect(onAgentMessage).not.toHaveBeenCalled();
expect(result.current.error).toBeNull();
});
it("releaseSendGuards (the AGENT_MESSAGE-push path) clears the poll in-progress state", async () => {
mockApiPost.mockResolvedValue({ status: "queued", delivery_mode: "poll" });
const { result } = setup();
await act(async () => {
void result.current.sendMessage("hi");
});
await flush();
expect(result.current.sending).toBe(true);
// Simulate the terminal AGENT_MESSAGE WebSocket push arriving:
// useChatSocket's onAgentMessage / onSendComplete call
// releaseSendGuards. That must clear the in-progress state AND the
// safety timer (asserted by the next test).
act(() => {
result.current.releaseSendGuards();
});
expect(result.current.sending).toBe(false);
});
it("surfaces an honest error if a poll agent never replies (safety timeout)", async () => {
mockApiPost.mockResolvedValue({ status: "queued", delivery_mode: "poll" });
const { result } = setup();
await act(async () => {
void result.current.sendMessage("hi");
});
await flush();
expect(result.current.sending).toBe(true);
act(() => {
vi.advanceTimersByTime(POLL_QUEUED_REPLY_TIMEOUT_MS + 1000);
});
expect(result.current.sending).toBe(false);
expect(result.current.error).toMatch(/queued/i);
});
it("does NOT fire the safety error when the reply arrives before timeout", async () => {
mockApiPost.mockResolvedValue({ status: "queued", delivery_mode: "poll" });
const { result } = setup();
await act(async () => {
void result.current.sendMessage("hi");
});
await flush();
// Reply arrives (releaseSendGuards) well before the timeout.
act(() => {
result.current.releaseSendGuards();
});
act(() => {
vi.advanceTimersByTime(POLL_QUEUED_REPLY_TIMEOUT_MS + 1000);
});
expect(result.current.error).toBeNull();
expect(result.current.sending).toBe(false);
});
});
@@ -2,7 +2,6 @@
import { useCallback, useEffect, useRef, useState } from "react";
import { api } from "@/lib/api";
import { subscribeSocketResume } from "@/store/socket-events";
import { type ChatMessage, appendMessageDeduped as appendMessageDedupedFn } from "../types";
const INITIAL_HISTORY_LIMIT = 10;
@@ -83,23 +82,6 @@ export function useChatHistory(
loadInitial();
}, [loadInitial]);
// Back-fill on socket resume. The singleton WS emits this when it
// recovers from a down period (ordinary drop, or — the case this
// fixes — a mobile-browser background-suspend that silently killed
// the socket while the page was frozen). While the socket was dead
// every AGENT_MESSAGE / A2A_RESPONSE for this thread was missed, and
// the store's rehydrate() only re-pulls /workspaces status, not chat.
// Re-running loadInitial() re-fetches the latest persisted history —
// exactly what a navigate-away-and-back (remount) does today, but
// without the user having to do it. Shared by desktop ChatTab and
// MobileChat (both consume this hook), so the realtime path stays
// unified across surfaces rather than forked for mobile.
useEffect(() => {
return subscribeSocketResume(() => {
loadInitial();
});
}, [loadInitial]);
const loadOlder = useCallback(async () => {
if (inflightRef.current || !hasMoreRef.current) return;
const oldest = oldestMessageRef.current;
@@ -1,6 +1,6 @@
"use client";
import { useCallback, useEffect, useRef, useState } from "react";
import { useCallback, useRef, useState } from "react";
import { api } from "@/lib/api";
import { uploadChatFiles } from "../uploads";
import { createMessage, type ChatMessage, type ChatAttachment } from "../types";
@@ -22,42 +22,8 @@ interface A2AResponse {
parts?: A2APart[];
artifacts?: Array<{ parts: A2APart[] }>;
};
/** Synthetic poll-mode envelope. The platform returns this
* immediately (HTTP 200) when the target workspace is registered
* delivery_mode=poll — an external / MCP-registered agent with no
* public URL (e.g. an operator's laptop running the molecule MCP
* channel). The request has only been QUEUED into activity_logs;
* the agent will pick it up on its next poll and the real reply
* arrives asynchronously over the AGENT_MESSAGE WebSocket push
* (consumed by useChatSocket). See workspace-server
* a2a_proxy.go:402 (poll-mode short-circuit) and
* a2a_proxy_helpers.go:516 (logA2AReceiveQueued). */
status?: string;
delivery_mode?: string;
}
/** True when `resp` is the platform's synthetic poll-mode "queued"
* envelope rather than a real agent reply. For these the send is
* acknowledged-but-pending: the user's message landed and the agent
* is working, but there is no reply yet — the terminal AGENT_MESSAGE
* push will arrive later over the WebSocket. Treating this as a
* terminal response (the pre-fix behaviour) cleared the "agent is
* working" indicator the instant the POST returned, so an external
* workspace turn looked dead even though work had not started. */
export function isPollQueuedResponse(resp: A2AResponse | null | undefined): boolean {
return !!resp && resp.status === "queued" && resp.delivery_mode === "poll";
}
/** Hard ceiling on how long the "agent is working" indicator stays up
* for a poll-mode turn with no reply. The terminal AGENT_MESSAGE push
* normally clears it well before this. The cap exists so a poll-mode
* workspace that is offline / never consumes its queue doesn't pin a
* spinner forever — at which point we surface an honest, actionable
* error instead of an opaque dead spinner. Generous because poll
* agents (an operator laptop) can legitimately take minutes to wake,
* poll, and respond; the goal is "eventually honest", not fail-fast. */
export const POLL_QUEUED_REPLY_TIMEOUT_MS = 15 * 60 * 1000;
export function extractReplyText(resp: A2AResponse): string {
const collect = (parts: A2APart[] | undefined): string => {
if (!parts) return "";
@@ -93,29 +59,14 @@ export function useChatSend(workspaceId: string, options: UseChatSendOptions) {
const sendInFlightRef = useRef(false);
const sendingFromAPIRef = useRef(false);
const sendTokenRef = useRef(0);
// Safety-net timer armed only for poll-mode ("queued") turns: the
// POST returns immediately with no reply, so the normal
// POST-resolves-→-clear-spinner path can't drive the indicator. The
// terminal AGENT_MESSAGE WebSocket push clears it via
// releaseSendGuards (which also clears this timer); the timer is the
// backstop for an offline poll agent that never consumes its queue.
const pollTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);
const optionsRef = useRef(options);
optionsRef.current = options;
const clearPollTimeout = useCallback(() => {
if (pollTimeoutRef.current !== null) {
clearTimeout(pollTimeoutRef.current);
pollTimeoutRef.current = null;
}
}, []);
const releaseSendGuards = useCallback(() => {
clearPollTimeout();
setSending(false);
sendingFromAPIRef.current = false;
sendInFlightRef.current = false;
}, [clearPollTimeout]);
}, []);
const clearError = useCallback(() => setError(null), []);
@@ -195,33 +146,6 @@ export function useChatSend(workspaceId: string, options: UseChatSendOptions) {
sendInFlightRef.current = false;
return;
}
// Poll-mode ("queued") turn: the message landed and the
// external/MCP agent will pick it up on its next poll, but
// there is NO reply in this response. Pre-fix this fell
// through to releaseSendGuards() below and the "agent is
// working" indicator vanished the instant the POST returned —
// an external-workspace turn looked dead even though work had
// not started. Instead, keep `sending` true so the existing
// thinking indicator (the same one internal agents use)
// persists as a "received — agent is working" state; the
// terminal AGENT_MESSAGE WebSocket push (consumed by
// useChatSocket → onAgentMessage / onSendComplete →
// releaseSendGuards) clears it when the real reply arrives,
// exactly the path an internal async reply already uses.
if (isPollQueuedResponse(resp)) {
clearPollTimeout();
pollTimeoutRef.current = setTimeout(() => {
if (sendTokenRef.current !== myToken) return;
if (!sendingFromAPIRef.current) return;
releaseSendGuards();
setError(
"No response yet from this agent — it may be offline or " +
"busy. Your message was delivered and is queued; the " +
"reply will appear here if the agent picks it up.",
);
}, POLL_QUEUED_REPLY_TIMEOUT_MS);
return;
}
const replyText = extractReplyText(resp);
const replyFiles = extractFilesFromTask(
(resp?.result ?? {}) as Record<string, unknown>,
@@ -243,15 +167,9 @@ export function useChatSend(workspaceId: string, options: UseChatSendOptions) {
setError("Failed to send message — agent may be unreachable");
});
},
[workspaceId, sending, uploading, clearPollTimeout],
[workspaceId, sending, uploading],
);
// Drop the poll-mode safety timer on unmount / workspace switch so a
// stale timeout can't fire setError against a panel the user has
// already navigated away from. sendTokenRef guards correctness if it
// ever did fire; this just avoids the wasted timer + setState churn.
useEffect(() => clearPollTimeout, [clearPollTimeout]);
return {
sending,
uploading,
@@ -7,10 +7,6 @@ import { createMessage, type ChatMessage } from "../types";
export interface UseChatSocketCallbacks {
onAgentMessage?: (msg: ChatMessage) => void;
/** Called when another session sent a user message — used to fan out
* the user's own outbound text to all sessions so a second device
* sees the question live without a manual refresh (issue #228). */
onUserMessage?: (msg: ChatMessage) => void;
onActivityLog?: (entry: string) => void;
onSendComplete?: () => void;
onSendError?: (error: string) => void;
@@ -47,33 +43,6 @@ export function useChatSocket(
useSocketEvent((msg) => {
try {
if (msg.event === "USER_MESSAGE" && msg.workspace_id === workspaceId) {
const p = msg.payload || {};
const message = typeof p.message === "string" ? p.message : "";
const rawAttachments = p.attachments;
const attachments =
Array.isArray(rawAttachments)
? (rawAttachments as Array<{ uri?: unknown; name?: unknown; mimeType?: unknown; size?: unknown }>)
.filter(
(a) =>
typeof a?.uri === "string" && a.uri.length > 0 &&
typeof a?.name === "string" && a.name.length > 0,
)
.map((a) => ({
uri: a.uri as string,
name: a.name as string,
mimeType: typeof a.mimeType === "string" ? a.mimeType : undefined,
size: typeof a.size === "number" ? a.size : undefined,
}))
: undefined;
if (message || (attachments && attachments.length > 0)) {
callbacksRef.current.onUserMessage?.(
createMessage("user", message, attachments),
);
}
return;
}
if (msg.event === "ACTIVITY_LOGGED") {
if (msg.workspace_id !== workspaceId) return;
@@ -98,21 +67,9 @@ export function useChatSocket(
const own = (targetId || msg.workspace_id) === workspaceId;
if (own) {
callbacksRef.current.onSendComplete?.();
// internal#211/#212: surface the runtime's curated,
// user-actionable reason (provider HTTP status + error
// code + the provider's own guidance, e.g. a 403 "org
// disabled · use an API key / ask your admin"). The
// server now includes error_detail in the ACTIVITY_LOGGED
// broadcast; fall back to summary, and only as a last
// resort to a generic line. The old hardcoded
// "Agent error (Exception) — see workspace logs for
// details." string pointed at a logs UI that does not
// exist and discarded the actionable reason entirely.
const detail =
(p.error_detail as string) ||
(p.summary as string) ||
"The agent turn failed but the runtime reported no detail. Retry once; if it repeats the workspace runtime may need a restart.";
callbacksRef.current.onSendError?.(detail);
callbacksRef.current.onSendError?.(
"Agent error (Exception) — see workspace logs for details.",
);
}
}
} else if (type === "a2a_send") {
@@ -2,7 +2,7 @@
import { useState, useCallback, useRef, useEffect } from 'react';
import type { TestConnectionState, SecretGroup } from '@/types/secrets';
import { validateSecret, ApiError } from '@/lib/api/secrets';
import { validateSecret } from '@/lib/api/secrets';
interface TestConnectionButtonProps {
provider: SecretGroup;
@@ -55,23 +55,9 @@ export function TestConnectionButton({
}
onResult?.(result.valid);
resetTimerRef.current = setTimeout(() => setState('idle'), RESET_DELAYS[nextState]!);
} catch (err) {
// Distinguish a real failure shape rather than always claiming a
// timeout. A reachable server that answered with an HTTP status
// (ApiError) did NOT time out — most commonly the validation route
// is not available (404/501), which must not masquerade as
// "service down". Only an actual thrown network/abort error is a
// connectivity failure.
} catch {
setState('failure');
if (err instanceof ApiError) {
setErrorDetail(
err.status === 404 || err.status === 501
? 'Key validation is not available for this service yet. The key was not tested.'
: `Could not verify key (server returned ${err.status}). Saving is unaffected.`,
);
} else {
setErrorDetail('Could not reach the validation service. Check your connection and try again.');
}
setErrorDetail('Connection timed out. Service may be down.');
onResult?.(false);
resetTimerRef.current = setTimeout(() => setState('idle'), RESET_DELAYS.failure);
}
@@ -28,20 +28,8 @@ const mockValidateSecret = vi.fn();
vi.mock("@/lib/api/secrets", () => ({
validateSecret: (...args: unknown[]) => mockValidateSecret(...args),
ApiError: class ApiError extends Error {
status: number;
constructor(status: number, message: string) {
super(message);
this.name = "ApiError";
this.status = status;
}
},
}));
// Re-import the mocked ApiError so test cases construct the same class the
// component's `instanceof` check sees.
import { ApiError } from "@/lib/api/secrets";
beforeEach(() => {
vi.useFakeTimers();
vi.clearAllMocks();
@@ -213,27 +201,8 @@ describe("TestConnectionButton — failure path", () => {
});
describe("TestConnectionButton — catch path", () => {
it("does NOT claim a timeout when the validate endpoint 404s (regression: internal#492)", async () => {
// The validate route is unimplemented on the server and returns a fast
// 404. Before the fix this rendered the misleading hardcoded string
// "Connection timed out. Service may be down." It must instead state
// honestly that validation isn't available and the key was not tested.
mockValidateSecret.mockRejectedValue(new ApiError(404, "Not Found"));
render(
<TestConnectionButton provider="anthropic" secretValue="sk-ant-xxx" />,
);
fireEvent.click(document.querySelector('button[type="button"]')!);
await act(async () => {
await vi.advanceTimersByTimeAsync(0);
});
expect(document.body.textContent).not.toContain("Connection timed out");
expect(document.body.textContent).not.toContain("Service may be down");
expect(document.body.textContent).toContain("not available");
expect(document.body.textContent).toContain("not tested");
});
it("reports a non-404 server error with its status, not a timeout", async () => {
mockValidateSecret.mockRejectedValue(new ApiError(500, "Internal Server Error"));
it("shows 'Connection timed out' on network error", async () => {
mockValidateSecret.mockRejectedValue(new Error("timeout"));
render(
<TestConnectionButton provider="github" secretValue="ghp_xxx" />,
);
@@ -241,20 +210,7 @@ describe("TestConnectionButton — catch path", () => {
await act(async () => {
await vi.advanceTimersByTimeAsync(0);
});
expect(document.body.textContent).toContain("500");
expect(document.body.textContent).not.toContain("Connection timed out");
});
it("shows a connectivity message on a genuine network error", async () => {
mockValidateSecret.mockRejectedValue(new Error("network down"));
render(
<TestConnectionButton provider="github" secretValue="ghp_xxx" />,
);
fireEvent.click(document.querySelector('button[type="button"]')!);
await act(async () => {
await vi.advanceTimersByTimeAsync(0);
});
expect(document.body.textContent).toContain("Could not reach the validation service");
expect(document.body.textContent).toContain("Connection timed out");
});
it("calls onResult(false) on network error", async () => {
@@ -1,166 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for useKeyboardShortcut.
*
* Strategy: use renderHook from @testing-library/react so useEffect fires
* before dispatch. We spy on window.addEventListener to capture the registered
* handler. Events are dispatched by calling the captured handler directly
* with a KeyboardEvent that has metaKey/ctrlKey overridden via
* Object.defineProperty (jsdom's built-in modifier-key event is unreliable).
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { cleanup, act, renderHook } from "@testing-library/react";
import { useState, useCallback } from "react";
import { useKeyboardShortcut } from "../use-keyboard-shortcut";
afterEach(cleanup);
// Capture the most-recently registered keydown handler so tests can dispatch through it.
let registeredHandler: ((e: KeyboardEvent) => void) | null = null;
const addSpy = vi.spyOn(window, "addEventListener").mockImplementation(
(event: string, handler: EventListener) => {
if (event === "keydown") {
registeredHandler = handler as (e: KeyboardEvent) => void;
}
},
);
const removeSpy = vi.spyOn(window, "removeEventListener").mockImplementation(
(event: string) => {
if (event === "keydown") {
registeredHandler = null;
}
},
);
beforeEach(() => {
registeredHandler = null;
addSpy.mockClear();
removeSpy.mockClear();
});
/**
* Dispatch a keydown event through the captured handler.
* Wrapped in act() so React flushes any state updates synchronously.
* Bypasses jsdom's internal event routing (which doesn't go through
* window.EventTarget.prototype.addEventListener for fireEvent dispatch).
*/
function dispatchKeydown(
key: string,
{ meta = false, ctrl = false }: { meta?: boolean; ctrl?: boolean } = {},
) {
act(() => {
const e = new KeyboardEvent("keydown", { key, bubbles: true });
Object.defineProperty(e, "metaKey", { value: meta });
Object.defineProperty(e, "ctrlKey", { value: ctrl });
registeredHandler?.(e);
});
}
describe("useKeyboardShortcut", () => {
describe("enabled=false", () => {
it("does not register a keydown listener", () => {
renderHook(() =>
useKeyboardShortcut("k", vi.fn(), { enabled: false }),
);
expect(addSpy).not.toHaveBeenCalledWith("keydown", expect.any(Function));
});
});
describe("meta modifier", () => {
it("fires callback on Cmd+K", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { meta: true }));
dispatchKeydown("k", { meta: true });
expect(cb).toHaveBeenCalledTimes(1);
});
it("does NOT fire on Ctrl+K when only meta=true", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { meta: true }));
dispatchKeydown("k", { ctrl: true });
expect(cb).not.toHaveBeenCalled();
});
it("does NOT fire on plain K even with meta=true", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { meta: true }));
dispatchKeydown("k", { meta: false, ctrl: false });
expect(cb).not.toHaveBeenCalled();
});
});
describe("ctrl modifier", () => {
it("fires callback on Ctrl+K", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { ctrl: true }));
dispatchKeydown("k", { ctrl: true });
expect(cb).toHaveBeenCalledTimes(1);
});
it("does NOT fire on Cmd+K when only ctrl=true", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { ctrl: true }));
dispatchKeydown("k", { meta: true });
expect(cb).not.toHaveBeenCalled();
});
});
describe("no-modifier guard", () => {
it("does not fire when no modifier is held", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, {}));
dispatchKeydown("k", { meta: false, ctrl: false });
expect(cb).not.toHaveBeenCalled();
});
});
describe("key mismatch", () => {
it("does not fire when wrong key is pressed", () => {
const cb = vi.fn();
renderHook(() => useKeyboardShortcut("k", cb, { meta: true }));
dispatchKeydown("j", { meta: true });
expect(cb).not.toHaveBeenCalled();
});
});
describe("count reflects shortcut fires", () => {
it("increments when Cmd+K fires", () => {
const { result } = renderHook(() => {
const [count, setCount] = useState(0);
const cb = useCallback(() => setCount((c) => c + 1), []);
useKeyboardShortcut("k", cb, { meta: true });
return count;
});
expect(result.current).toBe(0);
dispatchKeydown("k", { meta: true });
expect(result.current).toBe(1);
dispatchKeydown("k", { meta: true });
expect(result.current).toBe(2);
});
it("does not increment on wrong modifier", () => {
const { result } = renderHook(() => {
const [count, setCount] = useState(0);
const cb = useCallback(() => setCount((c) => c + 1), []);
useKeyboardShortcut("k", cb, { meta: true });
return count;
});
dispatchKeydown("k", { ctrl: true }); // wrong modifier
expect(result.current).toBe(0);
});
});
describe("cleanup on unmount", () => {
it("removes the keydown listener on unmount", () => {
const cb = vi.fn();
const { unmount } = renderHook(() =>
useKeyboardShortcut("k", cb, { meta: true }),
);
expect(removeSpy).not.toHaveBeenCalled();
unmount();
expect(removeSpy).toHaveBeenCalledWith("keydown", expect.any(Function));
});
});
});
@@ -1,84 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for useSocketEvent.
*
* Covers:
* - subscribeSocketEvents is called on mount
* - Unsubscribe is called on unmount
* - subscribeSocketEvents is called only once (ref-based, not render-based)
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, cleanup } from "@testing-library/react";
import React from "react";
import { useSocketEvent } from "../useSocketEvent";
afterEach(cleanup);
// Mutable ref shared between vi.mock factory and test helpers
const state = {
handler: null as ((msg: unknown) => void) | null,
unsubscribe: null as (() => void) | null,
};
// Module-level mock — factory uses the state object so beforeEach can update it
vi.mock("@/store/socket-events", () => ({
subscribeSocketEvents: vi.fn().mockImplementation(() => {
if (state.unsubscribe) return state.unsubscribe;
const fn = vi.fn();
state.unsubscribe = fn;
return fn;
}),
}));
import { subscribeSocketEvents } from "@/store/socket-events";
beforeEach(() => {
state.handler = null;
state.unsubscribe = null;
vi.mocked(subscribeSocketEvents).mockImplementation(() => {
const fn = vi.fn();
state.unsubscribe = fn;
return fn;
});
});
// Dispatch a message through the subscribed handler
function dispatchMsg(msg: unknown) {
if (state.handler) {
state.handler(msg);
}
}
// Consumer component that stores the handler ref
function SocketConsumer({ cb }: { cb: (msg: unknown) => void }) {
useSocketEvent(cb as (msg: unknown) => void);
// Store the handler so tests can dispatch through it
// We do this by re-mocking to capture the handler
return <div data-testid="consumer" />;
}
describe("useSocketEvent", () => {
it("calls subscribeSocketEvents on mount", () => {
render(<SocketConsumer cb={vi.fn()} />);
expect(subscribeSocketEvents).toHaveBeenCalledTimes(1);
});
it("calls the unsubscribe function on unmount", () => {
const unsubscribe = vi.fn();
vi.mocked(subscribeSocketEvents).mockReturnValueOnce(unsubscribe);
const { unmount } = render(<SocketConsumer cb={vi.fn()} />);
unmount();
expect(unsubscribe).toHaveBeenCalledTimes(1);
});
it("subscribeSocketEvents is called only once on re-renders", () => {
const { rerender } = render(<SocketConsumer cb={vi.fn()} />);
const initial = vi.mocked(subscribeSocketEvents).mock.calls.length;
rerender(<SocketConsumer cb={vi.fn()} />);
rerender(<SocketConsumer cb={vi.fn()} />);
rerender(<SocketConsumer cb={vi.fn()} />);
expect(vi.mocked(subscribeSocketEvents).mock.calls.length).toBe(initial);
});
});
@@ -1,98 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for useWorkspaceName.
*
* Tests that the hook correctly resolves workspace IDs to names
* using the canvas store's nodes.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { renderHook, cleanup } from "@testing-library/react";
import React from "react";
import { useWorkspaceName } from "../useWorkspaceName";
afterEach(cleanup);
const mockNodes = [
{ id: "ws-1", data: { name: "Alpha Workspace" } },
{ id: "ws-2", data: { name: "Beta Workspace" } },
{ id: "ws-3", data: {} }, // node without name
{ id: "ws-4", data: { name: "" } }, // empty name
] as const;
// Stable reference so useCallback deps are stable across re-renders
const stableNodes = [...mockNodes];
vi.mock("@/store/canvas", () => ({
useCanvasStore: Object.assign(
vi.fn((selector?: (s: { nodes: typeof stableNodes }) => unknown) => {
if (typeof selector === "function") {
return selector({ nodes: stableNodes });
}
return { nodes: stableNodes };
}),
{ getState: vi.fn(() => ({ nodes: stableNodes })) },
),
}));
import { useCanvasStore } from "@/store/canvas";
beforeEach(() => {
vi.mocked(useCanvasStore).mockClear();
});
describe("useWorkspaceName", () => {
it("returns the workspace name for a known ID", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve("ws-1");
});
expect(result.current).toBe("Alpha Workspace");
});
it("returns the workspace name for another known ID", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve("ws-2");
});
expect(result.current).toBe("Beta Workspace");
});
it("returns empty string for null", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve(null);
});
expect(result.current).toBe("");
});
it("falls back to first 8 chars of ID when node has no name", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve("ws-3");
});
expect(result.current).toBe("ws-3".slice(0, 8));
});
it("falls back to first 8 chars of ID when name is empty string", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve("ws-4");
});
expect(result.current).toBe("ws-4".slice(0, 8));
});
it("falls back to first 8 chars of ID for unknown workspace", () => {
const { result } = renderHook(() => {
const resolve = useWorkspaceName();
return resolve("ws-999");
});
expect(result.current).toBe("ws-999".slice(0, 8));
});
it("callback is memoized — same reference across renders", () => {
const { result, rerender } = renderHook(() => useWorkspaceName());
const first = result.current;
rerender();
expect(result.current).toBe(first);
});
});
+55 -20
View File
@@ -1,32 +1,67 @@
// @vitest-environment jsdom
/**
* Tests for cssVar — maps ColorToken to a CSS variable string.
*
* Exists for the rare case where an inline style="" or SVG fill needs
* a token value rather than a Tailwind class. The returned var(--color-foo)
* string follows the live theme without re-renders.
*/
import { describe, it, expect } from "vitest";
import { cssVar, type ColorToken } from "../theme";
import { cssVar } from "../theme";
import type { ColorToken } from "../theme";
describe("cssVar", () => {
const tokens: ColorToken[] = [
"surface", "surface-elevated", "surface-sunken", "surface-card",
"line", "line-soft", "ink", "ink-mid", "ink-soft",
"accent", "accent-strong", "warm", "good", "bad",
"bg", "bg-elev", "bg-card", "line-strong",
"ink-mute", "ink-dim", "accent-dim", "plasma", "warn",
];
it("returns 'var(--color-surface)' for 'surface'", () => {
expect(cssVar("surface")).toBe("var(--color-surface)");
});
it("returns a CSS variable string for every colour token", () => {
for (const token of tokens) {
expect(cssVar(token)).toBe(`var(--color-${token})`);
it("returns 'var(--color-ink)' for 'ink'", () => {
expect(cssVar("ink")).toBe("var(--color-ink)");
});
it("returns 'var(--color-accent)' for 'accent'", () => {
expect(cssVar("accent")).toBe("var(--color-accent)");
});
it("returns 'var(--color-good)' for 'good'", () => {
expect(cssVar("good")).toBe("var(--color-good)");
});
it("returns 'var(--color-bad)' for 'bad'", () => {
expect(cssVar("bad")).toBe("var(--color-bad)");
});
it("returns 'var(--color-warn)' for 'warn'", () => {
expect(cssVar("warn")).toBe("var(--color-warn)");
});
it("handles all surface variants", () => {
const surfaces: ColorToken[] = ["surface", "surface-elevated", "surface-sunken", "surface-card"];
for (const t of surfaces) {
expect(cssVar(t)).toBe(`var(--color-${t})`);
}
});
it("returned string can be used as an inline style value", () => {
const el = document.createElement("div");
el.style.color = cssVar("ink");
el.style.backgroundColor = cssVar("surface");
expect(el.style.color).toBe("var(--color-ink)");
expect(el.style.backgroundColor).toBe("var(--color-surface)");
it("handles all ink variants", () => {
const inks: ColorToken[] = ["ink", "ink-mid", "ink-soft", "ink-mute", "ink-dim"];
for (const t of inks) {
expect(cssVar(t)).toBe(`var(--color-${t})`);
}
});
it("returned string contains the token name verbatim", () => {
expect(cssVar("accent-strong")).toContain("accent-strong");
expect(cssVar("ink-dim")).toContain("ink-dim");
it("handles always-dark tokens", () => {
const dark: ColorToken[] = ["bg", "bg-elev", "bg-card", "line-strong", "accent-dim", "plasma"];
for (const t of dark) {
expect(cssVar(t)).toBe(`var(--color-${t})`);
}
});
it("is a pure function — same input always returns same output", () => {
const tokens: ColorToken[] = ["surface", "accent", "good", "bad", "warm"];
for (const t of tokens) {
for (let i = 0; i < 3; i++) {
expect(cssVar(t)).toBe(`var(--color-${t})`);
}
}
});
});
@@ -1,134 +0,0 @@
// @vitest-environment jsdom
/**
* Tests for ThemeProvider and useTheme.
*
* Uses renderHook so useEffect fires before assertions.
* matchMedia is stubbed via Object.defineProperty in beforeEach.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, renderHook, cleanup, act } from "@testing-library/react";
import React from "react";
import { ThemeProvider, useTheme } from "../theme-provider";
afterEach(cleanup);
function makeMatcher(prefersDark: boolean) {
return {
matches: prefersDark,
media: "(prefers-color-scheme: dark)",
onchange: null,
addListener: vi.fn(),
removeListener: vi.fn(),
addEventListener: vi.fn(),
removeEventListener: vi.fn(),
dispatchEvent: vi.fn(),
};
}
beforeEach(() => {
Object.defineProperty(window, "matchMedia", {
writable: true,
configurable: true,
value: vi.fn().mockImplementation(() => makeMatcher(false)),
});
});
afterEach(() => {
vi.restoreAllMocks();
});
describe("useTheme", () => {
it("returns noopTheme when no provider is in the tree", () => {
const { result } = renderHook(() => useTheme());
expect(result.current).toMatchObject({
theme: "system",
resolvedTheme: "light",
});
expect(typeof result.current.setTheme).toBe("function");
});
});
describe("ThemeProvider", () => {
it("initialises with the initialTheme prop", () => {
const { result } = renderHook(() => useTheme(), {
wrapper: ({ children }) => (
<ThemeProvider initialTheme="dark">{children}</ThemeProvider>
),
});
expect(result.current).toMatchObject({
theme: "dark",
resolvedTheme: "dark",
});
expect(document.documentElement.dataset.theme).toBe("dark");
});
it("reflects system preference when theme=system", () => {
Object.defineProperty(window, "matchMedia", {
writable: true,
configurable: true,
value: vi.fn().mockImplementation(() => makeMatcher(true)),
});
const { result } = renderHook(() => useTheme(), {
wrapper: ({ children }) => (
<ThemeProvider initialTheme="system">{children}</ThemeProvider>
),
});
expect(result.current).toMatchObject({
theme: "system",
resolvedTheme: "dark",
});
expect(document.documentElement.dataset.theme).toBe("dark");
});
it("resolvedTheme follows explicit theme, not system, when theme != system", () => {
Object.defineProperty(window, "matchMedia", {
writable: true,
configurable: true,
value: vi.fn().mockImplementation(() => makeMatcher(true)),
});
const { result } = renderHook(() => useTheme(), {
wrapper: ({ children }) => (
<ThemeProvider initialTheme="light">{children}</ThemeProvider>
),
});
expect(result.current).toMatchObject({
theme: "light",
resolvedTheme: "light",
});
expect(document.documentElement.dataset.theme).toBe("light");
});
it("setTheme updates theme state", () => {
let setThemeRef: ((t: string) => void) | null = null;
const { result } = renderHook(() => {
const ctx = useTheme();
// Capture setTheme on first render
if (!setThemeRef) setThemeRef = ctx.setTheme;
return ctx;
}, {
wrapper: ({ children }) => (
<ThemeProvider initialTheme="light">{children}</ThemeProvider>
),
});
expect(result.current.theme).toBe("light");
act(() => { setThemeRef!("dark"); });
expect(result.current.theme).toBe("dark");
expect(document.documentElement.dataset.theme).toBe("dark");
});
it("sets document.documentElement.dataset.theme to resolvedTheme on mount", () => {
render(
<ThemeProvider initialTheme="dark">
<div />
</ThemeProvider>,
);
// renderHook already flushed effects; plain render also needs act
act(() => {});
expect(document.documentElement.dataset.theme).toBe("dark");
});
});
-199
View File
@@ -21,22 +21,12 @@ vi.mock("../canvas", () => ({
class MockWebSocket {
static instances: MockWebSocket[] = [];
// Mirror the real WebSocket readyState constants — socket.ts's wake
// path reads WebSocket.OPEN / WebSocket.CONNECTING and this.ws.readyState.
static readonly CONNECTING = 0;
static readonly OPEN = 1;
static readonly CLOSING = 2;
static readonly CLOSED = 3;
url: string;
onopen: (() => void) | null = null;
onmessage: ((event: { data: string }) => void) | null = null;
onclose: (() => void) | null = null;
onerror: (() => void) | null = null;
closeCallCount = 0;
// Starts OPEN once triggerOpen runs; tests flip this to simulate a
// mobile background-suspend that left a dead/half-open socket.
readyState = MockWebSocket.CONNECTING;
constructor(url: string) {
this.url = url;
@@ -45,12 +35,10 @@ class MockWebSocket {
close() {
this.closeCallCount++;
this.readyState = MockWebSocket.CLOSED;
}
// Helpers to trigger events in tests
triggerOpen() {
this.readyState = MockWebSocket.OPEN;
this.onopen?.();
}
@@ -71,46 +59,6 @@ class MockWebSocket {
}
}
// ---------------------------------------------------------------------------
// Minimal DOM stub (vitest environment is 'node' — no window/document).
// socket.ts's wake-recovery attaches visibilitychange/pageshow/online/
// focus listeners; under node it self-no-ops via a typeof guard, so to
// exercise the path we inject just enough of window/document here, the
// same way WebSocket is stubbed above. Kept tiny on purpose — a single
// listener registry keyed by event name, plus a settable
// visibilityState.
// ---------------------------------------------------------------------------
interface FakeTarget {
_l: Record<string, Array<() => void>>;
addEventListener: (type: string, fn: () => void) => void;
removeEventListener: (type: string, fn: () => void) => void;
dispatch: (type: string) => void;
}
function makeFakeTarget(): FakeTarget {
const l: Record<string, Array<() => void>> = {};
return {
_l: l,
addEventListener(type, fn) {
(l[type] ||= []).push(fn);
},
removeEventListener(type, fn) {
l[type] = (l[type] || []).filter((f) => f !== fn);
},
dispatch(type) {
for (const fn of l[type] || []) fn();
},
};
}
const fakeWindow = makeFakeTarget();
const fakeDocument = Object.assign(makeFakeTarget(), {
visibilityState: "visible" as string,
});
(globalThis as unknown as Record<string, unknown>).window = fakeWindow;
(globalThis as unknown as Record<string, unknown>).document = fakeDocument;
// Install mock WebSocket globally before importing socket module
(globalThis as unknown as Record<string, unknown>).WebSocket = MockWebSocket;
@@ -380,153 +328,6 @@ describe("WebSocket onerror", () => {
});
});
// ---------------------------------------------------------------------------
// Wake recovery — mobile background-suspend regression (mobile chat not
// updating in real time until refresh). Simulates: connect → open →
// the OS freezes the page and silently kills the WS WITHOUT firing
// onclose → user returns (visibilitychange / pageshow / online /
// focus) → assert the dead socket is replaced AND, on the new socket's
// open, the resume signal fires so chat history back-fills the missed
// AGENT_MESSAGE / A2A_RESPONSE events.
// ---------------------------------------------------------------------------
import {
subscribeSocketResume,
_resetSocketResumeListenersForTests,
} from "../socket-events";
describe("wake recovery (mobile background-suspend)", () => {
beforeEach(() => {
_resetSocketResumeListenersForTests();
fakeDocument.visibilityState = "visible";
});
function suspendKill(ws: MockWebSocket) {
// Mobile background-suspend: the OS tore the transport down but the
// page was frozen so onclose never ran. The socket object survives
// with a CLOSED readyState and no reconnect was scheduled.
ws.readyState = MockWebSocket.CLOSED;
}
it("reconnects on visibilitychange when the socket was silently killed", () => {
connectSocket();
const ws = getLastWS();
ws.triggerOpen();
expect(MockWebSocket.instances).toHaveLength(1);
suspendKill(ws);
fakeDocument.dispatch("visibilitychange");
// A fresh socket must have been created — the stale one is not
// reused.
expect(MockWebSocket.instances.length).toBeGreaterThan(1);
});
it("does NOT reconnect on visibilitychange while the socket is still healthy", () => {
connectSocket();
const ws = getLastWS();
ws.triggerOpen();
expect(MockWebSocket.instances).toHaveLength(1);
// Healthy OPEN socket + a spurious visibilitychange (e.g. quick tab
// peek that never actually suspended) → no churn.
fakeDocument.dispatch("visibilitychange");
expect(MockWebSocket.instances).toHaveLength(1);
});
it("ignores visibilitychange when the page is hidden (the hide transition)", () => {
connectSocket();
const ws = getLastWS();
ws.triggerOpen();
suspendKill(ws);
fakeDocument.visibilityState = "hidden";
fakeDocument.dispatch("visibilitychange");
// Hidden → must not reconnect (would defeat the purpose; we only
// re-arm when the user is actually looking at the page again).
expect(MockWebSocket.instances).toHaveLength(1);
});
it.each(["pageshow", "online", "focus"])(
"reconnects on window '%s' after a silent kill",
(evt) => {
connectSocket();
const ws = getLastWS();
ws.triggerOpen();
suspendKill(ws);
fakeWindow.dispatch(evt);
expect(MockWebSocket.instances.length).toBeGreaterThan(1);
},
);
it("emits the resume signal once the recovered socket re-opens (so chat back-fills missed messages)", () => {
const onResume = vi.fn();
const unsub = subscribeSocketResume(onResume);
connectSocket();
const ws1 = getLastWS();
ws1.triggerOpen();
// First open must NOT fire resume — the mount-time chat-history
// fetch already covers the initial load.
expect(onResume).not.toHaveBeenCalled();
// Background-suspend silently kills the socket, then the user
// returns.
suspendKill(ws1);
fakeDocument.dispatch("visibilitychange");
// The wake handler force-reconnected; the new socket completing its
// handshake is what signals "we recovered from a gap — re-fetch".
const ws2 = getLastWS();
expect(ws2).not.toBe(ws1);
ws2.triggerOpen();
expect(onResume).toHaveBeenCalledTimes(1);
unsub();
});
it("does not emit resume on the very first connect", () => {
const onResume = vi.fn();
const unsub = subscribeSocketResume(onResume);
connectSocket();
getLastWS().triggerOpen();
expect(onResume).not.toHaveBeenCalled();
unsub();
});
it("emits resume after an ordinary onclose-driven reconnect too (desktop path unchanged)", () => {
const onResume = vi.fn();
const unsub = subscribeSocketResume(onResume);
connectSocket();
const ws1 = getLastWS();
ws1.triggerOpen();
// Ordinary network drop — onclose fires normally.
ws1.triggerClose();
vi.advanceTimersByTime(1100); // past the 1s backoff
const ws2 = getLastWS();
expect(ws2).not.toBe(ws1);
ws2.triggerOpen();
expect(onResume).toHaveBeenCalledTimes(1);
unsub();
});
it("detaches wake listeners on disconnect (no reconnect after teardown)", () => {
connectSocket();
const ws = getLastWS();
ws.triggerOpen();
disconnectSocket();
const countAfterDisconnect = MockWebSocket.instances.length;
// A wake event after teardown must be inert.
fakeDocument.dispatch("visibilitychange");
fakeWindow.dispatch("focus");
expect(MockWebSocket.instances.length).toBe(countAfterDisconnect);
});
});
// ---------------------------------------------------------------------------
// Health check (startHealthCheck / stopHealthCheck via onopen / disconnect)
// ---------------------------------------------------------------------------
-50
View File
@@ -61,53 +61,3 @@ export function subscribeSocketEvents(listener: Listener): () => void {
export function _resetSocketEventListenersForTests(): void {
listeners.clear();
}
// ---------------------------------------------------------------------------
// Socket-resume signal
// ---------------------------------------------------------------------------
//
// Fired by the ReconnectingSocket when the WS comes back up AFTER having
// been down (drop, or a mobile-browser background-suspend that silently
// killed the socket while the page was frozen). Distinct from the raw
// event bus above: while the socket was dead the page missed every
// AGENT_MESSAGE / A2A_RESPONSE, and the store's rehydrate() only re-pulls
// /workspaces status — it does NOT back-fill chat messages. Components
// that render a live message thread (desktop ChatTab + MobileChat, both
// via useChatHistory) subscribe here to re-fetch their history on resume
// so missed agent replies appear without the user having to navigate
// away+back or hard-refresh. Shared by desktop and mobile — the recovery
// is in the singleton socket, not forked per-surface.
type ResumeListener = () => void;
const resumeListeners = new Set<ResumeListener>();
/** Notify every resume subscriber that the socket just recovered from a
* down period. Called by ReconnectingSocket.onopen, but only when the
* open follows a prior loss (not the very first connect — the initial
* mount-time history fetch already covers that). */
export function emitSocketResume(): void {
for (const listener of resumeListeners) {
try {
listener();
} catch (err) {
if (typeof console !== "undefined") {
console.error("socket-resume listener threw:", err);
}
}
}
}
/** Register a resume subscriber. Returns an unsubscribe function the
* caller must invoke from its effect cleanup. */
export function subscribeSocketResume(listener: ResumeListener): () => void {
resumeListeners.add(listener);
return () => {
resumeListeners.delete(listener);
};
}
/** Test-only: drop all resume subscribers. */
export function _resetSocketResumeListenersForTests(): void {
resumeListeners.clear();
}
+1 -117
View File
@@ -1,6 +1,6 @@
import { useCanvasStore } from "./canvas";
import { deriveWsBaseUrl } from "@/lib/ws-url";
import { emitSocketEvent, emitSocketResume } from "./socket-events";
import { emitSocketEvent } from "./socket-events";
// If explicit WS_URL is set, use it as-is (may include custom path).
// Otherwise derive base + append /ws.
@@ -98,107 +98,9 @@ class ReconnectingSocket {
// caller can fire-and-forget without coordinating.
private rehydrateInFlight: Promise<void> | null = null;
private rehydrateDedup = new RehydrateDedup(REHYDRATE_DEDUP_WINDOW_MS);
// True once any onopen has fired. Gates the resume signal so the very
// first connect doesn't fire it (the mount-time chat-history fetch
// already covers the initial load — a resume here would be a wasted
// duplicate). Set on the first successful open and stays true.
private everConnected = false;
// True between a loss (onclose / wake-detected stale socket) and the
// next successful onopen. Only when this is set does onopen emit the
// resume signal — i.e. we recovered from a real gap during which
// AGENT_MESSAGE / A2A_RESPONSE events may have been missed.
private wasDown = false;
// Bound wake handler. iOS Safari / Chrome-mobile freeze the page and
// its timers when the tab is backgrounded or the device locks, and
// tear the WS down WITHOUT reliably firing onclose before the freeze.
// On thaw nothing re-arms: onclose never ran so no reconnect was
// scheduled, and the health-check / fallback-poll intervals were
// suspended. The socket is silently dead until a manual refresh. This
// handler force-reconnects on any wake signal when the socket isn't
// healthy. Stored so disconnect() can detach the listeners.
private onWake: (() => void) | null = null;
constructor(url: string) {
this.url = url;
this.installWakeListeners();
}
/** Attach page-lifecycle listeners that force a reconnect when the
* page returns to the foreground / regains connectivity and the
* socket is not OPEN. Shared by desktop and mobile — desktop rarely
* hits the stale-socket path (its onclose fires promptly) so this is
* effectively a no-op there, while mobile depends on it because the
* background-suspend kills the socket without an onclose. */
private installWakeListeners() {
if (typeof window === "undefined" || typeof document === "undefined") {
return;
}
const wake = () => {
if (this.disposed) return;
// Only act on a visible page — visibilitychange also fires on the
// hide transition, which we must ignore (closing here would defeat
// the point).
if (
typeof document.visibilityState === "string" &&
document.visibilityState !== "visible"
) {
return;
}
// Healthy socket → nothing to do. A stale/half-open socket on
// mobile reports CLOSED or CLOSING (the OS tore the transport
// down); CONNECTING is also unhealthy from the user's POV but a
// reconnect attempt is already in flight, so leave it.
const live =
this.ws !== null &&
(this.ws.readyState === WebSocket.OPEN ||
this.ws.readyState === WebSocket.CONNECTING);
if (live) return;
// Tear down any zombie and reconnect immediately. Mark wasDown so
// the subsequent onopen emits the resume signal and chat threads
// back-fill the messages missed while frozen.
this.wasDown = true;
this.forceReconnect();
};
this.onWake = wake;
document.addEventListener("visibilitychange", wake);
window.addEventListener("pageshow", wake);
window.addEventListener("online", wake);
window.addEventListener("focus", wake);
}
private removeWakeListeners() {
if (!this.onWake) return;
if (typeof window !== "undefined" && typeof document !== "undefined") {
document.removeEventListener("visibilitychange", this.onWake);
window.removeEventListener("pageshow", this.onWake);
window.removeEventListener("online", this.onWake);
window.removeEventListener("focus", this.onWake);
}
this.onWake = null;
}
/** Detach the current (presumed dead/stale) socket without routing
* through its onclose, cancel any pending backoff timer, and
* reconnect now. Used by the wake path: the browser already killed
* the transport, so the exponential backoff that onclose would have
* scheduled is both absent and undesirable — the user is looking at
* the page and wants it live immediately. */
private forceReconnect() {
if (this.disposed) return;
if (this.reconnectTimer) {
clearTimeout(this.reconnectTimer);
this.reconnectTimer = null;
}
if (this.ws) {
this.ws.onopen = null;
this.ws.onmessage = null;
this.ws.onclose = null;
this.ws.onerror = null;
try { this.ws.close(); } catch { /* noop */ }
this.ws = null;
}
this.attempt = 0;
this.connect();
}
connect() {
@@ -230,18 +132,6 @@ class ReconnectingSocket {
this.stopFallbackPoll();
this.rehydrate();
this.startHealthCheck();
// If this open follows a real loss (drop, or a mobile background-
// suspend that the wake handler recovered from), signal resume so
// live message threads re-fetch the AGENT_MESSAGE / A2A_RESPONSE
// history they missed while the socket was dead — rehydrate()
// above only refreshes /workspaces status, not chat. Gate on
// everConnected so the very first open (covered by the mount-time
// history fetch) doesn't fire a redundant resume.
if (this.everConnected && this.wasDown) {
emitSocketResume();
}
this.everConnected = true;
this.wasDown = false;
};
ws.onmessage = (event) => {
@@ -267,11 +157,6 @@ class ReconnectingSocket {
// corresponds to the WS we just tore down (prevents a stale
// onclose from a zombie socket from re-arming the loop).
if (this.disposed || this.ws !== ws) return;
// We had a live socket and lost it — mark down so the next onopen
// emits the resume signal and chat threads back-fill missed
// messages. (The wake path also sets this; setting it here covers
// the ordinary network-drop case.)
this.wasDown = true;
this.stopHealthCheck();
useCanvasStore.getState().setWsStatus("connecting");
this.startFallbackPoll();
@@ -362,7 +247,6 @@ class ReconnectingSocket {
disconnect() {
this.disposed = true;
this.removeWakeListeners();
this.stopHealthCheck();
this.stopFallbackPoll();
if (this.reconnectTimer) {
-1
View File
@@ -62,7 +62,6 @@ TOP_LEVEL_MODULES = {
"a2a_tools_memory",
"a2a_tools_messaging",
"a2a_tools_rbac",
"a2a_tools_identity",
"adapter_base",
"agent",
"agents_md",
+2 -4
View File
@@ -41,9 +41,8 @@ type EventType string
// scan-friendly as it grows.
const (
// Chat / agent messaging — surfaces in canvas chat panels.
EventAgentMessage EventType = "AGENT_MESSAGE"
EventA2AResponse EventType = "A2A_RESPONSE"
EventUserMessage EventType = "USER_MESSAGE"
EventAgentMessage EventType = "AGENT_MESSAGE"
EventA2AResponse EventType = "A2A_RESPONSE"
EventActivityLogged EventType = "ACTIVITY_LOGGED"
EventChannelMessage EventType = "CHANNEL_MESSAGE"
@@ -96,7 +95,6 @@ const (
var AllEventTypes = []EventType{
EventA2AResponse,
EventActivityLogged,
EventUserMessage,
EventAgentAssigned,
EventAgentCardUpdated,
EventAgentMessage,
@@ -41,7 +41,6 @@ func TestAllEventTypes_IsSnapshot(t *testing.T) {
"DELEGATION_STATUS",
"EXTERNAL_CREDENTIALS_ROTATED",
"TASK_UPDATED",
"USER_MESSAGE",
"WORKSPACE_AWAITING_AGENT",
"WORKSPACE_DEGRADED",
"WORKSPACE_HEARTBEAT",
@@ -399,21 +399,7 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
// (no Do(), no maybeMarkContainerDead). The response is a synthetic
// {status:"queued"} envelope so the caller (canvas, another workspace)
// knows delivery is acknowledged but pending consumption.
deliveryMode, deliveryModeErr := lookupDeliveryMode(ctx, workspaceID)
if deliveryModeErr != nil {
// internal#497 fail-closed: a real DB/context error on the
// delivery-mode read MUST NOT silently fall through to the push
// dispatch path — that is exactly what silently misrouted every
// poll-mode peer for 5 days under the ce2db75f regression. Surface
// a structured error so the delegation is marked failed (loud +
// retryable) instead of dispatched to the wrong path.
log.Printf("ProxyA2A: delivery-mode lookup failed for %s: %v — failing closed", workspaceID, deliveryModeErr)
return 0, nil, &proxyA2AError{
Status: http.StatusServiceUnavailable,
Response: gin.H{"error": "delivery-mode lookup failed; refusing to dispatch to avoid silent misrouting"},
}
}
if deliveryMode == models.DeliveryModePoll {
if lookupDeliveryMode(ctx, workspaceID) == models.DeliveryModePoll {
if logActivity {
h.logA2AReceiveQueued(ctx, workspaceID, callerID, body, a2aMethod)
}
@@ -11,7 +11,6 @@ import (
"log"
"net/http"
"strconv"
"strings"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
@@ -195,11 +194,6 @@ func (h *WorkspaceHandler) maybeMarkContainerDead(ctx context.Context, workspace
}
db.ClearWorkspaceKeys(ctx, workspaceID)
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOffline), workspaceID, map[string]interface{}{})
// Tracked via goAsync (not bare `go`) so the asyncWG can be drained
// before a test swaps the global db.DB. runRestartCycle reads db.DB
// before its provisioner gate, so an untracked detached goroutine
// races setupTestDB's t.Cleanup db.DB restore. Matches the already-
// correct site at a2a_proxy.go:648.
h.goAsync(func() { h.RestartByID(workspaceID) })
return true
}
@@ -247,9 +241,6 @@ func (h *WorkspaceHandler) preflightContainerHealth(ctx context.Context, workspa
}
db.ClearWorkspaceKeys(ctx, workspaceID)
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOffline), workspaceID, map[string]interface{}{})
// Tracked via goAsync (see maybeMarkContainerDead): preflight's
// detached restart must be drainable so it doesn't race the global
// db.DB swap in test cleanup.
h.goAsync(func() { h.RestartByID(workspaceID) })
return &proxyA2AError{
Status: http.StatusServiceUnavailable,
@@ -271,9 +262,8 @@ func (h *WorkspaceHandler) logA2AFailure(ctx context.Context, workspaceID, calle
errWsName = workspaceID
}
summary := "A2A request to " + errWsName + " failed: " + errMsg
parent := ctx
h.goAsync(func() {
logCtx, cancel := context.WithTimeout(context.WithoutCancel(parent), 30*time.Second)
logCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 30*time.Second)
defer cancel()
LogActivity(logCtx, h.broadcaster, ActivityParams{
WorkspaceID: workspaceID,
@@ -319,9 +309,8 @@ func (h *WorkspaceHandler) logA2ASuccess(ctx context.Context, workspaceID, calle
}
summary := a2aMethod + " → " + wsNameForLog
toolTrace := extractToolTrace(respBody)
parent := ctx
h.goAsync(func() {
logCtx, cancel := context.WithTimeout(context.WithoutCancel(parent), 30*time.Second)
logCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 30*time.Second)
defer cancel()
LogActivity(logCtx, h.broadcaster, ActivityParams{
WorkspaceID: workspaceID,
@@ -345,19 +334,6 @@ func (h *WorkspaceHandler) logA2ASuccess(ctx context.Context, workspaceID, calle
"duration_ms": durationMs,
})
}
// #228: fan user's own outbound message to all sessions of the workspace.
// When a canvas user sends a message (callerID == "" and method == "message/send"),
// the originating session already inserted it optimistically in useChatSend.
// Other sessions see nothing until a manual refresh — this broadcast closes
// that gap. The originating session collapses its optimistic copy via the
// 3-second appendMessageDeduped window (same role + content = deduped).
if callerID == "" && a2aMethod == "message/send" && statusCode < 400 {
userPayload := extractCanvasUserMessage(body)
if userPayload != nil {
h.broadcaster.BroadcastOnly(workspaceID, string(events.EventUserMessage), userPayload)
}
}
}
func nilIfEmpty(s string) *string {
@@ -407,110 +383,6 @@ func validateCallerToken(ctx context.Context, c *gin.Context, callerID string) e
// matching (the wsauth errors are typed for the invalid case).
var errInvalidCallerToken = errors.New("missing caller auth token")
// extractCanvasUserMessage parses an A2A JSON-RPC request body and extracts
// the user-authored text and attachments from a canvas-initiated message/send.
// Returns nil when the body is not a canvas user message (empty, malformed,
// or not a message/send from canvas). The returned payload is safe to pass
// directly to BroadcastOnly — nil fields are omitted from JSON.
func extractCanvasUserMessage(body []byte) map[string]interface{} {
if len(body) == 0 {
return nil
}
var top map[string]json.RawMessage
if err := json.Unmarshal(body, &top); err != nil {
return nil
}
// Only handle message/send from canvas
var method string
if err := json.Unmarshal(top["method"], &method); err != nil || method != "message/send" {
return nil
}
params, ok := top["params"]
if !ok {
return nil
}
var paramsMap map[string]json.RawMessage
if err := json.Unmarshal(params, &paramsMap); err != nil {
return nil
}
msgRaw, ok := paramsMap["message"]
if !ok {
return nil
}
var msg map[string]json.RawMessage
if err := json.Unmarshal(msgRaw, &msg); err != nil {
return nil
}
// role field: only broadcast user-role messages (canvas users)
var role string
if err := json.Unmarshal(msg["role"], &role); err != nil || role != "user" {
return nil
}
result := make(map[string]interface{})
// Extract messageId if present
var mid string
if err := json.Unmarshal(msg["messageId"], &mid); err == nil && mid != "" {
result["messageId"] = mid
}
// Extract text from parts — accumulate all text parts into a single string
var parts []json.RawMessage
if err := json.Unmarshal(msg["parts"], &parts); err == nil {
var texts []string
var fileAttachments []map[string]interface{}
for _, pRaw := range parts {
var p map[string]json.RawMessage
if err := json.Unmarshal(pRaw, &p); err != nil {
continue
}
var t string
if err := json.Unmarshal(p["text"], &t); err == nil && t != "" {
texts = append(texts, t)
}
var fileRaw json.RawMessage
if err := json.Unmarshal(p["file"], &fileRaw); err == nil && fileRaw != nil {
var f map[string]json.RawMessage
if err := json.Unmarshal(fileRaw, &f); err == nil {
att := make(map[string]interface{})
var s string
if err := json.Unmarshal(f["uri"], &s); err == nil {
att["uri"] = s
}
if err := json.Unmarshal(f["name"], &s); err == nil {
att["name"] = s
}
if err := json.Unmarshal(f["mimeType"], &s); err == nil {
att["mimeType"] = s
}
var n float64
if err := json.Unmarshal(f["size"], &n); err == nil {
att["size"] = n
}
if len(att) > 0 {
fileAttachments = append(fileAttachments, att)
}
}
}
}
if len(texts) > 0 {
// Join with newlines — user may have sent multiple text parts
result["message"] = strings.Join(texts, "\n")
}
if len(fileAttachments) > 0 {
result["attachments"] = fileAttachments
}
}
// Drop empty payloads
if len(result) == 0 {
return nil
}
return result
}
// extractToolTrace pulls metadata.tool_trace from an A2A JSON-RPC response.
// Returns nil when absent or malformed — callers can pass it straight through.
func extractToolTrace(respBody []byte) json.RawMessage {
@@ -586,64 +458,40 @@ func parseUsageFromA2AResponse(body []byte) (inputTokens, outputTokens int64) {
return 0, 0
}
// lookupDeliveryMode returns the workspace's delivery_mode.
//
// internal#497 / RFC#497 fail-closed (SURGICAL scope): the *specific*
// failure mode that hid the ce2db75f regression for 5 days is now
// propagated instead of silently swallowed — a CONTEXT error
// (context.Canceled / context.DeadlineExceeded). Under ce2db75f the
// detached delegation goroutine ran on a cancelled request context, every
// `SELECT delivery_mode` failed `context canceled`, this function returned
// push, the poll-mode short-circuit in proxyA2ARequest was skipped, and
// poll-mode peers (e.g. an operator laptop on molecule-mcp-claude-channel)
// silently never got their a2a_receive inbox row. A transient,
// systematic-once-triggered context cancellation became permanent
// invisible misrouting. Returning that error lets the caller fail loud
// (mark the delegation failed) instead of mis-dispatching.
//
// Scope is deliberately narrow: only ctx errors propagate. Other DB
// errors retain the long-standing documented "fall back to push (today's
// synchronous behavior)" contract — that path is loud + recoverable
// (502 / SSRF reject / restart), unlike the silent poll-mode drop, and
// the surrounding proxy (incl. the sibling checkWorkspaceBudget) is
// intentionally built around that fail-open-to-push behavior. Widening
// further is an RFC#497 follow-up, not part of this P0 fix.
//
// A genuinely *absent* configuration is NOT an error and still resolves to
// push (the safe synchronous default): sql.ErrNoRows, a NULL/empty column,
// or an unrecognised value all return (push, nil).
// lookupDeliveryMode returns the workspace's delivery_mode. On any DB
// error or missing row it returns DeliveryModePush — the fail-closed
// default. "Closed" here means "fall back to today's behavior (synchronous
// dispatch)" rather than "fall back to drop the request silently into
// activity_logs where the agent might never see it." A poll-mode workspace
// that briefly reads as push will get its A2A request dispatched to the
// stored URL (or a 502 if no URL); a push-mode workspace that briefly
// reads as poll would get its request silently queued with no dispatch.
// The first failure is loud + recoverable; the second is silent.
//
// The function is intentionally lookup-only — it never mutates the row.
// The register handler (registry.go) is the only writer for delivery_mode.
//
// See #2339 PR 1 for the column + register-flow side; this is the
// proxy-side read used for the short-circuit in proxyA2ARequest.
func lookupDeliveryMode(ctx context.Context, workspaceID string) (string, error) {
func lookupDeliveryMode(ctx context.Context, workspaceID string) string {
var mode sql.NullString
err := db.DB.QueryRowContext(ctx,
`SELECT delivery_mode FROM workspaces WHERE id = $1`, workspaceID,
).Scan(&mode)
if err != nil {
// internal#497: a context cancellation/deadline MUST NOT be
// swallowed into a silent push default — that is the exact 5-day
// silent-misrouting vector. Propagate so the caller fails closed.
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
log.Printf("ProxyA2A: lookupDeliveryMode(%s) context error (%v) — failing closed (NOT defaulting to push)", workspaceID, err)
return "", err
}
if !errors.Is(err, sql.ErrNoRows) {
log.Printf("ProxyA2A: lookupDeliveryMode(%s) failed (%v) — defaulting to push (non-ctx DB error; legacy fail-open-to-push contract)", workspaceID, err)
log.Printf("ProxyA2A: lookupDeliveryMode(%s) failed (%v) — defaulting to push", workspaceID, err)
}
return models.DeliveryModePush, nil
return models.DeliveryModePush
}
if !mode.Valid || mode.String == "" {
return models.DeliveryModePush, nil
return models.DeliveryModePush
}
if !models.IsValidDeliveryMode(mode.String) {
log.Printf("ProxyA2A: workspace %s has invalid delivery_mode=%q — defaulting to push", workspaceID, mode.String)
return models.DeliveryModePush, nil
return models.DeliveryModePush
}
return mode.String, nil
return mode.String
}
// logA2AReceiveQueued records a poll-mode "queued" A2A receive into
@@ -1,262 +0,0 @@
package handlers
import (
"encoding/json"
"testing"
)
// TestExtractCanvasUserMessage_TextOnly covers the primary path: a canvas user
// sends a plain text message with no attachments.
func TestExtractCanvasUserMessage_TextOnly(t *testing.T) {
body := []byte(`{
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": "msg-abc-123",
"parts": [
{"kind": "text", "text": "Hello, agent!"}
]
}
}
}`)
got := extractCanvasUserMessage(body)
if got == nil {
t.Fatal("expected non-nil payload for text message")
}
if got["message"] != "Hello, agent!" {
t.Errorf("message = %v, want %q", got["message"], "Hello, agent!")
}
mid, ok := got["messageId"].(string)
if !ok || mid != "msg-abc-123" {
t.Errorf("messageId = %v, want %q", got["messageId"], "msg-abc-123")
}
_, hasAttachments := got["attachments"]
if hasAttachments {
t.Errorf("unexpected attachments: %v", got["attachments"])
}
}
// TestExtractCanvasUserMessage_FileOnly covers a user message with a file but no text.
func TestExtractCanvasUserMessage_FileOnly(t *testing.T) {
body := []byte(`{
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": "msg-file-456",
"parts": [
{
"kind": "file",
"file": {
"name": "report.pdf",
"uri": "workspace:/uploads/report.pdf",
"mimeType": "application/pdf",
"size": 4096
}
}
]
}
}
}`)
got := extractCanvasUserMessage(body)
if got == nil {
t.Fatal("expected non-nil payload for file-only message")
}
if got["message"] != nil {
t.Errorf("unexpected message text: %v", got["message"])
}
attachments, ok := got["attachments"].([]map[string]interface{})
if !ok || len(attachments) != 1 {
t.Fatalf("attachments = %v, want 1-element array", got["attachments"])
}
att := attachments[0]
if att["uri"] != "workspace:/uploads/report.pdf" {
t.Errorf("uri = %v, want %q", att["uri"], "workspace:/uploads/report.pdf")
}
if att["name"] != "report.pdf" {
t.Errorf("name = %v, want %q", att["name"], "report.pdf")
}
if att["mimeType"] != "application/pdf" {
t.Errorf("mimeType = %v, want %q", att["mimeType"], "application/pdf")
}
}
// TestExtractCanvasUserMessage_TextAndFile covers a user message with both text and a file.
func TestExtractCanvasUserMessage_TextAndFile(t *testing.T) {
body := []byte(`{
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [
{"kind": "text", "text": "Here is the file:"},
{"kind": "text", "text": "see below"},
{
"kind": "file",
"file": {
"name": "data.csv",
"uri": "workspace:/exports/data.csv",
"mimeType": "text/csv",
"size": 8192
}
}
]
}
}
}`)
got := extractCanvasUserMessage(body)
if got == nil {
t.Fatal("expected non-nil payload")
}
// Two text parts are joined with newline
if got["message"] != "Here is the file:\nsee below" {
t.Errorf("message = %v, want %q", got["message"], "Here is the file:\nsee below")
}
attachments, ok := got["attachments"].([]map[string]interface{})
if !ok || len(attachments) != 1 {
t.Fatalf("attachments = %v, want 1-element array", got["attachments"])
}
}
// TestExtractCanvasUserMessage_Malformed covers malformed JSON.
func TestExtractCanvasUserMessage_Malformed(t *testing.T) {
cases := []struct {
name string
body []byte
}{
{"not JSON", []byte(`{not valid`)},
{"wrong type top-level", []byte(`123`)},
{"missing params", []byte(`{"method":"message/send"}`)},
{"params not object", []byte(`{"method":"message/send","params":123}`)},
{"missing message", []byte(`{"method":"message/send","params":{}}`)},
{"message not object", []byte(`{"method":"message/send","params":{"message":123}}`)},
{"role missing", []byte(`{"method":"message/send","params":{"message":{"parts":[]}}}`)},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := extractCanvasUserMessage(tc.body); got != nil {
t.Errorf("expected nil for %s, got %v", tc.name, got)
}
})
}
}
// TestExtractCanvasUserMessage_NotUserRole covers agent/workspace callers
// whose role is not "user" — these should not be broadcast as USER_MESSAGE.
func TestExtractCanvasUserMessage_NotUserRole(t *testing.T) {
cases := []struct {
name string
body []byte
}{
{
"agent role",
[]byte(`{"method":"message/send","params":{"message":{"role":"agent","parts":[{"kind":"text","text":"hello"}]}}}`),
},
{
"assistant role",
[]byte(`{"method":"message/send","params":{"message":{"role":"assistant","parts":[{"kind":"text","text":"hello"}]}}}`),
},
{
"empty role",
[]byte(`{"method":"message/send","params":{"message":{"role":"","parts":[{"kind":"text","text":"hello"}]}}}`),
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := extractCanvasUserMessage(tc.body); got != nil {
t.Errorf("expected nil for role=%s, got %v", tc.name, got)
}
})
}
}
// TestExtractCanvasUserMessage_NotMessageSend covers non-message/send methods.
func TestExtractCanvasUserMessage_NotMessageSend(t *testing.T) {
cases := []struct {
name string
method string
}{
{"tasks/send", "tasks/send"},
{"initialize", "initialize"},
{"ping", "ping"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
body, _ := json.Marshal(map[string]interface{}{
"method": tc.method,
"params": map[string]interface{}{
"message": map[string]interface{}{
"role": "user",
"parts": []map[string]interface{}{{"kind": "text", "text": "hello"}},
},
},
})
if got := extractCanvasUserMessage(body); got != nil {
t.Errorf("expected nil for method=%q, got %v", tc.method, got)
}
})
}
}
// TestExtractCanvasUserMessage_BlankOrEmpty covers text with only whitespace
// and empty parts arrays.
func TestExtractCanvasUserMessage_BlankOrEmpty(t *testing.T) {
cases := []struct {
name string
body []byte
}{
{
"empty text part",
[]byte(`{"method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":""}]}}}`),
},
{
"empty parts array",
[]byte(`{"method":"message/send","params":{"message":{"role":"user","parts":[]}}}`),
},
{
"whitespace-only text — still included as valid content",
[]byte(`{"method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":" "}]}}}`),
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := extractCanvasUserMessage(tc.body)
if tc.name == "whitespace-only text — still included as valid content" {
// Whitespace-only text is valid content — preserve it as-is.
// Canvas dedup collapses identical copies; whitespace is not stripped.
if got == nil {
t.Error("expected non-nil for whitespace-only text")
} else if got["message"] != " " {
t.Errorf("message = %q, want %q", got["message"], " ")
}
return
}
if got != nil {
t.Errorf("expected nil for %s, got %v", tc.name, got)
}
})
}
}
// TestExtractCanvasUserMessage_Unicode covers non-ASCII text.
func TestExtractCanvasUserMessage_Unicode(t *testing.T) {
body := []byte(`{
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [
{"kind": "text", "text": "こんにちは世界 🌍 日本語"}
]
}
}
}`)
got := extractCanvasUserMessage(body)
if got == nil {
t.Fatal("expected non-nil payload for unicode message")
}
if got["message"] != "こんにちは世界 🌍 日本語" {
t.Errorf("message = %v, want %q", got["message"], "こんにちは世界 🌍 日本語")
}
}
@@ -2235,18 +2235,12 @@ func TestProxyA2A_PushMode_NoShortCircuit(t *testing.T) {
}
}
// TestProxyA2A_PollMode_FailsClosedToPush verifies the LEGACY safety
// contract is PRESERVED for non-context DB errors: a generic DB error
// reading delivery_mode still defaults to push (today's behavior), NOT
// poll. Failing to push means a poll-mode workspace briefly attempts a
// real dispatch — visible failure (502 / SSRF rejection / restart
// cascade), not a silent drop into activity_logs where the agent might
// never look. Loud > silent, recoverable > lost.
//
// internal#497 narrows the fail-closed change to *context* errors only
// (the actual ce2db75f regression vector); generic DB errors keep this
// long-standing fail-open-to-push contract. The ctx-error fail-closed is
// covered by TestLookupDeliveryMode_ContextCanceled_FailsClosed.
// TestProxyA2A_PollMode_FailsClosedToPush verifies the safety contract:
// a DB error reading delivery_mode must default to push (the existing
// behavior), NOT poll. Failing to push means a poll-mode workspace
// briefly attempts a real dispatch — visible failure (502 / SSRF
// rejection / restart cascade), not a silent drop into activity_logs
// where the agent might never look. Loud > silent, recoverable > lost.
func TestProxyA2A_PollMode_FailsClosedToPush(t *testing.T) {
mock := setupTestDB(t)
setupTestRedis(t) // empty Redis — forces resolveAgentURL DB lookup
@@ -2257,8 +2251,7 @@ func TestProxyA2A_PollMode_FailsClosedToPush(t *testing.T) {
expectBudgetCheck(mock, wsID)
// lookupDeliveryMode hits a generic (non-context) DB error → must
// still default push (legacy contract preserved by internal#497).
// lookupDeliveryMode hits a transient DB error → must default push.
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
WithArgs(wsID).
WillReturnError(sql.ErrConnDone)
@@ -2282,7 +2275,7 @@ func TestProxyA2A_PollMode_FailsClosedToPush(t *testing.T) {
var resp map[string]interface{}
_ = json.Unmarshal(w.Body.Bytes(), &resp)
if resp["status"] == "queued" {
t.Errorf("generic DB error on delivery_mode lookup silently queued the request — must fail-open-to-push, got body: %s", w.Body.String())
t.Errorf("DB error on delivery_mode lookup silently queued the request — must fail-closed-to-push, got body: %s", w.Body.String())
}
}
@@ -2291,37 +2284,6 @@ func TestProxyA2A_PollMode_FailsClosedToPush(t *testing.T) {
}
}
// TestLookupDeliveryMode_ContextCanceled_FailsClosed is the internal#497
// regression test for the SECONDARY defect. It pins the exact invariant
// that hid the ce2db75f regression for 5 days: when the delivery_mode read
// fails because the context was cancelled (precisely what happened in the
// detached delegation goroutine running on a returned request context),
// lookupDeliveryMode MUST return an error and MUST NOT silently return
// "push". Returning push there is what skipped the poll-mode short-circuit
// and silently dropped 100% of poll-mode peer deliveries.
//
// A pre-cancelled context makes QueryRowContext fail with
// context.Canceled deterministically — no DB rows are mocked because the
// query never reaches a result.
func TestLookupDeliveryMode_ContextCanceled_FailsClosed(t *testing.T) {
mock := setupTestDB(t)
// The query fails on the cancelled ctx before matching; provide a
// permissive expectation so sqlmock doesn't complain about the attempt.
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
WillReturnError(context.Canceled)
ctx, cancel := context.WithCancel(context.Background())
cancel() // simulate the HTTP handler having returned (request ctx dead)
mode, err := lookupDeliveryMode(ctx, "ws-poll-peer")
if err == nil {
t.Fatalf("internal#497 regression: lookupDeliveryMode swallowed a context error and returned mode=%q with nil err — this is the exact 5-day silent-misrouting vector", mode)
}
if mode == models.DeliveryModePush {
t.Errorf("internal#497 regression: context error must NOT default to push (got mode=%q)", mode)
}
}
// ==================== a2aClient ResponseHeaderTimeout config ====================
func TestA2AClientResponseHeaderTimeout(t *testing.T) {
@@ -691,19 +691,6 @@ func logActivityExec(ctx context.Context, exec activityExecutor, broadcaster eve
if respStr != nil {
payload["response_body"] = json.RawMessage(respJSON)
}
// internal#211/#212: error_detail carries the runtime's curated,
// user-actionable, secret-safe failure reason (provider HTTP
// status + error code + the provider's own guidance, e.g. a 403
// "org disabled · use an API key / ask your admin"). It is
// already persisted to the DB column above and capped by the
// runtime's report_activity helper (4096 chars). Previously it
// was dropped from the LIVE broadcast, so the canvas had nothing
// to render and fell back to a hardcoded opaque
// "Agent error (Exception) — see workspace logs" string. Include
// it so the chat bubble shows the real reason in real time.
if params.ErrorDetail != nil && *params.ErrorDetail != "" {
payload["error_detail"] = *params.ErrorDetail
}
}
return func() {
@@ -1,113 +0,0 @@
package handlers
import "encoding/json"
// agent_card_reconcile.go — server-side repair for the fleet-wide
// agent-card identity gap.
//
// Root cause: the runtime builds its AgentCard from config.name
// (workspace/main.py:198), and config.name is read from the
// CP-regenerated /configs/config.yaml whose `name:` field is the raw
// workspace UUID — NOT the friendly name the operator sees. The friendly
// name IS captured: POST /workspaces and PATCH /workspaces/:id (the
// canvas Details tab) write it to the trusted workspaces.name DB column.
// But /registry/register stores the runtime-supplied card verbatim
// (registry.go: `agent_card = EXCLUDED.agent_card`), so the stored card
// served at /.well-known/agent-card.json and returned to peers via
// agent_card_url ends up with name = UUID, description = "", role = null.
//
// Fix shape (deliberately minimal, no contract weakening): when the
// runtime-supplied card's `name` is empty or equals the workspace UUID
// (the placeholder the runtime had no better value for), the PLATFORM —
// not the agent — substitutes the friendly value from the trusted
// workspaces row. Identity stays platform-controlled: the agent never
// gains the ability to self-set its own name/role; the platform sources
// it from the operator-controlled DB column. We only ever FILL gaps
// (empty / UUID-placeholder); a card that already carries a real
// friendly name is never downgraded.
//
// list_peers / the /registry/:id/peers endpoint already resolve display
// names from workspaces.name directly (discovery.go / mcp_tools.go
// `SELECT w.id, w.name, ...`), so peer_name in delivered message tags
// was already correct — this fix closes the remaining surface: the
// agent_card blob itself (canvas Agent Card / Skills view, peer
// agent_card_url fetches, the well-known card).
//
// description / role degrade discovery the same way: an empty
// description and null role give peers nothing to reason about. We
// default description from the (now reconciled) name when blank and
// role from workspaces.role when the operator set one.
// reconcileAgentCardIdentity patches identity gaps in a runtime-supplied
// agent card from the trusted workspace DB row. It returns the
// (possibly rewritten) card bytes and whether anything changed. On any
// failure (malformed JSON, nothing to fill) it returns the input bytes
// unchanged with changed=false so the caller can store them verbatim —
// this is strictly no-worse-than-before, never a regression.
//
// Pure function: no DB / HTTP / globals, so it is exhaustively
// unit-testable (agent_card_reconcile_test.go) without booting the
// handler or a sqlmock.
func reconcileAgentCardIdentity(card json.RawMessage, workspaceID, dbName, dbRole string) (json.RawMessage, bool) {
var m map[string]any
if err := json.Unmarshal(card, &m); err != nil || m == nil {
// Malformed card — not this function's job to reject it (the
// upsert stores it as-is and downstream readers handle bad
// JSON). Return verbatim so byte-for-byte behaviour is
// preserved on the failure path.
return card, false
}
changed := false
// name: fill only when empty or the UUID placeholder. A dbName that
// is itself the UUID is a placeholder row (registry.go INSERT seeds
// name = id before the canvas sets a friendly one) — not a friendly
// name, so it is not an eligible source.
cardName, _ := m["name"].(string)
if (cardName == "" || cardName == workspaceID) &&
dbName != "" && dbName != workspaceID {
m["name"] = dbName
changed = true
}
// description: when blank, default to the (reconciled) name so peers
// and the canvas Agent Card view have a non-empty human label
// instead of "". Mirrors the runtime's own
// `config.description or config.name` fallback (main.py:199) but
// applied to the registry copy where the runtime's fallback was the
// UUID.
if desc, _ := m["description"].(string); desc == "" {
if n, _ := m["name"].(string); n != "" && n != workspaceID {
m["description"] = n
changed = true
}
}
// role: surface the operator-set workspaces.role when the card
// carries none. Discovery (peer_role) and the canvas Role row read
// workspaces.role directly; this just makes the standalone card
// self-describing too. Never overwrite a role the card already has.
if dbRole != "" {
if r, ok := m["role"].(string); !ok || r == "" {
m["role"] = dbRole
changed = true
}
}
if !changed {
// No-op: return the original bytes untouched so callers that
// compare/store get byte-identical input (re-marshalling would
// reorder keys for no reason).
return card, false
}
out, err := json.Marshal(m)
if err != nil {
// Re-marshal of a map we just unmarshalled should never fail;
// if it somehow does, fall back to the verbatim input rather
// than storing nothing.
return card, false
}
return out, true
}
@@ -1,166 +0,0 @@
package handlers
import (
"encoding/json"
"testing"
)
// TestReconcileAgentCardIdentity covers the server-side backfill that
// repairs the fleet-wide agent-card identity gap (internal#XXX): the
// runtime POSTs /registry/register with agent_card.name = the workspace
// UUID (because the CP-regenerated /configs/config.yaml sets name: <uuid>)
// while the trusted workspaces.name DB column — the value the canvas
// Details tab shows and lets the operator edit — holds the friendly
// name ("Claude Code Agent"). The platform reconciles them from the DB
// row (NOT from the agent — identity stays platform-controlled, not
// self-mutable).
func TestReconcileAgentCardIdentity(t *testing.T) {
const wsID = "3b81321b-1ec7-488c-96f7-72c42a968da6"
tests := []struct {
name string
card string
dbName string
dbRole string
wantName string
wantDesc string
wantRole string
wantChanged bool
}{
{
name: "name is the workspace UUID — backfill from DB",
card: `{"name":"3b81321b-1ec7-488c-96f7-72c42a968da6","description":"","capabilities":{"streaming":true}}`,
dbName: "Claude Code Agent",
dbRole: "",
wantName: "Claude Code Agent",
wantDesc: "Claude Code Agent",
wantRole: "",
wantChanged: true,
},
{
name: "empty name — backfill from DB",
card: `{"name":"","description":"x"}`,
dbName: "ops-agent",
dbRole: "sre",
wantName: "ops-agent",
wantDesc: "x",
wantRole: "sre",
wantChanged: true,
},
{
name: "role null in card, DB has role — backfill role only",
card: `{"name":"Reviewer","description":"Senior reviewer"}`,
dbName: "Reviewer",
dbRole: "code-reviewer",
wantName: "Reviewer",
wantDesc: "Senior reviewer",
wantRole: "code-reviewer",
wantChanged: true,
},
{
name: "card already has a real friendly name — do NOT clobber it",
// A richer card (e.g. an external channel agent) must win;
// the platform only fills gaps, never downgrades.
card: `{"name":"Claude Code (channel)","description":"Local Claude Code session bridged","role":"assistant"}`,
dbName: "hongming-pc",
dbRole: "operator",
wantName: "Claude Code (channel)",
wantDesc: "Local Claude Code session bridged",
wantRole: "assistant",
wantChanged: false,
},
{
name: "no DB name available — leave UUID name untouched (no worse than before)",
card: `{"name":"3b81321b-1ec7-488c-96f7-72c42a968da6","description":""}`,
dbName: "",
dbRole: "",
wantName: "3b81321b-1ec7-488c-96f7-72c42a968da6",
wantDesc: "",
wantRole: "",
wantChanged: false,
},
{
name: "dbName equals UUID (placeholder row) — not a friendly name, leave untouched",
card: `{"name":"3b81321b-1ec7-488c-96f7-72c42a968da6"}`,
dbName: "3b81321b-1ec7-488c-96f7-72c42a968da6",
dbRole: "",
wantName: "3b81321b-1ec7-488c-96f7-72c42a968da6",
wantDesc: "",
wantRole: "",
wantChanged: false,
},
{
name: "malformed card JSON — return unchanged, no panic",
card: `{not json`,
dbName: "Claude Code Agent",
dbRole: "",
wantChanged: false,
},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
out, changed := reconcileAgentCardIdentity(
json.RawMessage(tc.card), wsID, tc.dbName, tc.dbRole,
)
if changed != tc.wantChanged {
t.Fatalf("changed = %v, want %v", changed, tc.wantChanged)
}
if !tc.wantChanged {
// Unchanged path must return the input bytes verbatim.
if string(out) != tc.card {
t.Fatalf("unchanged path mutated bytes:\n got %s\n want %s", out, tc.card)
}
return
}
var got map[string]any
if err := json.Unmarshal(out, &got); err != nil {
t.Fatalf("output not valid JSON: %v (%s)", err, out)
}
if g, _ := got["name"].(string); g != tc.wantName {
t.Errorf("name = %q, want %q", g, tc.wantName)
}
if g, _ := got["description"].(string); g != tc.wantDesc {
t.Errorf("description = %q, want %q", g, tc.wantDesc)
}
if tc.wantRole != "" {
if g, _ := got["role"].(string); g != tc.wantRole {
t.Errorf("role = %q, want %q", g, tc.wantRole)
}
}
})
}
}
// TestReconcileAgentCardIdentity_PreservesOtherFields ensures the
// reconcile is a minimal in-place patch — capabilities, version,
// skills and any unknown future fields survive untouched.
func TestReconcileAgentCardIdentity_PreservesOtherFields(t *testing.T) {
card := `{"name":"ws-uuid","description":"","version":"1.0.0",` +
`"capabilities":{"streaming":true,"pushNotifications":true},` +
`"skills":[{"id":"a","name":"a"}],"configuration_status":"ready"}`
out, changed := reconcileAgentCardIdentity(
json.RawMessage(card), "ws-uuid", "Friendly Name", "",
)
if !changed {
t.Fatal("expected changed = true")
}
var got map[string]any
if err := json.Unmarshal(out, &got); err != nil {
t.Fatalf("invalid JSON: %v", err)
}
if got["version"] != "1.0.0" {
t.Errorf("version not preserved: %v", got["version"])
}
if got["configuration_status"] != "ready" {
t.Errorf("configuration_status not preserved: %v", got["configuration_status"])
}
caps, ok := got["capabilities"].(map[string]any)
if !ok || caps["streaming"] != true {
t.Errorf("capabilities not preserved: %v", got["capabilities"])
}
skills, ok := got["skills"].([]any)
if !ok || len(skills) != 1 {
t.Errorf("skills not preserved: %v", got["skills"])
}
}
@@ -107,29 +107,10 @@ func (h *ChatFilesHandler) WithPendingUploads(storage pendinguploads.Storage, br
}
// chatUploadMaxBytes caps the full multipart request body so a
// malicious / runaway client can't OOM the proxy hop. 100 MB matches
// the workspace-side total limit; anything larger is rejected at the
// malicious / runaway client can't OOM the proxy hop. 50 MB matches
// the workspace-side limit; anything larger is rejected at the
// network boundary before forwarding.
//
// SSOT NOTE (issue #1520): this constant is the source of truth for
// chat upload limits across the platform. Its value is exported to
// the workspace container at provision time via the env var
// CHAT_UPLOAD_MAX_TOTAL_BYTES (see
// workspace_provision_shared.go::applyChatUploadLimits) so the
// Python runtime cap stays in lock-step. Do NOT change this without
// updating the per-file cap chatUploadMaxFileBytes below and
// verifying the env-injection site is unchanged.
const chatUploadMaxBytes = 100 * 1024 * 1024
// chatUploadMaxFileBytes caps any single multipart part. Mirrors the
// total cap by default because most chat uploads are a single file;
// keeping per-file equal to total avoids the surprise of "my 60 MB
// file fit under the total but got 413'd on per-file". Exported to
// the workspace container as CHAT_UPLOAD_MAX_FILE_BYTES so the
// Starlette parser's max_part_size matches and any single part above
// Starlette's default 1 MiB no longer raises MultiPartException
// (root cause of issue #1520).
const chatUploadMaxFileBytes = 100 * 1024 * 1024
const chatUploadMaxBytes = 50 * 1024 * 1024
// resolveWorkspaceForwardCreds resolves the workspace's URL +
// platform_inbound_secret for an /internal/* forward, applying
@@ -1,63 +0,0 @@
package handlers
// chat_upload_limits_test.go — pins the SSOT env-injection contract
// for chat-upload caps (issue #1520). The Python workspace runtime
// reads these env vars at module init; drift between the constant in
// chat_files.go and the env-var name here silently breaks chat upload
// fleet-wide, so the contract is asserted as a unit test in the same
// package as the producer.
import (
"fmt"
"testing"
)
// applyChatUploadLimits MUST seed both env vars to the byte-count
// stringification of the Go-side constants. Anything else means a
// Python-side parser cap that disagrees with the Go-side network cap,
// which is exactly the drift that shipped #1520.
func TestApplyChatUploadLimits_DefaultsMatchGoConstants(t *testing.T) {
env := map[string]string{}
applyChatUploadLimits(env)
wantFile := fmt.Sprintf("%d", chatUploadMaxFileBytes)
if got := env["CHAT_UPLOAD_MAX_FILE_BYTES"]; got != wantFile {
t.Errorf("CHAT_UPLOAD_MAX_FILE_BYTES = %q, want %q", got, wantFile)
}
wantTotal := fmt.Sprintf("%d", chatUploadMaxBytes)
if got := env["CHAT_UPLOAD_MAX_TOTAL_BYTES"]; got != wantTotal {
t.Errorf("CHAT_UPLOAD_MAX_TOTAL_BYTES = %q, want %q", got, wantTotal)
}
}
// Pre-existing values win. A tenant override, plugin mutator, or A/B
// experiment that already set the env MUST be preserved — the SSOT
// helper is a defaulting layer, not an override layer.
func TestApplyChatUploadLimits_PreExistingValuesPreserved(t *testing.T) {
env := map[string]string{
"CHAT_UPLOAD_MAX_FILE_BYTES": "1234",
"CHAT_UPLOAD_MAX_TOTAL_BYTES": "5678",
}
applyChatUploadLimits(env)
if got := env["CHAT_UPLOAD_MAX_FILE_BYTES"]; got != "1234" {
t.Errorf("pre-existing CHAT_UPLOAD_MAX_FILE_BYTES overwritten: got %q", got)
}
if got := env["CHAT_UPLOAD_MAX_TOTAL_BYTES"]; got != "5678" {
t.Errorf("pre-existing CHAT_UPLOAD_MAX_TOTAL_BYTES overwritten: got %q", got)
}
}
// The 100 MB minimum is the CTO-directed allowance floor (issue #1520).
// Pin so a future "tidy up: 100 MB seems large" refactor surfaces here
// before reverting the user-visible behaviour change.
func TestChatUploadCaps_MinimumAllowanceFloor(t *testing.T) {
const floor = 100 * 1024 * 1024
if chatUploadMaxBytes < floor {
t.Errorf("chatUploadMaxBytes = %d, below #1520 floor %d", chatUploadMaxBytes, floor)
}
if chatUploadMaxFileBytes < floor {
t.Errorf("chatUploadMaxFileBytes = %d, below #1520 floor %d", chatUploadMaxFileBytes, floor)
}
}
@@ -163,32 +163,8 @@ func (h *DelegationHandler) Delegate(c *gin.Context) {
},
})
// Fire-and-forget: send A2A in a background goroutine.
//
// internal#497 — the goroutine MUST NOT inherit the HTTP request's
// cancellation. `ctx` here is c.Request.Context(); the handler returns
// 202 a few lines below, which cancels that context immediately. Before
// this fix (regression ce2db75f) executeDelegation ran on the
// request-scoped ctx, so every DB op + proxy call in the detached
// goroutine failed `context canceled` the instant the 202 was written.
// That silently broke 100% of A2A peer delegations fleet-wide since
// 2026-05-12 (poll-mode peers never got their a2a_receive inbox row;
// lookupDeliveryMode swallowed the ctx error and defaulted to push).
//
// context.WithoutCancel detaches cancellation/deadline while PRESERVING
// all context values (trace/correlation/tenant ids that proxyA2ARequest
// and the broadcaster read off ctx) — this is the established pattern in
// this package (a2a_proxy.go:850, a2a_proxy_helpers.go:525,
// registry.go:822). The 30-minute ceiling matches the prior internal
// budget executeDelegation used before ce2db75f and the proxy's own
// absolute agent-dispatch ceiling (a2a_proxy.go forwardCtx).
delegationCtx, cancelDelegation := context.WithTimeout(
context.WithoutCancel(ctx), 30*time.Minute,
)
go func() {
defer cancelDelegation()
h.executeDelegation(delegationCtx, sourceID, body.TargetID, delegationID, a2aBody)
}()
// Fire-and-forget: send A2A in background goroutine
go h.executeDelegation(ctx, sourceID, body.TargetID, delegationID, a2aBody)
// Broadcast event so canvas shows delegation in real-time
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventDelegationSent), sourceID, map[string]interface{}{
@@ -747,14 +723,6 @@ func (h *DelegationHandler) listDelegationsFromLedger(ctx context.Context, works
entry["response_preview"] = textutil.TruncateBytes(resultPreview.String, 300)
}
if errorDetail.Valid && errorDetail.String != "" {
// Emit both keys: `error_detail` is the canonical field the
// Python poll-mode consumer (a2a_tools_delegation.py:184)
// reads from /delegations rows — without it, poll-mode
// silently loses the failure reason and falls through to
// the generic "delegation failed" string. `error` is kept
// for back-compat with existing UI surfaces that read the
// shorter name.
entry["error_detail"] = errorDetail.String
entry["error"] = errorDetail.String
}
if lastHeartbeat != nil {
@@ -816,8 +784,6 @@ func (h *DelegationHandler) listDelegationsFromActivityLogs(ctx context.Context,
entry["delegation_id"] = delegationID
}
if errorDetail != "" {
// Emit both keys per the rename: see listDelegationsFromLedger.
entry["error_detail"] = errorDetail
entry["error"] = errorDetail
}
if responseBody != "" {
@@ -16,65 +16,6 @@ import (
"github.com/gin-gonic/gin"
)
// ---------- internal#497 regression: detached goroutine ctx must outlive the handler ----------
// TestDelegate_DetachedContext_SurvivesRequestCancellation pins the
// load-bearing invariant that regression ce2db75f violated: the context
// handed to executeDelegation in the fire-and-forget goroutine must NOT be
// cancelled when the HTTP handler returns 202 (which cancels
// c.Request.Context()). Before the fix, executeDelegation ran on the
// request-scoped ctx, so every DB op + proxy call failed `context
// canceled` the instant the 202 was written — silently breaking 100% of
// A2A peer delegations fleet-wide since 2026-05-12.
//
// This test asserts the exact ctx-derivation contract used by Delegate
// (context.WithoutCancel(parent) + a timeout budget): the derived context
// (a) stays alive after the parent is cancelled, and (b) still carries
// parent values (trace/correlation/tenant ids the downstream proxy +
// broadcaster read off ctx). It is intentionally DB-free and fast.
func TestDelegate_DetachedContext_SurvivesRequestCancellation(t *testing.T) {
type ctxKey string
const traceKey ctxKey = "trace-id"
// Simulate c.Request.Context() carrying a correlation value.
parent, cancelParent := context.WithCancel(
context.WithValue(context.Background(), traceKey, "trace-abc-123"),
)
// Exact derivation Delegate uses for the detached goroutine.
delegationCtx, cancelDelegation := context.WithTimeout(
context.WithoutCancel(parent), 30*time.Minute,
)
defer cancelDelegation()
// The HTTP handler "returns 202" → request context is cancelled.
cancelParent()
if err := parent.Err(); err == nil {
t.Fatal("precondition: parent context should be cancelled after the handler returns")
}
// (a) Cancellation MUST NOT propagate to the detached context.
select {
case <-delegationCtx.Done():
t.Fatalf("regression: detached delegation ctx was cancelled by the handler returning (err=%v) — executeDelegation would fail every DB op with `context canceled`", delegationCtx.Err())
default:
// alive — correct
}
// (b) Parent values MUST still be readable (WithoutCancel preserves
// values; trace/correlation/tenant ids the proxy + broadcaster use).
if got, _ := delegationCtx.Value(traceKey).(string); got != "trace-abc-123" {
t.Errorf("detached ctx lost the parent trace value: got %q, want %q", got, "trace-abc-123")
}
// And it still has a real deadline (the 30m budget), so it is not an
// unbounded background context.
if _, hasDeadline := delegationCtx.Deadline(); !hasDeadline {
t.Error("detached ctx must carry the 30-minute timeout budget, but has no deadline")
}
}
// ---------- Delegate: missing target_id → 400 ----------
func TestDelegate_MissingTargetID(t *testing.T) {
@@ -1546,71 +1487,6 @@ func TestListDelegations_LedgerEmptyFallsBackToActivityLogs(t *testing.T) {
}
}
// ---------- ListDelegations: activity_logs failed row emits BOTH error + error_detail ----------
// Field-rename pin (P1 #348 / RFC #2829 PR-2 follow-up): the legacy
// activity_logs fallback path must also emit `error_detail` alongside
// the historical `error` key. Without this, poll-mode (which reads
// `error_detail`) silently loses the failure reason when the ledger
// is empty and the handler falls back to activity_logs.
func TestListDelegations_ActivityLogsFailedEmitsBothErrorKeys(t *testing.T) {
mock := setupTestDB(t)
setupTestRedis(t)
broadcaster := newTestBroadcaster()
wh := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
dh := NewDelegationHandler(wh, broadcaster)
// Ledger empty → fall back to activity_logs.
mock.ExpectQuery("SELECT d.delegation_id, d.caller_id, d.callee_id, d.task_preview").
WithArgs("ws-source").
WillReturnRows(sqlmock.NewRows([]string{
"delegation_id", "caller_id", "callee_id", "task_preview",
"status", "result_preview", "error_detail", "last_heartbeat",
"deadline", "created_at", "updated_at",
}))
now := time.Now()
activityRows := sqlmock.NewRows([]string{
"id", "activity_type", "source_id", "target_id",
"summary", "status", "error_detail", "response_body",
"delegation_id", "created_at",
}).AddRow(
"act-failed", "delegate_result", "ws-source", "ws-target",
"Delegation failed", "error", "codex runtime timed out", "",
"del-failed-002", now,
)
mock.ExpectQuery("SELECT id, activity_type").
WithArgs("ws-source").
WillReturnRows(activityRows)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ws-source"}}
c.Request = httptest.NewRequest("GET", "/workspaces/ws-source/delegations", nil)
dh.ListDelegations(c)
if w.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var resp []map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("failed to parse response: %v", err)
}
if len(resp) != 1 {
t.Fatalf("expected 1 row, got %d", len(resp))
}
if resp[0]["error"] != "codex runtime timed out" {
t.Errorf("expected `error` field set, got %v", resp[0]["error"])
}
if resp[0]["error_detail"] != "codex runtime timed out" {
t.Errorf("expected `error_detail` field set (poll-mode contract), got %v", resp[0]["error_detail"])
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// ---------- ListDelegations: both ledger and activity_logs empty → [] ----------
func TestListDelegations_BothEmptyReturnsEmptyArray(t *testing.T) {
@@ -1809,15 +1685,7 @@ func TestListDelegations_LedgerFailedIncludesErrorDetail(t *testing.T) {
t.Errorf("expected status 'failed', got %v", resp[0]["status"])
}
if resp[0]["error"] != "Callee workspace not reachable" {
t.Errorf("expected error detail under `error`, got %v", resp[0]["error"])
}
// Field-rename pin (P1 #348 / RFC #2829 PR-2 follow-up): the
// Python poll-mode consumer in a2a_tools_delegation.py:184 reads
// `error_detail`, not `error`. Both keys MUST be present so polling
// surfaces the real failure reason instead of falling through to
// the generic "delegation failed" string.
if resp[0]["error_detail"] != "Callee workspace not reachable" {
t.Errorf("expected error detail under `error_detail`, got %v", resp[0]["error_detail"])
t.Errorf("expected error detail, got %v", resp[0]["error"])
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
@@ -8,7 +8,6 @@ import (
"fmt"
"net/http"
"net/http/httptest"
"sync"
"testing"
"time"
@@ -23,39 +22,8 @@ import (
"github.com/redis/go-redis/v9"
)
// liveTestHandlers tracks every WorkspaceHandler built during the test
// binary's lifetime so setupTestDB can drain their in-flight goAsync
// goroutines (notably the detached RestartByID restart cycle, which
// reads the global db.DB) BEFORE restoring db.DB. Without this drain a
// fire-and-forget restart goroutine spawned by one test outlives that
// test and races the db.DB swap in a later test's t.Cleanup — the
// 0x...d548 data race on platform/internal/db.DB.
var (
liveTestHandlersMu sync.Mutex
liveTestHandlers []*WorkspaceHandler
)
func init() {
gin.SetMode(gin.TestMode)
newHandlerHook = func(h *WorkspaceHandler) {
liveTestHandlersMu.Lock()
liveTestHandlers = append(liveTestHandlers, h)
liveTestHandlersMu.Unlock()
}
}
// drainTestAsync waits for every tracked handler's goAsync goroutines to
// finish. Called from setupTestDB's cleanup before db.DB is restored so
// no detached restart/provision goroutine is mid-read of db.DB when the
// pointer is swapped.
func drainTestAsync() {
liveTestHandlersMu.Lock()
handlers := make([]*WorkspaceHandler, len(liveTestHandlers))
copy(handlers, liveTestHandlers)
liveTestHandlersMu.Unlock()
for _, h := range handlers {
h.waitAsyncForTest()
}
}
// setupTestDB creates a sqlmock DB and assigns it to the global db.DB.
@@ -74,16 +42,7 @@ func setupTestDB(t *testing.T) sqlmock.Sqlmock {
}
prevDB := db.DB
db.DB = mockDB
t.Cleanup(func() {
// Drain detached async goroutines (e.g. goAsync(RestartByID),
// which reads db.DB in runRestartCycle before its provisioner
// gate) BEFORE swapping db.DB back. Doing the restore first
// would let an in-flight restart goroutine read db.DB while
// this line writes it — the data race this guards against.
drainTestAsync()
db.DB = prevDB
mockDB.Close()
})
t.Cleanup(func() { db.DB = prevDB; mockDB.Close() })
// Disable SSRF checks for the duration of this test only. Restore
// the previous state via t.Cleanup so that TestIsSafeURL_* tests
@@ -1,53 +0,0 @@
package handlers
// plugins_install_test.go — additional coverage for plugins_install.go.
//
// Gaps filled vs. existing test files:
// - plugins_install_external_test.go: Install + Uninstall 422 (external runtime) ✓ covered
// - plugins_test.go: Install 400 (missing source, invalid body, etc.) ✓ covered
// Uninstall 400 (invalid plugin name, empty name) ✓ covered
// Download auth gate ✓ covered
// - org_import_helpers_test.go: countWorkspaces, envRequirementKey, sanitizeEnvMembers,
// flattenAndSortRequirements, collectOrgEnv ✓ covered
//
// New test added here:
// - Uninstall 503: container not running, no SaaS dispatch.
//
// NOTE: validateWorkspaceID is not called inside the Install/Uninstall handlers.
// UUID validation is the responsibility of the WorkspaceAuth middleware, so no
// 400 test is needed here for UUID format.
import (
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/gin-gonic/gin"
"github.com/stretchr/testify/require"
)
// TestPluginUninstall_ContainerNotRunning_Returns503 exercises the 503 path
// where neither a local Docker container nor a SaaS instance-id dispatch
// resolves. The handler must return "workspace container not running" — NOT a
// generic 500 or a misleading 422 (external-runtime) message.
func TestPluginUninstall_ContainerNotRunning_Returns503(t *testing.T) {
// No docker client + no instance-id lookup → falls through to 503.
h := NewPluginsHandler(t.TempDir(), nil, nil)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{
{Key: "id", Value: "550e8400-e29b-41d4-a716-446655440000"},
{Key: "name", Value: "some-plugin"},
}
c.Request = httptest.NewRequest("DELETE",
"/workspaces/550e8400-e29b-41d4-a716-446655440000/plugins/some-plugin", nil)
h.Uninstall(c)
require.Equal(t, http.StatusServiceUnavailable, w.Code)
var body map[string]string
json.Unmarshal(w.Body.Bytes(), &body)
require.Equal(t, "workspace container not running", body["error"])
}
@@ -1,141 +0,0 @@
package handlers
import (
"encoding/json"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"github.com/gin-gonic/gin"
)
func TestListRegistry_EmptyDir(t *testing.T) {
dir := t.TempDir()
h := NewPluginsHandler(dir, nil, nil)
got := h.listRegistryFiltered("")
if len(got) != 0 {
t.Errorf("expected empty list, got %d plugins", len(got))
}
}
func TestListRegistry_IgnoresFiles(t *testing.T) {
dir := t.TempDir()
if err := os.WriteFile(filepath.Join(dir, "not-a-plugin.txt"), []byte("x"), 0600); err != nil {
t.Fatal(err)
}
h := NewPluginsHandler(dir, nil, nil)
got := h.listRegistryFiltered("")
if len(got) != 0 {
t.Errorf("expected empty list (files ignored), got %d", len(got))
}
}
func TestListRegistry_SinglePlugin(t *testing.T) {
dir := t.TempDir()
pluginDir := filepath.Join(dir, "my-plugin")
if err := os.Mkdir(pluginDir, 0755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(pluginDir, "plugin.yaml"), []byte("name: my-plugin\nversion: 1.0.0\n"), 0600); err != nil {
t.Fatal(err)
}
h := NewPluginsHandler(dir, nil, nil)
got := h.listRegistryFiltered("")
if len(got) != 1 {
t.Fatalf("expected 1 plugin, got %d", len(got))
}
if got[0].Name != "my-plugin" {
t.Errorf("expected name 'my-plugin', got %q", got[0].Name)
}
}
func TestListRegistry_FiltersByRuntime(t *testing.T) {
dir := t.TempDir()
for _, spec := range []struct{ name, yaml string }{
{"runtime-a", "name: runtime-a\nruntimes:\n - claude-code\n"},
{"runtime-b", "name: runtime-b\nruntimes:\n - hermes\n"},
{"universal", "name: universal\nversion: 1.0.0\n"},
} {
pd := filepath.Join(dir, spec.name)
if err := os.Mkdir(pd, 0755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(pd, "plugin.yaml"), []byte(spec.yaml), 0600); err != nil {
t.Fatal(err)
}
}
h := NewPluginsHandler(dir, nil, nil)
// Filter to claude-code: runtime-a matches, universal (no runtimes field)
// is always included per supportsRuntime semantics.
got := h.listRegistryFiltered("claude-code")
if len(got) != 2 {
t.Fatalf("expected 2 (runtime-a + universal), got %d: %v", len(got), func() []string {
ns := make([]string, len(got))
for i, p := range got { ns[i] = p.Name }
return ns
}())
}
}
func TestListRegistry_PluginWithNoRuntimeDeclarations_AlwaysIncluded(t *testing.T) {
dir := t.TempDir()
pd := filepath.Join(dir, "universal-plugin")
if err := os.Mkdir(pd, 0755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(pd, "plugin.yaml"), []byte("name: universal-plugin\nversion: 1.0.0\n"), 0600); err != nil {
t.Fatal(err)
}
h := NewPluginsHandler(dir, nil, nil)
// When plugin declares no runtimes, it should always be included (try-it).
got := h.listRegistryFiltered("any-runtime")
if len(got) != 1 {
t.Errorf("expected 1 plugin (unspecified runtime), got %d", len(got))
}
}
func TestListRegistry_ReadDirError_ReturnsEmpty(t *testing.T) {
h := NewPluginsHandler("/nonexistent/path/for/plugins", nil, nil)
got := h.listRegistryFiltered("")
if len(got) != 0 {
t.Errorf("expected empty list on ReadDir error, got %d", len(got))
}
}
func TestListRegistry_HTTPEndpoint(t *testing.T) {
dir := t.TempDir()
pd := filepath.Join(dir, "test-plugin")
if err := os.Mkdir(pd, 0755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(pd, "plugin.yaml"), []byte("name: test-plugin\nversion: 2.0.0\n"), 0600); err != nil {
t.Fatal(err)
}
h := NewPluginsHandler(dir, nil, nil)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = httptest.NewRequest("GET", "/plugins", nil)
h.ListRegistry(c)
if w.Code != http.StatusOK {
t.Errorf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var plugins []pluginInfo
if err := json.Unmarshal(w.Body.Bytes(), &plugins); err != nil {
t.Fatalf("failed to parse JSON: %v", err)
}
if len(plugins) != 1 {
t.Errorf("expected 1 plugin, got %d", len(plugins))
}
if plugins[0].Name != "test-plugin" {
t.Errorf("expected name 'test-plugin', got %q", plugins[0].Name)
}
}
+3 -31
View File
@@ -327,33 +327,7 @@ func (h *RegistryHandler) Register(c *gin.Context) {
}
}
// Reconcile the runtime-supplied card's identity fields against the
// trusted workspaces row before storing. The runtime builds its card
// from config.name, which the CP-regenerated /configs/config.yaml
// sets to the workspace UUID — so without this the stored card
// served at /.well-known/agent-card.json and returned to peers via
// agent_card_url has name = UUID, description = "", role = null even
// though the operator-controlled workspaces.name holds the friendly
// name the canvas shows. We only FILL gaps from the DB (never
// downgrade a card that already carries a real name); identity stays
// platform-controlled — the agent cannot self-set these. Best-effort:
// a lookup failure leaves the card exactly as the runtime sent it
// (no-worse-than-before). See agent_card_reconcile.go.
reconciledCard := payload.AgentCard
{
var dbName, dbRole sql.NullString
if qErr := db.DB.QueryRowContext(ctx,
`SELECT name, role FROM workspaces WHERE id = $1`, payload.ID,
).Scan(&dbName, &dbRole); qErr == nil {
if rc, did := reconcileAgentCardIdentity(
payload.AgentCard, payload.ID, dbName.String, dbRole.String,
); did {
reconciledCard = rc
log.Printf("Registry register: reconciled agent_card identity for %s from workspaces row", payload.ID)
}
}
}
agentCardStr := string(reconciledCard)
agentCardStr := string(payload.AgentCard)
// urlForUpsert: poll-mode workspaces don't need a URL. Empty input
// becomes NULL via sql.NullString so the row's URL stays clean (the
@@ -439,12 +413,10 @@ func (h *RegistryHandler) Register(c *gin.Context) {
}
}
// Broadcast WORKSPACE_ONLINE — use the reconciled card so the canvas
// Agent Card view live-updates with the friendly name, matching what
// was just persisted (not the runtime's raw UUID-name card).
// Broadcast WORKSPACE_ONLINE
if err := h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), payload.ID, map[string]interface{}{
"url": cachedURL,
"agent_card": reconciledCard,
"agent_card": payload.AgentCard,
"delivery_mode": effectiveMode,
}); err != nil {
log.Printf("Registry broadcast error: %v", err)
@@ -56,10 +56,8 @@ const (
// (an externally routable address) is used directly.
func (h *WorkspaceHandler) gracefulPreRestart(ctx context.Context, workspaceID string) {
// Non-blocking send — don't stall the restart cycle.
// Run in a tracked async goroutine (goAsync, not bare `go`) so the
// caller (runRestartCycle) can proceed to stopForRestart without
// waiting, while the test harness can still drain it before swapping
// the global db.DB (resolveAgentURLForRestartSignal reads db.DB).
// Run in a detached goroutine so the caller (runRestartCycle) can
// proceed to stopForRestart without waiting.
h.goAsync(func() {
signalCtx, cancel := context.WithTimeout(context.Background(), restartSignalTimeout)
defer cancel()
@@ -86,40 +86,6 @@ var fallbackRuntimes = map[string]struct{}{
"mock": {},
}
// stripJSON5Comments removes // single-line comments from JSON5-formatted
// data. The Integration Tester appends "// Triggered by <job>" to
// manifest.json after cloning, which causes json.Unmarshal to fail with
// "invalid character '/'". This strips trailing and mid-file comments
// before parsing so Go's strict JSON parser accepts JSON5 files.
//
// Handles:
// - Standalone comment lines: // comment
// - Trailing comments: "key": "value", // comment
// - Comments inside strings are NOT touched ("http://example.com")
func stripJSON5Comments(data []byte) []byte {
var result []byte
inString := false
i := 0
for i < len(data) {
if data[i] == '"' && (i == 0 || data[i-1] != '\\') {
inString = !inString
result = append(result, data[i])
i++
continue
}
if !inString && i+1 < len(data) && data[i] == '/' && data[i+1] == '/' {
// Skip to end of line
for i < len(data) && data[i] != '\n' {
i++
}
continue
}
result = append(result, data[i])
i++
}
return result
}
// loadRuntimesFromManifest builds the runtime allowlist from
// manifest.json. Each workspace_templates[].name is normalized to its
// base runtime identifier (strips the `-default` suffix templates
@@ -135,9 +101,6 @@ func loadRuntimesFromManifest(path string) (map[string]struct{}, error) {
if err != nil {
return nil, err
}
// Strip JSON5 // comments before parsing. The Integration Tester
// appends "// Triggered by <job>" to manifest.json after cloning.
data = stripJSON5Comments(data)
var m manifestFile
if err := json.Unmarshal(data, &m); err != nil {
return nil, err
@@ -8,7 +8,6 @@ package handlers
// fallback (tested at the initKnownRuntimes level via integration).
import (
"encoding/json"
"os"
"path/filepath"
"testing"
@@ -112,133 +111,3 @@ func keys(m map[string]struct{}) []string {
}
return out
}
// ── stripJSON5Comments tests ─────────────────────────────────────────────────
func TestStripJSON5Comments_Standalone(t *testing.T) {
// Whitespace before the // is preserved; only the // and its text are removed.
// The result is still valid JSON.
input := "{\n\t// This is a comment\n\t\"workspace_templates\": []\n}"
got := string(stripJSON5Comments([]byte(input)))
// Stripping should produce valid JSON: try parsing it
var m manifestFile
if err := json.Unmarshal([]byte(got), &m); err != nil {
t.Errorf("output is not valid JSON: %v\ngot: %q", err, got)
}
if m.WorkspaceTemplates == nil {
t.Error("WorkspaceTemplates field should be present after parsing")
}
}
func TestStripJSON5Comments_Trailing(t *testing.T) {
input := `{"key": "value"} // trailing comment`
got := string(stripJSON5Comments([]byte(input)))
want := `{"key": "value"} `
if got != want {
t.Errorf("trailing comment:\ngot: %q\nwant: %q", got, want)
}
}
func TestStripJSON5Comments_URLsPreserved(t *testing.T) {
// URLs with // in them must NOT be stripped
input := `{"url": "https://example.com/path"}`
got := string(stripJSON5Comments([]byte(input)))
if got != input {
t.Errorf("URL stripped: got %q, want %q", got, input)
}
}
func TestStripJSON5Comments_InlineComment(t *testing.T) {
// Whitespace before // is preserved; only the comment text is removed.
// Result must be valid JSON.
input := "{\n\t\"workspace_templates\": [] // inline comment\n}"
got := string(stripJSON5Comments([]byte(input)))
var m manifestFile
if err := json.Unmarshal([]byte(got), &m); err != nil {
t.Errorf("output is not valid JSON: %v\ngot: %q", err, got)
}
}
func TestStripJSON5Comments_IntegrationTesterAppend(t *testing.T) {
// Simulates what the Integration Tester does: appends // Triggered by ...
// to the end of a valid JSON file.
input := `{
"workspace_templates": [
{"name": "hermes", "repo": "org/hermes"}
]
}
// Triggered by e2e-test job 12345
`
got := string(stripJSON5Comments([]byte(input)))
if got == input {
t.Error("Integration Tester comment was NOT stripped")
}
// After stripping, it should be valid JSON
var m manifestFile
if err := json.Unmarshal([]byte(got), &m); err != nil {
t.Errorf("stripped content is not valid JSON: %v\ngot: %q", err, got)
}
if len(m.WorkspaceTemplates) != 1 || m.WorkspaceTemplates[0].Name != "hermes" {
t.Errorf("workspace_templates not parsed: %+v", m.WorkspaceTemplates)
}
}
func TestStripJSON5Comments_NoComments(t *testing.T) {
input := `{"workspace_templates": [{"name": "hermes"}]}`
got := string(stripJSON5Comments([]byte(input)))
if got != input {
t.Errorf("unmodified JSON changed: got %q", got)
}
}
// ── loadRuntimesFromManifest with JSON5 comments ─────────────────────────────
func TestLoadRuntimesFromManifest_WithJSON5TrailingComment(t *testing.T) {
// Regression: Integration Tester appends "// Triggered by ..." to manifest.json
// after cloning. Before the fix, this caused json.Unmarshal to fail with
// "invalid character '/'". After the fix, the comment is stripped first.
dir := t.TempDir()
path := filepath.Join(dir, "manifest.json")
manifest := `{
"workspace_templates": [
{"name": "hermes", "repo": "org/hermes"},
{"name": "claude-code-default", "repo": "org/cc"}
]
}
// Triggered by e2e-test job
`
if err := os.WriteFile(path, []byte(manifest), 0600); err != nil {
t.Fatalf("write: %v", err)
}
got, err := loadRuntimesFromManifest(path)
if err != nil {
t.Fatalf("loadRuntimesFromManifest with trailing comment: %v", err)
}
for _, must := range []string{"hermes", "claude-code"} {
if _, ok := got[must]; !ok {
t.Errorf("expected runtime %q in result: %v", must, keys(got))
}
}
}
func TestLoadRuntimesFromManifest_WithInlineJSON5Comment(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "manifest.json")
// JSON5 with inline comment
manifest := `{
// runtime templates
"workspace_templates": [
{"name": "langgraph", "repo": "org/lg"} // the default
]
}`
if err := os.WriteFile(path, []byte(manifest), 0600); err != nil {
t.Fatalf("write: %v", err)
}
got, err := loadRuntimesFromManifest(path)
if err != nil {
t.Fatalf("loadRuntimesFromManifest with inline comment: %v", err)
}
if _, ok := got["langgraph"]; !ok {
t.Errorf("expected langgraph in result: %v", keys(got))
}
}
@@ -1,117 +0,0 @@
package handlers
// template_files_agent_home_stub_test.go — pins the Phase-1 stub
// contract for the /agent-home root added by internal#425 RFC.
//
// Today (pre-Phase-2b), every Files API verb against `?root=/agent-home`
// must return HTTP 501 with the canonical pending-message body. The
// stub MUST NOT:
// 1. Hit the DB (the workspace might not even exist yet from the
// canvas's POV — the root selector is testable without one).
// 2. Touch the EIC tunnel / Docker / template-dir paths — those
// would 500/404/[] depending on the env and confuse the canvas.
// 3. Accept writes/deletes that the future docker-exec backend
// would reject — fail closed.
//
// When Phase 2b lands, this file gets replaced by a real
// docker-exec dispatch test; the stub-message constant in
// templates.go disappears.
import (
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/gin-gonic/gin"
)
// TestAgentHomeAllowedRoot pins that /agent-home is in the allowedRoots
// set. Without this, a future refactor that drops the key would
// silently degrade the canvas root selector to a 400 instead of the
// stub 501.
func TestAgentHomeAllowedRoot(t *testing.T) {
if !allowedRoots["/agent-home"] {
t.Fatal("/agent-home must be in allowedRoots — RFC #425 contract")
}
}
// TestAgentHomeStub_AllVerbs_Return501 pins the canonical stub
// response across all four verbs. Each must:
//
// - status 501
// - body contains the canonical "/agent-home not implemented" prefix
// - NOT contain "workspace not found" (proves we short-circuit before
// the DB lookup)
//
// Driven as a table to keep symmetry — adding a fifth verb in the
// future means adding one row here.
func TestAgentHomeStub_AllVerbs_Return501(t *testing.T) {
cases := []struct {
name string
method string
invoke func(c *gin.Context)
}{
{
name: "ListFiles",
method: "GET",
invoke: func(c *gin.Context) { (&TemplatesHandler{}).ListFiles(c) },
},
{
name: "ReadFile",
method: "GET",
invoke: func(c *gin.Context) { (&TemplatesHandler{}).ReadFile(c) },
},
{
name: "WriteFile",
method: "PUT",
invoke: func(c *gin.Context) { (&TemplatesHandler{}).WriteFile(c) },
},
{
name: "DeleteFile",
method: "DELETE",
invoke: func(c *gin.Context) { (&TemplatesHandler{}).DeleteFile(c) },
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{
{Key: "id", Value: "ws-stub"},
// Path param without leading slash so DeleteFile's
// filepath.IsAbs guard doesn't 400 before the root
// dispatch runs. The List/Read/Write paths strip the
// leading slash themselves and accept either form.
{Key: "path", Value: "notes.md"},
}
// WriteFile binds JSON; provide a minimal valid body so the
// short-circuit isn't masked by the bind-error path.
var body string
if tc.method == "PUT" {
body = `{"content":"x"}`
}
c.Request = httptest.NewRequest(
tc.method,
"/workspaces/ws-stub/files/notes.md?root=/agent-home",
strings.NewReader(body),
)
if body != "" {
c.Request.Header.Set("Content-Type", "application/json")
}
tc.invoke(c)
if w.Code != http.StatusNotImplemented {
t.Fatalf("expected 501, got %d: %s", w.Code, w.Body.String())
}
if !strings.Contains(w.Body.String(), "/agent-home not implemented") {
t.Errorf("body should contain canonical stub message; got %s", w.Body.String())
}
if strings.Contains(w.Body.String(), "workspace not found") {
t.Errorf("stub leaked through to DB lookup; body=%s", w.Body.String())
}
})
}
}
@@ -19,7 +19,6 @@ package handlers
import (
"bytes"
"context"
"errors"
"fmt"
"log"
"os"
@@ -358,28 +357,6 @@ func writeFileViaEIC(ctx context.Context, instanceID, runtime, root, relPath str
var stderr bytes.Buffer
sshCmd.Stderr = &stderr
if err := sshCmd.Run(); err != nil {
// When the per-op context deadline (eicFileOpTimeout) fires,
// exec.CommandContext SIGKILLs the ssh subprocess and Run()
// returns the bare "signal: killed" with empty stderr. That
// surfaced to the canvas as an opaque
// `500 {"error":"ssh install: signal: killed ()"}` which gave
// the operator no idea the workspace was simply mid-provision
// with a slow/unready EIC tunnel (internal#423). Detect the
// deadline explicitly and return an actionable message instead
// — the EIC mechanism, timeout value, and success path are all
// unchanged; this only improves the error a stuck write emits.
if cerr := ctx.Err(); cerr != nil {
reason := "timed out after " + eicFileOpTimeout.String()
if errors.Is(cerr, context.Canceled) && !errors.Is(cerr, context.DeadlineExceeded) {
reason = "was cancelled"
}
return fmt.Errorf(
"ssh install: EIC tunnel to workspace %s — "+
"the workspace may still be provisioning (slow/unready SSH); "+
"retry once it is online, or apply provider credentials via "+
"Settings → Secrets (encrypted, does not use this file-write path)",
reason)
}
return fmt.Errorf("ssh install: %w (%s)", err, strings.TrimSpace(stderr.String()))
}
log.Printf("writeFileViaEIC: ws instance=%s runtime=%s root=%s wrote %d bytes → %s",
@@ -1,71 +0,0 @@
package handlers
// template_files_eic_write_timeout_test.go — pins the actionable-error
// behavior added for internal#423.
//
// When the per-op context deadline (eicFileOpTimeout) fires,
// exec.CommandContext SIGKILLs the ssh subprocess and Run() returns the
// bare "signal: killed" with empty stderr. Before the fix that surfaced
// to the canvas as an opaque `500 {"error":"ssh install: signal:
// killed ()"}` — useless to an operator whose workspace was simply
// mid-provision with a slow/unready EIC tunnel. The fix detects the
// deadline explicitly (errors.Is(ctx.Err(), context.DeadlineExceeded))
// and returns a message that names the cause and the
// Settings → Secrets workaround.
import (
"context"
"strings"
"testing"
"time"
)
// TestWriteFileViaEIC_DeadlineExceeded_ActionableError stubs
// withEICTunnel so the *real* inner closure runs against a context that
// has already exceeded its deadline. The ssh subprocess fails (no real
// sshd on the fake port) and ctx.Err() == DeadlineExceeded, so the new
// branch must fire and produce an actionable message — NOT the opaque
// "signal: killed ()" string the canvas used to show.
func TestWriteFileViaEIC_DeadlineExceeded_ActionableError(t *testing.T) {
prev := withEICTunnel
withEICTunnel = func(_ context.Context, instanceID string, fn func(s eicSSHSession) error) error {
// Run the real inner closure. It closes over the ctx that
// writeFileViaEIC derived from our already-cancelled parent, so
// the ssh subprocess is killed immediately and ctx.Err()
// resolves — exactly the eicFileOpTimeout-expiry shape.
return fn(eicSSHSession{
instanceID: instanceID,
osUser: "ubuntu",
localPort: 1, // nothing listening → ssh fails fast
keyPath: "/nonexistent/key",
})
}
t.Cleanup(func() { withEICTunnel = prev })
// Drive the real writeFileViaEIC. Pass a parent whose deadline has
// already passed: the context.WithTimeout(ctx, eicFileOpTimeout)
// derived inside writeFileViaEIC inherits the expired parent
// deadline, so ctx.Err() == context.DeadlineExceeded by the time
// the killed ssh subprocess returns — the exact production shape
// (eicFileOpTimeout expiry), exercised deterministically.
parent, cancel := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
defer cancel()
err := writeFileViaEIC(parent, "i-test", "claude-code", "/configs", "config.yaml", []byte("model: sonnet\n"))
if err == nil {
t.Fatalf("expected an error from a killed ssh subprocess, got nil")
}
msg := err.Error()
// Must NOT leak the opaque bare-signal string to the operator.
if strings.Contains(msg, "signal: killed ()") {
t.Fatalf("error still surfaces the opaque %q form: %q", "signal: killed ()", msg)
}
// Must name the cause and the Secrets workaround so the canvas
// shows something actionable.
for _, want := range []string{"timed out", "provisioning", "Settings", "Secrets"} {
if !strings.Contains(msg, want) {
t.Errorf("actionable error missing %q; got: %q", want, msg)
}
}
}
@@ -18,35 +18,11 @@ import (
)
// allowedRoots are the container paths that the Files API can browse.
//
// `/agent-home` (added 2026-05-15, internal#425 RFC) is the container's
// own $HOME — `/root` for openclaw, `/home/agent` for claude-code/hermes
// — browsed via `docker exec` rather than host-side `find`. The
// dispatch is stubbed today (returns 501); full implementation lands in
// Phase 2b of the RFC. The allowedRoots key is added now so the canvas
// can design its root-selector UI against the final shape and the
// stub-vs-full transition is server-side only.
var allowedRoots = map[string]bool{
"/configs": true,
"/workspace": true,
"/home": true,
"/plugins": true,
"/agent-home": true,
}
// agentHomeStubMessage is the body returned by every Files API verb
// when `?root=/agent-home` is requested before Phase 2b lands. Keep the
// status code 501 (Not Implemented) — the route exists, the verb is
// understood, but the handler is unimplemented. Distinguishes from
// 400/404 so a canvas behind a less-current server can render a clean
// "feature pending" state instead of a generic error.
const agentHomeStubMessage = "/agent-home not implemented yet (internal#425 RFC Phase 2b — docker-exec backend pending)"
// isAgentHomeStubRequest returns true when the request targets the
// stubbed /agent-home root. Centralised so every verb in this file
// short-circuits with the same response shape.
func isAgentHomeStubRequest(rootPath string) bool {
return rootPath == "/agent-home"
"/configs": true,
"/workspace": true,
"/home": true,
"/plugins": true,
}
// maxUploadFiles limits the number of files in a single import/replace.
@@ -248,14 +224,7 @@ func (h *TemplatesHandler) ListFiles(c *gin.Context) {
// ?depth= — max depth to recurse (default: 1, max: 5)
rootPath := c.DefaultQuery("root", "/configs")
if !allowedRoots[rootPath] {
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins, /agent-home"})
return
}
// /agent-home dispatch is stubbed pre-Phase-2b. Short-circuit before
// the DB lookup + EIC dance so a canvas exercising the new root key
// gets a clean 501 instead of a half-effort response.
if isAgentHomeStubRequest(rootPath) {
c.JSON(http.StatusNotImplemented, gin.H{"error": agentHomeStubMessage})
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"})
return
}
subPath := c.DefaultQuery("path", "")
@@ -424,11 +393,7 @@ func (h *TemplatesHandler) ReadFile(c *gin.Context) {
ctx := c.Request.Context()
rootPath := c.DefaultQuery("root", "/configs")
if !allowedRoots[rootPath] {
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins, /agent-home"})
return
}
if isAgentHomeStubRequest(rootPath) {
c.JSON(http.StatusNotImplemented, gin.H{"error": agentHomeStubMessage})
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"})
return
}
@@ -541,11 +506,7 @@ func (h *TemplatesHandler) WriteFile(c *gin.Context) {
ctx := c.Request.Context()
rootPath := c.DefaultQuery("root", "/configs")
if !allowedRoots[rootPath] {
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins, /agent-home"})
return
}
if isAgentHomeStubRequest(rootPath) {
c.JSON(http.StatusNotImplemented, gin.H{"error": agentHomeStubMessage})
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"})
return
}
var wsName, instanceID, runtime string
@@ -622,11 +583,7 @@ func (h *TemplatesHandler) DeleteFile(c *gin.Context) {
ctx := c.Request.Context()
rootPath := c.DefaultQuery("root", "/configs")
if !allowedRoots[rootPath] {
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins, /agent-home"})
return
}
if isAgentHomeStubRequest(rootPath) {
c.JSON(http.StatusNotImplemented, gin.H{"error": agentHomeStubMessage})
c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"})
return
}
var wsName, instanceID, runtime string
@@ -10,20 +10,8 @@ import (
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/wsauth"
"github.com/gin-gonic/gin"
"github.com/google/uuid"
)
// validWorkspaceID returns true when id is a syntactically valid UUID.
// workspace_id is a `uuid` column; passing a non-UUID (e.g. the canvas
// "global" sentinel sent when no node is selected) makes Postgres raise
// `invalid input syntax for type uuid`, which previously leaked as an
// opaque 500. Reject up front with a clean 400 instead. Mirrors the
// uuid.Parse guard already used in handlers/activity.go.
func validWorkspaceID(id string) bool {
_, err := uuid.Parse(id)
return err == nil
}
// TokenHandler exposes user-facing token management for workspaces.
// Routes: GET/POST/DELETE /workspaces/:id/tokens (behind WorkspaceAuth).
type TokenHandler struct{}
@@ -43,10 +31,6 @@ type tokenListItem struct {
// never the plaintext or hash).
func (h *TokenHandler) List(c *gin.Context) {
workspaceID := c.Param("id")
if !validWorkspaceID(workspaceID) {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid workspace id"})
return
}
limit := 50
if v := c.Query("limit"); v != "" {
@@ -69,7 +53,6 @@ func (h *TokenHandler) List(c *gin.Context) {
LIMIT $2 OFFSET $3
`, workspaceID, limit, offset)
if err != nil {
log.Printf("tokens: list query failed for workspace %s: %v", workspaceID, err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to list tokens"})
return
}
@@ -102,10 +85,6 @@ const maxTokensPerWorkspace = 50
// exactly once in the response — it cannot be recovered afterwards.
func (h *TokenHandler) Create(c *gin.Context) {
workspaceID := c.Param("id")
if !validWorkspaceID(workspaceID) {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid workspace id"})
return
}
// Rate limit: max active tokens per workspace
var count int
@@ -138,10 +117,6 @@ func (h *TokenHandler) Create(c *gin.Context) {
func (h *TokenHandler) Revoke(c *gin.Context) {
workspaceID := c.Param("id")
tokenID := c.Param("tokenId")
if !validWorkspaceID(workspaceID) {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid workspace id"})
return
}
result, err := db.DB.ExecContext(c.Request.Context(), `
UPDATE workspace_auth_tokens
@@ -41,15 +41,6 @@ import (
func init() { gin.SetMode(gin.TestMode) }
// Workspace IDs are validated as UUIDs up front (tokens.go validWorkspaceID),
// so handler tests must pass syntactically valid UUIDs. Fixed values keep
// sqlmock WithArgs assertions deterministic.
const (
wsUUID1 = "11111111-1111-1111-1111-111111111111"
wsUUID2 = "22222222-2222-2222-2222-222222222222"
wsUUID3 = "33333333-3333-3333-3333-333333333333"
)
// withMockDB swaps `db.DB` for a sqlmock and returns the mock plus a
// restore func. Tests use this in place of setupTokenTestDB which
// skips on a missing real DB.
@@ -90,13 +81,13 @@ func TestTokenHandler_List_HappyPath(t *testing.T) {
created := time.Date(2026, 4, 1, 12, 0, 0, 0, time.UTC)
last := created.Add(time.Hour)
mock.ExpectQuery(`SELECT id, prefix, created_at, last_used_at\s+FROM workspace_auth_tokens`).
WithArgs(wsUUID1, 50, 0).
WithArgs("ws-1", 50, 0).
WillReturnRows(sqlmock.NewRows([]string{"id", "prefix", "created_at", "last_used_at"}).
AddRow("tok-1", "abc12345", created, last).
AddRow("tok-2", "def67890", created, nil))
w := makeReq(t, NewTokenHandler().List, "GET",
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: wsUUID1}})
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: "ws-1"}})
if w.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
@@ -130,7 +121,7 @@ func TestTokenHandler_List_EmptyResult(t *testing.T) {
WillReturnRows(sqlmock.NewRows([]string{"id", "prefix", "created_at", "last_used_at"}))
w := makeReq(t, NewTokenHandler().List, "GET",
"/workspaces/ws-2/tokens", gin.Params{{Key: "id", Value: wsUUID2}})
"/workspaces/ws-2/tokens", gin.Params{{Key: "id", Value: "ws-2"}})
if w.Code != http.StatusOK {
t.Fatalf("expected 200 on empty list, got %d", w.Code)
@@ -155,7 +146,7 @@ func TestTokenHandler_List_QueryError(t *testing.T) {
WillReturnError(errors.New("connection refused"))
w := makeReq(t, NewTokenHandler().List, "GET",
"/workspaces/ws-3/tokens", gin.Params{{Key: "id", Value: wsUUID3}})
"/workspaces/ws-3/tokens", gin.Params{{Key: "id", Value: "ws-3"}})
if w.Code != http.StatusInternalServerError {
t.Errorf("query error must surface as 500, got %d", w.Code)
@@ -167,13 +158,13 @@ func TestTokenHandler_List_RespectsLimit(t *testing.T) {
defer cleanup()
mock.ExpectQuery(`SELECT id, prefix, created_at, last_used_at`).
WithArgs(wsUUID1, 10, 5).
WithArgs("ws-1", 10, 5).
WillReturnRows(sqlmock.NewRows([]string{"id", "prefix", "created_at", "last_used_at"}))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Request = httptest.NewRequest("GET", "/workspaces/ws-1/tokens?limit=10&offset=5", nil)
c.Params = gin.Params{{Key: "id", Value: wsUUID1}}
c.Params = gin.Params{{Key: "id", Value: "ws-1"}}
NewTokenHandler().List(c)
if w.Code != http.StatusOK {
@@ -195,7 +186,7 @@ func TestTokenHandler_List_ScanError(t *testing.T) {
AddRow("tok-1", "abc", "not-a-timestamp", nil))
w := makeReq(t, NewTokenHandler().List, "GET",
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: wsUUID1}})
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: "ws-1"}})
if w.Code != http.StatusInternalServerError {
t.Errorf("scan error must surface as 500, got %d: %s", w.Code, w.Body.String())
@@ -210,11 +201,11 @@ func TestTokenHandler_Create_RateLimited(t *testing.T) {
// Count query returns 50 (== max) → 429.
mock.ExpectQuery(`SELECT COUNT\(\*\) FROM workspace_auth_tokens`).
WithArgs(wsUUID1).
WithArgs("ws-1").
WillReturnRows(sqlmock.NewRows([]string{"count"}).AddRow(50))
w := makeReq(t, NewTokenHandler().Create, "POST",
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: wsUUID1}})
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: "ws-1"}})
if w.Code != http.StatusTooManyRequests {
t.Errorf("max active tokens should 429, got %d", w.Code)
@@ -234,7 +225,7 @@ func TestTokenHandler_Create_IssueFails(t *testing.T) {
WillReturnError(errors.New("disk full"))
w := makeReq(t, NewTokenHandler().Create, "POST",
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: wsUUID1}})
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: "ws-1"}})
if w.Code != http.StatusInternalServerError {
t.Errorf("IssueToken DB error must 500, got %d", w.Code)
@@ -251,7 +242,7 @@ func TestTokenHandler_Create_HappyPath(t *testing.T) {
WillReturnResult(sqlmock.NewResult(1, 1))
w := makeReq(t, NewTokenHandler().Create, "POST",
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: wsUUID1}})
"/workspaces/ws-1/tokens", gin.Params{{Key: "id", Value: "ws-1"}})
if w.Code != http.StatusCreated {
t.Fatalf("expected 201, got %d: %s", w.Code, w.Body.String())
@@ -266,7 +257,7 @@ func TestTokenHandler_Create_HappyPath(t *testing.T) {
if body.AuthToken == "" {
t.Errorf("auth_token must be present and non-empty in response")
}
if body.WorkspaceID != wsUUID1 {
if body.WorkspaceID != "ws-1" {
t.Errorf("workspace_id mismatch: %q", body.WorkspaceID)
}
}
@@ -278,12 +269,12 @@ func TestTokenHandler_Revoke_HappyPath(t *testing.T) {
defer cleanup()
mock.ExpectExec(`UPDATE workspace_auth_tokens\s+SET revoked_at = now\(\)`).
WithArgs("tok-1", wsUUID1).
WithArgs("tok-1", "ws-1").
WillReturnResult(sqlmock.NewResult(0, 1))
w := makeReq(t, NewTokenHandler().Revoke, "DELETE",
"/workspaces/ws-1/tokens/tok-1", gin.Params{
{Key: "id", Value: wsUUID1},
{Key: "id", Value: "ws-1"},
{Key: "tokenId", Value: "tok-1"},
})
@@ -298,12 +289,12 @@ func TestTokenHandler_Revoke_NotFound(t *testing.T) {
// 0 rows affected → token not found OR already revoked.
mock.ExpectExec(`UPDATE workspace_auth_tokens`).
WithArgs("tok-ghost", wsUUID1).
WithArgs("tok-ghost", "ws-1").
WillReturnResult(sqlmock.NewResult(0, 0))
w := makeReq(t, NewTokenHandler().Revoke, "DELETE",
"/workspaces/ws-1/tokens/tok-ghost", gin.Params{
{Key: "id", Value: wsUUID1},
{Key: "id", Value: "ws-1"},
{Key: "tokenId", Value: "tok-ghost"},
})
@@ -321,7 +312,7 @@ func TestTokenHandler_Revoke_DBError(t *testing.T) {
w := makeReq(t, NewTokenHandler().Revoke, "DELETE",
"/workspaces/ws-1/tokens/tok-1", gin.Params{
{Key: "id", Value: wsUUID1},
{Key: "id", Value: "ws-1"},
{Key: "tokenId", Value: "tok-1"},
})
@@ -330,59 +321,6 @@ func TestTokenHandler_Revoke_DBError(t *testing.T) {
}
}
// ---- UUID validation (regression: "global" sentinel 500) ------------
// The canvas Settings → Workspace Tokens tab sent the literal sentinel
// "global" as the workspace id when no node was selected. workspace_id
// is a `uuid` column, so the query raised
// `invalid input syntax for type uuid: "global"` which leaked as an
// opaque 500. List/Create/Revoke now reject any non-UUID id with a
// clean 400 before touching the DB. No DB expectation is set on the
// mock — a DB hit would fail ExpectationsWereMet, proving short-circuit.
func TestTokenHandler_RejectsNonUUIDWorkspaceID(t *testing.T) {
h := NewTokenHandler()
cases := []struct {
name string
run func(c *gin.Context)
method string
params gin.Params
}{
{"List", h.List, "GET", gin.Params{{Key: "id", Value: "global"}}},
{"Create", h.Create, "POST", gin.Params{{Key: "id", Value: "global"}}},
{"Revoke", h.Revoke, "DELETE", gin.Params{
{Key: "id", Value: "global"},
{Key: "tokenId", Value: "tok-1"},
}},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
mock, cleanup := withMockDB(t)
defer cleanup()
w := makeReq(t, tc.run, tc.method,
"/workspaces/global/tokens", tc.params)
if w.Code != http.StatusBadRequest {
t.Fatalf("%s with non-UUID id must 400, got %d: %s",
tc.name, w.Code, w.Body.String())
}
var body struct {
Error string `json:"error"`
}
_ = json.Unmarshal(w.Body.Bytes(), &body)
if body.Error != "invalid workspace id" {
t.Errorf("%s: want error=%q, got %q",
tc.name, "invalid workspace id", body.Error)
}
// No query/exec was expected → if the handler hit the DB
// this fails, proving the guard short-circuits before SQL.
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("%s leaked a DB call past the uuid guard: %v", tc.name, err)
}
})
}
}
// Compile-time noise removal: the imports list pulls in the sql /
// driver packages and the silenced ctx so a future scenario that
// needs them doesn't have to re-add the import. Documented here so
@@ -11,7 +11,6 @@ import (
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/wsauth"
"github.com/gin-gonic/gin"
"github.com/google/uuid"
)
func init() { gin.SetMode(gin.TestMode) }
@@ -168,14 +167,11 @@ func TestTokenHandler_RevokeWrongWorkspace(t *testing.T) {
h := NewTokenHandler()
// Try to revoke with a different (valid-UUID) workspace ID that does
// not own the token — should 404. A valid UUID is required so this
// exercises the ownership branch, not the up-front uuid-shape 400.
otherWS := uuid.NewString()
// Try to revoke with a different workspace ID — should 404
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: otherWS}, {Key: "tokenId", Value: tokenID}}
c.Request = httptest.NewRequest("DELETE", "/workspaces/"+otherWS+"/tokens/"+tokenID, nil)
c.Params = gin.Params{{Key: "id", Value: "wrong-workspace-id"}, {Key: "tokenId", Value: tokenID}}
c.Request = httptest.NewRequest("DELETE", "/workspaces/wrong/tokens/"+tokenID, nil)
h.Revoke(c)
if w.Code != http.StatusNotFound {
@@ -80,15 +80,6 @@ type WorkspaceHandler struct {
asyncWG sync.WaitGroup
}
// newHandlerHook, when non-nil, is invoked for every WorkspaceHandler
// created via NewWorkspaceHandler. It is nil in production (zero cost);
// the test harness sets it so setupTestDB can drain every handler's
// in-flight async goroutines before swapping the global db.DB. Without
// this, a detached restart goroutine (maybeMarkContainerDead ->
// goAsync(RestartByID) -> runRestartCycle reads db.DB) races the
// db.DB restore in another test's t.Cleanup.
var newHandlerHook func(*WorkspaceHandler)
func (h *WorkspaceHandler) goAsync(fn func()) {
h.asyncWG.Add(1)
go func() {
@@ -117,9 +108,6 @@ func NewWorkspaceHandler(b events.EventEmitter, p *provisioner.Provisioner, plat
if p != nil {
h.provisioner = p
}
if newHandlerHook != nil {
newHandlerHook(h)
}
return h
}
@@ -1,193 +0,0 @@
package handlers
import (
"bytes"
"database/sql"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/DATA-DOG/go-sqlmock"
"github.com/gin-gonic/gin"
)
// patchReq builds a gin context for a PATCH request to /workspaces/:id/abilities.
func patchReq(id, body string) (*http.Request, *httptest.ResponseRecorder, *gin.Context) {
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: id}}
c.Request = httptest.NewRequest("PATCH", "/workspaces/"+id+"/abilities", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
return c.Request, w, c
}
func TestPatchAbilities_InvalidWorkspaceID(t *testing.T) {
setupTestDB(t)
// "not-a-uuid" fails validateWorkspaceID
_, w, c := patchReq("not-a-uuid", `{"broadcast_enabled":true}`)
PatchAbilities(c)
if w.Code != http.StatusBadRequest {
t.Errorf("expected 400, got %d: %s", w.Code, w.Body.String())
}
}
func TestPatchAbilities_EmptyBody(t *testing.T) {
setupTestDB(t)
id := "00000000-0000-0000-0000-000000000001"
// Empty JSON object — no ability fields present
_, w, c := patchReq(id, `{}`)
PatchAbilities(c)
if w.Code != http.StatusBadRequest {
t.Errorf("expected 400, got %d: %s", w.Code, w.Body.String())
}
var resp map[string]string
json.Unmarshal(w.Body.Bytes(), &resp)
if resp["error"] != "at least one ability field required" {
t.Errorf("expected 'at least one ability field required', got %v", resp["error"])
}
}
func TestPatchAbilities_WorkspaceNotFound(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000002"
// SELECT EXISTS returns false (workspace does not exist)
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(false))
_, w, c := patchReq(id, `{"broadcast_enabled":true}`)
PatchAbilities(c)
if w.Code != http.StatusNotFound {
t.Errorf("expected 404, got %d: %s", w.Code, w.Body.String())
}
}
func TestPatchAbilities_SetBroadcastEnabledTrue(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000003"
// SELECT EXISTS → true
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))
// UPDATE broadcast_enabled = true
mock.ExpectExec(`UPDATE workspaces SET broadcast_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, true).
WillReturnResult(sqlmock.NewResult(0, 1))
_, w, c := patchReq(id, `{"broadcast_enabled":true}`)
PatchAbilities(c)
if w.Code != http.StatusOK {
t.Errorf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var resp map[string]string
json.Unmarshal(w.Body.Bytes(), &resp)
if resp["status"] != "updated" {
t.Errorf("expected status=updated, got %v", resp["status"])
}
}
func TestPatchAbilities_SetTalkToUserEnabledFalse(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000004"
// SELECT EXISTS → true
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))
// UPDATE talk_to_user_enabled = false
mock.ExpectExec(`UPDATE workspaces SET talk_to_user_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, false).
WillReturnResult(sqlmock.NewResult(0, 1))
_, w, c := patchReq(id, `{"talk_to_user_enabled":false}`)
PatchAbilities(c)
if w.Code != http.StatusOK {
t.Errorf("expected 200, got %d: %s", w.Code, w.Body.String())
}
}
func TestPatchAbilities_BothFields(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000005"
// SELECT EXISTS → true
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))
// UPDATE broadcast_enabled = false
mock.ExpectExec(`UPDATE workspaces SET broadcast_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, false).
WillReturnResult(sqlmock.NewResult(0, 1))
// UPDATE talk_to_user_enabled = true
mock.ExpectExec(`UPDATE workspaces SET talk_to_user_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, true).
WillReturnResult(sqlmock.NewResult(0, 1))
_, w, c := patchReq(id, `{"broadcast_enabled":false,"talk_to_user_enabled":true}`)
PatchAbilities(c)
if w.Code != http.StatusOK {
t.Errorf("expected 200, got %d: %s", w.Code, w.Body.String())
}
}
func TestPatchAbilities_BroadcastUpdateFails(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000006"
// SELECT EXISTS → true
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))
// UPDATE fails
mock.ExpectExec(`UPDATE workspaces SET broadcast_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, true).
WillReturnError(sql.ErrConnDone)
_, w, c := patchReq(id, `{"broadcast_enabled":true}`)
PatchAbilities(c)
if w.Code != http.StatusInternalServerError {
t.Errorf("expected 500, got %d: %s", w.Code, w.Body.String())
}
}
func TestPatchAbilities_TalkToUserUpdateFails(t *testing.T) {
mock := setupTestDB(t)
id := "00000000-0000-0000-0000-000000000007"
// SELECT EXISTS → true
mock.ExpectQuery(`SELECT EXISTS\(SELECT 1 FROM workspaces WHERE id = \$1 AND status != 'removed'\)`).
WithArgs(id).
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))
// UPDATE broadcast_enabled skipped (not in payload)
// UPDATE talk_to_user_enabled fails
mock.ExpectExec(`UPDATE workspaces SET talk_to_user_enabled = \$2, updated_at = now\(\) WHERE id = \$1`).
WithArgs(id, false).
WillReturnError(sql.ErrConnDone)
_, w, c := patchReq(id, `{"talk_to_user_enabled":false}`)
PatchAbilities(c)
if w.Code != http.StatusInternalServerError {
t.Errorf("expected 500, got %d: %s", w.Code, w.Body.String())
}
}
@@ -34,13 +34,11 @@ import (
// BroadcastHandler is constructed once and shared across requests.
type BroadcastHandler struct {
broadcaster events.EventEmitter
broadcaster *events.Broadcaster
}
// NewBroadcastHandler creates a BroadcastHandler.
// The emitter is any EventEmitter — the concrete *Broadcaster in production,
// or a test double in unit tests.
func NewBroadcastHandler(b events.EventEmitter) *BroadcastHandler {
func NewBroadcastHandler(b *events.Broadcaster) *BroadcastHandler {
return &BroadcastHandler{broadcaster: b}
}
@@ -67,6 +67,7 @@ func TestBroadcast_OrgScopedRecipients(t *testing.T) {
if w.Code != http.StatusOK {
t.Errorf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var resp map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("failed to unmarshal response: %v", err)
@@ -205,7 +206,7 @@ func TestBroadcast_Disabled(t *testing.T) {
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-000000000003"
senderID := "00000000-0000-0000-0000-000000000001"
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
WillReturnRows(sqlmock.NewRows([]string{"name", "broadcast_enabled"}).AddRow("Disabled Agent", false))
@@ -236,7 +237,7 @@ func TestBroadcast_EmptyOrg_NoRecipients(t *testing.T) {
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-000000000004" // org root, only workspace in org
senderID := "00000000-0000-0000-0000-000000000001" // org root, only workspace in org
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
@@ -296,12 +297,33 @@ func TestBroadcast_InvalidWorkspaceID(t *testing.T) {
}
}
func TestBroadcast_MissingMessage(t *testing.T) {
setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "00000000-0000-0000-0000-000000000001"}}
c.Request = httptest.NewRequest("POST", "/workspaces/00000000-0000-0000-0000-000000000001/broadcast", bytes.NewBufferString("{}"))
c.Request.Header.Set("Content-Type", "application/json")
handler.Broadcast(c)
if w.Code != http.StatusBadRequest {
t.Errorf("expected 400, got %d: %s", w.Code, w.Body.String())
}
}
// TestBroadcast_OrgRootLookupFails verifies that if the recursive CTE for
// finding the org root errors, the handler returns 500 instead of proceeding
// with an un-scoped query that would broadcast to all orgs.
func TestBroadcast_OrgRootLookupFails(t *testing.T) {
mock := setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-000000000005"
senderID := "00000000-0000-0000-0000-000000000001"
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
@@ -331,13 +353,16 @@ func TestBroadcast_OrgRootLookupFails(t *testing.T) {
}
}
// TestBroadcast_OrgScoped_SelfBroadcastExcluded verifies that broadcasting
// from a workspace does not send a broadcast_receive to the sender itself
// (the sender logs broadcast_sent, not broadcast_receive).
func TestBroadcast_OrgScoped_SelfBroadcastExcluded(t *testing.T) {
mock := setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-000000000006"
peerID := "00000000-0000-0000-0000-000000000007"
senderID := "00000000-0000-0000-0000-000000000001"
peerID := "00000000-0000-0000-0000-000000000002"
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
@@ -374,145 +399,10 @@ func TestBroadcast_OrgScoped_SelfBroadcastExcluded(t *testing.T) {
}
}
// TestBroadcast_RecipientActivityLogFails_SkipsAndContinues: if one recipient's
// activity_log insert fails, the handler logs the error and continues to the
// next recipient rather than aborting the whole broadcast.
func TestBroadcast_RecipientActivityLogFails_SkipsAndContinues(t *testing.T) {
mock := setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-000000000008"
peerA := "00000000-0000-0000-0000-000000000009"
peerB := "00000000-0000-0000-0000-00000000000a"
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
WillReturnRows(sqlmock.NewRows([]string{"name", "broadcast_enabled"}).AddRow("Resilient Agent", true))
mock.ExpectQuery(`WITH RECURSIVE org_chain AS`).
WithArgs(senderID).
WillReturnRows(sqlmock.NewRows([]string{"root_id"}).AddRow(senderID))
mock.ExpectQuery(`WITH RECURSIVE org_chain AS`).
WithArgs(senderID, senderID).
WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow(peerA).AddRow(peerB))
// Peer A fails — handler logs and continues
mock.ExpectExec(`INSERT INTO activity_logs`).WithArgs(peerA, senderID, sqlmock.AnyArg()).
WillReturnError(context.DeadlineExceeded)
// Peer B succeeds
mock.ExpectExec(`INSERT INTO activity_logs`).WithArgs(peerB, senderID, sqlmock.AnyArg()).
WillReturnResult(sqlmock.NewResult(0, 1))
// Sender log succeeds
mock.ExpectExec(`INSERT INTO activity_logs`).WithArgs(senderID, sqlmock.AnyArg()).
WillReturnResult(sqlmock.NewResult(0, 1))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: senderID}}
body := `{"message":"partial delivery"}`
c.Request = httptest.NewRequest("POST", "/workspaces/"+senderID+"/broadcast", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
handler.Broadcast(c)
if w.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var resp map[string]interface{}
json.Unmarshal(w.Body.Bytes(), &resp)
// Only peerB was delivered
if int(resp["delivered"].(float64)) != 1 {
t.Errorf("expected delivered=1, got %v", resp["delivered"])
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet expectations: %v", err)
}
}
// TestBroadcast_SenderActivityLogFails_StillReturns200: if the sender's own
// broadcast_sent activity_log insert fails, the handler still returns 200
// so the caller doesn't retry a broadcast that already partially delivered.
func TestBroadcast_SenderActivityLogFails_StillReturns200(t *testing.T) {
mock := setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
senderID := "00000000-0000-0000-0000-00000000000b"
peerA := "00000000-0000-0000-0000-00000000000c"
mock.ExpectQuery(`SELECT name, broadcast_enabled FROM workspaces WHERE id = \$1 AND status != 'removed'`).
WithArgs(senderID).
WillReturnRows(sqlmock.NewRows([]string{"name", "broadcast_enabled"}).AddRow("Log-Fail Agent", true))
mock.ExpectQuery(`WITH RECURSIVE org_chain AS`).
WithArgs(senderID).
WillReturnRows(sqlmock.NewRows([]string{"root_id"}).AddRow(senderID))
mock.ExpectQuery(`WITH RECURSIVE org_chain AS`).
WithArgs(senderID, senderID).
WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow(peerA))
// Peer log succeeds
mock.ExpectExec(`INSERT INTO activity_logs`).WithArgs(peerA, senderID, sqlmock.AnyArg()).
WillReturnResult(sqlmock.NewResult(0, 1))
// Sender log FAILS
mock.ExpectExec(`INSERT INTO activity_logs`).WithArgs(senderID, sqlmock.AnyArg()).
WillReturnError(context.DeadlineExceeded)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: senderID}}
body := `{"message":"log fail test"}`
c.Request = httptest.NewRequest("POST", "/workspaces/"+senderID+"/broadcast", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
handler.Broadcast(c)
if w.Code != http.StatusOK {
t.Errorf("expected 200 even on sender log failure, got %d: %s", w.Code, w.Body.String())
}
}
func TestBroadcast_MissingMessage(t *testing.T) {
setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "00000000-0000-0000-0000-00000000000d"}}
c.Request = httptest.NewRequest("POST", "/workspaces/00000000-0000-0000-0000-00000000000d/broadcast", bytes.NewBufferString("{}"))
c.Request.Header.Set("Content-Type", "application/json")
handler.Broadcast(c)
if w.Code != http.StatusBadRequest {
t.Errorf("expected 400, got %d: %s", w.Code, w.Body.String())
}
}
func TestBroadcast_MissingBody(t *testing.T) {
setupTestDB(t)
broadcaster := newTestBroadcaster()
handler := NewBroadcastHandler(broadcaster)
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "00000000-0000-0000-0000-00000000000e"}}
c.Request = httptest.NewRequest("POST", "/workspaces/00000000-0000-0000-0000-00000000000e/broadcast", nil)
// no Content-Type and no body
handler.Broadcast(c)
if w.Code != http.StatusBadRequest {
t.Errorf("expected 400, got %d: %s", w.Code, w.Body.String())
}
}
// TestBroadcast_Truncate tests that messages are truncated with the Unicode ellipsis
// character (U+2026) when len(msg) > max. The truncated output is max runes + "…".
// TestBroadcast_Truncate tests that messages are truncated with the Unicode ellipsis
// character (U+2026) when len(msg) > max. The truncated output is max runes + "…",
// so truncating a 48-char string at max=20 produces 21 characters (20 runes + "…").
func TestBroadcast_Truncate(t *testing.T) {
cases := []struct {
msg string
@@ -520,18 +410,14 @@ func TestBroadcast_Truncate(t *testing.T) {
expect string
}{
{"short", 120, "short"}, // under max — no truncation
// exactly 120 chars → unchanged
// exactly120chars (15) + 105 ones = 120 chars; at max=120 → unchanged
{"exactly120chars1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111", 120, "exactly120chars111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111…"},
// 21 runes at max=20 → 20 + "…" = 21 chars
// "this is a longer mes" = 20 runes; + "…" = 21 chars
{"this is a longer message that needs truncating", 20, "this is a longer mes…"},
// at-max boundary: 20 chars at max=20 → no truncation
{"exactly twenty chars", 20, "exactly twenty chars"},
// over max: 11 chars at max=10 → 10 + "…" = 11
{"hello world!", 10, "hello worl…"},
// Unicode: 3-rune string at max=3 → unchanged
{"日本語", 3, "日本語"},
// Empty string → unchanged
{"", 120, ""},
}
for _, tc := range cases {
result := broadcastTruncate(tc.msg, tc.max)
@@ -37,7 +37,6 @@ package handlers
import (
"context"
"errors"
"fmt"
"log"
"path/filepath"
@@ -133,10 +132,6 @@ func (h *WorkspaceHandler) prepareProvisionContext(
// a workspace_secret named GIT_AUTHOR_NAME can override.
applyAgentGitIdentity(envVars, payload.Name)
applyRuntimeModelEnv(envVars, payload.Runtime, payload.Model)
// SSOT for chat-upload limits — see chat_files.go::chatUploadMaxBytes.
// Injecting via env keeps the Python workspace runtime caps in
// lock-step with the Go cap on every provision. Fixes #1520.
applyChatUploadLimits(envVars)
if payload.Role != "" {
envVars["MOLECULE_AGENT_ROLE"] = payload.Role
}
@@ -228,28 +223,3 @@ func (h *WorkspaceHandler) markProvisionFailed(ctx context.Context, workspaceID,
log.Printf("markProvisionFailed: db update failed for %s: %v", workspaceID, dbErr)
}
}
// applyChatUploadLimits seeds the chat-upload cap env vars on the
// workspace container so the Python /internal/chat/uploads/ingest
// handler parses the multipart form with the same per-file allowance
// that the Go proxy enforces.
//
// Why env-driven (and not, say, a hard-coded Python constant): keeping
// one Go constant as the source of truth and forwarding it lets
// operations bump the cap by editing one file + redeploy, instead of
// editing two files in two languages and risking the drift that
// shipped #1520 (Go cap 50 MB, Python parser cap 1 MiB — Starlette
// default — so a 5 MB image always 400'd on parse before per-file
// enforcement could fire).
//
// Pre-existing env wins. If something downstream (a tenant override,
// a plugin mutator, an A/B experiment) has already set either var,
// we leave it alone. Default-only injection.
func applyChatUploadLimits(envVars map[string]string) {
if _, set := envVars["CHAT_UPLOAD_MAX_FILE_BYTES"]; !set {
envVars["CHAT_UPLOAD_MAX_FILE_BYTES"] = fmt.Sprintf("%d", chatUploadMaxFileBytes)
}
if _, set := envVars["CHAT_UPLOAD_MAX_TOTAL_BYTES"]; !set {
envVars["CHAT_UPLOAD_MAX_TOTAL_BYTES"] = fmt.Sprintf("%d", chatUploadMaxBytes)
}
}
@@ -237,10 +237,10 @@ func (h *WorkspaceHandler) Restart(c *gin.Context) {
// the silent-drop bugs PRs #2811/#2824 closed). RestartWorkspaceAuto
// enforces CP-FIRST ordering matching the other dispatchers — see
// docs/architecture/backends.md.
h.goAsync(func() {
go func() {
h.RestartWorkspaceAutoOpts(context.Background(), id, templatePath, configFiles, payload, resetClaudeSession)
})
h.goAsync(func() { h.sendRestartContext(id, restartData) })
}()
go h.sendRestartContext(id, restartData)
c.JSON(http.StatusOK, gin.H{"status": "provisioning", "config_dir": configLabel, "reset_session": resetClaudeSession})
}
@@ -610,9 +610,7 @@ func (h *WorkspaceHandler) runRestartCycle(workspaceID string) {
h.provisionWorkspaceAutoSync(workspaceID, "", nil, payload)
// sendRestartContext is a one-way notification to the new container; safe
// to fire async — the next restart cycle won't depend on it completing.
// Tracked via goAsync so the test harness can drain it before the
// global db.DB swap (sendRestartContext reads db.DB).
h.goAsync(func() { h.sendRestartContext(workspaceID, restartData) })
go h.sendRestartContext(workspaceID, restartData)
}
// Pause handles POST /workspaces/:id/pause
@@ -178,21 +178,12 @@ func (p *CPProvisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string,
// /admin/liveness and other admin-gated platform endpoints (core#831).
// p.adminToken is read from os.Getenv("ADMIN_TOKEN") at provisioner creation;
// it is also used for CP→platform HTTP auth but those are separate concerns.
//
// Forensic #145 hardening: tenant workspaces run on EC2 via this path, so
// the SCM-write-token denylist (see buildContainerEnv) is enforced here
// too. Always build a filtered copy — never pass cfg.EnvVars through
// verbatim — so a latent persona-merged GITEA_TOKEN can't reach the
// tenant container regardless of whether ADMIN_TOKEN is set.
env := make(map[string]string, len(cfg.EnvVars)+1)
for k, v := range cfg.EnvVars {
if isSCMWriteTokenKey(k) {
log.Printf("CPProvisioner.Start: dropped SCM-write credential %q from tenant workspace env (forensic #145 guard)", k)
continue
}
env[k] = v
}
env := cfg.EnvVars
if p.adminToken != "" {
env = make(map[string]string, len(cfg.EnvVars)+1)
for k, v := range cfg.EnvVars {
env[k] = v
}
env["ADMIN_TOKEN"] = p.adminToken
}
// Collect template files and generated configs, with OFFSEC-010 guards:
@@ -643,28 +643,6 @@ func ValidateWorkspaceAccess(access, workspacePath string) error {
}
}
// scmWriteTokenKeys is the explicit denylist of environment variable names
// that carry a Git SCM *write* credential (push / merge / approve). These
// must never reach a tenant workspace container — see the forensic #145
// rationale in buildContainerEnv. Kept as an exact-match set rather than a
// substring/prefix heuristic so the guard is auditable and can't silently
// over-strip a legitimately-named var.
var scmWriteTokenKeys = map[string]struct{}{
"GITEA_TOKEN": {},
"GITHUB_TOKEN": {},
"GH_TOKEN": {}, // gh CLI honours GH_TOKEN as a GITHUB_TOKEN alias
"GITLAB_TOKEN": {},
"GL_TOKEN": {}, // glab CLI alias
"BITBUCKET_TOKEN": {},
}
// isSCMWriteTokenKey reports whether an env var name is a known Git SCM
// write credential that must be stripped from tenant workspace env.
func isSCMWriteTokenKey(key string) bool {
_, ok := scmWriteTokenKeys[key]
return ok
}
// buildContainerEnv assembles the initial environment variables injected
// into every workspace container.
//
@@ -701,21 +679,6 @@ func buildContainerEnv(cfg WorkspaceConfig) []string {
env = append(env, fmt.Sprintf("AWARENESS_URL=%s", cfg.AwarenessURL))
}
for k, v := range cfg.EnvVars {
// Forensic #145 hardening: tenant workspace containers run
// agent-controlled code and must NEVER receive a Git SCM *write*
// credential. Without merge/approve creds in-container the
// two-eyes review gate is structurally self-bypass-proof — an
// agent that forges an approval has no token to act on it. A
// latent path exists (loadPersonaEnvFile merges a per-role
// persona `GITEA_TOKEN` into cfg.EnvVars when MOLECULE_PERSONA_ROOT
// is set on a tenant host); it is inert today (persona dirs are
// operator-host-only) but unguarded. Strip SCM-write tokens here
// by construction so the invariant holds regardless of whether
// that path ever becomes reachable.
if isSCMWriteTokenKey(k) {
log.Printf("buildContainerEnv: dropped SCM-write credential %q from workspace env (forensic #145 guard)", k)
continue
}
env = append(env, fmt.Sprintf("%s=%s", k, v))
}
// Inject ADMIN_TOKEN from the platform server's environment so workspace
@@ -725,15 +725,10 @@ func TestBuildContainerEnv_AwarenessOnlyWhenBothSet(t *testing.T) {
}
func TestBuildContainerEnv_CustomEnvVarsAppended(t *testing.T) {
// NOTE: this test previously asserted GITHUB_TOKEN passed through
// verbatim. That assertion encoded the forensic #145 latent leak as
// expected behavior. Post-guard, ordinary custom env still flows but
// SCM-write credentials are stripped — see
// TestBuildContainerEnv_StripsSCMWriteTokens for the negative assertion.
cfg := WorkspaceConfig{
WorkspaceID: "ws-x",
PlatformURL: "http://localhost:8080",
EnvVars: map[string]string{"CUSTOM": "value", "ANTHROPIC_API_KEY": "sk-not-an-scm-token"},
EnvVars: map[string]string{"CUSTOM": "value", "GITHUB_TOKEN": "fake-token-for-test"},
}
env := buildContainerEnv(cfg)
seen := map[string]string{}
@@ -746,8 +741,8 @@ func TestBuildContainerEnv_CustomEnvVarsAppended(t *testing.T) {
if seen["CUSTOM"] != "value" {
t.Errorf("CUSTOM env missing, got env=%v", env)
}
if seen["ANTHROPIC_API_KEY"] != "sk-not-an-scm-token" {
t.Errorf("non-SCM custom env must still pass through, got env=%v", env)
if seen["GITHUB_TOKEN"] != "fake-token-for-test" {
t.Errorf("GITHUB_TOKEN env missing, got env=%v", env)
}
// Built-in defaults still present
if seen["MOLECULE_URL"] == "" {
@@ -755,129 +750,6 @@ func TestBuildContainerEnv_CustomEnvVarsAppended(t *testing.T) {
}
}
// ---------- forensic #145: SCM-write-token denylist guard ----------
// TestBuildContainerEnv_StripsSCMWriteTokens is the core negative
// assertion: a tenant workspace env constructed via buildContainerEnv MUST
// NOT contain any Git SCM *write* credential, regardless of how it got into
// cfg.EnvVars. This proves the two-eyes review gate stays structurally
// self-bypass-proof — an agent in-container has no merge/approve token to
// act on a forged approval. See forensic #145.
//
// This test FAILS on the pre-guard code (where buildContainerEnv passed
// cfg.EnvVars through verbatim) and PASSES once the denylist filter is in
// place — i.e. the guard is proven by construction, not by environment
// accident.
func TestBuildContainerEnv_StripsSCMWriteTokens(t *testing.T) {
scmTokens := []string{
"GITEA_TOKEN", "GITHUB_TOKEN", "GH_TOKEN",
"GITLAB_TOKEN", "GL_TOKEN", "BITBUCKET_TOKEN",
}
t.Run("normal path — SCM tokens explicitly set in EnvVars", func(t *testing.T) {
envVars := map[string]string{"CUSTOM": "ok", "ANTHROPIC_API_KEY": "sk-keep"}
for _, k := range scmTokens {
envVars[k] = "leaked-write-credential-" + k
}
cfg := WorkspaceConfig{
WorkspaceID: "ws-tenant",
PlatformURL: "http://localhost:8080",
Tier: 2,
EnvVars: envVars,
}
assertNoSCMWriteToken(t, buildContainerEnv(cfg), scmTokens)
// Sanity: non-SCM custom env is NOT collateral-damaged by the filter.
if !envContains(buildContainerEnv(cfg), "CUSTOM=ok") {
t.Errorf("filter must not strip non-SCM custom env")
}
if !envContains(buildContainerEnv(cfg), "ANTHROPIC_API_KEY=sk-keep") {
t.Errorf("filter must not strip non-SCM API keys")
}
})
t.Run("persona-file path — simulates loadPersonaEnvFile merge", func(t *testing.T) {
// The latent path: handlers.loadPersonaEnvFile() merges a per-role
// persona env file (carrying GITEA_USER, GITEA_TOKEN, …) into the
// workspace env map when MOLECULE_PERSONA_ROOT is set on a tenant
// host. We can't invoke that cross-package helper here, but its
// observable effect is exactly "a GITEA_TOKEN appears in
// cfg.EnvVars". Constructing that condition directly proves the
// guard holds even if the latent path becomes reachable.
cfg := WorkspaceConfig{
WorkspaceID: "ws-tenant",
PlatformURL: "http://localhost:8080",
Tier: 2,
EnvVars: map[string]string{
// Persona identity fields that are SAFE to keep (read-only
// identity, not a write credential):
"GITEA_USER": "backend-engineer",
"GITEA_USER_EMAIL": "backend-engineer@agents.moleculesai.app",
// The credential that must be stripped:
"GITEA_TOKEN": "persona-merged-write-pat",
"GITEA_TOKEN_SCOPES": "write:repository",
},
}
got := buildContainerEnv(cfg)
assertNoSCMWriteToken(t, got, scmTokens)
// Non-credential persona identity may still flow through — only the
// write token is the denied surface.
if !envContains(got, "GITEA_USER=backend-engineer") {
t.Errorf("non-credential persona identity (GITEA_USER) should not be stripped")
}
})
}
// TestCPProvisionerEnv_StripsSCMWriteTokens covers the tenant-EC2 path:
// CPProvisioner.Start builds the env map the control plane forwards to the
// EC2 workspace container. The same forensic #145 denylist must hold there.
func TestCPProvisionerEnv_StripsSCMWriteTokens(t *testing.T) {
// isSCMWriteTokenKey is the single source of truth shared by both
// buildContainerEnv (local Docker) and CPProvisioner.Start (tenant EC2).
// Assert it classifies every known SCM-write var as denied and leaves
// ordinary / read-only-identity vars alone.
for _, k := range []string{
"GITEA_TOKEN", "GITHUB_TOKEN", "GH_TOKEN",
"GITLAB_TOKEN", "GL_TOKEN", "BITBUCKET_TOKEN",
} {
if !isSCMWriteTokenKey(k) {
t.Errorf("isSCMWriteTokenKey(%q) = false, want true (SCM-write credential must be denied)", k)
}
}
for _, k := range []string{
"GITEA_USER", "GITEA_USER_EMAIL", "ANTHROPIC_API_KEY",
"CUSTOM", "PLATFORM_URL", "ADMIN_TOKEN", "",
} {
if isSCMWriteTokenKey(k) {
t.Errorf("isSCMWriteTokenKey(%q) = true, want false (must not over-strip non-SCM env)", k)
}
}
}
func assertNoSCMWriteToken(t *testing.T, env []string, scmTokens []string) {
t.Helper()
for _, e := range env {
key := e
if i := strings.IndexByte(e, '='); i >= 0 {
key = e[:i]
}
for _, banned := range scmTokens {
if key == banned {
t.Errorf("SCM-write credential %q leaked into workspace env (forensic #145 invariant violated): %q", banned, e)
}
}
}
}
func envContains(env []string, want string) bool {
for _, e := range env {
if e == want {
return true
}
}
return false
}
// ---------- buildWorkspaceMount — #65 workspace_access ----------
func TestBuildWorkspaceMount_SelectionMatrix(t *testing.T) {
@@ -1,226 +0,0 @@
// Package secrets provides the canonical SSOT for credential-shaped
// regex patterns used by:
//
// - the CI `Secret scan` workflow (.gitea/workflows/secret-scan.yml)
// - the runtime's bundled pre-commit hook
// (molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh)
// - the upcoming Phase 2b docker-exec Files API backend, which has
// to refuse to surface files whose path OR content matches a
// credential shape (RFC internal#425, Hongming 2026-05-15)
//
// Before this package, the same regex set lived as duplicate bash
// arrays in two unrelated repos; adding a pattern required editing
// both, and pattern drift was caught only via secret-scan workflow
// failures on PRs that had unrelated changes (#2090-class incident
// vector). Centralising in Go makes the Files API the SSOT, with the
// YAML + bash arrays generated/asserted from this package so drift
// is detected at CI time, not at exfiltration time.
//
// This file is Phase 2a of the internal#425 RFC. Phase 2b will import
// `Patterns` from `template_files_docker_exec.go` to gate
// `listFilesViaDockerExec` / `readFileViaDockerExec` against
// secret-shaped paths AND content. Until 2b lands, the package has
// one consumer: this package's own unit tests, which pin the regex
// strings so a refactor that drops or weakens one is caught here.
package secrets
import (
"fmt"
"regexp"
"sync"
)
// Pattern is one named credential shape — a human label plus the
// compiled regex. The label appears in CI error output ("matched:
// github-pat") so an operator can identify the family without seeing
// the actual matched bytes (echoing the bytes widens the blast radius
// per the secret-scan workflow's recovery prose).
type Pattern struct {
// Name is a short kebab-case identifier (e.g. "github-pat",
// "anthropic-api-key"). Stable across versions — consumers may
// switch on it.
Name string
// Description is a one-line human-readable explanation of what
// the pattern matches. Used in CI error messages and the Files
// API "<denied: secret-shape>" placeholder rationale.
Description string
// regexSource is the regex literal in Go-RE2 syntax. Stored as a
// string so the slice declaration below stays readable; compiled
// once via sync.Once into a *regexp.Regexp.
regexSource string
}
// Patterns is the canonical credential-shape regex set.
//
// Adding a pattern here:
//
// 1. Add a new Pattern{} entry below with a kebab-case Name, a
// one-line Description, and the regex literal. Anchor on a
// low-false-positive prefix.
// 2. Add a positive + negative test case in patterns_test.go.
// 3. Mirror the regex string into:
// a. .gitea/workflows/secret-scan.yml SECRET_PATTERNS array
// b. molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh
// (or wait for the codegen target that consumes this slice — TBD
// follow-up; tracked in the Phase 2a PR description.)
//
// The order is: alphabetical within each provider family, families
// grouped by ecosystem (GitHub family, AI-provider family, chat
// family, cloud family). Keep this stable so diffs are reviewable.
var Patterns = []Pattern{
// --- GitHub token family ---
{
Name: "github-pat-classic",
Description: "GitHub personal access token (classic)",
regexSource: `ghp_[A-Za-z0-9]{36,}`,
},
{
Name: "github-app-installation-token",
Description: "GitHub App installation token (#2090 vector)",
regexSource: `ghs_[A-Za-z0-9]{36,}`,
},
{
Name: "github-oauth-user-to-server",
Description: "GitHub OAuth user-to-server token",
regexSource: `gho_[A-Za-z0-9]{36,}`,
},
{
Name: "github-oauth-user",
Description: "GitHub OAuth user token",
regexSource: `ghu_[A-Za-z0-9]{36,}`,
},
{
Name: "github-oauth-refresh",
Description: "GitHub OAuth refresh token",
regexSource: `ghr_[A-Za-z0-9]{36,}`,
},
{
Name: "github-pat-fine-grained",
Description: "GitHub fine-grained personal access token",
regexSource: `github_pat_[A-Za-z0-9_]{82,}`,
},
// --- AI-provider API key family ---
{
Name: "anthropic-api-key",
Description: "Anthropic API key",
regexSource: `sk-ant-[A-Za-z0-9_-]{40,}`,
},
{
Name: "openai-project-key",
Description: "OpenAI project API key",
regexSource: `sk-proj-[A-Za-z0-9_-]{40,}`,
},
{
Name: "openai-service-account-key",
Description: "OpenAI service-account API key",
regexSource: `sk-svcacct-[A-Za-z0-9_-]{40,}`,
},
{
Name: "minimax-api-key",
Description: "MiniMax API key (F1088 vector)",
regexSource: `sk-cp-[A-Za-z0-9_-]{60,}`,
},
// --- Chat-platform token family ---
{
Name: "slack-token",
Description: "Slack token (xoxb/xoxa/xoxp/xoxr/xoxs)",
regexSource: `xox[baprs]-[A-Za-z0-9-]{20,}`,
},
// --- Cloud-provider credential family ---
{
Name: "aws-access-key-id",
Description: "AWS access key ID",
regexSource: `AKIA[0-9A-Z]{16}`,
},
{
Name: "aws-sts-temp-access-key-id",
Description: "AWS STS temporary access key ID",
regexSource: `ASIA[0-9A-Z]{16}`,
},
}
// compiledOnce protects the lazy build of compiledPatterns. We compile
// lazily so package init is cheap; callers pay only on first match
// (typically once per workspace-server boot).
var (
compiledOnce sync.Once
compiledPatterns []*compiledPattern
compileErr error
)
type compiledPattern struct {
Name string
Description string
Re *regexp.Regexp
}
// compileAll compiles every Pattern.regexSource into a *regexp.Regexp.
// Called once via compiledOnce. Any compile failure here is a build
// bug (the unit tests assert each regex compiles) — surfacing via
// returned error so callers don't panic in request handling.
func compileAll() {
out := make([]*compiledPattern, 0, len(Patterns))
for _, p := range Patterns {
re, err := regexp.Compile(p.regexSource)
if err != nil {
compileErr = fmt.Errorf("secrets: pattern %q failed to compile: %w", p.Name, err)
return
}
out = append(out, &compiledPattern{Name: p.Name, Description: p.Description, Re: re})
}
compiledPatterns = out
}
// ScanBytes returns a non-nil Match if any pattern matches anywhere
// inside b. Returns (nil, nil) on no match. Returns (nil, err) only
// if a regex in the package fails to compile — that's a build bug,
// not a runtime data issue.
//
// Match contains the pattern Name + Description so the caller can
// emit a path-or-content-denial rationale WITHOUT round-tripping the
// matched bytes (which would defeat the purpose). The matched bytes
// stay inside this function.
//
// The Files API Phase 2b backend will call ScanBytes on:
//
// - the absolute path string (catches a file literally named
// `ghs_abc.txt`)
// - the file content (catches a credential pasted into a workspace
// file by an agent or user — the Files API refuses to surface it
// and the canvas renders "<denied: secret-shape>")
//
// Ordering: patterns are tried in declaration order. First match
// wins. This means narrower patterns (e.g. `sk-svcacct-…`) should
// appear in `Patterns` before broader ones (`sk-…`) — today there's
// no overlap, so order is descriptive only.
func ScanBytes(b []byte) (*Match, error) {
compiledOnce.Do(compileAll)
if compileErr != nil {
return nil, compileErr
}
for _, cp := range compiledPatterns {
if cp.Re.Match(b) {
return &Match{Name: cp.Name, Description: cp.Description}, nil
}
}
return nil, nil
}
// ScanString is the string-input convenience wrapper around ScanBytes.
// Identical semantics — the body never copies, []byte(s) is a
// zero-copy reinterpret for the regex matcher.
func ScanString(s string) (*Match, error) {
return ScanBytes([]byte(s))
}
// Match describes which pattern caught a value. Deliberately does
// NOT include the matched substring — callers must not echo it.
type Match struct {
// Name is the pattern's kebab-case identifier (e.g. "github-pat-classic").
Name string
// Description is the human-readable line for UI / log surfaces.
Description string
}
@@ -1,189 +0,0 @@
package secrets
import (
"strings"
"testing"
)
// TestEveryPatternCompiles pins that every Pattern.regexSource is a
// valid Go-RE2 expression. Without this, a bad regex would silently
// disable ScanBytes for everything after it (the lazy compile would
// set compileErr and ScanBytes would return that error every call).
func TestEveryPatternCompiles(t *testing.T) {
for _, p := range Patterns {
if p.Name == "" {
t.Errorf("pattern with empty Name: regex=%q", p.regexSource)
}
if p.Description == "" {
t.Errorf("pattern %q has empty Description", p.Name)
}
}
// Force compile + check error.
if _, err := ScanBytes([]byte("placeholder")); err != nil {
t.Fatalf("ScanBytes init failed: %v", err)
}
}
// TestNoDuplicateNames — a duplicate pattern Name would make the
// "first match wins" semantics surprising to readers and any caller
// switching on Match.Name (none today but adding the guard is cheap).
func TestNoDuplicateNames(t *testing.T) {
seen := map[string]bool{}
for _, p := range Patterns {
if seen[p.Name] {
t.Errorf("duplicate pattern Name: %q", p.Name)
}
seen[p.Name] = true
}
}
// TestKnownPatternsAllPresent — pins which specific Name values are
// expected. A future refactor that renames or removes one without
// updating consumers (CI workflow, runtime pre-commit hook, Files
// API Phase 2b backend) would silently widen the leak surface.
// Failing here forces the rename to be intentional.
func TestKnownPatternsAllPresent(t *testing.T) {
expected := []string{
"github-pat-classic",
"github-app-installation-token",
"github-oauth-user-to-server",
"github-oauth-user",
"github-oauth-refresh",
"github-pat-fine-grained",
"anthropic-api-key",
"openai-project-key",
"openai-service-account-key",
"minimax-api-key",
"slack-token",
"aws-access-key-id",
"aws-sts-temp-access-key-id",
}
got := map[string]bool{}
for _, p := range Patterns {
got[p.Name] = true
}
for _, want := range expected {
if !got[want] {
t.Errorf("expected pattern %q missing from Patterns slice", want)
}
}
}
// TestPositiveMatches — for each pattern, supply a representative
// shape and assert ScanBytes returns a Match with the right Name.
// These are TEST FIXTURES, not real credentials — each is the
// pattern's prefix + a long-enough trailing run of placeholder chars.
// `EXAMPLE` is sprinkled in to make grep-finds in CI logs obviously
// fake to a human reader (matches saved memory
// feedback_assert_exact_not_substring: tighten by Name not body).
func TestPositiveMatches(t *testing.T) {
cases := []struct {
fixture string
expectedName string
}{
{"ghp_EXAMPLE111122223333444455556666777788889999", "github-pat-classic"},
{"ghs_EXAMPLE111122223333444455556666777788889999", "github-app-installation-token"},
{"gho_EXAMPLE111122223333444455556666777788889999", "github-oauth-user-to-server"},
{"ghu_EXAMPLE111122223333444455556666777788889999", "github-oauth-user"},
{"ghr_EXAMPLE111122223333444455556666777788889999", "github-oauth-refresh"},
{"github_pat_EXAMPLE" + strings.Repeat("1", 80), "github-pat-fine-grained"},
{"sk-ant-EXAMPLE" + strings.Repeat("1", 40), "anthropic-api-key"},
{"sk-proj-EXAMPLE" + strings.Repeat("1", 40), "openai-project-key"},
{"sk-svcacct-EXAMPLE" + strings.Repeat("1", 40), "openai-service-account-key"},
{"sk-cp-EXAMPLE" + strings.Repeat("1", 60), "minimax-api-key"},
{"xoxb-" + strings.Repeat("a", 25), "slack-token"},
{"xoxa-" + strings.Repeat("a", 25), "slack-token"},
// AWS regex requires [0-9A-Z]{16} — uppercase + digits only.
{"AKIA1234567890ABCDEF", "aws-access-key-id"},
{"ASIA1234567890ABCDEF", "aws-sts-temp-access-key-id"},
}
for _, tc := range cases {
t.Run(tc.expectedName, func(t *testing.T) {
m, err := ScanBytes([]byte(tc.fixture))
if err != nil {
t.Fatalf("ScanBytes(%q) errored: %v", tc.fixture, err)
}
if m == nil {
t.Fatalf("ScanBytes(%q) returned no match — expected %q", tc.fixture, tc.expectedName)
}
if m.Name != tc.expectedName {
t.Errorf("ScanBytes(%q) matched %q; expected %q", tc.fixture, m.Name, tc.expectedName)
}
})
}
}
// TestNegativeShapes — strings that look credential-adjacent but
// shouldn't match (too short, wrong prefix, missing trailing bytes).
// Failing here means a pattern is too loose, which would generate
// false-positive denial in Files API and false-positive workflow
// failures in CI.
func TestNegativeShapes(t *testing.T) {
cases := []string{
// Too-short variants — anchored on the length suffix.
"ghp_tooshort",
"ghs_alsoshort1234",
"github_pat_short",
"sk-ant-short",
"sk-cp-not-enough-bytes-here",
// Looks like one of the prefixes but isn't (different letter).
"gha_EXAMPLE_thirty_six_or_more_chars_here_xxx",
// Slack family — wrong letter after xox.
"xoxz-aaaaaaaaaaaaaaaaaaaaaaaaa",
// AWS-shaped but wrong length suffix.
"AKIATOOSHORT",
// Empty / whitespace.
"",
" ",
// Plain prose mentioning the prefix as part of a longer word.
"see also `ghp_HOWTO.md` in the repo",
}
for _, c := range cases {
t.Run(c, func(t *testing.T) {
m, err := ScanBytes([]byte(c))
if err != nil {
t.Fatalf("ScanBytes(%q) errored: %v", c, err)
}
if m != nil {
t.Errorf("ScanBytes(%q) unexpectedly matched %q", c, m.Name)
}
})
}
}
// TestScanString_NoOp — sanity-check ScanString is the zero-copy
// wrapper around ScanBytes. Without this, a future refactor that
// makes ScanString do its own thing (e.g. accidentally normalise
// case) would diverge silently.
func TestScanString_NoOp(t *testing.T) {
in := "ghp_EXAMPLE111122223333444455556666777788889999"
m1, err1 := ScanBytes([]byte(in))
if err1 != nil {
t.Fatalf("ScanBytes errored: %v", err1)
}
m2, err2 := ScanString(in)
if err2 != nil {
t.Fatalf("ScanString errored: %v", err2)
}
if m1 == nil || m2 == nil {
t.Fatalf("expected matches; got bytes=%+v string=%+v", m1, m2)
}
if m1.Name != m2.Name {
t.Errorf("ScanString and ScanBytes returned different Names: %q vs %q", m1.Name, m2.Name)
}
}
// TestMatch_NoRoundtrip — assert the Match struct does NOT include
// the matched substring as a field. Adding such a field would
// regress the "matched bytes never leave ScanBytes" invariant that
// makes this package safe to call from log/UI surfaces. This is a
// reflection-light contract test — checks the field names statically.
func TestMatch_NoRoundtrip(t *testing.T) {
var m Match
// If someone adds a `Matched string` (or similar) field, this
// test reads as the canonical place to update + reconsider.
_ = m.Name
_ = m.Description
// The two-field shape is part of the public contract; new fields
// require deliberation about whether they leak the secret value.
}
-6
View File
@@ -35,14 +35,12 @@ from a2a_tools import (
tool_commit_memory,
tool_delegate_task,
tool_delegate_task_async,
tool_get_runtime_identity,
tool_get_workspace_info,
tool_inbox_peek,
tool_inbox_pop,
tool_list_peers,
tool_recall_memory,
tool_send_message_to_user,
tool_update_agent_card,
tool_wait_for_message,
)
from platform_tools.registry import TOOLS as _PLATFORM_TOOL_SPECS
@@ -132,10 +130,6 @@ async def handle_tool_call(name: str, arguments: dict) -> str:
return await tool_get_workspace_info(
source_workspace_id=arguments.get("source_workspace_id") or None,
)
elif name == "get_runtime_identity":
return await tool_get_runtime_identity()
elif name == "update_agent_card":
return await tool_update_agent_card(arguments.get("card"))
elif name == "commit_memory":
return await tool_commit_memory(
arguments.get("content", ""),
-12
View File
@@ -167,15 +167,3 @@ from a2a_tools_inbox import ( # noqa: E402 (import after the top-of-module imp
tool_inbox_pop,
tool_wait_for_message,
)
# Identity tool handlers — extracted to a2a_tools_identity. Ports the
# two T4-tier MCP tools (``tool_get_runtime_identity`` +
# ``tool_update_agent_card``) from molecule-ai-workspace-runtime PR#17.
# That repo is mirror-only (reference_runtime_repo_is_mirror_only);
# this is the canonical edit point, and the wheel mirror is
# regenerated by publish-runtime.yml on merge.
from a2a_tools_identity import ( # noqa: E402 (import after the top-of-module imports)
tool_get_runtime_identity,
tool_update_agent_card,
)
-187
View File
@@ -1,187 +0,0 @@
"""Identity tool handlers — single-concern slice of the a2a_tools surface.
Owns the two MCP tools that close the T4-tier workspace owner-permission
gaps reported via the canvas:
* ``tool_get_runtime_identity`` env-only; returns model, model_provider,
molecule_model, anthropic_base_url, tier, workspace_id, runtime
(ADAPTER_MODULE). No HTTP call. Always permitted by RBAC even
read-only agents may know what model they are.
* ``tool_update_agent_card`` POSTs the card to ``/registry/update-card``
with the workspace's own bearer (same auth path as ``tool_commit_memory``
via ``a2a_tools_rbac.auth_headers_for_heartbeat``). The platform
replaces the stored card and broadcasts an ``agent_card_updated``
event so the canvas reflects the new card live. Gated on
``memory.write`` capability via the existing RBAC permission map so
read-only roles can't silently rewrite the platform card.
Both originated as a port of molecule-ai-workspace-runtime PR#17
(``feat(mcp): add update_agent_card + get_runtime_identity tools``).
The mirror-only PR#17 was closed without merge per
``reference_runtime_repo_is_mirror_only``; the canonical edit point is
this monorepo at ``workspace/`` and the wheel mirror is regenerated
automatically by the publish-runtime workflow.
Imports the auth-header primitive from ``a2a_tools_rbac`` (iter 4a)
NOT from ``a2a_tools`` to avoid a circular import with the
kitchen-sink re-export module.
"""
from __future__ import annotations
import json
import os
from typing import Any
import httpx
from a2a_client import PLATFORM_URL
from a2a_tools_rbac import (
auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
check_memory_write_permission as _check_memory_write_permission,
)
def _runtime_identity_payload() -> dict[str, Any]:
"""Build the identity dict — env-only, no I/O.
Factored out from ``tool_get_runtime_identity`` so tests can assert
against the exact key set without re-parsing JSON. The MCP tool
handler ``tool_get_runtime_identity`` is the only public caller in
production; tests call this helper directly.
"""
return {
"model": os.environ.get("MODEL", ""),
"model_provider": os.environ.get("MODEL_PROVIDER", ""),
"molecule_model": os.environ.get("MOLECULE_MODEL", ""),
"anthropic_base_url": os.environ.get("ANTHROPIC_BASE_URL", ""),
"tier": os.environ.get("TIER", ""),
"workspace_id": os.environ.get("WORKSPACE_ID", ""),
# Adapter module is the closest thing the runtime has to a
# "template slug" — e.g. "adapter" for claude-code-default,
# "hermes" for hermes-template, etc. Picked from
# $ADAPTER_MODULE env baked by each template's Dockerfile.
"runtime": os.environ.get("ADAPTER_MODULE", ""),
}
async def tool_get_runtime_identity() -> str:
"""Return this runtime's identity — model, provider, tier, IDs.
Env-only; no HTTP call. Useful so the agent can answer "what model
am I?" correctly instead of guessing from a stale system prompt
that the operator may have changed between boots.
Returns the identity as a JSON-encoded string (the dispatch contract
every MCP tool in this module follows). Tests that want to assert
individual fields can call ``_runtime_identity_payload()`` directly,
or ``json.loads`` the return value.
Always permitted by RBAC there is no sensitive information here
that isn't already available to the process via ``os.environ``.
The point of the tool is to surface those env values to the agent
layer in a stable, documented shape rather than expecting every
agent runtime to know to ``echo $MODEL``.
"""
return json.dumps(_runtime_identity_payload(), indent=2)
async def tool_update_agent_card(card: Any) -> str:
"""Update this workspace's agent_card on the platform.
POSTs the provided card to ``/registry/update-card`` with the
workspace's own bearer token (same auth path as ``tool_commit_memory``
and ``tool_get_workspace_info``). The platform validates required
fields server-side, replaces the stored card, and broadcasts an
``agent_card_updated`` event so the canvas updates live.
Args:
card: A JSON-serialisable object (typically a dict) holding the
new card. The platform validates required fields server-side.
Returns:
JSON-encoded string. Body:
- ``{"success": true, "status": "updated"}`` on success;
- ``{"success": false, "error": "<msg>", "status_code": <int>}``
on platform error;
- ``{"success": false, "error": "<reason>"}`` on local validation
(non-dict card, missing WORKSPACE_ID, network error).
Permission gate: this tool requires the ``memory.write`` RBAC
capability same gate as ``tool_commit_memory``. The check runs
inline rather than at the dispatcher layer to keep ``a2a_mcp_server``
permission-agnostic (the gate sits with the implementation, not the
transport). Read-only roles get a clear error string back instead
of a 403 from the platform.
We re-check ``isinstance(card, dict)`` here defensively rather than
trust the MCP schema validator alone the schema only constrains
the transport, not the in-process call surface used by tests and
sibling modules.
"""
payload = await _update_agent_card_impl(card)
return json.dumps(payload, indent=2)
async def _update_agent_card_impl(card: Any) -> dict[str, Any]:
"""Dict-returning core of ``tool_update_agent_card``.
Split out so tests can assert against the raw dict shape (status
codes, error messages) without re-parsing JSON on every assertion.
The string-returning ``tool_update_agent_card`` is a thin wrapper
invoked by the MCP dispatcher.
"""
# RBAC: require memory.write permission. Same gate as
# tool_commit_memory (the agent already needs this capability to
# persist anything outbound). Read-only roles can still call
# get_runtime_identity / get_workspace_info to introspect — those
# are env-only / read-only and have no inline gate.
if not _check_memory_write_permission():
return {
"success": False,
"error": (
"RBAC — this workspace does not have the 'memory.write' "
"permission required to update the agent_card."
),
}
if not isinstance(card, dict):
return {
"success": False,
"error": "card must be a JSON object (dict)",
}
ws_id = os.environ.get("WORKSPACE_ID", "")
if not ws_id:
return {
"success": False,
"error": "WORKSPACE_ID env not set; cannot identify caller",
}
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/registry/update-card",
json={"workspace_id": ws_id, "agent_card": card},
headers=_auth_headers_for_heartbeat(),
)
if resp.status_code == 200:
body: dict[str, Any] = {}
try:
body = resp.json()
except Exception:
pass
return {
"success": True,
"status": body.get("status", "updated"),
}
# Non-200 — surface what the platform returned.
error_msg = ""
try:
error_msg = resp.json().get("error", "") or resp.text
except Exception:
error_msg = resp.text
return {
"success": False,
"status_code": resp.status_code,
"error": error_msg,
}
except Exception as e:
return {"success": False, "error": f"network error: {e}"}
-52
View File
@@ -340,16 +340,6 @@ _CLI_A2A_COMMAND_KEYWORDS: dict[str, str | None] = {
"delegate_task_async": "delegate --async",
"check_task_status": "status",
"get_workspace_info": "info",
# `get_runtime_identity` + `update_agent_card` are MCP-first
# capabilities — the CLI subprocess interface doesn't expose them
# today. `get_runtime_identity` is env-only and an agent on a
# CLI-only runtime can already `echo $MODEL` etc, so there's no
# functional gap. `update_agent_card` requires a JSON object
# argument that wouldn't survive a positional-arg shell invocation
# cleanly. Mapped to None — flip to a keyword if a2a_cli grows
# `identity` / `card` subcommands in the future.
"get_runtime_identity": None,
"update_agent_card": None,
# `broadcast_message` is not exposed via the CLI subprocess interface
# today — it's an MCP-first capability. If a2a_cli grows a `broadcast`
# subcommand, map it here and the alignment test will gate the change.
@@ -599,28 +589,6 @@ def _sanitize_for_external(msg: str) -> str:
import re as _re
msg = _re.sub(r"(?i)(?:bearer|token|api[_-]?key|sk-)[ :=]+[A-Za-z0-9_/.-]{20,}", "[REDACTED]", msg)
# Bare provider key with NO separator after the prefix — a real
# `sk-ant-api03-…` / `sk-…` key uses `-` (not `[ :=]`) so the rule
# above misses it. Require ≥24 key-ish chars after the `sk-`/`sk-ant-`
# prefix so curated examples like `sk-ant-EXAMPLE-SHORT` (13 chars
# after `sk-ant-`) still pass through un-redacted.
msg = _re.sub(r"(?i)\bsk-(?:ant-)?[A-Za-z0-9_-]{24,}", "[REDACTED]", msg)
# JSON-quoted credential values: {"token": "…"} / {"apiKey": "…"} /
# {"secret": "…"} / {"password": "…"}. Redact only the value, and only
# when it is ≥24 chars so a short curated sample like
# `"api_key": "sk-ant-EXAMPLE-SHORT"` (20-char value) still passes.
msg = _re.sub(
r'(?i)("(?:token|api[_-]?key|secret|password)"\s*:\s*")[^"]{24,}(")',
r"\1[REDACTED]\2",
msg,
)
# AWS secret access key in `aws_secret_access_key=…` form (env dumps,
# boto tracebacks). The base64-ish value runs until whitespace/quote.
msg = _re.sub(
r"(?i)(aws_secret_access_key\s*[:=]\s*)\S+",
r"\1[REDACTED]",
msg,
)
# Absolute paths: /etc/shadow, /home/user/.aws/credentials, etc.
msg = _re.sub(r"(?:/[^/\s]+){2,}", lambda m: m.group(0) if len(m.group(0)) < 60 else "[REDACTED_PATH]", msg)
return msg
@@ -630,7 +598,6 @@ def sanitize_agent_error(
exc: BaseException | None = None,
category: str | None = None,
stderr: str | None = None,
reason: str | None = None,
) -> str:
"""Render an agent-side failure into a user-safe error message.
@@ -638,18 +605,6 @@ def sanitize_agent_error(
category string (e.g. from `classify_subprocess_error`). If both are
given, `category` wins. If neither, the tag defaults to "unknown".
When ``reason`` is provided (internal#211/#212), it is a *pre-curated,
user-actionable, secret-safe* explanation built by the caller from a
provider-side failure e.g. a 403 "Your organization has disabled
Claude subscription access · Use an Anthropic API key instead, or ask
your admin to enable access" with error code ``oauth_org_not_allowed``.
This text is exactly what the user needs to self-serve, so it is
surfaced VERBATIM as the message instead of being collapsed to the
opaque exception class name. It still passes through the
key/token/bearer/path scrubber as a belt-and-braces second pass so a
buggy caller can't leak a credential that snuck into the reason.
``reason`` wins over ``stderr``; both lose to neither being set.
When ``stderr`` is provided (e.g. the first ~1 KB of a subprocess stderr
or HTTP error body), it is sanitized and appended to the output so the
A2A caller gets actionable context without needing to dig through workspace
@@ -664,13 +619,6 @@ def sanitize_agent_error(
else:
tag = "unknown"
if reason:
# Curated, user-actionable reason — surface it as the message.
# Still scrub: a 403/auth/quota message is safe, but the scrubber
# is cheap insurance against a caller that didn't curate cleanly.
clean = _sanitize_for_external(reason[:_MAX_STDERR_PREVIEW])
return f"Agent error ({tag}): {clean}"
if stderr:
# Truncate and sanitize before including — prevents DoS via
# a malicious or buggy peer injecting a huge error body, and
+9 -66
View File
@@ -26,14 +26,9 @@ Path safety:
a colliding name fails fast (the random prefix already makes
collisions astronomical, but defense-in-depth costs nothing).
Limits (SSOT matches the Go contract from chat_files.go, injected
via CHAT_UPLOAD_MAX_TOTAL_BYTES / CHAT_UPLOAD_MAX_FILE_BYTES at
provision time; falls back to legacy 50 MB / 25 MB when env unset):
- CHAT_UPLOAD_MAX_TOTAL_BYTES total request body (default 50 MB)
- CHAT_UPLOAD_MAX_FILE_BYTES per file (default 25 MB)
ALSO passed as Starlette ``max_part_size`` to override the
Starlette-1.0 default of 1 MiB which silently 400'd every
upload > 1 MiB before #1520 fix.
Limits (matches the Go contract from chat_files.go):
- 50 MB total request body
- 25 MB per file
- filename truncated to 100 chars
Response shape:
@@ -66,47 +61,14 @@ logger = logging.getLogger(__name__)
# keeps working unchanged.
CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"
def _env_int(name: str, default: int) -> int:
"""Parse an int from the environment, falling back to ``default``.
Mis-formatted values (anything ``int()`` rejects) fall back to the
default rather than crashing module import operations needs to be
able to roll back a bad env-var push by simply removing the var,
not by also fixing a worker that won't boot.
"""
raw = os.environ.get(name)
if not raw:
return default
try:
return int(raw)
except (TypeError, ValueError):
logger.warning("internal_chat_uploads: env %s=%r not an int; using default %d", name, raw, default)
return default
# Total-request body cap. multipart/form-data with multiple parts can
# add ~100 bytes of framing per file; the cap is the bytes hitting the
# socket, including framing.
#
# SSOT (issue #1520): the source of truth is the Go constant
# chatUploadMaxBytes in workspace-server/internal/handlers/chat_files.go,
# exported to the workspace container as CHAT_UPLOAD_MAX_TOTAL_BYTES at
# provision time (workspace_provision_shared.go::applyChatUploadLimits).
# Unset env → keep the previous 50 MB default so an unprovisioned /
# locally-run workspace does NOT regress.
CHAT_UPLOAD_MAX_BYTES = _env_int("CHAT_UPLOAD_MAX_TOTAL_BYTES", 50 * 1024 * 1024)
CHAT_UPLOAD_MAX_BYTES = 50 * 1024 * 1024 # 50 MB
# Per-file cap. SSOT (issue #1520): exported from the Go side as
# CHAT_UPLOAD_MAX_FILE_BYTES; default 25 MB if env is unset so an older
# workspace provisioned before the env-injection landed keeps the
# legacy ceiling.
#
# This value is ALSO passed as Starlette's ``max_part_size`` (see
# ingest_handler below) — Starlette 1.0 defaults max_part_size to
# **1 MiB**, which is the actual root cause of #1520: any single file
# part above 1 MiB raised MultiPartException before per-file enforcement
# could fire. Wiring max_part_size to the same cap as per-file means
# the user-visible ceiling is exactly the per-file cap, no surprises.
CHAT_UPLOAD_MAX_FILE_BYTES = _env_int("CHAT_UPLOAD_MAX_FILE_BYTES", 25 * 1024 * 1024)
# Per-file cap. Keeping per-file under total lets a user attach, say,
# a 5 MB PDF + 10 small screenshots in a single batch.
CHAT_UPLOAD_MAX_FILE_BYTES = 25 * 1024 * 1024 # 25 MB
# Conservative {alnum, dot, underscore, dash} character class — anything
# outside gets rewritten so embedded paths, control chars, newlines,
@@ -184,30 +146,11 @@ async def ingest_handler(request: Request) -> JSONResponse:
status_code=413,
)
# max_part_size: Starlette 1.0 defaults to 1 MiB. Any single
# part above that raises MultiPartException BEFORE per-file
# enforcement can run — which silently broke every chat upload
# > 1 MiB (issue #1520, fleet-wide P0 2026-05-18). Wire it to
# the per-file cap so the user-visible ceiling matches what
# the per-file 413 path expects.
try:
form = await request.form(
max_files=64,
max_fields=32,
max_part_size=CHAT_UPLOAD_MAX_FILE_BYTES,
)
form = await request.form(max_files=64, max_fields=32)
except Exception as exc: # multipart parse error
logger.warning("internal_chat_uploads: multipart parse failed: %s", exc)
# Surface the exception detail (feedback_surface_actionable_failure_reason_to_user):
# MultiPartException strings ("Part exceeded maximum size of …",
# "Invalid boundary", "Too many parts", etc.) contain no secrets
# — they describe shape, not content. The 200-char cap is
# belt-and-braces against an exception class we haven't seen
# whose ``str()`` is unbounded.
return JSONResponse(
{"error": "failed to parse multipart form", "detail": str(exc)[:200]},
status_code=400,
)
return JSONResponse({"error": "failed to parse multipart form"}, status_code=400)
# Starlette's FormData allows multiple values per key — `files` may
# appear multiple times for batched uploads. getlist returns them
-59
View File
@@ -57,14 +57,12 @@ from a2a_tools import (
tool_commit_memory,
tool_delegate_task,
tool_delegate_task_async,
tool_get_runtime_identity,
tool_get_workspace_info,
tool_inbox_peek,
tool_inbox_pop,
tool_list_peers,
tool_recall_memory,
tool_send_message_to_user,
tool_update_agent_card,
tool_wait_for_message,
)
@@ -291,61 +289,6 @@ _GET_WORKSPACE_INFO = ToolSpec(
section=A2A_SECTION,
)
_GET_RUNTIME_IDENTITY = ToolSpec(
name="get_runtime_identity",
short=(
"Return this runtime's identity — model, model_provider, tier, "
"workspace_id, runtime template. Reads from process env; no HTTP call."
),
when_to_use=(
"Use this to answer 'what model am I?' truthfully instead of "
"guessing from a stale system prompt — the operator may have "
"routed you to a different model via persona env between boots. "
"Always permitted by RBAC: even read-only agents may know what "
"model they are. Distinct from get_workspace_info — that one "
"calls the platform for ID/role/tier/parent (workspace metadata); "
"this one returns the live process env (MODEL, MODEL_PROVIDER, "
"MOLECULE_MODEL, ANTHROPIC_BASE_URL, TIER, WORKSPACE_ID, "
"ADAPTER_MODULE)."
),
input_schema={"type": "object", "properties": {}},
impl=tool_get_runtime_identity,
section=A2A_SECTION,
)
_UPDATE_AGENT_CARD = ToolSpec(
name="update_agent_card",
short=(
"Replace this workspace's agent_card on the platform. The "
"platform validates required fields and broadcasts an "
"agent_card_updated event so the canvas reflects the change live."
),
when_to_use=(
"Use when the workspace's capabilities, skills, description, or "
"name change and the canvas display needs to follow. The "
"platform stores the new card and pushes an "
"``agent_card_updated`` event to subscribers. Gated behind the "
"``memory.write`` RBAC capability — read-only roles cannot "
"rewrite the card. Tier-1+ owners always have this capability."
),
input_schema={
"type": "object",
"properties": {
"card": {
"type": "object",
"description": (
"The new agent_card object (name, version, "
"description, skills, etc). Server-side validation "
"rejects payloads missing required fields."
),
},
},
"required": ["card"],
},
impl=tool_update_agent_card,
section=A2A_SECTION,
)
_BROADCAST_MESSAGE = ToolSpec(
name="broadcast_message",
short=(
@@ -699,8 +642,6 @@ TOOLS: list[ToolSpec] = [
_CHECK_TASK_STATUS,
_LIST_PEERS,
_GET_WORKSPACE_INFO,
_GET_RUNTIME_IDENTITY,
_UPDATE_AGENT_CARD,
_BROADCAST_MESSAGE,
_SEND_MESSAGE_TO_USER,
# Inbox (standalone-only; in-container returns informational error)
@@ -5,8 +5,6 @@
- **check_task_status**: Poll the status of a task started with delegate_task_async; returns result when done.
- **list_peers**: List the workspaces this agent can communicate with — name, ID, status, role for each.
- **get_workspace_info**: Get this workspace's own info — ID, name, role, tier, parent, status.
- **get_runtime_identity**: Return this runtime's identity — model, model_provider, tier, workspace_id, runtime template. Reads from process env; no HTTP call.
- **update_agent_card**: Replace this workspace's agent_card on the platform. The platform validates required fields and broadcasts an agent_card_updated event so the canvas reflects the change live.
- **broadcast_message**: Send a message to ALL agent workspaces in the org simultaneously. Requires broadcast_enabled=true on this workspace (set by user/admin).
- **send_message_to_user**: Send a message directly to the user's canvas chat — pushed instantly via WebSocket. Use this to: (1) acknowledge a task immediately ('Got it, I'll start working on this'), (2) send interim progress updates while doing long work, (3) deliver follow-up results after delegation completes, (4) attach files (zip, pdf, csv, image) for the user to download via the `attachments` field (NEVER paste file URLs in `message`). The message appears in the user's chat as if you're proactively reaching out.
- **wait_for_message**: Block until the next inbound message (canvas user OR peer agent) arrives, or until ``timeout_secs`` elapses.
@@ -29,12 +27,6 @@ Call this first when you need to delegate but don't know the target's ID. Access
### get_workspace_info
Use to introspect your own identity (e.g. before reporting back to the user, or to determine whether you're a tier-0 root that can write GLOBAL memory).
### get_runtime_identity
Use this to answer 'what model am I?' truthfully instead of guessing from a stale system prompt — the operator may have routed you to a different model via persona env between boots. Always permitted by RBAC: even read-only agents may know what model they are. Distinct from get_workspace_info — that one calls the platform for ID/role/tier/parent (workspace metadata); this one returns the live process env (MODEL, MODEL_PROVIDER, MOLECULE_MODEL, ANTHROPIC_BASE_URL, TIER, WORKSPACE_ID, ADAPTER_MODULE).
### update_agent_card
Use when the workspace's capabilities, skills, description, or name change and the canvas display needs to follow. The platform stores the new card and pushes an ``agent_card_updated`` event to subscribers. Gated behind the ``memory.write`` RBAC capability — read-only roles cannot rewrite the card. Tier-1+ owners always have this capability.
### broadcast_message
Use for urgent, org-wide signals: critical status changes, emergency stop instructions, coordinated task announcements. Every non-removed workspace receives the message in its activity log (poll-mode agents see it on their next poll; push-mode canvases get a real-time banner). This tool returns an error if broadcast_enabled is false — a user or admin must enable it via the workspace abilities settings first.
-390
View File
@@ -1,390 +0,0 @@
"""Tests for ``tool_get_runtime_identity`` and ``tool_update_agent_card``.
These two MCP tools close the T4-tier workspace owner-permission gaps
reported via the canvas:
- the agent could not update its own ``agent_card`` (no MCP tool
wrapped the existing ``POST /registry/update-card`` endpoint);
- the agent could not identify which model it was running (the
``MODEL`` env var is injected by ``provisioner.workspace_provision``
but nothing surfaced it back to the agent).
Ported from molecule-ai-workspace-runtime PR#17 (mirror-only repo;
canonical edit point per ``reference_runtime_repo_is_mirror_only``).
Adapted to core's conventions:
* tool functions return ``str`` (JSON-encoded), matching every other
tool in ``a2a_tools_*`` modules. Tests ``json.loads`` to inspect.
* permission check ``memory.write`` runs inline in
``tool_update_agent_card`` (same pattern as
``a2a_tools_memory.tool_commit_memory``).
* ``WORKSPACE_ID`` is read directly from ``os.environ`` core does
not have the runtime's validated-cache layer (``molecule_runtime.
builtin_tools.validation``).
"""
from __future__ import annotations
import json
import pytest
# --- Drift gate: re-export aliases on a2a_tools ------------------------------
class TestBackCompatAliases:
"""Pin that ``a2a_tools.tool_*`` resolves to the same callable as
``a2a_tools_identity.tool_*``. Refactor wrapping (e.g. a doc-string
wrapper that loses the function identity) silently breaks call
sites that ``patch("a2a_tools.tool_update_agent_card", ...)``
this gate makes that drift fail fast."""
def test_tool_get_runtime_identity_alias(self):
import a2a_tools
import a2a_tools_identity
assert a2a_tools.tool_get_runtime_identity is a2a_tools_identity.tool_get_runtime_identity
def test_tool_update_agent_card_alias(self):
import a2a_tools
import a2a_tools_identity
assert a2a_tools.tool_update_agent_card is a2a_tools_identity.tool_update_agent_card
# --- tool_get_runtime_identity ----------------------------------------------
class TestGetRuntimeIdentity:
"""The tool returns env-derived runtime identity. No HTTP call."""
@pytest.mark.asyncio
async def test_returns_all_known_env_fields(self, monkeypatch):
from a2a_tools_identity import tool_get_runtime_identity
monkeypatch.setenv("MODEL", "claude-opus-4-7")
monkeypatch.setenv("MODEL_PROVIDER", "anthropic")
monkeypatch.setenv("TIER", "T4")
monkeypatch.setenv("WORKSPACE_ID", "ws-abc")
monkeypatch.setenv("ADAPTER_MODULE", "adapter")
monkeypatch.setenv("MOLECULE_MODEL", "claude-opus-4-7")
monkeypatch.setenv("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
out = await tool_get_runtime_identity()
# MCP tools return JSON-encoded strings (matches the contract
# every other tool_* in a2a_tools_* uses).
assert isinstance(out, str)
parsed = json.loads(out)
assert parsed["model"] == "claude-opus-4-7"
assert parsed["model_provider"] == "anthropic"
assert parsed["tier"] == "T4"
assert parsed["workspace_id"] == "ws-abc"
assert parsed["runtime"] == "adapter"
assert parsed["molecule_model"] == "claude-opus-4-7"
assert parsed["anthropic_base_url"] == "https://api.anthropic.com"
@pytest.mark.asyncio
async def test_missing_env_returns_empty_strings(self, monkeypatch):
"""Tool MUST NOT raise when env vars are absent — every key is
present but the value is the empty string. The agent then knows
the slot exists but is unset."""
from a2a_tools_identity import tool_get_runtime_identity
for var in (
"MODEL", "MODEL_PROVIDER", "TIER", "WORKSPACE_ID",
"ADAPTER_MODULE", "MOLECULE_MODEL", "ANTHROPIC_BASE_URL",
):
monkeypatch.delenv(var, raising=False)
parsed = json.loads(await tool_get_runtime_identity())
assert parsed["model"] == ""
assert parsed["model_provider"] == ""
assert parsed["tier"] == ""
assert parsed["workspace_id"] == ""
assert parsed["runtime"] == ""
assert parsed["molecule_model"] == ""
assert parsed["anthropic_base_url"] == ""
@pytest.mark.asyncio
async def test_no_http_call_made(self, monkeypatch):
"""``get_runtime_identity`` is env-only — must not open
httpx.AsyncClient even if the call would otherwise succeed.
Tripwire any client construction."""
import httpx
from a2a_tools_identity import tool_get_runtime_identity
class _Tripwire:
def __init__(self, *_a, **_kw):
raise AssertionError(
"tool_get_runtime_identity must not open httpx.AsyncClient"
)
monkeypatch.setattr(httpx, "AsyncClient", _Tripwire)
# Must not raise.
await tool_get_runtime_identity()
@pytest.mark.asyncio
async def test_helper_dict_matches_string_payload(self, monkeypatch):
"""``_runtime_identity_payload`` is the dict-returning helper
used by both the public tool and tests. Verify the public tool
json.dumps the same dict no field is dropped or renamed by
the encoding step."""
from a2a_tools_identity import (
_runtime_identity_payload,
tool_get_runtime_identity,
)
monkeypatch.setenv("MODEL", "claude-opus-4-7")
monkeypatch.setenv("TIER", "T4")
monkeypatch.setenv("WORKSPACE_ID", "ws-helper-check")
helper = _runtime_identity_payload()
tool_str = await tool_get_runtime_identity()
assert json.loads(tool_str) == helper
# --- tool_update_agent_card -------------------------------------------------
class _MockResponse:
def __init__(self, status_code: int, payload: dict):
self.status_code = status_code
self._payload = payload
self.text = json.dumps(payload)
def json(self):
return self._payload
class _MockClient:
"""Drop-in for httpx.AsyncClient context manager.
Records the URL + json body + headers the tool POSTed so the test
can assert against them. Returns the canned _MockResponse passed
in at construction time.
"""
def __init__(self, *, response: _MockResponse, captured: dict):
self._response = response
self._captured = captured
async def __aenter__(self):
return self
async def __aexit__(self, *_args):
return False
async def post(self, url, *, json=None, headers=None, **_kw): # noqa: A002
self._captured["url"] = url
self._captured["json"] = json
self._captured["headers"] = headers
return self._response
@pytest.fixture
def _grant_memory_write(monkeypatch):
"""Force the inline RBAC gate inside ``tool_update_agent_card`` to
succeed. The gate calls
``a2a_tools_rbac.check_memory_write_permission`` which inspects
``$MOLECULE_ROLES`` / the role table; the patch sidesteps that
machinery so tests can focus on the platform-call shape.
"""
import a2a_tools_identity
monkeypatch.setattr(
a2a_tools_identity, "_check_memory_write_permission", lambda: True
)
class TestUpdateAgentCard:
@pytest.mark.asyncio
async def test_posts_to_registry_update_card(
self, monkeypatch, _grant_memory_write,
):
"""Hits POST {PLATFORM_URL}/registry/update-card with the
workspace bearer and the {workspace_id, agent_card} body shape
the platform handler expects (workspace-server
``internal/handlers/registry.go``)."""
import a2a_tools_identity
monkeypatch.setenv("WORKSPACE_ID", "ws-42")
# Ensure PLATFORM_URL re-import sees a deterministic value —
# a2a_client imports it at module load so we patch the symbol
# on a2a_tools_identity directly (the module's own reference).
monkeypatch.setattr(a2a_tools_identity, "PLATFORM_URL", "http://test.invalid")
captured: dict = {}
response = _MockResponse(200, {"status": "updated"})
def _client_factory(*_a, **_kw):
return _MockClient(response=response, captured=captured)
monkeypatch.setattr(a2a_tools_identity.httpx, "AsyncClient", _client_factory)
monkeypatch.setattr(
a2a_tools_identity, "_auth_headers_for_heartbeat",
lambda: {"Authorization": "Bearer ws-token-xyz"},
)
card = {"name": "agent-foo", "version": "0.1.0", "description": "demo"}
result_str = await a2a_tools_identity.tool_update_agent_card(card)
result = json.loads(result_str)
# URL: PLATFORM_URL + /registry/update-card
assert captured["url"] == "http://test.invalid/registry/update-card"
# The platform handler expects {workspace_id, agent_card}; the
# agent_card is the raw object the agent submitted.
body = captured["json"]
assert body["workspace_id"] == "ws-42"
assert body["agent_card"] == card
# Auth header from auth_headers_for_heartbeat is forwarded
# verbatim — same path commit_memory uses.
assert captured["headers"]["Authorization"] == "Bearer ws-token-xyz"
assert result["success"] is True
assert result["status"] == "updated"
@pytest.mark.asyncio
async def test_propagates_server_error(
self, monkeypatch, _grant_memory_write,
):
"""Non-200 from platform surfaces as a structured error to the
agent. The agent sees {success:false, status_code, error} and
can decide whether to retry, fall back, or escalate."""
import a2a_tools_identity
monkeypatch.setenv("WORKSPACE_ID", "ws-42")
monkeypatch.setattr(a2a_tools_identity, "PLATFORM_URL", "http://test.invalid")
captured: dict = {}
response = _MockResponse(400, {"error": "invalid card"})
monkeypatch.setattr(
a2a_tools_identity.httpx, "AsyncClient",
lambda *a, **kw: _MockClient(response=response, captured=captured),
)
monkeypatch.setattr(
a2a_tools_identity, "_auth_headers_for_heartbeat", lambda: {},
)
result = json.loads(
await a2a_tools_identity.tool_update_agent_card({"name": "x"})
)
assert result["success"] is False
assert result["status_code"] == 400
assert "invalid card" in str(result["error"]).lower()
@pytest.mark.asyncio
async def test_rejects_non_dict_card(self, _grant_memory_write):
"""The MCP schema constrains transport callers to pass a dict;
in-process callers (tests, sibling modules) can still pass any
type. Reject non-dict defensively so the platform isn't asked
to validate JSON-encoded strings or lists."""
from a2a_tools_identity import tool_update_agent_card
result = json.loads(await tool_update_agent_card("not-a-dict"))
assert result["success"] is False
assert "dict" in str(result["error"]).lower()
@pytest.mark.asyncio
async def test_workspace_id_missing_returns_error(
self, monkeypatch, _grant_memory_write,
):
"""If WORKSPACE_ID is not set the tool refuses to issue the
request it would otherwise POST with an empty workspace_id
and let the platform return a confusing 400."""
from a2a_tools_identity import tool_update_agent_card
monkeypatch.delenv("WORKSPACE_ID", raising=False)
result = json.loads(await tool_update_agent_card({"name": "x"}))
assert result["success"] is False
assert "workspace_id" in str(result["error"]).lower()
@pytest.mark.asyncio
async def test_denies_when_memory_write_permission_missing(self, monkeypatch):
"""The agent's RBAC role must grant ``memory.write`` to update
the card. Read-only roles get an RBAC error string back
immediately, never touching the platform."""
import a2a_tools_identity
monkeypatch.setenv("WORKSPACE_ID", "ws-42")
monkeypatch.setattr(
a2a_tools_identity, "_check_memory_write_permission", lambda: False,
)
# Tripwire httpx — must not be called when RBAC denies.
import httpx
class _Tripwire:
def __init__(self, *_a, **_kw):
raise AssertionError("RBAC denial must short-circuit before httpx call")
monkeypatch.setattr(httpx, "AsyncClient", _Tripwire)
result = json.loads(
await a2a_tools_identity.tool_update_agent_card({"name": "x"}),
)
assert result["success"] is False
assert "memory.write" in str(result["error"]).lower()
@pytest.mark.asyncio
async def test_network_exception_returns_structured_error(
self, monkeypatch, _grant_memory_write,
):
"""A network exception (DNS failure, connect timeout, etc) is
wrapped into a structured error dict instead of bubbling up
to the MCP transport layer."""
import a2a_tools_identity
monkeypatch.setenv("WORKSPACE_ID", "ws-42")
monkeypatch.setattr(a2a_tools_identity, "PLATFORM_URL", "http://test.invalid")
class _ExplodingClient:
async def __aenter__(self):
return self
async def __aexit__(self, *_a):
return False
async def post(self, *_a, **_kw):
raise RuntimeError("simulated DNS failure")
monkeypatch.setattr(
a2a_tools_identity.httpx, "AsyncClient",
lambda *a, **kw: _ExplodingClient(),
)
result = json.loads(
await a2a_tools_identity.tool_update_agent_card({"name": "x"})
)
assert result["success"] is False
assert "network" in str(result["error"]).lower()
# --- Registry contract ------------------------------------------------------
class TestRegistryContract:
"""Pin the new tools' registration in platform_tools.registry. The
structural tests in ``test_platform_tools.py`` already check
registryMCP alignment; these are tighter assertions specific to
the two new tools so a future contributor deleting one entry sees
a focused failure."""
def test_get_runtime_identity_in_registry(self):
from platform_tools.registry import by_name
spec = by_name("get_runtime_identity")
assert spec.section == "a2a"
# No input parameters — env-only call.
assert spec.input_schema == {"type": "object", "properties": {}}
# impl points at the actual tool function, not a shim.
from a2a_tools_identity import tool_get_runtime_identity
assert spec.impl is tool_get_runtime_identity
def test_update_agent_card_in_registry(self):
from platform_tools.registry import by_name
spec = by_name("update_agent_card")
assert spec.section == "a2a"
assert "card" in spec.input_schema["properties"]
assert spec.input_schema["required"] == ["card"]
from a2a_tools_identity import tool_update_agent_card
assert spec.impl is tool_update_agent_card
-117
View File
@@ -788,123 +788,6 @@ def test_sanitize_agent_error_stderr_combined_with_existing_tests():
assert "workspace logs" in out
# ─── reason passthrough (internal#211/#212: surface actionable provider error) ───
def test_sanitize_agent_error_reason_surfaced_verbatim():
"""A curated provider reason is shown to the user, not collapsed to the
exception class name. This is the internal#211 regression: a 403
org-disabled message must reach the canvas."""
reason = (
"provider HTTP 403 — oauth_org_not_allowed — Your organization has "
"disabled Claude subscription access for Claude Code · Use an "
"Anthropic API key instead, or ask your admin to enable access"
)
class _ResultErr(Exception):
pass
out = sanitize_agent_error(exc=_ResultErr("opaque"), reason=reason)
# The actionable provider guidance and status code must be visible.
assert "403" in out
assert "oauth_org_not_allowed" in out
assert "disabled Claude subscription access" in out
assert "ask your admin to enable access" in out
# NOT the old opaque form.
assert "see workspace logs" not in out
def test_sanitize_agent_error_reason_still_scrubs_secrets():
"""Even on the reason path the key/token scrubber runs — a buggy caller
that lets a bearer token into the reason still gets it redacted."""
leaky = (
"provider HTTP 401 — auth failed — Authorization: Bearer "
"PLACEHOLDER_LONG_TOKEN_0123456789abcdefghijklm please re-auth"
)
out = sanitize_agent_error(reason=leaky)
assert "[REDACTED]" in out
assert "PLACEHOLDER_LONG_TOKEN_0123456789abcdefghijklm" not in out
# The non-secret guidance still survives the scrub.
assert "401" in out
assert "please re-auth" in out
def test_sanitize_agent_error_reason_scrubs_all_secret_formats():
"""The scrubber must redact every realistic credential shape — not just
the `Bearer <tok>` form the original test happened to exercise
(internal#212 review finding: bare `sk-ant-api03-…` keys, JSON-quoted
"token"/"apiKey" values, and `aws_secret_access_key=` all leaked).
All curated/actionable guidance must still survive the scrub.
"""
# 1. Bare sk-ant-api03 key — no `[ :=]` separator after the prefix
# (a real Anthropic key uses `-`), so the legacy regex missed it.
bare = (
"provider HTTP 401 — auth failed — invalid key "
"sk-FAKEPLACEHOLDERabcdefghijklmnopqrstuvwxy0123456789 "
"please re-auth"
)
out = sanitize_agent_error(reason=bare)
assert "sk-FAKEPLACEHOLDERabcdefghijklmnopqrstuvwxy0123456789" not in out
assert "[REDACTED]" in out
assert "401" in out # actionable status survives
assert "please re-auth" in out # actionable guidance survives
# 2. JSON-quoted "token" / "apiKey" values.
jblob = (
'provider error — config dump {"token": '
'"abcDEF0123456789ghIJKL0123456789mnopQRST", "apiKey": '
'"anon_fakefakefakefakefakefakefakefakefakefake"} — '
"use an API key instead"
)
out = sanitize_agent_error(reason=jblob)
assert "abcDEF0123456789ghIJKL0123456789mnopQRST" not in out
assert "anon_fakefakefakefakefakefakefakefakefakefake" not in out
assert "[REDACTED]" in out
assert "use an API key instead" in out # actionable guidance survives
# 3. aws_secret_access_key=… form.
awsblob = (
"provider HTTP 403 — boto credential error "
"aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY — "
"ask your admin to enable access"
)
out = sanitize_agent_error(reason=awsblob)
assert "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" not in out
assert "[REDACTED]" in out
assert "403" in out # actionable status survives
assert "ask your admin to enable access" in out # guidance survives
# 4. Regression: the original Bearer form still redacts.
# Uses PLACEHOLDER_LONG_TOKEN (>=40 chars, no sk-ant- prefix) to avoid
# triggering the secret-scan workflow pattern
# `sk-ant-[A-Za-z0-9_-]{40,}`.
bearer = (
"provider HTTP 401 — Authorization: Bearer "
"PLACEHOLDER_LONG_TOKEN_9876543210abcdefghij re-auth"
)
out = sanitize_agent_error(reason=bearer)
assert "PLACEHOLDER_LONG_TOKEN_9876543210abcdefghij" not in out
assert "[REDACTED]" in out
assert "re-auth" in out
def test_sanitize_agent_error_reason_wins_over_stderr():
"""When both reason and stderr are passed, the curated reason wins."""
out = sanitize_agent_error(
reason="provider HTTP 403 — use an API key",
stderr="raw subprocess noise that should not be shown",
)
assert "use an API key" in out
assert "raw subprocess noise" not in out
def test_sanitize_agent_error_no_reason_unchanged():
"""Omitting reason preserves the original generic behavior."""
out = sanitize_agent_error(exc=ValueError("boom"))
assert "ValueError" in out
assert "workspace logs" in out
# ======================================================================
# classify_subprocess_error
@@ -299,122 +299,3 @@ def test_symlink_at_target_is_refused(client: TestClient, chat_uploads_dir: Path
assert r.status_code == 500, r.text
# Sentinel content unchanged — the symlink wasn't followed.
assert sentinel.read_bytes() == b"original"
# ───────────── issue #1520: max_part_size + SSOT env-driven caps ─────────────
def test_part_above_starlette_1mib_default_is_accepted(client: TestClient, chat_uploads_dir: Path):
"""Regression: pre-fix, ANY single multipart part > 1 MiB raised
MultiPartException because the ingest handler called
``request.form()`` without ``max_part_size`` and Starlette 1.0's
default is 1 MiB (issue #1520, fleet-wide P0 2026-05-18).
This test sends a 2 MiB part, which is well below the 25 MB default
per-file cap but ABOVE the Starlette default, so it pins the fix:
we now pass ``max_part_size=CHAT_UPLOAD_MAX_FILE_BYTES`` so the
parser uses the same cap the per-file 413 path expects.
"""
payload = b"a" * (2 * 1024 * 1024) # 2 MiB — > Starlette 1 MiB default
r = client.post(
"/internal/chat/uploads/ingest",
files={"files": ("big-but-allowed.bin", payload)},
headers={"Authorization": "Bearer test-secret"},
)
assert r.status_code == 200, r.text
item = r.json()["files"][0]
assert item["size"] == len(payload)
def test_parse_error_surfaces_exception_detail(client: TestClient):
"""Per feedback_surface_actionable_failure_reason_to_user: the 400
body must include a ``detail`` field naming WHICH multipart error
fired. The MultiPartException strings ("Part exceeded maximum size
of ", "Invalid boundary", "Too many parts", etc.) describe SHAPE
not content no secrets.
We trigger a real Starlette MultiPartException by submitting a body
whose Content-Type advertises ``multipart/form-data`` but whose
body is not a valid multipart envelope the parser raises before
any per-file check can fire.
"""
r = client.post(
"/internal/chat/uploads/ingest",
content=b"this is not a valid multipart body",
headers={
"Authorization": "Bearer test-secret",
"Content-Type": "multipart/form-data; boundary=----not-a-real-boundary",
},
)
assert r.status_code == 400, r.text
body = r.json()
assert body["error"] == "failed to parse multipart form"
# Detail must be present + non-empty + bounded.
assert "detail" in body and isinstance(body["detail"], str)
assert body["detail"], "detail must not be empty"
assert len(body["detail"]) <= 200, "detail must be bounded"
def test_total_cap_413_still_fires_above_per_file_pass(client: TestClient, monkeypatch: pytest.MonkeyPatch):
"""Total-cap 413 path still works: two parts whose sum exceeds
CHAT_UPLOAD_MAX_BYTES but each individually fits the per-file cap.
Sanity-check that raising the per-file ceiling didn't accidentally
short-circuit the total-cap check.
"""
monkeypatch.setattr(internal_chat_uploads, "CHAT_UPLOAD_MAX_BYTES", 1024)
monkeypatch.setattr(internal_chat_uploads, "CHAT_UPLOAD_MAX_FILE_BYTES", 800)
r = client.post(
"/internal/chat/uploads/ingest",
files=[
("files", ("a.bin", b"a" * 600)),
("files", ("b.bin", b"b" * 600)),
],
headers={"Authorization": "Bearer test-secret"},
)
assert r.status_code == 413
# Either early (Content-Length pre-parse) or post-parse cumulative path is
# acceptable; both messages mention exceeding the total limit.
err = r.json()["error"]
assert "exceeds" in err and "limit" in err, err
def test_env_driven_ssot_overrides_caps(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
"""SSOT contract: setting CHAT_UPLOAD_MAX_FILE_BYTES /
CHAT_UPLOAD_MAX_TOTAL_BYTES in the environment at module import
time changes the module constants. Pin so the
workspace_provision_shared.go::applyChatUploadLimits env injection
cannot silently drift from what the Python side reads.
"""
import importlib
monkeypatch.setenv("CHAT_UPLOAD_MAX_FILE_BYTES", str(7 * 1024 * 1024))
monkeypatch.setenv("CHAT_UPLOAD_MAX_TOTAL_BYTES", str(13 * 1024 * 1024))
reloaded = importlib.reload(internal_chat_uploads)
try:
assert reloaded.CHAT_UPLOAD_MAX_FILE_BYTES == 7 * 1024 * 1024
assert reloaded.CHAT_UPLOAD_MAX_BYTES == 13 * 1024 * 1024
finally:
# Reset to defaults so subsequent tests see clean constants.
monkeypatch.delenv("CHAT_UPLOAD_MAX_FILE_BYTES", raising=False)
monkeypatch.delenv("CHAT_UPLOAD_MAX_TOTAL_BYTES", raising=False)
importlib.reload(internal_chat_uploads)
def test_env_driven_ssot_malformed_value_falls_back_to_default(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
"""If ops pushes a garbage value the worker still boots with the
in-code default (operability over precision see _env_int
docstring). Pin the fallback.
"""
import importlib
monkeypatch.setenv("CHAT_UPLOAD_MAX_FILE_BYTES", "not-an-int")
monkeypatch.setenv("CHAT_UPLOAD_MAX_TOTAL_BYTES", "") # empty == use default
reloaded = importlib.reload(internal_chat_uploads)
try:
# Defaults (legacy 25 MB / 50 MB) come back.
assert reloaded.CHAT_UPLOAD_MAX_FILE_BYTES == 25 * 1024 * 1024
assert reloaded.CHAT_UPLOAD_MAX_BYTES == 50 * 1024 * 1024
finally:
monkeypatch.delenv("CHAT_UPLOAD_MAX_FILE_BYTES", raising=False)
monkeypatch.delenv("CHAT_UPLOAD_MAX_TOTAL_BYTES", raising=False)
importlib.reload(internal_chat_uploads)