Commit Graph

3015 Commits

Author SHA1 Message Date
rabbitblood
ad5295cd8a feat(workspaces): wire max_concurrent_tasks from template config.yaml (#1408)
Phase 4 of #1408 (active_tasks counter). Runtime increment/decrement,
schema column (037), and scheduler enforcement (scheduler.go:312)
already shipped — but the write path from template config.yaml +
direct API was missing, so every workspace silently fell through to
the schema default of 1. Leaders that set max_concurrent_tasks: 3 in
their org template were getting 1 anyway, defeating the entire
feature for the use case it was built for (cron-vs-A2A contention on
PM/lead workspaces).

- OrgWorkspace gains MaxConcurrentTasks (yaml + json tags)
- CreateWorkspacePayload gains MaxConcurrentTasks (json tag)
- Both INSERTs now write the column unconditionally; 0/omitted
  payload value falls back to 1 (schema default mirror) so the wire
  stays single-shape — no forked column list / goto.
- Existing Create-handler test mocks updated to expect the 11th arg.
- New TestWorkspaceCreate_MaxConcurrentTasksOverride locks the
  payload→DB propagation for the leader case (value=3).
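
Illustrative write-path shape (not the exact diff; the real INSERT carries
the full 11-arg column list noted above, abbreviated here to the relevant
pair):

  maxTasks := payload.MaxConcurrentTasks
  if maxTasks <= 0 {
      maxTasks = 1 // mirror the schema default so 0/omitted stays single-shape
  }
  _, err := db.Exec(
      `INSERT INTO workspaces (name, max_concurrent_tasks) VALUES ($1, $2)`,
      payload.Name, maxTasks,
  )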

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 11:03:01 -07:00
Hongming Wang
5b346ab3e7
Merge pull request #2104 from Molecule-AI/test/ssrf-devmode-rfc1918-followup
test(ssrf): pin dev-mode RFC-1918 allow contract (follow-up to #2103)
2026-04-26 17:35:05 +00:00
Hongming Wang
762d3b8b2c test(ssrf): pin dev-mode RFC-1918 allow contract (follow-up to #2103)
PR #2103 widened the SSRF saasMode branch to also relax RFC-1918 + ULA
under MOLECULE_ENV=development (so the docker-compose dev pattern stops
rejecting workspace registrations on 172.18.x.x bridge IPs). The
existing TestIsSafeURL_DevMode_StillBlocksOtherRanges covered the
security floor (metadata / TEST-NET / CGNAT stay blocked), but no
test asserted the positive side — that 10.x / 172.x / 192.168.x / fd00::
ARE now allowed under dev mode.

Without this test, a future refactor that quietly drops the
`|| devModeAllowsLoopback()` from isPrivateOrMetadataIP wouldn't trip
any assertion, and the docker-compose dev loop would silently re-break.

Adds TestIsSafeURL_DevMode_AllowsRFC1918 — table of 4 URLs covering
the three RFC-1918 IPv4 ranges + IPv6 ULA fd00::/8. Sets
MOLECULE_DEPLOY_MODE=self-hosted explicitly so the test exercises the
devMode branch, not a SaaS-mode pass.
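
A sketch of the positive-side table (imports elided; isSafeURL is the
real guard under test):

  func TestIsSafeURL_DevMode_AllowsRFC1918(t *testing.T) {
      t.Setenv("MOLECULE_ENV", "development")
      t.Setenv("MOLECULE_DEPLOY_MODE", "self-hosted") // devMode branch, not SaaS
      for _, u := range []string{
          "http://10.0.0.5:8080",     // RFC-1918 10/8
          "http://172.18.0.3:8080",   // RFC-1918 172.16/12 (docker bridge)
          "http://192.168.1.20:8080", // RFC-1918 192.168/16
          "http://[fd00::1]:8080",    // IPv6 ULA fd00::/8
      } {
          if err := isSafeURL(u); err != nil {
              t.Errorf("isSafeURL(%q) = %v, want nil in dev mode", u, err)
          }
      }
  }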

Closes the Optional finding I left on PR #2103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:32:33 -07:00
Hongming Wang
61c16fe657
Merge pull request #2103 from Molecule-AI/runtime/cd-chain
feat: runtime CD chain + queued/drain classification + reload-safe agent messages
2026-04-26 17:21:54 +00:00
Hongming Wang
0de67cd379 feat(platform/admin): /admin/workspace-images/refresh + Docker SDK + GHCR auth
The production-side end of the runtime CD chain. Operators (or the post-
publish CI workflow) hit this after a runtime release to pull the latest
workspace-template-* images from GHCR and recreate any running ws-* containers
so they adopt the new image. Without this, a freshly-published runtime sat
in the registry while containers kept the old image until they naturally
cycled.

Implementation notes:
- Uses Docker SDK ImagePull rather than shelling out to docker CLI — the
  alpine platform container has no docker CLI installed.
- ghcrAuthHeader() reads GHCR_USER + GHCR_TOKEN env, builds the base64-
  encoded JSON payload Docker engine expects in PullOptions.RegistryAuth.
  Both empty → public/cached images only; both set → private GHCR pulls.
- Container matching uses ContainerInspect (NOT ContainerList) because
  ContainerList returns the resolved digest in .Image, not the human tag.
  Inspect surfaces .Config.Image which is what we need.
- Provisioner.DefaultImagePlatform() exported so admin handler picks the
  same Apple-Silicon-needs-amd64 platform as the provisioner — single
  source of truth for the multi-arch override.
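
A plausible shape for the auth helper (the registry import path varies by
Docker SDK version; older SDKs expose the same struct as types.AuthConfig):

  import (
      "encoding/base64"
      "encoding/json"
      "os"

      "github.com/docker/docker/api/types/registry"
  )

  func ghcrAuthHeader() (string, error) {
      user, token := os.Getenv("GHCR_USER"), os.Getenv("GHCR_TOKEN")
      if user == "" && token == "" {
          return "", nil // public/cached images only
      }
      buf, err := json.Marshal(registry.AuthConfig{Username: user, Password: token})
      if err != nil {
          return "", err
      }
      // Docker engine expects base64url(JSON) in PullOptions.RegistryAuth.
      return base64.URLEncoding.EncodeToString(buf), nil
  }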

Local-dev companion: scripts/refresh-workspace-images.sh runs on the
host and inherits the host's docker keychain auth — alternate path for
when GHCR_USER/TOKEN aren't set in the platform env.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:17:21 -07:00
Hongming Wang
50decfd326 chore(compose): wire MOLECULE_ENV, GHCR_USER/TOKEN, MOLECULE_IMAGE_PLATFORM
Three env vars the platform now reads:

- MOLECULE_ENV=development (default) — activates the WorkspaceAuth /
  AdminAuth dev fail-open path so the canvas's bearer-less requests pass
  through. Also unlocks RFC-1918 relaxation in the SSRF guard so docker-
  bridge IPs work. Override to 'production' for staged deploys.

- GHCR_USER + GHCR_TOKEN — feed POST /admin/workspace-images/refresh's
  ImagePull auth payload. Both empty → endpoint can pull cached/public
  images only. Set with a fine-grained PAT (read:packages on Molecule-AI
  org) to pull private GHCR images.

- MOLECULE_IMAGE_PLATFORM=linux/amd64 (default) — workspace-template-*
  images ship single-arch amd64. On Apple Silicon hosts, the daemon's
  native linux/arm64/v8 request misses the manifest and pulls fail.
  Forcing amd64 makes Docker Desktop run them under Rosetta — slower
  (~2-3×) but functional.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:14:47 -07:00
Hongming Wang
09972486e8 fix(platform/notify): persist agent send_message_to_user pushes
Pre-fix, POST /workspaces/:id/notify (the side-channel agents use to push
interim updates and follow-up results) only broadcast via WebSocket — no
DB write. When the user refreshed the page, the chat-history loader
(which queries activity_logs) couldn't restore those messages and they
vanished from the chat.

Hits the most common path: when the platform's POST /a2a times out (idle),
the runtime keeps working and eventually pushes its reply via
send_message_to_user. The reply rendered live but disappeared on reload.

Fix: also INSERT an activity_logs row in a shape the existing loader
already understands (type=a2a_receive, source_id=NULL, response_body=
{result: text}). Persistence is best-effort — a DB hiccup doesn't block
the WebSocket push (which the user is already seeing).
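
The best-effort shape, roughly (handler plumbing and helper names are
illustrative):

  // Persist first, but never let a DB hiccup block the live push.
  if _, err := db.Exec(
      `INSERT INTO activity_logs (workspace_id, type, source_id, response_body)
       VALUES ($1, 'a2a_receive', NULL, $2)`,
      workspaceID, responseJSON,
  ); err != nil {
      log.Printf("notify: activity_logs insert failed (non-fatal): %v", err)
  }
  broadcastToWorkspace(workspaceID, message) // hypothetical; WS push proceeds regardless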

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:14:47 -07:00
Hongming Wang
7ed50824b6 fix(platform/ssrf): allow RFC-1918 in MOLECULE_ENV=development
The docker-compose dev pattern puts platform and workspace containers on
the same docker bridge network (172.18.0.0/16, RFC-1918). The runtime
registers via its docker-internal hostname which DNS-resolves to a
172.18.x.x IP. The SSRF defence's isPrivateOrMetadataIP rejected those,
so every workspace POST through the platform proxy returned
'workspace URL is not publicly routable' — breaking the entire docker-
compose dev loop.

Fix: in isPrivateOrMetadataIP, treat MOLECULE_ENV=development the same
as SaaS mode for RFC-1918 relaxation. Both share the 'trusted intra-
network routing' property — SaaS is sibling EC2s in the same VPC, dev
is sibling containers on the same docker bridge. Always-blocked
categories (metadata link-local, TEST-NET, CGNAT) stay blocked.
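
The relaxation branch, sketched (only isPrivateOrMetadataIP is a real name
here; net.IP.IsPrivate covers both RFC-1918 and the fc00::/7 ULA range):

  trusted := deployMode == "saas" || os.Getenv("MOLECULE_ENV") == "development"
  if ip.IsPrivate() && trusted {
      return false // RFC-1918/ULA allowed: trusted intra-network routing
  }
  // Metadata link-local, TEST-NET, and CGNAT checks stay unconditional below.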

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:14:47 -07:00
Hongming Wang
d97d7d4768 fix(platform/delegation): classify queued response + stitch drain result back
When proxyA2A returns 202+{queued:true} (target busy → enqueued for drain
on next heartbeat), executeDelegation previously treated it as a successful
completion and ran extractResponseText on the queued JSON. The result was
'Delegation completed (workspace agent busy — request queued, will dispatch...)'
landing in activity_logs.summary, which the LLM then echoed to the user
chat as garbage.

Two fixes:
1. delegation.go: detect queued shape via new isQueuedProxyResponse helper,
   write status='queued' with clean summary 'Delegation queued — target at
   capacity', store delegation_id in response_body so the drain can stitch
   back later. Also embed delegation_id in params.message.metadata + use it
   as messageId so the proxy's idempotency-key path keys off the same id.

2. a2a_queue.go: when DrainQueueForWorkspace successfully drains a queued
   item, extract delegation_id from the body's metadata and UPDATE the
   originating delegate_result row (queued → completed with real
   response_body). Broadcast DELEGATION_COMPLETE so the canvas chat feed
   flips the queued line to completed in real time.

Closes the loop so check_task_status reflects ground truth instead of
perpetual 'queued' even after the queued request eventually drained.
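
Illustrative probe shape for fix 1 (only isQueuedProxyResponse is named in
the diff; the signature is assumed):

  func isQueuedProxyResponse(status int, body []byte) bool {
      if status != http.StatusAccepted {
          return false // queued shape is specifically 202 + {queued:true}
      }
      var probe struct {
          Queued bool `json:"queued"`
      }
      return json.Unmarshal(body, &probe) == nil && probe.Queued
  }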

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:14:19 -07:00
Hongming Wang
4dd9e2b846
Merge pull request #2102 from Molecule-AI/test/e2e-invalid-api-key-pattern-1900
test(e2e): add 'Invalid API key' regression assertion to staging A2A check (#1900)
2026-04-26 17:06:03 +00:00
Hongming Wang
1ae051ec95 test(e2e): add 'Invalid API key' regression assertion to staging A2A check (#1900)
The staging E2E suite already greps for 5 known regression patterns
in the A2A response (hermes-agent 401, model_not_found, Encrypted
content, Unknown provider, hermes-agent unreachable). The comment
block at lines 386-395 lists "Invalid API key" as the signal for the
CP #238 boot-event 401 race + stale OPENAI_API_KEY paths, but the
explicit grep was never added — meaning a regression in that class
would slip through the generic `error|exception` catch-all.

Closes the gap with one specific-pattern check that fails loud with
the relevant bug references in the message.

Verified `bash -n` clean; pre-existing shellcheck SC2015 at line 88
is unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:03:46 -07:00
Hongming Wang
d949b5b323
Merge pull request #2101 from Molecule-AI/test/broadcaster-interface-1814
test(handlers): introduce events.EventEmitter interface (#1814 partial)
2026-04-26 16:08:25 +00:00
Hongming Wang
7d48f24fef test(handlers): introduce events.EventEmitter interface (#1814 partial)
The 3 skipped tests in workspace_provision_test.go (#1206 regression
tests) were blocked because captureBroadcaster's struct-embed wouldn't
type-check against WorkspaceHandler.broadcaster's concrete
*events.Broadcaster field. This PR fixes the interface blocker for
the 2 broadcaster-related tests; the 3rd (plugins.Registry resolver)
is a separate blocker tracked elsewhere.

Changes:

- internal/events/broadcaster.go: define `EventEmitter` interface with
  RecordAndBroadcast + BroadcastOnly. *Broadcaster satisfies it via
  its existing methods (compile-time assertion guards future drift).
  SubscribeSSE / Subscribe stay off the interface because only sse.go
  + cmd/server/main.go call them, and both still hold the concrete
  *Broadcaster.

- internal/handlers/workspace.go: WorkspaceHandler.broadcaster type
  changes from *events.Broadcaster to events.EventEmitter.
  NewWorkspaceHandler signature updated to match. Production callers
  unchanged — they pass *events.Broadcaster, which the interface
  accepts.

- internal/handlers/activity.go: LogActivity takes events.EventEmitter
  for the same reason — tests passing a stub no longer need to
  construct the full broadcaster.

- internal/handlers/workspace_provision_test.go: captureBroadcaster
  drops the struct embed (no more zero-value Broadcaster underlying
  the SSE+hub fields), implements RecordAndBroadcast directly, and
  adds a no-op BroadcastOnly to satisfy the interface. Skip messages
  on the 2 empty broadcaster-blocked tests updated to reflect the
  new "interface unblocked, test body still needed" state.

Verified `go build ./...`, `go test ./internal/handlers/`, and
`go vet ./...` all clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 09:05:52 -07:00
Hongming Wang
d86eabdd58
Merge pull request #2100 from Molecule-AI/fix/token-cache-toctou-1552
fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552)
2026-04-26 15:24:34 +00:00
Hongming Wang
dafe08450b
Merge pull request #2099 from Molecule-AI/fix/staging-e2e-tls-timeout
fix(e2e): bump staging tenant TLS-readiness timeout 3min → 10min
2026-04-26 15:24:01 +00:00
Hongming Wang
fc2720c1fe fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552)
The token-cache helper had three #1552 findings, all in the
mode-600-after-the-fact pattern:

1. _write_cache writes .tmp with default umask (typically 022 → 644
   on disk) and then chmod 600's after the mv. A concurrent reader
   in that microsecond-wide window sees the token at mode 644.
2. Each chmod was swallowed via `|| true` — if it ever fails, the
   tokens stay world-readable with no operator signal.
3. _refresh_gh's gh_token_file write has the same shape and same
   two issues.

Hardening:

- Wrap the .tmp creates in a `umask 077` block so the files are 600
  from creation. Restore the previous umask before return so callers
  aren't perturbed.
- Replace `chmod ... 2>/dev/null || true` with `if ! chmod ...; then
  echo WARN ...; fi`. A chmod failure is a real signal worth grepping for.
- Apply the same pattern to the _refresh_gh gh_token_file path.
  `local` is illegal in a top-level case branch, so use a uniquely-
  named global (_gh_prev_umask) and unset it after.

Verified `bash -n` clean and `shellcheck` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:22:29 -07:00
rabbitblood
f9b1b34956 fix(e2e): bump staging tenant TLS-readiness timeout 3min → 10min
Closes a 4+ cycle Canvas tabs E2E flake pattern that's been blocking
staging→main PRs since 2026-04-24+ (#2096, #2094, #2055, #2079, ...).

Root cause: TLS_TIMEOUT_MS=180s (3 min) is too tight for the layered
realities of staging tenant TLS readiness:

1. Cloudflare DNS propagation through the edge (1-2 min typical)
2. Tenant CF Tunnel registering the new hostname (1-2 min)
3. CF edge ACME cert provisioning + cache (1-3 min)

Each layer can add 1-3 min on its own under heavy staging load — the
realistic worst case is well past the 3-min cap.

Provision and workspace-online timeouts were already raised to 20 min
(staging-setup.ts:42-46 history). The TLS gate was the remaining
under-budgeted step. Bumping to 10 min keeps it inside the 20-min
PROVISION envelope so a genuinely-stuck tenant still fails loud at
the earlier provision step rather than masquerading as a TLS issue.

Both call sites raised together:
- canvas/e2e/staging-setup.ts: TLS_TIMEOUT_MS = 10 * 60 * 1000
- tests/e2e/test_staging_full_saas.sh: TLS_DEADLINE += 600

Each carries an inline rationale comment so the next reviewer sees
the layer-by-layer decomposition without re-reading the issue thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:21:18 -07:00
Hongming Wang
7c8be5cac2
Merge pull request #2098 from Molecule-AI/fix/sweep-cf-orphans-noise
fix(ci): stop sweep-cf-orphans noise — drop merge_group + soft-skip when secrets unset
2026-04-26 15:08:35 +00:00
Hongming Wang
f1792e1f7a fix(ci): stop sweep-cf-orphans noise — drop merge_group + soft-skip when secrets unset
The sweep-cf-orphans workflow shipped in #2088 was noisier than
intended in two ways. This PR fixes both; the fix was filed under the
Optional finding I left on the original review and now matters because
the noise is observably hitting the merge queue.

1) `merge_group: types: [checks_requested]` was firing the entire
   sweep job on every PR through the merge queue. The original intent
   ("future required-check support without a workflow edit") never
   materialized, and meanwhile every recent merge-queue eval (#2091,
   #2092, #2093, #2094, #2095, #2097) generated a red `Sweep CF
   orphans (merge_group)` run.

   Drop the trigger. Comment in the workflow explains the re-add path
   if/when the workflow IS wired as a required check (re-add the
   trigger AND gate the actual sweep step with
   `if: github.event_name != 'merge_group'` so merge-queue evals are
   no-op success).

2) The `Verify required secrets present` step exits 2 when the 6
   secrets aren't configured yet (the PR body's post-merge step,
   still pending). That turns the hourly schedule into an hourly red
   CI run for as long as the secrets stay unset.

   Convert to a soft skip: emit a `::warning::` annotation listing the
   missing secrets and set a `skip=true` step output, then gate the sweep
   step with `if: steps.verify.outputs.skip != 'true'`. Workflow
   reports green and ops still sees the warning when they review
   recent runs.

Net effect:
- merge-queue evals stop generating spurious red runs
- the schedule reports green-with-warning until secrets land
- once secrets land, behavior is identical to today's (real sweep
  runs, hard-fails if a secret is later removed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:05:53 -07:00
Hongming Wang
0a2c8e25bf
Merge pull request #2097 from Molecule-AI/fix/ssrf-dispatch-a2a-1483
fix(a2a): isSafeURL guard inside dispatchA2A (closes #1483)
2026-04-26 14:21:26 +00:00
Hongming Wang
fd891a147e fix(a2a): isSafeURL guard inside dispatchA2A (closes #1483)
#1483 flagged that dispatchA2A() doesn't call isSafeURL internally —
the guard exists only at the caller level (resolveAgentURL at
a2a_proxy.go:424). The primary call path through proxyA2ARequest
is safe today, but if any future code path ever calls dispatchA2A
directly without going through resolveAgentURL, the SSRF check
would be silently bypassed.

This adds the one-line defense-in-depth guard the issue prescribed:

  if err := isSafeURL(agentURL); err != nil {
      return nil, nil, &proxyDispatchBuildError{err: err}
  }

Wrapping as *proxyDispatchBuildError preserves the existing caller
error-classification path — the same shape that maps to 500 elsewhere.

Adds TestDispatchA2A_RejectsUnsafeURL pinning the contract: it
re-enables SSRF for the test (setupTestDB disables it for normal
unit tests), passes a metadata IP, and asserts the build error is
returned with a nil cancel so no resource is leaked.

The 4 existing dispatchA2A unit tests use setupTestDB → SSRF
disabled, so they continue passing unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 07:18:58 -07:00
Hongming Wang
a8c9644618
Merge pull request #2094 from Molecule-AI/feat/server-side-provision-timeout-2054-phase2
feat(workspace-server): surface provision_timeout_ms in workspace API (#2054 phase 2)
2026-04-26 13:53:18 +00:00
Hongming Wang
6c72b8ec68
Merge pull request #2095 from Molecule-AI/fix/ssrf-discoverhostpeer-1484
fix(discovery): isSafeURL guard on registered URLs (closes #1484)
2026-04-26 13:53:06 +00:00
Hongming Wang
2b76f7dfcb fix(discovery): isSafeURL guard on registered URLs (closes #1484)
#1484 flagged that discoverHostPeer() and writeExternalWorkspaceURL()
return URLs sourced from the workspaces table without an isSafeURL
check. Workspace runtimes register their own URLs via /registry/register
— a misbehaving / compromised runtime could register a metadata-IP URL.
Today both functions are gated by Phase 30.6 bearer-required Discover,
so exposure is theoretical. The fix makes them safe regardless of
upstream auth shape.

Changes:
- discoverHostPeer: isSafeURL on resolved URL before responding;
  503 + log on rejection.
- writeExternalWorkspaceURL: same guard applied to the post-rewrite
  outURL (so a host.docker.internal rewrite is checked, and a
  metadata IP that passed through the rewrite untouched is still
  rejected).
- 3 new regression tests:
  * RejectsMetadataIPURL on host-peer path (169.254.169.254 → 503)
  * AcceptsPublicURL on host-peer path (8.8.8.8 → 200; positive
    counterpart so the rejection test can't pass via universal-fail)
  * RejectsMetadataIPURL on external-workspace path
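
The guard shape on the host-peer path, roughly (handler plumbing
illustrative; isSafeURL and the 503-on-rejection come from the diff):

  if err := isSafeURL(peerURL); err != nil {
      log.Printf("discoverHostPeer: rejecting registered URL: %v", err)
      http.Error(w, "peer URL failed safety check", http.StatusServiceUnavailable)
      return
  }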

setupTestDB already disables SSRF checks via setSSRFCheckForTest,
so the 16+ existing discovery tests remain untouched. Only the new
tests opt in to enabled SSRF.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:50:36 -07:00
rabbitblood
f1ad012024 refactor(handlers): apply simplify findings on PR #2094
- Extract walkTemplateConfigs(configsDir, fn) shared helper. Both
  templates.List and loadRuntimeProvisionTimeouts walked configsDir
  + parsed config.yaml — same boilerplate twice. Now centralised so
  a future template-discovery rule (subdir naming, README sentinel,
  etc.) lands in one place.
- templates.List uses the walker — net -10 lines.
- loadRuntimeProvisionTimeouts uses the walker — net -10 lines.
- Document runtimeProvisionTimeoutsCache as 'NOT SAFE for
  package-level reuse' so a future change doesn't accidentally
  promote it to a singleton (sync.Once can't be reset → tests
  would lock out other fixtures).
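
A plausible walker shape (callback and config types are illustrative; only
the walkTemplateConfigs(configsDir, fn) signature comes from the diff):

  func walkTemplateConfigs(configsDir string, fn func(dir string, cfg templateConfig)) error {
      entries, err := os.ReadDir(configsDir)
      if err != nil {
          return err
      }
      for _, e := range entries {
          if !e.IsDir() {
              continue // loose top-level files are not templates
          }
          cfg, err := parseConfigYAML(filepath.Join(configsDir, e.Name(), "config.yaml"))
          if err != nil {
              continue // bad/missing config.yaml: skip, keep walking
          }
          fn(e.Name(), cfg)
      }
      return nil
  }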

Skipped (review finding): atomic.Pointer[map[string]int] for
future hot-reload. The doc comment already documents the
limitation; YAGNI-promoting the primitive now would buy a
not-yet-built feature at the cost of more code today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:40:15 -07:00
rabbitblood
27396d992c feat(workspace-server): surface provision_timeout_ms in workspace API (#2054 phase 2)
Phase 2 of #2054 — workspace-server reads runtime-level
provision_timeout_seconds from template config.yaml manifests and
includes provision_timeout_ms in the workspace List/Get response.
Phase 1 (canvas, #2092) already plumbs the field through socket →
node-data → ProvisioningTimeout's resolver, so the moment a
template declares the field the per-runtime banner threshold
adjusts without a canvas release.

Implementation:

- templates.go: parse runtime_config.provision_timeout_seconds in
  the templateSummary marshaller. The /templates API now surfaces
  the field too — useful for ops dashboards and future tooling.
- runtime_provision_timeouts.go (new): loadRuntimeProvisionTimeouts
  scans configsDir, parses every immediate subdir's config.yaml,
  returns runtime → seconds. Multiple templates with the same
  runtime: max wins (so a slow template's threshold doesn't get
  cut by a fast template's). Bad/empty inputs are silently
  skipped — workspace-server starts cleanly with no templates.
- runtimeProvisionTimeoutsCache: sync.Once-backed lazy cache.
  First workspace API request after process start pays the read
  cost (~few KB across ~50 templates); every subsequent request is
  a map lookup. Cache lifetime = process lifetime; invalidates on
  workspace-server restart, which is the normal template-change
  cadence.
- WorkspaceHandler gets a provisionTimeouts field (zero-value struct
  is valid — the cache lazy-inits on first get()).
- addProvisionTimeoutMs decorates the response map with
  provision_timeout_ms (seconds × 1000) when the runtime has a
  declared timeout. Absent = no key in the response, canvas falls
  through to its runtime-profile default. Wired into both List
  (per-row decoration in the loop) and Get.
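
Sketches of the two load-time behaviors above (names beyond
loadRuntimeProvisionTimeouts and runtimeProvisionTimeoutsCache are
assumptions, including the loader's signature):

  // Max-on-duplicate merge: a slow template's threshold wins.
  if secs > timeouts[runtime] {
      timeouts[runtime] = secs
  }

  // sync.Once-backed lazy cache; lifetime = process lifetime.
  type runtimeProvisionTimeoutsCache struct {
      once sync.Once
      m    map[string]int
  }

  func (c *runtimeProvisionTimeoutsCache) get(configsDir, runtime string) int {
      c.once.Do(func() { c.m = loadRuntimeProvisionTimeouts(configsDir) })
      return c.m[runtime] // unknown runtime returns zero
  }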

Tests (5 new in runtime_provision_timeouts_test.go):
- happy path: hermes declares 720, claude-code doesn't, only
  hermes appears in the map
- max-on-duplicate: same runtime in two templates → max wins
- skip-bad-inputs: missing runtime, zero timeout, malformed yaml,
  loose top-level files all silently ignored
- missing-dir: returns empty map, no crash
- cache: lazy-init on first get; subsequent gets hit cache even
  after underlying file changes (sync.Once contract); unknown
  runtime returns zero

Phase 3 (separate template-repo PR): template-hermes config.yaml
declares provision_timeout_seconds: 720 under runtime_config.
canvas RUNTIME_PROFILES.hermes becomes redundant + removable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:37:45 -07:00
Hongming Wang
f4cbb50ddf
Merge pull request #2093 from Molecule-AI/test/python-coverage-floor-1817
test(workspace): centralize pytest-cov config + 92% floor (closes #1817)
2026-04-26 13:27:05 +00:00
Hongming Wang
5d294936b3 fixup: lower coverage floor 92→86 to match post-omit measurement
The 97% number from CI run 24956647701 was measured WITHOUT a
.coveragerc omit list. Once this PR's prescribed omit set is in
effect (`*/__init__.py`, `*/tests/*`, `plugins_registry/*` — files
that don't carry behavior), the actual measurement of behavior-bearing
code on the same staging snapshot is 91.11% (run 24957664272).

86% sits at the issue's prescribed `current − 5pp` margin and
unblocks CI without lowering the bar in real terms.
2026-04-26 06:24:36 -07:00
Hongming Wang
e8c87e9f72
Merge pull request #2092 from Molecule-AI/feat/per-node-provision-timeout-2054
feat(canvas): per-workspace provision_timeout_ms override (#2054 phase 1)
2026-04-26 13:22:48 +00:00
Hongming Wang
355355a80a test(workspace): centralize pytest-cov config + 92% floor (closes #1817)
The Python workspace already runs pytest-cov in CI but with no
threshold and inline-flagged config. CI run 24956647701 (2026-04-26
staging) reports 97% coverage on the package — well above the issue's
75% target. The actionable gap is locking in a floor so a regression
can't sneak past, and centralizing config so local `pytest` matches CI.

Changes:

- workspace/pytest.ini — coverage flags moved into addopts (-q,
  --cov=., --cov-report=term-missing, --cov-fail-under=92).
  92% = current 97% measurement minus the 5pp safety margin
  the issue's Step 3 prescribes.

- workspace/.coveragerc (new) — [run] omit list and [report]
  skip_covered. coverage.py doesn't read pytest.ini sections, so
  the omit config has to live here.

- .github/workflows/ci.yml — removed the inline --cov flags from the
  Python Lint & Test step; now reads from pytest.ini. Workflow stays
  the same single-command shape, just simpler.

Result: any PR that drops coverage below 92% fails CI loudly. Floor
ratchets up by replacing 92 with current measurement on a future
test-writing pass — same shape as Go coverage gates landed elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:21:22 -07:00
rabbitblood
6b9be7b086 docs(provisioning): clarify separator-safety contract for the serialized-node string
simplify-review note: the |/,-delimited node string is brittle if a
future string-typed field is added without sanitization. Document
which fields are user-typed (name — already sanitized) vs primitive
(id is UUID, runtime is a slug, provisionTimeoutMs is numeric) so
the next field-add doesn't accidentally introduce an injection
vector for the splitter.

Skipped (false-positive review finding): the agent flagged the
prop > runtime-profile order as inconsistent with the docstring,
but the docstring explicitly lists the prop at #2 (between node and
runtime-profile) — matches both the implementation AND the original
behavior pre-#2054 (the prop was 'timeoutMs ?? runtime-profile').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:05:47 -07:00
rabbitblood
1a273f21f5 feat(canvas): per-workspace provision_timeout_ms override (#2054)
Phase 1 of moving runtime UX knobs server-side. Builds the canvas
foundation: a workspace can carry its own provision_timeout_ms
(sourced server-side from a template manifest in a follow-up PR),
and ProvisioningTimeout's resolver respects it per-node.

Today the resolver had Props-level timeoutMs that applied to ALL
nodes — fine for tests but wrong for production where one batch
could mix runtimes (hermes 12-min cold boot alongside docker 2-min).
The runtime profile fallback already handles per-runtime defaults;
this PR adds the per-WORKSPACE override layer above that.

Resolution priority (most specific wins):
  1. node.provisionTimeoutMs — server-declared per-workspace
     override (this PR's new field)
  2. timeoutMs prop — single-threshold test override
  3. runtime profile in @/lib/runtimeProfiles
  4. DEFAULT_RUNTIME_PROFILE

Changes:
- WorkspaceData (socket): add optional provision_timeout_ms
- WorkspaceNodeData: add optional provisionTimeoutMs
- canvas-topology hydrate: thread the field through to node.data
- ProvisioningTimeout: extend the serialized-string node iteration
  to carry provisionTimeoutMs (4-field positional split); pass as
  the second arg to provisionTimeoutForRuntime
- 3 new tests in ProvisioningTimeout.test.tsx covering hydrate
  threading, null fall-through, and resolver priority

Phase 2 (separate PR, blocked on workspace-server template-config
loader): workspace-server reads provision_timeout_seconds from
template config.yaml at provision time, includes
provision_timeout_ms in the workspace API/socket response. Phase 3
(template-repo PR): template-hermes config.yaml declares
provision_timeout_seconds: 720; canvas RUNTIME_PROFILES.hermes
becomes redundant and can be removed.

19/19 tests pass (3 new + 16 existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:02:56 -07:00
Hongming Wang
dff14c010e
Merge pull request #2091 from Molecule-AI/fix/bare-except-a2a-executor-1787
fix(a2a): document the metadata-attach except-pass in a2a_executor (closes #1787)
2026-04-26 12:25:07 +00:00
Hongming Wang
76d0f8d004 fix(a2a): document the metadata-attach except-pass in a2a_executor (closes #1787)
GitHub Code Quality bot flagged the empty `except (AttributeError,
TypeError): pass` at workspace/a2a_executor.py:424 as a nit on PR #1783.
The suppression IS intentional: in MagicMock-backed test paths,
`new_agent_text_message()` returns a plain string, and assignment to
its `.metadata` raises even though hasattr reports true.

This:
  - Adds a why-comment citing the test-mock motivation, commit
    dcbcf19 (the original guard), and issue #1787 so the next
    code-quality pass doesn't re-flag it.
  - Adds `logger.debug("metadata attach skipped (non-Message ...")`
    for observability — debug-level so production logs stay quiet
    but ops can flip the level if metadata loss is ever suspected.

Behavior unchanged. 43 existing a2a_executor tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 05:23:00 -07:00
Hongming Wang
889cc2f9fe
Merge pull request #2089 from Molecule-AI/test/wsauth-canvasorbearer-coverage-1818
test(middleware): branch coverage for CanvasOrBearer + IsSameOriginCanvas (closes #1818)
2026-04-26 11:26:17 +00:00
Hongming Wang
246ad0a48e
Merge pull request #2088 from Molecule-AI/feat/sweep-cf-orphans-workflow-cp239
ops(cf): hourly sweep workflow for orphan Cloudflare DNS records (#239)
2026-04-26 11:25:52 +00:00
Hongming Wang
eb42f7d145 test(middleware): branch coverage for CanvasOrBearer + IsSameOriginCanvas (closes #1818)
Per the 2026-04-23 audit, wsauth_middleware.go had two coverage holes
on auth-boundary code:

  CanvasOrBearer       50.0% (only fail-open + Origin paths covered)
  IsSameOriginCanvas    0.0% (exported wrapper never exercised)

This adds focused tests for the missing branches:

  CanvasOrBearer:
    - ValidBearer_Passes              (path-1 success)
    - InvalidBearer_Returns401        (auth-escape regression: bad
                                        bearer + matching Origin must
                                        NOT fall through to Origin)
    - AdminTokenEnv_Passes            (ADMIN_TOKEN constant-time match)
    - DBError_FailOpen                (documented fail-open behavior)
    - SameOriginCanvas_Passes         (path-3 combined-tenant image)

  IsSameOriginCanvas / isSameOriginCanvas:
    - ExportedWrapper_DelegatesToInternal
    - DisabledByEnv                   (CANVAS_PROXY_URL unset short-circuit)
    - BranchCoverage                  (table-driven: 11 host/referer/origin
                                        cases incl. the h.example.com.evil.com
                                        suffix-attack rejection)
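
Why the suffix-attack row matters, as a sketch (the real comparison lives
in isSameOriginCanvas; the check shown here is illustrative):

  // A prefix/substring check would wrongly accept the attack host:
  _ = strings.HasPrefix("h.example.com.evil.com", "h.example.com") // true (the trap)
  // Exact host equality is what rejects it:
  _ = "h.example.com.evil.com" == "h.example.com" // false (rejected)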

Coverage moves CanvasOrBearer 50% → 100%, IsSameOriginCanvas 0% → 100%,
and middleware-package overall 81.6% → 86.0%. No production code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 04:23:24 -07:00
rabbitblood
0ae6b201b4 refactor(ci): apply simplify findings on PR #2088
- Drop redundant 'aws --version' step. Script's own 'aws ec2
  describe-instances' fails just as loud with a more actionable
  error; the pre-check added ~1s with no signal value.
- timeout-minutes 10 → 3. Realistic worst case is ~2min (4 curls +
  1 aws + N×CF-DELETE each individually capped at 10s by the
  script's curl -m flag). 3 surfaces hangs within one cron tick
  instead of burning the full interval.
- Document the schedule-vs-dispatch dry-run asymmetry inline so
  the next reader doesn't need to trace input defaults.
- Add merge_group: types: [checks_requested] for queue parity with
  runtime-pin-compat.yml — cheap insurance if this ever becomes a
  required check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 04:18:24 -07:00
rabbitblood
3c18b76aa7 ops(cf): hourly sweep workflow for orphan Cloudflare DNS records (#239)
Closes Molecule-AI/molecule-controlplane#239.

CF zone hit the 200-record quota 2026-04-23+ — every E2E and canary
left a record on moleculesai.app, and no scheduled job pruned them.
Provisions started failing with code 81045 ('Record quota exceeded').

The sweep-cf-orphans.sh script (PR #1978, with decision-function
unit tests added in #2079) already exists but no workflow fires it.
Adding it here as a parallel janitor to sweep-stale-e2e-orgs.yml:

- hourly schedule at :15 (offset from the e2e-orgs sweep at :00 so
  the two converge cleanly without racing the same CP admin endpoint)
- workflow_dispatch with dry_run input default true (ad-hoc verify
  without committing to deletes)
- workflow_dispatch with max_delete_pct input for major cleanups
  (the script's own MAX_DELETE_PCT defaults to 50% as a safety gate)
- concurrency group prevents schedule + manual-dispatch from racing
  the same zone

Why a separate workflow vs sweep-stale-e2e-orgs.yml:
- That workflow drives DELETE /cp/admin/tenants/:slug, assumes CP
  has the org row. Doesn't catch records left when CP itself never
  knew about the tenant (canary scratch, manual ops experiments)
  or when the CP-side cascade's CF-delete branch failed.
- sweep-cf-orphans.sh enumerates the CF zone directly + matches
  against live CP slugs + AWS EC2 names. Catches what the CP-driven
  sweep can't.

Required secrets (will need to be set on the repo): CF_API_TOKEN,
CF_ZONE_ID, CP_PROD_ADMIN_TOKEN, CP_STAGING_ADMIN_TOKEN,
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY. Pre-flight verify-secrets
step fails loud if any are missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 04:16:43 -07:00
Hongming Wang
8dc965c3b0
Merge pull request #2087 from Molecule-AI/test/handlers-tokens-coverage-1819
test(handlers): sqlmock coverage for tokens.go (closes #1819)
2026-04-26 09:53:03 +00:00
Hongming Wang
28d7649c48 test(handlers): sqlmock coverage for tokens.go (closes #1819)
The existing tokens_test.go skips every test when db.DB is nil, so CI
ran with 0% coverage on tokens.go's List/Create/Revoke. This file adds
sqlmock-driven tests that exercise the SQL paths directly without
needing a live Postgres, lifting coverage on all 4 functions to 100%
and module-level handler coverage from 60.3% → 61.1%.
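
The sqlmock pattern in play, sketched (query text and columns are
illustrative; the library is github.com/DATA-DOG/go-sqlmock):

  mockDB, mock, err := sqlmock.New()
  if err != nil {
      t.Fatal(err)
  }
  defer mockDB.Close()
  db.DB = mockDB // swap the package-level handle the handlers read

  mock.ExpectQuery(`SELECT .* FROM tokens`).
      WillReturnRows(sqlmock.NewRows([]string{"id", "name"}).AddRow(1, "ci"))

  // ... invoke the List handler, assert on the recorded response ...
  if err := mock.ExpectationsWereMet(); err != nil {
      t.Error(err)
  }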

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:50:42 -07:00
Hongming Wang
1e7f8ebb1b
Merge pull request #2079 from Molecule-AI/feat/test-sweep-cf-decide-2027
test(ops): unit tests for sweep-cf-orphans decide() (#2027)
2026-04-26 09:21:45 +00:00
Hongming Wang
4e90f3f5b7
Merge pull request #2081 from Molecule-AI/fix/peers-q-filter-1038
fix(discovery): apply ?q= filter to Peers list (#1038)
2026-04-26 09:21:44 +00:00
Hongming Wang
c07a71523b
Merge pull request #2083 from Molecule-AI/feat/runtime-pin-compat-gate-cp253
test(ci): runtime + a2a-sdk pin compatibility gate (controlplane#253)
2026-04-26 09:21:42 +00:00
Hongming Wang
b232015eee
Merge pull request #2085 from Molecule-AI/test/compliance-default-2059
test(config): lock ComplianceConfig default to owasp_agentic (#2059)
2026-04-26 09:21:41 +00:00
Hongming Wang
966821b7d8
Merge pull request #2086 from Molecule-AI/fix/provisioner-nil-guards-1813
fix(provisioner): nil guards on Stop/IsRunning, unblock contract tests (closes #1813)
2026-04-26 09:20:22 +00:00
Hongming Wang
48b494def3 fix(provisioner): nil guards on Stop/IsRunning, unblock contract tests (closes #1813)
Both backends panicked when called on a zero-valued or nil receiver:
Provisioner.{Stop,IsRunning} dereferenced p.cli; CPProvisioner.{Stop,
IsRunning} dereferenced p.httpClient. The orphan sweeper and shutdown
paths can call these speculatively where the receiver isn't fully
wired — the panic crashed the goroutine instead of the caller seeing
a clean error.

Three changes:

1. Add ErrNoBackend (typed sentinel) and nil-guard the four methods.
   - Provisioner.{Stop,IsRunning}: guard p == nil || p.cli == nil at
     the top.
   - CPProvisioner.Stop: guard p == nil up top, then httpClient nil
     AFTER resolveInstanceID + empty-instance check (the empty
     instance_id path doesn't need HTTP and stays a no-op success
     even on zero-valued receivers — preserved historical contract
     from TestIsRunning_EmptyInstanceIDReturnsFalse).
   - CPProvisioner.IsRunning: same shape — empty instance_id stays
     (false, nil); httpClient-nil with non-empty instance_id returns
     ErrNoBackend.

2. Flip the t.Skip on TestDockerBackend_Contract +
   TestCPProvisionerBackend_Contract — both contract tests run now
   that the panics are gone. Skipped scenarios were the regression
   guard for this fix.

3. Add TestZeroValuedBackends_NoPanic — explicit assertion that
   zero-valued and nil receivers return cleanly (no panic). Docker
   backend always returns ErrNoBackend on zero-valued; CPProvisioner
   may return (false, nil) when the DB-lookup layer absorbs the case
   (no instance to query → no HTTP needed). Both are acceptable per
   the issue's contract — the gate is no-panic.
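
The guard shape, sketched (method signature assumed; only ErrNoBackend,
p.cli, and the nil checks come from the description above):

  var ErrNoBackend = errors.New("provisioner: no backend configured")

  func (p *Provisioner) IsRunning(ctx context.Context, workspaceID string) (bool, error) {
      if p == nil || p.cli == nil {
          return false, ErrNoBackend // clean error instead of a nil-deref panic
      }
      // existing Docker-backed lookup continues here (elided)
      return false, nil
  }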

Tests:
  - 6 sub-cases across the new TestZeroValuedBackends_NoPanic
  - TestDockerBackend_Contract + TestCPProvisionerBackend_Contract
    now run their 2 scenarios (4 sub-cases each)
  - All existing provisioner tests still green
  - go build ./... + go vet ./... + go test ./... clean

Closes drift-risk #6 in docs/architecture/backends.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:17:51 -07:00
rabbitblood
4a4a740804 refactor(test_config): parametrize the 3 yaml-default cases (simplify on #2085)
Collapses test_compliance_default_when_yaml_omits_block,
_when_yaml_block_is_empty, _explicit_optout_still_works into one
parametrized test_compliance_default_via_load_config with three
ids (yaml_omits_block, yaml_block_empty, yaml_explicit_optout).

The dataclass-default test stays separate (no tmp_path needed).

Coverage and assertions identical; net -19 lines, same 4 logical cases.
prompt_injection check moves out of per-case to a single tail-assert
since no payload overrode it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:03:59 -07:00
rabbitblood
577294b8f4 test(config): lock ComplianceConfig default to owasp_agentic (#2059)
PR #2056 flipped ComplianceConfig.mode default from "" to "owasp_agentic"
so every shipped template gets prompt-injection detection + PII redaction
by default. The flip is correct + already shipping, but no test asserts
the new default — a silent revert (or a refactor that reintroduces the
old "" default) would pass workspace/tests/ and ship a workspace with
compliance silently off.

Add 4 regression tests:

- test_compliance_dataclass_default — ComplianceConfig() with no args
  returns mode='owasp_agentic' + prompt_injection='detect'
- test_compliance_default_when_yaml_omits_block — load_config on a yaml
  without `compliance:` key still produces owasp_agentic
- test_compliance_default_when_yaml_block_is_empty — load_config on
  `compliance: {}` (a common shape during template editing) still
  produces owasp_agentic; covers the load_config()
  `.get("mode", "owasp_agentic")` default-fill path
- test_compliance_explicit_optout_still_works — `mode: ""` in yaml
  must disable compliance (the documented opt-out path)

23/23 tests pass locally (4 new + 19 existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:01:57 -07:00
rabbitblood
5ce7af2d2c fix(ci): set WORKSPACE_ID for the runtime-pin smoke import
platform_auth.py validates WORKSPACE_ID at module load — EC2 user-data
sets it from cloud-init, but the CI smoke-test was missing it and
failed with 'WORKSPACE_ID is empty'. Set a placeholder UUID so the
import gate exercises only the dep-resolution path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 01:59:56 -07:00