Commit Graph

2847 Commits

Author SHA1 Message Date
Hongming Wang
46fbffb95b fix(canvas/e2e): raise staging-setup deadline 15 min → 20 min
Matches tests/e2e/test_staging_full_saas.sh's 20-min budget (#1930).
Canvas E2E was still stuck at 900s (15 min) which regularly flakes on
tenant cold boots in 12-15 min range — especially on staging where
workspace-server image pulls + AMI bootstrapping add 3-5 min vs prod.

Concrete blocker: 2026-04-24 staging→main sync (#1981) kept failing on
"tenant provision: timed out after 900s" in canvas/e2e/staging-setup.ts
despite the actual sync E2E going green. Canvas-side timeout was
strictly tighter than the sync-side timeout.

Also raises WORKSPACE_ONLINE_TIMEOUT_MS to 20 min to cover the case
where the workspace EC2 is provisioned but hermes cold-install (apt +
uv + hermes-agent clone + gateway boot) takes longer than the original
10-min budget — matches the 20-min workspace deadline in SaaS E2E.

No behavior change when things are fast. Just covers the tail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:26:13 -07:00
Hongming Wang
3770d4d68c
Merge pull request #2005 from Molecule-AI/chore/remove-forbidden-marketing-paths
chore: remove all forbidden marketing paths from staging (unblocks #1981)
2026-04-24 07:58:31 +00:00
Molecule AI App & Docs Lead
561b1c2c0d chore: remove all forbidden marketing/docs/marketing paths from staging
71 files across docs/marketing/ and marketing/ are blocked by the
Block-internal-flavored-paths CI gate (CEO directive 2026-04-23).
These paths must live in Molecule-AI/internal, not the public monorepo.

Unblocks PR #1981 (staging→main sync).

Public-facing blog/devrel content should be re-added via correct paths:
  docs/blog/<slug>.md, docs/devrel/<slug>.md, docs/tutorials/<slug>.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:52:04 +00:00
Hongming Wang
0a70430b5c
Merge pull request #2004 from Molecule-AI/feat/list-templates-loud-on-half-clone
feat(org): log loud when org-template dir is a half-clone
2026-04-24 07:42:10 +00:00
rabbitblood
d0080b0e98 feat(org): log loud when org-template dir is a half-clone
Audit 2026-04-24 case: org-templates/molecule-dev/ contained only .git/
(working tree wiped). ListTemplates silently skipped the directory and
the molecule-dev template silently disappeared from the Canvas palette.
No log trail; CEO discovered hours later when looking for the registry
listing manually.

This commit adds a one-line log warning when a directory under orgDir
has a .git/ subdir but no org.yaml/.yml — that's almost always a manifest
clone that got truncated. The warning includes the recovery command
(`git checkout main -- .`) so operators can self-fix without re-cloning.

Doesn't change the response behavior — the directory is still skipped
to keep ListTemplates a fail-soft endpoint. Just makes the failure
visible in `docker logs platform`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:39:11 -07:00
Hongming Wang
92ce37ae99
Merge pull request #2003 from Molecule-AI/ci/gh-wrapper-identity-shim
ci(gh-wrapper): translate --assignee @me → --label team:<role> (fixes #1957)
2026-04-24 07:36:36 +00:00
Hongming Wang
b5c93cff4f
Merge pull request #2002 from Molecule-AI/ci/merge-group-trigger-linter
ci: linter to catch missing merge_group triggers on required workflows
2026-04-24 07:35:23 +00:00
rabbitblood
7b662d2494 ci(gh-wrapper): translate --assignee @me → --label team:<role>
Fixes #1957. All agents share one PAT, so `gh issue create --assignee @me`
resolves to the CEO. Today's "6 issues @me for 7 cycles" defect signal
turned out to be CEO-load misclassified as team-stagnation.

Translation rules:
- `--assignee @me` → `--label team:<role-slug>`
- `--reviewer @me` → dropped (review-bot scans labels, not requests)
- `--assignee user` (real user) → unchanged

role-slug derived from GIT_AUTHOR_NAME ("Molecule AI Core-BE" → "core-be").
The wrapper already handled the title-prefix + body-footer transforms;
these are just two more cases in the existing arg-walk loop.

Backward compat: any agent prompt that doesn't use @me passes through
unchanged. Agents don't need prompt updates — the wrapper is transparent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:34:21 -07:00
Hongming Wang
3bbcc96bce
Merge pull request #2000 from Molecule-AI/fix/tenant-image-staging-latest-autobump
ci(publish-image): auto-tag :staging-latest so CP picks up new builds
2026-04-24 07:33:12 +00:00
rabbitblood
5ddeca2c0a ci: add linter that fails when required workflow lacks merge_group trigger
Pre-merge guard against the deadlock pattern that hit twice today:
adding a workflow's check to required_status_checks while the workflow
itself doesn't have a `merge_group:` trigger → merge queue stalls
forever in AWAITING_CHECKS because the required check can't fire on
gh-readonly-queue/* refs.

Each time today this happened it cost 30-60min of debug + a hot-fix PR
+ temporary removal of the required check. This workflow runs on every
PR touching .github/workflows/ and on push to staging/main, listing
required checks for staging and verifying each one's owning workflow
declares merge_group.

Self-listens on merge_group so the linter passes its own queue runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:33:05 -07:00
Hongming Wang
24bfced630 ci(publish-image): also tag :staging-latest so CP auto-picks up new builds
Root cause of the 2026-04-24 all-day E2E failure chain: Railway staging
CP had TENANT_IMAGE pinned to :staging-a14cf86 — a static SHA that had
silently drifted 10+ days stale. Every new tenant (including every E2E
run's fresh tenant) was spawned with that stale image, which predated
applyRuntimeModelEnv. Without applyRuntimeModelEnv, HERMES_DEFAULT_MODEL
never reached the workspace EC2 user-data, so install.sh fell back to
nousresearch/hermes-4-70b → openrouter → 401 "Missing Authentication
header" in every A2A reply.

Four correct fixes shipped today all got shadowed by this single stale
pin:
  • template-hermes#19 (provider priority for openai/*)
  • template-hermes#20 (decouple prefix-strip from bridge guard)
  • molecule-controlplane#247 (force fresh /opt/adapter clone)
  • molecule-core#1987 (E2E pins HERMES_CUSTOM_* as workaround)

Fix: publish each main build under both :staging-<sha> AND :staging-latest.
Change Railway staging CP's TENANT_IMAGE env to :staging-latest (done via
`railway variables --set` as part of this incident). Future main builds
then auto-propagate to new tenant provisions without any human in the
loop.

Safety: :staging-latest is the "most recent main build" — NOT a
canary-verified promotion. That distinction is preserved:
  • Prod tenants still pull :latest (canary-verified, retagged by
    canary-verify.yml only after the canary fleet green-lights a digest)
  • Staging tenants now pull :staging-latest (every main build, pre-canary)

So staging becomes the canary: if a :staging-latest build regresses,
the staging canary fleet catches it before it can be promoted to :latest
for prod. This is what the canary design intended; the missing
:staging-latest tag was the hole.

Zero impact on image size / build time: Docker tags point at the same
digest, no duplicate push.

Follow-up: filed an issue tracking the need for CP's TENANT_IMAGE to
NEVER be pinned to a SHA in any environment — it must always float on a
named tag (:staging-latest for staging, :latest for prod).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:29:55 -07:00
Hongming Wang
5f85c7f567
Merge pull request #1997 from Molecule-AI/ci/block-paths-merge-group-trigger
ci: add merge_group trigger to block-internal-paths workflow
2026-04-24 07:21:46 +00:00
Hongming Wang
757337d644
Merge pull request #1613 from Molecule-AI/docs/saas-federation-tutorial
docs(tutorial): SaaS federation — multi-tenant control plane setup
2026-04-24 07:21:39 +00:00
rabbitblood
d9f69a8fd5 ci: add merge_group trigger to block-internal-paths workflow
Re-do of the fix that was originally bundled into PR #1995 but never
landed — the second commit on that branch got rejected by GH006
(branch locked by merge queue) after the first commit was already
queued. Only the file-removal commit made it to staging.

Without this trigger, adding "Block forbidden paths" to
required_status_checks deadlocks the queue: every PR sits in
AWAITING_CHECKS forever waiting on a check that can't fire on
gh-readonly-queue/* refs.

Sequence to land safely:
1. (already done) Removed "Block forbidden paths" from required_status_checks
2. (this PR) Add merge_group trigger
3. (after merge) Re-add "Block forbidden paths" to required_status_checks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:19:38 -07:00
molecule-ai[bot]
b0676756c9
Merge pull request #1950 from Molecule-AI/fix/1947-stale-queue-cleanup
fix(admin/a2a_queue): drop-stale endpoint for post-incident queue cleanup
2026-04-24 07:05:54 +00:00
Hongming Wang
f46844d6b0
Merge pull request #1923 from Molecule-AI/docs/mcp-server-list-og-v2
docs(blog + assets): MCP Server List blog post + OG image (1200×630 dark tech)
2026-04-24 07:05:54 +00:00
molecule-ai[bot]
a92d32f320
Merge pull request #1860 from Molecule-AI/docs/phase34-community-launch
docs(community): Phase 34 launch content — Reddit/HN/Discord posts + FAQ
2026-04-24 07:05:54 +00:00
molecule-ai[bot]
82d15f4d33
Merge pull request #1859 from Molecule-AI/content-marketer/phase34-launch-post-v2
docs(marketing): Phase 34 launch post v2 — governance-first + tool trace
2026-04-24 07:05:54 +00:00
Hongming Wang
a5a054e861
Merge pull request #1995 from Molecule-AI/fix/remove-leaked-marketing-devrel
chore: remove leaked marketing/devrel files (Block-paths CI red on staging)
2026-04-24 07:03:58 +00:00
rabbitblood
7b98526611 chore: remove leaked marketing/devrel/ files (block-forbidden-paths leak)
PR #1889 ("docs(blog): A2A Protocol deep-dive") landed two files under
the forbidden marketing/devrel/ path:

- marketing/devrel/phase34-platform-instructions-social-copy.md
- marketing/devrel/phase34-tool-trace-social-copy.md

The Block-forbidden-paths workflow correctly flagged both at PR-time
(run 24875689649 — failure at 06:28:20Z) but it was NOT in the required
status checks list on staging, so the PR merged anyway at 06:32:47Z.
The push-event run on staging then failed visibly (run 24875838257),
which is what surfaced this.

Two-part fix:

1. (this PR) Remove the leaked files. Authors can re-file the same
   content in Molecule-AI/internal under marketing/ if it's still needed.

2. (already done outside this PR) "Block forbidden paths" added to
   required_status_checks on staging branch protection so the next leak
   attempt gets blocked at PR-merge time, not after the fact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:01:28 -07:00
Hongming Wang
23e329aa4c
Merge pull request #1927 from Molecule-AI/feat/ci/e2e-canvas-staging-trigger
feat(ci): run E2E Staging Canvas on staging branch pushes
2026-04-24 07:01:19 +00:00
Hongming Wang
0166aaad93
Merge pull request #1988 from Molecule-AI/docs/a2a-v1-production-reference-blog
docs(blog): A2A v1.0 production reference — migration guide from 0.3.x
2026-04-24 06:57:15 +00:00
Hongming Wang
0ef5dad1b1
Merge pull request #1993 from Molecule-AI/fix/auth-redirect-loop-regression-tests
test(auth): add regression tests for redirect loop guards
2026-04-24 06:57:12 +00:00
Hongming Wang
2821b979f2
Merge pull request #1994 from Molecule-AI/fix/canvas-multilevel-layout-ux
fix(canvas): subtree-aware layout + org-import reliability + UX polish
2026-04-24 06:57:10 +00:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server paces + provision concurrency = 3 guarantee tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now fitView's with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
Hongming Wang
1732d30f6b
Merge pull request #1889 from Molecule-AI/content/a2a-v1-deep-dive
docs(blog): A2A Protocol deep-dive — peer-to-peer, JSON-RPC, SSE, Redis key model
2026-04-23 23:32:46 -07:00
e9be12210f test(auth): add regression tests for redirect loop guards
AuthGate now skips session fetch for /cp/auth/* paths, and
redirectToLogin guards against re-setting window.location when
already on an auth path. Both guards had no test coverage —
a future refactor could silently reintroduce the redirect loop.

Added:
- AuthGate.test.tsx: 2 cases covering /cp/auth/login and
  /cp/auth/signup path skipping (no fetchSession call, no
  redirectToLogin call, children rendered)
- auth.test.ts: 2 cases covering redirectToLogin early return
  for /cp/auth/login and /cp/auth/signup paths

Fixes: Molecule-AI/molecule-core#1541

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:30:35 +00:00
molecule-ai[bot]
63c9d07a01
Merge branch 'staging' into content/a2a-v1-deep-dive 2026-04-24 06:28:16 +00:00
molecule-ai[bot]
d359b1803a
Merge branch 'staging' into docs/a2a-v1-production-reference-blog 2026-04-24 06:28:12 +00:00
molecule-ai[bot]
e4e389950f
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth (#1992)
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth

Three fixes cherry-picked from issue #1744:

1. aria-hidden on decorative SVG icons:
   - DeleteCascadeConfirmDialog.tsx: warning triangle SVG gets aria-hidden="true"
   - MissingKeysModal.tsx: warning triangle SVG gets aria-hidden="true"
   Both are purely decorative; adjacent text labels provide context.

2. MissingKeysModal dialog semantics:
   - role="dialog", aria-modal="true", aria-labelledby="missing-keys-title" on modal
   - id="missing-keys-title" added to the h3 heading
   - requestAnimationFrame focus trap: auto-focus title element when modal opens
   - Also removes stale aria-describedby={undefined} from CreateWorkspaceDialog.tsx

3. Session cookie auth for /registry/:id/peers:
   - Promotes VerifiedCPSession() fallback before the bearer token branch
   - Fixes SaaS canvas Peers tab 401 — canvas hits this endpoint via session cookie
   - Correctly returns "invalid session" for bad cookies instead of falling through
   - Self-hosted bypass logic preserved

Test fix (bundled, same branch):
   - ContextMenu keyboard test: add getState() stub to useCanvasStore mock
   - Required after ContextMenu.tsx gained a direct getState() call at line 169

Reviewed-by: Core-Security (security audit: APPROVED)
CI: Canvas CI , Platform CI , E2E API , CodeQL 

GitHub issue: #1740 (test), #1744 (a11y)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:20:32 +00:00
Hongming Wang
a2f471feed
Merge pull request #1987 from Molecule-AI/fix/e2e-pin-hermes-custom-provider
fix(e2e): pin HERMES_* env so openai/* routes deterministically
2026-04-23 22:44:25 -07:00
Hongming Wang
884fff1145 fix(e2e): pin HERMES_* env vars so openai/* routes deterministically
Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs
2026-04-24 03h–04h): the A2A returns 200 with a JSON-RPC result whose
text is OpenRouter's error format —
  {'message': 'Missing Authentication header', 'code': 401}
(integer code, not OpenAI's string 'invalid_api_key'). template-hermes's
derive-provider.sh was picking PROVIDER=openrouter for openai/* models
despite template-hermes#19 (the fix that flips openai/* → custom when
OPENAI_API_KEY is set) having been merged 01:30Z.

Verified via probe workspaces on the staging canary tenant:
  probe 1 (just OPENAI_API_KEY): → OpenRouter's 401 shape
  probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
           → OpenAI's 401 shape ('code': 'invalid_api_key')

So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or the CP's user-data snapshot
bundles a pre-fix template-hermes. That's a separate follow-up (needs
forced re-clone of /opt/adapter on every workspace boot).

This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:41:22 -07:00
molecule-ai[bot]
078ab61458
docs(blog): A2A v1.0 production reference — migration from 0.3.x, 6 files, 8 smoke scenarios 2026-04-24 05:33:37 +00:00
Hongming Wang
faba17a84c
Merge pull request #1917 from Molecule-AI/fix/blog-ai-agents-org-scoped-keys-missing-endpoint
fix(blog): remove fake /org/tokens/:id/logs endpoint reference (molecule-core#1914)
2026-04-23 22:12:10 -07:00
1da9759d0d Merge remote-tracking branch 'origin/staging' into fix/blog-ai-agents-org-scoped-keys-missing-endpoint 2026-04-24 05:09:39 +00:00
Hongming Wang
f4b301b4da
Merge pull request #1982 from Molecule-AI/feat/merge-queue-trigger
ci: add merge_group trigger to ci + codeql
2026-04-23 21:51:50 -07:00
rabbitblood
0cc8733f09 Merge remote-tracking branch 'origin/staging' into feat/merge-queue-trigger 2026-04-23 21:48:59 -07:00
molecule-ai[bot]
35bcad9204
feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974)
* feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009)

Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the
official migration guide from a2aproject/a2a-python.

Breaking changes applied:
- A2AStarletteApplication → Starlette route factory
  (create_agent_card_routes + create_jsonrpc_routes)
- AgentCard.url removed; url+protocol now in supported_protocols[].url
- AgentCapabilities fields renamed to snake_case
  (pushNotifications→push_notifications,
   stateTransitionHistory→state_transition_history)
- AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes
- TaskState.canceled → TaskState.TASK_STATE_CANCELED
- a2a.utils → a2a.helpers
- Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed)

Files changed:
- requirements.txt: pinned >=1.0.0,<2.0
- main.py: Starlette route factory + AgentCard restructure
- a2a_executor.py: Part() + TaskState + helpers import
- hermes_executor.py: TaskState + helpers import
- google-adk/adapter.py: TaskState + helpers import
- cli_executor.py: helpers import
- claude_sdk_executor.py: helpers import
- tests/conftest.py: a2a.helpers mock stub
- tests/test_a2a_executor.py: TaskState enum key
- adapters/google-adk/test_adapter.py: Part + helpers stub

Refs: KI-009
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED)

---------

Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-24 04:43:17 +00:00
97d15ddf35 fix(handlers/admin_queue_test): wire sqlmock to make DropStale tests pass
DropStale calls DropStaleQueueItems which reads db.DB directly. Without
setupTestDB() the global mock was nil → every query returned 500.
Adds mock expectations for the 3 happy-path sub-tests; validation-only
sub-tests (bad input) need no DB and are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:40:19 +00:00
rabbitblood
01de3ef6d2 Merge remote-tracking branch 'origin/staging' into feat/merge-queue-trigger 2026-04-23 21:34:16 -07:00
molecule-ai[bot]
01fcc9a4b6
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog, session cookie auth
* fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth

Three fixes cherry-picked from issue #1744:

1. aria-hidden on decorative SVG icons:
   - DeleteCascadeConfirmDialog.tsx: warning triangle SVG gets aria-hidden="true"
   - MissingKeysModal.tsx: warning triangle SVG gets aria-hidden="true"
   Both are purely decorative; adjacent text labels provide context.

2. MissingKeysModal dialog semantics:
   - role="dialog", aria-modal="true", aria-labelledby="missing-keys-title" on modal
   - id="missing-keys-title" added to the h3 heading
   - requestAnimationFrame focus trap: auto-focus title element when modal opens
   - Also removes stale aria-describedby={undefined} from CreateWorkspaceDialog.tsx

3. Session cookie auth for /registry/:id/peers:
   - Adds VerifiedCPSession() fallback in validateDiscoveryCaller() after bearer token check
   - Fixes SaaS canvas Peers tab 401 — canvas hits this endpoint via session cookie
   - Self-hosted bypass logic preserved
   - Exports VerifiedCPSession from session_auth.go for cross-package use

Test fix (bundled, same branch):
   - ContextMenu keyboard test: add getState() stub to useCanvasStore mock
   - Required after ContextMenu.tsx gained a direct getState() call at line 169

GitHub issue: #1740 (test), #1744 (a11y)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): remove duplicate VerifiedCPSession declaration

The branch accidentally added a second func VerifiedCPSession declaration
that shadows the real implementation, causing go build to fail with:
  internal/middleware/session_auth.go:238:6: VerifiedCPSession redeclared in this block

Remove the stub alias so the original full implementation is used directly.
The function already exports correctly for cross-package use via the
VerifiedCPSession() call in discovery.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): correct VerifiedCPSession condition in discovery.go

Fix Go build error — 'presented' was declared and not used.
The cookie fallback check was using `if ok, presented := ...; ok` instead
of `if ok, presented := ...; presented`, causing the build to fail in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): fix declared and not used 'presented' in discovery.go

Fixes Go build failure:
  discovery.go:355:10: declared and not used: presented
  discovery.go:358:6: undefined: presented

Variable shadowing in the second VerifiedCPSession call reused the outer
scope's `ok` and `presented` names, causing a compile error. Renamed to
ok2/presented2 to avoid shadowing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:30:26 +00:00
52504dd4a8 fix(handlers/admin_queue_test): remove unused bytes import
CI failure: admin_queue_test.go imports "bytes" but never uses it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:29:50 +00:00
rabbitblood
5f3508fef0 ci: add merge_group trigger to ci + codeql
Pre-work for enabling GitHub merge queue on the staging branch (#TBD
follow-up issue). Without these triggers, the queue's pre-merge CI run
on the speculative `gh-readonly-queue/...` ref would never fire, every
queued PR would show false-green for the required checks, and queue
would merge things that don't actually pass on the rebased commit.

Adding the trigger now is **a no-op** — the `merge_group` event only
fires once the queue is enabled on a branch, which is a separate UI/API
toggle. So this PR is safe to land in isolation; merge-queue enablement
is the next step and reversible at the branch-protection level.

Why these two workflows:
- `ci.yml` provides 5 of the 8 required staging checks (Detect changes,
  Platform Go, Canvas Next.js, Python Lint & Test, Shellcheck E2E)
- `codeql.yml` provides the other 3 (Analyze go / js-ts / python)

Other workflows (e2e-staging-*, canary-*, publish-*) are not required
status checks and don't need the trigger to keep the queue working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:24:53 -07:00
Hongming Wang
0576e341b9
ops(#1976): add smart-sweep script for orphan Cloudflare DNS records (#1978)
Replaces the "panic-button at >65 records" manual sweep that nukes
every pattern-match unconditionally (would delete live workspaces
along with orphans).

This version:
- Queries CP prod + staging /admin/orgs for live tenant slugs
- Queries AWS EC2 describe-instances for live workspace Name tags
- Only deletes CF records whose slug/ws-id has no live counterpart
- Dry-run by default (--execute to actually delete)
- Safety gate refuses to delete >50% of records (configurable via
  MAX_DELETE_PCT env var) — catches the "API returned zero orgs, every
  tenant looks orphan" failure mode before it nukes production
- Per-category accounting: orphan-ws / orphan-e2e-tenant / etc.

Usage:
  CF_API_TOKEN=... CF_ZONE_ID=... \
    CP_PROD_ADMIN_TOKEN=... CP_STAGING_ADMIN_TOKEN=... \
    bash scripts/ops/sweep-cf-orphans.sh           # dry-run
  bash scripts/ops/sweep-cf-orphans.sh --execute   # actually delete

Ref: #1976 (root-cause: tenant.Delete + workspace.Delete don't clean
their CF records — until that's fixed, this script is the maintenance
path)

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-24 04:19:49 +00:00
Hongming Wang
6745a61ebf
Merge pull request #1970 from Molecule-AI/fix/restore-quickstart-plus-hotfixes
fix(canvas): playability pass + UX polish (post #1897)
2026-04-23 21:08:52 -07:00
Hongming Wang
d53583f9c6 Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes 2026-04-23 21:04:55 -07:00
Hongming Wang
2d6ff11c4e fix(canvas): re-sort parents-before-children after nest mutation
React Flow requires parent nodes to appear before their children in
the nodes array. When they don't, it logs "Parent node {id} not
found. Please make sure that parent nodes are in front of their
child nodes in the nodes array" and — more importantly — renders
the child at canvas-absolute coords instead of parent-relative,
flashing it far outside the parent.

topology's buildNodesAndEdges already enforced this at hydrate, but
nestNode + batchNest weren't re-sorting after mutating parentId.
A freshly-nested child often ended up after-first-drag at the
wrong screen position because its new parent sat later in the
array than itself.

Extract sortParentsBeforeChildren() into canvas-topology as a
reusable DFS visit; call it at the tail of both nestNode's set()
and batchNest's commit set(). 923 tests still green — no behaviour
change beyond eliminating the warning and the position flash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:00:40 -07:00
Hongming Wang
2a8977c946 fix(canvas): cancel-nest also shrinks the parent back
Canceling the nest/extract dialog restored the child's position but
left the parent card at its auto-grown size. growParentsToFitChildren
fires on drag-stop to fit a then-outside child; when the drag is
subsequently cancelled, the parent keeps that grown width/height
forever because the grow pass is grow-only.

Strip width/height from the ex-parent alongside the child position
restore in cancelNest — React Flow re-measures from CSS, parent
collapses back to its natural size. Same trick nestNode already
uses for the un-nest path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:56:08 -07:00
Hongming Wang
09053dfdeb fix(canvas): cancel-nest restores position; un-nest shrinks parent
Two follow-up polish items for drag-and-nest:

1. Cancelling the "Extract from team?" dialog now snaps the
   dragged card back to where the drag started. Before, a user
   who dragged a child out, saw the confirm dialog, then clicked
   Cancel ended up with the card stranded outside the parent at
   its drop-point position — which also got persisted via
   savePosition on drag-stop. Now onNodeDragStart captures the
   pre-drag position + parent, and cancelNest restores both the
   RF node position and fires savePosition with the absolute
   pre-drag coords so reload matches.

2. Un-nesting now clears the ex-parent's explicit width/height
   in the nodes array. growParentsToFitChildren is grow-only so
   it could never shrink the parent back down after a child
   left; the card stayed at its auto-grown size with empty
   space. Stripping width/height lets React Flow re-measure from
   the card's own min-width / min-height CSS, so the parent
   visually shrinks to fit whatever children remain.

923 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:52:28 -07:00
Hongming Wang
512fdfd59d fix(canvas): plain drag out of parent un-nests again
Un-nest used to require holding Alt (or Cmd to force-detach). That
was too conservative — when a user dragged a child clearly outside
its parent's bbox, nothing happened on release, because the default
branch soft-clamped back and only the Alt branch actually opened
the "Extract?" confirm. Matches the exact bug the user just flagged
("I can put agents in other agent, but when I drag it out, it does
not move out").

New rules:
 * Past the 20 % hysteresis → confirm un-nest. Plain drag, no
   modifier. This is what most users expect (Miro / Figma behave
   the same way — drag outside the frame and the shape leaves it).
 * Inside or within 20 % of the edge → soft-clamp back inside.
   Guards against twitchy releases that momentarily overshoot the
   edge by a few pixels.
 * Cmd / Ctrl → force un-nest regardless of overlap. Escape-hatch
   for when the user dragged within the hysteresis zone but really
   wants out.
 * Dropping onto a different parent → nest there (unchanged).

Alt is no longer a required modifier for un-nesting. Keeps it as
a non-gesture modifier only; no meaning unless we re-bind it later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:48:38 -07:00