fix(canvas): stop infinite re-render on ContextMenu mount (#1544)

fix(canvas): stop infinite re-render on ContextMenu mount
2026-04-21 21:50:41 -07:00 · 2026-04-21 21:50:41 -07:00 · fc27477df9
commit fc27477df9
parent 3820a0cc5b e88ab70251
2 changed files with 97 additions and 17 deletions
--- a/canvas/src/components/ContextMenu.tsx
+++ b/canvas/src/components/ContextMenu.tsx
@ -1,6 +1,6 @@
 "use client";

-import { useCallback, useEffect, useRef, useState } from "react";
+import { useCallback, useEffect, useMemo, useRef, useState } from "react";
 import { useCanvasStore, type WorkspaceNodeData } from "@/store/canvas";
 import { api } from "@/lib/api";
 import { showToast } from "./Toaster";
@ -23,8 +23,15 @@ export function ContextMenu() {
  const setPanelTab = useCanvasStore((s) => s.setPanelTab);
  const nestNode = useCanvasStore((s) => s.nestNode);
  const contextNodeId = contextMenu?.nodeId ?? null;
-  const children = useCanvasStore((s) =>
-    contextNodeId ? s.nodes.filter((n) => n.data.parentId === contextNodeId) : []
+  // Select the full nodes array (stable reference across unrelated store
+  // updates) and derive children via useMemo. Filtering inside the
+  // selector returned a new array every call, which Zustand's
+  // useSyncExternalStore saw as "snapshot changed" → schedule
+  // re-render → loop → React error #185. See canvas-store-snapshots.
+  const nodes = useCanvasStore((s) => s.nodes);
+  const children = useMemo(
+    () => (contextNodeId ? nodes.filter((n) => n.data.parentId === contextNodeId) : []),
+    [nodes, contextNodeId],
  );
  const hasChildren = children.length > 0;
  const setPendingDelete = useCanvasStore((s) => s.setPendingDelete);
--- a/docs/infra/workspace-terminal.md
+++ b/docs/infra/workspace-terminal.md
@ -1,6 +1,8 @@
 # Workspace Terminal over EIC + SSH

-Tracking: [molecule-core#1528](https://github.com/Molecule-AI/molecule-core/issues/1528)
+Tracking: [molecule-core#1528](https://github.com/Molecule-AI/molecule-core/issues/1528) (resolved 2026-04-22)
+
+**Status: live in prod** on hongmingwang tenant as of 2026-04-22. Verified end-to-end against the Hermes workspace EC2.

 ## Problem

@ -142,25 +144,96 @@ Three more failure modes + ongoing bookkeeping per tenant. Skip unless you have
 | SSH connect timeout | "tenant cannot reach workspace instance — check security group" | Yes (SG fix) |
 | `docker exec` fails (no container) | "workspace container is not running — try restart" | Yes (normal ops) |

-## Rollout checklist
+## Rollout (verified recipe)

-### 1. Infra prep (one-time)
+Each AWS account (staging + prod, etc.) needs this once. The CP repo
+ships `scripts/bootstrap-eic-terminal.sh` that automates everything
+below — what's here is what the script does, in case you want to run
+the steps by hand or audit it.

- [ ] Add IAM policy above to `molecule-cp` user (tag key is `Role`, already set by CP at launch — no CP change needed)
- [ ] Create one EIC Endpoint in the workspace VPC (see command above)
- [ ] No change to `workspaceIngressRules()` — EIC Endpoint bypasses SG ingress
+### 1. Infra (one-shot)

-### 2. Tenant code (this repo)
+```bash
+# From molecule-controlplane checkout (needs IAM admin creds):
+./scripts/bootstrap-eic-terminal.sh <workspace-vpc-id> <region>
+```

- [ ] PR 1 (this one): migration `038_workspace_instance_id` + persist instance_id on CP provision
- [ ] PR 2 (follow-up): terminal handler EIC + SSH branch + tests
+Creates (idempotent):
+- EC2 Instance Connect **service-linked role** (`AWSServiceRoleForEC2InstanceConnect`)
+- **Managed IAM policy** `MoleculeEICTerminal` (DescribeInstances + SendSSHPublicKey + OpenTunnel + CreateInstanceConnectEndpoint + DescribeInstanceConnectEndpoints)
+- **IAM role + instance profile** `MoleculeTenantEICRole` / `MoleculeTenantEICProfile` (attach the managed policy) — this replaces env-var AWS creds on tenant EC2s
+- **EIC Endpoint** in the workspace VPC (uses the default VPC SG for egress, which is all EIC Endpoint needs)

-### 3. Verification
+Script prints the endpoint SG id + profile name to set on the CP:

- [ ] After PR 1 merges + deploys, provision a new CP workspace → verify `SELECT instance_id FROM workspaces` returns the EC2 id
- [ ] After PR 2 merges + deploys, open Terminal tab on a CP workspace → bash prompt appears
- [ ] Intentionally terminate the EC2 → Terminal tab shows the "instance no longer exists" message
- [ ] Pull the `ec2-instance-connect:OpenTunnel` action from molecule-cp temporarily → Terminal shows "tenant lacks EIC permission"
+```
+EIC_ENDPOINT_SG_ID=sg-xxxxxx
+EC2_TENANT_IAM_PROFILE=MoleculeTenantEICProfile
+```
+
+### 2. CP config + redeploy
+
+Set those two env vars on the CP service (Railway dashboard or equivalent). On redeploy, [molecule-controlplane#227](https://github.com/Molecule-AI/molecule-controlplane/pull/227) ensures every **newly-provisioned** workspace + tenant SG auto-carries a `22/tcp` ingress rule sourced from the EIC Endpoint SG.
+
+### 3. Tenant env vars (every tenant EC2)
+
+The tenant workspace-server container needs these env vars to verify session cookies and reach the CP. Missing any of these produces a working-looking tenant whose canvas cold-loads with `401 admin auth required` on every call — which is what broke the hongmingwang tenant on 2026-04-22 before these were set.
+
+| Env var | Value | What breaks if missing |
+|---|---|---|
+| `CP_UPSTREAM_URL` | `https://api.moleculesai.app` (or your CP) | `/cp/*` paths fall through to Next.js 404 → canvas `AuthGate` infinite-redirects on login, hits browser's 431 header-limit |
+| `MOLECULE_ORG_SLUG` | tenant slug, e.g. `hongmingwang` | `verifiedCPSession` returns false — session cookie never validates, every API call 401s with "admin auth required" |
+| `MOLECULE_ORG_ID` | UUID of the tenant org | `tenant_guard` middleware 404s all non-`/cp/*` routes |
+| `AWS_REGION` | e.g. `us-east-2` | `aws ec2-instance-connect` subprocesses default to `us-east-1` and can't find instances |
+
+Tenants launched by CP should have `MOLECULE_ORG_ID` + `MOLECULE_ORG_SLUG` injected from the `organizations` row at provision time. If you find a tenant where these are missing, that's a CP provisioner bug, not operator error.
+
+AWS creds are NOT on this list because the instance profile (`MoleculeTenantEICProfile` from step 1) provides them via IMDSv2 — aws-cli inside the tenant container picks them up automatically. If you still see `AWS_ACCESS_KEY_ID` env vars on a tenant, strip them and rely on the profile.
+
+### 4. Backfill existing instances
+
+Pre-existing SGs need one-time ingress added. The bootstrap script's final output includes this loop with the real SG id substituted; shown here for visibility — **replace `<EIC_ENDPOINT_SG_ID>` with the `sg-…` value step 1 printed**:
+
+```bash
+EIC_SG=<EIC_ENDPOINT_SG_ID>  # from step 1 output
+
+for sg in $(aws ec2 describe-security-groups --region us-east-2 \
+    --filters 'Name=tag:ManagedBy,Values=molecule-cp' \
+    --query 'SecurityGroups[].GroupId' --output text | tr '\t' '\n'); do
+  aws ec2 authorize-security-group-ingress --region us-east-2 \
+    --group-id "$sg" --protocol tcp --port 22 --source-group "$EIC_SG" \
+    2>&1 | grep -v DuplicatePermission || true
+done
+```
+
+Note the `| tr '\t' '\n'` — aws-cli `--output text` tab-separates values within a row, which can concatenate all SG ids into a single word that breaks the for loop. Splitting to newlines is a no-op on well-behaved output and a fix on the concatenated case.
+
+### 5. Tenant code (this monorepo)
+
+Already merged:
+- [#1531](https://github.com/Molecule-AI/molecule-core/pull/1531) — migration `038_workspace_instance_id` + persist on CP provision
+- [#1533](https://github.com/Molecule-AI/molecule-core/pull/1533) — terminal handler remote branch (EIC open-tunnel + ssh + pty)
+
+Tenant image (`ghcr.io/molecule-ai/platform-tenant:latest`) ships with `aws-cli` + `openssh-client` as of 2026-04-22.
+
+### 6. Verification (how to confirm after deploy)
+
+- Provision a fresh CP workspace → `SELECT instance_id FROM workspaces WHERE id = ?` is non-null
+- Open canvas Terminal on that workspace → bash prompt (`ubuntu@ip-...`)
+- Terminate the workspace EC2 manually → Terminal shows "EIC tunnel didn't come up"
+- Temporarily remove `ec2-instance-connect:OpenTunnel` from `MoleculeEICTerminal` → Terminal shows "failed to push session key"
+
+### Existing-workspace backfill of `instance_id`
+
+Migrations run on tenant boot, but pre-existing workspace rows have NULL `instance_id`. The CP provisioner only writes `instance_id` on NEW provisions; old workspaces need:
+
+```sql
+-- Inside the tenant DB
+UPDATE workspaces SET instance_id = '<i-xxx from DescribeInstances by tag WorkspaceID>', updated_at = now()
+WHERE id = '<workspace-uuid>';
+```
+
+For a whole fleet, join CP's workspace table with the DescribeInstances result by `WorkspaceID` tag and batch-UPDATE.

 ## Future work (not in scope)