fix(canvas): stop infinite re-render on ContextMenu mount (#1544)

fix(canvas): stop infinite re-render on ContextMenu mount
This commit is contained in:
Hongming Wang 2026-04-21 21:50:41 -07:00 committed by GitHub
commit fc27477df9
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 97 additions and 17 deletions

View File

@ -1,6 +1,6 @@
"use client";
import { useCallback, useEffect, useRef, useState } from "react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import { useCanvasStore, type WorkspaceNodeData } from "@/store/canvas";
import { api } from "@/lib/api";
import { showToast } from "./Toaster";
@ -23,8 +23,15 @@ export function ContextMenu() {
const setPanelTab = useCanvasStore((s) => s.setPanelTab);
const nestNode = useCanvasStore((s) => s.nestNode);
const contextNodeId = contextMenu?.nodeId ?? null;
const children = useCanvasStore((s) =>
contextNodeId ? s.nodes.filter((n) => n.data.parentId === contextNodeId) : []
// Select the full nodes array (stable reference across unrelated store
// updates) and derive children via useMemo. Filtering inside the
// selector returned a new array every call, which Zustand's
// useSyncExternalStore saw as "snapshot changed" → schedule
// re-render → loop → React error #185. See canvas-store-snapshots.
const nodes = useCanvasStore((s) => s.nodes);
const children = useMemo(
() => (contextNodeId ? nodes.filter((n) => n.data.parentId === contextNodeId) : []),
[nodes, contextNodeId],
);
const hasChildren = children.length > 0;
const setPendingDelete = useCanvasStore((s) => s.setPendingDelete);

View File

@ -1,6 +1,8 @@
# Workspace Terminal over EIC + SSH
Tracking: [molecule-core#1528](https://github.com/Molecule-AI/molecule-core/issues/1528)
Tracking: [molecule-core#1528](https://github.com/Molecule-AI/molecule-core/issues/1528) (resolved 2026-04-22)
**Status: live in prod** on hongmingwang tenant as of 2026-04-22. Verified end-to-end against the Hermes workspace EC2.
## Problem
@ -142,25 +144,96 @@ Three more failure modes + ongoing bookkeeping per tenant. Skip unless you have
| SSH connect timeout | "tenant cannot reach workspace instance — check security group" | Yes (SG fix) |
| `docker exec` fails (no container) | "workspace container is not running — try restart" | Yes (normal ops) |
## Rollout checklist
## Rollout (verified recipe)
### 1. Infra prep (one-time)
Each AWS account (staging + prod, etc.) needs this once. The CP repo
ships `scripts/bootstrap-eic-terminal.sh` that automates everything
below — what's here is what the script does, in case you want to run
the steps by hand or audit it.
- [ ] Add IAM policy above to `molecule-cp` user (tag key is `Role`, already set by CP at launch — no CP change needed)
- [ ] Create one EIC Endpoint in the workspace VPC (see command above)
- [ ] No change to `workspaceIngressRules()` — EIC Endpoint bypasses SG ingress
### 1. Infra (one-shot)
### 2. Tenant code (this repo)
```bash
# From molecule-controlplane checkout (needs IAM admin creds):
./scripts/bootstrap-eic-terminal.sh <workspace-vpc-id> <region>
```
- [ ] PR 1 (this one): migration `038_workspace_instance_id` + persist instance_id on CP provision
- [ ] PR 2 (follow-up): terminal handler EIC + SSH branch + tests
Creates (idempotent):
- EC2 Instance Connect **service-linked role** (`AWSServiceRoleForEC2InstanceConnect`)
- **Managed IAM policy** `MoleculeEICTerminal` (DescribeInstances + SendSSHPublicKey + OpenTunnel + CreateInstanceConnectEndpoint + DescribeInstanceConnectEndpoints)
- **IAM role + instance profile** `MoleculeTenantEICRole` / `MoleculeTenantEICProfile` (attach the managed policy) — this replaces env-var AWS creds on tenant EC2s
- **EIC Endpoint** in the workspace VPC (uses the default VPC SG for egress, which is all EIC Endpoint needs)
### 3. Verification
Script prints the endpoint SG id + profile name to set on the CP:
- [ ] After PR 1 merges + deploys, provision a new CP workspace → verify `SELECT instance_id FROM workspaces` returns the EC2 id
- [ ] After PR 2 merges + deploys, open Terminal tab on a CP workspace → bash prompt appears
- [ ] Intentionally terminate the EC2 → Terminal tab shows the "instance no longer exists" message
- [ ] Pull the `ec2-instance-connect:OpenTunnel` action from molecule-cp temporarily → Terminal shows "tenant lacks EIC permission"
```
EIC_ENDPOINT_SG_ID=sg-xxxxxx
EC2_TENANT_IAM_PROFILE=MoleculeTenantEICProfile
```
### 2. CP config + redeploy
Set those two env vars on the CP service (Railway dashboard or equivalent). On redeploy, [molecule-controlplane#227](https://github.com/Molecule-AI/molecule-controlplane/pull/227) ensures every **newly-provisioned** workspace + tenant SG auto-carries a `22/tcp` ingress rule sourced from the EIC Endpoint SG.
### 3. Tenant env vars (every tenant EC2)
The tenant workspace-server container needs these env vars to verify session cookies and reach the CP. Missing any of these produces a working-looking tenant whose canvas cold-loads with `401 admin auth required` on every call — which is what broke the hongmingwang tenant on 2026-04-22 before these were set.
| Env var | Value | What breaks if missing |
|---|---|---|
| `CP_UPSTREAM_URL` | `https://api.moleculesai.app` (or your CP) | `/cp/*` paths fall through to Next.js 404 → canvas `AuthGate` infinite-redirects on login, hits browser's 431 header-limit |
| `MOLECULE_ORG_SLUG` | tenant slug, e.g. `hongmingwang` | `verifiedCPSession` returns false — session cookie never validates, every API call 401s with "admin auth required" |
| `MOLECULE_ORG_ID` | UUID of the tenant org | `tenant_guard` middleware 404s all non-`/cp/*` routes |
| `AWS_REGION` | e.g. `us-east-2` | `aws ec2-instance-connect` subprocesses default to `us-east-1` and can't find instances |
Tenants launched by CP should have `MOLECULE_ORG_ID` + `MOLECULE_ORG_SLUG` injected from the `organizations` row at provision time. If you find a tenant where these are missing, that's a CP provisioner bug, not operator error.
AWS creds are NOT on this list because the instance profile (`MoleculeTenantEICProfile` from step 1) provides them via IMDSv2 — aws-cli inside the tenant container picks them up automatically. If you still see `AWS_ACCESS_KEY_ID` env vars on a tenant, strip them and rely on the profile.
### 4. Backfill existing instances
Pre-existing SGs need one-time ingress added. The bootstrap script's final output includes this loop with the real SG id substituted; shown here for visibility — **replace `<EIC_ENDPOINT_SG_ID>` with the `sg-…` value step 1 printed**:
```bash
EIC_SG=<EIC_ENDPOINT_SG_ID> # from step 1 output
for sg in $(aws ec2 describe-security-groups --region us-east-2 \
--filters 'Name=tag:ManagedBy,Values=molecule-cp' \
--query 'SecurityGroups[].GroupId' --output text | tr '\t' '\n'); do
aws ec2 authorize-security-group-ingress --region us-east-2 \
--group-id "$sg" --protocol tcp --port 22 --source-group "$EIC_SG" \
2>&1 | grep -v DuplicatePermission || true
done
```
Note the `| tr '\t' '\n'` — aws-cli `--output text` tab-separates values within a row, which can concatenate all SG ids into a single word that breaks the for loop. Splitting to newlines is a no-op on well-behaved output and a fix on the concatenated case.
### 5. Tenant code (this monorepo)
Already merged:
- [#1531](https://github.com/Molecule-AI/molecule-core/pull/1531) — migration `038_workspace_instance_id` + persist on CP provision
- [#1533](https://github.com/Molecule-AI/molecule-core/pull/1533) — terminal handler remote branch (EIC open-tunnel + ssh + pty)
Tenant image (`ghcr.io/molecule-ai/platform-tenant:latest`) ships with `aws-cli` + `openssh-client` as of 2026-04-22.
### 6. Verification (how to confirm after deploy)
- Provision a fresh CP workspace → `SELECT instance_id FROM workspaces WHERE id = ?` is non-null
- Open canvas Terminal on that workspace → bash prompt (`ubuntu@ip-...`)
- Terminate the workspace EC2 manually → Terminal shows "EIC tunnel didn't come up"
- Temporarily remove `ec2-instance-connect:OpenTunnel` from `MoleculeEICTerminal` → Terminal shows "failed to push session key"
### Existing-workspace backfill of `instance_id`
Migrations run on tenant boot, but pre-existing workspace rows have NULL `instance_id`. The CP provisioner only writes `instance_id` on NEW provisions; old workspaces need:
```sql
-- Inside the tenant DB
UPDATE workspaces SET instance_id = '<i-xxx from DescribeInstances by tag WorkspaceID>', updated_at = now()
WHERE id = '<workspace-uuid>';
```
For a whole fleet, join CP's workspace table with the DescribeInstances result by `WorkspaceID` tag and batch-UPDATE.
## Future work (not in scope)