## Symptom
Canvas detail-panel "config + filesystem load" took ~20s. Reported on the
production hongming tenant, workspace c7c28c0b-... (Claude Code Agent T2).
## Two stacked latency sources
### 1. Server-side: per-call EIC tunnel setup (~80% of the win)
`workspace-server/internal/handlers/template_files_eic.go::realWithEICTunnel`
performed ssh-keygen + SendSSHPublicKey + open-tunnel + waitForPort PER call.
4 callers (read/write/list/delete) each paid the full ~3-5s setup cost even
when fired back-to-back on the same workspace EC2.
Fix: refcounted pool keyed on instanceID with TTL ≤ 50s (under the 60s
SendSSHPublicKey grant). One tunnel serves N file ops; concurrent acquires
for the same instance share the slot via a pendingSetups gate; LRU eviction
caps simultaneous tracked instances at 32. The pool poisons an entry on
tunnel-fatal errors (connection refused, broken pipe, auth failed) so the
next acquire builds a fresh tunnel. A deferred release cleans up on panic
(added after self-review caught a refcount-leak hazard).
Public API unchanged — `var withEICTunnel` rebinds to `pooledWithEICTunnel`
at package init, so all 4 callers inherit pooling for free.
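A rough illustration of the rebinding pattern, for readers unfamiliar with it. The function signature, the `setups` counter, the `cachedPorts` map, and the `readFile` caller are assumptions made up for this sketch; only the three identifier names `withEICTunnel`, `realWithEICTunnel`, and `pooledWithEICTunnel` come from the change itself.

```go
package main

import "fmt"

// tunnelRunner is an assumed signature: run fn with the local port of a
// tunnel to instanceID. The real signature may differ.
type tunnelRunner func(instanceID string, fn func(localPort int) error) error

var setups int // counts expensive tunnel setups, for illustration only

// realWithEICTunnel stands in for the per-call path: every call pays setup.
func realWithEICTunnel(instanceID string, fn func(localPort int) error) error {
	setups++
	return fn(2200)
}

var cachedPorts = map[string]int{}

// pooledWithEICTunnel reuses one tunnel per instance. TTL, locking, and
// poisoning are omitted to keep the rebinding itself in focus.
func pooledWithEICTunnel(instanceID string, fn func(localPort int) error) error {
	port, ok := cachedPorts[instanceID]
	if !ok {
		setups++
		port = 2200
		cachedPorts[instanceID] = port
	}
	return fn(port)
}

// Callers only ever go through this package-level variable.
var withEICTunnel tunnelRunner = realWithEICTunnel

// init swaps in the pooled implementation without touching any caller.
func init() { withEICTunnel = pooledWithEICTunnel }

// readFile is a hypothetical caller; it never names an implementation.
func readFile(instanceID, path string) (string, error) {
	var out string
	err := withEICTunnel(instanceID, func(localPort int) error {
		out = fmt.Sprintf("read %s via :%d", path, localPort)
		return nil
	})
	return out, err
}

func main() {
	for _, path := range []string{"a.yaml", "b.yaml", "c.yaml", "d.yaml"} {
		if _, err := readFile("i-hypothetical", path); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("setups:", setups)
}
```

Because the indirection is a plain variable, a test can assign its own stub to `withEICTunnel` and bypass the pool entirely.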
10 unit tests pin: 4-ops-amortise (1 setup), different-instances-do-not-share,
TTL eviction, poison invalidates, concurrent-acquire-single-setup,
TTL=0 escape hatch, LRU eviction at cap, error classification heuristic,
refcount blocks expired eviction, panic poisons entry. All green.
### 2. Canvas-side: serial fan-out + duplicate fetch (~20% of the win)
`canvas/src/components/tabs/ConfigTab.tsx::loadConfig` awaited 3 independent
metadata GETs (`/workspaces/{id}`, `/model`, `/provider`) serially.
`AgentCardSection` fired a SECOND `/workspaces/{id}` from its own useEffect.
Fix: Promise.all over the 3 metadata GETs (each leg keeps its existing
.catch fallback semantics). AgentCardSection now reads `agentCard` from
the canvas store (`useCanvasStore`) instead of refetching — the canvas
already hydrates `node.data.agentCard` from the platform event stream.
Defensive selector handles test mocks without a `nodes` array.
## Verification
- `go test ./internal/handlers/` 5.07s green (full handlers package, including
10 new pool tests)
- `go vet ./internal/handlers/` clean
- `npx vitest run` — 1380/1380 canvas unit tests pass (2 test FILES fail on
a pre-existing xyflow CSS-load issue in vitest config, unrelated to this
change)
- `npx tsc --noEmit` clean
Live wall-time verification deferred to Phase 4 / E2E (canvas browser session
required; external probe blocked by 403 since the canvas auth chain is
session-cookie + Origin header, not a bearer token I can fabricate).
## Backwards compatibility
API surface unchanged. All 4 EIC handler callers use the rebound var; no
caller migration. Pool defaults to enabled (TTL=50s); tests can disable by
setting poolTTL=0 or by overwriting withEICTunnel directly (existing stub
pattern in template_files_eic_dispatch_test.go preserved).
## Hostile self-review (3 weakest spots)
1. `fnErrIndicatesTunnelFault` is a substring grep on err.Error() — the
marker list is hand-curated and ssh client error formats vary across
OpenSSH versions. A future ssh that reports a tunnel failure via a
phrasing not in the list would NOT poison the entry → next callers reuse
a dead tunnel until TTL evicts. Acceptable: TTL bounds the impact (≤50s
of bad reuse), and the heuristic covers every tunnel-error shape that
appears in the existing test fixtures and known incidents.
2. `acquire`'s for-loop has unbounded retry potential under pathological
churn (signal closed → new acquirer → setup fails → repeat); there is no
bounded retry counter. No test today exercises a "flaky setup that
succeeds-then-fails-then-succeeds" sequence; if observability ever shows
this shape, add a max-retry guard. Filed as a known limitation, not blocking.
3. The `strings.Contains`-style substring match I used for tunnel-fault
classification could false-positive on app-level error messages that
happen to contain "permission denied" or "broken pipe" verbatim. The
classification test covers the discriminator but only against the
error shapes we know today. Acceptable: poisoning errs on the side of
building fresh, which is correct-but-slightly-slow rather than incorrect.
## Phase 4 / E2E plan
- Live timing of the canvas detail-panel open against a real workspace
(browser session, not external probe).
- Target: perceived latency under 2s on warm pool. Cold open still pays
one tunnel setup (~3-5s) — the pool buys you the SECOND through Nth
panel-open within the TTL window.
- Memory `feedback_chase_verification_to_staging` applies — will not
declare done at PR-merge; will follow through to user-visible behavior
on staging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>