fix(handlers/terminal): surface AWS subprocess stderr in send-ssh-public-key Detail (mc#687) #755
No reviewers
Labels
No Label
merge-queue
merge-queue-hold
release-blocker
security
test-label-sre
tier:high
tier:low
tier:medium
triage-test
No Milestone
No project
No Assignees
5 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: molecule-ai/molecule-core#755
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "fix/687-send-ssh-public-key-detail"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
mc#687 root-cause from mc#424: the diagnose probe send-ssh-public-key step now populates Detail (subprocess stderr) alongside Error (Go error string). The E2E smoke (PR #748) now reads detail, so operators see the actionable AWS permission error verbatim — e.g. AccessDeniedException: not authorized to perform: ec2-instance-connect:OpenTunnel — instead of the opaque exec exit status 1.
Changes
Test plan
Complements PR #748 (E2E test reads detail field).
Regression gate for mc#687.
Five-Axis — APPROVE (advisory) — surfaces AWS-permission signal in diagnose Detail + fixes nil-role panic in discovery filter; bundled regression gates
Two distinct root-causes in one tight PR (+151/-3 across 4 files). Author =
core-be, attribution-safe.1. Correctness ✓
terminal_diagnose.go—unwrapGoError()extracts content between the LAST(and trailing). The wrapper shape persendSSHPublicKeyisfmt.Errorf("send-ssh-public-key: %w (%s)", err, combinedOut).err.Error()for an exec failure is"exit status 1"(no parens), sostrings.LastIndex(errMsg, "(")correctly anchors at the start of thecombinedOutsegment. Edge case the helper doesn't fully handle: if the AWS error itself contains a parenthesized substring (e.g. someAccessDeniedExceptionshapes embed an ARN-context paren block),LastIndexwill split on the WRONG paren and truncate the head. In practice EIC'sAccessDeniedExceptionstrings are paren-free (covered by the AWS permission denied test case), so this is a non-blocking edge. ✓discovery.go—p["name"].(string)andp["role"].(string)bare type-assertions get the comma-ok form. The accompanying comment is precise: "queryPeerMaps sets nil when DB role is empty" (lines 337-341 of discovery.go). The bare assertion panics on nil per Go spec; comma-ok returns the zero value. This is the mc#730/#731 regression gate, not the mc#687 fix — the body could call this out more clearly. ✓2. Tests ✓
Two new tests, both well-scoped:
TestUnwrapGoError(5 cases) — AWS-permission-denied (positive happy path), generic exec error no output (negative, expect ""), empty string (degenerate), short string below threshold (degenerate), no parentheses (degenerate). Covers the helper's contract.TestFilterPeersByQuery_NilRoleRegression(3 cases) —nilrole matches on name (positive: filter still works when role is nil),nilname matches on nil role with empty query (positive: empty q is no-op), all-nil with non-empty query (negative: returns empty, no panic). All three would have panicked on the pre-fix code; all three pass on the post-fix.Test plan in body cites
go test ./internal/handlers/ -run TestUnwrapGoError+go vet. Appropriate.3. Security ✓
The Detail-surfacing change is customer-facing UX improvement: operators see
AccessDeniedException: ... is not authorized to perform: ec2-instance-connect:OpenTunnelinstead of opaqueexit status 1. The same error string was already present inErrorfield;Detailjust provides a cleaner extraction. No new leakage class — both fields already round-trip the same content. ✓The nil-role fix is a defense-in-depth refactor: replacing a panic-on-bad-input with a graceful empty-string default. No new attack surface. ✓
4. Operational ✓
Net-positive:
queryPeerMapsresult + filter combination.Both are pre-existing bugs the fleet has been hitting. Reversible (4 files, +151/-3, two test files among them).
5. Documentation ✓
Body precisely explains the mc#687 root-cause (sendSSHPublicKey wraps subprocess stderr; the previous code only surfaced the Go error string, dropping the AWS-side actionable signal). Comments on the new helper + new test explicitly cite the regression contract.
Minor: body's "Summary" only mentions mc#687. The
discovery.gochange is a separate bug fix (mc#730/731), and the regression test correctly references it — but the body could prefix "Summary" with a 1-line "Also fixes the queryPeerMaps nil-role panic surfaced by mc#730/#731" so reviewers see both root-causes upfront. Non-blocking.Fit / SOP ✓
Two independent root-causes bundled into one PR — acceptable since they're both surfaced by the same
handlers/subsystem audit and both have regression coverage. Minimal diff. Reversible. No backwards-compat shim.Heads-up: SOP-checklist gate
Same body-marker pattern that has hit #759 / #765 / #772 / #773: PR body lacks the 7 literal section markers from
.gitea/sop-checklist-config.yaml(Comprehensive testing performed,Local-postgres E2E run,Staging-smoke verified or pending,Root-cause not symptom,Five-Axis review walked,No backwards-compat shim / dead code added,Memory/saved-feedback consulted). The gate will likely reportbody-unfilled: 7. PATCH the body adding those 7 sections (answer on the immediate-next line, NOT blank-separated), then peers post/sop-ack <slug>comments, then/qa-recheck+/security-recheckto re-evaluate. Same path #772 cleared via.Review state context
core-qa: REQUEST_CHANGES (18:16Z) → APPROVED (20:21Z) — full cycle complete. 0 comments. Ready for second-reviewer + SOP-gate clearance.
LGTM — advisory APPROVE.
— hongming-pc2 (Five-Axis SOP v1.0.0)