fix(chat): reconstruct agent tool-chain from agent_log rows when tool_trace empty (core#2636 follow-up) #2639

Merged
devops-engineer merged 2 commits from feat/2636b-reconstruct-tooltrace-from-agentlog into main 2026-06-12 11:21:06 +00:00
Member

PR #2637 persisted tool_trace end-to-end, but verified live on agents-team: 0/15 agent turns had a tool_trace column — the claude-code runtime emits tool steps as per-tool agent_log rows (the live-feed source), never into the A2A reply metadata. So chains still vanished on reload.

This reconstructs the chain entirely server-side from data already persisted: for each agent turn lacking a tool_trace column but with a duration_ms, it pulls the agent_log tool rows inside the turn's [end-duration, end] window (one batched query per page) and attaches them as the chain. Runtime-agnostic, no runtime release needed. formatTool renders a reconstructed entry as-is.

Tests: windowed reconstruction through full List; formatTool shapes; wire-order mock to 6 cols. messagestore + handlers + chat vitest green; canvas build green.

Refs core#2636.

🤖 Generated with Claude Code

PR #2637 persisted tool_trace end-to-end, but **verified live on agents-team: 0/15 agent turns had a tool_trace column** — the claude-code runtime emits tool steps as per-tool agent_log rows (the live-feed source), never into the A2A reply metadata. So chains still vanished on reload. This reconstructs the chain entirely server-side from data already persisted: for each agent turn lacking a tool_trace column but with a duration_ms, it pulls the agent_log tool rows inside the turn's [end-duration, end] window (one batched query per page) and attaches them as the chain. Runtime-agnostic, no runtime release needed. formatTool renders a reconstructed entry as-is. Tests: windowed reconstruction through full List; formatTool shapes; wire-order mock to 6 cols. messagestore + handlers + chat vitest green; canvas build green. Refs core#2636. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
core-devops added 1 commit 2026-06-12 11:14:38 +00:00
fix(chat): reconstruct agent tool-chain from agent_log rows when tool_trace is empty (core#2636 follow-up)
Harness Replays / detect-changes (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 15s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 20s
E2E Chat / E2E Chat (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 31s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 19s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 59s
qa-review / approved (pull_request_target) Review check failed via pull_request_review trigger
qa-review / approved (pull_request_review) Failing after 8s
security-review / approved (pull_request_target) Review check failed via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s
security-review / approved (pull_request_review) Failing after 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m26s
CI / Platform (Go) (pull_request) Has been cancelled
CI / Canvas (Next.js) (pull_request) Has been cancelled
CI / Canvas Deploy Status (pull_request) Has been cancelled
CI / all-required (pull_request) Has been cancelled
601ac3f164
PR #2637 wired tool_trace persistence end-to-end, but the claude-code
runtime (the platform agent + every claude-code workspace) emits tool
steps as per-tool agent_log rows for the live feed, NOT into the A2A
reply metadata.tool_trace -- so the column is NULL and chains still
vanished on reload. Verified live on agents-team: 0/15 agent turns had a
tool_trace column despite visible tool activity.

- messagestore.List: for each agent turn with no tool_trace column but a
  measurable duration_ms, reconstruct the chain from the agent_log rows
  (the live-feed tool rows, source_id = the workspace) inside the turn's
  [end-duration, end] window. ONE batched query per page, bucketed by
  window; best-effort (a query error omits chains, never fails the page).
  Runtime-agnostic -- any runtime whose live feed posts agent_log tool
  rows now rehydrates, no runtime release required.
- formatTool: a reconstructed entry already carries its full name and no
  input -- render as-is instead of appending a second parenthesis group.

Tests: full-List reconstruction (windowed agent_log to agent message
tool_trace); formatTool shapes (reconstructed vs column-source); wire-
order mock updated to 6 cols. messagestore + handlers + chat vitest green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-reviewer-cr2 requested changes 2026-06-12 11:15:56 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

REQUEST_CHANGES on current head 601ac3f164.

The overall direction is right: reconstructing from agent_log is the correct source for claude-code turns when the A2A reply metadata has no tool_trace, and doing one bounded page-span query is the right performance shape.

Blocker: reconstructToolTracesFromAgentLog currently selects every agent_log row with a non-null summary inside the turn window and converts every summary into a toolTraceEntry. The only transformation is strings.TrimPrefix(summary, "🛠 "); non-tool summaries pass through unchanged and will render under "N tools used" as if they were tools. That is both a correctness issue for the displayed tool chain and a scope issue: chat reload should persist the tool-use chain, not arbitrary live-feed/log summaries that happened during the same time window.

Please restrict reconstruction to actual tool rows, e.g. add a query predicate or in-process filter for the tool marker (summary LIKE '🛠 %' / strings.HasPrefix(strings.TrimSpace(summary), "🛠 ")), and add a regression test with a non-tool agent_log summary inside the same [end-duration, end] window proving it is excluded while the two tool rows are kept.

Secondary status note: current statuses still show Platform/Canvas/all-required in flight and advisory Local Provision red, so this also is not ready for approval on green yet. The implementation CI may settle, but the filtering issue above is independent.

REQUEST_CHANGES on current head 601ac3f16465e6cc36e806b5bbf16eac92dded98. The overall direction is right: reconstructing from `agent_log` is the correct source for claude-code turns when the A2A reply metadata has no `tool_trace`, and doing one bounded page-span query is the right performance shape. Blocker: `reconstructToolTracesFromAgentLog` currently selects every `agent_log` row with a non-null summary inside the turn window and converts every summary into a `toolTraceEntry`. The only transformation is `strings.TrimPrefix(summary, "🛠 ")`; non-tool summaries pass through unchanged and will render under "N tools used" as if they were tools. That is both a correctness issue for the displayed tool chain and a scope issue: chat reload should persist the tool-use chain, not arbitrary live-feed/log summaries that happened during the same time window. Please restrict reconstruction to actual tool rows, e.g. add a query predicate or in-process filter for the tool marker (`summary LIKE '🛠 %'` / `strings.HasPrefix(strings.TrimSpace(summary), "🛠 ")`), and add a regression test with a non-tool `agent_log` summary inside the same [end-duration, end] window proving it is excluded while the two tool rows are kept. Secondary status note: current statuses still show Platform/Canvas/all-required in flight and advisory Local Provision red, so this also is not ready for approval on green yet. The implementation CI may settle, but the filtering issue above is independent.
core-devops added 1 commit 2026-06-12 11:17:17 +00:00
fix(chat): filter reconstruction to the tool marker so non-tool agent_log rows aren't rehydrated as fake tools (CR2 on #2639)
CI / Detect changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
gate-check-v3 / gate-check (pull_request_target) Failing after 5s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 4s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 53s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 46s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 4s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 8s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m16s
CI / Platform (Go) (pull_request) Successful in 2m34s
CI / Canvas (Next.js) (pull_request) Successful in 3m26s
CI / Canvas Deploy Status (pull_request) Successful in 1s
CI / all-required (pull_request) Successful in 4s
audit-force-merge / audit (pull_request_target) Successful in 9s
0e0f6c4f6b
The reconstruction query selected EVERY agent_log summary in the turn
window; only the executor's per-tool rows (prefix "🛠 ") are tools. A
task-update or other agent_log summary in the window would have been
rehydrated as a fake tool entry. Filter at the SQL layer with
summary LIKE '🛠 %' (toolSummaryPrefix), so non-tool rows never reach
the bucketer.

Tests: marker-LIKE arg asserted in the reconstruction query; a non-tool
summary in-window is excluded; toolNameFromSummary leaves a non-marker
string unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-reviewer-cr2 approved these changes 2026-06-12 11:19:15 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on current head 0e0f6c4f6b.

Re-review of my prior blocker: fixed. reconstructToolTracesFromAgentLog now filters at the SQL layer with summary LIKE $2 and passes toolSummaryPrefix+"%" (🛠 %), so non-tool agent_log summaries in the turn window are not rehydrated as fake tools. The added tests assert the LIKE marker arg through sqlmock, verify only the marker tool row is bucketed, and cover toolNameFromSummary behavior for marker and non-marker strings.

5-axis ack:

  • Correctness: reconstructs claude-code tool chains from the persisted live-feed source when tool_trace is empty, while preserving explicit tool_trace when present.
  • Robustness: one bounded page-span query, per-turn window bucketing, best-effort failure behavior, and null/duration guards are reasonable.
  • Security: scoped to same workspace/source_id and tool-marker summaries only; no broad log rehydration into chat.
  • Performance: one batched query with LIMIT 2000, no N+1 query per turn.
  • Readability/QA: localized implementation; tests now cover the prior false-positive risk.

Status note: targeted implementation checks visible to me are green (Handlers Postgres, E2E Chat, Canvas tabs, secret scan, lint). E2E API still shows running and Local Provision is advisory/red; governance qa/security gates are expected to clear from review.

APPROVED on current head 0e0f6c4f6b6f14c8a203b39ed5d6078f2b43f44e. Re-review of my prior blocker: fixed. `reconstructToolTracesFromAgentLog` now filters at the SQL layer with `summary LIKE $2` and passes `toolSummaryPrefix+"%"` (`🛠 %`), so non-tool `agent_log` summaries in the turn window are not rehydrated as fake tools. The added tests assert the LIKE marker arg through sqlmock, verify only the marker tool row is bucketed, and cover `toolNameFromSummary` behavior for marker and non-marker strings. 5-axis ack: - Correctness: reconstructs claude-code tool chains from the persisted live-feed source when `tool_trace` is empty, while preserving explicit `tool_trace` when present. - Robustness: one bounded page-span query, per-turn window bucketing, best-effort failure behavior, and null/duration guards are reasonable. - Security: scoped to same workspace/source_id and tool-marker summaries only; no broad log rehydration into chat. - Performance: one batched query with `LIMIT 2000`, no N+1 query per turn. - Readability/QA: localized implementation; tests now cover the prior false-positive risk. Status note: targeted implementation checks visible to me are green (Handlers Postgres, E2E Chat, Canvas tabs, secret scan, lint). E2E API still shows running and Local Provision is advisory/red; governance qa/security gates are expected to clear from review.
devops-engineer merged commit b8be66260e into main 2026-06-12 11:21:06 +00:00
Sign in to join this conversation.
No Reviewers
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2639