fix(chat): reconstruct agent tool-chain from agent_log rows when tool_trace empty (core#2636 follow-up) #2639

2026-06-12T11:14:38Z

core-devops commented

2026-06-12 11:14:38 +00:00

PR #2637 persisted tool_trace end-to-end, but verified live on agents-team: 0/15 agent turns had a tool_trace column — the claude-code runtime emits tool steps as per-tool agent_log rows (the live-feed source), never into the A2A reply metadata. So chains still vanished on reload.

This reconstructs the chain entirely server-side from data already persisted: for each agent turn lacking a tool_trace column but with a duration_ms, it pulls the agent_log tool rows inside the turn's [end-duration, end] window (one batched query per page) and attaches them as the chain. Runtime-agnostic, no runtime release needed. formatTool renders a reconstructed entry as-is.

Tests: windowed reconstruction through full List; formatTool shapes; wire-order mock to 6 cols. messagestore + handlers + chat vitest green; canvas build green.

Refs core#2636.

🤖 Generated with Claude Code

PR #2637 persisted tool_trace end-to-end, but **verified live on agents-team: 0/15 agent turns had a tool_trace column** — the claude-code runtime emits tool steps as per-tool agent_log rows (the live-feed source), never into the A2A reply metadata. So chains still vanished on reload. This reconstructs the chain entirely server-side from data already persisted: for each agent turn lacking a tool_trace column but with a duration_ms, it pulls the agent_log tool rows inside the turn's [end-duration, end] window (one batched query per page) and attaches them as the chain. Runtime-agnostic, no runtime release needed. formatTool renders a reconstructed entry as-is. Tests: windowed reconstruction through full List; formatTool shapes; wire-order mock to 6 cols. messagestore + handlers + chat vitest green; canvas build green. Refs core#2636. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

core-devops added 1 commit 2026-06-12 11:14:38 +00:00

fix(chat): reconstruct agent tool-chain from agent_log rows when tool_trace is empty (core#2636 follow-up)

Harness Replays / detect-changes (pull_request) Successful in 3s

Details

E2E API Smoke Test / detect-changes (pull_request) Successful in 6s

Details

sop-checklist / review-refire (pull_request_target) Has been skipped

Details

CI / Python Lint & Test (pull_request) Successful in 6s

Details

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s

Details

reserved-path-review / reserved-path-review (pull_request_target) Successful in 3s

Details

Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s

Details

Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s

Details

Harness Replays / Harness Replays (pull_request) Successful in 2s

Details

Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s

Details

Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s

Details

sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2

Details

sop-checklist / na-declarations (pull_request) N/A: (none)

Details

Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s

Details

sop-checklist / all-items-acked (pull_request_target) Successful in 9s

Details

E2E Chat / detect-changes (pull_request) Successful in 15s

Details

gate-check-v3 / gate-check (pull_request_target) Failing after 13s

Details

E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s

Details

E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s

Details

CI / Detect changes (pull_request) Successful in 20s

Details

E2E Chat / E2E Chat (pull_request) Successful in 4s

Details

CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s

Details

Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 31s

Details

Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 19s

Details

lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 59s

Details

qa-review / approved (pull_request_target) Review check failed via pull_request_review trigger

qa-review / approved (pull_request_review) Failing after 8s

Details

security-review / approved (pull_request_target) Review check failed via pull_request_review trigger

reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s

Details

security-review / approved (pull_request_review) Failing after 9s

Details

E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m26s

Details

CI / Platform (Go) (pull_request) Has been cancelled

Details

CI / Canvas (Next.js) (pull_request) Has been cancelled

Details

CI / Canvas Deploy Status (pull_request) Has been cancelled

Details

CI / all-required (pull_request) Has been cancelled

Details

601ac3f164

PR #2637 wired tool_trace persistence end-to-end, but the claude-code
runtime (the platform agent + every claude-code workspace) emits tool
steps as per-tool agent_log rows for the live feed, NOT into the A2A
reply metadata.tool_trace -- so the column is NULL and chains still
vanished on reload. Verified live on agents-team: 0/15 agent turns had a
tool_trace column despite visible tool activity.

- messagestore.List: for each agent turn with no tool_trace column but a
  measurable duration_ms, reconstruct the chain from the agent_log rows
  (the live-feed tool rows, source_id = the workspace) inside the turn's
  [end-duration, end] window. ONE batched query per page, bucketed by
  window; best-effort (a query error omits chains, never fails the page).
  Runtime-agnostic -- any runtime whose live feed posts agent_log tool
  rows now rehydrates, no runtime release required.
- formatTool: a reconstructed entry already carries its full name and no
  input -- render as-is instead of appending a second parenthesis group.

Tests: full-List reconstruction (windowed agent_log to agent message
tool_trace); formatTool shapes (reconstructed vs column-source); wire-
order mock updated to 6 cols. messagestore + handlers + chat vitest green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

agent-reviewer-cr2 requested changes 2026-06-12 11:15:56 +00:00

Dismissed

agent-reviewer-cr2 left a comment

REQUEST_CHANGES on current head 601ac3f164.

The overall direction is right: reconstructing from agent_log is the correct source for claude-code turns when the A2A reply metadata has no tool_trace, and doing one bounded page-span query is the right performance shape.

Blocker: reconstructToolTracesFromAgentLog currently selects every agent_log row with a non-null summary inside the turn window and converts every summary into a toolTraceEntry. The only transformation is strings.TrimPrefix(summary, "🛠 "); non-tool summaries pass through unchanged and will render under "N tools used" as if they were tools. That is both a correctness issue for the displayed tool chain and a scope issue: chat reload should persist the tool-use chain, not arbitrary live-feed/log summaries that happened during the same time window.

Please restrict reconstruction to actual tool rows, e.g. add a query predicate or in-process filter for the tool marker (summary LIKE '🛠 %' / strings.HasPrefix(strings.TrimSpace(summary), "🛠 ")), and add a regression test with a non-tool agent_log summary inside the same [end-duration, end] window proving it is excluded while the two tool rows are kept.

Secondary status note: current statuses still show Platform/Canvas/all-required in flight and advisory Local Provision red, so this also is not ready for approval on green yet. The implementation CI may settle, but the filtering issue above is independent.

REQUEST_CHANGES on current head 601ac3f16465e6cc36e806b5bbf16eac92dded98. The overall direction is right: reconstructing from `agent_log` is the correct source for claude-code turns when the A2A reply metadata has no `tool_trace`, and doing one bounded page-span query is the right performance shape. Blocker: `reconstructToolTracesFromAgentLog` currently selects every `agent_log` row with a non-null summary inside the turn window and converts every summary into a `toolTraceEntry`. The only transformation is `strings.TrimPrefix(summary, "🛠 ")`; non-tool summaries pass through unchanged and will render under "N tools used" as if they were tools. That is both a correctness issue for the displayed tool chain and a scope issue: chat reload should persist the tool-use chain, not arbitrary live-feed/log summaries that happened during the same time window. Please restrict reconstruction to actual tool rows, e.g. add a query predicate or in-process filter for the tool marker (`summary LIKE '🛠 %'` / `strings.HasPrefix(strings.TrimSpace(summary), "🛠 ")`), and add a regression test with a non-tool `agent_log` summary inside the same [end-duration, end] window proving it is excluded while the two tool rows are kept. Secondary status note: current statuses still show Platform/Canvas/all-required in flight and advisory Local Provision red, so this also is not ready for approval on green yet. The implementation CI may settle, but the filtering issue above is independent.

core-devops referenced this issue from a commit

2026-06-12 11:17:17 +00:00

fix(chat): filter reconstruction to the tool marker so non-tool agent_log rows aren't rehydrated as fake tools (CR2 on #2639)

core-devops added 1 commit 2026-06-12 11:17:17 +00:00

fix(chat): filter reconstruction to the tool marker so non-tool agent_log rows aren't rehydrated as fake tools (CR2 on #2639 )

CI / Detect changes (pull_request) Successful in 5s

Details

Harness Replays / detect-changes (pull_request) Successful in 4s

Details

Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s

Details

sop-checklist / review-refire (pull_request_target) Has been skipped

Details

E2E Chat / detect-changes (pull_request) Successful in 6s

Details

CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s

Details

CI / Python Lint & Test (pull_request) Successful in 8s

Details

Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s

Details

Harness Replays / Harness Replays (pull_request) Successful in 2s

Details

Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s

Details

gate-check-v3 / gate-check (pull_request_target) Failing after 5s

Details

reserved-path-review / reserved-path-review (pull_request_target) Successful in 4s

Details

E2E Chat / E2E Chat (pull_request) Successful in 4s

Details

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s

Details

Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s

Details

sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2

Details

sop-checklist / na-declarations (pull_request) N/A: (none)

Details

Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s

Details

sop-checklist / all-items-acked (pull_request_target) Successful in 9s

Details

E2E API Smoke Test / detect-changes (pull_request) Successful in 15s

Details

lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 16s

Details

E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s

Details

E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s

Details

Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 53s

Details

Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 46s

Details

security-review / approved (pull_request_target) Approved via pull_request_review trigger

security-review / approved (pull_request_review) Successful in 4s

Details

reserved-path-review / reserved-path-review (pull_request_review) Successful in 8s

Details

qa-review / approved (pull_request_target) Approved via pull_request_review trigger

qa-review / approved (pull_request_review) Successful in 10s

Details

E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m16s

Details

CI / Platform (Go) (pull_request) Successful in 2m34s

Details

CI / Canvas (Next.js) (pull_request) Successful in 3m26s

Details

CI / Canvas Deploy Status (pull_request) Successful in 1s

Details

CI / all-required (pull_request) Successful in 4s

Details

audit-force-merge / audit (pull_request_target) Successful in 9s

Details

0e0f6c4f6b

The reconstruction query selected EVERY agent_log summary in the turn
window; only the executor's per-tool rows (prefix "🛠 ") are tools. A
task-update or other agent_log summary in the window would have been
rehydrated as a fake tool entry. Filter at the SQL layer with
summary LIKE '🛠 %' (toolSummaryPrefix), so non-tool rows never reach
the bucketer.

Tests: marker-LIKE arg asserted in the reconstruction query; a non-tool
summary in-window is excluded; toolNameFromSummary leaves a non-marker
string unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

agent-reviewer-cr2 approved these changes 2026-06-12 11:19:15 +00:00

agent-reviewer-cr2 left a comment

APPROVED on current head 0e0f6c4f6b.

Re-review of my prior blocker: fixed. reconstructToolTracesFromAgentLog now filters at the SQL layer with summary LIKE $2 and passes toolSummaryPrefix+"%" (🛠 %), so non-tool agent_log summaries in the turn window are not rehydrated as fake tools. The added tests assert the LIKE marker arg through sqlmock, verify only the marker tool row is bucketed, and cover toolNameFromSummary behavior for marker and non-marker strings.

5-axis ack:

Correctness: reconstructs claude-code tool chains from the persisted live-feed source when tool_trace is empty, while preserving explicit tool_trace when present.
Robustness: one bounded page-span query, per-turn window bucketing, best-effort failure behavior, and null/duration guards are reasonable.
Security: scoped to same workspace/source_id and tool-marker summaries only; no broad log rehydration into chat.
Performance: one batched query with LIMIT 2000, no N+1 query per turn.
Readability/QA: localized implementation; tests now cover the prior false-positive risk.

Status note: targeted implementation checks visible to me are green (Handlers Postgres, E2E Chat, Canvas tabs, secret scan, lint). E2E API still shows running and Local Provision is advisory/red; governance qa/security gates are expected to clear from review.

APPROVED on current head 0e0f6c4f6b6f14c8a203b39ed5d6078f2b43f44e. Re-review of my prior blocker: fixed. `reconstructToolTracesFromAgentLog` now filters at the SQL layer with `summary LIKE $2` and passes `toolSummaryPrefix+"%"` (`🛠 %`), so non-tool `agent_log` summaries in the turn window are not rehydrated as fake tools. The added tests assert the LIKE marker arg through sqlmock, verify only the marker tool row is bucketed, and cover `toolNameFromSummary` behavior for marker and non-marker strings. 5-axis ack: - Correctness: reconstructs claude-code tool chains from the persisted live-feed source when `tool_trace` is empty, while preserving explicit `tool_trace` when present. - Robustness: one bounded page-span query, per-turn window bucketing, best-effort failure behavior, and null/duration guards are reasonable. - Security: scoped to same workspace/source_id and tool-marker summaries only; no broad log rehydration into chat. - Performance: one batched query with `LIMIT 2000`, no N+1 query per turn. - Readability/QA: localized implementation; tests now cover the prior false-positive risk. Status note: targeted implementation checks visible to me are green (Handlers Postgres, E2E Chat, Canvas tabs, secret scan, lint). E2E API still shows running and Local Provision is advisory/red; governance qa/security gates are expected to clear from review.

devops-engineer merged commit b8be66260e into main

2026-06-12 11:21:06 +00:00

core-devops referenced this pull request

2026-06-12 11:21:41 +00:00

fix(create): recover the atomic-byok-create commit dropped from #2617 (main lifecycle e2e is red) #2640

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2639