molecule-core

Hongming Wang fa9e29f2f5 fix(canary): reframe smoke prompt to give GPT-4o explicit permission to echo		2026-05-01 23:53:24 -07:00

Canary started flaking 2026-05-01 22:11 with model-refusal replies:
- "I'm unable to do that."
- "I'm unable to fulfill that request. Can I assist you with anything else?"
- "I'm unable to reply with responses that don't allow me to fulfill tasks…"

3 fails / 10 recent runs ≈ 30% flake.

Trigger: 2026-04-30's Platform Capabilities preamble (#2332) added the directive "Use them proactively" to the top of every system prompt. Combined with the heavy A2A + HMA tool docs further down, the model reads the contrived bare-echo prompt ("Reply with exactly: PONG") as out-of-role and intermittently refuses.

Real user prompts don't hit this; only the synthetic smoke prompt does, so the right fix is in the canary's prompt phrasing, not the platform's system prompt (which is correctly priming agents toward tool use).

New phrasing explicitly tells the model "this is a smoke test" and "no tools or memory are needed" so it has permission to comply.

Also updates the child workspace's CHILD_PONG prompt with the same framing; the same failure mode would have hit it once full-mode runs again.

No code change to system prompt, no test infra change. Just two prompt strings + a load-bearing comment so future readers don't trim back to the brittle phrasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
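The reframing described in the commit message might look roughly like the sketch below. This is an illustration only: the exact prompt wording, the variable name `SMOKE_PROMPT`, and the helpers `reply_contains_token` and `looks_like_refusal` are assumptions made for this example and are not copied from `test_staging_full_saas.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and wording are illustrative, not the real script.

# Load-bearing framing: the "smoke test" and "no tools or memory" sentences
# give the model permission to echo. Trimming back to a bare
# "Reply with exactly: PONG" is the phrasing the commit calls brittle.
SMOKE_PROMPT='This is a smoke test. No tools or memory are needed. Reply with exactly: PONG'

# Succeeds (exit 0) when the reply contains the expected token.
reply_contains_token() {
  local reply="$1" token="$2"
  case "$reply" in
    *"$token"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds (exit 0) when the reply starts like the refusals quoted in the
# commit message, all of which contain "I'm unable to".
looks_like_refusal() {
  case "$1" in
    *"I'm unable to"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A canary built this way could send `SMOKE_PROMPT`, pass when `reply_contains_token "$reply" PONG` succeeds, and log a distinct "model refusal" failure when `looks_like_refusal "$reply"` succeeds, which would separate prompt-phrasing flakes from platform outages.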
_extract_token.py	chore: apply round-7 review nits	2026-04-13 17:08:45 -07:00
_lib.sh	feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6)	2026-04-14 09:35:26 -07:00
STAGING_SAAS_E2E.md	feat(e2e): pivot to admin-bearer-only auth + add sanity self-check workflow	2026-04-21 04:34:11 -07:00
test_2307_peer_visibility_staging.sh	test(e2e): add staging peer-visibility harness for #2307	2026-04-29 13:26:24 -07:00
test_a2a_e2e.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
test_activity_e2e.sh	chore: apply code-review round-6 suggestions	2026-04-13 17:08:45 -07:00
test_api.sh	fix(e2e): stop asserting current_task on public workspace GET (#966)	2026-04-19 02:19:15 -07:00
test_chat_attachments_e2e.sh	feat(canvas+platform): chat attachments, model selection, deploy/delete UX	2026-04-24 13:27:51 -07:00
test_chat_attachments_multiruntime_e2e.sh	feat(canvas+platform): chat attachments, model selection, deploy/delete UX	2026-04-24 13:27:51 -07:00
test_chat_upload_e2e.sh	feat(chat_files): rewrite Upload as HTTP-forward to workspace (RFC #2312, PR-C)	2026-04-29 14:26:37 -07:00
test_claude_code_e2e.sh	chore: final open-source cleanup — binary, stale paths, private refs	2026-04-18 00:38:55 -07:00
test_comprehensive_e2e.sh	fix(e2e): make provisioning-status assertions robust to CI environment	2026-04-13 17:31:07 -07:00
test_dev_mode.sh	fix(quickstart): hotfixes discovered during live testing session	2026-04-23 14:57:18 -07:00
test_harness_rc_normalization.sh	fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)	2026-04-27 02:55:44 -07:00
test_notify_attachments_e2e.sh	test(notify): pre-sweep prior workspaces so interrupted runs don't pile up	2026-04-26 20:55:13 -07:00
test_poll_mode_e2e.sh	fix(e2e): use real UUIDs for poll-mode test workspace ids	2026-04-29 23:10:36 -07:00
test_priority_runtimes_e2e.sh	feat(e2e): extend priority-runtimes test to cover all 8 templates	2026-04-27 05:57:59 -07:00
test_saas_tenant.sh	chore: final open-source cleanup — binary, stale paths, private refs	2026-04-18 00:38:55 -07:00
test_staging_external_runtime.sh	test(e2e): read delivery_mode from register response, not GET	2026-04-30 10:35:21 -07:00
test_staging_full_saas.sh	fix(canary): reframe smoke prompt to give GPT-4o explicit permission to echo	2026-05-01 23:53:24 -07:00