molecule-core

Author	SHA1	Message	Date
devops-engineer	a302d75129	chore(ci): retrigger Handlers Postgres Integration for second-green proof Class B verification — second consecutive green run to demonstrate the fix isn't flaky. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:23:05 -07:00
devops-engineer	241859b552	fix(ci): handlers-postgres — sidestep port collision under host-network runner Class B Hongming-owned CICD red sweep. The Handlers Postgres Integration workflow has been silently failing on staging push and PRs ever since #92 fixed the IPv6 flake — the IPv6 fix correctly pinned 127.0.0.1, but unmasked a deeper issue: with our act_runner global container.network=host config, multiple concurrent runs of this workflow each tried to bind 0.0.0.0:5432 on the operator host. The first wins; subsequent postgres service containers exit with `FATAL: could not create any TCP/IP sockets` + `Address in use`. Docker auto-removes them (act_runner sets AutoRemove:true), so by the time `Apply migrations` runs `psql`, the container is gone — Connection refused, then `failed to remove container: No such container` at cleanup time. Per-job container.network override is silently ignored by act_runner (`--network and --net in the options will be ignored.`), so we sidestep `services:` entirely. The job container still uses host-net (required for cache server discovery on the operator's bridge IP). We launch a sibling postgres on the existing molecule-monorepo-net bridge with a unique name per run (run_id+run_attempt) and connect via the bridge IP read from `docker inspect`. Verified manually on operator host 2026-05-08: 2× postgres on host-net collides, but on the bridge with unique names + different IPs, both succeed and each is reachable from a host-net job container. Adds: - always()-cleanup step so containers don't leak on test failure - Diagnostic dump now includes the postgres container's docker logs - Runbook at docs/runbooks/ documenting the substrate behavior + the pattern future workflows should adopt for any `services:`-shaped need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:21:12 -07:00
claude-ceo-assistant (Claude Opus 4.7 on Hongming's MacBook)	87b971a292	fix(ci): close 3 chronic Gitea-Actions workflow flakes (closes #88 ) Three workflows have been failing on every push to this Gitea repo for GitHub-shaped reasons that don't translate to act_runner. Surfaced while landing #84; bundled per `feedback_gitea_actions_migration_audit_pattern` ("bundle per-repo, not per-finding") instead of three separate PRs. 1) handlers-postgres-integration: localhost → 127.0.0.1 - lib/pq tries to dial localhost → ::1 first; the postgres service container only listens on IPv4 → ECONNREFUSED → all TestIntegration_* fail. Pin IPv4 to make the job deterministic. 2) pr-guards / disable-auto-merge-on-push: Gitea no-op - The previous reusable-workflow caller invoked `gh pr merge --disable-auto`, which calls GitHub's GraphQL API. Gitea returns HTTP 405 on /api/graphql → step always fails. Inline the step so it can detect Gitea (GITEA_ACTIONS=true OR repo url under moleculesai.app) and no-op with a notice. Auto-merge gating is moot on Gitea anyway: there's no `--auto` primitive being touched. Job stays ALWAYS-RUN so branch protection's required check still lands SUCCESS (avoids the SKIPPED-in-set trap from `feedback_branch_protection_check_name_parity`). 3) Harness Replays: cf-proxy nginx.conf via docker `configs:` (not bind) - act_runner runs the workflow inside a runner container; runc in the docker daemon below resolves bind-mount source paths on the OUTER host, not inside the runner. The path `/workspace/.../cf-proxy/nginx.conf` is invisible there → "not a directory" runc error. Switching to compose `configs:` packages the file as content rather than a host bind, sidestepping the DinD path-translation gap. Local validation: - YAML parsed clean for all 3 files. - cf-proxy nginx.conf: standalone `docker compose run cf-proxy nginx -T` reproduced the configs: mount end-to-end and dumped the config correctly. The full harness compose still renders via `docker compose config`. Real-CI verification will land on this branch's first push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 17:06:09 -07:00
Hongming Wang	debe29c889	ci(handlers-postgres-integration): apply legacy .sql migrations too The migration-replay step globbed only .up.sql, silently skipping the older flat-naming migrations (001_workspaces.sql, 009_activity_logs.sql, etc.). Fine while no integration test depended on those tables; broke when the #149 cross-table atomicity test came in needing both workspaces (FK target for activity_logs) and activity_logs themselves. Switch to globbing .sql + sorted lex-order, excluding .down.sql so up/down pairs don't undo themselves mid-run. Add a sanity check for workspaces + activity_logs + pending_uploads alongside the existing delegations gate so a future migration drift fails loud instead of silently skipping the regressed test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:02:24 -07:00
Hongming Wang	90d202c80a	ci(handlers-pg): apply all migrations with skip-on-error + sanity check (#320 ) Previous workflow applied only 049_delegations.up.sql — fragile to future migrations that touch the delegations table or any other handlers/-tested table. Operator would have to remember to update the workflow's psql -f line per migration. New behavior: loop every .up.sql in lexicographic order, apply each with ON_ERROR_STOP=1 + per-migration result captured. Failed migrations are SKIPPED rather than blocking the suite — handles the historical migrations (017_memories_fts_namespace, 042_a2a_queue, etc.) that depend on tables since renamed/dropped and can't replay from scratch. Migrations that DO succeed land their tables, which is sufficient for the integration tests in handlers/. Sanity gate at the end: if the delegations table is missing after the replay, hard-fail with a loud error. That catches a real regression where 049 itself becomes broken (e.g., schema rename), separate from the historical-broken-migration noise above. Per-migration log line ("✓" or "⊘ skipped") makes it easy to spot when a migration that SHOULD have replayed didn't. Verified locally: full migration chain runs, 049 lands, all 7 integration tests pass against the chained-migration DB. Closes #320.	2026-05-05 03:48:43 -07:00
Hongming Wang	4c9f12258d	fix(delegations): preserve result_preview through completion + add real-Postgres integration gate Two-part PR: ## Fix: result_preview was lost on completion Self-review of #2854 caught a real bug. SetStatus has a same-status replay no-op; the order of calls in `executeDelegation` completion + `UpdateStatus` completed branch clobbered the preview field: 1. updateDelegationStatus(completed, "") fires 2. inner recordLedgerStatus(completed, "", "") → SetStatus transitions dispatched → completed with preview="" 3. outer recordLedgerStatus(completed, "", responseText) → SetStatus reads current=completed, status=completed → SAME-STATUS NO-OP, never writes responseText → preview lost Confirmed against real Postgres (see integration test). Strict-sqlmock unit tests passed because they pin SQL shape, not row state. Fix: call the WITH-PREVIEW recordLedgerStatus FIRST, then updateDelegationStatus. The inner call becomes the no-op (correctly preserves the row written by the outer call). Same gap fixed in UpdateStatus handler — body.ResponsePreview was never landing in the ledger because updateDelegationStatus's nested SetStatus(completed, "", "") fired first. ## Gate: real-Postgres integration tests + CI workflow The unit-test-only workflow that shipped #2854 was the root cause. Adding two layers of defense: 1. workspace-server/internal/handlers/delegation_ledger_integration_test.go — `//go:build integration` tag, requires INTEGRATION_DB_URL env var. 4 tests: * ResultPreviewPreservedThroughCompletion (regression gate for the bug above — fires the production call sequence in fixed order and asserts row.result_preview matches) * ResultPreviewBuggyOrderIsLost (DIAGNOSTIC: confirms the same-status no-op contract works as designed; if SetStatus's semantics ever change, this test fires) * FailedTransitionCapturesErrorDetail (failure-path symmetry) * FullLifecycle_QueuedToDispatchedToCompleted (forward-only + happy path) 2. .github/workflows/handlers-postgres-integration.yml — required check on staging branch protection. Spins postgres:15 service container, applies the delegations migration, runs `go test -tags=integration` against the live DB. Always-runs + per-step gating on path filter (handlers/wsauth/migrations) so the required-check name is satisfied on PRs that don't touch relevant code. Local dev workflow (file header documents this): docker run --rm -d --name pg -e POSTGRES_PASSWORD=test -p 55432:5432 postgres:15-alpine psql ... < workspace-server/migrations/049_delegations.up.sql INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \ go test -tags=integration ./internal/handlers/ -run "^TestIntegration_" ## Why this matters Per memory `feedback_mandatory_local_e2e_before_ship`: backend PRs MUST verify against real Postgres before claiming done. sqlmock pins SQL shape; only a real DB can verify row state. The workflow makes this gate mandatory rather than optional.	2026-05-05 02:47:52 -07:00

6 Commits