Class B Hongming-owned CICD red sweep, e2e-api leg. Same substrate
hazard as PR #98 (handlers-postgres-integration) — Gitea act_runner
configures `container.network: host` operator-wide, so:
* Two concurrent e2e-api runs both attempted to bind `-p 15432:5432`
and `-p 16379:6379` on the operator host. Verified in run a7/2727
on 2026-05-07: `docker: Error response from daemon: Conflict. The
container name "/molecule-ci-redis" is already in use by container
af10f438...` — exit 125, job fails before any test runs.
* Hardcoded container names `molecule-ci-postgres` / `-redis` plus
the leading `docker rm -f` step meant a second job's startup also
KILLED the first job's still-running services.
Fix shape (mirrors PR #98 bridge-net pattern, adapted because the
platform-server is a Go binary on the host, not a containerised step):
1. Per-run unique container names: `pg-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`,
`redis-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`. Unique even across reruns
of the same run_id.
2. Ephemeral host port per run via `-p 0:5432` / `-p 0:6379` and
`docker port` lookup, exported as `DATABASE_URL` / `REDIS_URL` to
`$GITHUB_ENV`. No fixed host-port → no collision.
3. `127.0.0.1` (NOT `localhost`) in URLs — IPv6 first-resolve flake
fixed in #92 stays fixed.
4. `if: always()` cleanup so containers don't leak when test steps
fail.
Issue #94 items #2 + #3 also addressed:
* Pre-pull `alpine:latest` (provisioner uses it for ephemeral
token-write containers in `internal/handlers/container_files.go`).
* Idempotent `docker network create molecule-monorepo-net` (the
provisioner attaches workspace containers via that bridge —
`internal/provisioner/provisioner.go::DefaultNetwork`).
Issue #94 item #1 (timeouts) NOT bumped — recent log evidence shows
postgres ready in 3s, redis in 1s, platform in 1s when they DO come
up. Timeouts are not the bottleneck on the current substrate.
NOT addressed here (out of scope, separate change required):
* `Run E2E API tests` step has been failing on `Status back online`
because the platform's langgraph workspace template image
(`ghcr.io/molecule-ai/workspace-template-langgraph:latest`)
returns 403 Forbidden post-2026-05-06 GitHub org suspension. That
is a template-registry resolution issue (ADR-002 / local-build
mode) and belongs in a workspace-server change, not this workflow
file. This PR fixes the parallel-collision class and the workflow
setup hygiene; the langgraph-403 failure will still surface on
runs after this lands until template resolution is fixed
separately.
Verified manually on operator host 2026-05-08: docker now hands out
ephemeral ports on `-p 0:5432`, two parallel runs land on different
ports, both reach pg_isready GREEN.
Closes#94 (items #2 and #3; item #1 documented as not-bottleneck;
langgraph-template-403 referenced for follow-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `HongmingWang-Rabbit/hermes-agent` fork is no longer reachable on
github.com (account suspended 2026-05-06). The patched fork now lives
at https://git.moleculesai.app/molecule-ai/hermes-agent. Same SHAs,
same branches — pure URL flip.
See molecule-ai/internal#72 for the github.com fork shell decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/upload-artifact@v4+ and download-artifact@v4+ use the GHES 3.10+
artifact protocol that Gitea Actions (act_runner v0.6 / Gitea 1.22.x)
does NOT implement. Failure cite from PR #54 run 1325 jobs/2:
::error::@actions/artifact v2.0.0+, upload-artifact@v4+ and
download-artifact@v4+ are not currently supported on GHES.
Pinned all 3 references to v3.2.2 (latest v3) at SHA-pinned form for
supply-chain hygiene, matching the existing `uses:` style in this repo.
Affected workflows:
- ci.yml (Canvas Next.js coverage upload, blocks `CI / Canvas (Next.js)`
required check on every PR — was the merge-queue blocker for #53,
#54, #69, #71, #76, #81)
- e2e-staging-canvas.yml (Playwright report + screenshots on failure)
No download-artifact callers in the repo, so v3-pin doesn't compose-break
anywhere. Drop these pins post-Gitea-1.23+ when the v4 artifact protocol
ships, or migrate to a Gitea-native action.
Closes#210.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Class F of #75 sweep. /commits/{sha}/statuses replaces unavailable workflow-runs API. 4 mapping buckets verified against synthetic+real Gitea data. Approved by security-auditor.
Two artifacts that unblock the parked follow-ups from #59:
1. scripts/edge-429-probe.sh (closes the "operator-blocked" status of
#62). An operator without CF/Vercel dashboard access can reproduce
a canvas-sized burst against a tenant subdomain and read each 429's
response shape — workspace-server bucket overflow (JSON body +
X-RateLimit-* headers) is distinguishable from CF (cf-ray) and
Vercel (x-vercel-id) by inspection of the report. Read-only,
parallel via background subshells (no GNU parallel dependency),
no credential use. Smoke-tested against example.com end-to-end.
2. docs/engineering/ratelimit-observability.md (closes the
"metric-blocked" status of #64). The existing
molecule_http_requests_total{path,status} counter + X-RateLimit-*
response headers already cover #64's acceptance criterion ("watch
metrics for two weeks"). The runbook collects the PromQL queries,
a decision tree for the re-tune (keep / per-tenant override /
change default), an alert rule template, and a hard "do not roll
ad-hoc per-bucket-key exposure" note (in-memory map includes
SHA-256 of bearer tokens — exposing it is a security review
surface, file a follow-up if needed).
Neither artifact changes runtime behaviour. Pure operational tooling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empty commit to trigger CI a second consecutive time per the SOP
'verify ≥1 representative workflow per class via workflow_dispatch
or push event ... ≥2 consecutive successful runs per class'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Part of the post-#66 sweep to remove `gh` CLI dependencies that fail
silently against Gitea. Class F covers `gh run list --workflow=X
--commit=SHA` shapes — querying whether a specific workflow ran (and
how it finished) for a specific SHA.
Why this is the only call site in class F:
`gh run list` hits GitHub's `/repos/.../actions/runs` REST endpoint.
Gitea exposes ZERO endpoints under `/repos/.../actions/runs` —
verified 2026-05-07 via swagger inspection: only secrets, variables,
and runner-registration tokens live under /actions/. There's no way
to query workflow run state via the Gitea v1 API directly.
However, every Gitea Actions job DOES emit a commit status with
`context = "<Workflow Name> / <Job Name> (<event>)"` (verified
2026-05-07 by reading /repos/.../commits/{sha}/statuses on a recent
main SHA). That surface is exactly what we need: each workflow run
leg is one status row, the aggregate state encodes the run outcome,
and Gitea exposes it under `/api/v1/repos/.../commits/{sha}/statuses`
which IS available.
Affected:
`auto-promote-on-e2e.yml` (lines 172-180):
Old: `gh run list --workflow e2e-staging-saas.yml --commit $SHA
--json status,conclusion --jq ...` returning a 5-bucket string
like `completed/success` | `in_progress/none` | `none/none` |
`completed/failure` | `completed/cancelled`.
New: `curl /api/v1/repos/.../commits/$SHA/statuses` + jq filter on
contexts whose name starts with
`"E2E Staging SaaS (full lifecycle) /"`. Mapping:
0 matched contexts → "none/none" (E2E paths-
filtered out
— same as
before)
any context = pending → "in_progress/none" (defer)
any context = error|failure → "completed/failure" (abort)
all contexts = success → "completed/success" (proceed)
The `completed/cancelled` arm of the case statement becomes
unreachable: Gitea status API doesn't expose a `cancelled` state
(it has success/failure/error/pending/warning), so per-SHA
concurrency cancellations now surface as `failure` and are handled
by the failure branch. Documented in-place; the cancelled arm is
kept as defense-in-depth for any future dual-host operation.
Verification:
- Live curl against the current main SHA returns `none/none` (E2E
was paths-filtered for that change set — expected).
- Synthetic-input jq tests verify all four mapping buckets:
no contexts → "none/none"
one context = pending → "in_progress/none"
success + success → "completed/success"
success + failure → "completed/failure"
- YAML syntax validates.
Token: continues to use act_runner's GITHUB_TOKEN (per-run, repo
read scope). The `/commits/{sha}/statuses` endpoint is repo-scoped,
no extra perms needed.
Closes part of #75. Master tracking issue at #75; companion PRs:
#80 (class A — `gh pr ...`), #81 (class D — `gh api ...`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>