Commit Graph

5317 Commits

Author SHA1 Message Date
f1a705271a ci: re-trigger CI after E2E completion 2026-05-11 21:15:49 +00:00
c3274a2af7 ci: re-trigger CI checks (3rd attempt) 2026-05-11 21:15:49 +00:00
afadfad07e ci: re-trigger CI checks 2026-05-11 21:15:49 +00:00
4ff8b969b0 ci: trigger re-run of CI checks after flaky failures
The Go + Postgres + E2E checks failed on the first attempt with
"Failing after 2-3m" — consistent with operational flakiness rather
than code failures (PR only touches org.go org import logic, unrelated
to the failing handlers).
2026-05-11 21:15:49 +00:00
f0021d630a fix(pendinguploads): use 100ms ticker in TickerFiresAdditionalCycles test
TestStartSweeperWithInterval_TickerFiresAdditionalCycles was flaky on
loaded CI runners because it called StartSweeperForTest, which passes
SweepInterval (5 minutes) as the ticker interval. The test expects ≥2
cycles in a 2-second window, but a 5-minute ticker fires 0-1 times
under CPU contention, causing "waited 2s for 2 sweep cycles, got 1".

Fix: call StartSweeperWithIntervalForTest directly with a 100ms ticker
interval, which is the intended test-harness pattern (per the export_test
comment). The done-channel teardown (cancel + <-done) is preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 21:15:49 +00:00
4dc4790849 ci: trigger fresh CI run for log diagnostics 2026-05-11 21:15:49 +00:00
963995acbd ci: trigger CI re-run 2026-05-11 21:15:49 +00:00
2e4f4ecda6 ci: re-trigger CI to get fresh logs 2026-05-11 21:15:49 +00:00
483aa950e8 ci: trigger CI (5th attempt) 2026-05-11 21:15:49 +00:00
a0853cbe14 ci: re-trigger CI after E2E completion 2026-05-11 21:15:49 +00:00
d24633872e ci: re-trigger CI checks (3rd attempt) 2026-05-11 21:15:49 +00:00
437d24906b ci: re-trigger CI checks 2026-05-11 21:15:49 +00:00
36c0a662f0 fix(org): convert map[string]string to map[string]struct{} before IsSatisfied call
loadWorkspaceEnv returns map[string]string but EnvRequirement.IsSatisfied
expects map[string]struct{}. Without this conversion the Go compiler
rejects the call, causing CI / Platform (Go) to fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 21:15:49 +00:00
b0a5d3c25d ci: trigger re-run of CI checks after flaky failures
The Go + Postgres + E2E checks failed on the first attempt with
"Failing after 2-3m" — consistent with operational flakiness rather
than code failures (PR only touches org.go org import logic, unrelated
to the failing handlers).
2026-05-11 21:15:49 +00:00
e8af1df261 fix(org): add per-workspace RequiredEnv preflight check (#232)
Before returning 201 on /org/import, verify that every RequiredEnv
declared at the workspace level is covered by either:

(a) a global secret key (already validated by the existing preflight)
(b) a key present in the workspace's .env files (org root .env +
    per-workspace <files_dir>/.env), matching the resolution order
    used by createWorkspaceTree at runtime

Previously, collectOrgEnv correctly walked all
tmpl.Workspaces[].RequiredEnv and added them to the global preflight
check, but loadConfiguredGlobalSecretKeys only checked global_secrets.
Workspace-specific .env files are injected into workspace_secrets AFTER
the 201 response, so an unsatisfied per-workspace RequiredEnv returned
201 and the workspace came up NOT CONFIGURED — breaking on every LLM
call with no signal to the operator.

Changes:
- org_import.go: add PerWorkspaceUnsatisfied struct +
  collectPerWorkspaceUnsatisfied (mirrors createWorkspaceTree's
  three-source .env resolution stack)
- org.go: after the global preflight block, call
  collectPerWorkspaceUnsatisfied if orgBaseDir != ""; return 412
  with per-workspace details before creating any workspaces
- org_workspace_required_env_test.go: 8 unit tests covering global
  coverage, .env coverage, missing keys, any-of groups, nested
  children, empty orgBaseDir, and multiple workspaces

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 21:15:49 +00:00
ef0164250d Merge pull request 'fix(sre): gate-check-v3 remove combined_state self-referential fallback' (#564) from sre/fix-gate-check-v3-combined-state-loop into main 2026-05-11 21:09:39 +00:00
6d66e854cf fix(sre): gate-check-v3 remove combined_state self-referential fallback
The `elif ci_state == "failure"` fallback in signal_6_ci was creating a
self-referential failure loop: gate-check posts failure → combined_state
becomes failure → script re-blocks → posts failure again.

Root cause: combined_state is Gitea's aggregate over ALL commit statuses,
including gate-check-v3's own prior result. Using it as a fallback verdict
driver means the script gates on its own output.

Fix: remove the combined_state fallback. check_statuses already excludes
gate-check (Bug-1 fix from PR #547). Use failing_required as the sole
CI gate. If no required checks are defined on the branch, return CLEAR
rather than re-using combined_state which includes our own status.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 21:07:03 +00:00
0006aa168a Merge pull request 'test(ci): add bats integration tests for review-check.sh (#540)' (#552) from ci/540-review-check-bats-tests into main 2026-05-11 20:58:04 +00:00
b575ab8266 Merge branch 'main' into ci/540-review-check-bats-tests 2026-05-11 20:45:21 +00:00
3974f88925 Merge pull request 'fix(ci): publish-runtime-autobump bump-and-tag always-skipped (internal#327)' (#563) from fix/publish-runtime-autobump-push-condition into main 2026-05-11 20:44:20 +00:00
8a7ca8ed33 fix(ci): publish-runtime-autobump bump-and-tag condition is always-skipped
`if: github.event.pull_request.base.ref == ''` was meant to gate
bump-and-tag to push events (not pull_request events which route to
pr-validate).  However, on a PR-merge push in Gitea Actions, the
pull_request context is still attached with base.ref='main', so the
condition always evaluated to false and bump-and-tag was permanently
skipped.

Fix: replace with `if: github.event_name == 'push'` which correctly
fires only on branch pushes after the PR is merged.

Also add `workflow_dispatch` trigger so the workflow can be manually
dispatched when the Gitea Actions API (/actions/*) is unreachable
(act_runner 404 on Gitea 1.22.6 — internal#327).

Closes internal#327.
2026-05-11 20:41:57 +00:00
43cc27ade5 test(ci): add bats-style integration tests for review-check.sh (#540)
Add 13 test cases (22 assertions) covering all key paths:
- open/closed PR handling
- non-author APPROVED review detection
- dismissed review exclusion
- team membership probe (204 member, 404 not-member, 403 fail-closed)
- missing GITEA_TOKEN exits 1
- CURL_AUTH_FILE mode 600 and header format
- jq filter correctness

Uses a Python HTTP fixture server that reads scenario from a temp
state dir, with a curl shim rewriting https://fixture.local/* to
http://127.0.0.1:{port}/*.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:33:14 +00:00
d53b7fecc0 Merge pull request 'ci: verify publish-runtime pipeline end-to-end (internal#327)' (#560) from ci/558-verify-publish-runtime-marker into main 2026-05-11 20:31:31 +00:00
a92839e39a ci: verify publish-runtime pipeline end-to-end (internal#327)
Marker file triggers workspace/** path filter on publish-runtime-autobump.yml,
exercising the full runtime publish pipeline after publish-runtime-bot
provisioning + stale-tag resolution.

Acceptance: bump-and-tag green, tag exists, publish-runtime.yml green,
PyPI updated, 9 template repos updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:26:55 +00:00
815dc7e1eb Merge pull request 'feat(ci): add OCI labels + buildx to publish workflow (#554)' (#559) from ci/554-oci-labels-publish-workflow into main 2026-05-11 20:15:31 +00:00
4045fa4fec feat(ci): add OCI labels + buildx to publish-workspace-server-image.yml (#554)
Add all 4 OCI provenance labels (RFC internal#229 §X step 4 PR-1):
- org.opencontainers.image.source — fixed from github.com → git.moleculesai.app
- org.opencontainers.image.revision — GIT_SHA
- org.opencontainers.image.created — ISO-8601 UTC timestamp
- molecule.workflow.run_id — GITHUB_RUN_ID

Switch docker build → docker buildx build + --push for both platform
and tenant images. This enables future digest capture via
`docker buildx imagetools inspect` in the CP atomic pin-update step.

Uses pinned docker/setup-buildx-action@v4.0.0 (same version as
publish-canvas-image.yml). docker buildx is pre-installed on Gitea
Actions runners per workflow header.

Part 1 of 2 for #554. Part 2 (atomic CP pin update via
POST /cp/admin/runtime-image-pins) depends on the CP endpoint being
available — tracked as PR-3 sub-issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:04:19 +00:00
982dac0904 Merge pull request 'fix(ci): ci-required-drift uses scoped mc-drift-bot token (mirrors controlplane)' (#557) from infra/drift-bot-token into main 2026-05-11 19:56:36 +00:00
02aed70291 fix(ci): ci-required-drift uses scoped mc-drift-bot token (mirrors controlplane)
Companion to molecule-controlplane PR#134. The `ci-required-drift`
detector calls GET /repos/{owner}/{repo}/branch_protections/{branch},
which Gitea 1.22.6 gates behind the repo-ADMIN role. The previous
fallback chain (`secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN`)
had only read or write — neither admin — so drift runs would 403.

Switch to `secrets.DRIFT_BOT_TOKEN`, owned by the new least-privilege
`mc-drift-bot` persona (team: drift-bot, permission: admin, scope:
read:repository,write:issue,read:organization, repos: this + CP).

Note: this repo's drift detector additionally requires the
`all-required` sentinel job in ci.yml, which is being added in PR#553.
After both PRs merge the drift workflow will be fully green.

Audit trail in internal#329. Sibling pattern: internal#327
(publish-runtime-bot). Per feedback_per_agent_gitea_identity_default.
2026-05-11 12:47:51 -07:00
9558b7d8fb Merge pull request 'feat(ci): add all-required sentinel job (RFC#219 Phase 4 / closes internal#286)' (#553) from infra/rfc-219-phase-4-all-required-sentinel into main 2026-05-11 19:45:59 +00:00
22a1752eb3 feat(ci): add all-required sentinel job (RFC#219 Phase 4 / closes internal#286)
Adds the `all-required` aggregator sentinel job to .gitea/workflows/ci.yml,
mirroring the molecule-controlplane Phase 2a impl. The sentinel needs every
non-event-gated job (changes, platform-build, canvas-build, shellcheck,
python-lint) and asserts result==success per dep so skipped-as-green can't
sneak through.

Two immediate effects:
  1. .gitea/workflows/ci-required-drift.yml stops hard-failing with exit 3
     on the missing sentinel (see comment lines 26-31 of that workflow).
  2. Branch protection can now (Step 5 follow-up, separate PR per
     feedback_never_admin_merge_bypass) point status_check_contexts at the
     single 'ci / all-required (pull_request)' name and CI churn underneath
     no longer requires protection edits.

NOT in this PR (deferred Step 5 follow-up):
  - PATCH branch_protections/main to add 'ci / all-required (pull_request)'
    to status_check_contexts — Owners-tier change, separate PR.
  - Mirror the same context into audit-force-merge.yml REQUIRED_CHECKS env
    (RFC §6 — drift detector F3 will flag if the two diverge).

Refs:
  - internal#219 (parent RFC, §2 Aggregator sentinel)
  - internal#286 (Phase 4 emergency bump — 2026-05-11 broken-merge evidence)
  - molecule-controlplane Phase 2a (reference impl, CP PR#112)
  - feedback_phantom_required_check_after_gitea_migration (incident class)
  - feedback_path_filtered_workflow_cant_be_required (sentinel has no
    paths: filter; fires on every push/PR per RFC §2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:44:52 +00:00
03da3a5ccd Merge pull request 'fix(ci)(security): revert gate-check-v3 checkout to base SHA (#551)' (#556) from ci/551-gate-checkout-trusted-ref into main 2026-05-11 19:41:41 +00:00
f36052b0ff fix(ci)(security): revert gate-check-v3 checkout to base SHA (internal#116 footgun)
pull_request_target runs with the repo's secrets-context. Checking out
github.event.pull_request.head.sha means a PR that modifies
tools/gate-check-v3/gate_check.py executes that modified script with
secrets. This is the canonical pull_request_target footgun.

Fix: checkout base SHA instead of head SHA for pull_request_target events.
Bug-1 (self-loop exclusion) and Bug-3 (403→exit0) from #547 are kept;
only the checkout-ref regresses to the pre-#547 base-branch behavior.

Refs: #551, internal#116, RFC#324 A4, feedback_pull_request_target_workflow_from_base

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 19:35:50 +00:00
6a49bb3a77 Merge pull request 'fix(ci)(security): stop token appearing in curl argv (#541)' (#549) from fix/541-token-argv-security into main 2026-05-11 19:32:05 +00:00
c7d5089586 fix(ci)(security): stop token appearing in curl argv (#541)
Token (especially long-lived RFC_324_TEAM_READ_TOKEN org-secret)
passed via -H "Authorization: token ${TOKEN}" is visible in
/proc/<pid>/cmdline and ps -ef on the runner host.

Fix: write token to a mode-600 temp file and pass it to curl via
-K (curl config file). The token never appears in the argv of any
process; curl reads it from the fd-backed file.

Affected:
- .gitea/scripts/review-check.sh: CURL_AUTH_FILE + -K on all 3 curl calls
- .gitea/workflows/qa-review.yml: privilege-check inline curl
- .gitea/workflows/security-review.yml: privilege-check inline curl

Fixes: #541
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 19:30:22 +00:00
ba6ddd3c19 Merge pull request 'fix(ci): gate-check-v3 — 3 bug fixes (self-loop, base ref, 403 comment)' (#547) from sre/fix-gate-check-v3-bugs into main 2026-05-11 19:26:55 +00:00
2843d6214c fix(ci): gate-check-v3 workflow uses PR branch (head) for script
The gate-check job now checks out github.event.pull_request.head.sha
instead of base.sha. This ensures that script fixes in PR branches
(e.g. the self-loop exclusion in signal_6_ci) are actually used when
evaluating that PR.

Security note: this job only runs the read-only gate-check script
(API reads + JSON stdout) and has continue-on-error: true, so
running PR-branch code here carries minimal risk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 19:26:23 +00:00
f5f27cb870 fix(ci): gate-check-v3 — 3 bug fixes
Bug 1 (self-referential failure loop, #544):
  signal_6_ci now filters out its own prior status from
  check_statuses before evaluating, preventing a
  gate-check-v3 → failure → re-reads self → failure cycle.

Bug 2 (hardcoded base branch, #544):
  signal_6_ci now uses the PR's actual base branch ref
  instead of hardcoded 'main'. Caller passes PR data to
  avoid redundant API call.

Bug 3 (comment-post 403, #543):
  Wrapped POST/PATCH comment-post in try/except for
  HTTPError 403. Logs a warning and skips posting when
  the token lacks write:repository scope — verdict still
  drives exit code correctly.

Also removed 3 lines of dead code at the end of
format_comment (unreachable return after prior return).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 19:26:23 +00:00
d5114fdbef Merge pull request 'fix(workspace): wrap delegate_task return with sanitize_a2a_result (CWE-117, closes #537)' (#542) from fix/537-cwe117-a2a-tools-sanitize into main 2026-05-11 19:14:34 +00:00
Molecule AI Core Platform Lead
6d5fd6be3e fix(workspace): wrap delegate_task return with sanitize_a2a_result (CWE-117, closes #537)
Issue #537: builtin_tools/a2a_tools.py:72 returns peer-sourced text from
delegate_task() without OFFSEC-003 sanitization. Sibling regression to #491 / #492
in a different code path (google-adk delegation surface).

Fix: import sanitize_a2a_result from _sanitize_a2a and wrap all 4 peer-controlled
return sites in delegate_task() — parts[0].text path, empty-parts str(result) path,
fallback str(result) path, and the error message path.

Closes #537.
2026-05-11 19:09:18 +00:00
2db72fccf6 Merge pull request 'fix(provisioner): fail-fast pre-flight check for docker+git in local-build mode' (#536) from sre/fix-localbuild-preflight into main 2026-05-11 19:03:27 +00:00
4fc941efd0 Merge branch 'main' into sre/fix-localbuild-preflight 2026-05-11 18:55:24 +00:00
ec63334580 Merge pull request 'feat(ci): add qa-review + security-review checks (RFC#324 Step 1 of 5)' (#535) from infra/rfc-324-workflow-add into main 2026-05-11 18:54:44 +00:00
9ee910c484 Merge branch 'main' into sre/fix-localbuild-preflight 2026-05-11 18:53:13 +00:00
d5abcf103b Merge branch 'main' into infra/rfc-324-workflow-add 2026-05-11 18:53:09 +00:00
ecbfa60f04 fix(ci): close fail-open in qa/security review checks (RFC#324 v1.3 §A1.1) + drop dead jq fallback
Addresses hongming-pc review #1421 on PR #535.

Blocker 1 (fail-open privilege gate):
  Original v1.2 design `if:`-gated the "Check out BASE" and "Evaluate"
  steps on the privilege-check step's `proceed` output. A non-collaborator
  commenting `/qa-recheck` produced proceed=false → both steps skipped →
  job conclusion = success → `qa-review / approved` context published as
  success with ZERO real APPROVE. Any visitor could green the gate.

  Fix per RFC#324 v1.3 §A1.1 option (b): drop privilege-gating of the
  eval entirely. The eval is read-only and idempotent (reads
  pulls/{N}/reviews + teams/{id}/members/{u}, both server-side state
  uninfluenced by who commented). Re-running on a non-collaborator's
  comment is harmless: if a real team-member APPROVE exists, the eval
  flips green; if not, it stays red. The privilege step is retained as
  a `::notice::` log line only (griefer-spotting), not a gate.

Non-blocking nit 5 (dead jq fallback):
  `apt-get install jq` (no root) and `curl -o /usr/local/bin/jq` (no
  write perm on uid-1001 rootless runner) both can't succeed. Per
  feedback_ci_runner_install_needs_writable_path + #391/#402, jq is
  already baked into runner-base. Replace the install dance with a
  clear `exit 1` + diagnostic so a missing-jq runner fails loud rather
  than confusingly.

Smoke-test (mocked Gitea API):
  no-approve         → exit 1  (gate red)
  self-approve       → exit 1  (gate red)
  dismissed-approve  → exit 1  (gate red)
  non-team-approve   → exit 1  (gate red)
  team-approve       → exit 0  (gate green)

Blocker 2 (A1-α event-suffix context-name verification) is the
smoke-PR's job and is flagged in a follow-up comment on this PR — does
not require workflow changes here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:45:59 -07:00
b95a20bb9e fix(provisioner): fix type mismatch in checkTool seam
checkToolOnPath must match the checkTool func(tool string) error
signature in LocalBuildOptions — Go does not allow assigning a function
with (string, error) returns to a func(string) error variable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:45:39 +00:00
9e5a7f2814 Merge pull request #534: fix(security): CWE-117 stderr-scrubbing for A2A error responses (#471)
Closes #471 (CWE-117 tier:high). Cherry-pick of #454 content. Supersedes #517 + #533 (closed in redo loop) + #534-prior-close.

Reviewed-by: hongming-pc2 (Owners-tier Five-Axis 1417, advisory)
Approved-by: claude-ceo-assistant (1418, managers counting whitelist)
Merged-by: claude-ceo-assistant
2026-05-11 18:34:31 +00:00
6f0001d04c fix(provisioner): fail-fast pre-flight check for docker+git in local-build mode
Before reaching the clone/build cold path, check that both `docker` and
`git` are on PATH. Previously, a missing `docker` would produce a
cryptic "exec: docker: executable file not found" from deep inside the
docker-has-tag or docker-build call. Now the error surfaces immediately
with:

  local-build: "docker" not found on PATH — local-build mode requires
  both docker and git; either install them, or set MOLECULE_IMAGE_REGISTRY
  so local-build is bypassed

The check runs before the cache-hit fast path too, since docker is used
for image inspect + tag even on a cache hit.

Adds checkTool seam to LocalBuildOptions so tests can inject a stub
(no-op in makeTestOpts; two new tests exercise the missing-tool path).

Fixes issue #529 option B.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:32:05 +00:00
e922351b78 feat(ci): add qa-review + security-review checks (RFC#324 Step 1 of 5)
Adds the two job-conclusion-as-status review-gate workflows that will
replace sop-tier-check (Step 3 of RFC#324). Both:

- Trigger on pull_request_target (opened/synchronize/reopened) for the
  initial status, plus issue_comment for /qa-recheck and /security-recheck
  slash-command refire (Gitea 1.22.6 doesn't refire on pull_request_review
  per go-gitea/gitea#33700).
- Use job name 'approved' so the published context is 'qa-review / approved'
  and 'security-review / approved' — NO POST /statuses, NO write:repository
  scope (RFC#324 v1.1 addendum A1-α).
- Privilege-check slash-command commenters via /repos/.../collaborators/{u}
  (NOT github.event.comment.author_association — that field doesn't exist
  on Gitea 1.22.6, defect #1 from sop-tier-refire).
- Run under pull_request_target's BASE-branch trust boundary; checkout
  pins to default_branch (never head.sha) and the workflows only HTTP-call
  the Gitea API; no PR-head code is executed (RFC#324 A4 + internal#116).

Shared evaluator lives at .gitea/scripts/review-check.sh, parameterized
by TEAM + TEAM_ID. Pass condition: at least one APPROVED, non-dismissed,
non-author review whose user is a member of the named team.

Branch-protection flip (Step 2) is intentionally NOT included in this PR.
That is Owners-tier and blocked on (a) the first run of these workflows
capturing the EXACT status-context names, and (b) RFC_324_TEAM_READ_TOKEN
provisioning (filed as internal#325).

Refs: internal#324, internal#325 (token follow-up).
Closes: nothing yet — Steps 2 and 3 must land before #292/#319/#321 close.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:30:34 -07:00
389613bb95 fix(tests): correct assert in test_sanitize_agent_error_stderr_and_exc
The exc class IS the tag when stderr is provided:
  "Agent error (ValueError): rate limit exceeded"

Fixes the incorrect assertion added in PR #517.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:21:19 +00:00