molecule-core/.gitea/workflows/lint-curl-status-capture.yml
dev-lead f5f96df5e3
All checks were successful
audit-force-merge / audit (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Check migration collisions / Migration version collision check (pull_request) Successful in 37s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 32s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
sop-tier-check / tier-check (pull_request) Successful in 9s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 39s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 2m0s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m3s
ci: port 9 gates/lints/audits to .gitea/workflows/ (RFC internal#219 §1, Category C-1)
Sweep companion to PR#372 (ci.yml port), PR#378 (Cat A), PR#379 (Cat B).

Ports 9 workflow files from .github/workflows/ to .gitea/workflows/.
Each port applies the four-surface audit pattern per
feedback_gitea_actions_migration_audit_pattern:

  1. YAML — dropped workflow_dispatch.inputs (Gitea 1.22.6 parser
     rejects them per feedback_gitea_workflow_dispatch_inputs_unsupported),
     dropped merge_group (no Gitea merge queue), workflow-level
     env.GITHUB_SERVER_URL pinned per feedback_act_runner_github_server_url.
  2. Cache — actions/setup-python cache:pip retained (works with Gitea
     1.22.x cache server). No actions/cache@v4 usage in this batch.
  3. Token — auto-injected GITHUB_TOKEN (Gitea-aliased) used; no
     custom dispatch tokens.
  4. Docs — top-of-file "Ported from .github/workflows/X.yml on
     2026-05-11 per RFC internal#219 §1 sweep" comment on every file.

Per RFC §1: each job has `continue-on-error: true` so surfaced
defects do not block PRs. Follow-up PR (not in this sweep's scope)
flips to `continue-on-error: false` after triage.

Files ported:

- block-internal-paths.yml — forbidden-path PR gate. Standard port;
  dropped merge_group + the merge_group-specific fetch step.
- cascade-list-drift-gate.yml — TEMPLATES vs manifest.json drift.
  Passes WORKFLOW=.gitea/workflows/publish-runtime.yml to the script
  (script's default is .github/... which Cat A removes).
- check-migration-collisions.yml — Postgres migration prefix
  collision gate. The collision script already supports Gitea via
  _gitea_api_url() / _gitea_token() — no script edit needed.
- lint-curl-status-capture.yml — workflow-bash anti-pattern lint.
  Scanner glob and SELF self-skip path retargeted to .gitea/workflows/**.yml.
- runtime-pin-compat.yml — PyPI-latest install + import smoke.
  Dropped workflow_dispatch + merge_group.
- runtime-prbuild-compat.yml — PR-built wheel import smoke.
  dorny/paths-filter@v4 replaced with inline `git diff` per PR#372
  pattern. detect-changes job + per-step if-gates retained.
- secret-pattern-drift.yml — canonical/consumer pattern set drift
  lint. on.paths references the .gitea/ canonical path. Also edits
  .github/scripts/lint_secret_pattern_drift.py CANONICAL_FILE
  constant from `.github/workflows/secret-scan.yml` to
  `.gitea/workflows/secret-scan.yml` (Cat A removes the .github/
  one).
- test-ops-scripts.yml — scripts/ unittest runner. Dropped merge_group.
- railway-pin-audit.yml — daily Railway env var drift detection.
  `actions/github-script@v9` blocks (which call github.rest.* — a
  GitHub-specific JS API) replaced with curl calls against the
  Gitea REST API (/api/v1/repos/.../issues|comments). Issue
  open/comment-on-repeat/close-on-clean semantics preserved.

This Cat C-1 PR groups the "safer" gates/lints/audits. Categories
C-2 (E2E) and C-3 (deploy/publish/janitors) ship in separate PRs.

The original .github/ files are left in place per RFC §1 (deletion
is a Phase 4 follow-up). They are silently dead — Gitea Actions in
molecule-core only registers workflows under .gitea/workflows/ —
but keeping them documented in-repo eases the diff-review.

DO NOT MERGE without orchestrator-dispatched Five-Axis review +
@hongmingwang chat-go.

Cross-links:
- RFC: molecule-ai/internal#219
- Companion: PR#372 (ci.yml port), PR#378 (Cat A), PR#379 (Cat B)
- Runbook: runbooks/gitea-actions-migration-checklist.md (Cat B PR)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:18:11 -07:00

105 lines
4.8 KiB
YAML

name: Lint curl status-code capture
# Ported from .github/workflows/lint-curl-status-capture.yml on 2026-05-11
# per RFC internal#219 §1 sweep.
#
# Differences from the GitHub version:
# - on.paths and the lint scanner target .gitea/workflows/**.yml (the
# active Gitea workflow directory) instead of .github/workflows/**.yml
# (which the rest of this sweep is emptying out).
# - Self-skip path updated to the .gitea/ version of this file.
# - Dropped `merge_group:` trigger.
# - Workflow-level env.GITHUB_SERVER_URL set per
# feedback_act_runner_github_server_url.
# - `continue-on-error: true` on the job (RFC §1 contract).
#
# Pins the workflow-bash anti-pattern that produced "HTTP 000000" on the
# 2026-05-04 redeploy-tenants-on-main run for sha 2b862f6:
#
# HTTP_CODE=$(curl ... -w '%{http_code}' ... || echo "000")
#
# When curl exits non-zero (connection reset -> 56, --fail-with-body 4xx/5xx
# -> 22), the `-w '%{http_code}'` already wrote a status to stdout — usually
# "000" for connection failures or the actual code for HTTP errors. The
# `|| echo "000"` then fires AND appends ANOTHER "000" to the captured
# stdout, producing values like "000000" or "409000" that fail string
# comparisons against "200" while looking superficially right.
#
# Same class of bug the synth-E2E §7c gate hit twice (PRs #2779/#2783 +
# #2797). Memory: feedback_curl_status_capture_pollution.md.
on:
pull_request:
paths: ['.gitea/workflows/**']
push:
branches: [main, staging]
paths: ['.gitea/workflows/**']
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
scan:
name: Scan workflows for curl status-capture pollution
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Find curl ... -w '%{http_code}' ... || echo "000" subshells
run: |
set -uo pipefail
# Multi-line aware: look for `$(curl ... -w '%{http_code}' ... || echo "000")`
# subshell where the entire command-substitution wraps a curl that
# ends with `|| echo "000"`. Must distinguish from the SAFE shape
# `$(cat tempfile 2>/dev/null || echo "000")` — `cat` with a missing
# tempfile produces empty stdout, no pollution.
python3 <<'PY'
import os, re, sys, glob
BAD_FILES = []
# Match the buggy substitution across newlines: $(curl ... -w '%{http_code}' ... || echo "000")
# The `\\n` is the bash line-continuation that lets curl flags span lines.
# We collapse continuation lines first, then look for the single-line bad pattern.
PATTERN = re.compile(
r'\$\(\s*curl\b[^)]*-w\s*[\'"]%\{http_code\}[\'"][^)]*\|\|\s*echo\s+"000"\s*\)',
re.DOTALL,
)
# Self-skip: this lint workflow contains the literal anti-pattern in
# its own docstring — that's intentional, not a bug.
SELF = ".gitea/workflows/lint-curl-status-capture.yml"
for f in sorted(glob.glob(".gitea/workflows/*.yml")):
if f == SELF:
continue
with open(f) as fh:
content = fh.read()
# Collapse bash line-continuations (\\\n + leading whitespace)
# into a single logical line so the regex can see the full
# curl invocation as one chunk.
flat = re.sub(r'\\\s*\n\s*', ' ', content)
for m in PATTERN.finditer(flat):
BAD_FILES.append((f, m.group(0)[:120]))
if not BAD_FILES:
print("OK No curl-status-capture pollution patterns detected")
sys.exit(0)
print(f"::error::Found {len(BAD_FILES)} curl-status-capture pollution site(s):")
for f, snippet in BAD_FILES:
print(f"::error file={f}::Curl status-capture pollution: '|| echo \"000\"' inside a $(curl ... -w '%{{http_code}}' ...) subshell. On non-2xx or connection failure, curl's -w writes a status, then exits non-zero, then the || echo appends another '000' — producing 'HTTP 000000' or '409000' that fails comparisons silently. Fix: route -w into a tempfile so the exit code can't pollute stdout. See memory feedback_curl_status_capture_pollution.md.")
print(f" matched: {snippet}...")
print()
print("Fix template:")
print(' set +e')
print(' curl ... -w \'%{http_code}\' >code.txt 2>/dev/null')
print(' set -e')
print(' HTTP_CODE=$(cat code.txt 2>/dev/null)')
print(' [ -z "$HTTP_CODE" ] && HTTP_CODE="000"')
sys.exit(1)
PY