Two artifacts that unblock the parked follow-ups from #59:
1. scripts/edge-429-probe.sh (closes the "operator-blocked" status of
#62). An operator without CF/Vercel dashboard access can reproduce
a canvas-sized burst against a tenant subdomain and read each 429's
response shape — workspace-server bucket overflow (JSON body +
X-RateLimit-* headers) is distinguishable from CF (cf-ray) and
Vercel (x-vercel-id) by inspection of the report (sketched
below). Read-only,
parallel via background subshells (no GNU parallel dependency),
no credential use. Smoke-tested against example.com end-to-end.
2. docs/engineering/ratelimit-observability.md (closes the
"metric-blocked" status of #64). The existing
molecule_http_requests_total{path,status} counter + X-RateLimit-*
response headers already cover #64's acceptance criterion ("watch
metrics for two weeks"). The runbook collects the PromQL queries,
a decision tree for the re-tune (keep / per-tenant override /
change default), an alert rule template, and a hard "do not roll
ad-hoc per-bucket-key exposure" note (in-memory map includes
SHA-256 of bearer tokens — exposing it is a security review
surface, file a follow-up if needed).
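A minimal sketch of what the two artifacts automate; the tenant host,
workspace id, and Prometheus endpoint below are illustrative
placeholders, not the scripts' real interfaces:

  # Classify a single 429 by its response shape, as the probe report does
  # for every request in the burst:
  #   X-RateLimit-* headers + JSON body  -> workspace-server bucket overflow
  #   cf-ray header                      -> CF edge
  #   x-vercel-id header                 -> Vercel edge
  curl -sD - -o /dev/null "https://tenant.moleculesai.app/workspaces/ws-1/activity" \
    | grep -iE '^(HTTP|x-ratelimit-|retry-after|cf-ray|x-vercel-id)'

  # One of the runbook's PromQL shapes: 429 rate per path over the watch window.
  curl -sG "http://prometheus.internal:9090/api/v1/query" \
    --data-urlencode 'query=sum by (path) (rate(molecule_http_requests_total{status="429"}[5m]))'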
Neither artifact changes runtime behaviour. Pure operational tooling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 2 of #61. Replaces the 60s setInterval poll that fanned out
across every visible workspace fetching `?type=delegation&limit=500`
with:
1. One bootstrap fan-out on mount (or on visible-ID-set change),
same shape as before — preserves the 60-min look-back history.
2. useSocketEvent subscription to ACTIVITY_LOGGED — every event
with activity_type=delegation + method=delegate from a visible
workspace appends to a local rolling buffer, edges are re-derived
via the existing buildA2AEdges helper.
3. showA2AEdges toggle off: clears edges + buffer.
No interval poll. The visibleIdsKey selector gate that fixed the
2026-05-04 render-loop incident is preserved — peer-discovery /
status-flip writes still don't trigger a wasteful re-bootstrap.
Steady-state HTTP traffic from this overlay drops from N req/min
(N visible workspaces × 1 cycle/min) to 0 outside of mount + visible-
ID-set-change bootstraps. Live update latency drops from up to 60s
to ~10ms.
Bootstrap race-aware: any WS arrivals that landed in the buffer
during the fetch await are preserved by id-dedup-with-fetched-first
ordering. No row is double-counted; no row is lost during in-flight
updates.
Test changes:
- 27 existing tests pass unchanged (buildA2AEdges purity preserved,
component visibility/visibleIdsKey/error-swallow behaviour
preserved).
- 6 new WS-subscription tests:
- NO 60s polling after bootstrap (clock advance fires nothing)
- WS push for delegation updates edges with NO HTTP call
- WS push for non-delegation activity_type ignored
- WS push for delegate_result ignored (mirrors buildA2AEdges
method filter)
- WS push from hidden workspace ignored
- WS push while showA2AEdges=false ignored
Mutation-tested:
- drop activity_type filter → "non-delegation" test fails
- drop method===delegate filter → "delegate_result" test fails
- drop visible-ws membership filter → "hidden workspace" test fails
Full canvas suite: 1395 passing, 0 failing. tsc clean.
No API or schema change. ACTIVITY_LOGGED event shape unchanged.
The /workspaces/:id/activity HTTP endpoint stays — used for bootstrap.
Hostile self-review (three weakest spots):
1. Bootstrap fetches up to 500 rows × N workspaces. Worst-case
buffer ~3000 entries before window-prune. Acceptable: window-
prune runs on every recomputeAndPush, buildA2AEdges aggregates
to at most N² edges. Real-world usage stays well under both.
2. WS handler re-arms on every bootstrap dependency change
(visibleIds change). useSocketEvent's ref-based pattern means
the bus subscription stays stable across renders, but the
handler closure re-captures bootstrap each time. Side effect:
fine — handler invocation just calls recomputeAndPush which is
idempotent.
3. delegate_result rows arriving over WS are silently dropped.
Acceptable: the existing buildA2AEdges already filters them out
at aggregation time (avoids double-counting); pre-filtering at
the WS handler is the correct mirror — keeps the bus path and
the bootstrap path consistent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 of #61. Replaces the 30s setInterval poll with:
1. One bootstrap fan-out on mount (cap of 3 retained from the
2026-05-04 fix), gives the initial recent-comms window without
waiting for live events.
2. useSocketEvent subscription to ACTIVITY_LOGGED — every event
with a comm-overlay-relevant activity_type from a visible online
workspace prepends to the rendered list.
3. Re-bootstrap on visibility-toggle re-open so the snapshot is
fresh after a long collapsed period.
No interval poll. Inherits the singleton ReconnectingSocket's
reconnect / backoff / health-check guarantees via useSocketEvent.
Steady-state HTTP traffic from this overlay drops from ~6 req/min
(3 ws × 2 cycles/min) to 0 outside of mount/visibility-toggle
bootstraps. Live updates arrive within ~10ms of the server insert
instead of after up to 30s.
Test changes:
- Bootstrap fan-out cap of 3 — kept (was the cadence test's role
pre-#61)
- 30s cadence test — replaced with "no interval polling" test
that pins the absence of any cadence-driven HTTP after bootstrap
- Visibility gate test — extended to verify both: no fetches while
closed, AND re-bootstrap on re-open
- WS subscription tests (new):
- WS push extends rendered list with NO HTTP call
- WS push for offline workspace ignored
- WS push for non-comm activity_type ignored
- WS push while collapsed ignored
- non-ACTIVITY_LOGGED events ignored
Mutation-tested:
- drop visibility gate → visibility test fails
- drop activity_type filter → "non-comm activity_type" test fails
- drop workspace online-set filter → "offline workspace" test fails
Full canvas suite: 1393 passing, 0 failing. tsc clean.
No API or schema change. ACTIVITY_LOGGED event shape pinned by
existing socket-events tests.
Hostile self-review (three weakest spots):
1. Sustained WS outage shows stale comms until visibility-toggle
re-bootstrap. Acceptable: the singleton socket already auto-
reconnects and the comm overlay isn't a critical-path surface.
2. Bootstrap on visibility-toggle costs another 3 HTTP calls each
re-open. Acceptable: visibility-toggle is a deliberate user
action, not a tight loop.
3. The WS handler reads the latest `nodes` via nodesRef rather
than re-subscribing on node changes. By design — the bus
listener stays bound for the component lifetime to avoid the
"tear-down storm" pattern A2ATopologyOverlay's comment warns
about (ref-based current-state lookup, stable subscription).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of `Auto-sync main → staging / sync-staging (push)`
failing every push to main since the GitHub→Gitea migration:
The workflow assumed a GitHub `merge_queue` ruleset on staging
(blocking direct push) and used `gh pr create` + `gh pr merge
--auto` to land sync via the queue. On Gitea this fails at the
`gh pr create` step with `HTTP 405 Method Not Allowed
(https://git.moleculesai.app/api/graphql)` — Gitea exposes no
GraphQL endpoint, and the GitHub CLI cannot create PRs against
Gitea.
Verified failure mode in run 1117/job 0 (token logs at
/tmp/log2.txt, run target /molecule-ai/molecule-core/actions/
runs/1117/jobs/0). The merge step succeeded and pushed
auto-sync/main-1e1f4d63; the PR step failed with the 405. So
every main push left an orphan auto-sync/* branch and a red CI
status, with no PR to land it.
Fix: the staging branch protection on Gitea
(`enable_push: true`, `push_whitelist_usernames:
[devops-engineer]`) already permits direct push from the
devops-engineer persona. Drop the entire merge-queue PR
architecture and replace with (sketched after this list):
1. Checkout staging with secrets.AUTO_SYNC_TOKEN
(devops-engineer persona token, NOT founder PAT —
`feedback_per_agent_gitea_identity_default`).
2. `git fetch origin main` + ff-merge or no-ff merge.
3. `git push origin staging` directly.
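A sketch of the replacement job body, assuming actions/checkout with
ref: staging and token: AUTO_SYNC_TOKEN followed by one shell step
(exact merge flags may differ in the committed workflow):

  # Fast-forward when possible, otherwise a normal merge commit; then
  # push staging directly (allowed by the devops-engineer push_whitelist).
  git fetch origin main
  git merge --ff-only origin/main \
    || git merge --no-ff origin/main -m "Auto-sync: merge main into staging"
  git push origin staging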
The AUTO_SYNC_TOKEN repo secret already exists (created
2026-05-07 14:00 alongside the staging push_whitelist update).
Workflow name + job name unchanged → required-check name
`Auto-sync main → staging / sync-staging (push)` keeps the
same context, no branch-protection edits needed.
Rejected alternatives (documented in workflow header):
- Reuse PR architecture via Gitea REST: ~80 LOC of API
plumbing for no benefit; direct push works.
- GH_HOST=git.moleculesai.app: still calls /api/graphql,
same 405; doesn't fix the root issue.
- Custom JS action: external dep for a 5-line `git push`.
Header comment in the workflow now documents:
- What this workflow does (SSOT for staging advancing).
- Why direct push (GitHub merge_queue → Gitea push_whitelist).
- Identity and token (anti-bot-ring per saved memory).
- Failure modes A–D with operator runbook for each.
- Loop safety (push to staging doesn't fire push:main → no
recursion).
Verification plan: this fix-PR's merge to main is itself the
trigger; watch the workflow run on the merge commit and on
one follow-up trigger commit, expect both green.
Refs: failing run https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/1117/jobs/0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous comment said "all share one IP bucket" — accurate before
the keyFor refactor, slightly stale after it. The dev-mode rationale
(bucket fills fast, blanks the page on a single-user dev box) is
unchanged; only the bucket-key flavour text needed updating.
Doc-only follow-up from #60's hostile self-review #3. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #59.
Symptom: /workspaces/:id/activity returns 429 with rate-limit-exceeded
on hongming.moleculesai.app whenever multiple workspaces are visible
in the canvas. Single-tab, single-user, well within the documented
600 req/min budget — but every request collapsed into one bucket.
Root cause: workspace-server's RateLimiter keyed buckets on
c.ClientIP(). After issue #179 turned off proxy-header trust
(SetTrustedProxies(nil), correctly closing the XFF spoofing hole),
c.ClientIP() returns the TCP RemoteAddr — which in production is the
upstream proxy (Caddy on per-tenant EC2; CP/Vercel on the SaaS plane).
Every browser tab + every canvas consumer + every poll loop for every
tenant collapsed into one bucket.
Fix: bucket key derivation moves into a single keyFor helper that
mirrors the SSOT pattern of:
- molecule-controlplane/internal/middleware/ratelimit.go (org > user > IP)
- this package's own MCPRateLimiter (token-hash via tokenKey)
Priority: X-Molecule-Org-Id header → SHA-256(Authorization Bearer)
→ ClientIP. Token values are kept hashed in the bucket map so the
in-memory state can't become a token dump.
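A hypothetical post-deploy spot-check (org ids and workspace id are
placeholders; endpoint and host from the symptom above): two tenants
behind the same upstream proxy should now see independent
X-RateLimit-Remaining counters.

  for org in org-alpha org-beta; do
    # Each org id should land in its own bucket instead of the shared
    # proxy-IP bucket that caused the #59 collapse.
    curl -sD - -o /dev/null -H "X-Molecule-Org-Id: ${org}" \
      "https://hongming.moleculesai.app/workspaces/ws-1/activity" \
      | grep -i '^x-ratelimit'
  done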
Tests:
- TestKeyFor_OrgIdHeaderTrumpsBearerAndIP — priority order
- TestKeyFor_BearerTokenWhenNoOrgId — middle tier + raw-token leak pin
- TestKeyFor_IPFallbackWhenNoOrgIdNoBearer — anon probe path
- TestRateLimit_TwoOrgsSameIP_IndependentBuckets — load-bearing
regression (issue #59) — two tenants behind same upstream proxy
must not share a bucket
- TestRateLimit_TwoTokensSameIP_IndependentBuckets — same shape
for the per-tenant Caddy box
- TestRateLimit_SameOrgDifferentTokens_SharedBucket — counter-pin:
rotating tokens within one org must NOT bypass the org's quota
- TestRateLimit_Middleware_RoutesThroughKeyFor — AST gate, mirrors
the SSOT gates established in #36/#10/#12
Mutation-tested:
- strip org-id branch in keyFor → 3 tests fail
- strip bearer-token branch → 2 tests fail
- reintroduce direct c.ClientIP() in Middleware → 3 tests fail
(including the AST gate)
Existing tests pass unchanged: dev-mode fail-open, X-RateLimit-*
headers (#105), Retry-After on 429 (#105), XFF anti-spoofing (#179).
No schema/API change. 429 response body and X-RateLimit-* headers
unchanged. RATE_LIMIT env var semantics unchanged.
Hostile self-review (three weakest spots) is in the issue body:
1. one-shot Docker-inspect cost is now bucket-key derivation cost
(string compare + SHA-256 of bearer); single-digit microseconds.
2. X-Molecule-Org-Id is unvalidated at the rate-limiter layer —
spoofing is closed by tenant SG + CP front; documented in
keyFor's docstring with the conditions under which to revisit.
3. cpProv-style SaaS surface is out of scope; CP's own limiter
handles that hop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit finding: every workflow that emits a required-status-check name
on molecule-core's branch protection (apply.sh's STAGING_CHECKS +
MAIN_CHECKS) ALREADY uses the safe always-runs-with-conditional-steps
shape — Platform/Canvas/Python/Shellcheck in ci.yml, Canvas tabs E2E
in e2e-staging-canvas.yml, E2E API Smoke in e2e-api.yml, PR-built
wheel in runtime-prbuild-compat.yml, the codeql Analyze matrix, and
the always-on Secret scan + Detect changes. No production drift to
fix today.
Adds a regression-guard so the next path-filter / matrix refactor /
workflow rename can't silently re-introduce the bug shape called out
in saved memory feedback_branch_protection_check_name_parity:
"Path filters … silently break branch protection because no job
emits the protected sentinel status when path-filter returns false."
New tools:
- tools/branch-protection/check_name_parity.sh — extracts every
required check name from apply.sh's heredocs, then for each name
classifies the owning workflow as safe (no top-level paths:) /
safe (per-step if-gates without top-level paths:) / unsafe
(top-level paths: without per-step if-gates) / unsafe-mix
(top-level paths: WITH per-step if-gates — the workflow may still
skip entirely on path exclusion, leaving the gates dormant) /
missing (no emitter at all). Special-cases codeql.yml's matrix-
expanded `Analyze (${{ matrix.language }})`.
- tools/branch-protection/test_check_name_parity.sh — 6 unit tests
covering each classification: safe, unsafe-path-filter, missing,
safe-with-per-step-gates, unsafe-mix, matrix-expansion. Each test
builds a synthetic apply.sh + workflow file in a tmpdir, invokes
the script, and asserts on exit code + stderr substring. Per
feedback_assert_exact_not_substring the assertions pin specific
classifications, not just non-zero exit.
Wired into branch-protection-drift.yml so every PR touching
.github/workflows/** runs the parity check; the existing daily
schedule covers between-PR drift. The check is cheap (~1s) and runs
without the admin token — it only reads files in the checkout. A
self-test step runs the unit tests on every invocation, so a
regression in the script can't false-pass on production.
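Both scripts can be run locally from a checkout (read-only; roughly
what the new drift-workflow step invokes):

  # Parity check: non-zero exit plus a classification message on stderr
  # for any required check name whose emitter is unsafe/unsafe-mix/missing.
  bash tools/branch-protection/check_name_parity.sh

  # Self-tests: build synthetic apply.sh + workflow fixtures in a tmpdir
  # and pin each classification (safe, unsafe, missing, unsafe-mix, matrix).
  bash tools/branch-protection/test_check_name_parity.sh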
Per BSD-vs-GNU portability hygiene: heredoc-marker extraction stays
in plain awk + sed (no gawk-only `match()` array form), grep regex
avoids `^` anchor for `if:` lines because real workflows use
` - if:` with the `-` step-marker between leading spaces and
`if:` (the original anchor missed every workflow's per-step gates).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
---
PR #35 marked `continue-on-error: true` at the JOB level (correct YAML),
but Gitea Actions 1.22.6 does NOT propagate job-level continue-on-error
to the commit-status API — every matrix leg still posts `failure`. That
keeps OVERALL=failure on every push to main + staging and blocks the
auto-promote signal even when every other gate is green.
Worse: the underlying CodeQL run never actually worked on Gitea. The
github/codeql-action/init@v4 step calls api.github.com bundle endpoints
(CLI download + query packs + telemetry) that Gitea does NOT proxy.
Confirmed via live-tested run 1d/3101 on operator host:
2026-05-07T20:55:17 ::group::Run Initialize CodeQL
with: languages: ${{ matrix.language }}
queries: security-extended
2026-05-07T20:55:36 ::error::404 page not found
2026-05-07T20:55:50 Failure - Main Initialize CodeQL
2026-05-07T20:55:51 skipping Perform CodeQL Analysis (main skipped)
2026-05-07T20:55:51 :⚠️:No files were found at sarif-results/go/
The SARIF artifact upload was already a no-op (warning above) — the
analyze step never wrote anything because init failed. So nothing of
value is being lost by stubbing this out.
What
----
- Convert the workflow to a single-step stub that emits success per
matrix language (go, javascript-typescript, python); see the sketch
after this list.
- Keep workflow `name: CodeQL` exactly (auto-promote-staging.yml
line 67 keys on it as a workflow_run gate).
- Keep job name template `Analyze (${{ matrix.language }})` and the
3-leg matrix exactly (commit-status context names + branch
protection + #144 required-check-name parity).
- Keep all four triggers (push / pull_request / merge_group /
schedule) so merge_group required-checks parity holds.
- Drop the codeql-action steps, the Autobuild step, the SARIF parse
step, and the upload-artifact step — all four of those are now
dead code (init can never succeed against Gitea's API surface).
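Per matrix leg, the surviving job body is essentially one shell step
(illustrative; the YAML around it keeps the workflow name, job-name
template, 3-leg matrix, and all four triggers exactly as listed above):

  # Advisory-only stub: CodeQL init cannot reach the api.github.com
  # bundle endpoints from Gitea, so just report success for this leg.
  echo "CodeQL stub (advisory per #156): analysis skipped on Gitea for this language"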
Policy
------
Per Hongming decision 2026-05-07 (#156): CodeQL is ADVISORY, not
blocking, until a Gitea-compatible SAST pipeline lands. The header
of the new workflow file documents this decision + lists the three
re-enable options (self-hosted Semgrep, Sonatype, GitHub mirror)
plus the compensating controls in place (secret-scan, block-internal-
paths, lint-curl-status-capture, branch-protection-drift).
Closes #156. Touches #142 (no capital-M Molecule-AI refs in this
file — already lowercase per e01077be).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harness-replays.yml builds tenant-alpha + tenant-beta via tests/harness/
compose.yml using workspace-server/Dockerfile.tenant. Post-#173, that
Dockerfile expects .tenant-bundle-deps/{workspace-configs-templates,
org-templates,plugins} pre-cloned at the build context root. Sister
PR #38 added the pre-clone step to publish-workspace-server-image.yml
but missed harness-replays.yml.
Symptoms:
- main run #892 (2026-05-07T20:28:53Z): COPY
.tenant-bundle-deps/plugins -> failed to calculate checksum ...
not found.
- staging run #964 (2026-05-07T20:41:52Z): hits the OLD in-image
clone path (staging hasn't picked up the Dockerfile.tenant
refactor yet via auto-sync) and fails on
'fatal: could not read Username for https://git.moleculesai.app'
when cloning the first private workspace-template-* repo.
Fix: add the same Pre-clone step to harness-replays.yml,
mirroring publish-workspace-server-image.yml. Uses AUTO_SYNC_TOKEN
(devops-engineer persona PAT) per
feedback_per_agent_gitea_identity_default.
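Illustrative shape of the added step (the committed version is copied
from publish-workspace-server-image.yml; the org path, token-in-URL
auth form, and one-repo-per-directory mapping below are assumptions,
while the directory names are the ones Dockerfile.tenant's COPY expects):

  mkdir -p .tenant-bundle-deps
  for dep in workspace-configs-templates org-templates plugins; do
    git clone \
      "https://devops-engineer:${AUTO_SYNC_TOKEN}@git.moleculesai.app/${ORG}/${dep}.git" \
      ".tenant-bundle-deps/${dep}"
  done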
Once auto-sync main->staging unblocks (sister agent fixing the
7-file conflict in flight), staging will inherit both this workflow
fix AND the Dockerfile.tenant refactor atomically.
Refs: #168, #173
Refs Task #165 (Class D AUTO_SYNC_TOKEN plumbing).
main and staging diverged after the 2026-05-06 GitHub-org suspension
because Class D / Class G / feature work landed on staging while
unrelated CI fixes (#34-47, ECR auth-inline, buildx→docker, pre-clone
manifest deps) landed straight on main. Both branches edited the
same workflow files, so every push to main triggered an Auto-sync
run that aborted at `git merge --no-ff origin/main` with 7 content
conflicts:
- .github/workflows/canary-verify.yml (URL: github.com → Gitea)
- .github/workflows/ci.yml (3 URL refs)
- .github/workflows/publish-runtime.yml
  (cascade: HTTP repo-dispatch → Gitea push)
- .github/workflows/publish-workspace-server-image.yml
  (drop AWS-action steps; ECR auth is inline)
- .github/workflows/retarget-main-to-staging.yml (URL)
- manifest.json (lowercase org slug + add mock-bigorg from main)
- scripts/clone-manifest.sh (keep main's MOLECULE_GITEA_TOKEN auth
  path + drop awk-tolower since manifest is now lowercase)
Resolution: union — staging's post-suspension Gitea/ECR migrations win
on URL/policy edits; main's additive work (mock-bigorg manifest entry,
inline ECR auth, MOLECULE_GITEA_TOKEN basic-auth) is preserved on top.
After this lands, staging is a strict superset of main, so the next
auto-sync run on a push to main will be a clean fast-forward / no-op.
The auto-sync workflow on main also picks up staging's AUTO_SYNC_TOKEN
swap (Class D #26) for free, fixing the latent layer-2 push-auth issue.
Verified locally:
- bash -n scripts/clone-manifest.sh
- python -c 'yaml.safe_load(...)' on each touched workflow
- python -c 'json.load(open(manifest.json))' (21 plugins, 9 templates,
7 org_templates)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run #1010 (post-#46) succeeded all the way to push but failed with
"repository molecule-ai/platform does not exist" — the platform image
ECR repo had never been created (only platform-tenant existed).
Created the repo via:
aws ecr create-repository --region us-east-2 \
--repository-name molecule-ai/platform \
--image-scanning-configuration scanOnPush=true
This is a one-line workflow comment to satisfy the path-filter and
re-run the publish workflow against the now-existing repo. Closes #173
properly this time — pre-clone + inline ECR auth + ECR repo all in
place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI run #987 (post-#45) showed `docker push` from shell still hits
"no basic auth credentials" — `aws-actions/amazon-ecr-login@v2`
writes auth to a step-scoped DOCKER_CONFIG that doesn't carry across
to the next shell step on Gitea Actions.
Fix: drop both `aws-actions/configure-aws-credentials@v4` and
`aws-actions/amazon-ecr-login@v2`. Run `aws ecr get-login-password |
docker login` inline in the same shell step as `docker build` +
`docker push`. AWS creds come from secrets via env vars, ECR token
is fresh per-step (12h validity is plenty), config.json lives in the
same shell process — auth state is guaranteed.
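The inline shape, roughly (ECR account id, image repo, and tag are
placeholders; AWS creds arrive as env vars from secrets as described
above):

  # Everything in one shell step, so the docker config written by
  # `docker login` is still present when `docker push` runs.
  ECR="123456789012.dkr.ecr.us-east-2.amazonaws.com"
  aws ecr get-login-password --region us-east-2 \
    | docker login --username AWS --password-stdin "${ECR}"
  docker build -t "${ECR}/${IMAGE_REPO}:${IMAGE_TAG}" .
  docker push "${ECR}/${IMAGE_REPO}:${IMAGE_TAG}"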
This is the operator-host manual approach mapped 1:1 into CI.
The runner-base image already has aws-cli + docker (verified locally).
Closes #173 (fifth piece — and final; this matches the manual flow
exactly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI run #946 (post-#43) confirmed `driver: docker` doesn't fix the ECR
push 401 either: buildx CLI inside the runner container talks to the
operator-host docker daemon (mounted socket), but the daemon doesn't
see the runner's ECR auth state, and the runner's buildx CLI doesn't
attach the auth header in a way the daemon accepts.
Drop buildx + build-push-action entirely. Plain `docker build` +
`docker push` from the runner container works because both use the
SAME docker socket + the SAME runner-container config.json (populated
by `aws ecr get-login-password | docker login` from amazon-ecr-login).
Trade-off: lose multi-arch support. We only ship linux/amd64 tenant
images today, so this is fine. If multi-arch becomes a requirement
later, we can revisit (likely with `docker buildx create
--driver=remote` pointing at an external buildkit, but that's
substantial infra work; not worth it for a single-arch shop).
Closes #173 (fourth piece — and hopefully last; this matches the
operator-host manual approach exactly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empty-shape commit on a tests/harness/** path to trigger the harness-replays
workflow's path-filter on staging, verifying that:
- PR #40 (Class G #168) migrated all explicit github.com/Molecule-AI URL refs
- PR #42 (Class G #168 followup) migrated the indirect clone-manifest.sh + manifest.json forms
After this run, harness-replays should get past the previously-failing
'fatal: could not read Username for https://github.com' clone-manifest step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>