Commit Graph

4979 Commits

Author SHA1 Message Date
Molecule AI Core Platform Lead
550711596e trigger 2026-05-10 02:47:46 +00:00
Molecule AI Core Platform Lead
3f738e6ab5 Merge remote-tracking branch 'origin/main' into trig-223 2026-05-10 02:47:46 +00:00
12bb73d000 fix(dockerfile-tenant): chown /org-templates to canvas user so !external resolver can mkdir cache
Root cause:
  Dockerfile.tenant chowns /canvas /platform /memory-plugin /migrations
  to canvas:canvas (line ~119) but not /org-templates. The image is
  built as root, so COPY-ed templates inherit root:root 0755. The platform
  binary then runs as the canvas user (uid 1000) because of the USER
  directive on line ~124, so when the !external resolver
  (org_external.go, internal#77 / task #222) tries
  os.MkdirAll("/org-templates/<tmpl>/.external-cache/<repo>") on first
  import, mkdir(2) returns EACCES and the import handler returns 400
  "org template expansion failed" (org.go:592). The user-facing error
  is generic; only the server log carries:

    Org import: refusing import: !include expansion failed:
    !external at line 156: fetch git.moleculesai.app/molecule-ai/molecule-dev-department@v1.0.0:
    mkdir cache root: mkdir /org-templates/molecule-dev/.external-cache: permission denied

Repro:
  Tenant staging-cplead-2 (canary AWS 004947743811, image SHA
  a93c4ce17725...). POST /org/import {"dir":"molecule-dev"} returns 400
  while POST /org/import {"dir":"free-beats-all"} returns 201 — only
  templates with !external trip the bug.

Fix:
  Add /org-templates to the chown -R argv. One-line change. Same
  ownership shape as the other writable platform-state dirs.

Why this is safe for prod:
  * The platform binary already needs read access to /org-templates,
    so canvas:canvas owning it doesn't widen any attack surface.
  * /org-templates is image-resident, not bind-mounted; chown applies
    inside the image layers and prod tenants get the fix on next
    image rebuild + redeploy. Live prod tenants are unaffected until
    the next deploy (no orgs currently using !external in prod —
    molecule-dev consumers are all internal staging).

Verification:
  After hand-applying the chown live (docker exec --user 0 ... chown -R
  canvas:canvas /org-templates/molecule-dev), POST /org/import
  {"dir":"molecule-dev"} returns 201 with 39 workspaces; cp-lead +
  CP-BE + CP-QA + CP-Security all reach status=online within ~2 min.

Refs:
  internal#77 — !external RFC (Phase 3a)
  task #222 — resolver PR (introduced the unflagged-permission
              dependency this fixes)
  Live incident 2026-05-10 — staging-cplead-2 import failed,
              chown-on-host workaround in place pending image rebuild
2026-05-09 19:40:52 -07:00
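
The root cause in the commit above comes down to POSIX mode bits: a root:root 0755 directory is not writable by uid 1000. The following is a minimal sketch of that check, not the platform's code. `can_create_in` is a hypothetical helper, the gid of 1000 is an assumption, and ACLs, supplementary groups, and capabilities are ignored.

```python
import stat


def can_create_in(dir_uid: int, dir_gid: int, dir_mode: int,
                  proc_uid: int, proc_gid: int) -> bool:
    """Return True if the process may create an entry inside the directory.

    Simplified POSIX rule: creating a subdirectory needs both write and
    search (execute) permission on the parent directory.
    """
    if proc_uid == 0:
        return True  # root bypasses the mode bits
    if proc_uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif proc_gid == dir_gid:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


# Before the fix: /org-templates is root:root 0755, process runs as uid 1000.
before_fix = can_create_in(0, 0, 0o755, 1000, 1000)
# After chown -R canvas:canvas: same mode, but the process now owns the dir.
after_fix = can_create_in(1000, 1000, 0o755, 1000, 1000)
```

Under these assumptions `before_fix` is False (the mkdir would fail with EACCES) and `after_fix` is True, which matches the behavior change the commit describes.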
5480d40bc1 Merge pull request 'fix(workspace): add SSRF validation before writing external workspace URL' (#221) from fix/ssrf-admin-create-url-validation into main 2026-05-10 02:34:41 +00:00
Molecule AI Core Platform Lead
89fadb0dac Merge remote-tracking branch 'origin/main' into trig-221 2026-05-10 02:34:32 +00:00
Molecule AI Core Platform Lead
bbf0b164e5 trigger 2026-05-10 02:34:18 +00:00
Molecule AI Core Platform Lead
b97bda13e9 Merge remote-tracking branch 'origin/main' into trig-221 2026-05-10 02:34:18 +00:00
4474ddc189 fix(workspace): add SSRF validation before writing external workspace URL
Issue #212: POST /workspaces with runtime=external and a URL wrote the
URL directly to the DB without running validateAgentURL (the same check
that registry.go:324 applies to the heartbeat path). An attacker with
AdminAuth could register a workspace URL pointing at a cloud metadata endpoint
(169.254.169.254) and exfiltrate IAM credentials when the platform
fires pre-restart drain signals.

Changes:
- workspace.go: add validateAgentURL(payload.URL) guard before the
  UPDATE at line 386. 400 on unsafe URL, no DB write occurs.
- workspace_test.go: add 3 regression tests:
  - TestWorkspaceCreate_ExternalURL_SSRFSafe: safe public URL → 201
  - TestWorkspaceCreate_ExternalURL_SSRFMetadataBlocked: 169.254.169.254 → 400
  - TestWorkspaceCreate_ExternalURL_SSRFLoopbackBlocked: 127.0.0.1 → 400
  Both unsafe tests assert zero DB calls (the handler rejects before
  any transaction).

Ref: issue #212.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:30:18 +00:00
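
The kind of check the commit above adds can be illustrated with a short sketch. This is not the repository's validateAgentURL, whose rules are not shown here. `is_url_safe` is a hypothetical stand-in that only inspects IP literals, and a production check would also need to resolve hostnames and vet the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_safe(url: str) -> bool:
    """Reject URLs whose host is a loopback, link-local, or private IP literal."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addr = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # A DNS name rather than an IP literal. This sketch accepts it;
        # a real guard would resolve the name and check every address.
        return True
    return not (
        addr.is_loopback
        or addr.is_link_local   # covers 169.254.169.254
        or addr.is_private
        or addr.is_unspecified
        or addr.is_multicast
        or addr.is_reserved
    )
```

A handler following the commit's description would call such a guard before any DB write and answer 400 when it returns False.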
50dc31cd66 Merge pull request 'feat(workspace): add static .github-token fallback to git credential helper' (#219) from infra/add-github-token-static-fallback into main 2026-05-10 02:24:59 +00:00
Molecule AI Core Platform Lead
9ad8d8407d Merge remote-tracking branch 'origin/main' into trig-219 2026-05-10 02:24:27 +00:00
a7278abad4 Merge pull request 'docs(runbook): add admin-auth.md covering test-token route lockdown' (#220) from infra/add-admin-auth-runbook into main 2026-05-10 02:24:02 +00:00
Molecule AI Core Platform Lead
14afa58606 trigger 2026-05-10 02:23:40 +00:00
Molecule AI Core Platform Lead
4615298eca Merge remote-tracking branch 'origin/main' into trig-220 2026-05-10 02:23:40 +00:00
Molecule AI Core Platform Lead
7386d9cbea Merge remote-tracking branch 'origin/main' into trig-219 2026-05-10 02:23:26 +00:00
Molecule AI Core Platform Lead
5f5ee4038c trigger 2026-05-10 02:23:08 +00:00
Molecule AI Core Platform Lead
afb4bb1f81 Merge remote-tracking branch 'origin/main' into trig-219 2026-05-10 02:23:08 +00:00
b5d9f13ab1 docs(runbook): add admin-auth.md covering test-token route lockdown
Issue #214: documents the MOLECULE_ENV=production requirement for
staging/prod tenants to lock the /admin/workspaces/:id/test-token route.
Also adds a startup INFO log in main.go when the route is enabled, so
operators can confirm the setting in boot logs without having to probe
the endpoint directly.

Ref: issue #214.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:20:30 +00:00
c22e45049e Merge pull request 'test(canvas): add tests for StatusBadge, ValidationHint, Spinner' (#218) from test/canvas-context-search-tests into main 2026-05-10 02:18:04 +00:00
Molecule AI Core Platform Lead
6bf901b391 Merge remote-tracking branch 'origin/main' into trig-218 2026-05-10 02:17:26 +00:00
7ae3ee786f feat(workspace): add static .github-token fallback to git credential helper
Adds a 4th fallback step to the token chain (cache > API > env > static)
so workspace git/gh operations survive a platform outage without requiring
a restart or platform-side fix. Addresses the 2026-05-08 incident where
every workspace lost git+gh auth simultaneously when the
/github-installation-token endpoint returned 500.

Operator places a PAT in ${CONFIGS_DIR:-/configs}/.github-token
(no root needed — /configs is agent-writable). Both _fetch_token
(git credential helper path) and _refresh_gh (gh CLI daemon path)
gain the static fallback so git and gh both recover post-incident.

Purely additive: the existing cache > API > env chain is unchanged.
Empty static file is rejected (whitespace-stripped before use).
The static path never writes the cache, so API-issued tokens take over
transparently on the next refresh cycle once the endpoint is back online.

Ref: issue #140.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:17:22 +00:00
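
The fallback order described above (cache > API > env > static) can be sketched as follows. This is an illustration under stated assumptions, not the actual credential helper: `resolve_token` and `read_static_token` are hypothetical names, and the `GITHUB_TOKEN` variable name is an assumption, since the commit only says "env".

```python
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def read_static_token(configs_dir: str) -> Optional[str]:
    """Read <configs_dir>/.github-token; missing or empty files yield None."""
    try:
        token = (Path(configs_dir) / ".github-token").read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return token or None  # whitespace-only content is rejected


def resolve_token(cached: Optional[str],
                  fetch_from_api: Callable[[], Optional[str]],
                  env: Mapping[str, str]) -> Tuple[Optional[str], str]:
    """Walk the chain cache > API > env > static and report which step won."""
    if cached:
        return cached, "cache"
    try:
        token = fetch_from_api()
    except Exception:
        token = None  # e.g. the token endpoint answered 500
    if token:
        return token, "api"  # the only result a caller should write to the cache
    token = env.get("GITHUB_TOKEN", "").strip()  # variable name is an assumption
    if token:
        return token, "env"
    # Mirrors ${CONFIGS_DIR:-/configs}: fall back when unset or empty.
    token = read_static_token(env.get("CONFIGS_DIR") or "/configs")
    if token:
        return token, "static"  # never cached, so the API wins again later
    return None, "none"
```

Returning the winning step alongside the token lets the caller cache only API results, which is one way to get the "static path never writes the cache" property the commit describes.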
Molecule AI Core Platform Lead
9313fc82ac trigger 2026-05-10 02:17:06 +00:00
Molecule AI Core Platform Lead
a4c314bea5 Merge remote-tracking branch 'origin/main' into trig-218 2026-05-10 02:17:05 +00:00
6b3ab63bc0 test(canvas): add tests for StatusBadge, ValidationHint, Spinner
StatusBadge: all 3 status variants, aria-label, role=status, config class names.
ValidationHint: error/valid/neutral states, warning icon, valid icon, class names.
Spinner: sm/md/lg size classes, aria-hidden, motion-safe:animate-spin.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:15:02 +00:00
2fb6044d96 Merge pull request 'test(canvas): add component tests for SearchDialog and ContextMenu' (#216) from test/canvas-context-search-tests into main 2026-05-10 02:13:53 +00:00
Molecule AI Core Platform Lead
df7a7560cf Merge remote-tracking branch 'origin/main' into trig-216 2026-05-10 02:13:27 +00:00
Molecule AI Core Platform Lead
0ee6317c0c trigger 2026-05-10 02:13:02 +00:00
f7833f1643 Merge pull request 'fix(ci): migrate canary-verify from GHCR to ECR + add POST route smoke tests' (#217) from infra/fix-canary-verify-ecr-migration into main 2026-05-10 02:12:47 +00:00
Molecule AI Core Platform Lead
862819dc65 Merge remote-tracking branch 'origin/main' into trig-217 2026-05-10 02:12:37 +00:00
Molecule AI Core Platform Lead
67310828e7 trigger 2026-05-10 02:12:21 +00:00
af5406d29e fix(ci): migrate canary-verify from GHCR to ECR + add POST route smoke tests
Root cause of issue #213: canary-verify.yml still used GHCR
(ghcr.io/molecule-ai/platform-tenant) while
publish-workspace-server-image.yml migrated to ECR on 2026-05-07
(commit 10e510f5). Canary smoke tests were silently testing a stale
GHCR image while actual staging/prod tenants ran the ECR build.
The POST /org/import and POST /workspaces routes were missing from
the ECR binary (likely a Docker layer-caching artefact during the
staging push window) but smoke tests passed because they never tested
the ECR image at all.

Changes:
- canary-verify.yml: migrate promote-to-latest from GHCR crane tag
  ops to the CP redeploy-fleet endpoint (same mechanism as
  redeploy-tenants-on-main.yml). The wait-for-canaries step already
  read SHA from the running tenant /health (registry-agnostic), so
  no change needed there. Pre-fix promote step used `crane tag` against
  GHCR, which was never updated after the ECR migration.
- redeploy-tenants-on-main.yml: update stale comments that reference
  GHCR to reflect ECR; replace the 30s GHCR CDN propagation wait
  with a no-op comment (ECR has no CDN cache to wait for).
- scripts/canary-smoke.sh: add POST /org/import and POST /workspaces
  smoke tests (steps 6-8). These assert HTTP 401 unauthenticated
  (proves AdminAuth enforced AND the route is compiled in — 404 would
  mean route missing from binary). GET /workspaces was already covered;
  POST was the untested gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:10:12 +00:00
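
The reasoning behind the new smoke steps (401 proves the route exists and auth is enforced, 404 means the route is missing from the binary) can be written down as a small classifier. This is a sketch in Python rather than the shell used by scripts/canary-smoke.sh; `classify_unauthenticated_status` and its labels are hypothetical.

```python
def classify_unauthenticated_status(status: int) -> str:
    """Interpret the HTTP status of an unauthenticated POST to an admin route."""
    if status == 401:
        return "ok"                 # route compiled in and AdminAuth enforced
    if status == 404:
        return "route-missing"      # route absent from the binary
    if 200 <= status < 300:
        return "auth-not-enforced"  # request succeeded without credentials
    return "unexpected"             # anything else needs a human look


def smoke_passed(statuses: dict) -> bool:
    """All probed routes must classify as 'ok' for the canary to pass."""
    return all(classify_unauthenticated_status(s) == "ok" for s in statuses.values())
```

For example, `smoke_passed({"POST /org/import": 401, "POST /workspaces": 404})` is False, which is the case the pre-fix smoke tests could not detect because they never exercised the POST routes.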
2549c4cbcc test(canvas): add component tests for SearchDialog and ContextMenu
SearchDialog: Cmd+K/Ctrl+K shortcut, Escape close, input focus via rAF,
text filtering by name/role/status, arrow-key navigation, Enter select,
aria-combobox/listbox/option attributes, footer workspace count.

ContextMenu: null guard, node header, outside-click/Escape/Tab close,
conditional items (online vs offline vs paused), team items, dividers,
danger Delete styling, keyboard navigation, Pause/Resume API calls.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:09:43 +00:00
511bc7c01d Merge pull request 'test(canvas): add component tests for OnboardingWizard and PurchaseSuccessModal' (#215) from test/canvas-onboarding-purchase-modal-tests into main 2026-05-10 01:53:55 +00:00
Molecule AI Core Platform Lead
ee5648b3d1 trigger 2026-05-10 01:53:43 +00:00
b23ca65d35 test(canvas): add component tests for OnboardingWizard and PurchaseSuccessModal
OnboardingWizard: visibility gates, 4-step flow, skip/dismiss,
localStorage persistence, progress bar, aria-live announcements,
auto-advance from welcome→api-key on nodes change.

PurchaseSuccessModal: URL param gating, portal rendering,
item name display, 5s auto-dismiss (fake timers), backdrop/Escape
close, replaceState URL stripping, aria-modal/focus management.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:50:29 +00:00
2893c4c2aa Merge pull request 'feat(ci): port publish-runtime.yml to .gitea/workflows/ (issue #206)' (#211) from ci/port-publish-runtime-to-gitea-actions into main 2026-05-10 01:29:41 +00:00
Molecule AI Core Platform Lead
b04e7b39a0 Merge remote-tracking branch 'origin/main' into trig-211 2026-05-10 01:29:23 +00:00
Molecule AI Core Platform Lead
66d3bb9f2f trigger 2026-05-10 01:29:10 +00:00
25d3b1a2f3 feat(ci): port publish-runtime.yml to .gitea/workflows/ (issue #206)
publish-runtime.yml was dead on Gitea Actions because Gitea reads
.gitea/workflows/, not .github/workflows/ (the GitHub Actions paths are
ignored). Issue #206 identified this as one of three bugs blocking the
runtime versioning pipeline.

Changes:
- Add .gitea/workflows/publish-runtime.yml (canonical Gitea version)
  - Drop environment: + id-token: write (Gitea has no OIDC/OAuth)
  - Replace pypa/gh-action-pypi-publish with twine upload using PYPI_TOKEN secret
  - Replace github.ref_name with ${GITHUB_REF#refs/tags/} (Gitea exposes github.ref)
  - Drop merge_group trigger (Gitea has no merge queue)
  - Drop staging branch trigger (staging branch does not exist)
  - Cascade step unchanged (DISPATCH_TOKEN + Gitea API already compatible)
- Add DEPRECATED notice to .github/workflows/publish-runtime.yml

Required secrets (repo Settings → Actions → Variables and Secrets):
  PYPI_TOKEN: PyPI API token for molecule-ai-workspace-runtime
  DISPATCH_TOKEN: Gitea PAT with write:repo on template repos (already used)

Closes #206 (publish-runtime Gitea port).
2026-05-10 01:26:13 +00:00
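
The `${GITHUB_REF#refs/tags/}` expansion mentioned above strips the shortest matching prefix and leaves the value untouched when the prefix is absent. A Python equivalent, shown only to make that behavior explicit (`tag_from_ref` is a hypothetical name and the tag value is made up):

```python
def tag_from_ref(ref: str) -> str:
    """Mimic the shell expansion ${GITHUB_REF#refs/tags/}."""
    prefix = "refs/tags/"
    # Like the shell form, a ref without the prefix is returned unchanged.
    return ref[len(prefix):] if ref.startswith(prefix) else ref


example = tag_from_ref("refs/tags/v1.2.3")
```

Because a non-tag ref passes through unchanged, a workflow relying on this expansion would likely want to run only on tag pushes, or validate the result before publishing.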
9b53b70b48 Merge pull request 'test(canvas): add component tests for ThemeToggle and BundleDropZone' (#210) from test/canvas-component-tests-2 into main 2026-05-10 01:22:25 +00:00
Molecule AI Core Platform Lead
85a8ab428c Merge remote-tracking branch 'origin/main' into trig-210 2026-05-10 01:22:17 +00:00
Molecule AI Core Platform Lead
124e1a6f04 trigger 2026-05-10 01:22:03 +00:00
Molecule AI Core Platform Lead
02c2226e46 Merge remote-tracking branch 'origin/main' into trig-210 2026-05-10 01:22:02 +00:00
9452123d78 Merge pull request 'feat(workspace-server): pre-restart A2A drain signal (core#125)' (#207) from feat/a2a-pre-restart-drain-125 into main 2026-05-10 01:18:51 +00:00
Molecule AI Core Platform Lead
422d621e3c Merge remote-tracking branch 'origin/main' into trig-207 2026-05-10 01:18:43 +00:00
Molecule AI Core Platform Lead
27a94f0b79 trigger 2026-05-10 01:18:30 +00:00
a3e437b43f Merge pull request 'fix(ci): replace dorny/paths-filter with shell-based git diff (Gitea Actions compatibility)' (#208) from infra/fix-harness-replays-paths-filter-and-failure into main 2026-05-10 01:18:25 +00:00
Molecule AI Core Platform Lead
9c35057c98 trigger 2026-05-10 01:18:14 +00:00
ad1a4a2d49 test(canvas): add component tests for ThemeToggle and BundleDropZone
- ThemeToggle.test.tsx (13 tests): renders radiogroup with 3 options,
  aria radiogroup/radio semantics, aria-checked per option, setTheme
  calls on click, custom className prop
- BundleDropZone.test.tsx (11 tests): hidden file input + keyboard
  accessibility (WCAG 2.1.1), drag-over state, import success/error
  toast, auto-clear timeouts (3s error, 4s success), importing
  status indicator, file input reset on re-select

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:18:10 +00:00
d0126662c7 docs: cycle report 2026-05-10
Cycle summary:
- Assigned: core#125 (feat: preserve in-flight A2A messages across restart)
- Implemented: Phase 1 of #125 — pre-restart drain signal
- Opened: PR #207
- Reviewed: PR #140 (static-token fallback, approved)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:15:07 +00:00
796201e09f fix(ci): replace dorny/paths-filter with shell-based git diff (Gitea Actions compatibility)
dorny/paths-filter is GitHub-Actions-only and does not work correctly on
Gitea Actions: it reports no file changes regardless of what files were
modified, so the harness-replays workflow silently skips on Gitea even
when workspace-server/** or canvas/** files change.

Verified: zero harness-replays statuses on PR #188 and #168 (both changed
workspace-server files) vs GitHub Actions where the same workflow
correctly detects changes.

Replace with a shell-based approach that uses:
- github.event.pull_request.base.sha  (Gitea + GitHub: base-branch tip for PRs)
- github.event.before                (Gitea + GitHub: previous tip for pushes)
- git diff --name-only <BASE> github.sha (portable git, works on both platforms)

Also adds detect-changes.debug output so future no-op passes show WHY
the workflow decided to skip, and the first real run on Gitea will
confirm the diff detection is working.

Closes #141 (followup: root-cause fix still TBD — failure logs
inaccessible via Gitea Actions API).
2026-05-10 01:11:45 +00:00
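
The shell-based detection described above has three parts: pick a base SHA from the event payload, list changed files with `git diff --name-only`, and match them against the watched paths. A minimal sketch in Python follows; the actual workflow uses shell, and the function names and the reduced event shape here are assumptions.

```python
import subprocess
from fnmatch import fnmatchcase
from typing import List, Optional


def pick_base(event: dict) -> Optional[str]:
    """Choose the diff base: PR base SHA if present, else the pre-push tip."""
    pr = event.get("pull_request")
    if pr:
        return pr["base"]["sha"]
    return event.get("before")


def changed_files(base: str, head: str) -> List[str]:
    """Run `git diff --name-only <base> <head>` in the current repository."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def touches(paths: List[str], patterns: List[str]) -> bool:
    """True if any changed path matches any watched glob pattern."""
    return any(fnmatchcase(p, pat) for p in paths for pat in patterns)
```

A caller would skip the job only when `touches(changed_files(base, head), ["workspace-server/**", "canvas/**"])` is False, and should treat a missing base as "run anyway" so a detection failure cannot hide a real change.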