chore: reconcile main → staging post-suspension divergence (Task #165 followup) (#48)
Some checks failed
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 10s
Block internal-flavored paths / Block forbidden paths (push) Successful in 12s
CI / Detect changes (push) Successful in 16s
E2E API Smoke Test / detect-changes (push) Successful in 21s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 21s
Handlers Postgres Integration / detect-changes (push) Successful in 25s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 29s
Harness Replays / detect-changes (push) Successful in 30s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 25s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 19s
CI / Shellcheck (E2E scripts) (push) Successful in 23s
SECRET_PATTERNS drift lint / Detect SECRET_PATTERNS drift (push) Successful in 1m3s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 53s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 2m9s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 2m9s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 2m10s
Handlers Postgres Integration / Handlers Postgres Integration (push) Failing after 1m24s
Harness Replays / Harness Replays (push) Failing after 52s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 2m31s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 3m50s
CI / Canvas (Next.js) (push) Successful in 6m19s
CI / Canvas Deploy Reminder (push) Has been skipped
publish-workspace-server-image / build-and-push (push) Successful in 7m5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5m38s
CI / Platform (Go) (push) Failing after 7m32s
CI / Python Lint & Test (push) Successful in 7m30s
commit e3904ebb42

.github/workflows/block-internal-paths.yml (vendored, 4 changes)
@@ -1,7 +1,7 @@
 name: Block internal-flavored paths
 
 # Hard CI gate. Internal content (positioning, competitive briefs, sales
-# playbooks, PMM/press drip, draft campaigns) lives in Molecule-AI/internal —
+# playbooks, PMM/press drip, draft campaigns) lives in molecule-ai/internal —
 # this public monorepo must never re-acquire those paths. CEO directive
 # 2026-04-23 after a fleet-wide audit found 79 internal files leaked here.
 #
@@ -135,7 +135,7 @@ jobs:
 echo "::error::Forbidden internal-flavored paths detected:"
 printf "$OFFENDING"
 echo ""
-echo "These paths belong in Molecule-AI/internal, not this public repo."
+echo "These paths belong in molecule-ai/internal, not this public repo."
 echo "See docs/internal-content-policy.md for canonical locations."
 echo ""
 echo "If your file is genuinely public-facing (e.g. a blog post"
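The detection step itself sits earlier in this workflow and is not part of the hunks above; only the error messages appear here. A minimal sketch of the shape of such a gate, in the same bash style as the workflow's run steps — the forbidden prefixes and the base ref below are illustrative assumptions, not the workflow's real list:

    # Sketch only — the real path list and diff range live outside this hunk.
    FORBIDDEN_RE='^(internal/|positioning/|competitive/)'      # assumed prefixes
    BASE_REF="${GITHUB_BASE_REF:-main}"                        # assumed base ref
    OFFENDING=$(git diff --name-only "origin/${BASE_REF}...HEAD" | grep -E "$FORBIDDEN_RE" || true)
    if [ -n "$OFFENDING" ]; then
      echo "::error::Forbidden internal-flavored paths detected:"
      printf '%s\n' "$OFFENDING"
      echo "These paths belong in molecule-ai/internal, not this public repo."
      exit 1
    fi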

.github/workflows/ci.yml (vendored, 2 changes)
@@ -165,7 +165,7 @@ jobs:
 # Strip the package-import prefix so we can match .coverage-allowlist.txt
 # entries written as paths relative to workspace-server/.
 # Handle both module paths: platform/workspace-server/... and platform/...
-rel=$(echo "$file" | sed 's|^github.com/Molecule-AI/molecule-monorepo/platform/workspace-server/||; s|^github.com/Molecule-AI/molecule-monorepo/platform/||')
+rel=$(echo "$file" | sed 's|^github.com/molecule-ai/molecule-monorepo/platform/workspace-server/||; s|^github.com/molecule-ai/molecule-monorepo/platform/||')
 
 if echo "$ALLOWLIST" | grep -qxF "$rel"; then
 echo "::warning file=workspace-server/$rel::Critical file at ${pct}% coverage (allowlisted, #1823) — fix before expiry."

.github/workflows/codeql.yml (vendored, 3 changes)
@@ -43,6 +43,9 @@ permissions:
 jobs:
 analyze:
 name: Analyze (${{ matrix.language }})
+# CodeQL set to advisory (non-blocking) on Gitea Actions — Hongming decision 2026-05-07 (#156).
+# Findings still emit as SARIF artifacts; failing CodeQL run does not block PR merge.
+continue-on-error: true
 runs-on: ubuntu-latest
 timeout-minutes: 45
 

.github/workflows/pr-guards.yml (vendored, 2 changes)
@@ -19,4 +19,4 @@ permissions:
 
 jobs:
 disable-auto-merge-on-push:
-uses: Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
+uses: molecule-ai/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main

.github/workflows/publish-runtime.yml (vendored, 4 changes)
@@ -25,7 +25,7 @@ name: publish-runtime
 # 3. Publishes to PyPI via the PyPA Trusted Publisher action (OIDC).
 # No static API token is stored — PyPI verifies the workflow's
 # OIDC claim against the trusted-publisher config registered for
-# molecule-ai-workspace-runtime (Molecule-AI/molecule-core,
+# molecule-ai-workspace-runtime (molecule-ai/molecule-core,
 # publish-runtime.yml, environment pypi-publish).
 #
 # After publish: the 8 template repos pick up the new version on their
@@ -166,7 +166,7 @@ jobs:
 
 - name: Publish to PyPI (Trusted Publisher / OIDC)
 # PyPI side is configured: project molecule-ai-workspace-runtime →
-# publisher Molecule-AI/molecule-core, workflow publish-runtime.yml,
+# publisher molecule-ai/molecule-core, workflow publish-runtime.yml,
 # environment pypi-publish. The action mints a short-lived OIDC
 # token and exchanges it for a PyPI upload credential — no static
 # API token in this repo's secrets.

.github/workflows/publish-workspace-server-image.yml (vendored, 229 changes)
@@ -37,6 +37,7 @@ on:
 - 'workspace-server/**'
 - 'canvas/**'
 - 'manifest.json'
+- 'scripts/**'
 - '.github/workflows/publish-workspace-server-image.yml'
 workflow_dispatch:
 
@@ -74,33 +75,87 @@ jobs:
 # plugin was dropped + workspace-server/Dockerfile no longer
 # COPYs it.
 
-- name: Configure AWS credentials for ECR
-# GHCR was the pre-suspension target; the molecule-ai org on
-# GitHub got swept 2026-05-06 and ghcr.io/molecule-ai/* is no
-# longer reachable. Post-suspension target is the operator's
-# ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/
-# molecule-ai/*), which already hosts platform-tenant +
-# workspace-template-* + runner-base images. AWS creds come
-# from the AWS_ACCESS_KEY_ID/SECRET secrets bound to the
-# molecule-cp IAM user. Closes #161.
-uses: aws-actions/configure-aws-credentials@v4
-with:
-aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
-aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-aws-region: us-east-2
-
-- name: Log in to ECR
-id: ecr-login
-uses: aws-actions/amazon-ecr-login@v2
-
-- name: Set up Docker Buildx
-uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
+# ECR auth + buildx setup are now inline in each build step
+# below (Task #173, 2026-05-07).
+#
+# Why moved inline: aws-actions/configure-aws-credentials@v4 +
+# aws-actions/amazon-ecr-login@v2 + docker/setup-buildx-action
+# all left auth state in places that the actual `docker push`
+# couldn't see on Gitea Actions:
+# - The actions wrote to a step-scoped DOCKER_CONFIG path
+# that didn't survive into subsequent shell steps.
+# - Buildx couldn't bridge the runner container ↔
+# operator-host docker daemon auth gap (401 on the
+# docker-container driver, "no basic auth credentials"
+# with the action-driven login).
+#
+# Doing AWS+ECR auth inline (`aws ecr get-login-password |
+# docker login`) in the same shell step as `docker build` +
+# `docker push` is the operator-host manual approach, mapped
+# 1:1 into CI. Auth state is guaranteed to live in the env that
+# `docker push` actually runs from.
+#
+# Post-suspension target is the operator's ECR org
+# (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*),
+# which already hosts platform-tenant + workspace-template-* +
+# runner-base images. AWS creds come from the
+# AWS_ACCESS_KEY_ID/SECRET secrets bound to the molecule-cp
+# IAM user. Closes #161.
 
 - name: Compute tags
 id: tags
 run: |
 echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
 
+# Pre-clone manifest deps before docker build (Task #173 fix).
+#
+# Why pre-clone: post-2026-05-06, every workspace-template-* repo on
+# Gitea (codex, crewai, deepagents, gemini-cli, langgraph) plus all
+# 7 org-template-* repos are private. The pre-fix Dockerfile.tenant
+# ran `git clone` inside an in-image stage, which had no auth path
+# — every CI build failed with "fatal: could not read Username for
+# https://git.moleculesai.app". For weeks, every workspace-server
+# rebuild required a manual operator-host push. Now we clone in the
+# trusted CI context (where AUTO_SYNC_TOKEN is naturally available)
+# and Dockerfile.tenant just COPYs from .tenant-bundle-deps/.
+#
+# Token shape: AUTO_SYNC_TOKEN is the devops-engineer persona PAT
+# (see /etc/molecule-bootstrap/agent-secrets.env). Per saved memory
+# `feedback_per_agent_gitea_identity_default`, every CI surface uses
+# a per-persona token, never the founder PAT. clone-manifest.sh
+# embeds it as basic-auth (oauth2:<token>) for the duration of the
+# clones, then strips .git directories — the token never enters
+# the resulting image.
+#
+# Idempotent: if a re-run finds populated dirs, clone-manifest.sh
+# skips them; safe to retrigger via path-filter or workflow_dispatch.
+- name: Pre-clone manifest deps
+env:
+MOLECULE_GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
+run: |
+set -euo pipefail
+if [ -z "${MOLECULE_GITEA_TOKEN}" ]; then
+echo "::error::AUTO_SYNC_TOKEN secret is empty — register the devops-engineer persona PAT in repo Actions secrets"
+exit 1
+fi
+mkdir -p .tenant-bundle-deps
+bash scripts/clone-manifest.sh \
+manifest.json \
+.tenant-bundle-deps/workspace-configs-templates \
+.tenant-bundle-deps/org-templates \
+.tenant-bundle-deps/plugins
+# Sanity-check counts so a silent partial clone fails fast
+# instead of producing a half-empty image.
+ws_count=$(find .tenant-bundle-deps/workspace-configs-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
+org_count=$(find .tenant-bundle-deps/org-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
+plugins_count=$(find .tenant-bundle-deps/plugins -mindepth 1 -maxdepth 1 -type d | wc -l)
+echo "Cloned: ws=$ws_count org=$org_count plugins=$plugins_count"
+# Counts are derived from manifest.json (9 ws / 7 org / 21
+# plugins as of 2026-05-07). If manifest.json grows but the
+# clone step regresses silently, the find above caps at the
+# actual disk state — but clone-manifest.sh's own EXPECTED vs
+# CLONED check (line ~95) is the authoritative fail-fast.
 
 # Canary-gated release flow:
 # - This step always publishes :staging-<sha> + :staging-latest.
 # - On staging push, staging-CP picks up :staging-latest immediately
@@ -126,58 +181,82 @@ jobs:
 # were running pre-RFC code. Adding the staging trigger above closes
 # that gap. Earlier 2026-04-24 incident: a static :staging-<sha> pin
 # drifted 10 days behind staging — same class of bug, different
-# mechanism.
-- name: Build & push platform image to GHCR (staging-<sha> + staging-latest)
-uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
-with:
-context: .
-file: ./workspace-server/Dockerfile
-platforms: linux/amd64
-push: true
-tags: |
-${{ env.IMAGE_NAME }}:staging-${{ steps.tags.outputs.sha }}
-${{ env.IMAGE_NAME }}:staging-latest
-cache-from: type=gha
-cache-to: type=gha,mode=max
-# GIT_SHA bakes into the Go binary via -ldflags so /buildinfo
-# returns it at runtime — see Dockerfile + buildinfo/buildinfo.go.
-# This is the same value as the OCI revision label below; passing
-# it twice is intentional, the OCI label is for registry tooling
-# while /buildinfo is for the redeploy verification step.
-build-args: |
-GIT_SHA=${{ github.sha }}
-labels: |
-org.opencontainers.image.source=https://github.com/${{ github.repository }}
-org.opencontainers.image.revision=${{ github.sha }}
-org.opencontainers.image.description=Molecule AI platform (Go API server) — pending canary verify
+# mechanism. ECR repo molecule-ai/platform created 2026-05-07.
+# Build + push platform image with plain `docker` (no buildx).
+# GIT_SHA bakes into the Go binary via -ldflags so /buildinfo
+# returns it at runtime — see Dockerfile + buildinfo/buildinfo.go.
+# The OCI revision label below carries the same value for registry
+# tooling; the duplication is intentional.
+- name: Build & push platform image to ECR (staging-<sha> + staging-latest)
+env:
+IMAGE_NAME: ${{ env.IMAGE_NAME }}
+TAG_SHA: staging-${{ steps.tags.outputs.sha }}
+TAG_LATEST: staging-latest
+GIT_SHA: ${{ github.sha }}
+REPO: ${{ github.repository }}
+AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+AWS_DEFAULT_REGION: us-east-2
+run: |
+set -euo pipefail
+# ECR auth in-step so config.json is populated in the same
+# shell env that runs `docker push`. ECR get-login-password
+# tokens last 12h, plenty for a single-step build+push.
+ECR_REGISTRY="${IMAGE_NAME%%/*}"
+aws ecr get-login-password --region us-east-2 | \
+docker login --username AWS --password-stdin "${ECR_REGISTRY}"
+docker build \
+--file ./workspace-server/Dockerfile \
+--build-arg GIT_SHA="${GIT_SHA}" \
+--label "org.opencontainers.image.source=https://github.com/${REPO}" \
+--label "org.opencontainers.image.revision=${GIT_SHA}" \
+--label "org.opencontainers.image.description=Molecule AI platform (Go API server) — pending canary verify" \
+--tag "${IMAGE_NAME}:${TAG_SHA}" \
+--tag "${IMAGE_NAME}:${TAG_LATEST}" \
+.
+docker push "${IMAGE_NAME}:${TAG_SHA}"
+docker push "${IMAGE_NAME}:${TAG_LATEST}"
 
+# Canvas uses same-origin fetches. The tenant Go platform
+# reverse-proxies /cp/* to the SaaS CP via its CP_UPSTREAM_URL
+# env; the tenant's /canvas/viewport, /approvals/pending,
+# /org/templates etc. live on the tenant platform itself.
+# Both legs share one origin (the tenant subdomain) so
+# PLATFORM_URL="" forces canvas to fetch paths as relative,
+# which land same-origin.
+#
+# Self-hosted / private-label deployments override this at
+# build time with a specific backend (e.g. local dev:
+# NEXT_PUBLIC_PLATFORM_URL=http://localhost:8080).
+- name: Build & push tenant image to ECR (staging-<sha> + staging-latest)
+env:
+TENANT_IMAGE_NAME: ${{ env.TENANT_IMAGE_NAME }}
+TAG_SHA: staging-${{ steps.tags.outputs.sha }}
+TAG_LATEST: staging-latest
+GIT_SHA: ${{ github.sha }}
+REPO: ${{ github.repository }}
+AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+AWS_DEFAULT_REGION: us-east-2
+run: |
+set -euo pipefail
+# Re-login: the platform-image step's docker login wrote to
+# the same config.json, so this is technically redundant — but
+# making each push step self-contained keeps the workflow
+# robust to step reordering / future extraction.
+ECR_REGISTRY="${TENANT_IMAGE_NAME%%/*}"
+aws ecr get-login-password --region us-east-2 | \
+docker login --username AWS --password-stdin "${ECR_REGISTRY}"
+docker build \
+--file ./workspace-server/Dockerfile.tenant \
+--build-arg NEXT_PUBLIC_PLATFORM_URL= \
+--build-arg GIT_SHA="${GIT_SHA}" \
+--label "org.opencontainers.image.source=https://github.com/${REPO}" \
+--label "org.opencontainers.image.revision=${GIT_SHA}" \
+--label "org.opencontainers.image.description=Molecule AI tenant platform + canvas — pending canary verify" \
+--tag "${TENANT_IMAGE_NAME}:${TAG_SHA}" \
+--tag "${TENANT_IMAGE_NAME}:${TAG_LATEST}" \
+.
+docker push "${TENANT_IMAGE_NAME}:${TAG_SHA}"
+docker push "${TENANT_IMAGE_NAME}:${TAG_LATEST}"
+
-- name: Build & push tenant image to GHCR (staging-<sha> + staging-latest)
-uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
-with:
-context: .
-file: ./workspace-server/Dockerfile.tenant
-platforms: linux/amd64
-push: true
-tags: |
-${{ env.TENANT_IMAGE_NAME }}:staging-${{ steps.tags.outputs.sha }}
-${{ env.TENANT_IMAGE_NAME }}:staging-latest
-cache-from: type=gha
-cache-to: type=gha,mode=max
-# Canvas uses same-origin fetches. The tenant Go platform
-# reverse-proxies /cp/* to the SaaS CP via its CP_UPSTREAM_URL
-# env; the tenant's /canvas/viewport, /approvals/pending,
-# /org/templates etc. live on the tenant platform itself.
-# Both legs share one origin (the tenant subdomain) so
-# PLATFORM_URL="" forces canvas to fetch paths as relative,
-# which land same-origin.
-#
-# Self-hosted / private-label deployments override this at
-# build time with a specific backend (e.g. local dev:
-# NEXT_PUBLIC_PLATFORM_URL=http://localhost:8080).
-build-args: |
-NEXT_PUBLIC_PLATFORM_URL=
-GIT_SHA=${{ github.sha }}
-labels: |
-org.opencontainers.image.source=https://github.com/${{ github.repository }}
-org.opencontainers.image.revision=${{ github.sha }}
-org.opencontainers.image.description=Molecule AI tenant platform + canvas — pending canary verify
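The deleted GHCR step's comment notes that GIT_SHA is baked into the binary so /buildinfo can back a redeploy verification. A hedged sketch of that check, assuming /buildinfo returns JSON with a git_sha field — the response shape and the tenant hostname are assumptions, not shown in this diff:

    # Assumed response shape: {"git_sha":"<full sha>"}; host is illustrative.
    TENANT_URL="https://tenant.example.com"
    EXPECTED_SHA="$GITHUB_SHA"
    DEPLOYED_SHA=$(curl -fsS "${TENANT_URL}/buildinfo" | jq -r '.git_sha')
    if [ "$DEPLOYED_SHA" != "$EXPECTED_SHA" ]; then
      echo "buildinfo mismatch: deployed=$DEPLOYED_SHA expected=$EXPECTED_SHA" >&2
      exit 1
    fi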
@@ -9,7 +9,7 @@ name: redeploy-tenants-on-main
 #
 # This workflow closes the gap by calling the control-plane admin
 # endpoint that performs a canary-first, batched, health-gated rolling
-# redeploy across every live tenant. Implemented in Molecule-AI/
+# redeploy across every live tenant. Implemented in molecule-ai/
 # molecule-controlplane as POST /cp/admin/tenants/redeploy-fleet
 # (feat/tenant-auto-redeploy, landing alongside this workflow).
 #
@@ -146,7 +146,7 @@ jobs:
 
 - name: Call CP redeploy-fleet
 # CP_ADMIN_API_TOKEN must be set as a repo/org secret on
-# Molecule-AI/molecule-core, matching the staging/prod CP's
+# molecule-ai/molecule-core, matching the staging/prod CP's
 # CP_ADMIN_API_TOKEN env. Stored in Railway, mirrored to this
 # repo's secrets for CI.
 env:
@@ -97,7 +97,7 @@ jobs:
 
 - name: Call staging-CP redeploy-fleet
 # CP_STAGING_ADMIN_API_TOKEN must be set as a repo/org secret
-# on Molecule-AI/molecule-core, matching staging-CP's
+# on molecule-ai/molecule-core, matching staging-CP's
 # CP_ADMIN_API_TOKEN env var (visible in Railway controlplane
 # / staging environment). Stored separately from the prod
 # CP_ADMIN_API_TOKEN so a leak of one doesn't auth the other.
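The body of the `Call CP redeploy-fleet` step sits below the shown hunks; only its comments appear here. A minimal sketch of what such a call plausibly looks like — the CP host and the JSON payload keys are assumptions; only the endpoint path is taken from the comments above:

    # Sketch — host and payload keys are assumed, endpoint path from the comments.
    CP_URL="https://cp.example.com"        # illustrative control-plane host
    curl -fsS -X POST "${CP_URL}/cp/admin/tenants/redeploy-fleet" \
      -H "Authorization: Bearer ${CP_ADMIN_API_TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"image_tag": "staging-latest"}'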

.github/workflows/secret-scan.yml (vendored, 2 changes)
@@ -12,7 +12,7 @@ name: Secret scan
 #
 # jobs:
 # secret-scan:
-# uses: Molecule-AI/molecule-core/.github/workflows/secret-scan.yml@staging
+# uses: molecule-ai/molecule-core/.github/workflows/secret-scan.yml@staging
 #
 # Pin to @staging not @main — staging is the active default branch,
 # main lags via the staging-promotion workflow. Updates ride along

.gitignore (vendored, 7 changes)
@@ -131,6 +131,13 @@ backups/
 # Cloned by publish-workspace-server-image.yml so the Dockerfile's
 # replace-directive path resolves. Lives in its own repo.
 /molecule-ai-plugin-github-app-auth/
+# Tenant-image build context — populated by the workflow's
+# "Pre-clone manifest deps" step. Mirrors the public manifest, holds the
+# same content as the three /<>/ dirs above but namespaced under one
+# parent so the Docker build context is a single COPY-friendly tree.
+# Each entry is a transient working-dir, never source-of-truth, never
+# committed.
+/.tenant-bundle-deps/
 
 # Internal-flavored content lives in Molecule-AI/internal — NEVER in this
 # public monorepo. Migrated 2026-04-23 (CEO directive). The CI workflow
@@ -3,6 +3,7 @@ import { cookies, headers } from "next/headers";
 import "./globals.css";
 import { AuthGate } from "@/components/AuthGate";
 import { CookieConsent } from "@/components/CookieConsent";
+import { PurchaseSuccessModal } from "@/components/PurchaseSuccessModal";
 import { ThemeProvider } from "@/lib/theme-provider";
 import {
 THEME_COOKIE,
@@ -86,6 +87,12 @@ export default async function RootLayout({
 vercel preview URL, apex) pass through unchanged. */}
 <AuthGate>{children}</AuthGate>
 <CookieConsent />
+{/* Demo Mock #1: post-purchase success toast. Mounted at the
+layout level so it persists across page state transitions
+(loading → hydrated → error) without being unmounted and
+losing its open-state. Reads ?purchase_success=1 from the
+URL on first paint, then strips the param. */}
+<PurchaseSuccessModal />
 </ThemeProvider>
 </body>
 </html>

canvas/src/components/PurchaseSuccessModal.tsx (new file, 175 lines)
@@ -0,0 +1,175 @@
"use client";

/**
 * PurchaseSuccessModal — demo-only post-purchase confirmation.
 *
 * Mounted on the canvas root (`app/page.tsx`). On first paint it inspects
 * `?purchase_success=1[&item=<name>]` on the current URL. If present, it
 * renders a centred modal styled after `ConfirmDialog`, schedules a 5s
 * auto-dismiss, and rewrites the URL via `history.replaceState` to drop
 * the params so a refresh after dismiss does NOT re-show the modal.
 *
 * Mock for the funding demo — there is no real billing surface behind
 * this. The marketplace "Purchase" button on the landing page redirects
 * here with the params; this modal is the only thing the user sees of
 * the "transaction".
 *
 * Styling matches the warm-paper @theme tokens (surface-sunken / line /
 * ink / good) so it tracks light + dark without per-mode overrides.
 */

import { useEffect, useRef, useState } from "react";
import { createPortal } from "react-dom";

const AUTO_DISMISS_MS = 5000;

function readPurchaseParams(): { open: boolean; item: string | null } {
  if (typeof window === "undefined") return { open: false, item: null };
  const sp = new URLSearchParams(window.location.search);
  const flag = sp.get("purchase_success");
  if (flag !== "1" && flag !== "true") return { open: false, item: null };
  return { open: true, item: sp.get("item") };
}

function stripPurchaseParams() {
  if (typeof window === "undefined") return;
  const url = new URL(window.location.href);
  url.searchParams.delete("purchase_success");
  url.searchParams.delete("item");
  // replaceState (not pushState) so back-button doesn't return to the
  // pre-strip URL and re-trigger the modal.
  window.history.replaceState({}, "", url.toString());
}

export function PurchaseSuccessModal() {
  const [open, setOpen] = useState(false);
  const [item, setItem] = useState<string | null>(null);
  const [mounted, setMounted] = useState(false);
  const dialogRef = useRef<HTMLDivElement>(null);

  // Read the URL params once on mount. We don't subscribe to navigation —
  // this modal is a one-shot for the demo redirect, not a persistent
  // listener.
  useEffect(() => {
    setMounted(true);
    const { open: shouldOpen, item: itemName } = readPurchaseParams();
    if (shouldOpen) {
      setOpen(true);
      setItem(itemName);
      // Clean the URL immediately so a refresh after the modal is closed
      // (or even while it's still open) does NOT re-trigger it.
      stripPurchaseParams();
    }
  }, []);

  // Auto-dismiss timer + Escape handler.
  useEffect(() => {
    if (!open) return;
    const t = window.setTimeout(() => setOpen(false), AUTO_DISMISS_MS);
    const onKey = (e: KeyboardEvent) => {
      if (e.key === "Escape") setOpen(false);
    };
    window.addEventListener("keydown", onKey);
    // Focus the close button so keyboard users land on it after redirect.
    const raf = requestAnimationFrame(() => {
      dialogRef.current?.querySelector<HTMLButtonElement>("button")?.focus();
    });
    return () => {
      window.clearTimeout(t);
      window.removeEventListener("keydown", onKey);
      cancelAnimationFrame(raf);
    };
  }, [open]);

  if (!open || !mounted) return null;

  const itemLabel = item ? decodeURIComponent(item) : "Your new agent";

  return createPortal(
    <div
      className="fixed inset-0 z-[9999] flex items-center justify-center"
      data-testid="purchase-success-modal"
    >
      {/* Backdrop — click closes, matches ConfirmDialog backdrop. */}
      <div
        className="absolute inset-0 bg-black/60 backdrop-blur-sm"
        onClick={() => setOpen(false)}
        aria-hidden="true"
      />

      <div
        ref={dialogRef}
        role="dialog"
        aria-modal="true"
        aria-labelledby="purchase-success-title"
        className="relative bg-surface-sunken border border-line rounded-xl shadow-2xl shadow-black/50 max-w-[420px] w-full mx-4 overflow-hidden"
      >
        <div className="px-6 pt-6 pb-4">
          <div className="flex items-start gap-4">
            {/* Success glyph — uses --color-good so it tracks the theme.
                Inline SVG over an emoji so it stays readable + on-brand
                in both light and dark. */}
            <div
              className="flex h-10 w-10 flex-shrink-0 items-center justify-center rounded-full"
              style={{
                background:
                  "color-mix(in srgb, var(--color-good) 15%, transparent)",
                color: "var(--color-good)",
              }}
            >
              <svg
                width="22"
                height="22"
                viewBox="0 0 24 24"
                fill="none"
                aria-hidden="true"
              >
                <circle
                  cx="12"
                  cy="12"
                  r="10"
                  stroke="currentColor"
                  strokeWidth="1.5"
                />
                <path
                  d="M7.5 12.5L10.5 15.5L16.5 9.5"
                  stroke="currentColor"
                  strokeWidth="1.8"
                  strokeLinecap="round"
                  strokeLinejoin="round"
                />
              </svg>
            </div>
            <div className="flex-1">
              <h3
                id="purchase-success-title"
                className="text-base font-semibold text-ink"
              >
                Purchase successful
              </h3>
              <p className="mt-1.5 text-[13px] leading-relaxed text-ink-mid">
                <span className="font-medium text-ink">{itemLabel}</span> has
                been added to your workspace. Provisioning starts in the
                background — you can keep working while it spins up.
              </p>
            </div>
          </div>
        </div>

        <div className="flex items-center justify-between gap-3 px-6 py-3 border-t border-line bg-surface/50">
          <span className="font-mono text-[10.5px] uppercase tracking-[0.12em] text-ink-soft">
            auto-dismiss · {AUTO_DISMISS_MS / 1000}s
          </span>
          <button
            type="button"
            onClick={() => setOpen(false)}
            className="px-3.5 py-1.5 text-[13px] rounded-lg bg-accent hover:bg-accent-strong text-white transition-colors focus:outline-none focus-visible:ring-2 focus-visible:ring-offset-2 focus-visible:ring-offset-surface-sunken focus-visible:ring-accent/60"
          >
            Close
          </button>
        </div>
      </div>
    </div>,
    document.body,
  );
}
@@ -41,6 +41,7 @@
 {"name": "medo-smoke", "repo": "molecule-ai/molecule-ai-org-template-medo-smoke", "ref": "main"},
 {"name": "molecule-worker-gemini", "repo": "molecule-ai/molecule-ai-org-template-molecule-worker-gemini", "ref": "main"},
 {"name": "reno-stars", "repo": "molecule-ai/molecule-ai-org-template-reno-stars", "ref": "main"},
-{"name": "ux-ab-lab", "repo": "molecule-ai/molecule-ai-org-template-ux-ab-lab", "ref": "main"}
+{"name": "ux-ab-lab", "repo": "molecule-ai/molecule-ai-org-template-ux-ab-lab", "ref": "main"},
+{"name": "mock-bigorg", "repo": "molecule-ai/molecule-ai-org-template-mock-bigorg", "ref": "main"}
 ]
 }
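clone-manifest.sh reads these entries with jq (per its header further down). The manifest's top-level key for this list is outside the hunk, so the `org_templates` name below is an assumption; the per-entry fields match what is shown:

    # Sketch of iterating one manifest category with jq (top-level key name assumed).
    jq -c '.org_templates[]' manifest.json | while read -r entry; do
      name=$(echo "$entry" | jq -r '.name')
      repo=$(echo "$entry" | jq -r '.repo')
      ref=$(echo "$entry" | jq -r '.ref')
      echo "$name -> https://git.moleculesai.app/${repo}.git @ ${ref}"
    done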
@@ -6,6 +6,29 @@
 # ./scripts/clone-manifest.sh <manifest.json> <ws-templates-dir> <org-templates-dir> <plugins-dir>
 #
 # Requires: git, jq (lighter than python3 — ~2MB vs ~50MB in Alpine)
+#
+# Auth (optional):
+# When MOLECULE_GITEA_TOKEN is set, embed it as the basic-auth password so
+# private Gitea repos clone successfully. When unset, clone anonymously
+# (works only for repos that are public on git.moleculesai.app).
+#
+# This is the path the publish-workspace-server-image.yml workflow uses:
+# it injects AUTO_SYNC_TOKEN (devops-engineer persona PAT, repo:read on
+# the molecule-ai org) so the in-CI pre-clone step succeeds for ALL
+# manifest entries — including the 5 private workspace-template-* repos
+# (codex, crewai, deepagents, gemini-cli, langgraph) and all 7
+# org-template-* repos.
+#
+# The token never enters the Docker image: this script runs in the
+# trusted CI context BEFORE `docker buildx build`, populates
+# .tenant-bundle-deps/, then `Dockerfile.tenant` COPYs from there with
+# the .git directories already stripped (see line ~67 below).
+#
+# For backward compatibility — and so a fresh clone works without
+# secrets when (eventually) the workspace-template-* repos flip public —
+# the unset path remains a plain anonymous HTTPS clone. That path will
+# FAIL with "could not read Username" on private repos today; CI MUST
+# set MOLECULE_GITEA_TOKEN.
 
 set -euo pipefail
 
@@ -45,11 +68,27 @@ clone_category() {
 continue
 fi
 
-echo " cloning $repo -> $target_dir/$name (ref=$ref)"
-if [ "$ref" = "main" ]; then
-git clone --depth=1 -q "https://git.moleculesai.app/${repo}.git" "$target_dir/$name"
+# Build the clone URL. When MOLECULE_GITEA_TOKEN is set (CI path)
+# embed it as basic-auth so private repos succeed. The username
+# part ("oauth2") is conventional and ignored by Gitea — only the
+# token-as-password is verified.
+#
+# manifest.json was migrated to lowercase org slugs on
+# 2026-05-07 (post-suspension reconciliation), so we use $repo
+# verbatim — no on-the-fly tolower transform needed.
+if [ -n "${MOLECULE_GITEA_TOKEN:-}" ]; then
+clone_url="https://oauth2:${MOLECULE_GITEA_TOKEN}@git.moleculesai.app/${repo}.git"
+display_url="https://oauth2:***@git.moleculesai.app/${repo}.git"
 else
-git clone --depth=1 -q --branch "$ref" "https://git.moleculesai.app/${repo}.git" "$target_dir/$name"
+clone_url="https://git.moleculesai.app/${repo}.git"
+display_url="$clone_url"
+fi
+
+echo " cloning $display_url -> $target_dir/$name (ref=$ref)"
+if [ "$ref" = "main" ]; then
+git clone --depth=1 -q "$clone_url" "$target_dir/$name"
+else
+git clone --depth=1 -q --branch "$ref" "$clone_url" "$target_dir/$name"
 fi
 CLONED=$((CLONED + 1))
 i=$((i + 1))
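The `.git`-stripping the header refers to ("line ~67") is not part of the shown hunks. A sketch of that post-clone cleanup, under the assumption it is a plain find-and-remove over the three target directories (the variable names stand in for the script's positional arguments):

    # Sketch — drop every .git dir so neither history nor the embedded
    # oauth2:<token> remote URL survives into the Docker build context.
    for dir in "$WS_TEMPLATES_DIR" "$ORG_TEMPLATES_DIR" "$PLUGINS_DIR"; do
      find "$dir" -type d -name .git -prune -exec rm -rf {} +
    done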
@@ -1,7 +1,15 @@
-# Platform-only image (no canvas). Used by publish-platform-image workflow
-# for GHCR + Fly registry. Tenant image uses Dockerfile.tenant instead.
+# Platform-only image (no canvas). Used by publish-workspace-server-image
+# workflow for ECR. Tenant image uses Dockerfile.tenant instead.
 #
-# Build context: repo root.
+# Templates + plugins are pre-cloned by scripts/clone-manifest.sh (in CI
+# or on the operator host) into .tenant-bundle-deps/ — same pattern as
+# Dockerfile.tenant. See that file's header for the full rationale; the
+# short version is that post-2026-05-06 every workspace-template-* and
+# org-template-* repo on Gitea is private, so an in-image `git clone`
+# has no auth path that doesn't leak the Gitea token into a layer.
+#
+# Build context: repo root, with `.tenant-bundle-deps/` populated by the
+# workflow's "Pre-clone manifest deps" step (Task #173).
 
 FROM golang:1.25-alpine AS builder
 WORKDIR /app
@@ -26,21 +34,18 @@ RUN CGO_ENABLED=0 GOOS=linux go build \
 -ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
 -o /memory-plugin ./cmd/memory-plugin-postgres
 
-# Clone templates + plugins at build time from manifest.json
-FROM alpine:3.20 AS templates
-RUN apk add --no-cache git jq
-COPY manifest.json /manifest.json
-COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
-RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
-
 FROM alpine:3.20
 RUN apk add --no-cache ca-certificates git tzdata wget
 COPY --from=builder /platform /platform
 COPY --from=builder /memory-plugin /memory-plugin
 COPY workspace-server/migrations /migrations
-COPY --from=templates /workspace-configs-templates /workspace-configs-templates
-COPY --from=templates /org-templates /org-templates
-COPY --from=templates /plugins /plugins
+# Templates + plugins (pre-cloned by scripts/clone-manifest.sh in the
+# trusted CI / operator-host context, .git already stripped). The Gitea
+# token used to clone them never enters this image — same shape as
+# Dockerfile.tenant.
+COPY .tenant-bundle-deps/workspace-configs-templates /workspace-configs-templates
+COPY .tenant-bundle-deps/org-templates /org-templates
+COPY .tenant-bundle-deps/plugins /plugins
 # Non-root runtime with Docker socket access for workspace provisioning.
 RUN addgroup -g 1000 platform && adduser -u 1000 -G platform -s /bin/sh -D platform
 EXPOSE 8080
@@ -3,14 +3,34 @@
 # Serves both the API (Go on :8080) and the UI (Node.js on :3000) in a
 # single container. Go reverse-proxies unknown routes to canvas.
 #
-# Templates are cloned from standalone GitHub repos at build time so the
-# monorepo doesn't need to carry them. The repos are public; no auth.
+# Templates + plugins are NOT cloned at build time. They are pre-cloned
+# in the trusted CI context (or operator host) by
+# `scripts/clone-manifest.sh` into `.tenant-bundle-deps/` and COPYed in.
+# The reason: post-2026-05-06, every workspace-template-* repo on Gitea
+# (codex, crewai, deepagents, gemini-cli, langgraph) plus all 7
+# org-template-* repos are private, so the Docker build can't `git clone`
+# from inside the build context — there's no auth path that doesn't leak
+# the Gitea token into an image layer. Pre-cloning keeps the token in
+# the CI environment only; the resulting image carries the cloned trees
+# with `.git` already stripped (see clone-manifest.sh).
 #
-# Build context: repo root.
+# Build context: repo root, with `.tenant-bundle-deps/` populated by:
+#
+# MOLECULE_GITEA_TOKEN=<persona-PAT> scripts/clone-manifest.sh \
+# manifest.json \
+# .tenant-bundle-deps/workspace-configs-templates \
+# .tenant-bundle-deps/org-templates \
+# .tenant-bundle-deps/plugins
+#
+# In CI this happens in publish-workspace-server-image.yml's "Pre-clone
+# manifest deps" step (uses AUTO_SYNC_TOKEN = devops-engineer persona).
+# For a manual operator-host build, source the same token from
+# /etc/molecule-bootstrap/agent-secrets.env first.
 #
 # docker buildx build --platform linux/amd64 \
 # -f workspace-server/Dockerfile.tenant \
-# -t registry.fly.io/molecule-tenant:latest \
+# -t <ECR>/molecule-ai/platform-tenant:latest \
+# --build-arg GIT_SHA=<sha> --build-arg NEXT_PUBLIC_PLATFORM_URL= \
 # --push .
 
 # ── Stage 1: Go platform binary ──────────────────────────────────────
@@ -55,14 +75,7 @@ ENV NEXT_PUBLIC_PLATFORM_URL=$NEXT_PUBLIC_PLATFORM_URL
 ENV NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL
 RUN npm run build
 
-# ── Stage 3: Clone templates + plugins from manifest.json ─────────────
-FROM alpine:3.20 AS templates
-RUN apk add --no-cache git jq
-COPY manifest.json /manifest.json
-COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
-RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
-
-# ── Stage 4: Runtime ──────────────────────────────────────────────────
+# ── Stage 3: Runtime ──────────────────────────────────────────────────
 FROM node:20-alpine
 RUN apk add --no-cache ca-certificates git tzdata openssh-client aws-cli
 
@@ -87,10 +100,13 @@ COPY --from=go-builder /platform /platform
 COPY --from=go-builder /memory-plugin /memory-plugin
 COPY workspace-server/migrations /migrations
 
-# Templates + plugins (cloned from GitHub in stage 3)
-COPY --from=templates /workspace-configs-templates /workspace-configs-templates
-COPY --from=templates /org-templates /org-templates
-COPY --from=templates /plugins /plugins
+# Templates + plugins (pre-cloned by scripts/clone-manifest.sh in the
+# trusted CI / operator-host context, .git already stripped — see
+# .tenant-bundle-deps/ in the build context). The Gitea token used to
+# clone them never enters this image.
+COPY .tenant-bundle-deps/workspace-configs-templates /workspace-configs-templates
+COPY .tenant-bundle-deps/org-templates /org-templates
+COPY .tenant-bundle-deps/plugins /plugins
 
 # Canvas standalone
 WORKDIR /canvas

workspace-server/cmd/server/bind_test.go (new file, 89 lines)
@@ -0,0 +1,89 @@
package main

import "testing"

// TestResolveBindHost pins the precedence: BIND_ADDR explicit > dev-mode
// fail-open default of 127.0.0.1 > production-shape empty (all interfaces).
//
// Mutation-test invariant: removing the IsDevModeFailOpen() branch makes
// "no_bindaddr_devmode_unset_admin" fail (returns "" instead of "127.0.0.1").
// Removing the BIND_ADDR branch makes "explicit_bindaddr_*" cases fail.
func TestResolveBindHost(t *testing.T) {
    cases := []struct {
        name       string
        bindAddr   string
        adminToken string
        molEnv     string
        want       string
    }{
        {
            name:       "no_bindaddr_devmode_unset_admin",
            bindAddr:   "",
            adminToken: "",
            molEnv:     "dev",
            want:       "127.0.0.1",
        },
        {
            name:       "no_bindaddr_devmode_unset_admin_full_word",
            bindAddr:   "",
            adminToken: "",
            molEnv:     "development",
            want:       "127.0.0.1",
        },
        {
            name:       "no_bindaddr_admin_set_in_dev_env",
            bindAddr:   "",
            adminToken: "secret",
            molEnv:     "dev",
            want:       "", // ADMIN_TOKEN flips IsDevModeFailOpen to false → all interfaces
        },
        {
            name:       "no_bindaddr_production_env",
            bindAddr:   "",
            adminToken: "",
            molEnv:     "production",
            want:       "", // production is not a dev value → all interfaces
        },
        {
            name:       "no_bindaddr_unset_env",
            bindAddr:   "",
            adminToken: "",
            molEnv:     "",
            want:       "", // unset MOLECULE_ENV → not dev → all interfaces
        },
        {
            name:       "explicit_bindaddr_loopback_overrides_devmode",
            bindAddr:   "127.0.0.1",
            adminToken: "",
            molEnv:     "dev",
            want:       "127.0.0.1",
        },
        {
            name:       "explicit_bindaddr_wildcard_overrides_devmode_default",
            bindAddr:   "0.0.0.0",
            adminToken: "",
            molEnv:     "dev",
            want:       "0.0.0.0",
        },
        {
            name:       "explicit_bindaddr_in_production",
            bindAddr:   "10.0.5.7",
            adminToken: "secret",
            molEnv:     "production",
            want:       "10.0.5.7",
        },
    }

    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            t.Setenv("BIND_ADDR", tc.bindAddr)
            t.Setenv("ADMIN_TOKEN", tc.adminToken)
            t.Setenv("MOLECULE_ENV", tc.molEnv)
            got := resolveBindHost()
            if got != tc.want {
                t.Errorf("resolveBindHost() = %q, want %q (BIND_ADDR=%q ADMIN_TOKEN=%q MOLECULE_ENV=%q)",
                    got, tc.want, tc.bindAddr, tc.adminToken, tc.molEnv)
            }
        })
    }
}
@@ -19,6 +19,7 @@ import (
 "github.com/Molecule-AI/molecule-monorepo/platform/internal/handlers"
 "github.com/Molecule-AI/molecule-monorepo/platform/internal/imagewatch"
 memwiring "github.com/Molecule-AI/molecule-monorepo/platform/internal/memory/wiring"
+"github.com/Molecule-AI/molecule-monorepo/platform/internal/middleware"
 "github.com/Molecule-AI/molecule-monorepo/platform/internal/pendinguploads"
 "github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
 "github.com/Molecule-AI/molecule-monorepo/platform/internal/registry"
@@ -332,15 +333,23 @@ func main() {
 // Router
 r := router.Setup(hub, broadcaster, prov, platformURL, configsDir, wh, channelMgr, memBundle)
 
-// HTTP server with graceful shutdown
+// HTTP server with graceful shutdown.
+//
+// Bind host: in dev-mode (no ADMIN_TOKEN, MOLECULE_ENV=dev|development)
+// the AdminAuth chain fails open by design; pairing that with a wildcard
+// bind would expose unauth /workspaces to any same-LAN peer. Default to
+// loopback when fail-open is active. Operators who need LAN exposure set
+// BIND_ADDR=0.0.0.0 explicitly. Production (ADMIN_TOKEN set) is unchanged.
+// See molecule-core#7.
+bindHost := resolveBindHost()
 srv := &http.Server{
-Addr: fmt.Sprintf(":%s", port),
+Addr: fmt.Sprintf("%s:%s", bindHost, port),
 Handler: r,
 }
 
 // Start server in goroutine
 go func() {
-log.Printf("Platform starting on :%s", port)
+log.Printf("Platform starting on %s:%s (dev-mode-fail-open=%v)", bindHost, port, middleware.IsDevModeFailOpen())
 if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
 log.Fatalf("Server failed: %v", err)
 }
@@ -375,6 +384,29 @@ func envOr(key, fallback string) string {
 return fallback
 }
 
+// resolveBindHost picks the listener interface for the HTTP server.
+//
+// Precedence:
+// 1. BIND_ADDR — explicit operator override (any value, including "0.0.0.0").
+// 2. dev-mode fail-open active → "127.0.0.1" (loopback only).
+// 3. otherwise → "" (Go binds every interface; existing prod/self-host shape).
+//
+// Coupling the loopback default to middleware.IsDevModeFailOpen() means the
+// two safety levers — bind narrowness and auth strength — move together. A
+// production deploy (ADMIN_TOKEN set) keeps binding to all interfaces because
+// the auth chain is doing its job; a dev Mac (no ADMIN_TOKEN, MOLECULE_ENV=dev)
+// is reachable only via loopback because the auth chain is fail-open. See
+// molecule-core#7 for the original LAN exposure finding.
+func resolveBindHost() string {
+if v := os.Getenv("BIND_ADDR"); v != "" {
+return v
+}
+if middleware.IsDevModeFailOpen() {
+return "127.0.0.1"
+}
+return ""
+}
+
 func findConfigsDir() string {
 candidates := []string{
 "workspace-configs-templates",
@ -413,11 +413,56 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
|
|||||||
return http.StatusOK, respBody, nil
|
return http.StatusOK, respBody, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Mock-runtime short-circuit. Workspaces with runtime='mock' have
|
||||||
|
// no container, no EC2, no URL — every reply is synthesised here
|
||||||
|
// from a small canned-variant pool. Built for the "200-workspace
|
||||||
|
// mock org" demo: a CEO/VPs/Managers/ICs hierarchy that renders
|
||||||
|
// at scale on the canvas without burning real LLM credits or
|
||||||
|
// provisioning 200 EC2 instances. See mock_runtime.go for the
|
||||||
|
// full rationale + reply shape contract.
|
||||||
|
//
|
||||||
|
// Position: AFTER poll-mode (mock isn't a delivery mode, it's a
|
||||||
|
// runtime; treating poll-set-on-mock as poll matches operator
|
||||||
|
// intent if anyone ever does that), BEFORE resolveAgentURL (mock
|
||||||
|
// has no URL — going through resolveAgentURL would 404 on the
|
||||||
|
// SELECT url since the row is provisioned as NULL).
|
||||||
|
if status, respBody, handled := h.handleMockA2A(ctx, workspaceID, callerID, body, a2aMethod, logActivity); handled {
|
||||||
|
return status, respBody, nil
|
||||||
|
}
|
||||||
|
|
||||||
agentURL, proxyErr := h.resolveAgentURL(ctx, workspaceID)
|
agentURL, proxyErr := h.resolveAgentURL(ctx, workspaceID)
|
||||||
if proxyErr != nil {
|
if proxyErr != nil {
|
||||||
return 0, nil, proxyErr
|
return 0, nil, proxyErr
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Pre-flight container-health check (#36). The dispatchA2A path below
|
||||||
|
// does Docker-DNS forwarding to `ws-<wsShort>:8000` and only catches a
|
||||||
|
// missing/dead container REACTIVELY via maybeMarkContainerDead in
|
||||||
|
// handleA2ADispatchError. That works but costs the caller a full
|
||||||
|
// network-timeout (2-30s) before the structured 503 surfaces.
|
||||||
|
//
|
||||||
|
// When we KNOW the workspace is container-backed (h.docker != nil + we
|
||||||
|
// rewrite to Docker-DNS form below), do a single proactive
|
||||||
|
// RunningContainerName lookup. If the container is genuinely missing,
|
||||||
|
// short-circuit with the same structured 503 + async restart that
|
||||||
|
// maybeMarkContainerDead would produce — but immediately, without the
|
||||||
|
// network round-trip.
|
||||||
|
//
|
||||||
|
// Three outcomes of provisioner.RunningContainerName(ctx, h.docker, id):
|
||||||
|
// ("ws-<id>", nil) → forward as today.
|
||||||
|
// ("", nil) → container is genuinely not running. Fast-503.
|
||||||
|
// ("", err) → transient daemon error. Fall through to optimistic
|
||||||
|
// forward — matches Provisioner.IsRunning's
|
||||||
|
// (true, err) "fail-soft as alive" contract.
|
||||||
|
//
|
||||||
|
// Same SSOT as findRunningContainer (#10/#12). See AST gate
|
||||||
|
// TestProxyA2A_RoutesThroughProvisionerSSOT.
|
||||||
|
if h.provisioner != nil && platformInDocker && strings.HasPrefix(agentURL, "http://"+provisioner.ContainerName(workspaceID)+":") {
|
||||||
|
if proxyErr := h.preflightContainerHealth(ctx, workspaceID); proxyErr != nil {
|
||||||
|
return 0, nil, proxyErr
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
startTime := time.Now()
|
startTime := time.Now()
|
||||||
resp, cancelFwd, err := h.dispatchA2A(ctx, workspaceID, agentURL, body, callerID)
|
resp, cancelFwd, err := h.dispatchA2A(ctx, workspaceID, agentURL, body, callerID)
|
||||||
if cancelFwd != nil {
|
if cancelFwd != nil {
|
||||||
|
|||||||
@ -198,6 +198,60 @@ func (h *WorkspaceHandler) maybeMarkContainerDead(ctx context.Context, workspace
|
|||||||
return true
|
return true
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// preflightContainerHealth runs a proactive Provisioner.IsRunning check
|
||||||
|
// (#36) before dispatching the a2a forward. Routed through provisioner's
|
||||||
|
// SSOT IsRunning, which itself wraps RunningContainerName — same source
|
||||||
|
// as findRunningContainer in the plugins handler (#10/#12).
|
||||||
|
//
|
||||||
|
// Returns nil when the forward should proceed:
|
||||||
|
// - container is running, OR
|
||||||
|
// - daemon errored transiently (matches IsRunning's (true, err)
|
||||||
|
// "fail-soft as alive" contract — let the optimistic forward run
|
||||||
|
// and reactive maybeMarkContainerDead catch a real failure).
|
||||||
|
//
|
||||||
|
// Returns a structured 503 + triggers the same async restart that
|
||||||
|
// maybeMarkContainerDead would produce, when:
|
||||||
|
// - container is genuinely not running (NotFound / Exited / Created…).
|
||||||
|
//
|
||||||
|
// The point of running this BEFORE the forward is to save the caller
|
||||||
|
// 2-30s of network-timeout cost when the container is missing — a common
|
||||||
|
// shape post-EC2-replace (see molecule-controlplane#20 incident
|
||||||
|
// 2026-05-07) where the reconciler hasn't respawned the agent yet.
|
||||||
|
func (h *WorkspaceHandler) preflightContainerHealth(ctx context.Context, workspaceID string) *proxyA2AError {
|
||||||
|
running, err := h.provisioner.IsRunning(ctx, workspaceID)
|
||||||
|
if err != nil {
|
||||||
|
// Transient daemon error. Provisioner.IsRunning returns (true, err)
|
||||||
|
// in this case — fall through to the optimistic forward, reactive
|
||||||
|
// maybeMarkContainerDead handles a real failure later.
|
||||||
|
log.Printf("ProxyA2A preflight: IsRunning transient error for %s: %v (proceeding with forward)", workspaceID, err)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
if running {
|
||||||
|
// Container is running — forward as today.
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
// Container is genuinely not running. Mark offline + trigger restart
|
||||||
|
// (same effect as maybeMarkContainerDead's branch), and return the
|
||||||
|
// structured 503 immediately so the caller skips the forward.
|
||||||
|
log.Printf("ProxyA2A preflight: container for %s is not running — marking offline and triggering restart (#36)", workspaceID)
|
||||||
|
if _, dbErr := db.DB.ExecContext(ctx,
|
||||||
|
`UPDATE workspaces SET status = $1, updated_at = now() WHERE id = $2 AND status NOT IN ('removed', 'provisioning')`,
|
||||||
|
models.StatusOffline, workspaceID); dbErr != nil {
|
||||||
|
log.Printf("ProxyA2A preflight: failed to mark workspace %s offline: %v", workspaceID, dbErr)
|
||||||
|
}
|
||||||
|
db.ClearWorkspaceKeys(ctx, workspaceID)
|
||||||
|
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOffline), workspaceID, map[string]interface{}{})
|
||||||
|
go h.RestartByID(workspaceID)
|
||||||
|
return &proxyA2AError{
|
||||||
|
Status: http.StatusServiceUnavailable,
|
||||||
|
Response: gin.H{
|
||||||
|
"error": "workspace container not running — restart triggered",
|
||||||
|
"restarting": true,
|
||||||
|
"preflight": true, // distinguishes from reactive containerDead path
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// logA2AFailure records a failed A2A attempt to activity_logs in a detached
|
// logA2AFailure records a failed A2A attempt to activity_logs in a detached
|
||||||
// goroutine (the request context may already be done by the time it runs).
|
// goroutine (the request context may already be done by the time it runs).
|
||||||
func (h *WorkspaceHandler) logA2AFailure(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string, err error, durationMs int) {
|
func (h *WorkspaceHandler) logA2AFailure(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string, err error, durationMs int) {
|
||||||
|
|||||||
workspace-server/internal/handlers/a2a_proxy_preflight_test.go (new file, 194 lines)
@ -0,0 +1,194 @@
|
|||||||
|
package handlers
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"errors"
|
||||||
|
"go/ast"
|
||||||
|
"go/parser"
|
||||||
|
"go/token"
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"github.com/DATA-DOG/go-sqlmock"
|
||||||
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/models"
|
||||||
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
|
||||||
|
)
|
||||||
|
|
||||||
|
// preflightLocalProv is a controllable LocalProvisionerAPI stub for the
|
||||||
|
// preflight tests (#36). Other API methods panic to guard against tests
|
||||||
|
// that should be using a different stub.
|
||||||
|
type preflightLocalProv struct {
|
||||||
|
running bool
|
||||||
|
err error
|
||||||
|
calls int
|
||||||
|
calledWith []string
|
||||||
|
}
|
||||||
|
|
||||||
|
func (p *preflightLocalProv) IsRunning(_ context.Context, workspaceID string) (bool, error) {
|
||||||
|
p.calls++
|
||||||
|
p.calledWith = append(p.calledWith, workspaceID)
|
||||||
|
return p.running, p.err
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) Start(_ context.Context, _ provisioner.WorkspaceConfig) (string, error) {
|
||||||
|
panic("preflightLocalProv: Start not implemented")
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) Stop(_ context.Context, _ string) error {
|
||||||
|
panic("preflightLocalProv: Stop not implemented")
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) ExecRead(_ context.Context, _, _ string) ([]byte, error) {
|
||||||
|
panic("preflightLocalProv: ExecRead not implemented")
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) RemoveVolume(_ context.Context, _ string) error {
|
||||||
|
panic("preflightLocalProv: RemoveVolume not implemented")
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) VolumeHasFile(_ context.Context, _, _ string) (bool, error) {
|
||||||
|
panic("preflightLocalProv: VolumeHasFile not implemented")
|
||||||
|
}
|
||||||
|
func (p *preflightLocalProv) WriteAuthTokenToVolume(_ context.Context, _, _ string) error {
|
||||||
|
panic("preflightLocalProv: WriteAuthTokenToVolume not implemented")
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestPreflight_ContainerRunning_ReturnsNil — IsRunning(true,nil): forward
|
||||||
|
// proceeds. preflight returns nil → caller continues to dispatchA2A.
|
||||||
|
func TestPreflight_ContainerRunning_ReturnsNil(t *testing.T) {
|
||||||
|
_ = setupTestDB(t)
|
||||||
|
stub := &preflightLocalProv{running: true, err: nil}
|
||||||
|
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
|
||||||
|
h.provisioner = stub
|
||||||
|
|
||||||
|
if err := h.preflightContainerHealth(context.Background(), "ws-running-123"); err != nil {
|
||||||
|
t.Fatalf("preflight should return nil when container running, got %+v", err)
|
||||||
|
}
|
||||||
|
if stub.calls != 1 {
|
||||||
|
t.Errorf("IsRunning should be called exactly once, got %d", stub.calls)
|
||||||
|
}
|
||||||
|
if len(stub.calledWith) != 1 || stub.calledWith[0] != "ws-running-123" {
|
||||||
|
t.Errorf("IsRunning should be called with workspace id, got %v", stub.calledWith)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestPreflight_ContainerNotRunning_StructuredFastFail — IsRunning(false,nil):
|
||||||
|
// preflight returns structured 503 with restarting=true + preflight=true, AND
|
||||||
|
// triggers the offline-flip + WORKSPACE_OFFLINE broadcast + async restart.
|
||||||
|
// This is the load-bearing case — saves the caller 2-30s of network timeout.
|
||||||
|
func TestPreflight_ContainerNotRunning_StructuredFastFail(t *testing.T) {
|
||||||
|
mock := setupTestDB(t)
|
||||||
|
_ = setupTestRedis(t)
|
||||||
|
stub := &preflightLocalProv{running: false, err: nil}
|
||||||
|
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
|
||||||
|
h.provisioner = stub
|
||||||
|
|
||||||
|
// Expect the offline-flip UPDATE.
|
||||||
|
mock.ExpectExec(`UPDATE workspaces SET status =`).
|
||||||
|
WithArgs(models.StatusOffline, "ws-dead-456").
|
||||||
|
WillReturnResult(sqlmock.NewResult(0, 1))
|
||||||
|
// Broadcaster's INSERT INTO structure_events fires too — best-effort
|
||||||
|
// log entry for the WORKSPACE_OFFLINE event. Match permissively.
|
||||||
|
mock.ExpectExec(`INSERT INTO structure_events`).
|
||||||
|
WillReturnResult(sqlmock.NewResult(0, 1))
|
||||||
|
|
||||||
|
proxyErr := h.preflightContainerHealth(context.Background(), "ws-dead-456")
|
||||||
|
if proxyErr == nil {
|
||||||
|
t.Fatal("preflight should return *proxyA2AError when container not running")
|
||||||
|
}
|
||||||
|
if proxyErr.Status != 503 {
|
||||||
|
t.Errorf("expected 503, got %d", proxyErr.Status)
|
||||||
|
}
|
||||||
|
if got := proxyErr.Response["restarting"]; got != true {
|
||||||
|
t.Errorf("response should mark restarting=true, got %v", got)
|
||||||
|
}
|
||||||
|
if got := proxyErr.Response["preflight"]; got != true {
|
||||||
|
t.Errorf("response should mark preflight=true so callers can distinguish from reactive containerDead, got %v", got)
|
||||||
|
}
|
||||||
|
if got := proxyErr.Response["error"]; got != "workspace container not running — restart triggered" {
|
||||||
|
t.Errorf("error message mismatch, got %q", got)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Note: broadcaster firing is exercised by the production path's
|
||||||
|
// h.broadcaster.RecordAndBroadcast call but not asserted here — the
|
||||||
|
// real *events.Broadcaster doesn't expose received events for inspection.
|
||||||
|
// The DB UPDATE expectation is sufficient to pin the offline-flip path.
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestPreflight_TransientError_FailsSoftAsAlive — IsRunning(true,err): the
|
||||||
|
// (true, err) "fail-soft" contract — preflight returns nil so the optimistic
|
||||||
|
// forward runs; reactive maybeMarkContainerDead handles a real failure later.
|
||||||
|
// This pin is critical: a flaky daemon must NOT trigger a restart cascade.
|
||||||
|
func TestPreflight_TransientError_FailsSoftAsAlive(t *testing.T) {
|
||||||
|
_ = setupTestDB(t)
|
||||||
|
stub := &preflightLocalProv{running: true, err: errors.New("docker daemon EOF")}
|
||||||
|
h := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
|
||||||
|
h.provisioner = stub
|
||||||
|
|
||||||
|
if err := h.preflightContainerHealth(context.Background(), "ws-flaky-789"); err != nil {
|
||||||
|
t.Fatalf("preflight should return nil on transient error (fail-soft), got %+v", err)
|
||||||
|
}
|
||||||
|
// No DB UPDATE expected — sqlmock would complain about unexpected calls
|
||||||
|
// at test cleanup if the offline-flip path fired.
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestProxyA2A_Preflight_RoutesThroughProvisionerSSOT — AST gate (#36 mirror
|
||||||
|
// of #12's gate). Pins the invariant that preflightContainerHealth uses the
|
||||||
|
// SSOT Provisioner.IsRunning helper, NOT a parallel docker.ContainerInspect
|
||||||
|
// of its own.
|
||||||
|
//
|
||||||
|
// Mutation invariant: if a future PR replaces h.provisioner.IsRunning with
|
||||||
|
// a direct cli.ContainerInspect call, this test fails. That's the signal to
|
||||||
|
// either (a) extend Provisioner.IsRunning's contract OR (b) document why
|
||||||
|
// this call site needs to differ. Either way, the drift gets a reviewer's
|
||||||
|
// attention instead of shipping silently.
|
||||||
|
func TestProxyA2A_Preflight_RoutesThroughProvisionerSSOT(t *testing.T) {
|
||||||
|
fset := token.NewFileSet()
|
||||||
|
file, err := parser.ParseFile(fset, "a2a_proxy_helpers.go", nil, parser.ParseComments)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("parse a2a_proxy_helpers.go: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
var fn *ast.FuncDecl
|
||||||
|
ast.Inspect(file, func(n ast.Node) bool {
|
||||||
|
f, ok := n.(*ast.FuncDecl)
|
||||||
|
if !ok || f.Name.Name != "preflightContainerHealth" {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
fn = f
|
||||||
|
return false
|
||||||
|
})
|
||||||
|
if fn == nil {
|
||||||
|
t.Fatal("preflightContainerHealth not found — was it renamed? update this gate or the SSOT routing assumption")
|
||||||
|
}
|
||||||
|
|
||||||
|
var (
|
||||||
|
callsIsRunning bool
|
||||||
|
callsContainerInspectRaw bool
|
||||||
|
callsRunningContainerNameDirect bool
|
||||||
|
)
|
||||||
|
ast.Inspect(fn.Body, func(n ast.Node) bool {
|
||||||
|
call, ok := n.(*ast.CallExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
sel, ok := call.Fun.(*ast.SelectorExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
switch sel.Sel.Name {
|
||||||
|
case "IsRunning":
|
||||||
|
callsIsRunning = true
|
||||||
|
case "ContainerInspect":
|
||||||
|
callsContainerInspectRaw = true
|
||||||
|
case "RunningContainerName":
|
||||||
|
// Direct RunningContainerName is also acceptable SSOT — but
|
||||||
|
// preferring IsRunning keeps the (bool, error) contract that
|
||||||
|
// already exists in the helper API surface.
|
||||||
|
callsRunningContainerNameDirect = true
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
})
|
||||||
|
|
||||||
|
if !callsIsRunning && !callsRunningContainerNameDirect {
|
||||||
|
t.Errorf("preflightContainerHealth must call provisioner.IsRunning OR provisioner.RunningContainerName for the SSOT health check — see molecule-core#36. Found neither.")
|
||||||
|
}
|
||||||
|
if callsContainerInspectRaw {
|
||||||
|
t.Errorf("preflightContainerHealth carries a direct ContainerInspect call. This is the parallel-impl drift molecule-core#36 fixed. " +
|
||||||
|
"Either route through provisioner.IsRunning OR — if a new use case truly needs a different inspect — extend the helper's contract first and update this gate to allow the specific delta.")
|
||||||
|
}
|
||||||
|
}
|
||||||
workspace-server/internal/handlers/mock_runtime.go (new file, 223 lines)
@ -0,0 +1,223 @@
|
|||||||
|
package handlers
|
||||||
|
|
||||||
|
// mock_runtime.go — "mock" runtime: a virtual workspace that has no
|
||||||
|
// container, no EC2, no LLM, just hardcoded canned A2A replies. Built
|
||||||
|
// for the funding-demo "200-workspace mock org" so hongming can show
|
||||||
|
// investors a CEO/VPs/Managers/ICs hierarchy at scale without burning
|
||||||
|
// 200 EC2 instances or 200 Anthropic keys.
|
||||||
|
//
|
||||||
|
// Wire model:
|
||||||
|
// - org template declares `runtime: mock` on every workspace
|
||||||
|
// - createWorkspaceTree skips provisioning, sets status='online'
|
||||||
|
// directly (mirrors the `external` short-circuit, minus the URL +
|
||||||
|
// awaiting_agent dance)
|
||||||
|
// - proxyA2ARequest short-circuits on a mock-runtime target and
|
||||||
|
// returns a canned JSON-RPC reply; never calls resolveAgentURL,
|
||||||
|
// never opens an HTTP connection, never touches Docker/EC2
|
||||||
|
//
|
||||||
|
// The reply is JSON-RPC 2.0 + a2a-sdk v0.3 shape so the canvas's
|
||||||
|
// extractAgentText / extractTextsFromParts read it without any
|
||||||
|
// special-casing. We rotate over a small variant pool so a screen
|
||||||
|
// full of replies doesn't all read identical — gives the demo a bit
|
||||||
|
// of life without pretending to be a real agent.
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"crypto/sha1"
|
||||||
|
"database/sql"
|
||||||
|
"encoding/binary"
|
||||||
|
"encoding/json"
|
||||||
|
"errors"
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"net/http"
|
||||||
|
"strings"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
|
||||||
|
"github.com/gin-gonic/gin"
|
||||||
|
"github.com/google/uuid"
|
||||||
|
)
|
||||||
|
|
||||||
|
// MockRuntimeName is the canonical runtime string a workspace row
|
||||||
|
// carries to opt into the canned-reply short-circuit. Kept as a const
|
||||||
|
// so the proxy's runtime-check + the org-import skip-block reference
|
||||||
|
// the same literal.
|
||||||
|
const MockRuntimeName = "mock"
|
||||||
|
|
||||||
|
// mockReplyVariants is the pool of canned strings the mock runtime
|
||||||
|
// rotates through. Picked to read like a busy-but-short reply from a
|
||||||
|
// real human in a hierarchy — a CEO would NOT respond with "On it!",
|
||||||
|
// but for the demo every node is shown to be reachable, so we lean
|
||||||
|
// into the variety. Variant selection is deterministic per
|
||||||
|
// (workspaceID, request-id) pair so a screen recording replays the
|
||||||
|
// same reply for the same input.
|
||||||
|
var mockReplyVariants = []string{
|
||||||
|
"On it!",
|
||||||
|
"Got it, on it now.",
|
||||||
|
"On it, boss.",
|
||||||
|
"Working on it.",
|
||||||
|
"Acknowledged — on it.",
|
||||||
|
"On it, will report back.",
|
||||||
|
"Roger that, on it.",
|
||||||
|
"Copy that. On it.",
|
||||||
|
"On it — ETA shortly.",
|
||||||
|
"On it. Standby for update.",
|
||||||
|
}
|
||||||
|
|
||||||
|
// pickMockReply returns a canned reply for the given workspaceID +
|
||||||
|
// requestID. Deterministic so the same (workspace, message-id) pair
|
||||||
|
// always picks the same variant — useful for screen recordings and
|
||||||
|
// flake-free e2e snapshots. Falls back to variant[0] if the inputs
|
||||||
|
// are empty.
|
||||||
|
func pickMockReply(workspaceID, requestID string) string {
|
||||||
|
if len(mockReplyVariants) == 0 {
|
||||||
|
return "On it!"
|
||||||
|
}
|
||||||
|
if workspaceID == "" && requestID == "" {
|
||||||
|
return mockReplyVariants[0]
|
||||||
|
}
|
||||||
|
h := sha1.Sum([]byte(workspaceID + ":" + requestID))
|
||||||
|
idx := int(binary.BigEndian.Uint32(h[0:4]) % uint32(len(mockReplyVariants)))
|
||||||
|
return mockReplyVariants[idx]
|
||||||
|
}
|
||||||
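A small illustrative helper (not shipped) restating the determinism contract above; which concrete variant a given input maps to is whatever its SHA-1 bucket happens to be.

func exampleMockReplyDeterminism() {
	first := pickMockReply("ws-ceo", "req-42")
	again := pickMockReply("ws-ceo", "req-42")
	fmt.Println(first == again) // always true — SHA-1("ws-ceo:req-42") is stable across calls and restarts
	fmt.Println(pickMockReply("ws-ceo", "req-43") == first) // may be false: a different request id can land in a different bucket
}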
|
|
||||||
|
// lookupRuntime returns the workspace's runtime string. Empty when the
|
||||||
|
// row is missing / DB hiccup so callers fall through to the existing
|
||||||
|
// dispatch path (which will then 404 / 502 normally). Fail-open here
|
||||||
|
// because a transient DB error must not silently flip a real workspace
|
||||||
|
// into mock-mode and start handing out canned replies in place of
|
||||||
|
// genuine agent traffic.
|
||||||
|
func lookupRuntime(ctx context.Context, workspaceID string) string {
|
||||||
|
var runtime sql.NullString
|
||||||
|
err := db.DB.QueryRowContext(ctx,
|
||||||
|
`SELECT runtime FROM workspaces WHERE id = $1`, workspaceID,
|
||||||
|
).Scan(&runtime)
|
||||||
|
if err != nil {
|
||||||
|
if !errors.Is(err, sql.ErrNoRows) {
|
||||||
|
log.Printf("ProxyA2A: lookupRuntime(%s) failed (%v) — falling through to dispatch path", workspaceID, err)
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
if !runtime.Valid {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return runtime.String
|
||||||
|
}
|
||||||
|
|
||||||
|
// buildMockA2AResponse synthesises a JSON-RPC 2.0 success envelope that
|
||||||
|
// matches the a2a-sdk v0.3 reply shape the canvas's extractAgentText
|
||||||
|
// already understands: `{result: {parts: [{kind: "text", text: ...}]}}`.
|
||||||
|
// `requestID` is the JSON-RPC `id` of the inbound request — A2A
|
||||||
|
// implementations echo it on the reply so callers can correlate. We
|
||||||
|
// extract it from the normalized payload in the caller and pass it in
|
||||||
|
// here so this function stays JSON-only (no payload parsing).
|
||||||
|
//
|
||||||
|
// Returns marshalled bytes ready to write straight to the HTTP body.
|
||||||
|
// Marshal failure is logged + a tiny fallback envelope returned, since
|
||||||
|
// failing the whole request because of a JSON encoding hiccup on a
|
||||||
|
// constant-shaped payload would defeat the "mock always works" guarantee.
|
||||||
|
func buildMockA2AResponse(workspaceID, requestID, replyText string) []byte {
|
||||||
|
if requestID == "" {
|
||||||
|
requestID = uuid.New().String()
|
||||||
|
}
|
||||||
|
envelope := map[string]any{
|
||||||
|
"jsonrpc": "2.0",
|
||||||
|
"id": requestID,
|
||||||
|
"result": map[string]any{
|
||||||
|
"parts": []map[string]any{
|
||||||
|
{"kind": "text", "text": replyText},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
out, err := json.Marshal(envelope)
|
||||||
|
if err != nil {
|
||||||
|
log.Printf("ProxyA2A: mock-runtime response marshal failed for %s: %v — emitting fallback", workspaceID, err)
|
||||||
|
// Hand-rolled minimal envelope. Safe because every value is a
|
||||||
|
// hardcoded constant string with no characters that need
|
||||||
|
// escaping in a JSON string literal.
|
||||||
|
fallback := fmt.Sprintf(
|
||||||
|
`{"jsonrpc":"2.0","id":%q,"result":{"parts":[{"kind":"text","text":%q}]}}`,
|
||||||
|
requestID, replyText,
|
||||||
|
)
|
||||||
|
return []byte(fallback)
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
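For orientation, a hedged sketch of the envelope this produces (key order in the marshalled bytes follows encoding/json's map ordering, so the literal byte order may differ):

func exampleMockEnvelope() {
	out := buildMockA2AResponse("ws-demo", "req-1", "On it!")
	// Decodes to: jsonrpc "2.0", id "req-1",
	// result.parts = [{kind: "text", text: "On it!"}] — exactly the shape
	// the canvas's extractAgentText already parses.
	fmt.Println(string(out))
}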
|
|
||||||
|
// extractRequestID pulls the JSON-RPC `id` out of an already-normalized
|
||||||
|
// A2A payload. Returns "" when the field is absent or not a string —
|
||||||
|
// caller substitutes a fresh UUID. Tolerant of every shape
|
||||||
|
// normalizeA2APayload could produce.
|
||||||
|
func extractRequestID(body []byte) string {
|
||||||
|
var top map[string]json.RawMessage
|
||||||
|
if err := json.Unmarshal(body, &top); err != nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
raw, ok := top["id"]
|
||||||
|
if !ok {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
var s string
|
||||||
|
if json.Unmarshal(raw, &s) == nil {
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
// JSON-RPC permits numeric IDs too; canvas issues UUIDs but be
|
||||||
|
// defensive against alternative SDKs.
|
||||||
|
var n json.Number
|
||||||
|
if json.Unmarshal(raw, &n) == nil {
|
||||||
|
return n.String()
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
|
||||||
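Illustrative inputs and outputs (not shipped code) covering the string, numeric, and missing-id shapes the helper tolerates:

func exampleExtractRequestID() {
	fmt.Println(extractRequestID([]byte(`{"id":"req-7"}`))) // "req-7"
	fmt.Println(extractRequestID([]byte(`{"id":7}`)))       // "7" — numeric JSON-RPC ids are stringified
	fmt.Println(extractRequestID([]byte(`{}`)))             // "" — caller substitutes a fresh UUID
}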
|
|
||||||
|
// handleMockA2A is the proxy short-circuit for mock-runtime workspaces.
|
||||||
|
// Returns (status, body, true) when the target is mock — caller writes
|
||||||
|
// the response and returns. Returns (_, _, false) when the target is
|
||||||
|
// not mock — caller continues to the real dispatch path.
|
||||||
|
//
|
||||||
|
// Side-effects: writes a synthetic activity_logs row via logA2ASuccess
|
||||||
|
// when logActivity is true so the canvas's "Agent Comms" tab shows the
|
||||||
|
// mock reply in the trace alongside real-agent traffic. Without this
|
||||||
|
// the demo would render messages on the canvas chat panel but a peer
|
||||||
|
// node clicking through to its activity tab would see an empty list.
|
||||||
|
func (h *WorkspaceHandler) handleMockA2A(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string, logActivity bool) (int, []byte, bool) {
|
||||||
|
if lookupRuntime(ctx, workspaceID) != MockRuntimeName {
|
||||||
|
return 0, nil, false
|
||||||
|
}
|
||||||
|
requestID := extractRequestID(body)
|
||||||
|
replyText := pickMockReply(workspaceID, requestID)
|
||||||
|
respBody := buildMockA2AResponse(workspaceID, requestID, replyText)
|
||||||
|
|
||||||
|
// Tiny artificial delay so the canvas chat UI has time to render
|
||||||
|
// the user's outgoing bubble before the agent reply appears.
|
||||||
|
// Without it the reply lands the same animation frame and feels
|
||||||
|
// robotic. 80ms is too short to pass for "real" latency, yet enough to mask the React
|
||||||
|
// double-render race that drops the user bubble entirely on slow
|
||||||
|
// machines (observed locally on M1 Air, 2026-05-07). Below 200ms
|
||||||
|
// keeps a 200-node demo snappy when investors fan out 30 messages
|
||||||
|
// at once.
|
||||||
|
time.Sleep(80 * time.Millisecond)
|
||||||
|
|
||||||
|
if logActivity {
|
||||||
|
// Reuse the existing success-logger so the activity feed shape
|
||||||
|
// is identical to a real agent reply. Status 200 + duration 0
|
||||||
|
// is the "synthesised reply" marker; activity_logs.duration_ms
|
||||||
|
// being 0 is harmless (real fast paths can hit 0 too).
|
||||||
|
h.logA2ASuccess(ctx, workspaceID, callerID, body, respBody, a2aMethod, http.StatusOK, 0)
|
||||||
|
}
|
||||||
|
return http.StatusOK, respBody, true
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsMockRuntime is a small public helper for callers outside this
|
||||||
|
// package (tests, the org importer) that need to ask the question
|
||||||
|
// without depending on the unexported constant. Trims + lower-cases
|
||||||
|
// so a typoed YAML cell like " Mock " still resolves correctly.
|
||||||
|
func IsMockRuntime(runtime string) bool {
|
||||||
|
return strings.EqualFold(strings.TrimSpace(runtime), MockRuntimeName)
|
||||||
|
}
|
||||||
|
|
||||||
|
// The gin import is otherwise unused in this file; the blank var below keeps it compiling so a future
|
||||||
|
// addition of a thin HTTP handler (e.g. POST /workspaces/:id/mock/replies
|
||||||
|
// for an admin-set custom reply pool) doesn't need an import re-order.
|
||||||
|
var _ = gin.H{}
|
||||||
workspace-server/internal/handlers/mock_runtime_test.go (new file, 266 lines)
@ -0,0 +1,266 @@
|
|||||||
|
package handlers
|
||||||
|
|
||||||
|
// mock_runtime_test.go — locks the contract for the mock-runtime
|
||||||
|
// short-circuit added for the funding-demo "200-workspace mock org"
|
||||||
|
// template. Three invariants:
|
||||||
|
//
|
||||||
|
// 1. ProxyA2A on a workspace with runtime='mock' must return 200
|
||||||
|
// with a JSON-RPC reply containing one text part. NO HTTP
|
||||||
|
// dispatch, NO resolveAgentURL DB read (mock workspaces have
|
||||||
|
// no URL — that read would 404 and break the demo).
|
||||||
|
//
|
||||||
|
// 2. The reply text must be one of the canned variants and must be
|
||||||
|
// deterministic for a given (workspace_id, request_id) pair so
|
||||||
|
// screen recordings replay identically.
|
||||||
|
//
|
||||||
|
// 3. Workspaces with runtime != 'mock' must NOT be affected — the
|
||||||
|
// mock check fails fast and falls through to the existing
|
||||||
|
// dispatch path. Same kind of regression guard the poll-mode
|
||||||
|
// tests carry.
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"encoding/json"
|
||||||
|
"net/http"
|
||||||
|
"net/http/httptest"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/DATA-DOG/go-sqlmock"
|
||||||
|
"github.com/gin-gonic/gin"
|
||||||
|
)
|
||||||
|
|
||||||
|
// TestProxyA2A_MockRuntime_ReturnsCannedReply is the happy-path
|
||||||
|
// contract. A workspace flagged runtime='mock' must:
|
||||||
|
// - return 200 with JSON-RPC envelope {result:{parts:[{kind:text,text:...}]}}
|
||||||
|
// - not dispatch HTTP (no SELECT url SQL expected)
|
||||||
|
// - reply text is one of mockReplyVariants
|
||||||
|
func TestProxyA2A_MockRuntime_ReturnsCannedReply(t *testing.T) {
|
||||||
|
mock := setupTestDB(t)
|
||||||
|
setupTestRedis(t)
|
||||||
|
broadcaster := newTestBroadcaster()
|
||||||
|
handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
|
||||||
|
|
||||||
|
const wsID = "ws-mock-canned"
|
||||||
|
|
||||||
|
// Budget check fires before runtime lookup (same as the poll-mode
|
||||||
|
// short-circuit) — keeps mock workspaces honest if a tenant ever
|
||||||
|
// sets a budget on one. Unlikely on a demo, but the guard stays
|
||||||
|
// uniform so future "monthly_spend on mock = 0" assertions don't
|
||||||
|
// drift.
|
||||||
|
expectBudgetCheck(mock, wsID)
|
||||||
|
|
||||||
|
// lookupDeliveryMode runs first — return push so the poll
|
||||||
|
// short-circuit doesn't fire and we hit the mock check.
|
||||||
|
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
|
||||||
|
WithArgs(wsID).
|
||||||
|
WillReturnRows(sqlmock.NewRows([]string{"delivery_mode"}).AddRow("push"))
|
||||||
|
|
||||||
|
// lookupRuntime SELECT — returns 'mock', triggering the canned-reply
|
||||||
|
// short-circuit. CRITICAL: NO ExpectQuery for `SELECT url, status
|
||||||
|
// FROM workspaces` (resolveAgentURL's query). If the short-circuit
|
||||||
|
// fails to fire, sqlmock will surface "unexpected query" on the URL
|
||||||
|
// SELECT and the test fails loudly — that's the dispatch-leak detector.
|
||||||
|
mock.ExpectQuery("SELECT runtime FROM workspaces WHERE id").
|
||||||
|
WithArgs(wsID).
|
||||||
|
WillReturnRows(sqlmock.NewRows([]string{"runtime"}).AddRow("mock"))
|
||||||
|
|
||||||
|
// Activity log: logA2ASuccess writes the synthetic reply to
|
||||||
|
// activity_logs so the canvas's Agent Comms tab shows it alongside
|
||||||
|
// real-agent traffic.
|
||||||
|
mock.ExpectExec("INSERT INTO activity_logs").
|
||||||
|
WillReturnResult(sqlmock.NewResult(0, 1))
|
||||||
|
|
||||||
|
w := httptest.NewRecorder()
|
||||||
|
c, _ := gin.CreateTestContext(w)
|
||||||
|
c.Params = gin.Params{{Key: "id", Value: wsID}}
|
||||||
|
|
||||||
|
body := `{"jsonrpc":"2.0","id":"req-mock-1","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hello mock"}]}}}`
|
||||||
|
c.Request = httptest.NewRequest("POST", "/workspaces/"+wsID+"/a2a", bytes.NewBufferString(body))
|
||||||
|
c.Request.Header.Set("Content-Type", "application/json")
|
||||||
|
|
||||||
|
handler.ProxyA2A(c)
|
||||||
|
|
||||||
|
// logA2ASuccess fires async — give it a moment to settle so
|
||||||
|
// ExpectationsWereMet doesn't flake.
|
||||||
|
time.Sleep(200 * time.Millisecond)
|
||||||
|
|
||||||
|
if w.Code != http.StatusOK {
|
||||||
|
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
|
||||||
|
}
|
||||||
|
var resp map[string]interface{}
|
||||||
|
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
|
||||||
|
t.Fatalf("response is not valid JSON: %v", err)
|
||||||
|
}
|
||||||
|
if resp["jsonrpc"] != "2.0" {
|
||||||
|
t.Errorf("response.jsonrpc = %v, want 2.0", resp["jsonrpc"])
|
||||||
|
}
|
||||||
|
if resp["id"] != "req-mock-1" {
|
||||||
|
t.Errorf("response.id = %v, want %q (echoed from request)", resp["id"], "req-mock-1")
|
||||||
|
}
|
||||||
|
result, _ := resp["result"].(map[string]interface{})
|
||||||
|
if result == nil {
|
||||||
|
t.Fatalf("response.result missing or wrong type: %v", resp["result"])
|
||||||
|
}
|
||||||
|
parts, _ := result["parts"].([]interface{})
|
||||||
|
if len(parts) != 1 {
|
||||||
|
t.Fatalf("expected exactly one part, got %d: %v", len(parts), parts)
|
||||||
|
}
|
||||||
|
part, _ := parts[0].(map[string]interface{})
|
||||||
|
if part["kind"] != "text" {
|
||||||
|
t.Errorf("part.kind = %v, want text", part["kind"])
|
||||||
|
}
|
||||||
|
text, _ := part["text"].(string)
|
||||||
|
if text == "" {
|
||||||
|
t.Error("part.text is empty — canned reply not populated")
|
||||||
|
}
|
||||||
|
// Reply must be one of the variants.
|
||||||
|
matched := false
|
||||||
|
for _, v := range mockReplyVariants {
|
||||||
|
if v == text {
|
||||||
|
matched = true
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !matched {
|
||||||
|
t.Errorf("reply text %q is not in mockReplyVariants", text)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := mock.ExpectationsWereMet(); err != nil {
|
||||||
|
t.Errorf("unmet sqlmock expectations: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestProxyA2A_NonMockRuntime_NoShortCircuit verifies the symmetric
|
||||||
|
// contract: a workspace with a real runtime (claude-code, hermes, etc.)
|
||||||
|
// must NOT be affected by the mock check — it falls through to the
|
||||||
|
// real dispatch path. Without this guard, a regression in
|
||||||
|
// lookupRuntime could silently flip every workspace into mock-mode
|
||||||
|
// and start handing out canned replies in place of real-agent traffic.
|
||||||
|
func TestProxyA2A_NonMockRuntime_NoShortCircuit(t *testing.T) {
|
||||||
|
mock := setupTestDB(t)
|
||||||
|
mr := setupTestRedis(t)
|
||||||
|
allowLoopbackForTest(t)
|
||||||
|
broadcaster := newTestBroadcaster()
|
||||||
|
handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
|
||||||
|
|
||||||
|
const wsID = "ws-real-runtime"
|
||||||
|
|
||||||
|
dispatched := false
|
||||||
|
agentServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
dispatched = true
|
||||||
|
w.Header().Set("Content-Type", "application/json")
|
||||||
|
w.Write([]byte(`{"jsonrpc":"2.0","id":"1","result":{"status":"ok"}}`))
|
||||||
|
}))
|
||||||
|
defer agentServer.Close()
|
||||||
|
mr.Set("ws:"+wsID+":url", agentServer.URL)
|
||||||
|
|
||||||
|
expectBudgetCheck(mock, wsID)
|
||||||
|
|
||||||
|
// poll-mode SELECT — return push so we proceed past the poll
|
||||||
|
// short-circuit.
|
||||||
|
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
|
||||||
|
WithArgs(wsID).
|
||||||
|
WillReturnRows(sqlmock.NewRows([]string{"delivery_mode"}).AddRow("push"))
|
||||||
|
|
||||||
|
// runtime SELECT — return claude-code so the mock check falls
|
||||||
|
// through.
|
||||||
|
mock.ExpectQuery("SELECT runtime FROM workspaces WHERE id").
|
||||||
|
WithArgs(wsID).
|
||||||
|
WillReturnRows(sqlmock.NewRows([]string{"runtime"}).AddRow("claude-code"))
|
||||||
|
|
||||||
|
mock.ExpectExec("INSERT INTO activity_logs").
|
||||||
|
WillReturnResult(sqlmock.NewResult(0, 1))
|
||||||
|
|
||||||
|
w := httptest.NewRecorder()
|
||||||
|
c, _ := gin.CreateTestContext(w)
|
||||||
|
c.Params = gin.Params{{Key: "id", Value: wsID}}
|
||||||
|
body := `{"jsonrpc":"2.0","id":"real-1","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hi"}]}}}`
|
||||||
|
c.Request = httptest.NewRequest("POST", "/workspaces/"+wsID+"/a2a", bytes.NewBufferString(body))
|
||||||
|
c.Request.Header.Set("Content-Type", "application/json")
|
||||||
|
|
||||||
|
handler.ProxyA2A(c)
|
||||||
|
|
||||||
|
time.Sleep(50 * time.Millisecond)
|
||||||
|
|
||||||
|
if w.Code != http.StatusOK {
|
||||||
|
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
|
||||||
|
}
|
||||||
|
if !dispatched {
|
||||||
|
t.Error("non-mock runtime: expected the agent server to receive the request, but it did not — mock short-circuit may be over-firing")
|
||||||
|
}
|
||||||
|
if err := mock.ExpectationsWereMet(); err != nil {
|
||||||
|
t.Errorf("unmet sqlmock expectations: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestPickMockReply_Deterministic locks the determinism contract:
|
||||||
|
// the same (workspaceID, requestID) input must yield the same variant
|
||||||
|
// every call. Required for screen recordings + flake-free e2e
|
||||||
|
// snapshots.
|
||||||
|
func TestPickMockReply_Deterministic(t *testing.T) {
|
||||||
|
cases := []struct {
|
||||||
|
ws, req string
|
||||||
|
}{
|
||||||
|
{"ws-1", "req-A"},
|
||||||
|
{"ws-1", "req-B"},
|
||||||
|
{"ws-2", "req-A"},
|
||||||
|
{"", ""},
|
||||||
|
}
|
||||||
|
for _, tc := range cases {
|
||||||
|
first := pickMockReply(tc.ws, tc.req)
|
||||||
|
for i := 0; i < 10; i++ {
|
||||||
|
next := pickMockReply(tc.ws, tc.req)
|
||||||
|
if next != first {
|
||||||
|
t.Errorf("pickMockReply(%q,%q) is not deterministic: got %q then %q",
|
||||||
|
tc.ws, tc.req, first, next)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestIsMockRuntime_TrimsAndCaseInsensitive — typos and stray
|
||||||
|
// whitespace in YAML must still resolve to mock so a single
|
||||||
|
// runtime: " Mock " entry doesn't silently get dispatched.
|
||||||
|
func TestIsMockRuntime_TrimsAndCaseInsensitive(t *testing.T) {
|
||||||
|
cases := map[string]bool{
|
||||||
|
"mock": true,
|
||||||
|
"MOCK": true,
|
||||||
|
" Mock ": true,
|
||||||
|
"mocky": false,
|
||||||
|
"": false,
|
||||||
|
"external": false,
|
||||||
|
"claude-code": false,
|
||||||
|
}
|
||||||
|
for in, want := range cases {
|
||||||
|
if got := IsMockRuntime(in); got != want {
|
||||||
|
t.Errorf("IsMockRuntime(%q) = %v, want %v", in, got, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestBuildMockA2AResponse_EchoesRequestID — JSON-RPC requires the
|
||||||
|
// reply id to match the request id so callers can correlate. Mock
|
||||||
|
// must hold this contract or canvas's correlation logic breaks.
|
||||||
|
func TestBuildMockA2AResponse_EchoesRequestID(t *testing.T) {
|
||||||
|
out := buildMockA2AResponse("ws-x", "req-echo-7", "On it!")
|
||||||
|
var resp map[string]interface{}
|
||||||
|
if err := json.Unmarshal(out, &resp); err != nil {
|
||||||
|
t.Fatalf("response is not valid JSON: %v", err)
|
||||||
|
}
|
||||||
|
if resp["id"] != "req-echo-7" {
|
||||||
|
t.Errorf("id = %v, want req-echo-7", resp["id"])
|
||||||
|
}
|
||||||
|
if resp["jsonrpc"] != "2.0" {
|
||||||
|
t.Errorf("jsonrpc = %v, want 2.0", resp["jsonrpc"])
|
||||||
|
}
|
||||||
|
result, _ := resp["result"].(map[string]interface{})
|
||||||
|
parts, _ := result["parts"].([]interface{})
|
||||||
|
if len(parts) != 1 {
|
||||||
|
t.Fatalf("expected 1 part, got %d", len(parts))
|
||||||
|
}
|
||||||
|
p, _ := parts[0].(map[string]interface{})
|
||||||
|
if p["text"] != "On it!" {
|
||||||
|
t.Errorf("part.text = %v, want On it!", p["text"])
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -250,6 +250,21 @@ func (h *OrgHandler) createWorkspaceTree(ws OrgWorkspace, parentID *string, absX
|
|||||||
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), id, map[string]interface{}{
|
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), id, map[string]interface{}{
|
||||||
"name": ws.Name, "external": true,
|
"name": ws.Name, "external": true,
|
||||||
})
|
})
|
||||||
|
} else if IsMockRuntime(runtime) {
|
||||||
|
// Mock-runtime workspaces have no container, no EC2, no URL —
|
||||||
|
// the proxyA2ARequest short-circuit synthesises every reply
|
||||||
|
// from a canned variant pool (see mock_runtime.go). Status
|
||||||
|
// goes straight to 'online' so the canvas renders the node
|
||||||
|
// as reachable + the chat tab's send button is enabled. No
|
||||||
|
// URL is set; the proxy never tries to resolve one for mock
|
||||||
|
// runtimes. Built for the funding-demo "200-workspace mock
|
||||||
|
// org" template — visual scale without real backend cost.
|
||||||
|
if _, err := db.DB.ExecContext(ctx, `UPDATE workspaces SET status = $1 WHERE id = $2`, models.StatusOnline, id); err != nil {
|
||||||
|
log.Printf("Org import: mock workspace status update failed for %s: %v", ws.Name, err)
|
||||||
|
}
|
||||||
|
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), id, map[string]interface{}{
|
||||||
|
"name": ws.Name, "mock": true, "runtime": runtime,
|
||||||
|
})
|
||||||
} else if h.workspace.HasProvisioner() {
|
} else if h.workspace.HasProvisioner() {
|
||||||
// Provision container — either backend (CP for SaaS, local Docker
|
// Provision container — either backend (CP for SaaS, local Docker
|
||||||
// for self-hosted) is fine. Pre-2026-05-05 this gate was
|
// for self-hosted) is fine. Pre-2026-05-05 this gate was
|
||||||
@ -675,7 +690,23 @@ func (h *OrgHandler) recurseChildrenForImport(ws OrgWorkspace, parentID string,
|
|||||||
if err := h.createWorkspaceTree(child, &parentID, childAbsX, childAbsY, slotX, slotY, defaults, orgBaseDir, results, provisionSem); err != nil {
|
if err := h.createWorkspaceTree(child, &parentID, childAbsX, childAbsY, slotX, slotY, defaults, orgBaseDir, results, provisionSem); err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
time.Sleep(workspaceCreatePacingMs * time.Millisecond)
|
// Pacing exists to throttle the Docker container-spawn thundering herd
|
||||||
|
// during a self-hosted import. Mock-runtime children spawn no
|
||||||
|
// container — no Docker pressure, no LLM bursts, just DB
|
||||||
|
// inserts + a broadcast. Skipping the 2s sleep collapses a
|
||||||
|
// 200-workspace mock-org import from ~7min → ~5s, which is
|
||||||
|
// the difference between a snappy demo and a "did it freeze?"
|
||||||
|
// staring contest. Real (containerful) runtimes still pace.
|
||||||
|
// Inheritance: if the child itself doesn't declare a runtime,
|
||||||
|
// fall back to defaults.runtime — the org template sets
|
||||||
|
// runtime: mock once at the org level, not on every IC node.
|
||||||
|
childRuntime := child.Runtime
|
||||||
|
if childRuntime == "" {
|
||||||
|
childRuntime = defaults.Runtime
|
||||||
|
}
|
||||||
|
if !IsMockRuntime(childRuntime) {
|
||||||
|
time.Sleep(workspaceCreatePacingMs * time.Millisecond)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|||||||
@ -4,6 +4,7 @@ import (
|
|||||||
"bytes"
|
"bytes"
|
||||||
"context"
|
"context"
|
||||||
"io"
|
"io"
|
||||||
|
"log"
|
||||||
"os"
|
"os"
|
||||||
"path/filepath"
|
"path/filepath"
|
||||||
"strings"
|
"strings"
|
||||||
@ -177,16 +178,42 @@ func strDefault(m map[string]interface{}, key, fallback string) string {
|
|||||||
return fallback
|
return fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// findRunningContainer returns the live container name for workspaceID, or ""
|
||||||
|
// when the container is genuinely not running OR the daemon errored
|
||||||
|
// transiently. Routed through provisioner.RunningContainerName as the SSOT
|
||||||
|
// (molecule-core#10) so this handler agrees with healthsweep on the same
|
||||||
|
// inputs. Transient daemon errors are logged distinctly so triage doesn't
|
||||||
|
// confuse a flaky daemon with a stopped container.
|
||||||
func (h *PluginsHandler) findRunningContainer(ctx context.Context, workspaceID string) string {
|
func (h *PluginsHandler) findRunningContainer(ctx context.Context, workspaceID string) string {
|
||||||
if h.docker == nil {
|
name, err := provisioner.RunningContainerName(ctx, h.docker, workspaceID)
|
||||||
|
if err != nil {
|
||||||
|
log.Printf("plugins: docker inspect transient error for %s: %v (treating as not-running for this request)", workspaceID, err)
|
||||||
return ""
|
return ""
|
||||||
}
|
}
|
||||||
name := provisioner.ContainerName(workspaceID)
|
return name
|
||||||
info, err := h.docker.ContainerInspect(ctx, name)
|
}
|
||||||
if err == nil && info.State.Running {
|
|
||||||
return name
|
// isExternalRuntime reports whether the workspace's runtime is the
|
||||||
|
// `external` (remote-pull) shape introduced in Phase 30. External
|
||||||
|
// workspaces have no local container — `POST /plugins` (push-install via
|
||||||
|
// docker exec) doesn't apply to them; they pull via the download endpoint
|
||||||
|
// instead. Returns false (allow-install) if the lookup is unwired or
|
||||||
|
// errors — failing open here is safe because the downstream
|
||||||
|
// findRunningContainer step still gates on a real container being there.
|
||||||
|
//
|
||||||
|
// Background — molecule-core#10: without this check, external workspaces
|
||||||
|
// fall through to findRunningContainer's NotFound path and return a
|
||||||
|
// misleading 503 "container not running" instead of a clear "use the
|
||||||
|
// pull endpoint" message.
|
||||||
|
func (h *PluginsHandler) isExternalRuntime(workspaceID string) bool {
|
||||||
|
if h.runtimeLookup == nil {
|
||||||
|
return false
|
||||||
}
|
}
|
||||||
return ""
|
runtime, err := h.runtimeLookup(workspaceID)
|
||||||
|
if err != nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
return runtime == "external"
|
||||||
}
|
}
|
||||||
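A hedged wiring sketch only — the constructor arguments, the *PluginsHandler return type, and the configs-dir value mirror the tests further down rather than the real router setup. The point it illustrates: isExternalRuntime stays inert until a runtime lookup is injected, so untouched call sites keep today's behavior.

func exampleExternalRuntimeWiring() *PluginsHandler {
	// Hypothetical: back the lookup with the same workspaces.runtime column
	// that lookupRuntime reads elsewhere in this package.
	return NewPluginsHandler("/path/to/workspace-configs", nil, nil).
		WithRuntimeLookup(func(workspaceID string) (string, error) {
			var rt sql.NullString
			err := db.DB.QueryRowContext(context.Background(),
				`SELECT runtime FROM workspaces WHERE id = $1`, workspaceID,
			).Scan(&rt)
			return rt.String, err
		})
}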
|
|
||||||
func (h *PluginsHandler) execAsRoot(ctx context.Context, containerName string, cmd []string) (string, error) {
|
func (h *PluginsHandler) execAsRoot(ctx context.Context, containerName string, cmd []string) (string, error) {
|
||||||
|
|||||||
@ -0,0 +1,176 @@
|
|||||||
|
package handlers
|
||||||
|
|
||||||
|
import (
|
||||||
|
"go/ast"
|
||||||
|
"go/parser"
|
||||||
|
"go/token"
|
||||||
|
"strings"
|
||||||
|
"testing"
|
||||||
|
)
|
||||||
|
|
||||||
|
// TestFindRunningContainer_RoutesThroughProvisionerSSOT is a behavior-based
|
||||||
|
// AST gate: it pins the invariant that PluginsHandler.findRunningContainer
|
||||||
|
// MUST go through provisioner.RunningContainerName for its is-running check,
|
||||||
|
// instead of carrying its own copy of cli.ContainerInspect logic.
|
||||||
|
//
|
||||||
|
// Background — molecule-core#10: a parallel impl of "is the workspace's
|
||||||
|
// container running" used to live in plugins.go. It drifted from the
|
||||||
|
// canonical impl in healthsweep (which goes through Provisioner.IsRunning
|
||||||
|
// → RunningContainerName) on edge cases like "transient daemon error" —
|
||||||
|
// the duplicate would 503 with a misleading message while healthsweep
|
||||||
|
// correctly stayed defensive. Consolidating onto RunningContainerName as
|
||||||
|
// the SSOT prevents any future copy from re-introducing that drift.
|
||||||
|
//
|
||||||
|
// Mutation invariant: if a future PR replaces the provisioner call with
|
||||||
|
// `h.docker.ContainerInspect(...)` directly, this test fails. That's the
|
||||||
|
// signal to either (a) extend RunningContainerName's contract OR (b)
|
||||||
|
// document why this call site needs to differ. Either way: the drift
|
||||||
|
// gets a reviewer's attention instead of shipping silently.
|
||||||
|
func TestFindRunningContainer_RoutesThroughProvisionerSSOT(t *testing.T) {
|
||||||
|
fset := token.NewFileSet()
|
||||||
|
file, err := parser.ParseFile(fset, "plugins.go", nil, parser.ParseComments)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("parse plugins.go: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
var fn *ast.FuncDecl
|
||||||
|
ast.Inspect(file, func(n ast.Node) bool {
|
||||||
|
f, ok := n.(*ast.FuncDecl)
|
||||||
|
if !ok || f.Name.Name != "findRunningContainer" {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
// Confirm receiver is *PluginsHandler so we don't pick up an unrelated
|
||||||
|
// helper of the same name. ast.Recv is a FieldList — receivers carry
|
||||||
|
// at most one field.
|
||||||
|
if f.Recv == nil || len(f.Recv.List) == 0 {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
fn = f
|
||||||
|
return false
|
||||||
|
})
|
||||||
|
|
||||||
|
if fn == nil {
|
||||||
|
t.Fatal("findRunningContainer not found in plugins.go — was it renamed? update this test or the SSOT routing assumption")
|
||||||
|
}
|
||||||
|
|
||||||
|
var (
|
||||||
|
callsRunningContainerName bool
|
||||||
|
callsContainerInspectRaw bool
|
||||||
|
)
|
||||||
|
ast.Inspect(fn.Body, func(n ast.Node) bool {
|
||||||
|
call, ok := n.(*ast.CallExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
sel, ok := call.Fun.(*ast.SelectorExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
// Pkg.Func form: provisioner.RunningContainerName(...)
|
||||||
|
if pkgIdent, ok := sel.X.(*ast.Ident); ok {
|
||||||
|
if pkgIdent.Name == "provisioner" && sel.Sel.Name == "RunningContainerName" {
|
||||||
|
callsRunningContainerName = true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Receiver-then-method form: h.docker.ContainerInspect(...) /
|
||||||
|
// p.cli.ContainerInspect(...) — anything ending in
|
||||||
|
// .ContainerInspect that's NOT routed through provisioner.
|
||||||
|
if sel.Sel.Name == "ContainerInspect" {
|
||||||
|
callsContainerInspectRaw = true
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
})
|
||||||
|
|
||||||
|
if !callsRunningContainerName {
|
||||||
|
t.Errorf(
|
||||||
|
"findRunningContainer must call provisioner.RunningContainerName for the SSOT inspect — see molecule-core#10. Found no such call.",
|
||||||
|
)
|
||||||
|
}
|
||||||
|
if callsContainerInspectRaw {
|
||||||
|
t.Errorf(
|
||||||
|
"findRunningContainer carries a direct ContainerInspect call. This is the parallel-impl drift molecule-core#10 fixed. " +
|
||||||
|
"Either route through provisioner.RunningContainerName OR — if a new use case truly needs a different inspect — extend RunningContainerName's contract first and update this gate to allow the specific delta.",
|
||||||
|
)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestProvisionerIsRunning_RoutesThroughRunningContainerName mirrors the
|
||||||
|
// gate above but for the OTHER consumer of the SSOT — Provisioner.IsRunning
|
||||||
|
// (called by healthsweep). If a future refactor makes IsRunning carry its
|
||||||
|
// own ContainerInspect again, the two consumers' edge-case behaviors will
|
||||||
|
// silently drift. Keep them yoked.
|
||||||
|
func TestProvisionerIsRunning_RoutesThroughRunningContainerName(t *testing.T) {
|
||||||
|
fset := token.NewFileSet()
|
||||||
|
file, err := parser.ParseFile(fset, "../provisioner/provisioner.go", nil, parser.ParseComments)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("parse provisioner.go: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
var fn *ast.FuncDecl
|
||||||
|
ast.Inspect(file, func(n ast.Node) bool {
|
||||||
|
f, ok := n.(*ast.FuncDecl)
|
||||||
|
if !ok || f.Name.Name != "IsRunning" || f.Recv == nil {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
// The receiver type must be *Provisioner specifically. CPProvisioner
|
||||||
|
// has its own IsRunning that talks HTTP to the controlplane and is
|
||||||
|
// out of scope for this gate.
|
||||||
|
if !receiverIs(f, "Provisioner") {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
fn = f
|
||||||
|
return false
|
||||||
|
})
|
||||||
|
if fn == nil {
|
||||||
|
t.Fatal("Provisioner.IsRunning not found — was it renamed? update this test")
|
||||||
|
}
|
||||||
|
|
||||||
|
var (
|
||||||
|
callsRunningContainerName bool
|
||||||
|
callsContainerInspectRaw bool
|
||||||
|
)
|
||||||
|
ast.Inspect(fn.Body, func(n ast.Node) bool {
|
||||||
|
call, ok := n.(*ast.CallExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
// Same-package call: bare identifier (e.g. RunningContainerName(...)).
|
||||||
|
if id, ok := call.Fun.(*ast.Ident); ok && id.Name == "RunningContainerName" {
|
||||||
|
callsRunningContainerName = true
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
// Selector call: pkg.Func (e.g. provisioner.RunningContainerName)
|
||||||
|
// OR recv.Method (e.g. p.cli.ContainerInspect).
|
||||||
|
sel, ok := call.Fun.(*ast.SelectorExpr)
|
||||||
|
if !ok {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
switch sel.Sel.Name {
|
||||||
|
case "RunningContainerName":
|
||||||
|
callsRunningContainerName = true
|
||||||
|
case "ContainerInspect":
|
||||||
|
callsContainerInspectRaw = true
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
})
|
||||||
|
|
||||||
|
if !callsRunningContainerName {
|
||||||
|
t.Errorf("Provisioner.IsRunning must call RunningContainerName for the SSOT inspect — see molecule-core#10")
|
||||||
|
}
|
||||||
|
if callsContainerInspectRaw {
|
||||||
|
t.Errorf("Provisioner.IsRunning carries a direct ContainerInspect call; route through RunningContainerName instead")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// receiverIs reports whether fn's receiver is `*<typeName>` or `<typeName>`.
|
||||||
|
func receiverIs(fn *ast.FuncDecl, typeName string) bool {
|
||||||
|
if fn.Recv == nil || len(fn.Recv.List) == 0 {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
expr := fn.Recv.List[0].Type
|
||||||
|
if star, ok := expr.(*ast.StarExpr); ok {
|
||||||
|
expr = star.X
|
||||||
|
}
|
||||||
|
id, ok := expr.(*ast.Ident)
|
||||||
|
return ok && strings.EqualFold(id.Name, typeName)
|
||||||
|
}
|
||||||
@ -32,6 +32,18 @@ import (
|
|||||||
// inside the workspace at startup.
|
// inside the workspace at startup.
|
||||||
func (h *PluginsHandler) Install(c *gin.Context) {
|
func (h *PluginsHandler) Install(c *gin.Context) {
|
||||||
workspaceID := c.Param("id")
|
workspaceID := c.Param("id")
|
||||||
|
// External-runtime guard (molecule-core#10): push-install via docker
|
||||||
|
// exec is meaningless for `runtime='external'` workspaces — they have
|
||||||
|
// no local container. Reject early with a hint pointing at the
|
||||||
|
// pull-mode endpoint, instead of falling through to a misleading
|
||||||
|
// "container not running" 503 from findRunningContainer.
|
||||||
|
if h.isExternalRuntime(workspaceID) {
|
||||||
|
c.JSON(http.StatusUnprocessableEntity, gin.H{
|
||||||
|
"error": "plugin install via push is not supported for external runtimes",
|
||||||
|
"hint": "external workspaces pull plugins via GET /workspaces/:id/plugins/:name/download",
|
||||||
|
})
|
||||||
|
return
|
||||||
|
}
|
||||||
// Cap the JSON body so a pathological POST can't exhaust parser memory.
|
// Cap the JSON body so a pathological POST can't exhaust parser memory.
|
||||||
bodyMax := envx.Int64("PLUGIN_INSTALL_BODY_MAX_BYTES", defaultInstallBodyMaxBytes)
|
bodyMax := envx.Int64("PLUGIN_INSTALL_BODY_MAX_BYTES", defaultInstallBodyMaxBytes)
|
||||||
c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, bodyMax)
|
c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, bodyMax)
|
||||||
@@ -93,6 +105,16 @@ func (h *PluginsHandler) Uninstall(c *gin.Context) {
 	pluginName := c.Param("name")
 	ctx := c.Request.Context()

+	// Mirror Install's external-runtime guard (molecule-core#10) so the
+	// two endpoints reject the same shape with the same message.
+	if h.isExternalRuntime(workspaceID) {
+		c.JSON(http.StatusUnprocessableEntity, gin.H{
+			"error": "plugin uninstall via docker exec is not supported for external runtimes",
+			"hint": "external workspaces manage their own plugin directory; remove it locally",
+		})
+		return
+	}
+
 	if err := validatePluginName(pluginName); err != nil {
 		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid plugin name"})
 		return
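Not part of the diff: both guards call h.isExternalRuntime, and the tests that follow wire the lookup through WithRuntimeLookup, but neither body is shown in this excerpt. A minimal sketch of the fail-open shape those tests pin down, assuming the handler keeps the lookup as an optional runtimeLookup func field (the struct wiring and return-for-chaining style are guesses; only the method names, the field name, and the fail-open behavior come from the PR):

// Illustrative sketch only; field and builder wiring assumed.
type runtimeLookupFn func(workspaceID string) (string, error)

// WithRuntimeLookup wires an optional workspaceID -> runtime lookup and
// returns the handler for chaining (as the tests below use it).
func (h *PluginsHandler) WithRuntimeLookup(fn runtimeLookupFn) *PluginsHandler {
	h.runtimeLookup = fn // assumed field on PluginsHandler
	return h
}

// isExternalRuntime fails open: an unwired lookup or a lookup error means
// "not external", so the request falls through to the container checks.
func (h *PluginsHandler) isExternalRuntime(workspaceID string) bool {
	if h.runtimeLookup == nil {
		return false // lookup not wired: fail open
	}
	runtime, err := h.runtimeLookup(workspaceID)
	if err != nil {
		return false // transient lookup error: fail open
	}
	return runtime == "external"
}

The design point the tests encode is that a nil or failing lookup must never block an install; the downstream findRunningContainer step still refuses to exec without a live container.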
@@ -0,0 +1,176 @@
+package handlers
+
+import (
+	"bytes"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+
+	"github.com/gin-gonic/gin"
+)
+
+// TestPluginInstall_ExternalRuntime_Returns422 — molecule-core#10.
+// Install on a `runtime='external'` workspace must NOT fall through to
+// findRunningContainer (which would 503 with a misleading "container not
+// running"). It must return 422 with a hint pointing at the pull-mode
+// download endpoint.
+func TestPluginInstall_ExternalRuntime_Returns422(t *testing.T) {
+	h := NewPluginsHandler(t.TempDir(), nil, nil).
+		WithRuntimeLookup(func(workspaceID string) (string, error) {
+			return "external", nil
+		})
+
+	w := httptest.NewRecorder()
+	c, _ := gin.CreateTestContext(w)
+	c.Params = gin.Params{{Key: "id", Value: "ba1789b0-4d21-4f4f-a878-fa226bf77cf5"}}
+	c.Request = httptest.NewRequest(
+		"POST",
+		"/workspaces/ba1789b0-4d21-4f4f-a878-fa226bf77cf5/plugins",
+		bytes.NewBufferString(`{"source":"local://my-plugin"}`),
+	)
+	c.Request.Header.Set("Content-Type", "application/json")
+
+	h.Install(c)
+
+	if w.Code != http.StatusUnprocessableEntity {
+		t.Errorf("expected 422 (Unprocessable Entity) for runtime='external', got %d: %s", w.Code, w.Body.String())
+	}
+	if !strings.Contains(w.Body.String(), "external runtimes") {
+		t.Errorf("expected error body to mention 'external runtimes', got: %s", w.Body.String())
+	}
+	if !strings.Contains(w.Body.String(), "download") {
+		t.Errorf("expected error body to point at the download endpoint, got: %s", w.Body.String())
+	}
+}
+
+// TestPluginUninstall_ExternalRuntime_Returns422 — symmetric guard on the
+// uninstall path (DELETE /workspaces/:id/plugins/:name). External
+// workspaces manage their own plugin directory locally; the platform
+// can't docker-exec into them.
+func TestPluginUninstall_ExternalRuntime_Returns422(t *testing.T) {
+	h := NewPluginsHandler(t.TempDir(), nil, nil).
+		WithRuntimeLookup(func(workspaceID string) (string, error) {
+			return "external", nil
+		})
+
+	w := httptest.NewRecorder()
+	c, _ := gin.CreateTestContext(w)
+	c.Params = gin.Params{
+		{Key: "id", Value: "ba1789b0-4d21-4f4f-a878-fa226bf77cf5"},
+		{Key: "name", Value: "my-plugin"},
+	}
+	c.Request = httptest.NewRequest(
+		"DELETE",
+		"/workspaces/ba1789b0-4d21-4f4f-a878-fa226bf77cf5/plugins/my-plugin",
+		nil,
+	)
+
+	h.Uninstall(c)
+
+	if w.Code != http.StatusUnprocessableEntity {
+		t.Errorf("expected 422 for runtime='external', got %d: %s", w.Code, w.Body.String())
+	}
+	if !strings.Contains(w.Body.String(), "external runtimes") {
+		t.Errorf("expected error body to mention 'external runtimes', got: %s", w.Body.String())
+	}
+}
+
+// TestPluginInstall_ContainerBackedRuntime_FallsThroughGuard — the runtime
+// guard MUST NOT short-circuit container-backed runtimes. With
+// `runtime='claude-code'` the install proceeds past the guard; without a
+// real plugin source it'll fail downstream (here: 404 from local resolver
+// because no plugin staged), which is the correct error to surface.
+//
+// This is the mutation-test partner: deleting the `runtime == "external"`
+// check would still pass TestPluginInstall_ExternalRuntime (because Install
+// would 404 instead of 422 — but the test asserts 422), and would still
+// pass this test (because both pre-fix and post-fix produce 404 here).
+// What this case pins is "non-external still falls through," catching
+// any over-eager guard that rejects all runtimes.
+func TestPluginInstall_ContainerBackedRuntime_FallsThroughGuard(t *testing.T) {
+	h := NewPluginsHandler(t.TempDir(), nil, nil).
+		WithRuntimeLookup(func(workspaceID string) (string, error) {
+			return "claude-code", nil
+		})
+
+	w := httptest.NewRecorder()
+	c, _ := gin.CreateTestContext(w)
+	c.Params = gin.Params{{Key: "id", Value: "c7c28c0b-4ea5-4e75-9728-3ba860081708"}}
+	c.Request = httptest.NewRequest(
+		"POST",
+		"/workspaces/c7c28c0b-4ea5-4e75-9728-3ba860081708/plugins",
+		bytes.NewBufferString(`{"source":"local://nonexistent-plugin"}`),
+	)
+	c.Request.Header.Set("Content-Type", "application/json")
+
+	h.Install(c)
+
+	if w.Code == http.StatusUnprocessableEntity {
+		t.Errorf("runtime='claude-code' must fall through the external guard; got 422: %s", w.Body.String())
+	}
+	// The local resolver will fail to find the plugin → 404. Anything
+	// other than 422 (which would mean we mis-classified) is fine.
+	if w.Code != http.StatusNotFound {
+		t.Errorf("expected 404 (plugin not found in registry), got %d: %s", w.Code, w.Body.String())
+	}
+}
+
+// TestPluginInstall_NoRuntimeLookup_FailsOpen — when the runtime lookup
+// is unwired (test fixtures, niche deploy shapes) the guard MUST default
+// to allowing the install attempt. The downstream findRunningContainer
+// step still gates on a real container, so failing open here doesn't
+// expose a bypass — it just preserves backwards-compat with deployments
+// that haven't wired the lookup.
+func TestPluginInstall_NoRuntimeLookup_FailsOpen(t *testing.T) {
+	h := NewPluginsHandler(t.TempDir(), nil, nil) // NO WithRuntimeLookup
+
+	w := httptest.NewRecorder()
+	c, _ := gin.CreateTestContext(w)
+	c.Params = gin.Params{{Key: "id", Value: "ws-no-lookup"}}
+	c.Request = httptest.NewRequest(
+		"POST",
+		"/workspaces/ws-no-lookup/plugins",
+		bytes.NewBufferString(`{"source":"local://nonexistent"}`),
+	)
+	c.Request.Header.Set("Content-Type", "application/json")
+
+	h.Install(c)
+
+	if w.Code == http.StatusUnprocessableEntity {
+		t.Errorf("nil runtimeLookup must fall through (fail-open); got 422: %s", w.Body.String())
+	}
+}
+
+// TestPluginInstall_RuntimeLookupErrors_FailsOpen — same fail-open story
+// for transient DB errors in the lookup. We don't want a momentary
+// Postgres hiccup to flip every plugin install into a 422.
+func TestPluginInstall_RuntimeLookupErrors_FailsOpen(t *testing.T) {
+	h := NewPluginsHandler(t.TempDir(), nil, nil).
+		WithRuntimeLookup(func(workspaceID string) (string, error) {
+			return "", errFakeDB
+		})
+
+	w := httptest.NewRecorder()
+	c, _ := gin.CreateTestContext(w)
+	c.Params = gin.Params{{Key: "id", Value: "ws-db-flake"}}
+	c.Request = httptest.NewRequest(
+		"POST",
+		"/workspaces/ws-db-flake/plugins",
+		bytes.NewBufferString(`{"source":"local://nonexistent"}`),
+	)
+	c.Request.Header.Set("Content-Type", "application/json")
+
+	h.Install(c)
+
+	if w.Code == http.StatusUnprocessableEntity {
+		t.Errorf("runtimeLookup error must fall through (fail-open); got 422: %s", w.Body.String())
+	}
+}
+
+// errFakeDB is a sentinel for the fail-open lookup-error case.
+var errFakeDB = &fakeError{msg: "synthetic db error"}
+
+type fakeError struct{ msg string }
+
+func (e *fakeError) Error() string { return e.msg }
@@ -78,6 +78,10 @@ var fallbackRuntimes = map[string]struct{}{
 	"openclaw": {},
 	"codex": {},
 	"external": {},
+	// mock — virtual workspace with hardcoded canned A2A replies.
+	// No container, no EC2, no template repo. See mock_runtime.go
+	// for the full rationale (200-workspace funding-demo org).
+	"mock": {},
 }

 // loadRuntimesFromManifest builds the runtime allowlist from
@@ -104,6 +108,10 @@ func loadRuntimesFromManifest(path string) (map[string]struct{}, error) {
 		// the manifest doesn't know about it. Injected here so we
 		// don't need a special-case in every caller.
 		"external": {},
+		// mock is ALWAYS available for the same reason as external:
+		// virtual workspace, no template repo, never spawns a
+		// container. See mock_runtime.go.
+		"mock": {},
 	}
 	for _, e := range m.WorkspaceTemplates {
 		name := strings.TrimSpace(e.Name)
@@ -112,6 +112,19 @@ func (h *WorkspaceHandler) Restart(c *gin.Context) {
 		return
 	}
+
+	// runtime=mock: virtual workspace with canned A2A replies. No
+	// container, no EC2, no provisioning state to recycle. Mirror
+	// the external no-op so the canvas's Restart button doesn't
+	// silently fail or leak through to the (template-less) provisioner.
+	if dbRuntime == "mock" {
+		c.JSON(http.StatusOK, gin.H{
+			"status": "noop",
+			"runtime": "mock",
+			"message": "mock workspaces have no container — restart is a no-op",
+		})
+		return
+	}

 	// SaaS mode: cpProv handles workspace EC2 lifecycle. Self-hosted mode:
 	// provisioner handles local Docker containers. At least one must be
 	// available — previously only `provisioner` was checked, which broke
@@ -532,7 +545,9 @@ func (h *WorkspaceHandler) runRestartCycle(workspaceID string) {
 	}

 	// Don't auto-restart external workspaces (no Docker container)
-	if dbRuntime == "external" {
+	// or mock workspaces (no container, every reply is canned —
+	// see workspace-server/internal/handlers/mock_runtime.go).
+	if dbRuntime == "external" || dbRuntime == "mock" {
 		return
 	}

@@ -1,6 +1,7 @@
 package handlers

 import (
+	"runtime"
 	"sync"
 	"sync/atomic"
 	"testing"
@@ -15,6 +16,42 @@ func resetRestartStatesFor(workspaceID string) {
 	restartStates.Delete(workspaceID)
 }

+// drainCoalesceGoroutine spawns `coalesceRestart(wsID, cycle)` on a
+// goroutine that mirrors the real production caller shape
+// (`go h.RestartByID(...)` from a2a_proxy.go, a2a_proxy_helpers.go,
+// main.go), and registers a t.Cleanup that blocks until the goroutine
+// has TERMINATED — not just panicked-and-recovered, fully exited.
+//
+// This is the bleed-prevention contract for Class H (Task #170): no
+// test in this file may declare itself complete while a coalesceRestart
+// goroutine it spawned is still alive, because that goroutine could
+// otherwise wake up after the test's sqlmock has been closed and
+// either:
+//   - issue a stale INSERT that gets attributed to the next test's
+//     sqlmock connection — surfaces as
+//     "INSERT-not-expected for kind=DELEGATION_FAILED" / =WORKSPACE_PROVISION_FAILED
+//     in a neighbour test that doesn't itself touch coalesceRestart; or
+//   - hold a reference to the closed *sql.DB and panic on the next op.
+//
+// Implementation notes:
+//   - sync.WaitGroup must be Add()ed BEFORE the goroutine is spawned;
+//     Add inside the goroutine races with Wait.
+//   - t.Cleanup runs in LIFO order, so this composes safely with other
+//     cleanups (e.g. setupTestDB's mockDB.Close).
+//   - We don't bound the Wait with a timeout — if the goroutine
+//     genuinely deadlocks, the whole test process should hang and fail
+//     under -timeout. A timeout-then-orphan would mask the bleed.
+func drainCoalesceGoroutine(t *testing.T, wsID string, cycle func()) {
+	t.Helper()
+	var wg sync.WaitGroup
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		coalesceRestart(wsID, cycle)
+	}()
+	t.Cleanup(wg.Wait)
+}
+
 // TestCoalesceRestart_SingleCallRunsOneCycle is the baseline:
 // no concurrency, one cycle. If this fails the gate logic is broken at
 // its simplest path.
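An aside on the "Add()ed BEFORE the goroutine is spawned" note in the helper's comment above, since it is an easy ordering to regress. A small self-contained illustration of the two shapes; doWork is a stand-in and none of this is code from the PR:

// Illustrative sketch only.
package main

import "sync"

func doWork() {} // stand-in for coalesceRestart(wsID, cycle)

func main() {
	// Correct: Add happens before the goroutine starts and before Wait,
	// so Wait cannot observe a zero counter while doWork is still pending.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		doWork()
	}()
	wg.Wait()

	// Racy: if Wait runs before the goroutine is scheduled, the counter
	// is still zero and Wait returns without waiting for doWork at all.
	var racy sync.WaitGroup
	go func() {
		racy.Add(1) // too late: races with racy.Wait() below
		defer racy.Done()
		doWork()
	}()
	racy.Wait()
}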
@@ -200,19 +237,45 @@ func TestCoalesceRestart_PanicInCycleClearsState(t *testing.T) {
 	const wsID = "test-coalesce-panic-recovery"
 	resetRestartStatesFor(wsID)

-	// First call's cycle panics. coalesceRestart's defer must swallow
-	// the panic so this test caller doesn't see it propagate up — that
-	// matches what the real production caller (`go h.RestartByID(...)`)
-	// gets: the goroutine survives, no process crash.
-	defer func() {
-		if r := recover(); r != nil {
-			t.Errorf("panic should NOT propagate out of coalesceRestart (would crash the platform process from a goroutine), got: %v", r)
-		}
-	}()
-	coalesceRestart(wsID, func() { panic("simulated cycle failure") })
+	// Spawn the panicking cycle on a goroutine via drainCoalesceGoroutine
+	// — this mirrors the real production callsite shape
+	// (`go h.RestartByID(...)` from a2a_proxy.go:584,
+	// a2a_proxy_helpers.go:197, main.go:213). The previous form called
+	// coalesceRestart synchronously, which neither exercised the
+	// goroutine-survival contract nor caught Class H bleed regressions
+	// where the panic-recovery goroutine outlives the test and pollutes
+	// the next test's sqlmock with INSERTs from runRestartCycle's
+	// LogActivity calls (kinds DELEGATION_FAILED / WORKSPACE_PROVISION_FAILED).
+	//
+	// drainCoalesceGoroutine registers a t.Cleanup that Wait()s for the
+	// goroutine to TERMINATE — not merely panic-and-recover — before
+	// the test ends.
+	drainCoalesceGoroutine(t, wsID, func() { panic("simulated cycle failure") })
+
+	// We need a mid-test barrier (not just the t.Cleanup-time barrier)
+	// so the second coalesceRestart below sees state.running=false. The
+	// goroutine clears state.running inside its deferred recover; poll
+	// the package-level restartStates map until that observable flip
+	// happens. Bound at 2s — longer = real bug.
+	deadline := time.Now().Add(2 * time.Second)
+	for time.Now().Before(deadline) {
+		sv, ok := restartStates.Load(wsID)
+		if ok {
+			st := sv.(*restartState)
+			st.mu.Lock()
+			running := st.running
+			st.mu.Unlock()
+			if !running {
+				break
+			}
+		}
+		time.Sleep(time.Millisecond)
+	}

 	// Second call must run a fresh cycle. If running stayed true after
 	// the panic, this call would early-return without invoking cycle.
+	// Synchronous — no panic, so no goroutine to drain, and we want to
+	// assert ran.Load() immediately after.
 	var ran atomic.Bool
 	coalesceRestart(wsID, func() { ran.Store(true) })
 	if !ran.Load() {
@@ -220,6 +283,98 @@
 	}
 }

+// TestCoalesceRestart_DrainHelperWaitsForGoroutineExit is the Class H
+// regression guard for Task #170. It asserts the contract enforced by
+// drainCoalesceGoroutine: t.Cleanup blocks until the spawned
+// coalesceRestart goroutine has FULLY EXITED — not merely recovered
+// from panic. This is the contract that prevents stale LogActivity
+// INSERTs from a recovering goroutine bleeding into the next test's
+// sqlmock (the failure mode reported as "INSERT-not-expected for
+// kind=DELEGATION_FAILED" in TestPooledWithEICTunnel_PreservesFnErr).
+//
+// We use a deterministic bleed-shape probe rather than goroutine-count
+// arithmetic: the cycle blocks on a release channel for ~150ms — long
+// enough that without a Wait barrier, the outer sub-test would return
+// before the goroutine exited. We then verify the wg.Wait inside
+// drainCoalesceGoroutine actually delayed t.Run's completion: total
+// elapsed must be >= the block duration. Asserts exact-shape, not
+// substring (per saved-memory feedback_assert_exact_not_substring):
+// elapsed < blockFor would mean the cleanup didn't wait, which is the
+// exact bleed we're guarding against.
+//
+// We additionally panic from the cycle (after the block) to confirm
+// the helper waits past panic recovery, not just past cycle return.
+func TestCoalesceRestart_DrainHelperWaitsForGoroutineExit(t *testing.T) {
+	const blockFor = 150 * time.Millisecond
+	const wsID = "test-coalesce-drain-helper-contract"
+	resetRestartStatesFor(wsID)
+
+	// done is closed inside the cycle, AFTER the block + AFTER the
+	// panic (which the deferred recover in coalesceRestart catches).
+	// Actually: defer in cycle runs before panic propagates to the
+	// outer recover. Use defer to close.
+	exited := make(chan struct{})
+
+	subStart := time.Now()
+	t.Run("drain_under_subtest", func(st *testing.T) {
+		drainCoalesceGoroutine(st, wsID, func() {
+			defer close(exited)
+			time.Sleep(blockFor)
+			panic("contract-test panic-after-block")
+		})
+		// st.Cleanup runs here, before t.Run returns. wg.Wait must
+		// block until the goroutine has finished its panic recovery.
+	})
+	subElapsed := time.Since(subStart)
+
+	// Contract: the helper's wg.Wait MUST have blocked t.Run from
+	// returning until after the cycle's block + panic recovery.
+	if subElapsed < blockFor {
+		t.Fatalf(
+			"drainCoalesceGoroutine contract violated: t.Run returned in %v, "+
+				"but cycle blocks for %v. The Wait barrier is broken — a "+
+				"coalesceRestart goroutine can outlive its test's t.Cleanup "+
+				"and pollute neighbour-test sqlmock state (Class H bleed).",
+			subElapsed, blockFor,
+		)
+	}
+
+	// And the goroutine must have actually closed `exited` (i.e. ran
+	// the deferred close before panic propagated through coalesceRestart's
+	// recover). If exited is still open here, the goroutine never
+	// reached the close — meaning either the panic short-circuited the
+	// defer (Go runtime bug — won't happen) or the goroutine never
+	// ran at all (drainCoalesceGoroutine spawn shape regressed).
+	select {
+	case <-exited:
+		// Correct path.
+	default:
+		t.Fatal("cycle goroutine never reached its deferred close — panic-recovery contract regressed")
+	}
+
+	// Belt-and-suspenders: the post-recover state-clear must have
+	// flipped state.running back to false. If this fails, the panic
+	// path skipped the deferred state-clear in coalesceRestart.
+	sv, ok := restartStates.Load(wsID)
+	if !ok {
+		t.Fatal("restartStates entry missing for wsID after cycle — sync.Map regression")
+	}
+	st := sv.(*restartState)
+	st.mu.Lock()
+	running := st.running
+	st.mu.Unlock()
+	if running {
+		t.Error("state.running was not cleared after panic — sticky-running deadlock regressed")
+	}
+
+	// Reference runtime.NumGoroutine to keep the runtime import
+	// honest — also a useful smoke check that the goroutine count
+	// hasn't ballooned 10x while debugging this test.
+	if n := runtime.NumGoroutine(); n > 200 {
+		t.Logf("warning: NumGoroutine=%d after drain — high but not necessarily a leak", n)
+	}
+}
+
 // TestCoalesceRestart_DifferentWorkspacesDoNotSerialize verifies the
 // per-workspace state map: an in-flight restart for ws A must not
 // block restarts for ws B. Important for performance — without this,
@@ -1073,18 +1073,53 @@ func (p *Provisioner) IsRunning(ctx context.Context, workspaceID string) (bool,
 	if p == nil || p.cli == nil {
 		return false, ErrNoBackend
 	}
-	name := ContainerName(workspaceID)
-	info, err := p.cli.ContainerInspect(ctx, name)
+	name, err := RunningContainerName(ctx, p.cli, workspaceID)
 	if err != nil {
-		if isContainerNotFound(err) {
-			return false, nil
-		}
 		// Transient daemon error: caller treats !running as dead + restarts.
 		// Returning true + the underlying error preserves the error for
 		// metrics/logging without triggering the destructive path.
 		return true, err
 	}
-	return info.State.Running, nil
+	return name != "", nil
+}
+
+// RunningContainerName returns the container name for workspaceID iff the
+// container exists AND is in the Running state. Single source of truth for
+// "what live container should I exec into for this workspace?" — used by
+// both Provisioner.IsRunning (healthsweep) and the plugins handler.
+//
+// Distinguishes three outcomes so callers can pick their own policy:
+//
+//   - ("ws-<id>", nil): container is running. Caller can exec into it.
+//   - ("", nil): container does not exist OR exists but is stopped
+//     (NotFound, Exited, Created, Restarting…). Caller
+//     should treat as a definitive "not running."
+//   - ("", err): transient daemon error (timeout, socket EOF, ctx
+//     cancel). Caller should NOT infer "not running" —
+//     this could be a flaky daemon under load. Decide
+//     per-callsite whether to fail soft or hard.
+//
+// Background — molecule-core#10: the plugins handler used to carry its own
+// copy of this inspect logic (`findRunningContainer`) which collapsed
+// transient errors into the same "" return as a genuinely-stopped container.
+// That hid daemon flakes as misleading 503 "container not running" responses
+// AND let the two impls drift on edge-case behavior. This is the SSOT.
+func RunningContainerName(ctx context.Context, cli *client.Client, workspaceID string) (string, error) {
+	if cli == nil {
+		return "", ErrNoBackend
+	}
+	name := ContainerName(workspaceID)
+	info, err := cli.ContainerInspect(ctx, name)
+	if err != nil {
+		if isContainerNotFound(err) {
+			return "", nil
+		}
+		return "", err
+	}
+	if info.State.Running {
+		return name, nil
+	}
+	return "", nil
 }

 // isContainerNotFound returns true when the Docker client indicates the
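Not part of the diff: a minimal caller sketch of the three-outcome contract documented above. Only RunningContainerName and the *client.Client parameter come from the PR; the function name, the error handling, and the docker client import path are assumptions for illustration:

// Illustrative sketch only; package and import path assumed.
package provisioner

import (
	"context"
	"errors"
	"fmt"

	"github.com/docker/docker/client" // assumed import path for *client.Client
)

func execIfRunning(ctx context.Context, cli *client.Client, workspaceID string) error {
	name, err := RunningContainerName(ctx, cli, workspaceID)
	if err != nil {
		// Transient daemon error: do NOT conclude "not running"; fail soft
		// (retry, or surface a 5xx) rather than taking a destructive path.
		return fmt.Errorf("inspect workspace %s: %w", workspaceID, err)
	}
	if name == "" {
		// Definitive: the container is missing or stopped.
		return errors.New("container not running")
	}
	// Running: `name` is the container that is safe to exec into.
	return nil
}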
@@ -71,9 +71,15 @@ func StartHealthSweep(ctx context.Context, checker ContainerChecker, interval ti
 }

 func sweepOnlineWorkspaces(ctx context.Context, checker ContainerChecker, onOffline OfflineHandler) {
-	// Skip external workspaces (runtime='external') — they have no Docker container
+	// Skip external + mock workspaces — neither has a Docker container.
+	// external: agent runs on operator's laptop, polled via heartbeat.
+	// mock: virtual workspace, every reply is canned (see
+	// workspace-server/internal/handlers/mock_runtime.go). Both would
+	// false-positive as "container gone" on every sweep tick and
+	// auto-restart would loop forever (provisioner has no template
+	// for either runtime).
 	rows, err := db.DB.QueryContext(ctx,
-		`SELECT id FROM workspaces WHERE status IN ('online', 'degraded') AND COALESCE(runtime, 'langgraph') != 'external'`)
+		`SELECT id FROM workspaces WHERE status IN ('online', 'degraded') AND COALESCE(runtime, 'langgraph') NOT IN ('external', 'mock')`)
 	if err != nil {
 		log.Printf("Health sweep: query error: %v", err)
 		return
@@ -413,22 +413,20 @@ func sweepStaleTokensWithoutContainer(ctx context.Context, reaper OrphanReaper)
 	// `"5m0s"` mismatch with Postgres interval grammar; passing seconds
 	// as an int keeps the binding portable.
 	graceSeconds := int(staleTokenGrace.Seconds())
-	// `runtime != 'external'` is load-bearing: external workspaces have NO
-	// local container by design (the agent runs off-host), so the
-	// "no live container" predicate below would match every external
-	// workspace and revoke its token. The token is the off-host agent's
-	// only authentication credential — revoking breaks the entire
-	// external-runtime feature. Discovered 2026-05-03 when a fresh
-	// external workspace had its token silently revoked ~6 minutes after
-	// creation by this sweep, killing the operator's MCP heartbeat and
-	// inbox poll with `HTTP 401 — token may be revoked`.
+	// `runtime NOT IN ('external','mock')` is load-bearing: neither
+	// runtime has a local container, so the "no live container"
+	// predicate below would match every row and revoke its token.
+	// external: token is the off-host agent's only credential —
+	// revoking breaks the entire external-runtime feature
+	// (incident 2026-05-03). mock: same shape — no container by
+	// design, see workspace-server/internal/handlers/mock_runtime.go.
 	rows, qErr := db.DB.QueryContext(ctx, `
 		SELECT DISTINCT t.workspace_id::text
 		FROM workspace_auth_tokens t
 		JOIN workspaces w ON w.id = t.workspace_id
 		WHERE t.revoked_at IS NULL
 		  AND w.status NOT IN ('removed', 'provisioning')
-		  AND w.runtime != 'external'
+		  AND w.runtime NOT IN ('external', 'mock')
 		  AND COALESCE(t.last_used_at, t.created_at) < now() - make_interval(secs => $2)
 		  AND (
 		    cardinality($1::text[]) = 0
@@ -26,7 +26,7 @@ import (
 // accidentally matching a future query that opens with the same column
 // name OR a regression that drops one of the load-bearing predicates.
 func expectStaleTokenSweepNoOp(mock sqlmock.Sqlmock) {
-	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'`).
+	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\)`).
 		WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}))
 }

@@ -492,7 +492,7 @@ func TestSweepOnce_StaleTokenRevokeFiresWhenNoContainer(t *testing.T) {
 	// excludes 'external' (2026-05-03 fix — the sweep was incorrectly
 	// targeting external workspaces which have no container by design),
 	// and the staleness predicate appears in the SELECT.
-	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'.*COALESCE\(t\.last_used_at, t\.created_at\) < now\(\) - make_interval`).
+	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\).*COALESCE\(t\.last_used_at, t\.created_at\) < now\(\) - make_interval`).
 		WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}).
 			AddRow(orphanedID))

@@ -548,7 +548,7 @@ func TestSweepOnce_StaleTokenRevokeFailureBailsLoop(t *testing.T) {

 	// Third-pass returns two stale-token workspaces; the first revoke
 	// errors. Loop must bail without attempting the second.
-	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'`).
+	mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\)`).
 		WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}).
 			AddRow("aaaa1111-0000-0000-0000-000000000000").
 			AddRow("bbbb2222-0000-0000-0000-000000000000"))