Commit Graph

14 Commits

Author SHA1 Message Date
7eb348536b fix(harness): bake cf-proxy nginx.conf at build time, not via configs:
The previous configs:-based fix (87b971a2) didn't actually fix the DinD
issue — Compose v2 falls back to bind mounts for `configs:` when swarm
mode is not active, so the resulting runc invocation still tries to
mount /workspace/.../cf-proxy/nginx.conf from the OUTER host filesystem
that the act_runner-vs-host-docker socket-mount can't see. Same
"not a directory" error returned.

Switch to a thin Dockerfile (cf-proxy/Dockerfile) that COPYs nginx.conf
into nginx:1.27-alpine. The build context is uploaded to the daemon as
a tarball, not bind-mounted from the host filesystem, so the path
translation gap doesn't apply. Verified locally: `docker build` +
`docker run cf-proxy nginx -T` reproduces the baked config end-to-end.

Trade-off: ~2-3s build cost on every harness up. Acceptable for the
Gitea CI gate; local-dev re-builds the image only when nginx.conf
changes (Docker layer cache).
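
The thin Dockerfile described above could look like this (a minimal sketch; only the base image and the COPY-in-the-config idea are from the message):

```dockerfile
# Hypothetical cf-proxy/Dockerfile: bake the config into the image layer so
# no host bind mount is needed at run time.
FROM nginx:1.27-alpine
COPY nginx.conf /etc/nginx/nginx.conf
```

With compose pointing at a `build:` context for cf-proxy instead of a `configs:` entry, the file travels to the daemon inside the build tarball rather than across the DinD bind-mount boundary.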

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:09:08 -07:00
87b971a292 fix(ci): close 3 chronic Gitea-Actions workflow flakes (closes #88)
Three workflows have been failing on every push to this Gitea repo for
GitHub-shaped reasons that don't translate to act_runner. Surfaced
while landing #84; bundled per `feedback_gitea_actions_migration_audit_pattern`
("bundle per-repo, not per-finding") instead of three separate PRs.

1) handlers-postgres-integration: localhost → 127.0.0.1
   - lib/pq tries to dial localhost → ::1 first; the postgres service
     container only listens on IPv4 → ECONNREFUSED → all
     TestIntegration_* fail. Pin IPv4 to make the job deterministic.

2) pr-guards / disable-auto-merge-on-push: Gitea no-op
   - The previous reusable-workflow caller invoked `gh pr merge
     --disable-auto`, which calls GitHub's GraphQL API. Gitea returns
     HTTP 405 on /api/graphql → step always fails. Inline the step so
     it can detect Gitea (GITEA_ACTIONS=true OR repo url under
     moleculesai.app) and no-op with a notice. Auto-merge gating is
     moot on Gitea anyway: there's no `--auto` primitive being
     touched. Job stays ALWAYS-RUN so branch protection's required
     check still lands SUCCESS (avoids the SKIPPED-in-set trap from
     `feedback_branch_protection_check_name_parity`).

3) Harness Replays: cf-proxy nginx.conf via docker `configs:` (not bind)
   - act_runner runs the workflow inside a runner container; runc in
     the docker daemon below resolves bind-mount source paths on the
     OUTER host, not inside the runner. The path
     `/workspace/.../cf-proxy/nginx.conf` is invisible there → "not a
     directory" runc error. Switching to compose `configs:` packages
     the file as content rather than a host bind, sidestepping the
     DinD path-translation gap.
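
The Gitea-detection no-op in (2) can be sketched as a small shell predicate (the GITEA_ACTIONS variable and moleculesai.app check are from the message; the step wiring is assumed):

```shell
# Sketch: decide whether we're on Gitea and should no-op the auto-merge step.
detect_forge() {
  if [ "${GITEA_ACTIONS:-}" = "true" ]; then
    echo "gitea"
  elif printf '%s' "${GITHUB_SERVER_URL:-}" | grep -q 'moleculesai\.app'; then
    echo "gitea"
  else
    echo "github"
  fi
}

if [ "$(detect_forge)" = "gitea" ]; then
  echo "::notice::Gitea detected; auto-merge is a GitHub-only primitive, exiting SUCCESS"
else
  : # GitHub path would run: gh pr merge --disable-auto ...
fi
```

Because the step exits zero either way, the required check lands SUCCESS rather than SKIPPED.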

Local validation:
  - YAML parsed clean for all 3 files.
  - cf-proxy nginx.conf: standalone `docker compose run cf-proxy
    nginx -T` reproduced the configs: mount end-to-end and dumped the
    config correctly. The full harness compose still renders via
    `docker compose config`.

Real-CI verification will land on this branch's first push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:06:09 -07:00
11afd25e6a chore: retrigger Harness Replays after Class G + clone-manifest fixes (#168)
Empty-shape commit on a tests/harness/** path to trigger the harness-replays
workflow's path-filter on staging, verifying that:
- PR #40 (Class G #168) migrated all explicit github.com/Molecule-AI URL refs
- PR #42 (Class G #168 followup) migrated the indirect clone-manifest.sh + manifest.json forms

After this run, harness-replays should get past the previously-failing
'fatal: could not read Username for https://github.com' clone-manifest step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 13:36:39 -07:00
Hongming Wang
7c8b81c6eb fix(harness): disable memory-plugin sidecar in harness tenants
PR #2906 bundled memory-plugin-postgres as a startup-gated sidecar in
both tenant entrypoints. Plugin migrations include
`CREATE EXTENSION IF NOT EXISTS vector` which fails on the harness's
plain postgres:15-alpine (no pgvector preinstalled). The 30s health
gate then aborts container boot and Harness Replays fails.

Detected on auto-promote PR #2914 — Harness Replays job:
  Container harness-tenant-alpha-1  Error
  Container harness-tenant-beta-1   Error
  dependency failed to start: container harness-tenant-alpha-1 exited (1)

The harness doesn't exercise memory features, so the simplest fix is
to use the documented escape hatch the sidecar entrypoint already
ships (MEMORY_PLUGIN_DISABLE=1) — applied to both alpha and beta
tenants in compose.yml. Alternative would be switching the harness
postgres images to pgvector/pgvector:pg15, deferred until the
harness wants to verify memory paths.
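
In compose terms the escape hatch is a one-line env toggle per tenant (service and variable names per the message; the surrounding keys are assumed):

```yaml
# Sketch of the compose.yml change: disable the memory-plugin sidecar in
# both harness tenants.
services:
  tenant-alpha:
    environment:
      MEMORY_PLUGIN_DISABLE: "1"
  tenant-beta:
    environment:
      MEMORY_PLUGIN_DISABLE: "1"
```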

Refs PR #2906. Unblocks #2914 (auto-promote staging→main).
2026-05-05 11:42:20 -07:00
Hongming Wang
1b207b214d fix(harness): stub platform_auth with *args lambdas (#2743 fallout)
PR #2743 (multi-workspace MCP PR-2) made auth_headers accept an
optional `workspace_id` arg, while self_source_headers stayed
1-arg-required. The peer-discovery-404 harness replay stubbed both
with 0-arg lambdas, so the helper call inside the replay raised:

    TypeError: <lambda>() takes 0 positional arguments but 1 was given

…and the diagnostic captured by the replay was the TypeError text,
not the platform-404 string the assertion grep'd for. Caught by
PR #2737 (auto-promote staging→main) — the replay went red right
after #2743 merged into staging.

Switching both stubs to `*args, **kwargs` makes them tolerant of
both the legacy 0-arg call shape AND the new 1-arg-with-workspace
call shape, so neither the harness nor the in-tree unit tests need
to know which version of the runtime helpers ran the call.
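
The tolerant stub shape, sketched (stub names from the message; the call sites and return values here are illustrative):

```python
# The old 0-arg stub breaks as soon as a caller passes workspace_id:
legacy_stub = lambda: {"Authorization": "stub"}
try:
    legacy_stub("workspace-123")
except TypeError as e:
    print(e)  # <lambda>() takes 0 positional arguments but 1 was given

# *args/**kwargs stubs absorb both call shapes:
auth_headers = lambda *args, **kwargs: {"Authorization": "stub"}
self_source_headers = lambda *args, **kwargs: {"X-Source": "stub"}

print(auth_headers())                 # legacy 0-arg shape
print(auth_headers("workspace-123"))  # new 1-arg-with-workspace shape
```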

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:55:42 -07:00
Hongming Wang
a15972066b harness(phase-2-followup): fix assert_status mislabel + honest race comment
Two review nits from PR #2493 that don't affect correctness but matter
for honesty in the harness's own self-documentation:

1. tenant-isolation.sh F3/F4 used assert_status for non-HTTP values.
   LEAKED_INTO_ALPHA/BETA are jq-derived counts, not HTTP codes — but
   the assertion ran through assert_status, which formats the result
   as "(HTTP 0)". Anyone reading the test output would believe these
   assertions involved an HTTP call. Adds a plain `assert` helper
   matching per-tenant-independence.sh's pattern, and uses it on the
   two count comparisons.

2. per-tenant-independence.sh Phase F over-claimed coverage.
   The comment said the concurrent-INSERT race catches "shared-pool
   corruption" + "lib/pq prepared-statement cache collision". Both
   are real failure modes — but neither can fire across tenants in
   THIS topology, because each tenant owns its own DATABASE_URL and
   its own postgres-{alpha,beta} container. The comment now lists
   only what the test actually catches (redis cross-keyspace bleed,
   shared cp-stub state corruption, cf-proxy buffer mixup) and notes
   that a future shared-Postgres variant is the right place for the
   lib/pq cache assertion.
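
The plain `assert` helper described in (1) might look like this (a sketch in the per-tenant-independence.sh style; exact formatting assumed):

```shell
# Plain value assertion - no HTTP framing, so jq-derived counts read honestly.
assert() {
  label="$1"; expected="$2"; actual="$3"
  if [ "$expected" = "$actual" ]; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (expected: $expected, got: $actual)"
    return 1
  fi
}

LEAKED_INTO_ALPHA=0  # stand-in for the real jq-derived count
assert "no beta rows leaked into alpha" 0 "$LEAKED_INTO_ALPHA"
```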

No behavioural change — both replays still pass 13/13 + 12/12, all six
replays pass on a clean run-all-replays.sh boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:00:04 -07:00
Hongming Wang
c275716005 harness(phase-2): multi-tenant compose + cross-tenant isolation replays
Brings the local harness from "single tenant covering the request path"
to "two tenants covering both the request path AND the per-tenant
isolation boundary" — the same shape production runs (one EC2 + one
Postgres + one MOLECULE_ORG_ID per tenant).

Why this matters: the four prior replays exercise the SaaS request
path against one tenant. They cannot prove that TenantGuard rejects
a misrouted request (production CF tunnel + AWS LB are the failure
surface), nor that two tenants doing legitimate work in parallel
keep their `activity_logs` / `workspaces` / connection-pool state
partitioned. Both are real bug classes — TenantGuard allowlist drift
shipped in #2398, and lib/pq prepared-statement cache collision is
documented as an org-wide hazard.

What changed:

1. compose.yml — split into two tenants.
   tenant-alpha + postgres-alpha + tenant-beta + postgres-beta + the
   shared cp-stub, redis, cf-proxy. Each tenant gets a distinct
   ADMIN_TOKEN + MOLECULE_ORG_ID and its own Postgres database. cf-proxy
   depends on both tenants becoming healthy.

2. cf-proxy/nginx.conf — Host-header → tenant routing.
   `map $host $tenant_upstream` resolves the right backend per request.
   Required `resolver 127.0.0.11 valid=30s ipv6=off;` because nginx
   needs an explicit DNS resolver to use a variable in `proxy_pass`
   (literal hostnames resolve once at startup; variables resolve per
   request — without the resolver nginx fails closed with 502).
   `server_name` lists both tenants + the legacy alias so unknown Host
   headers don't silently route to a default and mask routing bugs.

3. _curl.sh — per-tenant + cross-tenant-negative helpers.
   `curl_alpha_admin` / `curl_beta_admin` set the right
   Host + Authorization + X-Molecule-Org-Id triple.
   `curl_alpha_creds_at_beta` / `curl_beta_creds_at_alpha` exist
   precisely to make WRONG requests (replays use them to assert
   TenantGuard rejects). `psql_exec_alpha` / `psql_exec_beta` shell out
   per-tenant Postgres exec. Legacy aliases (`curl_admin`, `psql_exec`)
   keep the four pre-Phase-2 replays working without edits.

4. seed.sh — registers parent+child workspaces in BOTH tenants.
   Captures server-generated IDs via `jq -r '.id'` (POST /workspaces
   ignores body.id, so the older client-side mint silently desynced
   from the workspaces table and broke FK-dependent replays). Stashes
   `ALPHA_PARENT_ID` / `ALPHA_CHILD_ID` / `BETA_PARENT_ID` /
   `BETA_CHILD_ID` to .seed.env, plus legacy `ALPHA_ID` / `BETA_ID`
   aliases for backwards compat with chat-history / channel-envelope.

5. New replays.

   tenant-isolation.sh (13 assertions) — TenantGuard 404s any request
   whose X-Molecule-Org-Id doesn't match the container's
   MOLECULE_ORG_ID. Asserts the 404 body has zero
   tenant/org/forbidden/denied keywords (the existence of a tenant must
   not be probeable from the outside). Covers cross-tenant routing
   misconfigure + allowlist drift + missing-org-header.

   per-tenant-independence.sh (12 assertions) — both tenants seed
   activity_logs in parallel with distinct row counts (3 vs 5) and
   confirm each tenant's history endpoint returns exactly its own
   counts. Then a concurrent INSERT race (10 rows per tenant in
   parallel via `&` + wait) catches shared-pool corruption +
   prepared-statement cache poisoning + redis cross-keyspace bleed.

6. Bug fix: down.sh + dump-logs SECRETS_ENCRYPTION_KEY validation.
   `docker compose down -v` validates the entire compose file even
   though it doesn't read the env. up.sh generates a per-run key into
   its own shell — down.sh runs in a fresh shell that wouldn't see it,
   so without a placeholder `compose down` exited non-zero before
   removing volumes. Workspaces silently leaked into the next
   ./up.sh + seed.sh boot. Caught when tenant-isolation.sh F1/F2 saw
   3× duplicate alpha-parent rows accumulated across three prior runs.
   Same fix applied to the workflow's dump-logs step.

7. requirements.txt — pin molecule-ai-workspace-runtime>=0.1.78.
   channel-envelope-trust-boundary.sh imports from `molecule_runtime.*`
   (the wheel-rewritten path) so it catches the failure mode where
   the wheel build silently strips a fix that unit tests on local
   source still pass. CI was failing this replay because the wheel
   wasn't installed — caught in the staging push run from #2492.

8. .github/workflows/harness-replays.yml — Phase 2 plumbing.
   * Removed /etc/hosts step (Host-header path eliminated the need;
     scripts already source _curl.sh).
   * Updated dump-logs to reference the new service names
     (tenant-alpha + tenant-beta + postgres-alpha + postgres-beta).
   * Added SECRETS_ENCRYPTION_KEY placeholder env on the dump step.
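
The routing in (2) comes down to three nginx pieces: a `map` from Host to upstream, an explicit resolver, and a variable `proxy_pass` (a sketch; hostnames, ports, and the exact server_name list are assumptions):

```nginx
resolver 127.0.0.11 valid=30s ipv6=off;   # Docker's embedded DNS

# Host header -> per-tenant upstream
map $host $tenant_upstream {
    tenant-alpha.localhost   http://tenant-alpha:8080;
    tenant-beta.localhost    http://tenant-beta:8080;
}

server {
    listen 8080;
    server_name tenant-alpha.localhost tenant-beta.localhost
                harness-tenant.localhost;   # legacy alias
    location / {
        # A variable in proxy_pass forces per-request resolution through the
        # resolver above; a literal hostname would resolve once at startup.
        proxy_pass $tenant_upstream;
    }
}
```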

Verified: ./run-all-replays.sh from a clean state — 6/6 passed
(buildinfo-stale-image, channel-envelope-trust-boundary, chat-history,
peer-discovery-404, per-tenant-independence, tenant-isolation).

Roadmap section updated: Phase 2 marked shipped. Phase 3 promoted to
"replace cp-stub with real molecule-controlplane Docker build + env
coherence lint."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:36:40 -07:00
Hongming Wang
5cca462843 harness(phase-0): sudo-free Host-header path + chat_history + envelope replays
Three changes that bring the local harness from "covers what staging
covers minus the SaaS topology" to "exercises every surface we shipped
this session against the prod-shape Dockerfile.tenant image."

1. Drop the /etc/hosts requirement.

   Replays previously needed `127.0.0.1 harness-tenant.localhost` in
   /etc/hosts to resolve the cf-proxy. That gated the harness behind a
   sudo step on every fresh dev box and CI runner. The cf-proxy nginx
   already routes by Host header (matches production CF tunnel: URL is
   public, Host carries tenant identity), so the no-sudo path is to
   target loopback :8080 with `Host: harness-tenant.localhost` set as
   a header.

   New `tests/harness/_curl.sh` centralises this — curl_anon /
   curl_admin / curl_workspace / psql_exec wrappers all set the Host
   + auth headers automatically. seed.sh, peer-discovery-404.sh,
   buildinfo-stale-image.sh updated to source it. Legacy /etc/hosts
   users still work via env-var override.

2. Fix the seed.sh FK regression that blocked DB-side replays.

   POST /workspaces ignores any `id` in the request body and generates
   one server-side. seed.sh was minting client-side UUIDs that never
   reached the workspaces table, so any replay that INSERTed into
   activity_logs (FK-constrained on workspace_id) failed with the
   workspace-not-found error. Capture the returned id from the
   response instead.

3. Two new replays cover the surfaces shipped this session.

   chat-history.sh — exercises the full SaaS-shape wire that PR #2472
   (peer_id filter), #2474 (chat_history client tool), and #2476
   (before_ts paging) ride on. 8 phases / 16 assertions: peer_id filter,
   limit cap, before_ts paging, OR-clause covering both source_id and
   target_id, malformed peer_id 400, malformed before_ts 400, URL-encoded
   SQLi-shape rejection. Verified PASS against the live harness.

   channel-envelope-trust-boundary.sh — exercises PR #2471 + #2481 by
   importing from `molecule_runtime.*` (the wheel-rewritten path) so
   it catches "wheel build dropped a fix that unit tests still pass."
   5 phases / 11 assertions: malicious peer_id scrubbed from envelope,
   agent_card_url omitted on validation failure, XML-injection bytes
   scrubbed, valid UUID preserved, _agent_card_url_for direct gate.
   Verified PASS against published wheel 0.1.79.

run-all-replays.sh auto-discovers — no registration needed. Full
lifecycle (boot → seed → 4 replays → teardown) runs clean.

Roadmap section updated to reflect Phase 1 (this PR) → Phase 2
(multi-tenant + CI gate) → Phase 3 (real CP) → Phase 4 (Miniflare +
LocalStack + traffic replay).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:12:49 -07:00
Hongming Wang
c8b17ea1ad fix(harness): install httpx for replay Python evals
peer-discovery-404 imports workspace/a2a_client.py which depends on
httpx; the runner's stock Python doesn't have it, so the replay's
PARSE assertion (b) fails with ModuleNotFoundError on every run. The
WIRE assertion (a) — pure curl — passes, so the failure was masking
just enough to make the replay LOOK partially-broken when the tenant
side is fine.

Adding tests/harness/requirements.txt with only httpx instead of
sourcing workspace/requirements.txt: that file pulls a2a-sdk,
langchain-core, opentelemetry, sqlalchemy, temporalio, etc. — ~30s
of install for one replay's PARSE step. The harness's deps surface
should grow when a new replay introduces a new import, not by
default.

Workflow gains one step (`pip install -r tests/harness/requirements.txt`)
between the /etc/hosts setup and run-all-replays. No other changes.
2026-04-30 13:32:00 -07:00
Hongming Wang
9dae0503ee fix(harness): generate SECRETS_ENCRYPTION_KEY per-run instead of hardcoding
Replaces the hardcoded base64 sentinel (630dd0da) with a per-run
generation in up.sh, exported into compose's interpolation environment.

Why:
- Hardcoding a 32-byte base64 string in the repo, even one labelled
  "test-only", sets a bad muscle-memory pattern. The next agent or
  contributor copies the shape into another harness — or worse, into a
  staging .env — and the test-only sentinel turns into something
  someone treats as a real key.
- Secret scanners flag key-shaped values regardless of the surrounding
  comment claiming intent. Avoiding the literal entirely sidesteps the
  false-positive.
- A fresh key per harness lifetime more closely mimics prod's
  per-tenant isolation, exercising the same code paths without any
  pretense of stable encrypted-data fixtures (which the harness wipes
  on every ./down.sh anyway).

Implementation:
- up.sh: `openssl rand -base64 32` if SECRETS_ENCRYPTION_KEY isn't
  already set in the caller's env. Honoring a pre-set value lets a
  debug session pin a key for reproducibility (e.g. when investigating
  encrypted-row corruption).
- compose.yml: `${SECRETS_ENCRYPTION_KEY:?…}` makes a misuse loud —
  running `docker compose up` directly bypassing up.sh fails fast with
  a clear error pointing at the right entry point, rather than a 100s
  unhealthy-tenant timeout.
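
A sketch of both halves (the variable name and openssl invocation are from the message; file contents otherwise assumed):

```shell
# up.sh side: generate a fresh key per run, but honor a pre-set value so a
# debug session can pin one for reproducibility.
if [ -z "${SECRETS_ENCRYPTION_KEY:-}" ]; then
  SECRETS_ENCRYPTION_KEY="$(openssl rand -base64 32)"
fi
export SECRETS_ENCRYPTION_KEY

# compose.yml side (for reference), failing fast on direct `docker compose up`:
#   environment:
#     SECRETS_ENCRYPTION_KEY: "${SECRETS_ENCRYPTION_KEY:?must be set - run via tests/harness/up.sh}"
```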

Both paths verified via `docker compose config`:
- with key exported: value interpolates cleanly
- without it: "required variable SECRETS_ENCRYPTION_KEY is missing a
  value: must be set — run via tests/harness/up.sh, which generates
  one per run"
2026-04-30 13:30:14 -07:00
Hongming Wang
630dd0dae7 fix(harness): seed SECRETS_ENCRYPTION_KEY so MOLECULE_ENV=production tenant boots
Found via the first run of the harness-replays-required-check workflow
(#2410): the tenant container failed its healthcheck after 100s with
"refusing to boot without encryption in production". This is the
deferred CRITICAL flagged on PR #2401 — `crypto.InitStrict()` requires
SECRETS_ENCRYPTION_KEY when MOLECULE_ENV=production, and the harness
sets prod-mode but never seeded a key.

Fix: add a clearly-test-only value inline: the base64 encoding of the
32-byte literal string "harness-test-only-not-for-prod!!". Keeping
MOLECULE_ENV=production preserves the harness's value as a production-
shape replay surface — it now exercises the full encryption boot path
including the strict check, rather than skirting it via dev-mode.
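
The arithmetic that makes the sentinel work: the ASCII literal is exactly 32 bytes, so its base64 form is key-shaped (a sketch; only the literal itself is from the commit):

```python
import base64

sentinel = b"harness-test-only-not-for-prod!!"
assert len(sentinel) == 32  # the strict boot check wants a 32-byte key

key = base64.b64encode(sentinel).decode()
print(len(key))  # 44: standard base64 length for 32 input bytes
```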

Why inline rather than .env:
- The harness compose file is meant to be self-contained and
  reproducible from a clean clone. An external .env would split the
  config across two files for one synthetic value.
- The value is intentionally a sentinel; there's no operator decision
  here to gate behind a per-deployment file.

After this lands the harness boots clean and `run-all-replays.sh` can
exercise the buildinfo + peer-discovery replays as designed. The
required-check workflow itself (#2410) needs no change.
2026-04-30 13:25:52 -07:00
Hongming Wang
0af4012f79 feat(tests): add run-all-replays.sh harness runner
Boots the harness, runs every script under replays/, tracks pass/fail,
and tears down on exit. Closes the README's TODO for the harness runner
that the per-replay-registration comment referenced.

Usage:
  ./run-all-replays.sh                # boot, run, teardown
  KEEP_UP=1 ./run-all-replays.sh      # leave harness running on exit
  REBUILD=1 ./run-all-replays.sh      # rebuild images before booting

Trap-on-EXIT teardown ensures partial-failure runs don't leak Docker
resources. Returns non-zero if any replay failed; CI can adopt this as
a single command without per-replay registration. Phase 2 picks this up
to wire harness-based E2E as a required check.
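
The runner's aggregate-and-teardown shape, sketched (the KEEP_UP flag and trap-on-EXIT idea are from the message; replay discovery is simplified to arguments here):

```shell
# Teardown on EXIT so partial failures still clean up; KEEP_UP=1 skips it.
# In the real script:
#   trap '[ "${KEEP_UP:-0}" = "1" ] || ./down.sh' EXIT

# Run every replay, remember any failure, return non-zero if one failed.
run_replays() {
  fail=0
  for replay in "$@"; do
    "$replay" || fail=1
  done
  return "$fail"
}
```

Invoked as something like `run_replays replays/*.sh`, the non-zero return is what lets CI adopt the whole suite as a single command.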

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:57:27 -07:00
Hongming Wang
046eccbb7c fix(harness): five-axis self-review fixes before merge
Three findings from re-reviewing PR #2401 with fresh eyes:

1. Critical — port binding to 0.0.0.0
   compose.yml's cf-proxy bound 8080:8080 (default 0.0.0.0). The harness
   uses a hardcoded ADMIN_TOKEN so anyone on the local network or VPN
   could hit /workspaces with admin privileges. Switch to 127.0.0.1:8080
   so admin access is loopback-only — safe for E2E and prevents the
   known-token leak.

2. Required — dead code in cp-stub
   peersFailureMode + __stub/mode + __stub/peers were declared with
   atomic.Value setters but no handler ever READ from them. CP doesn't
   host /registry/peers (the tenant does), so the toggles couldn't
   drive responses. Removed the dead vars + handlers; kept
   redeployFleetCalls counter and __stub/state since those have a real
   consumer in the buildinfo replay.

3. Required — replay's auth-context dependency
   peer-discovery-404.sh's Python eval ran
   a2a_client.get_peers_with_diagnostic() against the live tenant.
   Without a workspace token
   file, auth_headers() yields empty headers — so the helper might
   exercise a 401 branch instead of the 404 branch the replay claims
   to test.

   Split the assertion into (a) WIRE — direct curl proves the platform
   returns 404 from /registry/<unregistered>/peers — and (b) PARSE —
   feed the helper a mocked 404 via httpx patches, no network/auth.
   Each branch tests exactly what it claims.

   Also added a graceful skip when the workspace runtime in the
   current checkout pre-dates #2399 (no get_peers_with_diagnostic
   yet) — replay falls back to wire-only verification with a clear
   message instead of an opaque AttributeError. After #2399 lands on
   staging, both branches will run.
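
The loopback bind from finding (1) is a small compose change (service name per the message; surrounding keys assumed):

```yaml
services:
  cf-proxy:
    ports:
      - "127.0.0.1:8080:8080"   # was "8080:8080" (0.0.0.0); now loopback-only
```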

cp-stub still builds clean. compose.yml validates. Replay's bash
syntax + Python eval both verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:32:40 -07:00
Hongming Wang
f13d2b2b7b feat(tests): add production-shape local harness (Phase 1)
The harness brings up the SaaS tenant topology on localhost using the
SAME workspace-server/Dockerfile.tenant image that ships to production.
Tests run against http://harness-tenant.localhost:8080 and exercise the
same code path a real tenant takes:

  client
    → cf-proxy   (nginx; CF tunnel + LB header rewrites)
    → tenant     (Dockerfile.tenant — combined platform + canvas)
    → cp-stub    (minimal Go CP stand-in for /cp/* paths)
    → postgres + redis

Why this exists: bugs that survive `go run ./cmd/server` and ship to
prod almost always live in env-gated middleware (TenantGuard, /cp/*
proxy, canvas proxy), header rewrites, or the strict-auth / live-token
mode. The harness activates ALL of them locally so #2395 + #2397-class
bugs can be reproduced before deploy.

Phase 1 surface:
  - cp-stub/main.go: minimal CP stand-in. /cp/auth/me, redeploy-fleet,
    /__stub/{peers,mode,state} for replay scripts. Catch-all returns
    501 with a clear message when a new CP route appears.
  - cf-proxy/nginx.conf: rewrites Host to <slug>.localhost, injects
    X-Forwarded-*, disables buffering to mirror CF tunnel streaming
    semantics.
  - compose.yml: one service per topology layer; tenant builds from
    the actual production Dockerfile.tenant.
  - up.sh / down.sh / seed.sh: lifecycle scripts.
  - replays/peer-discovery-404.sh: reproduces #2397 + asserts the
    diagnostic helper from PR #2399 surfaces "404" + "registered".
  - replays/buildinfo-stale-image.sh: reproduces #2395 + asserts
    /buildinfo wire shape + GIT_SHA injection from PR #2398.
  - README.md: topology, quickstart, what the harness does NOT cover.

Phases 2-3 (separate PRs):
  - Phase 2: convert tests/e2e/test_api.sh to target the harness URL
    instead of localhost; make harness-based replays a required CI gate.
  - Phase 3: config-coherence lint that diffs harness env list against
    production CP's env list, fails CI on drift.

Verification:
  - cp-stub builds (go build ./...).
  - cp-stub responds to all stubbed endpoints (smoke-tested locally).
  - compose.yml passes `docker compose config --quiet`.
  - All shell scripts pass `bash -n` syntax check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:22:46 -07:00