molecule-ai/molecule-core

infra-runtime-be 5c989fef2f

feat(uploads): bump cap to 100MB + correct-reason error messages

CTO 2026-05-19 directive on forensic a99ab0a1 (reno-stars >50MB
upload that surfaced "signal timed out" when the real cause was
file-size + a fixed 60s client timeout):

  "if its file size issue, should have error that instead saying
   timeout which is wrong"

Bundles the cap raise + the wrong-reason fix in ONE PR because the
two are coupled — bumping the server alone would still leak the
fixed-60s timeout for legitimate slow uploads; fixing the client
alone would 413 every >50MB attempt.

Server (push-mode, EC2 workspace):
  - workspace-server/internal/handlers/chat_files.go:
      chatUploadMaxBytes 50→100 MB
      httpClient.Timeout 120→1200 s (matches the new slow-uplink budget)
  - workspace/internal_chat_uploads.py:
      CHAT_UPLOAD_MAX_BYTES 50→100 MB
      CHAT_UPLOAD_MAX_FILE_BYTES 25→100 MB (aligned with total so a
      single legitimate large file succeeds end-to-end)

Canvas:
  - canvas/src/components/tabs/chat/uploads.ts:
      MAX_UPLOAD_BYTES 100 MB constant + FileTooLargeError class
      pre-flight gate: file-size violation throws BEFORE any fetch,
        with the actionable "File too large (got X MB) — limit is 100MB"
      computeUploadTimeoutMs: 60s floor + 100 KB/s scaled deadline
        (was a fixed 60s — the root cause of the forensic)
  - canvas/src/components/tabs/chat/hooks/useChatSend.ts:
      mapUploadErrorToReason: routes each cause to ITS OWN message
        (FileTooLargeError | TimeoutError | server-Error | fallback)
      no conflation between file-size and connection-too-slow

Tests:
  - workspace-server chat_files_test.go: pins 100 MB constant,
    asserts sub-cap forwards + over-cap non-2xx
  - canvas uploads.cap.test.ts (10 cases): pre-flight gate, exact-cap
    edge, scaled-timeout curve, server-413 propagation, AbortSignal
    shape — explicit negative on "TimeoutError ≠ FileTooLargeError"
  - canvas useChatSend.errorReason.test.ts (5 cases): per-cause
    message contract, explicit negatives that guard against the
    wrong-reason conflation

Test harness mirror:
  - tests/harness/cf-proxy/nginx.conf: client_max_body_size 50m→100m
    (this is the harness mirror; the production CF / nginx tier is
    out-of-repo. If prod still caps at 50m, this mirror passes while
    prod 413s — surface to ops.)

Follow-up (SSOT, NOT in this PR):
  The 100 MB constant now lives in THREE mirror sites (canvas TS +
  workspace Python + platform Go). Per feedback_no_single_source_of_truth,
  the proper fix is exposing the cap via GET /uploads/limits so the
  client fetches the live value. Filing as a separate issue.

References:
  - task #295 (internal tracker; CTO-authorized this work)
  - forensic a99ab0a1 (reno-stars 2026-05-19)
  - feedback_surface_actionable_failure_reason_to_user (CTO 2026-05-17)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-19 20:23:04 -07:00

Name	Last commit	Date
cf-proxy	feat(uploads): bump cap to 100MB + correct-reason error messages	2026-05-19 20:23:04 -07:00
cp-stub	ci(docker): pin base image digests in all Dockerfiles	2026-05-09 23:56:39 +00:00
replays	fix(harness): stub platform_auth with *args lambdas (#2743 fallout)	2026-05-04 08:55:42 -07:00
_curl.sh	harness(phase-2): multi-tenant compose + cross-tenant isolation replays	2026-05-01 21:36:40 -07:00
.gitignore	harness(phase-0): sudo-free Host-header path + chat_history + envelope replays	2026-05-01 20:12:49 -07:00
compose.yml	fix(harness): bake cf-proxy nginx.conf at build time, not via configs:	2026-05-07 17:09:08 -07:00
down.sh	harness(phase-2): multi-tenant compose + cross-tenant isolation replays	2026-05-01 21:36:40 -07:00
README.md	chore: retrigger Harness Replays after Class G + clone-manifest fixes (#168)	2026-05-07 13:36:39 -07:00
requirements.txt	harness(phase-2): multi-tenant compose + cross-tenant isolation replays	2026-05-01 21:36:40 -07:00
run-all-replays.sh	feat(tests): add run-all-replays.sh harness runner	2026-04-30 11:57:27 -07:00
seed.sh	harness(phase-2): multi-tenant compose + cross-tenant isolation replays	2026-05-01 21:36:40 -07:00
up.sh	harness(phase-2): multi-tenant compose + cross-tenant isolation replays	2026-05-01 21:36:40 -07:00

README.md

Production-shape local harness

The harness brings up the SaaS tenant topology on localhost using the same Dockerfile.tenant image that ships to production. Tests target the cf-proxy on http://localhost:8080 and pass the tenant identity via a Host: header — exactly the way production CF tunnel routes by Host header. The cf-proxy nginx then rewrites headers and proxies to the right tenant container, exercising the SAME code path a real tenant takes including TenantGuard middleware, the /cp/* reverse proxy, the canvas reverse proxy, and a Cloudflare-tunnel-shape header rewrite layer.

Since Phase 2, the harness runs two tenants in parallel (alpha and beta), each with its own Postgres instance and a distinct MOLECULE_ORG_ID. This is the same shape as production, where each tenant gets its own EC2 + DB, and it is what cross-tenant isolation replays need in order to prove that TenantGuard actually 404s a misrouted request.

tests/harness/_curl.sh is the helper sourced by every replay. Per tenant: curl_alpha_anon / curl_alpha_admin / curl_beta_anon / curl_beta_admin / psql_exec_alpha / psql_exec_beta. Plus deliberately-wrong cross-tenant negative-test helpers for isolation replays: curl_alpha_creds_at_beta / curl_beta_creds_at_alpha. Legacy single-tenant aliases (curl_anon, curl_admin, psql_exec) default to alpha so pre-Phase-2 replays continue to work. New replays should source _curl.sh rather than rolling their own curl.
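As a rough illustration of the helper pattern described above, a per-tenant wrapper might look like the following. This is a sketch, not the contents of the real `_curl.sh`: the `BASE`, `ALPHA_HOST`, and `BETA_HOST` names and the default values appear elsewhere in this README, but the function bodies and the shared `tenant_curl` helper are assumptions made for illustration.

```shell
# Sketch only; the real tests/harness/_curl.sh may be structured differently.
# Defaults use the port and Host values shown in this README; each honors an env-var override.
BASE="${BASE:-http://localhost:8080}"
ALPHA_HOST="${ALPHA_HOST:-harness-tenant-alpha.localhost}"
BETA_HOST="${BETA_HOST:-harness-tenant-beta.localhost}"

# tenant_curl HOST PATH [extra curl args...]  (hypothetical shared helper)
# Sends the request to the cf-proxy URL and selects the tenant via the Host header.
tenant_curl() {
  tc_host="$1"
  tc_path="$2"
  shift 2
  curl -sS -H "Host: ${tc_host}" "$@" "${BASE}${tc_path}"
}

# Anonymous per-tenant wrappers: same URL, different Host header.
curl_alpha_anon() { tenant_curl "$ALPHA_HOST" "$@"; }
curl_beta_anon()  { tenant_curl "$BETA_HOST" "$@"; }
```

Under these assumptions, `curl_alpha_anon /health` sends the same request as `curl -H "Host: harness-tenant-alpha.localhost" http://localhost:8080/health`, with `-sS` added to silence the progress meter.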

Why this exists

Local go run ./cmd/server skips:

- TenantGuard middleware (no MOLECULE_ORG_ID env)
- /cp/* reverse proxy mount (no CP_UPSTREAM_URL env)
- CANVAS_PROXY_URL (canvas runs separately on :3000)
- Header rewrites that production's CF tunnel + LB perform
- Strict-auth mode (no live ADMIN_TOKEN)

Bugs that survive go run and ship to production almost always live in one of those layers. The harness activates ALL of them.

Topology

                                      client
                                        ↓
                                     cf-proxy            nginx, mirrors CF tunnel header rewrites
                                        ↓ (routes by Host header)
              ┌─────────────────────────┴─────────────────────────┐
              ↓                                                   ↓
        tenant-alpha                                        tenant-beta
        Host: harness-tenant-alpha.localhost                Host: harness-tenant-beta.localhost
        MOLECULE_ORG_ID=harness-org-alpha                   MOLECULE_ORG_ID=harness-org-beta
              ↓                                                   ↓
        postgres-alpha                                      postgres-beta
              ↓                                                   ↓
              └─────────────────────────┬─────────────────────────┘
                                        ↓
                             cp-stub + redis (shared)

Each tenant runs the production Dockerfile.tenant image with its own admin token, org id, and Postgres instance — identical isolation boundaries to production where each tenant gets a dedicated EC2 + DB. cp-stub and redis are shared because they model the per-region multi-tenant CP and a single Redis cluster.

Quickstart

cd tests/harness
./up.sh                 # builds + starts all services (both tenants)
./seed.sh               # registers parent+child workspaces in BOTH tenants
./replays/tenant-isolation.sh
./replays/per-tenant-independence.sh
./down.sh               # tear down + remove volumes

To run every replay in one shot (boot, seed, run-all, teardown):

cd tests/harness
./run-all-replays.sh    # full lifecycle; non-zero exit if any replay fails
KEEP_UP=1 ./run-all-replays.sh   # leave harness up for debugging
REBUILD=1 ./run-all-replays.sh   # rebuild images before booting

No /etc/hosts edit required — replays use the cf-proxy's loopback port and pass the per-tenant Host: header (_curl.sh handles this automatically). This matches how production CF tunnel routes: the URL is the public CF endpoint, the Host header carries the per-tenant identity. Quick check:

curl -H "Host: harness-tenant-alpha.localhost" http://localhost:8080/health
curl -H "Host: harness-tenant-beta.localhost"  http://localhost:8080/health

(If you have a legacy /etc/hosts entry from older docs, it still works — BASE, ALPHA_HOST, BETA_HOST all honor env-var overrides. The legacy harness-tenant.localhost host alias maps to alpha.)

Replay scripts

Each replay script reproduces a real bug class against the harness so fixes can be verified locally before deploy. The bar for adding a replay is "this bug shipped to production despite local E2E being green" — the script becomes the regression gate that closes that gap.

Replay	Closes	What it proves
`peer-discovery-404.sh`	#2397	tool_list_peers surfaces the actual reason instead of "may be isolated"
`buildinfo-stale-image.sh`	#2395	GIT_SHA reaches the binary; verify-step comparison logic works
`chat-history.sh`	#2472 + #2474 + #2476	`peer_id` filter (incl. OR over source/target) + `before_ts` paging + UUID/RFC3339 trust boundary on the activity route
`channel-envelope-trust-boundary.sh`	#2471 + #2481	published wheel scrubs malformed `peer_id` from the channel envelope and from `agent_card_url` (path-traversal + XML-attr injection)
`tenant-isolation.sh`	Phase 2	TenantGuard 404s any request whose `X-Molecule-Org-Id` doesn't match the container's `MOLECULE_ORG_ID` (covers cross-tenant routing bug + allowlist drift); per-tenant `/workspaces` listings stay partitioned
`per-tenant-independence.sh`	Phase 2	parallel A2A workflows in both tenants don't bleed into each other's `activity_logs` / `workspaces`, including under a concurrent INSERT race (catches lib/pq prepared-statement cache collision + shared-pool poisoning)
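The `tenant-isolation.sh` row above comes down to a status-code assertion on a deliberately misrouted request. Here is a minimal sketch of that kind of check, assuming the harness defaults shown in this README. `status_alpha_org_at_beta` and `assert_status` are hypothetical names, and the real replay uses the `_curl.sh` cross-tenant helpers, which may send additional credentials not shown here.

```shell
# Sketch only; not the actual replays/tenant-isolation.sh.
BASE="${BASE:-http://localhost:8080}"
BETA_HOST="${BETA_HOST:-harness-tenant-beta.localhost}"

# status_alpha_org_at_beta PATH  (hypothetical helper)
# Prints only the HTTP status of a request that is routed to tenant beta
# but carries tenant alpha's org id.
status_alpha_org_at_beta() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Host: ${BETA_HOST}" \
    -H "X-Molecule-Org-Id: harness-org-alpha" \
    "${BASE}$1"
}

# assert_status EXPECTED ACTUAL LABEL  (hypothetical helper)
# Prints a PASS/FAIL line and returns non-zero on mismatch so the replay can fail.
assert_status() {
  if [ "$1" = "$2" ]; then
    echo "PASS: $3"
  else
    echo "FAIL: $3 (expected $1, got $2)"
    return 1
  fi
}
```

With the harness up, a replay could call `assert_status 404 "$(status_alpha_org_at_beta /workspaces)" "beta rejects alpha org id"`. The `/workspaces` path is used here only as an example route.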

To add a new replay:

1. Drop a script under replays/ named after the issue.
2. The script should reproduce the production failure mode against the harness, then assert that the fix is present. The PASS criterion is the post-fix behavior.
3. The run-all-replays.sh runner picks up every replays/*.sh script automatically, so no per-replay registration is needed.
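The automatic pickup of `replays/*.sh` can be pictured as a glob loop. The sketch below is an assumption about the general shape, not the actual `run-all-replays.sh`. `run_replays` is a hypothetical function name, and the boot, seed, and teardown steps are left out.

```shell
# run_replays DIR  (hypothetical; the real runner also boots, seeds, and tears down)
# Executes every DIR/*.sh, prints one PASS/FAIL line per script,
# and returns non-zero if any script failed.
run_replays() {
  rr_dir="$1"
  rr_failed=0
  for rr_script in "$rr_dir"/*.sh; do
    [ -e "$rr_script" ] || continue   # glob matched nothing
    if "$rr_script"; then
      echo "PASS $(basename "$rr_script")"
    else
      echo "FAIL $(basename "$rr_script")"
      rr_failed=1
    fi
  done
  return "$rr_failed"
}
```

Because the loop executes each script directly, a new replay needs its executable bit set and a shebang line. The loop keeps going after a failure, so one run reports every broken replay rather than only the first.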

Extending the cp-stub

cp-stub/main.go serves the minimum surface for the existing replays plus a catch-all that returns 501 + a clear message when the tenant asks for a route the stub doesn't implement. To add a new CP route:

1. Add a mux.HandleFunc in cp-stub/main.go for the path.
2. Return the same wire shape the real CP returns. The contract is "wire compatibility with the staging CP at the time of writing"; document it with a comment pointing at the real CP handler.
3. Add a replay script that exercises the path.
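When a replay exercises a new CP path, the stub's 501 catch-all offers a quick way to tell whether the route has been added yet. The sketch below assumes that the tenant's /cp/* reverse proxy passes the stub's 501 through unchanged and that the route needs no extra credentials. This README states neither, and both function names are hypothetical.

```shell
# Sketch only; assumes the harness defaults shown in this README.
BASE="${BASE:-http://localhost:8080}"
ALPHA_HOST="${ALPHA_HOST:-harness-tenant-alpha.localhost}"

# cp_route_status PATH  (hypothetical helper)
# Prints the HTTP status for a request to /cp/PATH sent via tenant alpha.
cp_route_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Host: ${ALPHA_HOST}" \
    "${BASE}/cp/$1"
}

# describe_cp_status CODE  (hypothetical helper)
# Maps a status code to a short hint for the person extending the stub.
describe_cp_status() {
  case "$1" in
    501) echo "not implemented by cp-stub; add a mux.HandleFunc for this path" ;;
    2??) echo "route is served" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

A replay could run `describe_cp_status "$(cp_route_status some/route)"` before its real assertions, where `some/route` is a placeholder for the path under test.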

What the harness does NOT cover

- Real TLS / cert handling (CF terminates TLS in production; harness is HTTP-only).
- Cloudflare API edge cases (rate limits, DNS propagation timing).
- Real EC2 / SSM / EBS behavior (image-cache replay simulates the outcome but not the AWS API surface).
- Cross-region or multi-AZ topology.
- Real production data scale.

These are intentional Phase 1 limits. If a bug class hits one of these gaps, escalate to staging E2E rather than expanding the harness past its mandate of "exercise the tenant binary in production-shape topology."

Roadmap

- Phase 1 (shipped): harness + cp-stub + cf-proxy + 4 replays + run-all-replays.sh runner. No-sudo Host-header path via _curl.sh. Per-replay psql seeding for tests that need DB-side fixtures.
- Phase 2 (shipped): multi-tenant. tenant-alpha + tenant-beta each have their own Postgres instance and a distinct MOLECULE_ORG_ID; cf-proxy nginx routes by Host header (prod CF tunnel parity); seed.sh registers parent+child workspaces in both tenants; _curl.sh exposes per-tenant + cross-tenant-negative helpers; new replays cover TenantGuard isolation (tenant-isolation.sh) and per-tenant independence under concurrent load (per-tenant-independence.sh). harness-replays.yml runs run-all-replays.sh as a required check on every PR touching workspace-server/**, canvas/**, tests/harness/**, or the workflow itself.
- Phase 3: replace cp-stub/ with the real molecule-controlplane Docker build. Add a config-coherence lint that diffs the harness env list against production CP's env list and fails CI on drift. Convert tests/e2e/test_api.sh to target the harness instead of localhost.
- Phase 4 (long-term): Miniflare in front of cf-proxy for real CF emulation (WAF, BotID, rate-limit, cf-tunnel headers). LocalStack for the EC2 provisioner. Anonymized prod-traffic recording/replay for SaaS-scale regression detection.