forked from molecule-ai/molecule-core
* fix(security): call redactSecrets before seeding workspace memories (F1085)

seedInitialMemories() in workspace_provision.go was inserting template/config memories directly into agent_memories without scrubbing credential patterns. A workspace provisioned from a template containing API keys, tokens, or other secrets would store them in plain text, the same class of issue as #838.

Fix: call redactSecrets(workspaceID, content) on the truncated memory content before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400) is preserved: redaction runs after truncation so the size limit still applies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(workspace_provision): add seedInitialMemories coverage for #1208

Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the redactSecrets call (F1085 / #1132), both identified as untested in #1208.

- TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly 100k, 1 byte over, far over, and well under. Verifies INSERT receives exactly maxMemoryContentLength bytes.
- TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs before INSERT, regression test for F1085.
- TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently skipped, no INSERT called.
- TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without DB calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter launch visual assets (#1209)

Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging.

* fix(ci): golangci-lint errcheck failures on staging

Suppress errcheck warnings for calls where the return value is safely ignored:

- resp.Body.Close() (artifacts/client.go): deferred cleanup; failure to close a response body is non-critical; the defer itself is what matters for connection reuse.
- rows.Close() (bundle/exporter.go): deferred cleanup in a loop where rows.Err() already handles query errors.
- filepath.Walk (bundle/exporter.go): top-level walk call; errors in sub-directory traversal are handled by the inner callback (which returns nil for err != nil).
- broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget event broadcast; errors are logged internally by the broadcaster.
- db.DB.ExecContext (bundle/importer.go): best-effort runtime column update; non-critical auxiliary data that the provisioner re-extracts if needed.

Fixes: #1143

* test(artifacts): suppress w.Write return values to satisfy errcheck

All httptest.ResponseWriter.Write calls in client_test.go now discard the byte count and error return with _, _ = prefix. The Write method is safe to discard in test handlers; httptest.ResponseWriter.Write never returns an error for in-memory buffers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(CI): move changes job off self-hosted runner + add workflow concurrency

Cherry-pick from staging PR #1194 for main. Two changes to relieve macOS arm64 runner saturation:

1. `changes` job: runs on ubuntu-latest instead of [self-hosted, macos, arm64]. This job does a plain `git diff` with zero macOS dependencies; moving it off the runner frees a slot immediately on every workflow trigger.
2. Add workflow-level concurrency: concurrency: group: ci-${{ github.ref }}; cancel-in-progress: true. Prevents multiple stale in-flight CI runs from queuing on the same ref when new commits arrive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203)

seedInitialMemories() in workspace_provision.go was inserting template/config memories directly into agent_memories without scrubbing credential patterns. A workspace provisioned from a template containing API keys, tokens, or other secrets would store them in plain text, the same class of issue as #838.

Fix: call redactSecrets(workspaceID, content) on the truncated memory content before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400) is preserved: redaction runs after truncation so the size limit still applies.

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* tick: 2026-04-21 ~03:40Z, CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done

* fix(tenant-guard): allowlist /registry/register + /registry/heartbeat

Final layer of today's stuck-provisioning saga. With the private-IP platform_url fix and the intra-VPC :8080 SG rule in place, workspace EC2s finally reached the tenant on the right port, only to have every POST bounced with a synthetic 404 by TenantGuard.

TenantGuard is the SaaS hook that rejects cross-tenant routing. It demands X-Molecule-Org-Id on every request, but CP's workspace user-data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL, RUNTIME, PORT), so the runtime can't attach the header. Net effect: every workspace's first heartbeat to /registry/heartbeat was a silent 404, and the workspace sat in 'provisioning' until the platform sweeper timed it out.

Allowlist the two workspace-boot paths:

- /registry/register: one-shot at runtime startup
- /registry/heartbeat: every 30s

Both are still gated by wsauth.HasAnyLiveToken (workspaces with a token on file must present it; legacy tokenless workspaces are grandfathered). And the tenant SG already scopes :8080 to the VPC CIDR, so only intra-VPC callers can reach these paths in the first place. The allowlist bypasses cross-org routing, not auth.

Follow-up: passing MOLECULE_ORG_ID into the workspace env would let the runtime attach the header and drop this allowlist entry. Tracked separately; not urgent since the multi-layer auth above is already adequate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
146 lines
6.0 KiB
Go
package middleware

import (
	"os"
	"strings"

	"github.com/gin-gonic/gin"
)

// flyReplaySrcHeader is the header Fly injects on requests it replays via
// the `fly-replay: ...;state=...` mechanism. Format is a semicolon-
// separated list of k=v pairs, e.g.
//   instance=91854...;region=ord;t=1700000000000;state=<uuid>
// Control plane puts the bare UUID in state (no prefix) because Fly's
// proxy returns 502 "replay malformed" on any second `=` in the value.
// We read the whole state= segment as the org id.
const flyReplaySrcHeader = "Fly-Replay-Src"

// Tenant-mode guard: public repo's only SaaS hook.
//
// The SaaS control plane (private `molecule-controlplane` repo) provisions one
// platform instance per customer org on Fly Machines and sets:
//   - MOLECULE_ORG_ID=<uuid> (env on the machine)
//   - forwards requests with X-Molecule-Org-Id=<uuid> (control-plane router)
//
// TenantGuard wraps every non-allowlisted route so a mis-routed request from
// another org bounces with 404 (not 403; don't leak existence).
//
// When MOLECULE_ORG_ID is unset (self-hosted / dev / CI), the guard is a
// passthrough; self-hosters see no behavior change.
//
// The guard intentionally knows nothing about orgs, signup, billing, or
// provisioning. Those live in the private control-plane repo. All this code
// does is: "am I the tenant for this request? if not, 404."

// tenantOrgIDHeader is the HTTP header the control-plane router sets when it
// uses fly-replay to route a request to a tenant machine. Case-insensitive at
// the HTTP layer (Gin normalizes).
const tenantOrgIDHeader = "X-Molecule-Org-Id"

// tenantGuardAllowlist is the set of paths that MUST remain accessible even in
// tenant mode without the org header (health checks, Prometheus scrapes,
// workspace → platform boot signals).
// Exact-match (no prefix semantics) to avoid accidentally exposing admin
// routes via e.g. "/health/debug/admin".
//
// /registry/register and /registry/heartbeat are workspace-initiated boot
// signals. Workspace EC2s are provisioned by the control plane with
// PLATFORM_URL but no MOLECULE_ORG_ID env var, so the runtime's httpx
// calls can't attach X-Molecule-Org-Id. Tenant SG already scopes these
// ports to the VPC CIDR; the registry handlers themselves enforce
// workspace-scoped bearer auth via wsauth.HasAnyLiveToken. Allowlisting
// here only bypasses the cross-org routing check, not auth.
var tenantGuardAllowlist = map[string]struct{}{
	"/health":             {},
	"/metrics":            {},
	"/registry/register":  {},
	"/registry/heartbeat": {},
}

// TenantGuard returns a Gin middleware configured from the MOLECULE_ORG_ID env
// var. Reads env once at construction; changing the env at runtime requires
// a restart (matches every other platform env var). Pass the orgID directly to
// TenantGuardWithOrgID if you need to test a specific configuration without
// mutating the process environment.
func TenantGuard() gin.HandlerFunc {
	return TenantGuardWithOrgID(strings.TrimSpace(os.Getenv("MOLECULE_ORG_ID")))
}

// TenantGuardWithOrgID is the constructor used by tests; ordinary callers use
// TenantGuard. When configuredOrgID is empty the guard is a no-op.
func TenantGuardWithOrgID(configuredOrgID string) gin.HandlerFunc {
	if configuredOrgID == "" {
		return func(c *gin.Context) { c.Next() }
	}
	return func(c *gin.Context) {
		if _, ok := tenantGuardAllowlist[c.Request.URL.Path]; ok {
			c.Next()
			return
		}
		// /cp/* is reverse-proxied to the control plane. The CP has its
		// own auth (WorkOS session cookie + admin bearer) so the tenant
		// doesn't need to attach org identity here. Bypassing the guard
		// avoids blocking the proxy with a 404 that would then look
		// like the CP is down.
		//
		// SECURITY NOTE: this pass-through is only safe because:
		//   (a) cp_proxy enforces its own explicit path allowlist
		//       (see router/cp_proxy.go cpProxyAllowedPrefixes) so
		//       traversal to admin-surface endpoints is blocked.
		//   (b) tenant SG has no :8080 inbound; only the Cloudflare
		//       tunnel reaches the platform. A future SG change that
		//       opens :8080 to the VPC would also open this path to
		//       unauthenticated /cp/* probing; tighten cp_proxy's
		//       allowlist OR remove this bypass if that happens.
		if strings.HasPrefix(c.Request.URL.Path, "/cp/") {
			c.Next()
			return
		}
		// Primary: explicit X-Molecule-Org-Id header (direct access path,
		// e.g. from molecli or internal tooling that sets it directly).
		if c.GetHeader(tenantOrgIDHeader) == configuredOrgID {
			c.Next()
			return
		}
		// Secondary: org id encoded in Fly-Replay-Src state by the control
		// plane. This is the path every production request takes, because
		// response headers set by the cp don't travel to the replayed
		// tenant; only the state= param does.
		if orgIDFromReplaySrc(c.GetHeader(flyReplaySrcHeader)) == configuredOrgID {
			c.Next()
			return
		}
		// Tertiary: same-origin Canvas requests on tenant EC2 instances where
		// Caddy serves Canvas (:3000) and API (:8080) under the same domain.
		// CANVAS_PROXY_URL is set → Referer/Origin matches Host → trusted.
		if isSameOriginCanvas(c) {
			c.Next()
			return
		}
		// 404 not 403: existence of this tenant must not be inferable by
		// probing other orgs' machines.
		c.AbortWithStatus(404)
	}
}

// orgIDFromReplaySrc extracts the org id the control plane put in the
// fly-replay state= segment. Value is the bare UUID; the control plane
// deliberately doesn't prefix it because Fly 502s on any `=` in the state
// value. Returns "" if the header is missing or has no state segment.
// Separated from TenantGuardWithOrgID so tests can round-trip header →
// id without spinning a full Gin context.
func orgIDFromReplaySrc(header string) string {
	if header == "" {
		return ""
	}
	for _, seg := range strings.Split(header, ";") {
		seg = strings.TrimSpace(seg)
		const statePrefix = "state="
		if strings.HasPrefix(seg, statePrefix) {
			return seg[len(statePrefix):]
		}
	}
	return ""
}