fix: add slug validation to prevent SSRF (OFFSEC-006)
All checks were successful
sop-checklist / all-items-acked (pull_request) injected
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 55s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 39s
CI / Detect changes (pull_request) Successful in 58s
E2E API Smoke Test / detect-changes (pull_request) Successful in 56s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 49s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 28s
qa-review / approved (pull_request) Successful in 13s
gate-check-v3 / gate-check (pull_request) Successful in 33s
sop-checklist-gate / gate (pull_request) Successful in 13s
security-review / approved (pull_request) Successful in 15s
sop-tier-check / tier-check (pull_request) Successful in 20s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m17s
audit-force-merge / audit (pull_request) Successful in 10s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m42s
CI / Platform (Go) (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 22s
CI / Canvas Deploy Reminder (pull_request) Successful in 8s
CI / all-required (pull_request) Successful in 3s

OFFSEC-006 (HIGH): promote-tenant-image.sh interpolated raw --tenants
slug into URL paths and subdomains without sanitisation.  Four injection
points were vulnerable:

  • cp_redeploy_tenant  (line 193): /cp/admin/tenants/$slug/redeploy
  • tenant_buildinfo    (line 209): https://${slug}.moleculesai.app/buildinfo
  • tenant_health      (line 217): https://${slug}.moleculesai.app/health
  • resolve_tenant_instance_id (line 263): /cp/admin/tenants/$slug

Attack vectors:
  --tenants 'a?url=https://evil.com'  → curl splits on ? as query separator
  --tenants 'evil.com@legitimate'    → subdomain takeover via @

Fix:
  • Add validate_slug() function with regex ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$
    before any URL interpolation.  Exit 64 on invalid slug.
  • Call validate_slug() in main() before any operations (up-front guard).
  • Add defense-in-depth calls inside cp_redeploy_tenant, tenant_buildinfo,
    tenant_health, resolve_tenant_instance_id, redeploy_tenant,
    verify_tenant, and the rollback loop.
  • Also fix a latent promote_rc=1 bug where `cmd || promote_rc=1` inside
    `set -e` returned exit 1 and triggered early script exit instead of
    setting the variable.  Replaced with `if ! cmd; then promote_rc=1; fi`.

Test additions (test-promote-tenant-image.sh):
  • Test 9:  8 invalid slug variants rejected with exit 64 (?, &, @, /, \, space, etc.)
  • Test 10: 6 valid slugs accepted (chloe-dong, ab, a, etc.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Molecule AI · core-devops 2026-05-14 02:17:42 +00:00
parent a23ecc18a0
commit 9153a2e464
2 changed files with 84 additions and 6 deletions

View File

@ -179,6 +179,7 @@ cp_redeploy_tenant() {
# 1 — any other failure
# stdout = response body. stderr = "HTTP_STATUS=NNN" line.
local slug="$1" tag="$2"
validate_slug "$slug"
_mock_call cp_redeploy_tenant "$slug" "$tag"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
local tok="${!CP_TOKEN_ENV:-}"
@ -204,6 +205,7 @@ cp_redeploy_tenant() {
tenant_buildinfo() {
# args: <slug>; prints JSON
local slug="$1"
validate_slug "$slug"
_mock_call tenant_buildinfo "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
curl -sf --max-time 10 "https://${slug}.moleculesai.app/buildinfo"
@ -212,6 +214,7 @@ tenant_buildinfo() {
tenant_health() {
# args: <slug>; prints raw response, returns 0 if "ok"
local slug="$1"
validate_slug "$slug"
_mock_call tenant_health "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
curl -sf --max-time 10 "https://${slug}.moleculesai.app/health"
@ -256,6 +259,7 @@ print(json.dumps({'commands': [ecr_login]}))
resolve_tenant_instance_id() {
# args: <slug>; prints i-xxx
local slug="$1"
validate_slug "$slug"
_mock_call resolve_tenant_instance_id "$slug"; local _mrc=$?
[[ $_mrc -ne 99 ]] && return $_mrc
local tok="${!CP_TOKEN_ENV:-}"
@ -271,6 +275,19 @@ resolve_tenant_instance_id() {
log() { printf '[%s] %s\n' "$(date -u +%H:%M:%SZ)" "$*"; }
err() { printf '[%s] ERROR: %s\n' "$(date -u +%H:%M:%SZ)" "$*" >&2; }
# validate_slug — exit 64 if slug contains characters outside the safe set.
# Prevents SSRF via query-separator injection (?foo) and subdomain takeover
# (@evil) when slug is interpolated into URL paths or subdomains.
# OFFSEC-006 fix.
validate_slug() {
local slug="$1"
if ! [[ "$slug" =~ ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$ ]]; then
printf '[%s] ERROR: invalid slug: %s\n' \
"$(date -u +%H:%M:%SZ)" "$slug" >&2
exit 64
fi
}
preflight() {
log "preflight: source=$SOURCE_TAG dest=$DEST_TAG repo=$REPO region=$REGION"
local src_manifest
@ -339,6 +356,7 @@ promote() {
redeploy_tenant() {
# args: <slug> — handle the 403→SSM-refresh→retry pattern
local slug="$1"
validate_slug "$slug"
log " redeploy: $slug"
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would POST /redeploy slug=$slug"
@ -372,6 +390,7 @@ redeploy_tenant() {
verify_tenant() {
local slug="$1"
validate_slug "$slug"
log " verify: $slug"
if [[ "$DRY_RUN" == "true" ]]; then
log " [dry-run] would curl /buildinfo + /health"
@ -398,6 +417,7 @@ rollback() {
rm -f "$mfile"
IFS=',' read -ra slugs <<<"$TENANTS"
for slug in "${slugs[@]}"; do
validate_slug "$slug"
redeploy_tenant "$slug" || err " rollback redeploy failed for $slug"
done
log "rollback: complete"
@ -408,6 +428,13 @@ rollback() {
# ─────────────────────────────────────────────────────────────────────────────
main() {
# OFFSEC-006: validate slugs before any network I/O.
IFS=',' read -ra _slugs <<<"$TENANTS"
for _slug in "${_slugs[@]}"; do
validate_slug "$_slug"
done
unset _slugs _slug
preflight || return 1
snapshot_dest_tag || return 2
promote || return 2
@ -415,8 +442,15 @@ main() {
local promote_rc=0
IFS=',' read -ra slugs <<<"$TENANTS"
for slug in "${slugs[@]}"; do
redeploy_tenant "$slug" || promote_rc=1
[[ $promote_rc -eq 0 ]] && { verify_tenant "$slug" || promote_rc=1; }
validate_slug "$slug"
if ! redeploy_tenant "$slug"; then
promote_rc=1
fi
if [[ $promote_rc -eq 0 ]]; then
if ! verify_tenant "$slug"; then
promote_rc=1
fi
fi
[[ $promote_rc -ne 0 ]] && break
done

View File

@ -267,7 +267,51 @@ else
printf ' ✗ unknown-flag should fail (got %s)\n' "$rc"
fi
printf '\n== Test 9: ROLLBACK_TAG follows YYYYMMDD via NOW_OVERRIDE_DATE ==\n'
printf '\n== Test 9: slug validation — invalid slugs rejected with exit 64 (OFFSEC-006) ==\n'
# Attack vectors: SSRF via ? (curl query separator), subdomain takeover via @,
# path traversal via /, shell metacharacters. Use a newline-delimited temp file
# so slugs containing spaces are NOT split by shell word-splitting.
_invalid_tmp=$(mktemp)
cat > "$_invalid_tmp" <<'INVALID_EOF'
a?url=https://evil.com
a&url=https://evil.com
a@evil.com
a/b
a\b
a b
chloe-dong?url=http://evil.com
evil.com@legitimate
INVALID_EOF
while IFS= read -r attack || [[ -n "$attack" ]]; do
set +e
out=$("$SCRIPT" --source-tag x --dest-tag y --tenants "$attack" 2>&1); rc=$?
set -e
if [[ $rc -eq 64 ]] && printf '%s' "$out" | grep -q 'invalid slug'; then
PASS=$((PASS + 1)); printf ' ✓ slug rejected: %s\n' "$(printf '%q' "$attack")"
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("slug-reject:$attack")
printf ' ✗ slug should be rejected: %s — got exit %s\n' "$(printf '%q' "$attack")" "$rc"
fi
done < "$_invalid_tmp"
rm -f "$_invalid_tmp"
printf '\n== Test 10: slug validation — valid slugs pass through ==\n'
valid_slugs='chloe-dong hongming ab a abc123 my-tenant-42'
for slug in $valid_slugs; do
set +e
out=$("$SCRIPT" --source-tag x --dest-tag y --tenants "$slug" --mock-dir /nonexistent 2>&1); rc=$?
set -e
# valid slugs: script should fail at preflight (no such mock dir / no real infra),
# but NOT at slug validation (exit 64). So we check exit != 64.
if [[ $rc -ne 64 ]]; then
PASS=$((PASS + 1)); printf ' ✓ valid slug accepted: %s\n' "$slug"
else
FAIL=$((FAIL + 1)); FAIL_NAMES+=("slug-accept:$slug")
printf ' ✗ valid slug rejected: %s (should have passed slug check)\n' "$slug"
fi
done
printf '\n== Test 11: ROLLBACK_TAG follows YYYYMMDD via NOW_OVERRIDE_DATE ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{}' 0
mock_set "$m" aws_ecr_describe_image '' 1
@ -289,7 +333,7 @@ fi
assert_calls_contain "rollback tag uses NOW_OVERRIDE_DATE (20260603)" "$m" 'aws_ecr_put_image b-prev-20260603'
rm -rf "$m"
printf '\n== Test 10: empty source manifest fails preflight ==\n'
printf '\n== Test 12: empty source manifest fails preflight ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '' 0 # rc=0 but empty body (the "None" case)
out=$(run_script "$m")
@ -297,7 +341,7 @@ assert_exit "empty source manifest fails preflight" "$out" 1
assert_contains "empty manifest message" "$out" 'returned empty manifest'
rm -rf "$m"
printf '\n== Test 11: tenant_buildinfo failure during verify → rollback ==\n'
printf '\n== Test 13: tenant_buildinfo failure during verify → rollback ==\n'
m=$(mkmock)
mock_set "$m" aws_ecr_get_image '{"manifests":[]}' 0
mock_set "$m" aws_ecr_describe_image '' 1
@ -311,7 +355,7 @@ assert_contains "logs buildinfo failure" "$out" '/buildinfo failed for chloe-don
assert_contains "rollback fired after verify fail" "$out" 'ROLLBACK:'
rm -rf "$m"
printf '\n== Test 12: ssm_refresh_ecr_auth JSON escaping (CWE-78 / OFFSEC-001) ==\n'
printf '\n== Test 14: ssm_refresh_ecr_auth JSON escaping (CWE-78 / OFFSEC-001) ==\n'
# Verify the python3 snippet in ssm_refresh_ecr_auth produces valid JSON and
# correctly escapes shell-injection characters in region + account ID fields.
# The fix replaces unquoted shell-printf interpolation with json.dumps.