Canary #2090 has been red for 6 consecutive runs over 4+ hours, all
timing out at the TLS-readiness step exactly at the 10-min cap. The
time window correlates with three CP commits that landed
today/yesterday and changed EC2 boot behaviour:
- molecule-controlplane@a3eb8be — fix(ec2): force fresh clone of /opt/adapter
- molecule-controlplane@ed70405 — feat(sweep): wire up healthcheck loop
- molecule-controlplane@4ab339e — fix(provisioner): aggregate cleanup errors
Two changes here, both surgical:
1. Bump the bash-side TLS deadline from 600s to 900s, and the canvas TS
   mirror from 10m to 15m. This stays below the 20-min provision
   envelope, so a genuinely stuck tenant still fails loudly at the
   earlier provision step instead of masquerading as a TLS failure.
2. On TLS timeout, dump a diagnostic burst before exiting:
- getent hosts $TENANT_HOST (DNS resolution state)
- curl -kv $TENANT_URL/health (TLS handshake + HTTP layer)
   The previous failure log was just "no 2xx in N min", with no signal
   as to which layer was actually broken. With this change, the next
   timeout tells us whether DNS resolution, the TLS handshake, or the
   HTTP layer is the culprit, so the CP root cause can be isolated
   without speculation.
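The two changes together look roughly like the sketch below. This is
illustrative, not the canary's actual script: TENANT_HOST and TENANT_URL
come from the description above, but the function names, variable names,
and poll interval are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: poll the tenant health endpoint until the (bumped) deadline,
# and dump layered diagnostics if the deadline is hit.
set -u

TLS_DEADLINE_SECS="${TLS_DEADLINE_SECS:-900}"    # bumped from 600
POLL_INTERVAL_SECS="${POLL_INTERVAL_SECS:-15}"   # assumed interval

dump_tls_diagnostics() {
  echo "== TLS readiness timed out after ${TLS_DEADLINE_SECS}s; diagnostics =="
  # DNS resolution state for the tenant host.
  getent hosts "${TENANT_HOST}" || echo "DNS: no record for ${TENANT_HOST}"
  # -k skips cert verification, -v shows the handshake on stderr;
  # keep the tail so the log stays bounded.
  curl -kv --max-time 30 "${TENANT_URL}/health" 2>&1 | tail -n 40 || true
}

wait_for_tls() {
  local start now
  start=$(date +%s)
  while :; do
    # Any 2xx from /health counts as TLS-ready (-f fails on HTTP errors).
    if curl -ksf -o /dev/null "${TENANT_URL}/health"; then
      return 0
    fi
    now=$(date +%s)
    if (( now - start >= TLS_DEADLINE_SECS )); then
      dump_tls_diagnostics
      return 1
    fi
    sleep "${POLL_INTERVAL_SECS}"
  done
}
```

The canvas TS mirror would get the matching constant change (10m to
15m); only the bash side is sketched here.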
This is the unblock; a separate molecule-controlplane issue tracks the
underlying regression suspicion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>