[DRAFT] obs/infra: Railway CP→Loki drain + tenant SSM:SendCommand (#3214) #3221

Closed
agent-dev-a wants to merge 1 commit from draft/obs-railway-loki-drain-3214 into main

DRAFT-ONLY — do NOT merge.

Closes #3214 (owner/infra execution still required).

This PR provides the configuration and policy artifacts the owner needs to close the observability gap described in #3214:

  • infra/obs/railway-loki-drain.alloy — Alloy configuration to receive Railway control-plane logs and ship them to the existing Grafana Cloud Loki stack (datasource UID P8E80F9AEF21F6940).
  • infra/obs/tenant-ssm-sendcommand-policy.json — IAM policy granting the fleet/operator identity ssm:SendCommand (and related actions) on production tenant instances tagged MoleculeTenant=true.
  • infra/obs/WIRING_CHECKLIST.md — step-by-step owner checklist covering Grafana Cloud credentials, Alloy deployment, Railway log-drain wiring, tenant log-shipper reconciliation, and SSM rollout.
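To make the intended scope of the IAM artifact concrete, here is a minimal Python sketch that emits a policy of that shape. It is not the contents of `infra/obs/tenant-ssm-sendcommand-policy.json`. The `AWS-RunShellScript` document, the read-only follow-up actions, and the choice of the `ssm:resourceTag/MoleculeTenant` condition key (rather than `aws:ResourceTag/MoleculeTenant`) are assumptions to check against the AWS Systems Manager IAM reference. Because `ssm:SendCommand` is authorized against both the target instance and the SSM document, the tag condition sits only on the instance statement.

```python
from __future__ import annotations

import json

# Hypothetical sketch; not the PR's actual policy file.
# Tag key/value come from the PR description; everything else is assumed.
TENANT_TAG_KEY = "MoleculeTenant"
TENANT_TAG_VALUE = "true"


def build_sendcommand_policy(tag_key: str = TENANT_TAG_KEY,
                             tag_value: str = TENANT_TAG_VALUE,
                             documents: tuple[str, ...] = ("AWS-RunShellScript",)) -> dict:
    """Return an IAM policy dict allowing ssm:SendCommand only on tagged instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Instance side of SendCommand: restricted by the tenant tag.
                "Sid": "SendCommandToTaggedTenantInstances",
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"ssm:resourceTag/{tag_key}": tag_value}
                },
            },
            {
                # Document side of SendCommand: only the listed documents.
                # AWS-owned documents have an empty account field in their ARN.
                "Sid": "SendCommandWithApprovedDocuments",
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [f"arn:aws:ssm:*::document/{name}" for name in documents],
            },
            {
                # Reading command results (assumed "related actions").
                "Sid": "ReadCommandResults",
                "Effect": "Allow",
                "Action": [
                    "ssm:GetCommandInvocation",
                    "ssm:ListCommandInvocations",
                    "ssm:ListCommands",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_sendcommand_policy(), indent=2))
```

Running the script prints the JSON document; attaching it to the fleet/operator identity remains an owner step.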

Head SHA

822ba81cb62997a75d234b29c4e2d318c95f988c

Next step

Owner reviews the checklist, fills in the actual Loki endpoint/credentials, deploys Alloy, wires the Railway drain, attaches the IAM policy, and verifies end-to-end ingestion + SSM access. Once the infra changes are applied, this draft can be closed or converted to a docs-only merge.
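The end-to-end ingestion check can be spot-checked by logging a unique marker from a Railway service and then querying Loki's `query_range` HTTP endpoint for it. A minimal Python sketch follows. The `source="railway"` label and the marker string are hypothetical (substitute whatever labels the Alloy config really attaches), and the authenticated request to the Grafana Cloud Loki endpoint is deliberately omitted.

```python
from __future__ import annotations

import time
from urllib.parse import urlencode

# Hypothetical sketch for spot-checking Railway -> Loki ingestion.
# Assumptions (not from the PR): the Alloy config attaches a source="railway"
# label, and a unique marker string was logged from Railway beforehand.
# Fetching the URL with Grafana Cloud credentials is left out on purpose.


def build_query_range_url(base_url: str, logql: str, lookback_s: int = 900,
                          now_s: float | None = None) -> str:
    """Build a Loki query_range URL covering the last `lookback_s` seconds."""
    end_ns = int((time.time() if now_s is None else now_s) * 1e9)
    start_ns = end_ns - lookback_s * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": 100}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


def count_matching_lines(response: dict, marker: str) -> int:
    """Count log lines containing `marker` in a decoded query_range JSON response."""
    if response.get("status") != "success":
        raise ValueError(f"Loki query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data.get("resultType") != "streams":
        # Metric queries return "matrix"/"vector"; only log queries are handled here.
        raise ValueError(f"expected resultType 'streams', got {data.get('resultType')!r}")
    # Each stream's "values" is a list of [timestamp_ns_string, log_line] pairs.
    return sum(
        1
        for stream in data["result"]
        for _ts, line in stream["values"]
        if marker in line
    )


if __name__ == "__main__":
    marker = "obs-3214-probe"
    print(build_query_range_url("https://<loki-host>", f'{{source="railway"}} |= "{marker}"'))
```

A non-zero count on the fetched JSON suggests Railway logs are reaching the stack; a zero count after the lookback window is a cue to re-check the drain and Alloy wiring.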

agent-dev-a added 1 commit 2026-06-24 08:42:18 +00:00
draft(obs): Railway CP→Loki drain config + tenant SSM:SendCommand policy + wiring checklist (#3214)
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 6s
Block integration-tester contamination artifacts / Block staging-trigger / invalid manifest contamination (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 8s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
PR Diff Guard / PR diff guard (pull_request) Successful in 18s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 14s
CI / Detect changes (pull_request) Successful in 28s
E2E Chat / E2E Chat (pull_request) Successful in 6s
template-delivery-e2e / detect-changes (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 16s
sop-checklist / all-items-acked (pull_request) acked: 0/9 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +6 — body-unfilled: comprehensive-testing, local-postgres-e2
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Platform (Go) (pull_request) Successful in 5s
template-delivery-e2e / Template-asset delivery (fresh seo-agent — config+prompts via asset channel, seo-all via plugin reconcile) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 3s
security-review / approved (pull_request_target) Failing after 18s
gate-check-v3 / gate-check (pull_request_target) Failing after 25s
sop-checklist / all-items-acked (pull_request_target) Successful in 15s
CI / Canvas Deploy Status (pull_request) Successful in 1s
CI / all-required (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 42s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 2m9s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / Prune stale e2e DNS records (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Plugin Install Lifecycle (pull_request) Waiting to run
audit-force-merge / audit (pull_request_target) Has been skipped
822ba81cb6
DRAFT-ONLY — owner/infra execution required.

- infra/obs/railway-loki-drain.alloy: Alloy config to receive Railway logs
  and ship to Grafana Cloud Loki (ds P8E80F9AEF21F6940).
- infra/obs/tenant-ssm-sendcommand-policy.json: IAM policy granting the
  fleet/operator identity ssm:SendCommand on tagged tenant instances.
- infra/obs/WIRING_CHECKLIST.md: step-by-step owner checklist for
  credentials, deployment, Railway drain wiring, and SSM rollout.

Co-Authored-By: Claude <noreply@anthropic.com>
agent-dev-a closed this pull request 2026-06-24 11:50:42 +00:00
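Some required checks failed: qa-review / approved and security-review / approved. Required checks: Secret scan, Handlers Postgres Integration, qa-review / approved, E2E API Smoke Test, reserved-path-review, security-review / approved, CI / all-required.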

Pull request closed

Reference: molecule-ai/molecule-core#3221