obs gap: Gitea CI action logs are not shipped to Loki — debugging requires ssh+zstdcat on the operator #2272

Open · opened 2026-06-05 01:46:11 +00:00 by hongming (Owner) · 0 comments

## Problem

Diagnosing CI / staging-e2e failures currently requires `ssh root@operator` + `zstdcat /opt/molecule/gitea/actions_log/.../<task>.log.zst`, because **Gitea Actions per-task logs are not shipped to the obs system (Loki).** Verified 2026-06-05: the operator Vector config (`/etc/vector/vector.yaml`) has NO gitea/actions source, and the `service`/`container`/`source` label values in Loki cover operator docker containers, JSONL ops streams, and tenant platform logs, but **zero CI job output.** This violates the obs-first rule for the entire CI/e2e surface.

## Why it's not a trivial Vector file-source

Gitea stores action logs **zstd-compressed** (`<task_id>.log.zst`) once a task completes; in-progress tasks are plaintext (`<task_id>.log`). Both live in a nested layout, `/opt/molecule/gitea/actions_log/<owner>/<repo>/<NN>/<task_id>.log[.zst]`. A bare `type: file` tail can't read the `.zst`.
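A minimal sketch of the path handling a shipper would need, written in Python (an assumption; the issue doesn't fix a language for the shipper). It splits a path under `/opt/molecule/gitea/actions_log` into owner, repo, the `<NN>` directory, and task id, and flags `.zst` files as completed tasks. `TaskLogPath` and `parse_log_path` are hypothetical names, and the `<NN>` directory is passed through uninterpreted.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Root of Gitea's per-task action logs on the operator (from this issue).
ACTIONS_LOG_ROOT = PurePosixPath("/opt/molecule/gitea/actions_log")

# <task_id>.log while the task runs (plaintext); <task_id>.log.zst once it completes.
_NAME_RE = re.compile(r"^(?P<task_id>\d+)\.log(?P<zst>\.zst)?$")


@dataclass(frozen=True)
class TaskLogPath:
    owner: str
    repo: str
    bucket: str       # the <NN> directory, kept as-is
    task_id: int
    compressed: bool  # True: completed task, zstd-compressed


def parse_log_path(path: str) -> Optional[TaskLogPath]:
    """Split <root>/<owner>/<repo>/<NN>/<task_id>.log[.zst]; None if it doesn't fit."""
    try:
        rel = PurePosixPath(path).relative_to(ACTIONS_LOG_ROOT)
    except ValueError:
        return None  # not under the actions_log root
    if len(rel.parts) != 4:
        return None  # wrong depth for <owner>/<repo>/<NN>/<file>
    owner, repo, bucket, name = rel.parts
    m = _NAME_RE.match(name)
    if m is None:
        return None  # not a task log file
    return TaskLogPath(owner, repo, bucket, int(m.group("task_id")), m.group("zst") is not None)
```

For example, `parse_log_path("/opt/molecule/gitea/actions_log/molecule-ai/molecule-core/39/12345.log.zst")` returns `TaskLogPath(owner='molecule-ai', repo='molecule-core', bucket='39', task_id=12345, compressed=True)`; a watcher can use `compressed` to leave in-progress `.log` files alone.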

## Proposed fix (mirrors the existing operator JSONL→Vector pattern)

Add a small operator shipper (cron or inotify on `/opt/molecule/gitea/actions_log`) that, on task completion, decompresses the `.zst` and appends NDJSON lines to `/var/log/gitea-actions.jsonl` with fields `{repo, run_id, job, task_id, ts, line}` (`repo` and `task_id` come from the path; `run_id` and `job` from the `action_task`/`action_run` tables).

Then add a Vector `type: file` source for `/var/log/gitea-actions.jsonl` and a `remap` transform that stamps `service=gitea-ci`, `repo`, and `run_id` (mirroring `parse_disk_gc`/`parse_runner_health` in `/etc/vector/vector.yaml`), and append the transform to `sinks.loki.inputs`. CI logs then become queryable as `{service="gitea-ci", repo="molecule-core"} |= "<run_id>"` instead of via SSH+zstdcat.
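The NDJSON step could look roughly like the following Python sketch. It assumes the `.zst` has already been decompressed (for example by shelling out to `zstdcat`), that the caller has looked up `run_id` and `job` from `action_task`/`action_run`, and that each raw line starts with an ISO-8601 timestamp followed by a space; lines without one get `"ts": null`. `split_ts`, `build_records`, `append_ndjson`, and the 16 KiB `MAX_LINE_BYTES` cap are hypothetical.

```python
import json
import re
from typing import Dict, Iterable, Iterator, Optional, TextIO, Tuple

MAX_LINE_BYTES = 16 * 1024  # hypothetical cap on the "line" field, in UTF-8 bytes

# ASSUMPTION: raw lines look like "<ISO-8601 timestamp> <text>".
_TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})")


def split_ts(raw: str) -> Tuple[Optional[str], str]:
    """Return (timestamp, text); timestamp is None when the prefix isn't one."""
    head, sep, rest = raw.partition(" ")
    if sep and _TS_RE.fullmatch(head):
        return head, rest
    return None, raw


def build_records(repo: str, run_id: int, job: str, task_id: int,
                  lines: Iterable[str], max_line_bytes: int = MAX_LINE_BYTES) -> Iterator[Dict]:
    """Yield one {repo, run_id, job, task_id, ts, line} dict per decompressed log line."""
    for raw in lines:
        ts, text = split_ts(raw.rstrip("\r\n"))
        encoded = text.encode("utf-8")
        if len(encoded) > max_line_bytes:
            # Cut on a byte budget; errors="ignore" drops a split multi-byte character.
            text = encoded[:max_line_bytes].decode("utf-8", errors="ignore")
        yield {"repo": repo, "run_id": run_id, "job": job,
               "task_id": task_id, "ts": ts, "line": text}


def append_ndjson(records: Iterable[Dict], out: TextIO) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    for rec in records:
        out.write(json.dumps(rec, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Usage would be along the lines of opening `/var/log/gitea-actions.jsonl` in append mode and calling `append_ndjson(build_records(repo, run_id, job, task_id, text.splitlines()), out)`. Truncating on a byte boundary keeps the cap meaningful for multi-byte output, and decoding with `errors="ignore"` avoids emitting invalid UTF-8 when the cut lands mid-character.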

- Idempotent (track last-shipped task id); cap line size; only ship completed tasks (or tail the live `.log` for in-progress tasks and dedupe on completion).
- Assets live in operator-config (the obs/Vector home). Note that `/etc/vector/vector.yaml` is currently untracked (internal#242) and obs Vector snippets are copied by manual scp (not auto-synced).
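The idempotency requirement could be sketched like this, again in Python with hypothetical names (`load_shipped`, `save_shipped`, `pending_tasks`, `run_once`, and a JSON state file). It only considers completed `.log.zst` tasks and persists state after each shipped task. One deliberate deviation from "track last-shipped task id": it keeps a set of shipped ids rather than a single high-water mark, because a task with a lower id can finish after one with a higher id, and a high-water mark would then skip it. The `ship` callback (decompress + append NDJSON) is assumed, not shown.

```python
import json
import os
import re
from pathlib import Path
from typing import Callable, Iterable, List, Set, Tuple

# Completed tasks only: in-progress logs are plaintext <task_id>.log.
_DONE_RE = re.compile(r"^(\d+)\.log\.zst$")


def load_shipped(state_file: Path) -> Set[int]:
    """Return the task ids already shipped (empty set if there is no state yet)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text(encoding="utf-8")))


def save_shipped(state_file: Path, shipped: Set[int]) -> None:
    """Write the state via a temp file + os.replace so a crash can't leave a torn file."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(json.dumps(sorted(shipped)), encoding="utf-8")
    os.replace(tmp, state_file)


def pending_tasks(paths: Iterable[Path], shipped: Set[int]) -> List[Tuple[int, Path]]:
    """Completed, not-yet-shipped (task_id, path) pairs, lowest task id first."""
    todo = {}
    for p in paths:
        m = _DONE_RE.match(p.name)
        if m and int(m.group(1)) not in shipped:
            todo[int(m.group(1))] = p
    return sorted(todo.items())


def run_once(root: Path, state_file: Path, ship: Callable[[int, Path], None]) -> int:
    """One cron pass: ship every new completed task, recording each as it succeeds."""
    shipped = load_shipped(state_file)
    count = 0
    for task_id, path in pending_tasks(root.rglob("*.log.zst"), shipped):
        ship(task_id, path)  # hypothetical: decompress + append NDJSON; raises on failure
        shipped.add(task_id)
        save_shipped(state_file, shipped)  # persist per task so a crash loses little
        count += 1
    return count
```

Delivery in this sketch is at-least-once: if the process dies after `ship` has appended lines but before the state file is rewritten, that task is shipped again on the next pass, so Loki may see duplicate lines for it.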

## Related obs gaps (separate, for visibility)

- **Staging/prod CP logs** are on **Railway**, not Loki; a Railway→Loki bridge (or Railway log drain) is a separate item.
- **Workspace/agent container logs** are excluded by design (internal#107); RFC#640 is the in-flight reversal (owes a ToS check).

Surfaced while debugging the core#2261 reconciler live-e2e (had to SSH+zstdcat to read the HTTP-400 cause). Ref: obs recipe in `reference_obs_system_access`.

Reference: molecule-ai/molecule-core#2272