The scheduler died silently on 2026-04-14 14:21 UTC and stayed dead for
12+ hours. Platform restart didn't recover it. Root cause: tick() and
fireSchedule() goroutines have no panic recovery. A single bad row, bad
cron expression, DB blip, or transient panic anywhere in the chain
permanently kills the scheduler goroutine — and the only signal to an
operator is "no crons firing", which is invisible if you're not watching.
Specifically:
```go
func (s *Scheduler) Start(ctx context.Context) {
	for {
		select {
		case <-ticker.C:
			s.tick(ctx) // <- if this panics, the for-loop exits forever
		}
	}
}
```
And inside tick:
```go
go func(s2 scheduleRow) {
	defer wg.Done()
	defer func() { <-sem }()
	s.fireSchedule(ctx, s2) // <- panic here propagates up wg.Wait()
}(sched)
```
The fix is two `defer recover()` additions (sketched after the list):
1. In Start's tick wrapper — a panic in tick() (DB scan, cron parse,
row processing) is logged and the next tick fires normally.
2. In each fireSchedule goroutine — a single bad workspace can't take
the rest of the batch down.
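A minimal sketch of both guards. The `safeTick` name, the log wording, and
the `log`/`runtime/debug` imports are assumptions; tick() and fireSchedule()
themselves are unchanged, only the call sites get wrapped:

```go
// 1. Wrapper around tick() so a panic (DB scan, cron parse, row
//    processing) is logged with a stack trace and the ticker loop in
//    Start keeps running. safeTick is an illustrative name.
func (s *Scheduler) safeTick(ctx context.Context) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("scheduler: tick panicked: %v\n%s", r, debug.Stack())
		}
	}()
	s.tick(ctx)
}

// 2. Per-schedule guard inside tick: deferred last, so it runs first on
//    panic and recovers before the semaphore release and wg.Done fire.
go func(s2 scheduleRow) {
	defer wg.Done()
	defer func() { <-sem }()
	defer func() {
		if r := recover(); r != nil {
			log.Printf("scheduler: fireSchedule panicked: %v\n%s", r, debug.Stack())
		}
	}()
	s.fireSchedule(ctx, s2)
}(sched)
```

Because deferred functions run LIFO, the recover guard in (2) catches the
panic before wg.Wait() ever sees it, so the rest of the batch completes.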
Plus a liveness watchdog:
- Scheduler now records `lastTickAt` after each successful tick.
- New methods `LastTickAt()` and `Healthy()` (returns true if the last
  tick was within 2× pollInterval, i.e. 60s).
- `lastTickAt` is initialised in Start, so Healthy() returns true on a
  fresh process before the first tick completes (sketch below).
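A sketch of the watchdog state, assuming a mutex-guarded field and the
`time`/`sync` imports; only LastTickAt(), Healthy(), and the 2× pollInterval
threshold come from the actual change:

```go
type Scheduler struct {
	pollInterval time.Duration // 30s here, so the Healthy() threshold is 60s

	mu         sync.Mutex // guards lastTickAt (field/lock names assumed)
	lastTickAt time.Time
	// ... existing fields elided
}

// LastTickAt returns the completion time of the most recent successful tick.
func (s *Scheduler) LastTickAt() time.Time {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.lastTickAt
}

// Healthy reports whether a tick completed within 2× pollInterval.
// Start seeds lastTickAt before the loop, so a fresh process is healthy.
func (s *Scheduler) Healthy() bool {
	return time.Since(s.LastTickAt()) <= 2*s.pollInterval
}
```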
Endpoint plumbing for /admin/scheduler/health is a follow-up: it needs
the scheduler instance threaded through router.Setup(). Documented in
#85.
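For reference, a minimal sketch of what that follow-up handler could look
like, assuming stdlib net/http; the handler name and wiring are
hypothetical, and the real change goes through router.Setup():

```go
// Hypothetical handler shape, not the actual router.Setup() change.
// Assumes the Healthy()/LastTickAt() methods above.
func schedulerHealthHandler(s *Scheduler) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !s.Healthy() {
			http.Error(w, "scheduler stalled; last tick "+
				s.LastTickAt().UTC().Format(time.RFC3339),
				http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	}
}
```

Wiring would then be something like
`mux.Handle("/admin/scheduler/health", schedulerHealthHandler(sched))`
once the instance is threaded through.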
Closes the silent-outage failure mode of #85. The other proposed
fixes (force-kill on /restart hang, active_tasks watchdog) are
separate concerns tracked in #85's comments.