Provisioning silently seeds claude-code config.yaml when the requested runtime's workspace-template isn't cached yet #2027

Closed
opened 2026-06-01 02:41:45 +00:00 by devops-engineer · 0 comments
Member

Summary

When a workspace is provisioned for a runtime whose workspace-template is not yet present in the tenant's template cache, config seeding silently falls back to the claude-code-default config.yaml instead of the requested runtime's template. The workspace then boots with the wrong runtime config (no runtime-specific runtime_config, no persona/system-prompt.md) and the entrypoint preflight reports it as not-configured (-32603 'agent not configured'), even though the runtime image and injected env are correct.

Where it bit us

The molecule-adk-demo org (Google for Startups hackathon demo) — 4 google-adk agents (Marketing PM, Content Writer, Reviewer, Research Agent). All chatted back an identical canned greeting ("Hello! How can I help you today?") to every message instead of answering, because the ADK LlmAgent instruction was empty and the runtime was mis-declared.

Root cause (confirmed)

  • Agents provisioned 2026-05-30 23:46Z. Their /configs/config.yaml is the full Claude Code template (runtime: claude-code, anthropic/xiaomi/minimax/zai/kimi providers), and system-prompt.md is absent.
  • The tenant template cache (/tmp/molecule-template-cache, /workspace-configs-templates) was only refreshed 2026-05-31 23:59Zafter provisioning — and only then gained google-adk/config.yaml.
  • So at provision time the google-adk template was missing from cache and seeding fell back to claude-code-default. Config is seeded once at provision and never re-synced, so the already-provisioned workspaces stayed broken after the later cache refresh.
  • The runtime image (google-adk adapter), env (RUNTIME=google-adk, MODEL=vertex:gemini-2.5-pro, keyless Vertex ADC) were all correct — only the seeded config.yaml/system-prompt.md were wrong. Vertex→Gemini-2.5-pro itself verified working via the agent's own ADC.

Impact

Silent: a workspace looks 'online' (green) but every turn returns a generic non-answer / not-configured. No loud error at provision. Affects any runtime added to the manifest whose template hasn't propagated into a given tenant's cache before a workspace using it is provisioned.

Proposed fix

  1. Fail loud, don't fall back: if the requested runtime's template is not in the tenant cache at provision time, block/queue the provision (or trigger a synchronous cache refresh for that runtime) instead of silently seeding claude-code-default.
  2. Consider re-seeding/repairing config.yaml + prompt files for existing workspaces when the matching template later lands (or expose an admin 're-seed config from template' action).
  3. Preflight should treat a runtime/config-declared-runtime mismatch (adapter google-adk vs config claude-code) as an error surfaced to the canvas, not just a WARN that still boots not-configured.

Mitigation applied

The 4 demo agents were manually repaired (correct google-adk config.yaml + per-agent system-prompt.md written to /configs, container restarted); all 4 now answer correctly via ADK→Vertex→Gemini 2.5 Pro. /configs is bind-mounted so the fix persists across restart.

Filed from CTO ops session 2026-05-31.

## Summary When a workspace is provisioned for a runtime whose workspace-template is **not yet present in the tenant's template cache**, config seeding silently **falls back to the `claude-code-default` `config.yaml`** instead of the requested runtime's template. The workspace then boots with the wrong runtime config (no runtime-specific `runtime_config`, no persona/`system-prompt.md`) and the entrypoint preflight reports it as *not-configured* (`-32603 'agent not configured'`), even though the runtime **image** and injected **env** are correct. ## Where it bit us The `molecule-adk-demo` org (Google for Startups hackathon demo) — 4 `google-adk` agents (Marketing PM, Content Writer, Reviewer, Research Agent). All chatted back an identical canned greeting ("Hello! How can I help you today?") to every message instead of answering, because the ADK `LlmAgent` instruction was empty and the runtime was mis-declared. ## Root cause (confirmed) - Agents provisioned `2026-05-30 23:46Z`. Their `/configs/config.yaml` is the full **Claude Code** template (`runtime: claude-code`, anthropic/xiaomi/minimax/zai/kimi providers), and `system-prompt.md` is absent. - The tenant template cache (`/tmp/molecule-template-cache`, `/workspace-configs-templates`) was only refreshed `2026-05-31 23:59Z` — **after** provisioning — and only then gained `google-adk/config.yaml`. - So at provision time the google-adk template was missing from cache and seeding fell back to `claude-code-default`. Config is seeded once at provision and never re-synced, so the already-provisioned workspaces stayed broken after the later cache refresh. - The runtime **image** (google-adk adapter), **env** (`RUNTIME=google-adk`, `MODEL=vertex:gemini-2.5-pro`, keyless Vertex ADC) were all correct — only the seeded `config.yaml`/`system-prompt.md` were wrong. Vertex→Gemini-2.5-pro itself verified working via the agent's own ADC. ## Impact Silent: a workspace looks 'online' (green) but every turn returns a generic non-answer / not-configured. No loud error at provision. Affects any runtime added to the manifest whose template hasn't propagated into a given tenant's cache before a workspace using it is provisioned. ## Proposed fix 1. **Fail loud, don't fall back**: if the requested runtime's template is not in the tenant cache at provision time, block/queue the provision (or trigger a synchronous cache refresh for that runtime) instead of silently seeding `claude-code-default`. 2. Consider re-seeding/repairing `config.yaml` + prompt files for existing workspaces when the matching template later lands (or expose an admin 're-seed config from template' action). 3. Preflight should treat a runtime/config-declared-runtime mismatch (adapter `google-adk` vs config `claude-code`) as an error surfaced to the canvas, not just a WARN that still boots not-configured. ## Mitigation applied The 4 demo agents were manually repaired (correct `google-adk` `config.yaml` + per-agent `system-prompt.md` written to `/configs`, container restarted); all 4 now answer correctly via ADK→Vertex→Gemini 2.5 Pro. `/configs` is bind-mounted so the fix persists across restart. _Filed from CTO ops session 2026-05-31._
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2027