From bc1d530f715d6daa3a8457f1f0346b5a410fbb77 Mon Sep 17 00:00:00 2001
From: core-devops <core-devops@moleculesai.app>
Date: Sun, 21 Jun 2026 02:27:58 +0000
Subject: [PATCH 1/6] docs(rfc): platform-metered image generation
 (entitlement, key injection, cap, attribution)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Draft RFC for CTO review — the billing-sensitive half of molecule-ai-plugin-image-gen.
Applies the post-opus-cost-leak guardrails (attribution + fail-closed cap + priced models)
to a default-platform-metered, BYOK-override image-gen plugin (OpenAI GPT Image 2 + Gemini Nano Banana).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 docs/design/rfc-image-gen-platform-metered.md

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
new file mode 100644
index 000000000..489e6fb3a
--- /dev/null
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -0,0 +1,95 @@
+# RFC: Platform-metered image generation — entitlement, key injection, cap, attribution
+
+- **Status:** Draft — for CTO review (do not build until approved)
+- **Date:** 2026-06-20
+- **Scope:** ONLY the platform-metered + cost-control machinery for `molecule-ai-plugin-image-gen`. The plugin's MCP server, provider adapters, tool surface, and output handling are a straightforward, separately-tracked build — not this RFC. This RFC is the billing-sensitive cross-repo half.
+
+## 1. Context
+
+We're adding `molecule-ai-plugin-image-gen` — a multi-vendor image-generation MCP plugin (v1: **OpenAI GPT Image 2** + **Gemini "Nano Banana" / 2.5 Flash Image**, vendor-pluggable). Both run **platform-metered by default** (the platform already holds OpenAI creds in the Infisical SSOT and `GCP_SERVICE_ACCOUNT_JSON` for Gemini); **BYOK is an optional override**.
+
+Platform-metered = the platform pays the vendor bill. The **2026-06-10 opus cost-leak** (the SEO agent drained opus-4-8 via the CP proxy — invisible because the model was missing from the price catalog → fail-open $0 → unattributed) is the cautionary tale: **any platform-paid path must be attributed + capped + fail-closed from day one.** This RFC specifies that machinery for image-gen.
+
+## 2. Goals / Non-goals
+
+**Goals**
+- Platform-metered image gen works **out of the box for every workspace** (default-ON), bounded.
+- Every platform-paid generation is **attributed per workspace** and counted against a **per-workspace cap**.
+- Cap exhaustion **fails closed** (no silent overspend); recovery = raise cap or add a BYOK key.
+- **BYOK** override bypasses platform metering (user's own key + cost, uncapped).
+- Vendor keys **never embedded** in the plugin; injected by the platform, sourced from Infisical SSOT.
+
+**Non-goals**
+- The plugin's server/providers/tools/output (separate build).
+- A general LLM-metering overhaul — **reuse CP#752 attribution**, extend with an image dimension.
+- Marketplace monetization/billing (future).
+
+## 3. Design
+
+### 3.1 Credential resolution (at the mcp-image-gen server, per provider)
+1. **BYOK** — workspace secret `OPENAI_API_KEY` / `GEMINI_API_KEY` present → use it (uncapped, user's cost).
+2. **else platform-metered** — use the platform-injected key, within the per-workspace cap.
+
+There is **no "unavailable" state** — platform-metered is always on until the cap is hit. `list_image_providers()` reports per vendor `mode: byok | platform` + `budget_remaining`.
+
+### 3.2 Key injection (platform → workspace)
+Mirror the concierge org-key pattern (`conciergePlatformMCPEnv`): core/CP injects platform vendor creds into the workspace container env when `image-gen` is installed — the plugin's `settings-fragment` **references** these env vars, never embeds keys.
+- **OpenAI**: `OPENAI_API_KEY` (platform), sourced from **Infisical SSOT** (NOT the bootstrap cache, NOT hardcoded).
+- **Gemini**: `GCP_SERVICE_ACCOUNT_JSON` + `GCP_PROJECT_ID` (platform).
+- A marker (`IMAGE_GEN_PLATFORM=1`) so the server knows it's on the metered path.
+- Injection must happen at provision/install **and survive restart/re-provision** (per the internal#33 identity-on-restart lesson — same delivery-durability trap).
+
+### 3.3 Per-workspace cap (fail-closed)
+- A per-workspace **image budget** (default value + unit TBD — see open questions), stored CP-side (e.g., a small `image_budgets` table or a column).
+- **Default-ON** for all workspaces; org/admin can raise/lower/disable (disable = cap 0).
+- **Enforcement**: before each *platform-metered* generation, the server checks remaining budget (CP endpoint, or injected budget + server counter synced to CP). Over budget → **fail-closed** ("image budget reached — raise the cap or add your own API key"). **BYOK calls skip the check.**
+- **Decrement**: each successful platform generation reports usage → CP decrements + records attribution.
+
+### 3.4 Attribution / metering (reuse CP#752)
+- Every platform generation emits: `{workspace_id, org_id, vendor, model, image_count, size, est_cost, ts}`.
+- Stored in the CP cost-attribution store — image-gen is a new **`service` dimension** alongside LLM.
+- `est_cost` from a per-vendor/model/size **image price table**. **CRITICAL:** the table MUST include every image model — an uncosted model fails open to $0 and goes invisible (exactly the opus-leak failure). New image models are blocked from the platform path until priced.
+- Dashboard: image spend per workspace/org (CP#752 WS3).
+
+### 3.5 Entitlement
+- **Default-ON within the cap** — no opt-in step (per CTO direction: platform-metered by default). The **cap is the control**; org/admin adjusts it. Optional org-level hard toggle is an open question.
+
+### 3.6 Where each piece lives
+| Piece | Repo |
+|---|---|
+| cred resolution, cap pre-check, usage reporting, fail-closed | plugin server (`@molecule-ai/mcp-image-gen`) |
+| key injection (Infisical-sourced, restart-durable) | core (mirror `conciergePlatformMCPEnv`) |
+| cap store + endpoints, attribution store + image price table + dashboard | CP (extend CP#752) |
+
+## 4. Security
+- Keys never in the plugin repo; injected from Infisical SSOT; plugin references env only.
+- GCP SA scoped to Vertex/Gemini image (least privilege).
+- BYOK keys = encrypted per-workspace secrets; never logged.
+- Cap is fail-closed (the cost-leak guardrail).
+- Uncosted image model ⇒ blocked from the platform path (no fail-open $0).
+
+## 5. Cost-leak program mapping
+- **WS1 attribution** — per-workspace usage events ✓
+- **WS2 fail-closed** — per-workspace hard cap ✓
+- **WS3 dashboard** — image spend visibility ✓
+- **WS5 caching** — dedup identical prompt+params (future)
+
+## 6. Rollout
+1. Image price table + attribution schema (image `service` dimension).
+2. Cap store + CP endpoints.
+3. Key injection (core, Infisical-sourced, restart-durable).
+4. Plugin server: cred resolution + cap pre-check + usage reporting.
+5. Default cap value + org-adjust surface.
+6. Staging e2e: platform gen → attributed + decrements; cap-exceeded fail-closes; BYOK bypasses; restart preserves injection.
+
+## 7. Open questions (for your review)
+1. **Default cap value + unit** — images/day, $/day, or a credits-equiv? Starting number?
+2. **Org-level hard toggle** in addition to the cap (some orgs may want platform image-gen fully off, separate from "cap 0")?
+3. **Vertex AI vs Gemini Developer API** on project `gen-lang-client-0607853535` (auth + endpoint differ) — confirm which the SA is wired for.
+4. **Cap enforcement model** — live CP check per call (latency, accuracy) vs injected-budget + periodic sync (low latency, mild staleness)?
+5. **Image price source** — static maintained table vs a fetched catalog; who maintains it?
+
+## 8. Alternatives considered
+- **BYOK-only** — simplest, but worse UX and ignores existing platform creds. Rejected (CTO wants platform-metered default).
+- **Attribution only, no cap** — rejected (the opus cost-leak class).
+- **Route images through the existing LLM proxy** — images aren't chat-completions; a dedicated image service + attribution dimension is cleaner than overloading the LLM proxy.
-- 
2.52.0


From 1683a4131f9fd6983f40100bcaef3a5e819797e7 Mon Sep 17 00:00:00 2001
From: core-devops <core-devops@moleculesai.app>
Date: Sun, 21 Jun 2026 02:34:17 +0000
Subject: [PATCH 2/6] =?UTF-8?q?docs(rfc):=20rev=202=20=E2=80=94=20re-scope?=
 =?UTF-8?q?=20image=20gen=20to=20proxy-fronted=20+=20credits-billed,=20no?=
 =?UTF-8?q?=20caps,=20thin-adaptor=20plugin?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per CTO: image gen is uncapped, consumes platform credits via the existing platform proxy/billing;
the plugin is just an adaptor to the proxy. Drops per-plugin caps/key-injection/attribution.
Cost-leak guard moves to the proxy price-catalog (unpriced image model = rejected, no fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 130 ++++++++----------
 1 file changed, 60 insertions(+), 70 deletions(-)

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
index 489e6fb3a..ea313183d 100644
--- a/docs/design/rfc-image-gen-platform-metered.md
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -1,95 +1,85 @@
-# RFC: Platform-metered image generation — entitlement, key injection, cap, attribution
+# RFC: Image generation via the platform proxy (credits-billed; plugin is a thin adaptor)
 
 - **Status:** Draft — for CTO review (do not build until approved)
-- **Date:** 2026-06-20
-- **Scope:** ONLY the platform-metered + cost-control machinery for `molecule-ai-plugin-image-gen`. The plugin's MCP server, provider adapters, tool surface, and output handling are a straightforward, separately-tracked build — not this RFC. This RFC is the billing-sensitive cross-repo half.
+- **Date:** 2026-06-20 (rev 2 — re-scoped per CTO: no caps, credits-billed, proxy-fronted)
+- **Scope:** how image generation is billed + routed. The real work is **extending the platform proxy** with image support; the plugin is a thin adaptor.
 
-## 1. Context
+## 1. Model (the corrected architecture)
 
-We're adding `molecule-ai-plugin-image-gen` — a multi-vendor image-generation MCP plugin (v1: **OpenAI GPT Image 2** + **Gemini "Nano Banana" / 2.5 Flash Image**, vendor-pluggable). Both run **platform-metered by default** (the platform already holds OpenAI creds in the Infisical SSOT and `GCP_SERVICE_ACCOUNT_JSON` for Gemini); **BYOK is an optional override**.
+Image generation is **uncapped** and **consumes the org's platform credits**, exactly like platform-managed LLM usage. It rides the **existing platform proxy** (the CP proxy that already fronts platform-managed model calls and is wired into billing/credits). The proxy is the single source of truth for billed vendor calls.
 
-Platform-metered = the platform pays the vendor bill. The **2026-06-10 opus cost-leak** (the SEO agent drained opus-4-8 via the CP proxy — invisible because the model was missing from the price catalog → fail-open $0 → unattributed) is the cautionary tale: **any platform-paid path must be attributed + capped + fail-closed from day one.** This RFC specifies that machinery for image-gen.
+```
+agent → mcp-image-gen (thin MCP adaptor) → platform proxy (CP) → {OpenAI | Gemini}
+                                                  │
+                                                  └─ debits org credits via the billing system
+```
+
+- **The proxy** holds the vendor keys (OpenAI from Infisical SSOT, Gemini `GCP_SERVICE_ACCOUNT_JSON`), routes by vendor, prices the call, and **debits credits**. Out of credits / over overage-cap → the proxy rejects (402), the same way every other platform-managed call already behaves. **No per-image cap** — the credit balance + `overage_cap_credits` is the limit.
+- **The plugin** (`molecule-ai-plugin-image-gen`) is a **thin adaptor**: exposes `generate_image` / `edit_image` MCP tools, forwards them to the proxy's image endpoint using the workspace's already-injected platform auth (same base-url + token the platform injects for platform-managed LLM), and writes the returned image to `/workspace`. It holds **no keys, no metering, no cap logic.**
+
+This is deliberately NOT per-plugin machinery — caps, key injection, attribution all live in the proxy/billing system that already exists, so image-gen is just a new billed route, not a new billing subsystem.
 
 ## 2. Goals / Non-goals
 
 **Goals**
-- Platform-metered image gen works **out of the box for every workspace** (default-ON), bounded.
-- Every platform-paid generation is **attributed per workspace** and counted against a **per-workspace cap**.
-- Cap exhaustion **fails closed** (no silent overspend); recovery = raise cap or add a BYOK key.
-- **BYOK** override bypasses platform metering (user's own key + cost, uncapped).
-- Vendor keys **never embedded** in the plugin; injected by the platform, sourced from Infisical SSOT.
+- Image gen works for any workspace out of the box, billed to platform credits via the proxy.
+- One central place (the proxy) owns vendor keys, routing, pricing, and credit-debit.
+- The plugin is a trivial adaptor — vendor-pluggable changes happen in the proxy.
+- No new cap/attribution subsystem — reuse credits + the proxy's existing billing path.
 
 **Non-goals**
-- The plugin's server/providers/tools/output (separate build).
-- A general LLM-metering overhaul — **reuse CP#752 attribution**, extend with an image dimension.
-- Marketplace monetization/billing (future).
+- Per-workspace image caps (explicitly dropped — credits are the limit).
+- Per-plugin key injection / per-plugin metering (the proxy owns these).
+- The plugin's tool schema / output handling (trivial; separately tracked).
 
 ## 3. Design
 
-### 3.1 Credential resolution (at the mcp-image-gen server, per provider)
-1. **BYOK** — workspace secret `OPENAI_API_KEY` / `GEMINI_API_KEY` present → use it (uncapped, user's cost).
-2. **else platform-metered** — use the platform-injected key, within the per-workspace cap.
+### 3.1 Proxy: add image routes
+- Add image endpoints to the platform proxy (e.g. `POST /v1/images/generations`, `/v1/images/edits`, or a unified `/v1/images` with a `vendor`/`model` param).
+- **Vendor routing**: `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini 2.5 Flash Image ("Nano Banana") via the platform GCP SA. Adding a vendor = a new route handler in the proxy.
+- **Vendor keys**: held by the proxy, sourced from Infisical SSOT (OpenAI key; Gemini SA). Never leave the proxy.
+- **Auth from the plugin**: the workspace's existing platform token (the proxy already authenticates platform-managed calls per workspace/org — reuse it to identify who to bill).
 
-There is **no "unavailable" state** — platform-metered is always on until the cap is hit. `list_image_providers()` reports per vendor `mode: byok | platform` + `budget_remaining`.
+### 3.2 Billing: credits debit (the cost-leak guard lives here)
+- Each image call is **priced** (per vendor/model/size) and **debited from org credits** through the existing billing system (`credits_balance` → `overage_used_credits` up to `overage_cap_credits`).
+- **CRITICAL (opus-cost-leak lesson):** image models MUST be in the price catalog. An unpriced model is **rejected**, never passed through at $0 — that fail-open (opus-4-8 missing from `llm_price_catalog`) is exactly what made the June 10 leak invisible. Extend the catalog with image SKUs; block unpriced models from the platform route.
+- **Limit = credits**, not a cap: when an org is out of credits / over `overage_cap_credits`, the proxy returns 402 and the plugin surfaces "out of image credits — top up." (Same UX as other platform-managed exhaustion.)
+- Attribution comes for free — the proxy already records per-workspace/org spend; image becomes a `service`/`sku` dimension on the existing ledger.
 
-### 3.2 Key injection (platform → workspace)
-Mirror the concierge org-key pattern (`conciergePlatformMCPEnv`): core/CP injects platform vendor creds into the workspace container env when `image-gen` is installed — the plugin's `settings-fragment` **references** these env vars, never embeds keys.
-- **OpenAI**: `OPENAI_API_KEY` (platform), sourced from **Infisical SSOT** (NOT the bootstrap cache, NOT hardcoded).
-- **Gemini**: `GCP_SERVICE_ACCOUNT_JSON` + `GCP_PROJECT_ID` (platform).
-- A marker (`IMAGE_GEN_PLATFORM=1`) so the server knows it's on the metered path.
-- Injection must happen at provision/install **and survive restart/re-provision** (per the internal#33 identity-on-restart lesson — same delivery-durability trap).
+### 3.3 Plugin (thin adaptor)
+- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` (npx `@molecule-ai/mcp-image-gen`).
+- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:path|url, vendor?, …)`, `list_image_models()`.
+- Each tool → `POST {PLATFORM_PROXY_BASE}/v1/images/...` with the platform auth → on success write `/workspace/.molecule/images/<ts>-<id>.png`, return the path. On 402 → surface "out of credits."
+- No keys, no cap, no metering in the plugin.
 
-### 3.3 Per-workspace cap (fail-closed)
-- A per-workspace **image budget** (default value + unit TBD — see open questions), stored CP-side (e.g., a small `image_budgets` table or a column).
-- **Default-ON** for all workspaces; org/admin can raise/lower/disable (disable = cap 0).
-- **Enforcement**: before each *platform-metered* generation, the server checks remaining budget (CP endpoint, or injected budget + server counter synced to CP). Over budget → **fail-closed** ("image budget reached — raise the cap or add your own API key"). **BYOK calls skip the check.**
-- **Decrement**: each successful platform generation reports usage → CP decrements + records attribution.
+### 3.4 BYOK (optional, likely defer)
+Under the proxy model, BYOK = the proxy accepts a caller-supplied vendor key and skips the credit-debit for that call (own cost). Clean to add later; **propose deferring from v1** unless you want it now — the proxy/credits path is the product default and BYOK adds a passthrough-auth path. (Open question.)
 
-### 3.4 Attribution / metering (reuse CP#752)
-- Every platform generation emits: `{workspace_id, org_id, vendor, model, image_count, size, est_cost, ts}`.
-- Stored in the CP cost-attribution store — image-gen is a new **`service` dimension** alongside LLM.
-- `est_cost` from a per-vendor/model/size **image price table**. **CRITICAL:** the table MUST include every image model — an uncosted model fails open to $0 and goes invisible (exactly the opus-leak failure). New image models are blocked from the platform path until priced.
-- Dashboard: image spend per workspace/org (CP#752 WS3).
+## 4. Where each piece lives
+| Piece | Repo | Notes |
+|---|---|---|
+| image routes, vendor routing, vendor keys, pricing, credit-debit | **CP / the platform proxy** | the bulk of the work; extends existing billing |
+| image SKUs in the price catalog | CP | unpriced = rejected (no fail-open) |
+| thin MCP adaptor + tools + output | `molecule-ai-plugin-image-gen` | trivial |
 
-### 3.5 Entitlement
-- **Default-ON within the cap** — no opt-in step (per CTO direction: platform-metered by default). The **cap is the control**; org/admin adjusts it. Optional org-level hard toggle is an open question.
-
-### 3.6 Where each piece lives
-| Piece | Repo |
-|---|---|
-| cred resolution, cap pre-check, usage reporting, fail-closed | plugin server (`@molecule-ai/mcp-image-gen`) |
-| key injection (Infisical-sourced, restart-durable) | core (mirror `conciergePlatformMCPEnv`) |
-| cap store + endpoints, attribution store + image price table + dashboard | CP (extend CP#752) |
-
-## 4. Security
-- Keys never in the plugin repo; injected from Infisical SSOT; plugin references env only.
-- GCP SA scoped to Vertex/Gemini image (least privilege).
-- BYOK keys = encrypted per-workspace secrets; never logged.
-- Cap is fail-closed (the cost-leak guardrail).
-- Uncosted image model ⇒ blocked from the platform path (no fail-open $0).
-
-## 5. Cost-leak program mapping
-- **WS1 attribution** — per-workspace usage events ✓
-- **WS2 fail-closed** — per-workspace hard cap ✓
-- **WS3 dashboard** — image spend visibility ✓
-- **WS5 caching** — dedup identical prompt+params (future)
+## 5. Security / cost-safety
+- Vendor keys live only in the proxy (Infisical-sourced); never in the plugin or workspace env.
+- Billed via credits → an org can only spend what it has (+ overage cap) — intrinsic limit, no runaway.
+- Unpriced image model ⇒ rejected at the proxy (the explicit anti-opus-leak rule).
 
 ## 6. Rollout
-1. Image price table + attribution schema (image `service` dimension).
-2. Cap store + CP endpoints.
-3. Key injection (core, Infisical-sourced, restart-durable).
-4. Plugin server: cred resolution + cap pre-check + usage reporting.
-5. Default cap value + org-adjust surface.
-6. Staging e2e: platform gen → attributed + decrements; cap-exceeded fail-closes; BYOK bypasses; restart preserves injection.
+1. Image SKUs + pricing in the catalog (block unpriced).
+2. Proxy image routes + OpenAI + Gemini vendor handlers + credit-debit.
+3. `@molecule-ai/mcp-image-gen` adaptor + plugin repo + register.
+4. Staging e2e: platform image gen debits credits + writes to /workspace; out-of-credits → 402 surfaced; (if BYOK) bypass works.
 
 ## 7. Open questions (for your review)
-1. **Default cap value + unit** — images/day, $/day, or a credits-equiv? Starting number?
-2. **Org-level hard toggle** in addition to the cap (some orgs may want platform image-gen fully off, separate from "cap 0")?
-3. **Vertex AI vs Gemini Developer API** on project `gen-lang-client-0607853535` (auth + endpoint differ) — confirm which the SA is wired for.
-4. **Cap enforcement model** — live CP check per call (latency, accuracy) vs injected-budget + periodic sync (low latency, mild staleness)?
-5. **Image price source** — static maintained table vs a fetched catalog; who maintains it?
+1. **Does the platform proxy already have a non-chat-completions extension point**, or do image routes need new plumbing? (It's LLM/chat-shaped today; images are a different request/response.)
+2. **Vertex AI vs Gemini Developer API** on `gen-lang-client-0607853535` (the proxy's Gemini handler auth/endpoint).
+3. **Image pricing** — where do the per-vendor/model/size SKUs come from (static maintained table vs fetched)?
+4. **BYOK in v1, or defer?** (Proxy-passthrough-key vs not.)
+5. **Output** — workspace-file + path (current plan) still right, or also return an inline/preview form?
 
 ## 8. Alternatives considered
-- **BYOK-only** — simplest, but worse UX and ignores existing platform creds. Rejected (CTO wants platform-metered default).
-- **Attribution only, no cap** — rejected (the opus cost-leak class).
-- **Route images through the existing LLM proxy** — images aren't chat-completions; a dedicated image service + attribution dimension is cleaner than overloading the LLM proxy.
+- **Per-plugin keys + cap + attribution** (rev 1 of this RFC) — rejected: rebuilds billing the proxy already does; caps are unnecessary when credits are the limit.
+- **Plugin calls vendors directly (no proxy)** — rejected: scatters keys + billing across workspaces; the proxy centralizes both.
-- 
2.52.0


From 2ba7da3155084f09ca0447fb132fa0e4d22a3120 Mon Sep 17 00:00:00 2001
From: core-devops <core-devops@moleculesai.app>
Date: Sun, 21 Jun 2026 02:49:59 +0000
Subject: [PATCH 3/6] =?UTF-8?q?docs(rfc):=20rev=203=20=E2=80=94=20fold=20i?=
 =?UTF-8?q?n=20CTO=20answers=20(price=20x1.5,=20defer=20BYOK,=20return-URL?=
 =?UTF-8?q?=20output)=20+=20proxy/Vertex=20findings?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Q1: the proxy is per-wire-format, so images need a new ProxyImages handler (same pattern as /v1/responses); image billing is count/size-based rather than token-based, and the proxy must store images and return a URL.
Q2: Vertex is NOT available (the SA gets 404 on aiplatform; the project is an AI-Studio project) -> recommend a Gemini Developer API key.
Q5: the tool returns a download URL; the agent decides where to put the image; no forced /workspace write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 120 +++++++++---------
 1 file changed, 58 insertions(+), 62 deletions(-)
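
Reviewer note (not part of the commit): a rough Go sketch of the unified request body from §3.1 and the `vendor price × 1.5` rule from §3.2. The struct, function names, micro-dollar unit, round-up behaviour, and the default of one image are assumptions for illustration; the field list, the two vendor names, and the 1.5 multiplier come from the RFC.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ImageRequest mirrors the unified body {prompt, vendor, model, size, n, image?}.
type ImageRequest struct {
	Prompt string `json:"prompt"`
	Vendor string `json:"vendor"`
	Model  string `json:"model"`
	Size   string `json:"size"`
	N      int    `json:"n"`
	Image  string `json:"image,omitempty"` // only set for edits
}

// ParseImageRequest decodes and validates a body before any vendor call.
func ParseImageRequest(body []byte) (ImageRequest, error) {
	var req ImageRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return ImageRequest{}, fmt.Errorf("invalid body: %w", err)
	}
	if req.Prompt == "" {
		return ImageRequest{}, errors.New("prompt is required")
	}
	switch req.Vendor {
	case "openai", "gemini":
		// Known vendors; a new vendor would add a branch here.
	default:
		return ImageRequest{}, fmt.Errorf("unsupported vendor %q", req.Vendor)
	}
	if req.N <= 0 {
		req.N = 1 // assumed default when n is omitted
	}
	return req, nil
}

// EstCostMicros returns vendor price × 1.5 for n images, in micro-dollars.
// Integer math avoids float drift; the result rounds up so the fee is never
// undercharged. Inputs are assumed to be non-negative.
func EstCostMicros(vendorMicrosPerImage int64, n int) int64 {
	total := vendorMicrosPerImage * int64(n)
	return (total*3 + 1) / 2
}
```

For example, with a made-up vendor price of 40000 micro-dollars per image, two images would be debited as 120000 micro-dollars before conversion to credits.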

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
index ea313183d..10ae0513b 100644
--- a/docs/design/rfc-image-gen-platform-metered.md
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -1,85 +1,81 @@
-# RFC: Image generation via the platform proxy (credits-billed; plugin is a thin adaptor)
+# RFC: Image generation via the platform proxy (credits-billed, no caps; plugin = thin adaptor)
 
 - **Status:** Draft — for CTO review (do not build until approved)
-- **Date:** 2026-06-20 (rev 2 — re-scoped per CTO: no caps, credits-billed, proxy-fronted)
-- **Scope:** how image generation is billed + routed. The real work is **extending the platform proxy** with image support; the plugin is a thin adaptor.
+- **Date:** 2026-06-20 (rev 3 — folds in CTO answers + proxy/Vertex investigation)
+- **Scope:** how image generation is routed + billed. The work is **a new image handler on the platform proxy**; the plugin is a thin adaptor.
 
-## 1. Model (the corrected architecture)
+## 1. Model
 
-Image generation is **uncapped** and **consumes the org's platform credits**, exactly like platform-managed LLM usage. It rides the **existing platform proxy** (the CP proxy that already fronts platform-managed model calls and is wired into billing/credits). The proxy is the single source of truth for billed vendor calls.
+Image generation is **uncapped** and **consumes the org's platform credits**, like platform-managed LLM. It rides the **existing CP LLM proxy** (`internal/handlers/llm_proxy.go` + `internal/credits` billing + the price catalog + fail-closed + attribution — all already built and tested).
 
 ```
-agent → mcp-image-gen (thin MCP adaptor) → platform proxy (CP) → {OpenAI | Gemini}
-                                                  │
-                                                  └─ debits org credits via the billing system
+agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | Gemini}
+                                              │         │
+                                              │         └─ stores image → returns a download URL
+                                              └─ prices (vendor × 1.5) + debits org credits
 ```
 
-- **The proxy** holds the vendor keys (OpenAI from Infisical SSOT, Gemini `GCP_SERVICE_ACCOUNT_JSON`), routes by vendor, prices the call, and **debits credits**. Out of credits / over overage-cap → the proxy rejects (402), the same way every other platform-managed call already behaves. **No per-image cap** — the credit balance + `overage_cap_credits` is the limit.
-- **The plugin** (`molecule-ai-plugin-image-gen`) is a **thin adaptor**: exposes `generate_image` / `edit_image` MCP tools, forwards them to the proxy's image endpoint using the workspace's already-injected platform auth (same base-url + token the platform injects for platform-managed LLM), and writes the returned image to `/workspace`. It holds **no keys, no metering, no cap logic.**
+- **The proxy** holds vendor keys (Infisical SSOT), routes by vendor, prices the call, **debits credits**, **stores the generated image, and returns a download URL**. Out of credits / over `overage_cap_credits` → 402 (same as today). **No per-image cap** — credits are the limit.
+- **The plugin** (`molecule-ai-plugin-image-gen`) is a thin adaptor: `generate_image` / `edit_image` MCP tools → call the proxy → **return the download URL** to the agent. It holds no keys, no billing, no storage.
 
-This is deliberately NOT per-plugin machinery — caps, key injection, attribution all live in the proxy/billing system that already exists, so image-gen is just a new billed route, not a new billing subsystem.
+## 2. CTO decisions (locked) + findings
 
-## 2. Goals / Non-goals
-
-**Goals**
-- Image gen works for any workspace out of the box, billed to platform credits via the proxy.
-- One central place (the proxy) owns vendor keys, routing, pricing, and credit-debit.
-- The plugin is a trivial adaptor — vendor-pluggable changes happen in the proxy.
-- No new cap/attribution subsystem — reuse credits + the proxy's existing billing path.
-
-**Non-goals**
-- Per-workspace image caps (explicitly dropped — credits are the limit).
-- Per-plugin key injection / per-plugin metering (the proxy owns these).
-- The plugin's tool schema / output handling (trivial; separately tracked).
+- **Caps:** none. Credits are the limit. *(locked)*
+- **Pricing (Q3):** **fetched vendor price × 1.5** service fee → debited from credits. Stored as image SKUs in the price catalog; **unpriced model = rejected** (no fail-open $0 — the opus-leak rule). *(locked)*
+- **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)*
+- **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)*
+- **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)*
+- **Gemini path (Q2 — CONFIRMED NEGATIVE):** the platform SA (`molecule-provisioner@gen-lang-client-0607853535`) mints a token but **Vertex AI returns 404 (API not enabled on that AI-Studio project)** and the **Gemini Developer API returns 403 (wants an API key, not SA-OAuth)**. So **we do NOT currently have working Vertex.** → **needs a decision** (see §5).
 
 ## 3. Design
 
-### 3.1 Proxy: add image routes
-- Add image endpoints to the platform proxy (e.g. `POST /v1/images/generations`, `/v1/images/edits`, or a unified `/v1/images` with a `vendor`/`model` param).
-- **Vendor routing**: `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini 2.5 Flash Image ("Nano Banana") via the platform GCP SA. Adding a vendor = a new route handler in the proxy.
-- **Vendor keys**: held by the proxy, sourced from Infisical SSOT (OpenAI key; Gemini SA). Never leave the proxy.
-- **Auth from the plugin**: the workspace's existing platform token (the proxy already authenticates platform-managed calls per workspace/org — reuse it to identify who to bill).
+### 3.1 Proxy: new `/v1/images` handler
+- `ProxyImages(c)` — mirror the `ProxyOpenAIResponses` precedent. Accept a unified body `{prompt, vendor, model, size, n, image?(for edit)}`.
+- **Vendor routing:** `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini "Nano Banana" (`gemini-2.5-flash-image`) via whichever auth §5 resolves. New vendor = new branch.
+- **Keys:** held by the proxy, Infisical-sourced; never leave the proxy.
+- **Principal:** reuse the proxy's existing per-workspace/org auth to know who to bill.
 
-### 3.2 Billing: credits debit (the cost-leak guard lives here)
-- Each image call is **priced** (per vendor/model/size) and **debited from org credits** through the existing billing system (`credits_balance` → `overage_used_credits` up to `overage_cap_credits`).
-- **CRITICAL (opus-cost-leak lesson):** image models MUST be in the price catalog. An unpriced model is **rejected**, never passed through at $0 — that fail-open (opus-4-8 missing from `llm_price_catalog`) is exactly what made the June 10 leak invisible. Extend the catalog with image SKUs; block unpriced models from the platform route.
-- **Limit = credits**, not a cap: when an org is out of credits / over `overage_cap_credits`, the proxy returns 402 and the plugin surfaces "out of image credits — top up." (Same UX as other platform-managed exhaustion.)
-- Attribution comes for free — the proxy already records per-workspace/org spend; image becomes a `service`/`sku` dimension on the existing ledger.
+### 3.2 Billing: image SKUs + credits (the cost-leak guard)
+- Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`.
+- Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger.
+- **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images).
+- Limit = credits; out → 402 surfaced by the plugin as "out of image credits."
 
-### 3.3 Plugin (thin adaptor)
-- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` (npx `@molecule-ai/mcp-image-gen`).
-- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:path|url, vendor?, …)`, `list_image_models()`.
-- Each tool → `POST {PLATFORM_PROXY_BASE}/v1/images/...` with the platform auth → on success write `/workspace/.molecule/images/<ts>-<id>.png`, return the path. On 402 → surface "out of credits."
-- No keys, no cap, no metering in the plugin.
+### 3.3 Image storage → download URL (new)
+- The proxy stores each generated image (object store / signed-URL bucket) and returns a **time-boxed download URL** in the response.
+- The agent fetches it (to `/workspace` or anywhere) and decides what to do (send to user, etc.). Retention/expiry of the stored image: open (default e.g. 24h signed URL).
 
-### 3.4 BYOK (optional, likely defer)
-Under the proxy model, BYOK = the proxy accepts a caller-supplied vendor key and skips the credit-debit for that call (own cost). Clean to add later; **propose deferring from v1** unless you want it now — the proxy/credits path is the product default and BYOK adds a passthrough-auth path. (Open question.)
+### 3.4 Plugin (thin adaptor)
+- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` → npx `@molecule-ai/mcp-image-gen`.
+- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:url|path, vendor?, …)`, `list_image_models()`.
+- Each tool → POST the proxy `/v1/images` with the platform auth → **return `{url, vendor, model, expires_at}`** to the agent. On 402 → "out of credits." No keys/billing/storage in the plugin.
 
 ## 4. Where each piece lives
-| Piece | Repo | Notes |
-|---|---|---|
-| image routes, vendor routing, vendor keys, pricing, credit-debit | **CP / the platform proxy** | the bulk of the work; extends existing billing |
-| image SKUs in the price catalog | CP | unpriced = rejected (no fail-open) |
-| thin MCP adaptor + tools + output | `molecule-ai-plugin-image-gen` | trivial |
+| Piece | Repo |
+|---|---|
+| `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk |
+| thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial |
 
-## 5. Security / cost-safety
-- Vendor keys live only in the proxy (Infisical-sourced); never in the plugin or workspace env.
-- Billed via credits → an org can only spend what it has (+ overage cap) — intrinsic limit, no runaway.
-- Unpriced image model ⇒ rejected at the proxy (the explicit anti-opus-leak rule).
+## 5. The one open decision (Q2 fallout)
+Vertex isn't available as-is. Pick the Gemini path:
+- **(A) Gemini Developer API + `GEMINI_API_KEY`** — standard for `gemini-2.5-flash-image`; need to confirm a key exists in Infisical or mint one in `gen-lang-client-0607853535`. **Lowest effort. Recommended.**
+- **(B) Enable Vertex AI** on the project + grant the SA `Vertex AI User` → use Vertex via the SA. More infra; only worth it if you specifically want Vertex (quota/region/SLA reasons).
 
-## 6. Rollout
-1. Image SKUs + pricing in the catalog (block unpriced).
-2. Proxy image routes + OpenAI + Gemini vendor handlers + credit-debit.
-3. `@molecule-ai/mcp-image-gen` adaptor + plugin repo + register.
-4. Staging e2e: platform image gen debits credits + writes to /workspace; out-of-credits → 402 surfaced; (if BYOK) bypass works.
+(OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.)
 
-## 7. Open questions (for your review)
-1. **Does the platform proxy already have a non-chat-completions extension point**, or do image routes need new plumbing? (It's LLM/chat-shaped today; images are a different request/response.)
-2. **Vertex AI vs Gemini Developer API** on `gen-lang-client-0607853535` (the proxy's Gemini handler auth/endpoint).
-3. **Image pricing** — where do the per-vendor/model/size SKUs come from (static maintained table vs fetched)?
-4. **BYOK in v1, or defer?** (Proxy-passthrough-key vs not.)
-5. **Output** — workspace-file + path (current plan) still right, or also return an inline/preview form?
+## 6. Remaining smaller open items
+- Image **storage backend** + URL expiry default (24h?).
+- Image **price source** to feed the ×1.5 (vendor pricing page → static-maintained vs fetched).
+- `n>1` / batch semantics.
+
+## 7. Rollout
+1. Resolve §5 (Gemini path) + confirm/mint the key.
+2. Image SKUs + ×1.5 pricing in the catalog (block unpriced).
+3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL.
+4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register.
+5. Staging e2e: gen debits credits + returns a working URL; out-of-credits → 402; edit works.
 
 ## 8. Alternatives considered
-- **Per-plugin keys + cap + attribution** (rev 1 of this RFC) — rejected: rebuilds billing the proxy already does; caps are unnecessary when credits are the limit.
-- **Plugin calls vendors directly (no proxy)** — rejected: scatters keys + billing across workspaces; the proxy centralizes both.
+- Per-plugin keys/cap/attribution (rev 1) — rejected; rebuilds what the proxy/credits already do.
+- Plugin writes to `/workspace` (rev 2) — superseded by Q5: return a URL, let the agent place it.
+- Plugin calls vendors directly — rejected; scatters keys + billing.
-- 
2.52.0


From 9463f074f8a16af89099d2ba8e80a5e563bf04df Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?=
 <core-devops@agents.moleculesai.app>
Date: Sun, 21 Jun 2026 03:24:05 +0000
Subject: [PATCH 4/6] RFC image-gen: Q2 resolved + verified live (Vertex
 gemini-2.5-flash-image 200, 1290 tok/image); per-vendor billing-unit note

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
index 10ae0513b..3d0e47b9c 100644
--- a/docs/design/rfc-image-gen-platform-metered.md
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -25,7 +25,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 - **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)*
 - **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)*
 - **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)*
-- **Gemini path (Q2 — CONFIRMED NEGATIVE):** the platform SA (`molecule-provisioner@gen-lang-client-0607853535`) mints a token but **Vertex AI returns 404 (API not enabled on that AI-Studio project)** and the **Gemini Developer API returns 403 (wants an API key, not SA-OAuth)**. So **we do NOT currently have working Vertex.** → **needs a decision** (see §5).
+- **Gemini path (Q2 — RESOLVED 2026-06-20):** enable **Vertex AI / "Gemini Enterprise Agent Platform"** (`aiplatform.googleapis.com`) on project **`molecules-ai-proxy`** (the billed proxy project, NOT the AI-Studio `gen-lang-client-*` project) + grant a dedicated SA the **Agent Platform User** role (`roles/aiplatform.user` — note the rebrand; the old "Vertex AI User" title is gone). The proxy authenticates with an **SA JSON key** stored in Infisical (`GCP_VERTEX_SA_JSON`). *(locked)*
+  - **Org-policy note:** the org enforces **`iam.disableServiceAccountKeyCreation`** AND disallows API keys (secure-by-default), so both simple credential paths are blocked org-wide. v1 uses a **project-scoped exception** to that constraint for `molecules-ai-proxy` only (one long-lived key, in Infisical). **Hardening follow-up:** migrate to **Workload Identity Federation** (no key) or run the Gemini-calling path on a **GCP-attached SA** (Cloud Run, ADC) once the proxy has an OIDC token source — both keep the org policy intact. Tracked as a post-v1 item.
 
 ## 3. Design
 
@@ -37,6 +38,7 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 
 ### 3.2 Billing: image SKUs + credits (the cost-leak guard)
 - Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`.
+- **Billing unit differs per vendor (finding):** OpenAI GPT Image 2 bills **count/size** (per-image SKU); **Gemini-2.5-flash-image on Vertex bills token-based — 1290 tokens per generated image** (Vertex meters it as a `generateContent` call). So the Gemini branch slots into the existing **token-billing** path (the same machinery as text), while OpenAI needs the new count/size SKU. Both apply the **×1.5** service fee.
 - Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger.
 - **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images).
 - Limit = credits; out → 402 surfaced by the plugin as "out of image credits."
@@ -56,10 +58,10 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 | `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk |
 | thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial |
 
-## 5. The one open decision (Q2 fallout)
-Vertex isn't available as-is. Pick the Gemini path:
-- **(A) Gemini Developer API + `GEMINI_API_KEY`** — standard for `gemini-2.5-flash-image`; need to confirm a key exists in Infisical or mint one in `gen-lang-client-0607853535`. **Lowest effort. Recommended.**
-- **(B) Enable Vertex AI** on the project + grant the SA `Vertex AI User` → use Vertex via the SA. More infra; only worth it if you specifically want Vertex (quota/region/SLA reasons).
+## 5. Q2 resolved + VERIFIED LIVE — Gemini path (was open)
+**Decision (2026-06-20):** Vertex on `molecules-ai-proxy` via a dedicated SA (`vertex-ai-user@molecules-ai-proxy`) with **Agent Platform User** (`roles/aiplatform.user`), authenticated by an **SA JSON key in Infisical** under a **project-scoped exception** to `iam.disableServiceAccountKeyCreation`. Rejected at this stage: Gemini Developer API + API key (org disallows API keys). Hardening follow-up (WIF / GCP-attached SA) noted in §2.
+
+**Verified live 2026-06-20** — real call, SA key → `cloud-platform` scoped OAuth token → `POST .../locations/global/publishers/google/models/gemini-2.5-flash-image:generateContent` with `generationConfig.responseModalities:["IMAGE"]` → **HTTP 200**, returned an `inlineData` `image/png` part. `usageMetadata`: `promptTokenCount=13`, **`candidatesTokenCount=1290` (IMAGE modality)**, `totalTokenCount=1303`, `trafficType=ON_DEMAND`. Endpoint host: `aiplatform.googleapis.com` (location `global`).
 
 (OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.)
 
@@ -69,7 +71,7 @@ Vertex isn't available as-is. Pick the Gemini path:
 - `n>1` / batch semantics.
 
 ## 7. Rollout
-1. Resolve §5 (Gemini path) + confirm/mint the key.
+1. ~~Resolve §5 (Gemini path) + confirm/mint the key.~~ **DONE + verified live 2026-06-20.** Remaining: move the SA key into Infisical SSOT (`GCP_VERTEX_SA_JSON`) so the Railway proxy can read it (currently only on the operator/local).
 2. Image SKUs + ×1.5 pricing in the catalog (block unpriced).
 3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL.
 4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register.
-- 
2.52.0


From 9accd8d5aa90cc96f21b5507c8a05a75a1f8ad40 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?=
 <core-devops@agents.moleculesai.app>
Date: Sun, 21 Jun 2026 03:55:17 +0000
Subject: [PATCH 5/6] =?UTF-8?q?RFC=20image-gen:=20Q2=20FINAL=20=E2=80=94?=
 =?UTF-8?q?=20reuse=20existing=20molecule-vertex=20keyless=20WIF=20(retire?=
 =?UTF-8?q?=20molecules-ai-proxy=20SA-key=20detour);=20reflects=20CP=20#88?=
 =?UTF-8?q?0?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
index 3d0e47b9c..88047c324 100644
--- a/docs/design/rfc-image-gen-platform-metered.md
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -25,8 +25,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 - **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)*
 - **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)*
 - **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)*
-- **Gemini path (Q2 — RESOLVED 2026-06-20):** enable **Vertex AI / "Gemini Enterprise Agent Platform"** (`aiplatform.googleapis.com`) on project **`molecules-ai-proxy`** (the billed proxy project, NOT the AI-Studio `gen-lang-client-*` project) + grant a dedicated SA the **Agent Platform User** role (`roles/aiplatform.user` — note the rebrand; the old "Vertex AI User" title is gone). The proxy authenticates with an **SA JSON key** stored in Infisical (`GCP_VERTEX_SA_JSON`). *(locked)*
-  - **Org-policy note:** the org enforces **`iam.disableServiceAccountKeyCreation`** AND disallows API keys (secure-by-default), so both simple credential paths are blocked org-wide. v1 uses a **project-scoped exception** to that constraint for `molecules-ai-proxy` only (one long-lived key, in Infisical). **Hardening follow-up:** migrate to **Workload Identity Federation** (no key) or run the Gemini-calling path on a **GCP-attached SA** (Cloud Run, ADC) once the proxy has an OIDC token source — both keep the org policy intact. Tracked as a post-v1 item.
+- **Gemini path (Q2 — RESOLVED 2026-06-20, REVISED after code review):** the proxy **already** serves platform Gemini via Vertex on project **`molecule-vertex`** using a **keyless AWS→GCP Workload Identity Federation** mint (`internal/vertexauth.Token`, SA `molecule-vertex-adc@molecule-vertex`). Image gen **reuses that exact path** — no new credential, no new project. Image calls hit the **native `:generateContent`** endpoint (`responseModalities:["IMAGE"]`) at location `global`; text uses the OpenAI-compat surface. *(locked + built — CP #880)*
+  - **Detour retired:** an earlier revision set up an SA **key** on a separate `molecules-ai-proxy` project (the org blocks `iam.disableServiceAccountKeyCreation` + API keys, so it needed a scoped policy exception). That was redundant — the codebase already does keyless WIF, which IS the hardening target. The SA key, the Infisical secret, and the policy exception are being removed; the SA/exception cleanup is a GCP-console action for the owner.
 
 ## 3. Design
 
@@ -58,10 +58,10 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 | `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk |
 | thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial |
 
-## 5. Q2 resolved + VERIFIED LIVE — Gemini path (was open)
-**Decision (2026-06-20):** Vertex on `molecules-ai-proxy` via a dedicated SA (`vertex-ai-user@molecules-ai-proxy`) with **Agent Platform User** (`roles/aiplatform.user`), authenticated by an **SA JSON key in Infisical** under a **project-scoped exception** to `iam.disableServiceAccountKeyCreation`. Rejected at this stage: Gemini Developer API + API key (org disallows API keys). Hardening follow-up (WIF / GCP-attached SA) noted in §2.
+## 5. Q2 resolved — Gemini path (was open)
+**Final decision (2026-06-20):** reuse the **existing keyless `molecule-vertex` WIF path** the proxy already uses for Gemini text (`internal/vertexauth.Token`). Image gen targets the native `gemini-2.5-flash-image:generateContent` endpoint at location `global`. **Zero new credentials.** Built in CP #880.
 
-**Verified live 2026-06-20** — real call, SA key → `cloud-platform` scoped OAuth token → `POST .../locations/global/publishers/google/models/gemini-2.5-flash-image:generateContent` with `generationConfig.responseModalities:["IMAGE"]` → **HTTP 200**, returned an `inlineData` `image/png` part. `usageMetadata`: `promptTokenCount=13`, **`candidatesTokenCount=1290` (IMAGE modality)**, `totalTokenCount=1303`, `trafficType=ON_DEMAND`. Endpoint host: `aiplatform.googleapis.com` (location `global`).
+**Verified twice:** (a) the model + request/response shape was proven live 2026-06-20 — `:generateContent` with `responseModalities:["IMAGE"]` → HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). (b) The WIF path itself is the same one already serving Gemini text in prod. One deploy-time check remains: that `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — confirmed by the staging e2e (the WIF mint is AWS-identity-bound, not locally exercisable).
 
 (OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.)
 
@@ -71,7 +71,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI |
 - `n>1` / batch semantics.
 
 ## 7. Rollout
-1. ~~Resolve §5 (Gemini path) + confirm/mint the key.~~ **DONE + verified live 2026-06-20.** Remaining: move the SA key into Infisical SSOT (`GCP_VERTEX_SA_JSON`) so the Railway proxy can read it (currently only on the operator/local).
+1. ~~Resolve §5 (Gemini path).~~ **DONE** — reuse existing `molecule-vertex` WIF; no credential work. (Detour SA key/secret/policy-exception being removed.)
+1b. **CP #880** (proxy handler + image SKUs + storage→URL + billing) — open, in review. Inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) are set.
 2. Image SKUs + ×1.5 pricing in the catalog (block unpriced).
 3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL.
 4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register.
-- 
2.52.0


From 3c83d5da7cd35df7e063919a2c7668dcb3ce0a0c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?=
 <core-devops@agents.moleculesai.app>
Date: Sun, 21 Jun 2026 04:16:34 +0000
Subject: [PATCH 6/6] RFC rev4: re-scope to a GENERIC two-tier plugin proxy
 socket (capabilities = data); image gen = first Tier-A consumer; supersedes
 bespoke ProxyImages

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/design/rfc-image-gen-platform-metered.md | 167 +++++++++++-------
 1 file changed, 104 insertions(+), 63 deletions(-)

diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md
index 88047c324..ce6df7d1a 100644
--- a/docs/design/rfc-image-gen-platform-metered.md
+++ b/docs/design/rfc-image-gen-platform-metered.md
@@ -1,84 +1,125 @@
-# RFC: Image generation via the platform proxy (credits-billed, no caps; plugin = thin adaptor)
+# RFC: Plugin proxy socket — a generic metered egress primitive (two-tier registry); image generation = first consumer
 
-- **Status:** Draft — for CTO review (do not build until approved)
-- **Date:** 2026-06-20 (rev 3 — folds in CTO answers + proxy/Vertex investigation)
-- **Scope:** how image generation is routed + billed. The work is **a new image handler on the platform proxy**; the plugin is a thin adaptor.
+- **Status:** Draft — for CTO review (do not build the generic socket until the two-tier shape is signed off)
+- **Date:** 2026-06-20 (rev 4 — re-scoped from a bespoke image handler to a generic socket after design review)
+- **Supersedes:** the bespoke `ProxyImages` approach in **CP #880** (that PR is gutted down to the migration + the socket's first capability entry — see §10).
 
-## 1. Model
+## 1. Why this changed shape
 
-Image generation is **uncapped** and **consumes the org's platform credits**, like platform-managed LLM. It rides the **existing CP LLM proxy** (`internal/handlers/llm_proxy.go` + `internal/credits` billing + the price catalog + fail-closed + attribution — all already built and tested).
+The first cut added a bespoke `ProxyImages` handler to the LLM proxy. That's an **add-on**: do it again for video, TTS, embeddings, rerank, and the proxy becomes a junk drawer of per-capability handlers, which defeats the point of the plugin system. If we touch core, it should be a **fundamental, generic primitive built once**, not another special case.
+
+The codebase is already moving this way: `providers.yaml` (internal#718 / vertex-provider-ssot-endpoint) pulled routing facts (upstream URL, auth mode, wire prefix) **out of hardcoded Go into a registry**, and its own comment says *"Phase 2 migrates the remaining static providers."* This RFC is that endgame: a **generic, registry-driven metered egress socket** that any plugin uses, where adding a capability is **data, not code**.
+
+## 2. The primitive: one metered egress socket
+
+Core exposes ONE generic path. A plugin calls it; everything sensitive resolves server-side; the plugin only ever receives what is safe to hand out.
 
 ```
-agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | Gemini}
-                                              │         │
-                                              │         └─ stores image → returns a download URL
-                                              └─ prices (vendor × 1.5) + debits org credits
+plugin ──> POST /internal/llm/proxy   {capability/model, request}
+              │  CORE (generic — built once):
+              │   1. AUTH: workspace↔org handshake            ← already exists
+              │   2. RESOLVE capability from the REGISTRY     ← providers.yaml SSOT
+              │   3. INJECT vendor credential server-side     ← key-env | wif (registry auth_mode)
+              │   4. FORWARD to the registry-declared upstream
+              │   5. METER usage via registry-declared paths → debit credits (existing)
+              │   6. RETURN per registry response_mode, stripped of anything unsafe
+              └──> {safe output}   (never a key; never raw upstream internals)
 ```
 
-- **The proxy** holds vendor keys (Infisical SSOT), routes by vendor, prices the call, **debits credits**, **stores the generated image, and returns a download URL**. Out of credits / over `overage_cap_credits` → 402 (same as today). **No per-image cap** — credits are the limit.
-- **The plugin** (`molecule-ai-plugin-image-gen`) is a thin adaptor: `generate_image` / `edit_image` MCP tools → call the proxy → **return the download URL** to the agent. It holds no keys, no billing, no storage.
+### 2.1 Trust model (why the box never holds a vendor key)
+The plugin runs on the tenant's workspace box, where the tenant has **root**. The box holds an **org-scoped credential** (the org/admin token + workspace id, read from workspace env) and uses it to handshake with CP (`resolveLLMProxyPrincipal` + `workspaceBelongsToOrg`, already in core). The box does **NOT** hold the vendor key.
 
-## 2. CTO decisions (locked) + findings
+Blast radius of a compromised box is therefore **asymmetric and bounded**:
+- attacker can spend **that one org's** credits (≤ balance + overage cap) — the tenant's own loss;
+- the platform's **master vendor keys stay in CP**, and **every other org is untouched** — no unmetered global abuse, no key exfiltration.
 
-- **Caps:** none. Credits are the limit. *(locked)*
-- **Pricing (Q3):** **fetched vendor price × 1.5** service fee → debited from credits. Stored as image SKUs in the price catalog; **unpriced model = rejected** (no fail-open $0 — the opus-leak rule). *(locked)*
-- **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)*
-- **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)*
-- **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)*
-- **Gemini path (Q2 — RESOLVED 2026-06-20, REVISED after code review):** the proxy **already** serves platform Gemini via Vertex on project **`molecule-vertex`** using a **keyless AWS→GCP Workload Identity Federation** mint (`internal/vertexauth.Token`, SA `molecule-vertex-adc@molecule-vertex`). Image gen **reuses that exact path** — no new credential, no new project. Image calls hit the **native `:generateContent`** endpoint (`responseModalities:["IMAGE"]`) at location `global`; text uses the OpenAI-compat surface. *(locked + built — CP #880)*
-  - **Detour retired:** an earlier revision set up an SA **key** on a separate `molecules-ai-proxy` project (the org blocks `iam.disableServiceAccountKeyCreation` + API keys, so it needed a scoped policy exception). That was redundant — the codebase already does keyless WIF, which IS the hardening target. The SA key, the Infisical secret, and the policy exception are being removed; the SA/exception cleanup is a GCP-console action for the owner.
+That asymmetry is the whole reason keys + billing live in CP and only the handshake lives on the box. (Hardening seam, not v1: make the box token *workspace*-scoped rather than org-admin to shrink the radius further.)
 
-## 3. Design
+## 3. Capabilities are data (the registry)
 
-### 3.1 Proxy: new `/v1/images` handler
-- `ProxyImages(c)` — mirror the `ProxyOpenAIResponses` precedent. Accept a unified body `{prompt, vendor, model, size, n, image?(for edit)}`.
-- **Vendor routing:** `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini "Nano Banana" (`gemini-2.5-flash-image`) via whichever auth §5 resolves. New vendor = new branch.
-- **Keys:** held by the proxy, Infisical-sourced; never leave the proxy.
-- **Principal:** reuse the proxy's existing per-workspace/org auth to know who to bill.
+A capability is a registry entry (extends `providers.yaml`, the existing SSOT). It declares everything the generic socket needs — no per-capability Go:
 
-### 3.2 Billing: image SKUs + credits (the cost-leak guard)
-- Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`.
-- **Billing unit differs per vendor (finding):** OpenAI GPT Image 2 bills **count/size** (per-image SKU); **Gemini-2.5-flash-image on Vertex bills token-based — 1290 tokens per generated image** (Vertex meters it as a `generateContent` call). So the Gemini branch slots into the existing **token-billing** path (the same machinery as text), while OpenAI needs the new count/size SKU. Both apply the **×1.5** service fee.
-- Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger.
-- **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images).
-- Limit = credits; out → 402 surfaced by the plugin as "out of image credits."
+```yaml
+- name: gemini-image
+  capability: image
+  tier: platform_metered            # platform_metered = Tier A (our keys/credits) | byok = Tier B (their key)
+  upstream: vertex                  # reuses existing auth_mode: wif_adc (keyless WIF mint)
+  endpoint: ":generateContent"      # native image surface (vs the openapi text surface)
+  billing_model: gemini-2.5-flash-image   # → llm_price_catalog row
+  usage:                            # declarative extraction — no parse func
+    input_path:  usageMetadata.promptTokenCount
+    output_path: usageMetadata.candidatesTokenCount
+  response_mode: blob               # see §5; json | blob
+```
 
-### 3.3 Image storage → download URL (new)
-- The proxy stores each generated image (object store / signed-URL bucket) and returns a **time-boxed download URL** in the response.
-- The agent fetches it (to `/workspace` or anywhere) and decides what to do (send to user, etc.). Retention/expiry of the stored image: open (default e.g. 24h signed URL).
+Adding image, video, TTS, embeddings → **a registry entry + a price row. Zero new handler.**
 
-### 3.4 Plugin (thin adaptor)
-- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` → npx `@molecule-ai/mcp-image-gen`.
-- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:url|path, vendor?, …)`, `list_image_models()`.
-- Each tool → POST the proxy `/v1/images` with the platform auth → **return `{url, vendor, model, expires_at}`** to the agent. On 402 → "out of credits." No keys/billing/storage in the plugin.
+## 4. Two-tier registry (the marketplace split)
 
-## 4. Where each piece lives
-| Piece | Repo |
-|---|---|
-| `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk |
-| thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial |
+A capability entry binds **{ upstream · which credential to inject · the price }**. Whether an entry can be **third-party-dynamic** depends entirely on *whose credential*:
 
-## 5. Q2 resolved — Gemini path (was open)
-**Final decision (2026-06-20):** reuse the **existing keyless `molecule-vertex` WIF path** the proxy already uses for Gemini text (`internal/vertexauth.Token`). Image gen targets the native `gemini-2.5-flash-image:generateContent` endpoint at location `global`. **Zero new credentials.** Built in CP #880.
+### Tier A — platform-metered (our keys, our credits): **platform-curated, NOT freely dynamic**
+A free-for-all here is catastrophic: a plugin could declare *"forward to evil.com, inject the platform OpenAI key"* (key exfiltration), *"price = 0"* (billing bypass), or point egress anywhere (SSRF). So entries that spend **our** money with **our** keys — image gen included — are **platform-controlled**: a registry/DB row added through a **vetted onboarding / reviewed change**, never self-served. (Curated ≠ hardcoded-in-Go: it's a trusted config row, but the trust decision is ours.)
 
-**Verified twice:** (a) the model + request/response shape was proven live 2026-06-20 — `:generateContent` with `responseModalities:["IMAGE"]` → HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). (b) The WIF path itself is the same one already serving Gemini text in prod. One deploy-time check remains: that `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — confirmed by the staging e2e (the WIF mint is AWS-identity-bound, not locally exercisable).
+### Tier B — BYOK (the plugin brings its own key): **dynamically self-registrable**
+Nothing of ours is at stake — the third party's credential, their cost. So a third-party plugin **can register its own capability dynamically.** CP still proxies it (egress control + observability) but injects the **plugin's** key and **does not debit platform credits**.
 
-(OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.)
+The **marketplace scales through Tier B** (dynamic, self-served, ~10K plugins/day — see the marketplace RFC `project_marketplace_private_template_delivery`); **Tier A** stays a small curated set of platform-subsidized capabilities. One socket serves both; the only differences are *whose credential is injected* and *whether platform credits are debited*. (Tier B is a designed-in seam in v1, not necessarily shipped day one.)
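+
+A Tier-B entry might look like the sketch below. It reuses the field shape of the Tier-A example in this RFC, and `tier: byok` comes from that example's comment. The entry name, `upstream_url`, `credential_ref`, and the usage paths are hypothetical placeholders, not a settled schema.
+
+```yaml
+# Hypothetical Tier-B (BYOK) capability entry: illustrative only.
+acme-upscaler:                      # hypothetical plugin-registered capability
+  tier: byok                        # B: the plugin's key, the plugin's cost
+  upstream_url: https://api.example.com/v1/upscale   # hypothetical; CP still controls egress
+  credential_ref: plugin:acme/api_key                # hypothetical; the plugin's key, never a platform key
+  usage:                            # recorded for observability only (no platform debit)
+    input_path:  usage.input_tokens
+    output_path: usage.output_tokens
+  response_mode: json
+```
+
+There is no `billing_model` in this sketch because nothing is debited from platform credits. How CP vets a self-registered upstream is left to the Tier-B follow-up.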
 
-## 6. Remaining smaller open items
-- Image **storage backend** + URL expiry default (24h?).
-- Image **price source** to feed the ×1.5 (vendor pricing page → static-maintained vs fetched).
-- `n>1` / batch semantics.
+## 5. The only genuinely-new core primitives (built once, reused forever)
 
-## 7. Rollout
-1. ~~Resolve §5 (Gemini path).~~ **DONE** — reuse existing `molecule-vertex` WIF; no credential work. (Detour SA key/secret/policy-exception being removed.)
-1b. **CP #880** (proxy handler + image SKUs + storage→URL + billing) — open, in review. Inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) are set.
-2. Image SKUs + ×1.5 pricing in the catalog (block unpriced).
-3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL.
-4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register.
-5. Staging e2e: gen debits credits + returns a working URL; out-of-credits → 402; edit works.
+1. **Declarative usage extraction** — a registry-declared JSON path per token bucket (`input_path`, `output_path`, `cached_path`, …). Retires the `parseOpenAIUsage` / `parseAnthropicUsage` / `parseOpenAIResponsesUsage` sprawl; a new vendor's metering becomes config, not Go.
+2. **A small fixed set of response modes** — how the socket returns the upstream result safely:
+   - `json` — pass through the (sanitized) JSON (text/chat/responses/embeddings).
+   - `blob` — the upstream returns binary (image/audio). Two sub-modes:
+     - `blob_url` — CP stores the bytes (R2) and returns a time-boxed **presigned URL** (uniform across vendors; the agent just gets a link).
+     - `blob_passthrough` — CP returns the bytes to the plugin; the plugin writes them into the workspace. Keeps core thinnest; output is a workspace file, not a hosted URL.
+   - **Open decision (D1):** default response_mode for images — `blob_url` (uniform URL, +R2 in core) vs `blob_passthrough` (thinnest core, file path out). Recommend `blob_url` for a clean agent UX, behind a per-capability flag so a capability can choose.
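+
+As a sketch of primitive 1, an embeddings capability could be declared entirely as data. The entry name and the `endpoint` value are assumptions for illustration, as is the idea that `output_path` may be omitted. `usage.prompt_tokens` is the field OpenAI's embeddings response reports for input tokens.
+
+```yaml
+# Illustrative only: a text-embedding capability with no parse func.
+openai-embeddings:                  # hypothetical entry name
+  tier: platform_metered
+  upstream: openai
+  endpoint: "/v1/embeddings"        # assumed path on the OpenAI upstream
+  billing_model: text-embedding-3-small   # → llm_price_catalog row
+  usage:
+    input_path: usage.prompt_tokens # embeddings have no output tokens to meter
+  response_mode: json               # sanitized JSON passthrough
+```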
 
-## 8. Alternatives considered
-- Per-plugin keys/cap/attribution (rev 1) — rejected; rebuilds what the proxy/credits already do.
-- Plugin writes to `/workspace` (rev 2) — superseded by Q5: return a URL, let the agent place it.
-- Plugin calls vendors directly — rejected; scatters keys + billing.
+Everything else (auth, credential injection by `auth_mode`, forwarding, the credits debit, the fail-closed price gate) **already exists** — the socket wires the existing pieces generically.
+
+## 6. Image generation = first Tier-A consumer (the concrete instance)
+
+Image gen proves the primitive. As registry entries (Tier A, our keys/credits):
+
+- **`google/gemini-2.5-flash-image`** ("Nano Banana") — `upstream: vertex`, reuses the **existing keyless `molecule-vertex` WIF mint** (`internal/vertexauth.Token`) the proxy already uses for Gemini text. Native `:generateContent`, `responseModalities:["IMAGE"]`, location `global`. Verified live 2026-06-20: HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). **Zero new credentials.**
+- **`openai/gpt-image-2`** — `upstream: openai`, platform OpenAI key (Infisical). Via the OpenAI image surface (Images API, or the Responses API image tool the proxy already proxies — chosen at build time).
+- **Pricing (migration):** image SKUs in `llm_price_catalog` at **vendor list × 1.5** (markup baked into the row). Both vendors meter by tokens (Gemini at 1290 tok/image; gpt-image-2 per token), so the existing per-token columns fit with no schema change. An unpriced image model is rejected with **422 pre-serve** (the anti-$0-leak gate). Anti-free-serve: if a vendor omits usage, synthesize the known output-token count so the debit always fires.
+- **Output:** per §5 response_mode (D1).
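+
+To make the ×1.5 row concrete, here is a worked debit for one Gemini image. It assumes a vendor list price of $30 per 1M output tokens (an illustrative figure; the `llm_price_catalog` row is authoritative) and ignores input tokens:
+
+$$
+\text{catalog price} = 1.5 \times \frac{\$30}{10^{6}\ \text{tok}} = \frac{\$45}{10^{6}\ \text{tok}}, \qquad
+\text{debit per image} = 1290\ \text{tok} \times \frac{\$45}{10^{6}\ \text{tok}} \approx \$0.058
+$$
+
+The same arithmetic applies when usage is synthesized: the known 1290-token output count is priced against the same row.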
+
+## 7. Billing (unchanged path)
+Reuses `recordProxiedLLMUsage` → `ChargeLLMUsage` → `DebitWithOverage`: meter (declarative) → price-catalog lookup → debit org credits → overage up to cap → 402 when exhausted. Image gen has **no separate per-capability cap**; org credits (plus the existing overage cap) are the only limit. Tier B (BYOK) records non-billable usage (observability) and debits nothing.
+
+## 8. What lives where (footprint)
+| Piece | Where | Size |
+|---|---|---|
+| Generic socket (auth→resolve→inject→forward→meter→respond) | **CP** core | built once |
+| Declarative usage extraction + response modes (json/blob) | **CP** core | built once |
+| Each capability (image, video, …) | **registry entry + price row** | data |
+| The plugin (`molecule-ai-plugin-image-gen` etc.) | plugin repo | thin: call socket → hand result to agent |
+
+## 9. The plugin (thin, unchanged in spirit)
+`molecule-ai-plugin-image-gen` — `plugin.yaml` + `settings-fragment.json` + a small MCP adaptor exposing `generate_image` / `edit_image` / `list_image_models`. Each tool reads workspace env (org/workspace id + handshake token), POSTs to the socket, and returns the result (URL or file path per D1). No keys, no billing, no storage, no vendor-specific logic.
+
+## 10. Relationship to CP #880
+CP #880 (bespoke `ProxyImages` + R2 wiring + per-vendor parsers + tests) is **superseded**. Keep from it: **migration 055** (image price rows) and the verified vendor request/response shapes (they become the `gemini-image` / `gpt-image-2` registry entries + the `blob` response_mode). Drop: the bespoke handler, the hardcoded per-vendor parse funcs, the standalone storage wiring (folds into `response_mode: blob_url`). Net: #880 shrinks to the migration; the rest re-lands as the generic socket.
+
+## 11. Rollout
+1. **Generic socket alongside the existing text handlers** (do NOT converge text in the same change — don't destabilize the live text path). New capabilities route through the socket; chat/completions/messages/responses keep working as-is.
+2. Declarative usage extraction + response modes (`json`, `blob`).
+3. Tier-A image capability entries (`gemini-image`, `gpt-image-2`) + price rows. With `blob_url`, these stay inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) are set.
+4. Thin `molecule-ai-plugin-image-gen`.
+5. Staging e2e: image gen debits credits + returns output; out-of-credits → 402; unpriced → 422; image edit works (Gemini).
+6. **Follow-ups (designed-in, not v1):** Tier-B BYOK dynamic registration; converge the text handlers onto the socket (internal#718 Phase 2); workspace-scoped box token.
+
+## 12. Open decisions
+- **D1 (§5):** default image `response_mode` — `blob_url` (recommended) vs `blob_passthrough`.
+- **D2 (§11.1):** confirm "socket alongside, converge text later" (recommended) vs converge text now.
+- **D3 (§6):** OpenAI image via Images API vs the already-proxied Responses image tool — pick at build (favor the one needing the least new egress surface).
+- **D4 (deploy):** confirm `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — proven by staging e2e; the WIF mint is AWS-identity-bound, not locally exercisable.
+
+## 13. Alternatives considered
+- **Bespoke per-capability handlers** (`ProxyImages`, future `ProxyVideo`, …) — rejected: addon sprawl, defeats the plugin system. (This RFC's whole motivation.)
+- **Plugin calls the vendor directly** — rejected: vendor keys on a root-accessible tenant box = the keyless-Vertex billing leak the codebase already closed; self-reported usage is forgeable.
+- **Separate SA key on `molecules-ai-proxy`** (rev 3) — rejected/retired: the proxy already does keyless `molecule-vertex` WIF; the SA key + org-policy exception were a redundant detour (Infisical secret deleted; owner GCP-console cleanup pending).
+- **Single platform god-token for all capabilities** — rejected: no per-seller isolation/entitlement; conflicts with the marketplace RFC. Hence the two-tier split.
-- 
2.52.0