From bc1d530f715d6daa3a8457f1f0346b5a410fbb77 Mon Sep 17 00:00:00 2001 From: core-devops Date: Sun, 21 Jun 2026 02:27:58 +0000 Subject: [PATCH 1/6] docs(rfc): platform-metered image generation (entitlement, key injection, cap, attribution) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Draft RFC for CTO review — the billing-sensitive half of molecule-ai-plugin-image-gen. Applies the post-opus-cost-leak guardrails (attribution + fail-closed cap + priced models) to a default-platform-metered, BYOK-override image-gen plugin (OpenAI GPT Image 2 + Gemini Nano Banana). Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 95 +++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100644 docs/design/rfc-image-gen-platform-metered.md diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md new file mode 100644 index 000000000..489e6fb3a --- /dev/null +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -0,0 +1,95 @@ +# RFC: Platform-metered image generation — entitlement, key injection, cap, attribution + +- **Status:** Draft — for CTO review (do not build until approved) +- **Date:** 2026-06-20 +- **Scope:** ONLY the platform-metered + cost-control machinery for `molecule-ai-plugin-image-gen`. The plugin's MCP server, provider adapters, tool surface, and output handling are a straightforward, separately-tracked build — not this RFC. This RFC is the billing-sensitive cross-repo half. + +## 1. Context + +We're adding `molecule-ai-plugin-image-gen` — a multi-vendor image-generation MCP plugin (v1: **OpenAI GPT Image 2** + **Gemini "Nano Banana" / 2.5 Flash Image**, vendor-pluggable). Both run **platform-metered by default** (the platform already holds OpenAI creds in the Infisical SSOT and `GCP_SERVICE_ACCOUNT_JSON` for Gemini); **BYOK is an optional override**. + +Platform-metered = the platform pays the vendor bill. 
The **2026-06-10 opus cost-leak** (the SEO agent drained opus-4-8 via the CP proxy — invisible because the model was missing from the price catalog → fail-open $0 → unattributed) is the cautionary tale: **any platform-paid path must be attributed + capped + fail-closed from day one.** This RFC specifies that machinery for image-gen. + +## 2. Goals / Non-goals + +**Goals** +- Platform-metered image gen works **out of the box for every workspace** (default-ON), bounded. +- Every platform-paid generation is **attributed per workspace** and counted against a **per-workspace cap**. +- Cap exhaustion **fails closed** (no silent overspend); recovery = raise cap or add a BYOK key. +- **BYOK** override bypasses platform metering (user's own key + cost, uncapped). +- Vendor keys **never embedded** in the plugin; injected by the platform, sourced from Infisical SSOT. + +**Non-goals** +- The plugin's server/providers/tools/output (separate build). +- A general LLM-metering overhaul — **reuse CP#752 attribution**, extend with an image dimension. +- Marketplace monetization/billing (future). + +## 3. Design + +### 3.1 Credential resolution (at the mcp-image-gen server, per provider) +1. **BYOK** — workspace secret `OPENAI_API_KEY` / `GEMINI_API_KEY` present → use it (uncapped, user's cost). +2. **else platform-metered** — use the platform-injected key, within the per-workspace cap. + +There is **no "unavailable" state** — platform-metered is always on until the cap is hit. `list_image_providers()` reports per vendor `mode: byok | platform` + `budget_remaining`. + +### 3.2 Key injection (platform → workspace) +Mirror the concierge org-key pattern (`conciergePlatformMCPEnv`): core/CP injects platform vendor creds into the workspace container env when `image-gen` is installed — the plugin's `settings-fragment` **references** these env vars, never embeds keys. +- **OpenAI**: `OPENAI_API_KEY` (platform), sourced from **Infisical SSOT** (NOT the bootstrap cache, NOT hardcoded). 
+- **Gemini**: `GCP_SERVICE_ACCOUNT_JSON` + `GCP_PROJECT_ID` (platform). +- A marker (`IMAGE_GEN_PLATFORM=1`) so the server knows it's on the metered path. +- Injection must happen at provision/install **and survive restart/re-provision** (per the internal#33 identity-on-restart lesson — same delivery-durability trap). + +### 3.3 Per-workspace cap (fail-closed) +- A per-workspace **image budget** (default value + unit TBD — see open questions), stored CP-side (e.g., a small `image_budgets` table or a column). +- **Default-ON** for all workspaces; org/admin can raise/lower/disable (disable = cap 0). +- **Enforcement**: before each *platform-metered* generation, the server checks remaining budget (CP endpoint, or injected budget + server counter synced to CP). Over budget → **fail-closed** ("image budget reached — raise the cap or add your own API key"). **BYOK calls skip the check.** +- **Decrement**: each successful platform generation reports usage → CP decrements + records attribution. + +### 3.4 Attribution / metering (reuse CP#752) +- Every platform generation emits: `{workspace_id, org_id, vendor, model, image_count, size, est_cost, ts}`. +- Stored in the CP cost-attribution store — image-gen is a new **`service` dimension** alongside LLM. +- `est_cost` from a per-vendor/model/size **image price table**. **CRITICAL:** the table MUST include every image model — an uncosted model fails open to $0 and goes invisible (exactly the opus-leak failure). New image models are blocked from the platform path until priced. +- Dashboard: image spend per workspace/org (CP#752 WS3). + +### 3.5 Entitlement +- **Default-ON within the cap** — no opt-in step (per CTO direction: platform-metered by default). The **cap is the control**; org/admin adjusts it. Optional org-level hard toggle is an open question. 
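A minimal sketch of how the rev 1 rules in §3.1, §3.3 and §3.4 could fit together, written in Python purely for illustration (the RFC does not prescribe a language for the server). Everything named here is hypothetical: `PRICE_TABLE`, `Workspace`, `authorize`, `record_usage`, the `example-image-model` SKU, and the integer budget unit, which the RFC leaves as an open question. It shows one possible ordering of the checks: a BYOK key bypasses the cap, an unpriced model is rejected rather than costed at 0, and an over-budget request fails closed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical image price table: (vendor, model, size) -> cost per image,
# in abstract integer budget units. Names and numbers are placeholders,
# not real vendor pricing.
PRICE_TABLE = {
    ("openai", "example-image-model", "1024x1024"): 4,
}


class ImageGenRejected(Exception):
    """A platform-metered request that must fail closed."""


@dataclass
class Workspace:
    workspace_id: str
    budget_remaining: int            # per-workspace image budget (unit is assumed)
    byok_key: Optional[str] = None   # workspace secret, if the user set one


def authorize(ws, vendor, model, size, image_count=1):
    """Decide how a generation request is paid for, before calling the vendor."""
    # 3.1: a BYOK key wins and skips the platform cap entirely.
    # est_cost is 0 here because the platform pays nothing on this path.
    if ws.byok_key:
        return {"mode": "byok", "est_cost": 0}
    # 3.4: an unpriced model is blocked, never silently costed at 0.
    unit_price = PRICE_TABLE.get((vendor, model, size))
    if unit_price is None:
        raise ImageGenRejected(f"model not priced: {vendor}/{model}/{size}")
    # 3.3: fail closed when the request would exceed the remaining budget.
    est_cost = unit_price * image_count
    if est_cost > ws.budget_remaining:
        raise ImageGenRejected(
            "image budget reached: raise the cap or add your own API key"
        )
    return {"mode": "platform", "est_cost": est_cost}


def record_usage(ws, decision):
    """After a successful platform generation, decrement the budget."""
    if decision["mode"] == "platform":
        ws.budget_remaining -= decision["est_cost"]
```

In this sketch the price lookup runs before the budget check, so a missing price can never be mistaken for a free call. A real server would also need to emit the attribution event and keep its counter in sync with CP, which the sketch leaves out.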
+ +### 3.6 Where each piece lives +| Piece | Repo | +|---|---| +| cred resolution, cap pre-check, usage reporting, fail-closed | plugin server (`@molecule-ai/mcp-image-gen`) | +| key injection (Infisical-sourced, restart-durable) | core (mirror `conciergePlatformMCPEnv`) | +| cap store + endpoints, attribution store + image price table + dashboard | CP (extend CP#752) | + +## 4. Security +- Keys never in the plugin repo; injected from Infisical SSOT; plugin references env only. +- GCP SA scoped to Vertex/Gemini image (least privilege). +- BYOK keys = encrypted per-workspace secrets; never logged. +- Cap is fail-closed (the cost-leak guardrail). +- Uncosted image model ⇒ blocked from the platform path (no fail-open $0). + +## 5. Cost-leak program mapping +- **WS1 attribution** — per-workspace usage events ✓ +- **WS2 fail-closed** — per-workspace hard cap ✓ +- **WS3 dashboard** — image spend visibility ✓ +- **WS5 caching** — dedup identical prompt+params (future) + +## 6. Rollout +1. Image price table + attribution schema (image `service` dimension). +2. Cap store + CP endpoints. +3. Key injection (core, Infisical-sourced, restart-durable). +4. Plugin server: cred resolution + cap pre-check + usage reporting. +5. Default cap value + org-adjust surface. +6. Staging e2e: platform gen → attributed + decrements; cap-exceeded fail-closes; BYOK bypasses; restart preserves injection. + +## 7. Open questions (for your review) +1. **Default cap value + unit** — images/day, $/day, or a credits-equiv? Starting number? +2. **Org-level hard toggle** in addition to the cap (some orgs may want platform image-gen fully off, separate from "cap 0")? +3. **Vertex AI vs Gemini Developer API** on project `gen-lang-client-0607853535` (auth + endpoint differ) — confirm which the SA is wired for. +4. **Cap enforcement model** — live CP check per call (latency, accuracy) vs injected-budget + periodic sync (low latency, mild staleness)? +5. 
**Image price source** — static maintained table vs a fetched catalog; who maintains it? + +## 8. Alternatives considered +- **BYOK-only** — simplest, but worse UX and ignores existing platform creds. Rejected (CTO wants platform-metered default). +- **Attribution only, no cap** — rejected (the opus cost-leak class). +- **Route images through the existing LLM proxy** — images aren't chat-completions; a dedicated image service + attribution dimension is cleaner than overloading the LLM proxy. -- 2.52.0 From 1683a4131f9fd6983f40100bcaef3a5e819797e7 Mon Sep 17 00:00:00 2001 From: core-devops Date: Sun, 21 Jun 2026 02:34:17 +0000 Subject: [PATCH 2/6] =?UTF-8?q?docs(rfc):=20rev=202=20=E2=80=94=20re-scope?= =?UTF-8?q?=20image=20gen=20to=20proxy-fronted=20+=20credits-billed,=20no?= =?UTF-8?q?=20caps,=20thin-adaptor=20plugin?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per CTO: image gen is uncapped, consumes platform credits via the existing platform proxy/billing; the plugin is just an adaptor to the proxy. Drops per-plugin caps/key-injection/attribution. Cost-leak guard moves to the proxy price-catalog (unpriced image model = rejected, no fail-open). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 130 ++++++++---------- 1 file changed, 60 insertions(+), 70 deletions(-) diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md index 489e6fb3a..ea313183d 100644 --- a/docs/design/rfc-image-gen-platform-metered.md +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -1,95 +1,85 @@ -# RFC: Platform-metered image generation — entitlement, key injection, cap, attribution +# RFC: Image generation via the platform proxy (credits-billed; plugin is a thin adaptor) - **Status:** Draft — for CTO review (do not build until approved) -- **Date:** 2026-06-20 -- **Scope:** ONLY the platform-metered + cost-control machinery for `molecule-ai-plugin-image-gen`. The plugin's MCP server, provider adapters, tool surface, and output handling are a straightforward, separately-tracked build — not this RFC. This RFC is the billing-sensitive cross-repo half. +- **Date:** 2026-06-20 (rev 2 — re-scoped per CTO: no caps, credits-billed, proxy-fronted) +- **Scope:** how image generation is billed + routed. The real work is **extending the platform proxy** with image support; the plugin is a thin adaptor. -## 1. Context +## 1. Model (the corrected architecture) -We're adding `molecule-ai-plugin-image-gen` — a multi-vendor image-generation MCP plugin (v1: **OpenAI GPT Image 2** + **Gemini "Nano Banana" / 2.5 Flash Image**, vendor-pluggable). Both run **platform-metered by default** (the platform already holds OpenAI creds in the Infisical SSOT and `GCP_SERVICE_ACCOUNT_JSON` for Gemini); **BYOK is an optional override**. +Image generation is **uncapped** and **consumes the org's platform credits**, exactly like platform-managed LLM usage. It rides the **existing platform proxy** (the CP proxy that already fronts platform-managed model calls and is wired into billing/credits). The proxy is the single source of truth for billed vendor calls. 
-Platform-metered = the platform pays the vendor bill. The **2026-06-10 opus cost-leak** (the SEO agent drained opus-4-8 via the CP proxy — invisible because the model was missing from the price catalog → fail-open $0 → unattributed) is the cautionary tale: **any platform-paid path must be attributed + capped + fail-closed from day one.** This RFC specifies that machinery for image-gen. +``` +agent → mcp-image-gen (thin MCP adaptor) → platform proxy (CP) → {OpenAI | Gemini} + │ + └─ debits org credits via the billing system +``` + +- **The proxy** holds the vendor keys (OpenAI from Infisical SSOT, Gemini `GCP_SERVICE_ACCOUNT_JSON`), routes by vendor, prices the call, and **debits credits**. Out of credits / over overage-cap → the proxy rejects (402), the same way every other platform-managed call already behaves. **No per-image cap** — the credit balance + `overage_cap_credits` is the limit. +- **The plugin** (`molecule-ai-plugin-image-gen`) is a **thin adaptor**: exposes `generate_image` / `edit_image` MCP tools, forwards them to the proxy's image endpoint using the workspace's already-injected platform auth (same base-url + token the platform injects for platform-managed LLM), and writes the returned image to `/workspace`. It holds **no keys, no metering, no cap logic.** + +This is deliberately NOT per-plugin machinery — caps, key injection, attribution all live in the proxy/billing system that already exists, so image-gen is just a new billed route, not a new billing subsystem. ## 2. Goals / Non-goals **Goals** -- Platform-metered image gen works **out of the box for every workspace** (default-ON), bounded. -- Every platform-paid generation is **attributed per workspace** and counted against a **per-workspace cap**. -- Cap exhaustion **fails closed** (no silent overspend); recovery = raise cap or add a BYOK key. -- **BYOK** override bypasses platform metering (user's own key + cost, uncapped). 
-- Vendor keys **never embedded** in the plugin; injected by the platform, sourced from Infisical SSOT. +- Image gen works for any workspace out of the box, billed to platform credits via the proxy. +- One central place (the proxy) owns vendor keys, routing, pricing, and credit-debit. +- The plugin is a trivial adaptor — vendor-pluggable changes happen in the proxy. +- No new cap/attribution subsystem — reuse credits + the proxy's existing billing path. **Non-goals** -- The plugin's server/providers/tools/output (separate build). -- A general LLM-metering overhaul — **reuse CP#752 attribution**, extend with an image dimension. -- Marketplace monetization/billing (future). +- Per-workspace image caps (explicitly dropped — credits are the limit). +- Per-plugin key injection / per-plugin metering (the proxy owns these). +- The plugin's tool schema / output handling (trivial; separately tracked). ## 3. Design -### 3.1 Credential resolution (at the mcp-image-gen server, per provider) -1. **BYOK** — workspace secret `OPENAI_API_KEY` / `GEMINI_API_KEY` present → use it (uncapped, user's cost). -2. **else platform-metered** — use the platform-injected key, within the per-workspace cap. +### 3.1 Proxy: add image routes +- Add image endpoints to the platform proxy (e.g. `POST /v1/images/generations`, `/v1/images/edits`, or a unified `/v1/images` with a `vendor`/`model` param). +- **Vendor routing**: `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini 2.5 Flash Image ("Nano Banana") via the platform GCP SA. Adding a vendor = a new route handler in the proxy. +- **Vendor keys**: held by the proxy, sourced from Infisical SSOT (OpenAI key; Gemini SA). Never leave the proxy. +- **Auth from the plugin**: the workspace's existing platform token (the proxy already authenticates platform-managed calls per workspace/org — reuse it to identify who to bill). -There is **no "unavailable" state** — platform-metered is always on until the cap is hit. 
`list_image_providers()` reports per vendor `mode: byok | platform` + `budget_remaining`. +### 3.2 Billing: credits debit (the cost-leak guard lives here) +- Each image call is **priced** (per vendor/model/size) and **debited from org credits** through the existing billing system (`credits_balance` → `overage_used_credits` up to `overage_cap_credits`). +- **CRITICAL (opus-cost-leak lesson):** image models MUST be in the price catalog. An unpriced model is **rejected**, never passed through at $0 — that fail-open (opus-4-8 missing from `llm_price_catalog`) is exactly what made the June 10 leak invisible. Extend the catalog with image SKUs; block unpriced models from the platform route. +- **Limit = credits**, not a cap: when an org is out of credits / over `overage_cap_credits`, the proxy returns 402 and the plugin surfaces "out of image credits — top up." (Same UX as other platform-managed exhaustion.) +- Attribution comes for free — the proxy already records per-workspace/org spend; image becomes a `service`/`sku` dimension on the existing ledger. -### 3.2 Key injection (platform → workspace) -Mirror the concierge org-key pattern (`conciergePlatformMCPEnv`): core/CP injects platform vendor creds into the workspace container env when `image-gen` is installed — the plugin's `settings-fragment` **references** these env vars, never embeds keys. -- **OpenAI**: `OPENAI_API_KEY` (platform), sourced from **Infisical SSOT** (NOT the bootstrap cache, NOT hardcoded). -- **Gemini**: `GCP_SERVICE_ACCOUNT_JSON` + `GCP_PROJECT_ID` (platform). -- A marker (`IMAGE_GEN_PLATFORM=1`) so the server knows it's on the metered path. -- Injection must happen at provision/install **and survive restart/re-provision** (per the internal#33 identity-on-restart lesson — same delivery-durability trap). +### 3.3 Plugin (thin adaptor) +- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` (npx `@molecule-ai/mcp-image-gen`). 
+- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:path|url, vendor?, …)`, `list_image_models()`. +- Each tool → `POST {PLATFORM_PROXY_BASE}/v1/images/...` with the platform auth → on success write `/workspace/.molecule/images/-.png`, return the path. On 402 → surface "out of credits." +- No keys, no cap, no metering in the plugin. -### 3.3 Per-workspace cap (fail-closed) -- A per-workspace **image budget** (default value + unit TBD — see open questions), stored CP-side (e.g., a small `image_budgets` table or a column). -- **Default-ON** for all workspaces; org/admin can raise/lower/disable (disable = cap 0). -- **Enforcement**: before each *platform-metered* generation, the server checks remaining budget (CP endpoint, or injected budget + server counter synced to CP). Over budget → **fail-closed** ("image budget reached — raise the cap or add your own API key"). **BYOK calls skip the check.** -- **Decrement**: each successful platform generation reports usage → CP decrements + records attribution. +### 3.4 BYOK (optional, likely defer) +Under the proxy model, BYOK = the proxy accepts a caller-supplied vendor key and skips the credit-debit for that call (own cost). Clean to add later; **propose deferring from v1** unless you want it now — the proxy/credits path is the product default and BYOK adds a passthrough-auth path. (Open question.) -### 3.4 Attribution / metering (reuse CP#752) -- Every platform generation emits: `{workspace_id, org_id, vendor, model, image_count, size, est_cost, ts}`. -- Stored in the CP cost-attribution store — image-gen is a new **`service` dimension** alongside LLM. -- `est_cost` from a per-vendor/model/size **image price table**. **CRITICAL:** the table MUST include every image model — an uncosted model fails open to $0 and goes invisible (exactly the opus-leak failure). New image models are blocked from the platform path until priced. -- Dashboard: image spend per workspace/org (CP#752 WS3). 
+## 4. Where each piece lives +| Piece | Repo | Notes | +|---|---|---| +| image routes, vendor routing, vendor keys, pricing, credit-debit | **CP / the platform proxy** | the bulk of the work; extends existing billing | +| image SKUs in the price catalog | CP | unpriced = rejected (no fail-open) | +| thin MCP adaptor + tools + output | `molecule-ai-plugin-image-gen` | trivial | -### 3.5 Entitlement -- **Default-ON within the cap** — no opt-in step (per CTO direction: platform-metered by default). The **cap is the control**; org/admin adjusts it. Optional org-level hard toggle is an open question. - -### 3.6 Where each piece lives -| Piece | Repo | -|---|---| -| cred resolution, cap pre-check, usage reporting, fail-closed | plugin server (`@molecule-ai/mcp-image-gen`) | -| key injection (Infisical-sourced, restart-durable) | core (mirror `conciergePlatformMCPEnv`) | -| cap store + endpoints, attribution store + image price table + dashboard | CP (extend CP#752) | - -## 4. Security -- Keys never in the plugin repo; injected from Infisical SSOT; plugin references env only. -- GCP SA scoped to Vertex/Gemini image (least privilege). -- BYOK keys = encrypted per-workspace secrets; never logged. -- Cap is fail-closed (the cost-leak guardrail). -- Uncosted image model ⇒ blocked from the platform path (no fail-open $0). - -## 5. Cost-leak program mapping -- **WS1 attribution** — per-workspace usage events ✓ -- **WS2 fail-closed** — per-workspace hard cap ✓ -- **WS3 dashboard** — image spend visibility ✓ -- **WS5 caching** — dedup identical prompt+params (future) +## 5. Security / cost-safety +- Vendor keys live only in the proxy (Infisical-sourced); never in the plugin or workspace env. +- Billed via credits → an org can only spend what it has (+ overage cap) — intrinsic limit, no runaway. +- Unpriced image model ⇒ rejected at the proxy (the explicit anti-opus-leak rule). ## 6. Rollout -1. Image price table + attribution schema (image `service` dimension). -2. 
Cap store + CP endpoints. -3. Key injection (core, Infisical-sourced, restart-durable). -4. Plugin server: cred resolution + cap pre-check + usage reporting. -5. Default cap value + org-adjust surface. -6. Staging e2e: platform gen → attributed + decrements; cap-exceeded fail-closes; BYOK bypasses; restart preserves injection. +1. Image SKUs + pricing in the catalog (block unpriced). +2. Proxy image routes + OpenAI + Gemini vendor handlers + credit-debit. +3. `@molecule-ai/mcp-image-gen` adaptor + plugin repo + register. +4. Staging e2e: platform image gen debits credits + writes to /workspace; out-of-credits → 402 surfaced; (if BYOK) bypass works. ## 7. Open questions (for your review) -1. **Default cap value + unit** — images/day, $/day, or a credits-equiv? Starting number? -2. **Org-level hard toggle** in addition to the cap (some orgs may want platform image-gen fully off, separate from "cap 0")? -3. **Vertex AI vs Gemini Developer API** on project `gen-lang-client-0607853535` (auth + endpoint differ) — confirm which the SA is wired for. -4. **Cap enforcement model** — live CP check per call (latency, accuracy) vs injected-budget + periodic sync (low latency, mild staleness)? -5. **Image price source** — static maintained table vs a fetched catalog; who maintains it? +1. **Does the platform proxy already have a non-chat-completions extension point**, or do image routes need new plumbing? (It's LLM/chat-shaped today; images are a different request/response.) +2. **Vertex AI vs Gemini Developer API** on `gen-lang-client-0607853535` (the proxy's Gemini handler auth/endpoint). +3. **Image pricing** — where do the per-vendor/model/size SKUs come from (static maintained table vs fetched)? +4. **BYOK in v1, or defer?** (Proxy-passthrough-key vs not.) +5. **Output** — workspace-file + path (current plan) still right, or also return an inline/preview form? ## 8. Alternatives considered -- **BYOK-only** — simplest, but worse UX and ignores existing platform creds. 
Rejected (CTO wants platform-metered default). -- **Attribution only, no cap** — rejected (the opus cost-leak class). -- **Route images through the existing LLM proxy** — images aren't chat-completions; a dedicated image service + attribution dimension is cleaner than overloading the LLM proxy. +- **Per-plugin keys + cap + attribution** (rev 1 of this RFC) — rejected: rebuilds billing the proxy already does; caps are unnecessary when credits are the limit. +- **Plugin calls vendors directly (no proxy)** — rejected: scatters keys + billing across workspaces; the proxy centralizes both. -- 2.52.0 From 2ba7da3155084f09ca0447fb132fa0e4d22a3120 Mon Sep 17 00:00:00 2001 From: core-devops Date: Sun, 21 Jun 2026 02:49:59 +0000 Subject: [PATCH 3/6] =?UTF-8?q?docs(rfc):=20rev=203=20=E2=80=94=20fold=20i?= =?UTF-8?q?n=20CTO=20answers=20(price=20x1.5,=20defer=20BYOK,=20return-URL?= =?UTF-8?q?=20output)=20+=20proxy/Vertex=20findings?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Q1: proxy is per-wire-format (add ProxyImages like /v1/responses) + image billing is count/size not tokens + needs image storage->URL. Q2: Vertex NOT available (SA 404 on aiplatform; AI-Studio project) -> recommend Gemini Developer API key. Q5: tool returns a download URL; agent places it; no forced /workspace write. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 120 +++++++++--------- 1 file changed, 58 insertions(+), 62 deletions(-) diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md index ea313183d..10ae0513b 100644 --- a/docs/design/rfc-image-gen-platform-metered.md +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -1,85 +1,81 @@ -# RFC: Image generation via the platform proxy (credits-billed; plugin is a thin adaptor) +# RFC: Image generation via the platform proxy (credits-billed, no caps; plugin = thin adaptor) - **Status:** Draft — for CTO review (do not build until approved) -- **Date:** 2026-06-20 (rev 2 — re-scoped per CTO: no caps, credits-billed, proxy-fronted) -- **Scope:** how image generation is billed + routed. The real work is **extending the platform proxy** with image support; the plugin is a thin adaptor. +- **Date:** 2026-06-20 (rev 3 — folds in CTO answers + proxy/Vertex investigation) +- **Scope:** how image generation is routed + billed. The work is **a new image handler on the platform proxy**; the plugin is a thin adaptor. -## 1. Model (the corrected architecture) +## 1. Model -Image generation is **uncapped** and **consumes the org's platform credits**, exactly like platform-managed LLM usage. It rides the **existing platform proxy** (the CP proxy that already fronts platform-managed model calls and is wired into billing/credits). The proxy is the single source of truth for billed vendor calls. +Image generation is **uncapped** and **consumes the org's platform credits**, like platform-managed LLM. It rides the **existing CP LLM proxy** (`internal/handlers/llm_proxy.go` + `internal/credits` billing + the price catalog + fail-closed + attribution — all already built and tested). 
``` -agent → mcp-image-gen (thin MCP adaptor) → platform proxy (CP) → {OpenAI | Gemini} - │ - └─ debits org credits via the billing system +agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | Gemini} + │ │ + │ └─ stores image → returns a download URL + └─ prices (vendor × 1.5) + debits org credits ``` -- **The proxy** holds the vendor keys (OpenAI from Infisical SSOT, Gemini `GCP_SERVICE_ACCOUNT_JSON`), routes by vendor, prices the call, and **debits credits**. Out of credits / over overage-cap → the proxy rejects (402), the same way every other platform-managed call already behaves. **No per-image cap** — the credit balance + `overage_cap_credits` is the limit. -- **The plugin** (`molecule-ai-plugin-image-gen`) is a **thin adaptor**: exposes `generate_image` / `edit_image` MCP tools, forwards them to the proxy's image endpoint using the workspace's already-injected platform auth (same base-url + token the platform injects for platform-managed LLM), and writes the returned image to `/workspace`. It holds **no keys, no metering, no cap logic.** +- **The proxy** holds vendor keys (Infisical SSOT), routes by vendor, prices the call, **debits credits**, **stores the generated image, and returns a download URL**. Out of credits / over `overage_cap_credits` → 402 (same as today). **No per-image cap** — credits are the limit. +- **The plugin** (`molecule-ai-plugin-image-gen`) is a thin adaptor: `generate_image` / `edit_image` MCP tools → call the proxy → **return the download URL** to the agent. It holds no keys, no billing, no storage. -This is deliberately NOT per-plugin machinery — caps, key injection, attribution all live in the proxy/billing system that already exists, so image-gen is just a new billed route, not a new billing subsystem. +## 2. CTO decisions (locked) + findings -## 2. Goals / Non-goals - -**Goals** -- Image gen works for any workspace out of the box, billed to platform credits via the proxy. 
-- One central place (the proxy) owns vendor keys, routing, pricing, and credit-debit. -- The plugin is a trivial adaptor — vendor-pluggable changes happen in the proxy. -- No new cap/attribution subsystem — reuse credits + the proxy's existing billing path. - -**Non-goals** -- Per-workspace image caps (explicitly dropped — credits are the limit). -- Per-plugin key injection / per-plugin metering (the proxy owns these). -- The plugin's tool schema / output handling (trivial; separately tracked). +- **Caps:** none. Credits are the limit. *(locked)* +- **Pricing (Q3):** **fetched vendor price × 1.5** service fee → debited from credits. Stored as image SKUs in the price catalog; **unpriced model = rejected** (no fail-open $0 — the opus-leak rule). *(locked)* +- **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)* +- **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)* +- **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)* +- **Gemini path (Q2 — CONFIRMED NEGATIVE):** the platform SA (`molecule-provisioner@gen-lang-client-0607853535`) mints a token but **Vertex AI returns 404 (API not enabled on that AI-Studio project)** and the **Gemini Developer API returns 403 (wants an API key, not SA-OAuth)**. So **we do NOT currently have working Vertex.** → **needs a decision** (see §5). ## 3. Design -### 3.1 Proxy: add image routes -- Add image endpoints to the platform proxy (e.g. 
`POST /v1/images/generations`, `/v1/images/edits`, or a unified `/v1/images` with a `vendor`/`model` param). -- **Vendor routing**: `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini 2.5 Flash Image ("Nano Banana") via the platform GCP SA. Adding a vendor = a new route handler in the proxy. -- **Vendor keys**: held by the proxy, sourced from Infisical SSOT (OpenAI key; Gemini SA). Never leave the proxy. -- **Auth from the plugin**: the workspace's existing platform token (the proxy already authenticates platform-managed calls per workspace/org — reuse it to identify who to bill). +### 3.1 Proxy: new `/v1/images` handler +- `ProxyImages(c)` — mirror the `ProxyOpenAIResponses` precedent. Accept a unified body `{prompt, vendor, model, size, n, image?(for edit)}`. +- **Vendor routing:** `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini "Nano Banana" (`gemini-2.5-flash-image`) via whichever auth §5 resolves. New vendor = new branch. +- **Keys:** held by the proxy, Infisical-sourced; never leave the proxy. +- **Principal:** reuse the proxy's existing per-workspace/org auth to know who to bill. -### 3.2 Billing: credits debit (the cost-leak guard lives here) -- Each image call is **priced** (per vendor/model/size) and **debited from org credits** through the existing billing system (`credits_balance` → `overage_used_credits` up to `overage_cap_credits`). -- **CRITICAL (opus-cost-leak lesson):** image models MUST be in the price catalog. An unpriced model is **rejected**, never passed through at $0 — that fail-open (opus-4-8 missing from `llm_price_catalog`) is exactly what made the June 10 leak invisible. Extend the catalog with image SKUs; block unpriced models from the platform route. -- **Limit = credits**, not a cap: when an org is out of credits / over `overage_cap_credits`, the proxy returns 402 and the plugin surfaces "out of image credits — top up." (Same UX as other platform-managed exhaustion.) 
-- Attribution comes for free — the proxy already records per-workspace/org spend; image becomes a `service`/`sku` dimension on the existing ledger. +### 3.2 Billing: image SKUs + credits (the cost-leak guard) +- Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`. +- Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger. +- **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images). +- Limit = credits; out → 402 surfaced by the plugin as "out of image credits." -### 3.3 Plugin (thin adaptor) -- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` (npx `@molecule-ai/mcp-image-gen`). -- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:path|url, vendor?, …)`, `list_image_models()`. -- Each tool → `POST {PLATFORM_PROXY_BASE}/v1/images/...` with the platform auth → on success write `/workspace/.molecule/images/-.png`, return the path. On 402 → surface "out of credits." -- No keys, no cap, no metering in the plugin. +### 3.3 Image storage → download URL (new) +- The proxy stores each generated image (object store / signed-URL bucket) and returns a **time-boxed download URL** in the response. +- The agent fetches it (to `/workspace` or anywhere) and decides what to do (send to user, etc.). Retention/expiry of the stored image: open (default e.g. 24h signed URL). -### 3.4 BYOK (optional, likely defer) -Under the proxy model, BYOK = the proxy accepts a caller-supplied vendor key and skips the credit-debit for that call (own cost). 
Clean to add later; **propose deferring from v1** unless you want it now — the proxy/credits path is the product default and BYOK adds a passthrough-auth path. (Open question.) +### 3.4 Plugin (thin adaptor) +- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` → npx `@molecule-ai/mcp-image-gen`. +- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:url|path, vendor?, …)`, `list_image_models()`. +- Each tool → POST the proxy `/v1/images` with the platform auth → **return `{url, vendor, model, expires_at}`** to the agent. On 402 → "out of credits." No keys/billing/storage in the plugin. ## 4. Where each piece lives -| Piece | Repo | Notes | -|---|---|---| -| image routes, vendor routing, vendor keys, pricing, credit-debit | **CP / the platform proxy** | the bulk of the work; extends existing billing | -| image SKUs in the price catalog | CP | unpriced = rejected (no fail-open) | -| thin MCP adaptor + tools + output | `molecule-ai-plugin-image-gen` | trivial | +| Piece | Repo | +|---|---| +| `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk | +| thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial | -## 5. Security / cost-safety -- Vendor keys live only in the proxy (Infisical-sourced); never in the plugin or workspace env. -- Billed via credits → an org can only spend what it has (+ overage cap) — intrinsic limit, no runaway. -- Unpriced image model ⇒ rejected at the proxy (the explicit anti-opus-leak rule). +## 5. The one open decision (Q2 fallout) +Vertex isn't available as-is. Pick the Gemini path: +- **(A) Gemini Developer API + `GEMINI_API_KEY`** — standard for `gemini-2.5-flash-image`; need to confirm a key exists in Infisical or mint one in `gen-lang-client-0607853535`. **Lowest effort. 
Recommended.** +- **(B) Enable Vertex AI** on the project + grant the SA `Vertex AI User` → use Vertex via the SA. More infra; only worth it if you specifically want Vertex (quota/region/SLA reasons). -## 6. Rollout -1. Image SKUs + pricing in the catalog (block unpriced). -2. Proxy image routes + OpenAI + Gemini vendor handlers + credit-debit. -3. `@molecule-ai/mcp-image-gen` adaptor + plugin repo + register. -4. Staging e2e: platform image gen debits credits + writes to /workspace; out-of-credits → 402 surfaced; (if BYOK) bypass works. +(OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.) -## 7. Open questions (for your review) -1. **Does the platform proxy already have a non-chat-completions extension point**, or do image routes need new plumbing? (It's LLM/chat-shaped today; images are a different request/response.) -2. **Vertex AI vs Gemini Developer API** on `gen-lang-client-0607853535` (the proxy's Gemini handler auth/endpoint). -3. **Image pricing** — where do the per-vendor/model/size SKUs come from (static maintained table vs fetched)? -4. **BYOK in v1, or defer?** (Proxy-passthrough-key vs not.) -5. **Output** — workspace-file + path (current plan) still right, or also return an inline/preview form? +## 6. Remaining smaller open items +- Image **storage backend** + URL expiry default (24h?). +- Image **price source** to feed the ×1.5 (vendor pricing page → static-maintained vs fetched). +- `n>1` / batch semantics. + +## 7. Rollout +1. Resolve §5 (Gemini path) + confirm/mint the key. +2. Image SKUs + ×1.5 pricing in the catalog (block unpriced). +3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL. +4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register. +5. Staging e2e: gen debits credits + returns a working URL; out-of-credits → 402; edit works. ## 8. 
Alternatives considered -- **Per-plugin keys + cap + attribution** (rev 1 of this RFC) — rejected: rebuilds billing the proxy already does; caps are unnecessary when credits are the limit. -- **Plugin calls vendors directly (no proxy)** — rejected: scatters keys + billing across workspaces; the proxy centralizes both. +- Per-plugin keys/cap/attribution (rev 1) — rejected; rebuilds what the proxy/credits already do. +- Plugin writes to `/workspace` (rev 2) — superseded by Q5: return a URL, let the agent place it. +- Plugin calls vendors directly — rejected; scatters keys + billing. -- 2.52.0 From 9463f074f8a16af89099d2ba8e80a5e563bf04df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?= Date: Sun, 21 Jun 2026 03:24:05 +0000 Subject: [PATCH 4/6] RFC image-gen: Q2 resolved + verified live (Vertex gemini-2.5-flash-image 200, 1290 tok/image); per-vendor billing-unit note Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md index 10ae0513b..3d0e47b9c 100644 --- a/docs/design/rfc-image-gen-platform-metered.md +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -25,7 +25,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | - **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)* - **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)* - **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). 
Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)* -- **Gemini path (Q2 — CONFIRMED NEGATIVE):** the platform SA (`molecule-provisioner@gen-lang-client-0607853535`) mints a token but **Vertex AI returns 404 (API not enabled on that AI-Studio project)** and the **Gemini Developer API returns 403 (wants an API key, not SA-OAuth)**. So **we do NOT currently have working Vertex.** → **needs a decision** (see §5). +- **Gemini path (Q2 — RESOLVED 2026-06-20):** enable **Vertex AI / "Gemini Enterprise Agent Platform"** (`aiplatform.googleapis.com`) on project **`molecules-ai-proxy`** (the billed proxy project, NOT the AI-Studio `gen-lang-client-*` project) + grant a dedicated SA the **Agent Platform User** role (`roles/aiplatform.user` — note the rebrand; the old "Vertex AI User" title is gone). The proxy authenticates with an **SA JSON key** stored in Infisical (`GCP_VERTEX_SA_JSON`). *(locked)* + - **Org-policy note:** the org enforces **`iam.disableServiceAccountKeyCreation`** AND disallows API keys (secure-by-default), so both simple credential paths are blocked org-wide. v1 uses a **project-scoped exception** to that constraint for `molecules-ai-proxy` only (one long-lived key, in Infisical). **Hardening follow-up:** migrate to **Workload Identity Federation** (no key) or run the Gemini-calling path on a **GCP-attached SA** (Cloud Run, ADC) once the proxy has an OIDC token source — both keep the org policy intact. Tracked as a post-v1 item. ## 3. Design @@ -37,6 +38,7 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | ### 3.2 Billing: image SKUs + credits (the cost-leak guard) - Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`. 
+- **Billing unit differs per vendor (finding):** OpenAI GPT Image 2 bills **count/size** (per-image SKU); **Gemini-2.5-flash-image on Vertex bills token-based — 1290 tokens per generated image** (Vertex meters it as a `generateContent` call). So the Gemini branch slots into the existing **token-billing** path (the same machinery as text), while OpenAI needs the new count/size SKU. Both apply the **×1.5** service fee. - Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger. - **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images). - Limit = credits; out → 402 surfaced by the plugin as "out of image credits." @@ -56,10 +58,10 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | | `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk | | thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial | -## 5. The one open decision (Q2 fallout) -Vertex isn't available as-is. Pick the Gemini path: -- **(A) Gemini Developer API + `GEMINI_API_KEY`** — standard for `gemini-2.5-flash-image`; need to confirm a key exists in Infisical or mint one in `gen-lang-client-0607853535`. **Lowest effort. Recommended.** -- **(B) Enable Vertex AI** on the project + grant the SA `Vertex AI User` → use Vertex via the SA. More infra; only worth it if you specifically want Vertex (quota/region/SLA reasons). +## 5. 
Q2 resolved + VERIFIED LIVE — Gemini path (was open) +**Decision (2026-06-20):** Vertex on `molecules-ai-proxy` via a dedicated SA (`vertex-ai-user@molecules-ai-proxy`) with **Agent Platform User** (`roles/aiplatform.user`), authenticated by an **SA JSON key in Infisical** under a **project-scoped exception** to `iam.disableServiceAccountKeyCreation`. Rejected at this stage: Gemini Developer API + API key (org disallows API keys). Hardening follow-up (WIF / GCP-attached SA) noted in §2. + +**Verified live 2026-06-20** — real call, SA key → `cloud-platform` scoped OAuth token → `POST .../locations/global/publishers/google/models/gemini-2.5-flash-image:generateContent` with `generationConfig.responseModalities:["IMAGE"]` → **HTTP 200**, returned an `inlineData` `image/png` part. `usageMetadata`: `promptTokenCount=13`, **`candidatesTokenCount=1290` (IMAGE modality)**, `totalTokenCount=1303`, `trafficType=ON_DEMAND`. Endpoint host: `aiplatform.googleapis.com` (location `global`). (OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.) @@ -69,7 +71,7 @@ Vertex isn't available as-is. Pick the Gemini path: - `n>1` / batch semantics. ## 7. Rollout -1. Resolve §5 (Gemini path) + confirm/mint the key. +1. ~~Resolve §5 (Gemini path) + confirm/mint the key.~~ **DONE + verified live 2026-06-20.** Remaining: move the SA key into Infisical SSOT (`GCP_VERTEX_SA_JSON`) so the Railway proxy can read it (currently only on the operator/local). 2. Image SKUs + ×1.5 pricing in the catalog (block unpriced). 3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL. 4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register. 
-- 2.52.0 From 9accd8d5aa90cc96f21b5507c8a05a75a1f8ad40 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?= Date: Sun, 21 Jun 2026 03:55:17 +0000 Subject: [PATCH 5/6] =?UTF-8?q?RFC=20image-gen:=20Q2=20FINAL=20=E2=80=94?= =?UTF-8?q?=20reuse=20existing=20molecule-vertex=20keyless=20WIF=20(retire?= =?UTF-8?q?=20molecules-ai-proxy=20SA-key=20detour);=20reflects=20CP=20#88?= =?UTF-8?q?0?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md index 3d0e47b9c..88047c324 100644 --- a/docs/design/rfc-image-gen-platform-metered.md +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -25,8 +25,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | - **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)* - **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. *(locked)* - **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. 
*(finding)* -- **Gemini path (Q2 — RESOLVED 2026-06-20):** enable **Vertex AI / "Gemini Enterprise Agent Platform"** (`aiplatform.googleapis.com`) on project **`molecules-ai-proxy`** (the billed proxy project, NOT the AI-Studio `gen-lang-client-*` project) + grant a dedicated SA the **Agent Platform User** role (`roles/aiplatform.user` — note the rebrand; the old "Vertex AI User" title is gone). The proxy authenticates with an **SA JSON key** stored in Infisical (`GCP_VERTEX_SA_JSON`). *(locked)* - - **Org-policy note:** the org enforces **`iam.disableServiceAccountKeyCreation`** AND disallows API keys (secure-by-default), so both simple credential paths are blocked org-wide. v1 uses a **project-scoped exception** to that constraint for `molecules-ai-proxy` only (one long-lived key, in Infisical). **Hardening follow-up:** migrate to **Workload Identity Federation** (no key) or run the Gemini-calling path on a **GCP-attached SA** (Cloud Run, ADC) once the proxy has an OIDC token source — both keep the org policy intact. Tracked as a post-v1 item. +- **Gemini path (Q2 — RESOLVED 2026-06-20, REVISED after code review):** the proxy **already** serves platform Gemini via Vertex on project **`molecule-vertex`** using a **keyless AWS→GCP Workload Identity Federation** mint (`internal/vertexauth.Token`, SA `molecule-vertex-adc@molecule-vertex`). Image gen **reuses that exact path** — no new credential, no new project. Image calls hit the **native `:generateContent`** endpoint (`responseModalities:["IMAGE"]`) at location `global`; text uses the OpenAI-compat surface. *(locked + built — CP #880)* + - **Detour retired:** an earlier revision set up an SA **key** on a separate `molecules-ai-proxy` project (the org blocks `iam.disableServiceAccountKeyCreation` + API keys, so it needed a scoped policy exception). That was redundant — the codebase already does keyless WIF, which IS the hardening target. 
The SA key, the Infisical secret, and the policy exception are being removed; the SA/exception cleanup is a GCP-console action for the owner. ## 3. Design @@ -58,10 +58,10 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | | `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk | | thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial | -## 5. Q2 resolved + VERIFIED LIVE — Gemini path (was open) -**Decision (2026-06-20):** Vertex on `molecules-ai-proxy` via a dedicated SA (`vertex-ai-user@molecules-ai-proxy`) with **Agent Platform User** (`roles/aiplatform.user`), authenticated by an **SA JSON key in Infisical** under a **project-scoped exception** to `iam.disableServiceAccountKeyCreation`. Rejected at this stage: Gemini Developer API + API key (org disallows API keys). Hardening follow-up (WIF / GCP-attached SA) noted in §2. +## 5. Q2 resolved — Gemini path (was open) +**Final decision (2026-06-20):** reuse the **existing keyless `molecule-vertex` WIF path** the proxy already uses for Gemini text (`internal/vertexauth.Token`). Image gen targets the native `gemini-2.5-flash-image:generateContent` endpoint at location `global`. **Zero new credentials.** Built in CP #880. -**Verified live 2026-06-20** — real call, SA key → `cloud-platform` scoped OAuth token → `POST .../locations/global/publishers/google/models/gemini-2.5-flash-image:generateContent` with `generationConfig.responseModalities:["IMAGE"]` → **HTTP 200**, returned an `inlineData` `image/png` part. `usageMetadata`: `promptTokenCount=13`, **`candidatesTokenCount=1290` (IMAGE modality)**, `totalTokenCount=1303`, `trafficType=ON_DEMAND`. Endpoint host: `aiplatform.googleapis.com` (location `global`). 
+**Verified twice:** (a) the model + request/response shape was proven live 2026-06-20 — `:generateContent` with `responseModalities:["IMAGE"]` → HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). (b) The WIF path itself is the same one already serving Gemini text in prod. One deploy-time check remains: that `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — confirmed by the staging e2e (the WIF mint is AWS-identity-bound, not locally exercisable). (OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.) @@ -71,7 +71,8 @@ agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | - `n>1` / batch semantics. ## 7. Rollout -1. ~~Resolve §5 (Gemini path) + confirm/mint the key.~~ **DONE + verified live 2026-06-20.** Remaining: move the SA key into Infisical SSOT (`GCP_VERTEX_SA_JSON`) so the Railway proxy can read it (currently only on the operator/local). +1. ~~Resolve §5 (Gemini path).~~ **DONE** — reuse existing `molecule-vertex` WIF; no credential work. (Detour SA key/secret/policy-exception being removed.) +1b. **CP #880** (proxy handler + image SKUs + storage→URL + billing) — open, in review. Inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) are set. 2. Image SKUs + ×1.5 pricing in the catalog (block unpriced). 3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL. 4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register. 
-- 2.52.0 From 3c83d5da7cd35df7e063919a2c7668dcb3ce0a0c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20core-devops?= Date: Sun, 21 Jun 2026 04:16:34 +0000 Subject: [PATCH 6/6] RFC rev4: re-scope to a GENERIC two-tier plugin proxy socket (capabilities = data); image gen = first Tier-A consumer; supersedes bespoke ProxyImages Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/design/rfc-image-gen-platform-metered.md | 167 +++++++++++------- 1 file changed, 104 insertions(+), 63 deletions(-) diff --git a/docs/design/rfc-image-gen-platform-metered.md b/docs/design/rfc-image-gen-platform-metered.md index 88047c324..ce6df7d1a 100644 --- a/docs/design/rfc-image-gen-platform-metered.md +++ b/docs/design/rfc-image-gen-platform-metered.md @@ -1,84 +1,125 @@ -# RFC: Image generation via the platform proxy (credits-billed, no caps; plugin = thin adaptor) +# RFC: Plugin proxy socket — a generic metered egress primitive (two-tier registry); image generation = first consumer -- **Status:** Draft — for CTO review (do not build until approved) -- **Date:** 2026-06-20 (rev 3 — folds in CTO answers + proxy/Vertex investigation) -- **Scope:** how image generation is routed + billed. The work is **a new image handler on the platform proxy**; the plugin is a thin adaptor. +- **Status:** Draft — for CTO review (do not build the generic socket until the two-tier shape is signed off) +- **Date:** 2026-06-20 (rev 4 — re-scoped from a bespoke image handler to a generic socket after design review) +- **Supersedes:** the bespoke `ProxyImages` approach in **CP #880** (that PR is gutted down to the migration + the socket's first capability entry — see §10). -## 1. Model +## 1. Why this changed shape -Image generation is **uncapped** and **consumes the org's platform credits**, like platform-managed LLM. 
It rides the **existing CP LLM proxy** (`internal/handlers/llm_proxy.go` + `internal/credits` billing + the price catalog + fail-closed + attribution — all already built and tested). +The first cut added a bespoke `ProxyImages` handler to the LLM proxy. That's an **addon**: do it again for video, TTS, embeddings, rerank, and the proxy becomes a junk drawer of per-capability handlers — which defeats the point of the plugin system. If we touch core, it should be a **fundamental, generic primitive built once**, not another special case. + +The codebase is already moving this way: `providers.yaml` (internal#718 / vertex-provider-ssot-endpoint) pulled routing facts (upstream URL, auth mode, wire prefix) **out of hardcoded Go into a registry**, and its own comment says *"Phase 2 migrates the remaining static providers."* This RFC is that endgame: a **generic, registry-driven metered egress socket** that any plugin uses, where adding a capability is **data, not code**. + +## 2. The primitive: one metered egress socket + +Core exposes ONE generic path. A plugin calls it; everything sensitive resolves server-side; the plugin only ever receives what is safe to hand out. ``` -agent → mcp-image-gen (thin MCP adaptor) → CP proxy /v1/images → {OpenAI | Gemini} - │ │ - │ └─ stores image → returns a download URL - └─ prices (vendor × 1.5) + debits org credits +plugin ──> POST /internal/llm/proxy {capability/model, request} + │ CORE (generic — built once): + │ 1. AUTH: workspace↔org handshake ← already exists + │ 2. RESOLVE capability from the REGISTRY ← providers.yaml SSOT + │ 3. INJECT vendor credential server-side ← key-env | wif (registry auth_mode) + │ 4. FORWARD to the registry-declared upstream + │ 5. METER usage via registry-declared paths → debit credits (existing) + │ 6. 
RETURN per registry response_mode, stripped of anything unsafe + └──> {safe output} (never a key; never raw upstream internals) ``` -- **The proxy** holds vendor keys (Infisical SSOT), routes by vendor, prices the call, **debits credits**, **stores the generated image, and returns a download URL**. Out of credits / over `overage_cap_credits` → 402 (same as today). **No per-image cap** — credits are the limit. -- **The plugin** (`molecule-ai-plugin-image-gen`) is a thin adaptor: `generate_image` / `edit_image` MCP tools → call the proxy → **return the download URL** to the agent. It holds no keys, no billing, no storage. +### 2.1 Trust model (why the box never holds a vendor key) +The plugin runs on the tenant's workspace box, where the tenant has **root**. The box holds an **org-scoped credential** (the org/admin token + workspace id, read from workspace env) and uses it to handshake with CP (`resolveLLMProxyPrincipal` + `workspaceBelongsToOrg`, already in core). The box does **NOT** hold the vendor key. -## 2. CTO decisions (locked) + findings +Blast radius of a compromised box is therefore **asymmetric and bounded**: +- attacker can spend **that one org's** credits (≤ balance + overage cap) — the tenant's own loss; +- the platform's **master vendor keys stay in CP**, and **every other org is untouched** — no unmetered global abuse, no key exfiltration. -- **Caps:** none. Credits are the limit. *(locked)* -- **Pricing (Q3):** **fetched vendor price × 1.5** service fee → debited from credits. Stored as image SKUs in the price catalog; **unpriced model = rejected** (no fail-open $0 — the opus-leak rule). *(locked)* -- **BYOK (Q4):** **deferred** from v1. v1 is proxy/credits only. *(locked)* -- **Output (Q5):** the tool **returns a download URL**. The agent downloads it wherever it wants and decides whether to send it to the user. The plugin does NOT force a `/workspace` write — it just gives the agent the generate-from-any-vendor ability. 
*(locked)* -- **Proxy shape (Q1 — investigated):** the proxy is **per-wire-format** (`ProxyOpenAIChatCompletions`, `ProxyAnthropicMessages`, `ProxyOpenAIResponses`). Adding images is the **same pattern used to add `/v1/responses` for Codex** — a new `ProxyImages` handler — but it's genuinely new plumbing: image billing is **count/size-based, not tokens**, and it needs **image storage → URL**. *(finding)* -- **Gemini path (Q2 — RESOLVED 2026-06-20, REVISED after code review):** the proxy **already** serves platform Gemini via Vertex on project **`molecule-vertex`** using a **keyless AWS→GCP Workload Identity Federation** mint (`internal/vertexauth.Token`, SA `molecule-vertex-adc@molecule-vertex`). Image gen **reuses that exact path** — no new credential, no new project. Image calls hit the **native `:generateContent`** endpoint (`responseModalities:["IMAGE"]`) at location `global`; text uses the OpenAI-compat surface. *(locked + built — CP #880)* - - **Detour retired:** an earlier revision set up an SA **key** on a separate `molecules-ai-proxy` project (the org blocks `iam.disableServiceAccountKeyCreation` + API keys, so it needed a scoped policy exception). That was redundant — the codebase already does keyless WIF, which IS the hardening target. The SA key, the Infisical secret, and the policy exception are being removed; the SA/exception cleanup is a GCP-console action for the owner. +That asymmetry is the whole reason keys + billing live in CP and only the handshake lives on the box. (Hardening seam, not v1: make the box token *workspace*-scoped rather than org-admin to shrink the radius further.) -## 3. Design +## 3. Capabilities are data (the registry) -### 3.1 Proxy: new `/v1/images` handler -- `ProxyImages(c)` — mirror the `ProxyOpenAIResponses` precedent. Accept a unified body `{prompt, vendor, model, size, n, image?(for edit)}`. 
-- **Vendor routing:** `openai` → OpenAI Images API (GPT Image 2); `gemini` → Gemini "Nano Banana" (`gemini-2.5-flash-image`) via whichever auth §5 resolves. New vendor = new branch. -- **Keys:** held by the proxy, Infisical-sourced; never leave the proxy. -- **Principal:** reuse the proxy's existing per-workspace/org auth to know who to bill. +A capability is a registry entry (extends `providers.yaml`, the existing SSOT). It declares everything the generic socket needs — no per-capability Go: -### 3.2 Billing: image SKUs + credits (the cost-leak guard) -- Extend the price catalog with **image SKUs** (per vendor/model/size). `est_cost = fetched_vendor_price × 1.5`. -- **Billing unit differs per vendor (finding):** OpenAI GPT Image 2 bills **count/size** (per-image SKU); **Gemini-2.5-flash-image on Vertex bills token-based — 1290 tokens per generated image** (Vertex meters it as a `generateContent` call). So the Gemini branch slots into the existing **token-billing** path (the same machinery as text), while OpenAI needs the new count/size SKU. Both apply the **×1.5** service fee. -- Debit org credits through the existing `internal/credits` path (`credits_balance` → overage up to `overage_cap_credits`). Reuse the existing fail-closed + attribution machinery — image is a new `service`/`sku` dimension on the ledger. -- **Unpriced image model ⇒ rejected** at the proxy (the explicit anti-opus-leak rule; the `llm_price_miss` guard already exists for tokens — extend to images). -- Limit = credits; out → 402 surfaced by the plugin as "out of image credits." 
+```yaml
+- name: gemini-image
+  capability: image
+  tier: platform_metered                  # A (our keys/credits) | byok (their key)
+  upstream: vertex                        # reuses existing auth_mode: wif_adc (keyless WIF mint)
+  endpoint: ":generateContent"            # native image surface (vs the openapi text surface)
+  billing_model: gemini-2.5-flash-image   # → llm_price_catalog row
+  usage:                                  # declarative extraction — no parse func
+    input_path: usageMetadata.promptTokenCount
+    output_path: usageMetadata.candidatesTokenCount
+  response_mode: blob                     # see §5; json | blob
+```

-### 3.3 Image storage → download URL (new)
-- The proxy stores each generated image (object store / signed-URL bucket) and returns a **time-boxed download URL** in the response.
-- The agent fetches it (to `/workspace` or anywhere) and decides what to do (send to user, etc.). Retention/expiry of the stored image: open (default e.g. 24h signed URL).
+Adding image, video, TTS, embeddings → **a registry entry + a price row. Zero new handler.**

-### 3.4 Plugin (thin adaptor)
-- `molecule-ai-plugin-image-gen` (mirrors `molecule-ai-plugin-molecule-platform-mcp`): `plugin.yaml` + `settings-fragment.json` → npx `@molecule-ai/mcp-image-gen`.
-- Tools: `generate_image(prompt, vendor?, model?, size?, n?)`, `edit_image(prompt, image:url|path, vendor?, …)`, `list_image_models()`.
-- Each tool → POST the proxy `/v1/images` with the platform auth → **return `{url, vendor, model, expires_at}`** to the agent. On 402 → "out of credits." No keys/billing/storage in the plugin.
+## 4. Two-tier registry (the marketplace split)

-## 4. Where each piece lives
-| Piece | Repo |
-|---|---|
-| `ProxyImages` handler, vendor routing, keys, image SKUs (×1.5), credit-debit, **image storage→URL** | **CP** (`molecule-controlplane`) — the bulk |
-| thin MCP adaptor + tools (return URL) | `molecule-ai-plugin-image-gen` — trivial |
+A capability entry binds **{ upstream · which credential to inject · the price }**.
Whether an entry can be **third-party-dynamic** depends entirely on *whose credential*:

-## 5. Q2 resolved — Gemini path (was open)
-**Final decision (2026-06-20):** reuse the **existing keyless `molecule-vertex` WIF path** the proxy already uses for Gemini text (`internal/vertexauth.Token`). Image gen targets the native `gemini-2.5-flash-image:generateContent` endpoint at location `global`. **Zero new credentials.** Built in CP #880.
+### Tier A — platform-metered (our keys, our credits): **platform-curated, NOT freely dynamic**
+A free-for-all here is catastrophic: a plugin could declare *"forward to evil.com, inject the platform OpenAI key"* (key exfiltration), *"price = 0"* (billing bypass), or point egress anywhere (SSRF). So entries that spend **our** money with **our** keys — image gen included — are **platform-controlled**: a registry/DB row added through a **vetted onboarding / reviewed change**, never self-served. (Curated ≠ hardcoded-in-Go: it's a trusted config row, but the trust decision is ours.)

-**Verified twice:** (a) the model + request/response shape was proven live 2026-06-20 — `:generateContent` with `responseModalities:["IMAGE"]` → HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). (b) The WIF path itself is the same one already serving Gemini text in prod. One deploy-time check remains: that `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — confirmed by the staging e2e (the WIF mint is AWS-identity-bound, not locally exercisable).
+### Tier B — BYOK (the plugin brings its own key): **dynamically self-registrable**
+Nothing of ours is at stake — the third party's credential, their cost. So a third-party plugin **can register its own capability dynamically.** CP still proxies it (egress control + observability) but injects the **plugin's** key and **does not debit platform credits**.
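
The Tier A / Tier B split can be read as a registration-time policy check. Below is a minimal Python sketch under stated assumptions, not the control plane's actual code: the field names `upstream_url`, `credential_source`, and `price_per_unit`, the `submitted_by` argument, and the host allowlist are all hypothetical and exist only to illustrate the three Tier A failure modes named above (key exfiltration, price = 0, arbitrary egress).

```python
from urllib.parse import urlparse

# Hypothetical allowlist of upstream hosts vetted for platform-key injection.
CURATED_UPSTREAM_HOSTS = {"api.openai.com", "aiplatform.googleapis.com"}


def validate_registration(entry, submitted_by):
    """Return (ok, reason) for a proposed capability entry (illustrative only)."""
    tier = entry.get("tier")
    if tier == "platform_metered":
        # Tier A spends platform money with platform keys: never self-served.
        if submitted_by != "platform":
            return False, "tier A entries are platform-curated, not self-served"
        # Guards against key exfiltration and SSRF via an arbitrary upstream.
        host = urlparse(entry.get("upstream_url", "")).hostname
        if host not in CURATED_UPSTREAM_HOSTS:
            return False, "upstream host not vetted for platform credentials"
        # Guards against the "price = 0" billing bypass.
        if entry.get("price_per_unit", 0) <= 0:
            return False, "tier A entries must carry a positive price"
        return True, "ok"
    if tier == "byok":
        # Tier B may self-register, but only with the plugin's own credential.
        if entry.get("credential_source") != "plugin":
            return False, "tier B must inject the plugin's own key"
        return True, "ok"
    return False, "unknown tier"


def debits_platform_credits(entry):
    """Only Tier A usage is debited from the org's platform credits."""
    return entry["tier"] == "platform_metered"
```

In this sketch a third-party plugin that submits a `platform_metered` entry is refused before its upstream or price is even inspected, while a `byok` entry passes as long as it carries its own credential. Where such a check would actually run (registry load, onboarding review, or request time) is left open here.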
-(OpenAI GPT Image 2 is unaffected — proxy uses the platform OpenAI key from Infisical.) +The **marketplace scales through Tier B** (dynamic, self-served, ~10K plugins/day — see the marketplace RFC `project_marketplace_private_template_delivery`); **Tier A** stays a small curated set of platform-subsidized capabilities. One socket serves both; the only differences are *whose credential is injected* and *whether platform credits are debited*. (Tier B is a designed-in seam in v1, not necessarily shipped day one.) -## 6. Remaining smaller open items -- Image **storage backend** + URL expiry default (24h?). -- Image **price source** to feed the ×1.5 (vendor pricing page → static-maintained vs fetched). -- `n>1` / batch semantics. +## 5. The only genuinely-new core primitives (built once, reused forever) -## 7. Rollout -1. ~~Resolve §5 (Gemini path).~~ **DONE** — reuse existing `molecule-vertex` WIF; no credential work. (Detour SA key/secret/policy-exception being removed.) -1b. **CP #880** (proxy handler + image SKUs + storage→URL + billing) — open, in review. Inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) are set. -2. Image SKUs + ×1.5 pricing in the catalog (block unpriced). -3. `ProxyImages` handler + OpenAI + Gemini routing + credit-debit + image storage→URL. -4. `@molecule-ai/mcp-image-gen` thin adaptor + plugin repo + register. -5. Staging e2e: gen debits credits + returns a working URL; out-of-credits → 402; edit works. +1. **Declarative usage extraction** — a registry-declared JSON path per token bucket (`input_path`, `output_path`, `cached_path`, …). Retires the `parseOpenAIUsage` / `parseAnthropicUsage` / `parseOpenAIResponsesUsage` sprawl; a new vendor's metering becomes config, not Go. +2. **A small fixed set of response modes** — how the socket returns the upstream result safely: + - `json` — passthrough the (sanitized) JSON (text/chat/responses/embeddings). + - `blob` — the upstream returns binary (image/audio). 
Two sub-modes:
+     - `blob_url` — CP stores the bytes (R2) and returns a time-boxed **presigned URL** (uniform across vendors; the agent just gets a link).
+     - `blob_passthrough` — CP returns the bytes to the plugin; the plugin writes them into the workspace. Keeps core thinnest; output is a workspace file, not a hosted URL.
+   - **Open decision (D1):** default response_mode for images — `blob_url` (uniform URL, +R2 in core) vs `blob_passthrough` (thinnest core, file path out). Recommend `blob_url` for a clean agent UX, behind a per-capability flag so a capability can choose.

-## 8. Alternatives considered
-- Per-plugin keys/cap/attribution (rev 1) — rejected; rebuilds what the proxy/credits already do.
-- Plugin writes to `/workspace` (rev 2) — superseded by Q5: return a URL, let the agent place it.
-- Plugin calls vendors directly — rejected; scatters keys + billing.
+Everything else (auth, credential injection by `auth_mode`, forwarding, the credits debit, the fail-closed price gate) **already exists** — the socket wires the existing pieces generically.
+
+## 6. Image generation = first Tier-A consumer (the concrete instance)
+
+Image gen proves the primitive. As registry entries (Tier A, our keys/credits):
+
+- **`google/gemini-2.5-flash-image`** ("Nano Banana") — `upstream: vertex`, reuses the **existing keyless `molecule-vertex` WIF mint** (`internal/vertexauth.Token`) the proxy already uses for Gemini text. Native `:generateContent`, `responseModalities:["IMAGE"]`, location `global`. Verified live 2026-06-20: HTTP 200, `inlineData image/png`, `usageMetadata.candidatesTokenCount=1290` (the 1290 tok/image basis). **Zero new credentials.**
+- **`openai/gpt-image-2`** — `upstream: openai`, platform OpenAI key (Infisical). Via the OpenAI image surface (Images API, or the Responses API image tool the proxy already proxies — chosen at build time).
+- **Pricing (migration):** image SKUs in `llm_price_catalog` at **vendor list × 1.5** (markup baked into the row).
Both vendors meter token-based (Gemini 1290 tok/image; gpt-image-2 token-based), so the existing per-token columns fit with no schema change. Unpriced image model → **422 pre-serve** (the anti-$0-leak gate). Anti-free-serve: if a vendor omits usage, synthesize the known output-token count so the debit always fires.
+- **Output:** per §5 response_mode (D1).
+
+## 7. Billing (unchanged path)
+Reuses `recordProxiedLLMUsage` → `ChargeLLMUsage` → `DebitWithOverage`: meter (declarative) → price-catalog lookup → debit org credits → overage up to cap → 402 when exhausted. Image gen is **uncapped**; credits are the only limit. Tier B (BYOK) records non-billable usage (observability) and debits nothing.
+
+## 8. What lives where (footprint)
+| Piece | Where | Size |
+|---|---|---|
+| Generic socket (auth→resolve→inject→forward→meter→respond) | **CP** core | built once |
+| Declarative usage extraction + response modes (json/blob) | **CP** core | built once |
+| Each capability (image, video, …) | **registry entry + price row** | data |
+| The plugin (`molecule-ai-plugin-image-gen` etc.) | plugin repo | thin: call socket → hand result to agent |
+
+## 9. The plugin (thin, unchanged in spirit)
+`molecule-ai-plugin-image-gen` — `plugin.yaml` + `settings-fragment.json` + a small MCP adaptor exposing `generate_image` / `edit_image` / `list_image_models`. Each tool reads workspace env (org/workspace id + handshake token), POSTs the socket, and returns the result (URL or file path per D1). No keys, no billing, no storage, no vendor-specific logic.
+
+## 10. Relationship to CP #880
+CP #880 (bespoke `ProxyImages` + R2 wiring + per-vendor parsers + tests) is **superseded**. Keep from it: **migration 055** (image price rows) and the verified vendor request/response shapes (they become the `gemini-image` / `gpt-image-2` registry entries + the `blob` response_mode).
Drop: the bespoke handler, the hardcoded per-vendor parse funcs, the standalone storage wiring (folds into `response_mode: blob_url`). Net: #880 shrinks to the migration; the rest re-lands as the generic socket.
+
+## 11. Rollout
+1. **Generic socket alongside the existing text handlers** (do NOT converge text in the same change — don't destabilize the live text path). New capabilities route through the socket; chat/completions/messages/responses keep working as-is.
+2. Declarative usage extraction + response modes (`json`, `blob`).
+3. Tier-A image capability entries (`gemini-image`, `gpt-image-2`) + price rows. Inert until `MOLECULE_IMAGE_GEN_BUCKET` (+ R2 creds) set, if `blob_url`.
+4. Thin `molecule-ai-plugin-image-gen`.
+5. Staging e2e: image gen debits credits + returns output; out-of-credits → 402; unpriced → 422; (edit works on gemini).
+6. **Follow-ups (designed-in, not v1):** Tier-B BYOK dynamic registration; converge the text handlers onto the socket (internal#718 Phase 2); workspace-scoped box token.
+
+## 12. Open decisions
+- **D1 (§5):** default image `response_mode` — `blob_url` (recommended) vs `blob_passthrough`.
+- **D2 (§11.1):** confirm "socket alongside, converge text later" (recommended) vs converge text now.
+- **D3 (§6):** OpenAI image via Images API vs the already-proxied Responses image tool — pick at build (favor the one needing least new egress surface).
+- **D4 (deploy):** confirm `molecule-vertex` has `gemini-2.5-flash-image` enabled (same API surface as the gemini-2.5-pro/flash it already serves) — proven by staging e2e; the WIF mint is AWS-identity-bound, not locally exercisable.
+
+## 13. Alternatives considered
+- **Bespoke per-capability handlers** (`ProxyImages`, future `ProxyVideo`, …) — rejected: addon sprawl, defeats the plugin system. (This RFC's whole motivation.)
+- **Plugin calls the vendor directly** — rejected: vendor keys on a root-accessible tenant box = the keyless-Vertex billing leak the codebase already closed; self-reported usage is forgeable.
+- **Separate SA key on `molecules-ai-proxy`** (rev 3) — rejected/retired: the proxy already does keyless `molecule-vertex` WIF; the SA key + org-policy exception were a redundant detour (Infisical secret deleted; owner GCP-console cleanup pending).
+- **Single platform god-token for all capabilities** — rejected: no per-seller isolation/entitlement; conflicts with the marketplace RFC. Hence the two-tier split.
-- 
2.52.0
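
Reviewer sketch (illustrative only, not part of the patch): the declarative usage extraction in §5 and the fail-closed price gate in §6 could look roughly like the Python below. The registry-entry layout, the `fallback_output_tokens` field, the helper names, and the use of `usageMetadata.promptTokenCount` as the input path are assumptions made for this example. Only the `input_path` / `output_path` / `cached_path` keys, `usageMetadata.candidatesTokenCount`, the 1290 tok/image basis, and the 422-on-unpriced behaviour come from the RFC text. The real implementation would live in CP (Go).

```python
from typing import Any, Dict, Optional


class UnpricedModelError(Exception):
    """Raised before serving; a caller would map this to HTTP 422."""


# Hypothetical registry entry: a JSON path per token bucket, plus a known
# output-token count to synthesize when the vendor omits usage.
GEMINI_IMAGE_ENTRY: Dict[str, Any] = {
    "model": "google/gemini-2.5-flash-image",
    "usage": {
        "input_path": "usageMetadata.promptTokenCount",
        "output_path": "usageMetadata.candidatesTokenCount",
    },
    "fallback_output_tokens": 1290,  # the 1290 tok/image basis from the RFC
}


def resolve_path(doc: Any, path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; None if any key is missing."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_usage(entry: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, int]:
    """Read each token bucket via its declared path; no per-vendor parser."""
    usage: Dict[str, int] = {}
    for bucket in ("input", "output", "cached"):
        path = entry["usage"].get(f"{bucket}_path")
        value = resolve_path(response, path) if path else None
        usage[bucket] = value if isinstance(value, int) else 0
    # Anti-free-serve: if the vendor omitted output usage, synthesize the
    # known count so the debit always fires.
    if usage["output"] == 0:
        usage["output"] = entry.get("fallback_output_tokens", 0)
    return usage


def require_price(catalog: Dict[str, Dict[str, float]], model: str) -> Dict[str, float]:
    """Fail closed: an unpriced model is rejected before any upstream call."""
    row = catalog.get(model)
    if row is None:
        raise UnpricedModelError(model)
    return row
```

With this shape, onboarding a new vendor's metering is a new entry rather than a new parse function, and a model missing from the catalog is refused instead of being served at $0.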