External-runtime workspaces (registered via molecule connect, behind
NAT, with no public callback URL) currently get HTTP 422 "workspace has
no callback URL" on every chat file upload. The only escape is to wrap
the laptop in an ngrok / Cloudflare tunnel and re-register in push
mode, a tax that shouldn't exist for a one-line use case.
This phase introduces the platform-side staging layer that lets
canvas → external workspace uploads ride the same poll loop the inbox
already uses for text messages.
Architecture (mirrors the inbox poll, SSOT principle):

Canvas POST /chat/uploads (multipart)
    ↓ delivery_mode=poll
Platform: chat_files.uploadPollMode
    ↓ pendinguploads.Storage.Put + LogActivity(chat_upload_receive)
Workspace's existing inbox poller picks up the activity row (Phase 2)
Workspace fetches: GET /workspaces/:id/pending-uploads/:fid/content
Workspace acks: POST /workspaces/:id/pending-uploads/:fid/ack
Pieces in this PR:
* Migration 20260505100000 — pending_uploads table; partial indexes
on unacked + expires_at for the workspace fetch + Phase 3 sweep
hot paths. No FK to workspaces (audit retention), 24h hard TTL.
* internal/pendinguploads — Storage interface + Postgres impl. Bytes
inline (bytea) today; the interface lets a future PR replace with
S3 (RFC #2789) by swapping one constructor. 100% test coverage on
the Postgres impl via sqlmock-pinned SQL.
* handlers.PendingUploadsHandler — GET /content + POST /ack endpoints.
wsAuth-gated; cross-workspace bleed protection via per-row
workspace_id check (token leak from A can't read B's pending bytes).
Handler tests pin happy path + every 4xx/5xx mapping including
cross-workspace + race-with-sweep.
* chat_files.go — Upload poll-mode branch behind WithPendingUploads
builder. Push-mode unchanged (regression-tested). Multipart parse
+ per-file sanitize + storage.Put + activity_logs row per file.
* SanitizeFilename — Go mirror of workspace/internal_chat_uploads.py
sanitize_filename. Tests pin parity case-by-case so canvas-emitted
URIs stay identical regardless of which path handles the upload.
* Comprehensive logging — every state transition (staged, fetch,
ack, error) emits a structured log line with workspace_id +
file_id + size + sanitized name. Phase 3 metrics will hook these.
The pendinguploads.Storage wiring is opt-in (WithPendingUploads on
ChatFilesHandler) so a binary deployed without the migration keeps the
pre-existing 422 behavior — no boot-order coupling between code roll
and schema roll.
Phase 2 (separate PR): workspace inbox extension — inbox_uploads.py
fetches via the GET endpoint, writes to /workspace/.molecule/chat-
uploads/, acks, and rewrites the URI from platform-pending: → workspace:
so the agent's existing send-attachments path needs no changes.
Phase 3: GC sweep + dashboards. Phase 4: poll-mode E2E on staging.
Tests:
* 100% coverage on pendinguploads (sqlmock-pinned SQL drift gate).
* Functional 100% on new handler code (uncovered branches are
documented defensive duplicates: uuid re-parse, multipart Open
error, Writer.Write fail — none reproducible in unit tests).
* Push-mode + NULL delivery_mode regression tests pin no behavior
change for existing workspaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
104 lines
5.1 KiB
SQL
-- 20260505100000_pending_uploads.up.sql
--
-- RFC: poll-mode chat upload (counterpart to delivery_mode='poll' messaging).
--
-- Today, chat_files.go's Upload handler refuses delivery_mode != 'push'
-- with HTTP 422 "workspace has no callback URL" — external runtime
-- workspaces (laptop / behind NAT) cannot receive file attachments at all.
-- The only escape was "register with ngrok / Cloudflare tunnel + push
-- mode," which forces every external operator into infra plumbing they
-- shouldn't need.
--
-- This table is the platform-side staging layer that lets canvas → external
-- workspace file uploads ride the same poll loop the inbox already uses for
-- text messages:
--
-- 1. Canvas POSTs multipart to workspace-server.
-- 2. workspace-server parses the multipart, stores each file as one
--    pending_uploads row, AND inserts a matching activity_logs row
--    (type='chat_upload_receive', request_body={file_id, filename, ...}).
-- 3. Workspace's existing inbox poller picks up the activity row.
-- 4. Workspace fetches bytes via GET /workspaces/:id/pending-uploads/:fid/content,
--    writes to /workspace/.molecule/chat-uploads/, ACKs via POST.
-- 5. Sweep cron deletes rows past expires_at OR acked_at + N hours.
--
-- Why a separate table and not bytea-on-activity_logs:
--
-- * activity_logs is text/JSON-shaped today; mixing 25 MB binary blobs
--   into request_body inflates every JOIN, every since_id scan, every
--   pg_dump. The bytes need their own home.
-- * Lifecycle differs: activity_logs is durable audit history (90d+);
--   pending_uploads is a transient buffer (24h default) that GCs hard.
--   Keeping them split lets each table's retention policy run
--   independently.
-- * A future PR (RFC #2789) will migrate the bytes column to S3 keys
--   without touching the activity_logs schema or the metadata columns
--   here. That migration is one ALTER + one backfill rather than a
--   cross-table rewrite.
--
-- No FK to workspaces:
-- workspace delete should NOT cascade-purge pending_uploads — those
-- rows are evidence-of-receipt and should expire on their own TTL.
-- Same posture as tenant_resources (PR #2343) and delegations (PR #2829).

CREATE TABLE IF NOT EXISTS pending_uploads (
    -- Server-generated so the canvas can include the URI in the chat
    -- message it sends right after the upload POST. Workspace fetches
    -- by this id; no name collisions across workspaces.
    file_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Target workspace. NOT a FK (see header).
    workspace_id uuid NOT NULL,

    -- Content lives inline today via bytea. The Go-side storage interface
    -- (PendingUploadStorage) abstracts read/write so a future PR can
    -- relocate this column's job to S3 (RFC #2789) by adding an `s3_key
    -- text NULL` column, dual-writing for one release, then dropping
    -- `content` once the backfill drains. The CHECK below pins the same
    -- 25 MB per-file cap the workspace-side ingest_handler enforces
    -- (workspace/internal_chat_uploads.py:198) — a discrepancy between
    -- the two would let the platform accept files the workspace would
    -- 413 on after pull.
    content bytea NOT NULL,
    size_bytes bigint NOT NULL CHECK (size_bytes > 0 AND size_bytes <= 26214400),

    -- Filename + mimetype mirror the workspace-side ChatUploadedFile
    -- shape so the eventual InboxMessage hand-off needs no translation.
    -- Filename is sanitized at write time (matches sanitize_filename in
    -- workspace/internal_chat_uploads.py); the 100 char cap is the same.
    filename text NOT NULL CHECK (length(filename) > 0 AND length(filename) <= 100),
    mimetype text NOT NULL DEFAULT '',

    created_at timestamptz NOT NULL DEFAULT now(),

    -- Stamped on the GET /content request. Lets the Phase 3 sweeper detect
    -- "fetched but never acked" — a distinct failure mode from "never
    -- fetched" (workspace offline) so dashboards can split them.
    fetched_at timestamptz,

    -- Stamped on the POST /ack request. Terminal state for the happy
    -- path. Sweep cron deletes acked rows past acked_at + retention.
    acked_at timestamptz,

    -- Hard TTL: rows past this are deleted regardless of ack state.
    -- 24h matches the longest-observed legitimate "operator stepped
    -- away from laptop" gap; tunable later via app-level config without
    -- a migration. NOT acked_at + 24h — that would let a stuck-fetched
    -- row live forever.
    expires_at timestamptz NOT NULL DEFAULT (now() + interval '24 hours')
);

-- Hot path: the workspace's poll cycle pulls "give me my unacked uploads
-- in chronological order." Partial index because acked rows are GC
-- candidates and shouldn't bloat the working set.
CREATE INDEX IF NOT EXISTS idx_pending_uploads_workspace_unacked
    ON pending_uploads (workspace_id, created_at)
    WHERE acked_at IS NULL;

-- Phase 3 GC sweep hot path: list rows past expires_at, partial-indexed
-- on unacked because acked rows have a different (shorter) retention
-- and GC-via-acked_at is a separate query.
CREATE INDEX IF NOT EXISTS idx_pending_uploads_expires
    ON pending_uploads (expires_at)
    WHERE acked_at IS NULL;