docs: fix remote-workspaces-faq CLI commands and update staging doc

[technical-writer-agent] - remote-workspaces-faq.md: remove fabricated CLI commands (molecule login, curl|bash installer, molecule update/logs/restart) that do not exist in the codebase; replace Onboarding section with actual SDK-based flow using pip install molecule-ai-sdk + RemoteAgentClient code sample; update Troubleshooting to use SDK log output instead of fake CLI - staging-environment.md: update status from "Planned" to "In Progress"; document CI image pipeline (staging-<sha>, staging-latest) as live; note Railway/Neon/Vercel items are tracked in molecule-controlplane
2026-05-10 12:07:05 +00:00
2 changed files with 90 additions and 32 deletions
--- a/docs/architecture/staging-environment.md
+++ b/docs/architecture/staging-environment.md
@ -1,10 +1,12 @@
 # Staging Environment Design

-> **Status:** Planned — gates all future infra changes (Tunnel migration,
-> security fixes, etc.)
+> **Status:** In Progress — Phase 36. Partially implemented. The image pipeline
+> (`:staging-<sha>`, `:staging-latest` tags on ECR) is live. Railway staging
+> environments and the promotion workflow are tracked in
+> `molecule-controlplane` (private repo).
 >
 > **Problem:** We merge directly to main and auto-deploy to production.
-> Today's session broke CI twice and caused hours of Cloudflare edge cache
+> The 2026-04-17 session broke CI twice and caused hours of Cloudflare edge cache
 > issues because there was no staging to test infra changes first.
 >
 > **Goal:** Full staging environment that mirrors production. Every change
@ -53,6 +55,28 @@ Developer pushes to PR branch

 ## Components

+### 0. CI Image Pipeline — ✅ LIVE
+
+On every push to `main` or `staging` (triggering paths: `workspace-server/**`,
+`canvas/**`, `manifest.json`, `scripts/**`), the Gitea Actions workflow
+(`.gitea/workflows/publish-workspace-server-image.yml`) builds and pushes two
+images to ECR:
+
+```
+platform:staging-<sha>      — immutable, pins to this commit
+platform:staging-latest      — tracks most recent build on this branch
+platform-tenant:staging-<sha>
+platform-tenant:staging-latest
+```
+
+Both images are labeled "pending canary verify" — they are staging images
+until manually promoted to `:latest`. See the workflow file for the full
+pre-clone step (manifest deps → `.tenant-bundle-deps/`), ECR auth, and build
+args.
+
+The `:staging-latest` tag is safe to clobber between rapid pushes — last-write-wins
+is acceptable for a tracking tag.
+
 ### 1. Railway: two environments

 Railway supports multiple environments per project. Create a `staging`
@ -195,15 +219,16 @@ Until the automated workflow is built:

 ## Implementation order

-1. **Railway staging environment** — create + configure vars (~30 min)
-2. **Neon staging branch** — create from main (~5 min)
-3. **Staging DNS** — `staging.api.moleculesai.app` CNAME to Railway (~5 min)
-4. **Publish workflow** — push `:staging` tag instead of `:latest` (~15 min)
+1. **Publish workflow** — ✅ DONE. `.gitea/workflows/publish-workspace-server-image.yml`
+   pushes `:staging-<sha>` + `:staging-latest` on every `main`/`staging` push.
+2. **Railway staging environment** — in `molecule-controlplane` (private)
+3. **Neon staging branch** — in `molecule-controlplane` (private)
+4. **Staging DNS** — `staging.api.moleculesai.app` CNAME to Railway (~5 min)
 5. **Promotion workflow** — manual trigger to promote staging → production (~30 min)
 6. **Vercel staging** — configure preview deployment URL (~15 min)
 7. **Staging smoke test** — automated test after staging deploy (~30 min)

-**Total:** ~2.5 hours for full staging pipeline.
+**Done in public repo:** items 1. **Remaining:** items 2–7 (tracked in `molecule-controlplane`).

 ## Cost

--- a/docs/guides/remote-workspaces-faq.md
+++ b/docs/guides/remote-workspaces-faq.md
@ -1,7 +1,7 @@
 # Phase 30 Remote Workspaces — Customer FAQ

 > **Cycle:** Marketing work cycle — offline content prep
-> **Status:** Draft — needs review from Marketing Lead and Doc Specialist before publishing
+> **Status:** Live — updated 2026-05-10 to reflect actual onboarding path

 Top customer and sales-engineer questions about Phase 30 Remote Workspaces, answered in a format ready to drop into the docs site or adapt for the support team.

@ -11,11 +11,11 @@ Top customer and sales-engineer questions about Phase 30 Remote Workspaces, answ

 **Q: What's the difference between a "container" workspace and a "remote" workspace?**

-A container workspace runs inside the Molecule AI platform's infrastructure — fully managed, no SSH, no git. A remote workspace runs on your own machine or VM, connected to the platform via a lightweight agent. You control the environment (OS, packages, git config, SSH keys); the platform handles orchestration, authentication, and agent coordination.
+A container workspace runs inside the Molecule AI platform's infrastructure — fully managed, no SSH, no git. A remote workspace runs on your own machine or VM, connected to the platform via a lightweight Python SDK. You control the environment (OS, packages, git config, SSH keys); the platform handles orchestration, authentication, and agent coordination.

 **Q: Do remote workspaces still appear in the Canvas UI?**

-Yes. Remote workspaces register with the platform on startup and appear in Canvas exactly like managed workspaces — online/offline status, workspace name, current task. The platform doesn't care where the agent runs, only that it's reachable.
+Yes. Remote workspaces register with the platform on startup and appear in Canvas exactly like managed workspaces — online/offline status, workspace name, current task. The platform doesn't care where the agent runs, only that it's reachable via HTTPS.

 **Q: Can I run both container and remote workspaces in the same org?**

@ -23,7 +23,7 @@ Yes — in fact that's the primary pattern. A fleet might have 5 container works

 **Q: What does the remote runtime actually install on my machine?**

-The agent binary (~30MB) plus a minimal bootstrap script. No root required. The agent connects to `wss://[your-org].moleculesai.app`, authenticates with your org token, and registers its A2A endpoint. That's it — no VPN, no firewall holes beyond outbound HTTPS.
+The `molecule-ai-sdk` Python package (~1MB, only `requests` as a dependency). The SDK wraps all Phase 30 protocol calls. Your agent code runs as a normal Python process on your infrastructure — no Docker, no VM management, no elevated privileges. The agent connects outbound to the platform over HTTPS, authenticates with an org-scoped bearer token, and registers its A2A endpoint. That's it — no VPN, no inbound firewall holes beyond outbound HTTPS.

 ---

@ -31,15 +31,15 @@ The agent binary (~30MB) plus a minimal bootstrap script. No root required. The

 **Q: How does the platform authenticate a remote workspace?**

-Remote workspaces authenticate with an org-scoped bearer token (not a personal token). The platform validates the token against the tenant and provisions a session-scoped credential for A2A communication. If the remote machine is revoked from the org, the token is invalidated and the workspace goes offline within one heartbeat cycle (~15s).
+Remote workspaces authenticate with a workspace-scoped bearer token. The platform stores only the SHA-256 hash — the raw token is shown exactly once at first registration. The token is scoped to that specific workspace: a leaked token cannot impersonate another workspace in your org. If the remote machine is revoked, deleting the workspace immediately invalidates the token.

 **Q: Can a remote workspace make outbound connections my firewall would block?**

-The agent only makes outbound HTTPS/WSS connections to the platform. It does not accept inbound connections. Your firewall only needs to allow `*.moleculesai.app` outbound — same as a browser.
+The SDK only makes outbound HTTPS calls to the platform. It does not accept inbound connections. Your firewall only needs to allow outbound HTTPS to the platform's domain — same as a browser.

 **Q: What happens to data if the remote workspace is disconnected or the machine is wiped?**

-Workspace state lives in the platform unless explicitly persisted. For remote workspaces, you can attach a Cloudflare Artifacts repo to snapshot state to disk on your own infrastructure. If the agent reconnects, it re-registers and Canvas picks up where it left off.
+Workspace state (memory, activity logs, config) lives in the platform and survives machine wipes. If the agent reconnects, it re-registers and Canvas picks up where it left off. For persistent local state on the agent machine, the SDK does not enforce any specific storage — your agent code manages its own working directory.

 **Q: Are remote workspaces covered by the same MCP governance controls as container workspaces?**

@ -51,26 +51,59 @@ Yes. MCP plugin allowlists, org API key auditing, and workspace-level audit logs

 **Q: How do I get started with a remote workspace?**

-1. Install the agent: `curl -sSL https://get.moleculesai.app | bash`
-2. Authenticate: `molecule login --org your-org`
-3. Bootstrap: `molecule workspace init --name my-agent --runtime remote`
-4. The workspace registers with the platform and appears in Canvas within ~10 seconds.
+1. **Install the SDK:** `pip install molecule-ai-sdk`
+2. **Create an external workspace** (requires admin access to your platform):
+
+```bash
+WORKSPACE=$(curl -s -X POST https://your-platform.example.com/workspaces \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"name":"my-agent","runtime":"external","tier":2}')
+WORKSPACE_ID=$(echo $WORKSPACE | jq -r '.id')
+echo $WORKSPACE_ID   # save this — needed by the agent
+```
+
+3. **Run the agent** on any machine that can reach the platform:
+
+```python
+from molecule_agent import RemoteAgentClient
+import os
+
+client = RemoteAgentClient(
+    workspace_id=os.environ["WORKSPACE_ID"],
+    platform_url=os.environ["PLATFORM_URL"],
+    agent_card={"name": "my-agent", "skills": ["research"]},
+)
+client.register()           # issues + caches bearer token
+secrets = client.pull_secrets()   # fetch workspace secrets
+print("Secrets:", list(secrets.keys()))
+
+# Heartbeat loop — keeps workspace visible on Canvas
+client.run_heartbeat_loop()
+```
+
+4. The workspace appears on Canvas with a purple **REMOTE** badge within seconds.
+
+For the full protocol reference (direct HTTP, Node.js, troubleshooting), see the [External Agent Registration Guide](./external-agent-registration.md).

 **Q: Can I use my existing SSH keys and git config with a remote workspace?**

-Yes. The remote runtime does not virtualize or override your shell environment. SSH keys, git config, dotfiles — all persist across sessions and are available to the agent.
+Yes. The remote SDK does not virtualize or override your shell environment. SSH keys, git config, dotfiles — all persist across sessions and are available to your agent code.

-**Q: How do I update the remote agent when a new version ships?**
+**Q: How do I update the remote agent when a new SDK version ships?**

-`molecule update` — pulls the latest agent binary from the platform, does a rolling restart. Zero downtime if the agent reconnects within the heartbeat window.
+```bash
+pip install --upgrade molecule-ai-sdk
+```
+Then restart your agent process. Zero downtime if the agent reconnects within the heartbeat window (~30s).

 **Q: What's the latency like for A2A coordination between a remote workspace and a container workspace?**

-A2A messages route through the platform's relay, so latency is essentially internet RTT between the remote machine and the platform's edge (~20–80ms depending on geography). For comparison, container workspaces on-platform have <5ms RTT. The practical difference for most coordination patterns is imperceptible.
+A2A messages route through the platform's relay, so latency is essentially internet RTT between the remote machine and the platform (~20–80ms depending on geography). For comparison, container workspaces on-platform have <5ms RTT. The practical difference for most coordination patterns is imperceptible.

 **Q: Can I run a remote workspace on a machine that's behind NAT with no public IP?**

-Yes. The agent initiates the outbound WebSocket connection to the platform — no inbound ports needed. This is the primary design reason remote workspaces use WSS rather than HTTP.
+Yes. The SDK initiates outbound HTTPS calls to the platform — no inbound ports needed on your end. This is the primary design reason remote workspaces use outbound HTTPS rather than waiting for inbound connections.

 ---

@ -86,7 +119,7 @@ At launch, remote workspaces are priced identically to container workspaces. Fut

 **Q: What's the maximum concurrent task throughput for a single remote workspace?**

-Same as a container workspace — up to 5 concurrent delegated tasks. Remote runtime adds no throughput cap.
+Same as a container workspace — up to 5 concurrent delegated tasks. The remote SDK adds no throughput cap.

 ---

@ -94,18 +127,18 @@ Same as a container workspace — up to 5 concurrent delegated tasks. Remote run

 **Q: Remote workspace shows offline in Canvas but the process is running on my machine.**

-1. Check the agent log: `molecule logs --workspace my-agent`
-2. Confirm the machine has outbound internet access: `curl -s https://[your-org].moleculesai.app/health`
-3. Check token validity: `molecule auth status` — re-authenticate if expired
-4. Restart the agent: `molecule restart --workspace my-agent`
+1. Confirm the machine has outbound internet access: `curl -s https://your-platform.example.com/health`
+2. Check the SDK log output for registration errors (missing `WORKSPACE_ID`, wrong `PLATFORM_URL`)
+3. Verify the bearer token is valid — re-register with `client.register()` to confirm
+4. Check network path: `curl -v -X POST https://your-platform.example.com/registry/heartbeat` with the token

 **Q: A2A messages to my remote workspace are timing out.**

-Remote workspaces must maintain the outbound WebSocket connection. If the machine sleeps or loses connectivity, the connection drops and A2A messages queue for up to 5 minutes before failing. The agent will re-register on reconnect — Canvas will show it back online.
+The agent must call `/registry/heartbeat` every 30 seconds to stay online. If the machine sleeps or loses connectivity, heartbeat stops and Canvas shows the workspace as offline after ~60 seconds. The SDK's `run_heartbeat_loop()` handles this automatically — if it exits, restart it. On reconnect, the agent re-registers and Canvas returns to online.

 **Q: My remote workspace is online but can't reach internal APIs.**

-The remote runtime does not inherit VPN credentials from the machine by default. If internal APIs require VPN, you'll need to either configure the VPN on the host machine outside the agent, or use the platform's `/cp/*` reverse proxy for same-origin access (same-origin-canvas-fetches.md).
+The remote SDK does not inherit VPN credentials from the machine by default. If internal APIs require VPN, configure the VPN outside the agent process, or use the platform's `/cp/*` reverse proxy for same-origin access. See [same-origin-canvas-fetches](./same-origin-canvas-fetches.md) for details.

 ---

@ -121,4 +154,4 @@ Modal and Railway are inference platforms — they run your code on their infras

 ---

-*Needs review from: Marketing Lead (voice + accuracy), Doc Specialist (technical accuracy), possibly Support for the troubleshooting section.*
+*Technical accuracy review: Technical Writer — 2026-05-10. Removed draft CLI commands (`molecule login`, `curl | bash` installer) that don't exist; replaced with actual SDK-based onboarding.*