docs: fix remote-workspaces-faq, update staging-environment, document WCAG 2.4.7 (closes #309) #337

Closed
fullstack-engineer wants to merge 8 commits from fix/docs-309-remote-faq-staging-env into main
8 changed files with 132 additions and 21 deletions

View File

@ -32,11 +32,9 @@ on:
- '.gitea/workflows/publish-workspace-server-image.yml'
workflow_dispatch:
# Serialize per-branch so two rapid staging pushes don't race the same
# :staging-latest tag retag. Allow staging and main to run in parallel
# (different GITHUB_REF → different concurrency group) since they
# produce different :staging-<sha> tags and last-write-wins on
# :staging-latest is acceptable across branches.
# Serialize per-branch so two rapid main pushes don't race the same
# :staging-latest tag retag. Allow parallel runs as they produce
# different :staging-<sha> tags and last-write-wins on :staging-latest.
#
# cancel-in-progress: false → in-flight builds finish; the next push's
# build queues. This avoids a partially-pushed image.
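The comment corresponds to a `concurrency` block roughly like this (a sketch; the group name is an assumption, not copied from the workflow):

```yaml
concurrency:
  # One group per ref: pushes to the same branch serialize,
  # while staging and main can still run in parallel.
  group: publish-workspace-server-image-${{ github.ref }}
  cancel-in-progress: false  # in-flight builds finish; the next push queues
```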

.staging-trigger Normal file
View File

@ -0,0 +1 @@
staging trigger

View File

@ -1,7 +1,7 @@
# Staging Environment Design
> **Status:** Planned — gates all future infra changes (Tunnel migration,
> security fixes, etc.)
> **Status:** In Progress — CI image pipeline documented below; remaining
> components (Railway, Neon, Vercel staging) tracked separately.
>
> **Problem:** We merge directly to main and auto-deploy to production.
> Today's session broke CI twice and caused hours of Cloudflare edge cache
@ -51,6 +51,24 @@ Developer pushes to PR branch
→ Promote to PRODUCTION (manual trigger or approval)
```
## CI Image Pipeline
The CI image pipeline is live. On every push to `main`, the Gitea Actions workflow
[`.gitea/workflows/publish-workspace-server-image.yml`](https://git.moleculesai.app/molecule-ai/molecule-core/blob/main/.gitea/workflows/publish-workspace-server-image.yml)
builds and pushes two ECR images:
| Tag | Meaning |
|-----|---------|
| `:staging-<sha>` | Per-commit digest. Stable — canary verify runs against this tag. |
| `:staging-latest` | Tracks the most recent `main` build. |
Images are pushed to:
- `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform` (workspace-server)
- `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant` (Go + Next.js in one image)
The `:staging-latest` tag advances automatically on every `main` push; `:staging-<sha>` is immutable.
Canary verification jobs reference `:staging-<sha>` so they pin to the commit they are testing.
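Because `:staging-latest` can advance mid-run while `:staging-<sha>` never moves, a canary job should construct and pull the pinned tag. A minimal sketch (variable names are illustrative; the registry matches the ECR repo above):

```shell
REGISTRY="153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform"
SHA="abc1234"                          # the commit under verification

PINNED="${REGISTRY}:staging-${SHA}"    # immutable, safe to verify against
LATEST="${REGISTRY}:staging-latest"    # mutable, may advance during the run

echo "verifying image: ${PINNED}"
```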
## Components
### 1. Railway: two environments

View File

@ -303,6 +303,7 @@ type ResolvedTheme = "light" | "dark";
### 5.1 Focus Management ✅ VERIFIED
- All interactive elements have `focus-visible:ring-2 focus-visible:ring-blue-500 focus-visible:ring-offset-2 focus-visible:ring-offset-zinc-950`
- **WCAG 2.4.7 Focus Visible:** The canvas uses `focus-visible` (not `:focus`) so the ring only appears for keyboard users, not mouse/touch. Mouse users see hover states but no permanent ring. Keyboard focus always shows a 2px blue ring offset from the element — visible against all canvas backgrounds (dark zinc-950 + dark zinc-900 surfaces).
- No `outline-none` without equivalent focus ring
- Radix Dialog traps focus automatically

View File

@ -51,10 +51,42 @@ Yes. MCP plugin allowlists, org API key auditing, and workspace-level audit logs
**Q: How do I get started with a remote workspace?**
1. Install the agent: `curl -sSL https://get.moleculesai.app | bash`
2. Authenticate: `molecule login --org your-org`
3. Bootstrap: `molecule workspace init --name my-agent --runtime remote`
4. The workspace registers with the platform and appears in Canvas within ~10 seconds.
1. Install the SDK: `pip install molecule-ai-sdk`
2. Create an external workspace on the platform (admin step — requires `ADMIN_TOKEN` or org API key):
```bash
PLATFORM_URL="https://acme.moleculesai.app"   # your platform URL
ADMIN_TOKEN="your-admin-token"

WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-agent","runtime":"external","external":true}')
WORKSPACE_ID=$(echo "$WORKSPACE" | jq -r '.id')
echo "Workspace ID: ${WORKSPACE_ID}"
```
3. Register the agent and start the heartbeat loop:
```python
import os

from molecule_agent import RemoteAgentClient

client = RemoteAgentClient(
    workspace_id=os.environ["WORKSPACE_ID"],
    platform_url=os.environ["PLATFORM_URL"],
    agent_card={"name": "my-agent", "skills": []},
)
client.register()  # fetch + cache auth token
client.run_heartbeat_loop(
    task_supplier=lambda: {"current_task": "idle", "active_tasks": 0},
)
```
4. The workspace registers with the platform and appears in Canvas within ~10 seconds with a **purple REMOTE badge**.
For a complete walkthrough with secrets management and A2A messaging, see the [Remote Workspaces quick-start guide](../guides/remote-workspaces.md) and the [External Agent Registration guide](../guides/external-agent-registration.md).
**Q: Can I use my existing SSH keys and git config with a remote workspace?**
@ -62,7 +94,7 @@ Yes. The remote runtime does not virtualize or override your shell environment.
**Q: How do I update the remote agent when a new version ships?**
`molecule update` — pulls the latest agent binary from the platform, does a rolling restart. Zero downtime if the agent reconnects within the heartbeat window.
Stop the current agent process, install the new version (`pip install --upgrade molecule-ai-sdk` for the SDK, or pull the latest agent binary from your deployment mechanism), and restart. The agent re-registers on startup and Canvas picks it up within one heartbeat cycle (~30s).
**Q: What's the latency like for A2A coordination between a remote workspace and a container workspace?**
@ -94,10 +126,20 @@ Same as a container workspace — up to 5 concurrent delegated tasks. Remote run
**Q: Remote workspace shows offline in Canvas but the process is running on my machine.**
1. Check the agent log: `molecule logs --workspace my-agent`
2. Confirm the machine has outbound internet access: `curl -s https://[your-org].moleculesai.app/health`
3. Check token validity: `molecule auth status` — re-authenticate if expired
4. Restart the agent: `molecule restart --workspace my-agent`
1. Confirm the machine has outbound internet access:
```bash
curl -s https://platform.moleculesai.app/health
```
2. Check the agent log output (however your agent writes logs — print statements, a log file, etc.)
3. Verify the agent is still registered:
```bash
curl -s -X POST "https://platform.moleculesai.app/registry/heartbeat" \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"workspace_id\": \"${WORKSPACE_ID}\"}"
```
A `200` response means the heartbeat is reaching the platform; a `401` means the auth token is invalid.
4. Restart the agent: stop the current process and re-run the registration + heartbeat loop above.
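The status-code decision above can be captured in a small helper (hypothetical names; this is not part of the SDK):

```python
def classify_heartbeat(status_code: int) -> str:
    """Map a /registry/heartbeat HTTP status to a next action."""
    if status_code == 200:
        return "ok: heartbeat is reaching the platform"
    if status_code == 401:
        return "re-register: auth token invalid or expired"
    return f"investigate: unexpected status {status_code}"

print(classify_heartbeat(200))
```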
**Q: A2A messages to my remote workspace are timing out.**

View File

@ -44,3 +44,4 @@
{"name": "mock-bigorg", "repo": "molecule-ai/molecule-ai-org-template-mock-bigorg", "ref": "main"}
]
}
// Triggered by Integration Tester at 2026-05-10T08:52Z

View File

@ -37,6 +37,50 @@ PLUGINS_DIR="${4:?Missing plugins dir}"
EXPECTED=0
CLONED=0
# clone_one_with_retry — clone a single repo, retrying on transient failure.
#
# Why: the publish-workspace-server-image (and harness-replays) CI jobs
# clone the full manifest (~36 repos) serially on a memory-constrained
# Gitea Actions runner. Under host memory pressure the OOM killer
# occasionally SIGKILLs git-remote-https mid-clone:
#
# error: git-remote-https died of signal 9
# fatal: the remote end hung up unexpectedly
#
# (observed in publish-workspace-server-image run 4622 on 2026-05-10 — the
# job died on the 14th of 36 clones, which wedged staging→main). One
# transient SIGKILL / network blip would otherwise fail the whole tenant
# image rebuild. Retrying after a short backoff lets the pressure subside.
# The durable fix is more runner RAM/swap (tracked with Infra-SRE); this
# just stops a single flake from being release-blocking.
#
# Args: <target_dir> <name> <clone_url> <display_url> <ref>
clone_one_with_retry() {
  local tdir="$1" name="$2" url="$3" display="$4" ref="$5"
  local attempt=1 max_attempts=3 backoff
  while : ; do
    # A killed attempt can leave a partial directory behind; git clone
    # refuses a non-empty target, so wipe it before each try.
    rm -rf "$tdir/$name"
    if [ "$ref" = "main" ]; then
      if git clone --depth=1 -q "$url" "$tdir/$name"; then return 0; fi
    else
      if git clone --depth=1 -q --branch "$ref" "$url" "$tdir/$name"; then return 0; fi
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "::error::clone failed after ${max_attempts} attempts: ${display}" >&2
      return 1
    fi
    backoff=$((attempt * 3))  # 3s, then 6s
    echo " ⚠ clone attempt ${attempt}/${max_attempts} failed for ${display} — retrying in ${backoff}s" >&2
    sleep "$backoff"
    attempt=$((attempt + 1))
  done
}
clone_category() {
  local category="$1"
  local target_dir="$2"
@ -82,11 +126,7 @@ clone_category() {
    fi
    echo " cloning $display_url -> $target_dir/$name (ref=$ref)"
    if [ "$ref" = "main" ]; then
      git clone --depth=1 -q "$clone_url" "$target_dir/$name"
    else
      git clone --depth=1 -q --branch "$ref" "$clone_url" "$target_dir/$name"
    fi
    clone_one_with_retry "$target_dir" "$name" "$clone_url" "$display_url" "$ref"
    CLONED=$((CLONED + 1))
    i=$((i + 1))
  done

View File

@ -77,6 +77,16 @@ async def delegate_task(workspace_id: str, task: str) -> str:
        return str(result) if isinstance(result, str) else "(no text)"
    elif "error" in data:
        err = data["error"]
        # Handle both string-form errors ("error": "some string")
        # and object-form errors ("error": {"message": "...", "code": ...}).
        msg = ""
        if isinstance(err, dict):
            msg = err.get("message", "")
        elif isinstance(err, str):
            msg = err
        else:
            msg = str(err)
        return f"Error: {msg}"
        msg = ""
        if isinstance(err, dict):
            msg = err.get("message", "")