Commit Graph

9 Commits

Author SHA1 Message Date
fabf45216d [infra-lead-agent] feat(workspace): add /configs/.github-token static-token fallback
Adds an operator escape-hatch fallback to molecule-git-token-helper.sh: if
the platform /github-installation-token endpoint is unreachable AND no
GITHUB_TOKEN/GH_TOKEN env var is set, the helper now reads a static PAT
from ${CONFIGS_DIR:-/configs}/.github-token before exiting with "all token
sources exhausted".

# Why

The 2026-05-08 incident exposed a hard dependency: every workspace's git
and gh CLI operations route through the platform's GitHub App
installation-token endpoint. When that endpoint started returning 500
("token refresh failed", root-caused to missing GITHUB_APP_ID env vars on
the platform side), every workspace lost git+gh auth simultaneously and
there was no operator escape-hatch — the helper exhausted its sources
and exited 1, breaking PR review, merge, and clone across the org.

This change lets infra drop a manually-issued PAT into /configs/.github-token
(agent-writable per /entrypoint.sh chown -R agent:agent /configs) to keep
git ops running while the platform endpoint is being repaired.

# Properties

- Pure additive: no existing fallback step is altered. The chain becomes
  cache > API > env > static > exit 1. Existing env-var users see no
  behavior change (env still wins over static).
- Static path NEVER writes to the cache. When the API recovers, the
  next call sees a stale-cache miss and fills the cache via the API
  path immediately — no 50-min stale-cache stickiness on the workaround.
- Both _fetch_token (git credential helper path) and _refresh_gh
  (gh CLI / daemon path) gain the fallback; otherwise git would work
  but gh would still be unauthenticated.
- Empty static file is rejected (no false-positive). File missing
  is rejected. Whitespace stripped via tr -d '[:space:]'.
- Preserves PR #1552's umask 077 hardening verbatim in _write_cache
  and _refresh_gh's ~/.gh_token write — only the api_token variable
  reference is renamed to chosen_token in the post-source-selection
  write paths.

# Tests run on the rebased file

1. bash -n syntax check — clean.
2. Static-token path with API broken + env unset → static path fires,
   correct token output, correct log message.
3. 'get' action via static path → emits proper git-credential-protocol
   (username=x-access-token + password=<token>).
4. Empty static file → rejected, returns "all token sources exhausted",
   exit 1 (no regression).
5. (Implicit by structure) env_token still takes precedence over
   static_token — env-var fallback block is unchanged and runs first.

# Rollout

Applying this change in the canonical repo lands the fix permanently
once a workspace-image rebuild pulls it into /app/scripts/. For the
in-incident window, operators can also drop the patched script at
~/molecule-git-token-helper.sh and re-point credential.https://github.com.helper
in ~/.gitconfig — works without root and without /app/scripts writes.

# Origin

Branch + design originally drafted by fullstack-engineer
(commit d4ed8768 in their workspace, unable to push due to the same
auth incident). Structural approval from core-platform-lead. Rebased
onto upstream main and pushed via my fork because every other agent
in the mesh was also blocked from pushing.

Co-Authored-By: fullstack-engineer <fullstack-engineer@agents.moleculesai.app>
Co-Authored-By: core-platform-lead <core-platform-lead@agents.moleculesai.app>
2026-05-09 01:46:13 +00:00
Hongming Wang
fc2720c1fe fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552)
The token-cache helper had three #1552 findings, all in the
mode-600-after-the-fact pattern:

1. _write_cache writes .tmp with default umask (typically 022 → 644
   on disk) and then chmod 600's after the mv. A concurrent reader
   in that microsecond-wide window sees the token at mode 644.
2. Each chmod was swallowed via `|| true` — if it ever fails, the
   tokens stay world-readable with no operator signal.
3. _refresh_gh's gh_token_file write has the same shape and same
   two issues.

Hardening:

- Wrap the .tmp creates in a `umask 077` block so the files are 600
  from creation. Restore the previous umask before return so callers
  aren't perturbed.
- Replace `chmod ... 2>/dev/null || true` with `if ! chmod ...; then
  echo WARN ...; fi`. A chmod failure is a real signal worth grep'ing.
- Apply the same pattern to the _refresh_gh gh_token_file path.
  `local` is illegal in a top-level case branch, so use a uniquely-
  named global (_gh_prev_umask) and unset it after.

Verified `bash -n` clean and `shellcheck` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:22:29 -07:00
rabbitblood
7b662d2494 ci(gh-wrapper): translate --assignee @me → --label team:<role>
Fixes #1957. All agents share one PAT, so `gh issue create --assignee @me`
resolves to the CEO. Today's "6 issues @me for 7 cycles" defect signal
turned out to be CEO-load misclassified as team-stagnation.

Translation rules:
- `--assignee @me` → `--label team:<role-slug>`
- `--reviewer @me` → dropped (review-bot scans labels, not requests)
- `--assignee user` (real user) → unchanged

role-slug derived from GIT_AUTHOR_NAME ("Molecule AI Core-BE" → "core-be").
The wrapper already handled the title-prefix + body-footer transforms;
these are just two more cases in the existing arg-walk loop.

Backward compat: any agent prompt that doesn't use @me passes through
unchanged. Agents don't need prompt updates — the wrapper is transparent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:34:21 -07:00
Hongming Wang
925a71887d
fix(workspace): credential helper security hardening (#1797)
Four findings from security audit (internal/security/credential-token-backlog.md):

1. STDERR LEAK — molecule-git-token-helper.sh:146,153 logged ${response}
   on platform errors. The response body MAY contain the token in some
   failure modes (alternate JSON key shape on partial success). Now:
   - capture curl's stderr to a tmp file (not $response) so we can log
     the curl error message without ever interpolating the response body
   - on empty-token branch, log only response size (bytes) for debug
2. CHMOD 600 — already in place at lines 116, 124, 223 (verified, no change)
3. RESPAWN SUPERVISION — entrypoint.sh wrapped daemon launch in a
   while-true bash loop with 30s back-off. Without this, a daemon crash
   silently leaves the workspace stuck on an expired token until the
   container restarts. Logs to /home/agent/.gh-token-refresh.log
   (agent-writable; /var/log is root-owned).
4. JITTER — molecule-gh-token-refresh.sh: added 0..120s random offset to
   each sleep so 39 containers don't synchronize their refresh requests
   against the platform endpoint.

Also:
- Daemon now sends helper output to /dev/null instead of merging stderr,
  belt-and-suspenders against any future helper change that might write
  the token to stdout.
- Daemon log lines include rc=$? on failure for actionable triage.

Inherent risks (org-wide token blast, prompt-injection theft, bearer
in volume, no audit log) tracked in internal/security/credential-token-backlog.md
as separate roadmap items.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-23 18:14:55 +00:00
Hongming Wang
2885583d05 feat(workspace): 45-min gh-token refresh daemon + credential helper cache
Extracted from the now-closed PR #1664 (Molecule-AI/molecule-core).

- New scripts/molecule-gh-token-refresh.sh background daemon — every
  45 min (TOKEN_REFRESH_INTERVAL_SEC) calls the credential helper's
  _refresh_gh action to keep both gh CLI auth and the on-disk cache
  fresh through the GitHub App installation token's ~60 min TTL.
- scripts/molecule-git-token-helper.sh rewritten with a ~50 min
  on-disk cache (${CACHE_DIR}/gh_installation_token + _expiry
  companion file), a cache > API > env-var fallback chain, a new
  _refresh_gh action (invoked by the daemon above), a _invalidate_cache
  action, and path references flipped from /workspace/scripts/... to
  /app/scripts/... to match the runtime image layout.
- Dockerfile copies the new refresh daemon and extends mkdir to
  create /home/agent/.molecule-token-cache at build time.
- entrypoint.sh configures the git credential helper for github.com
  while still root (so the global gitconfig is written before the
  gosu handoff), creates + chowns the token cache dir, then as agent
  starts the refresh daemon in the background and does an initial
  gh auth login from GITHUB_TOKEN/GH_TOKEN so gh works before the
  first refresh fires.

Dropped from PR #1664: cosmetic em-dash -> ASCII hyphen rewrites
(charset-normalizer noise) that would conflict with the repo's
existing em-dash convention used elsewhere in workspace/.
2026-04-22 19:52:46 -07:00
molecule-ai[bot]
3bef6af241 fix: apply #1124 env-var defaults + scrub F1088 credentials from INCIDENT_LOG.md (#1347)
- PLATFORM_URL: replace unreachable http://platform:8080 mesh-only default
  with Docker-aware detection (host.docker.internal in containers,
  localhost for local dev) across all workspace Python modules and the
  git-token-helper shell script.
- WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if empty)
  consistent with coordinator.py / a2a_cli.py patterns already in place.
- INCIDENT_LOG.md: replace all 3 F1088 credential types with
  ***REDACTED*** (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x).

Fixes #1124, #1333.

Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 08:11:44 +00:00
rabbitblood
b1bb5f838a fix: GitHub token refresh — add WorkspaceAuth path for credential helper (#1068)
PR #729 tightened AdminAuth to require ADMIN_TOKEN, breaking the
workspace credential helper which called /admin/github-installation-token
with a workspace bearer token. Tokens expired after 60 min with no refresh.

Fix: Add /workspaces/:id/github-installation-token under WorkspaceAuth
so any authenticated workspace can refresh its GitHub token. Keep the
admin path as backward-compatible alias.

Update molecule-git-token-helper.sh to use the workspace-scoped path
when WORKSPACE_ID is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 08:30:02 -07:00
Hongming Wang
37ed319562 fix: update workspace script comments for workspace-template → workspace rename
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:48:05 -07:00
Hongming Wang
d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00