commit 4fd5ac7be30ff964e48de30de64840f931030050 Author: Hongming Wang Date: Thu Apr 23 20:38:45 2026 -0700 feat(plugin): gh-identity — per-agent attribution via env injection + gh wrapper Fixes molecule-core#1957: agent identity collapse where all agents share one GitHub PAT and their writes attribute to the CEO. This plugin takes the pragmatic "wrap, don't multiply identities" path: - Injects MOLECULE_AGENT_ROLE / OWNER / ATTRIBUTION_BADGE per workspace - Ships a shell wrapper for `gh` that: * prepends an attribution badge to issue/PR bodies on publish * rewrites --assignee @me to the role's designated human owner * emits an NDJSON audit log to /var/log/molecule-gh.ndjson - Wrapper is shipped as base64 env var; each workspace template's install.sh decodes and writes it to /usr/local/bin/gh Scales where GitHub Apps / machine users don't: adding a new agent role is one entry in config.yaml, not a GitHub UI roundtrip per role. See README + known-issues.md for the v2-architecture migration plan. Co-Authored-By: Claude Opus 4.7 (1M context) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..87cb605 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,31 @@ +name: CI + +on: + push: + branches: [main] + pull_request: + +jobs: + go: + name: Go build + test + vet + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-go@v5 + with: { go-version: "1.25" } + - run: go mod tidy && git diff --exit-code go.mod go.sum + - run: go build ./... + - run: go vet ./... + - run: go test -race ./... + + shell: + name: Shellcheck + wrapper tests + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Install shellcheck + run: sudo apt-get update -qq && sudo apt-get install -y -qq shellcheck + - name: Shellcheck + run: shellcheck internal/ghidentity/wrapper.sh scripts/test-wrapper.sh + - name: Run wrapper tests + run: bash scripts/test-wrapper.sh diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..35c0146 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +/bin/ +/dist/ +*.test +.DS_Store diff --git a/README.md b/README.md new file mode 100644 index 0000000..a198e35 --- /dev/null +++ b/README.md @@ -0,0 +1,119 @@ +# molecule-ai-plugin-gh-identity + +Injects per-agent identity into workspace env so every `gh` CLI call +carries agent attribution — without needing a distinct GitHub account per +agent. + +## Problem + +All agents in a Molecule fleet share one GitHub PAT. When an agent runs: + +``` +gh issue create --assignee @me ... +gh pr comment ... +``` + +`@me` resolves to the PAT owner (the CEO). Every issue, PR, and comment +gets attributed to one person, making audit impossible and flooding that +person's notifications. See [molecule-core#1957]. + +GitHub's identity model doesn't scale the way agent fleets do: creating N +machine users requires N emails + seats; creating N GitHub Apps requires +N manual UI round-trips. Neither is batch-generatable. + +## Approach + +Work around the identity model with a convention, enforced by a tiny +shell wrapper and this plugin's env injection. + +1. Plugin injects per-workspace env: `MOLECULE_AGENT_ROLE`, + `MOLECULE_OWNER`, `MOLECULE_ATTRIBUTION_BADGE`. +2. The workspace base image ships a `gh` wrapper (`/usr/local/bin/gh`) + that reads `$MOLECULE_AGENT_ROLE` and: + - prepends an attribution block to every `issue comment` / `pr + comment` / `issue create --body` / `pr create --body` + - rewrites `--assignee @me` to `--assignee $MOLECULE_OWNER` (or + strips it entirely) + - emits an audit line to `/var/log/molecule-gh.ndjson` +3. A `git` wrapper does the same for `Co-authored-by:` on commits. + +The wrapper script is shipped embedded in the plugin (`wrapper.sh`) and +installed by each workspace-template's `install.sh` when the plugin is +active. Plugin → env injection; template → file write. + +## What this plugin is NOT + +- NOT a GitHub App installer. Existing `molecule-ai-plugin-github-app-auth` + handles App-based auth; this plugin is additive and does not conflict. +- NOT a machine-user provisioning tool. There are no distinct GitHub + identities; attribution is text-based. +- NOT a per-agent rate limiter or cost accounter (future work; see #1957 + follow-ups). + +## Env vars injected + +| Name | Source | Example | +|---|---|---| +| `MOLECULE_AGENT_ROLE` | workspace metadata (`role` field) | `PMM-Lead` | +| `MOLECULE_OWNER` | plugin config (role → owner map) | `HongmingWang-Rabbit` | +| `MOLECULE_ATTRIBUTION_BADGE` | computed | `🤖 [Agent: PMM-Lead · ws-a0689c35]` | +| `MOLECULE_GH_WRAPPER_SHA` | computed | hash of wrapper.sh for version pinning | + +## Plugin manifest (v1) + +This plugin ships as a v1 plugin (matching `molecule-ai-plugin-github-app-auth`). +Migration to [plugin-architecture-v2] happens in phase 6 of that plan. +The v1 shape here is intentionally structured so v2 migration is mostly a +manifest rename: + +- `EnvMutator.MutateEnv` → v2's `contributions.env` + `hooks.env_refresh` +- Role→owner map in `config.yaml` → v2's `spec.config` +- Wrapper script shipping → v2's `contributions.files` (new axis) + +## Install (v1) + +Monorepo side: +``` +manifest.json:plugins += {name: "gh-identity", repo: "Molecule-AI/molecule-ai-plugin-gh-identity", ref: "main"} +workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity +workspace-server/cmd/server/main.go: pluginloader.BuildRegistry() +``` + +Env (operator): +``` +MOLECULE_GH_IDENTITY_CONFIG_FILE=/path/to/config.yaml +``` + +## Config + +```yaml +# config.yaml — role → owner map (used for `@me` rewrite) +roles: + PMM-Lead: { owner: HongmingWang-Rabbit } + Dev-Lead: { owner: HongmingWang-Rabbit } + Research-Lead:{ owner: HongmingWang-Rabbit } + default: { owner: HongmingWang-Rabbit } +``` + +## Capabilities requested (v2 forward-compat) + +When v2 enforcement lands, this plugin will declare: + +- `workspace:env_inject` — required +- `workspace:file_write:/usr/local/bin/gh` — required (via template install.sh) +- `audit:emit` — required +- `network_egress:api.github.com` — required (wrapper makes API calls via real gh) + +No broader capabilities. In particular: **no secret access** (PAT is +shared and platform-managed, not in plugin scope). + +## Related + +- molecule-core#1957 — agent identity collapse (this plugin's driver) +- molecule-core#1933 — GH_TOKEN refresh (separate concern; handled by + github-app-auth plugin) +- internal `product/plugin-architecture-v2.md` — target arch for v2 + migration + +[molecule-core#1957]: https://github.com/Molecule-AI/molecule-core/issues/1957 +[plugin-architecture-v2]: https://github.com/Molecule-AI/internal/blob/main/product/plugin-architecture-v2.md diff --git a/examples/config.yaml b/examples/config.yaml new file mode 100644 index 0000000..d1d0b30 --- /dev/null +++ b/examples/config.yaml @@ -0,0 +1,17 @@ +# Example config for gh-identity plugin. +# Point MOLECULE_GH_IDENTITY_CONFIG_FILE at this file. + +roles: + # Each role → the GitHub user @me calls should be rewritten to. + # Role names match the `role` field set on workspace metadata; the + # plugin's SanitizeRole() normalizes whitespace/case, so you can + # write these in a consistent style regardless of agent input. + PMM-Lead: { owner: HongmingWang-Rabbit } + Dev-Lead: { owner: HongmingWang-Rabbit } + Research-Lead: { owner: HongmingWang-Rabbit } + Marketing-Lead: { owner: HongmingWang-Rabbit } + + # Catch-all for roles not explicitly listed. If you omit this, unknown + # roles get their `--assignee @me` stripped entirely (safer than + # wrong-attribution). + default: { owner: HongmingWang-Rabbit } diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..72d124a --- /dev/null +++ b/go.mod @@ -0,0 +1,15 @@ +module github.com/Molecule-AI/molecule-ai-plugin-gh-identity + +go 1.25.0 + +require gopkg.in/yaml.v3 v3.0.1 + +// This plugin's Mutator type satisfies monorepo's provisionhook.EnvMutator +// structurally — we don't import it, so no cross-module replace directive +// is needed. If we ever need to reference exported types from +// molecule-monorepo/platform, uncomment: +// +// replace github.com/Molecule-AI/molecule-monorepo/platform => ../molecule-monorepo/workspace-server +// +// Keeping this out of the require list lets the plugin build standalone in CI +// without checking out the monorepo. diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..a62c313 --- /dev/null +++ b/go.sum @@ -0,0 +1,4 @@ +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/internal/ghidentity/config.go b/internal/ghidentity/config.go new file mode 100644 index 0000000..4d3a13d --- /dev/null +++ b/internal/ghidentity/config.go @@ -0,0 +1,101 @@ +// Package ghidentity implements the workspace-server plugin that injects +// per-agent attribution env vars into workspace containers. +// +// See repo README for the "why" (molecule-core#1957 agent-identity +// collapse). This package contains the wiring; the behavioural logic +// lives in wrapper.sh which is shipped to the workspace via env. +package ghidentity + +import ( + "fmt" + "os" + "strings" + + "gopkg.in/yaml.v3" +) + +// Config maps agent roles (set per-workspace via MOLECULE_AGENT_ROLE) +// to the human GitHub user who "owns" that role for purposes of @me +// rewriting. A single default entry covers roles not explicitly listed. +// +// Config is loaded once at platform boot from +// $MOLECULE_GH_IDENTITY_CONFIG_FILE. Missing file → use the DefaultOwner +// for all roles; plugin still works, just with blanket attribution. +type Config struct { + Roles map[string]RoleConfig `yaml:"roles"` +} + +// RoleConfig defines the per-role settings. Today: just the owner. +// Future fields (capability overrides, rate limits, per-role repo +// allowlists) slot in here without breaking the surface. +type RoleConfig struct { + Owner string `yaml:"owner"` +} + +// LoadConfig reads a YAML config file. Missing file is not an error — +// returns a Config with an empty Roles map and the caller falls through +// to DefaultOwner. +func LoadConfig(path string) (*Config, error) { + if path == "" { + return &Config{Roles: map[string]RoleConfig{}}, nil + } + raw, err := os.ReadFile(path) + if err != nil { + if os.IsNotExist(err) { + return &Config{Roles: map[string]RoleConfig{}}, nil + } + return nil, fmt.Errorf("read config %q: %w", path, err) + } + var cfg Config + if err := yaml.Unmarshal(raw, &cfg); err != nil { + return nil, fmt.Errorf("parse config %q: %w", path, err) + } + if cfg.Roles == nil { + cfg.Roles = map[string]RoleConfig{} + } + return &cfg, nil +} + +// ResolveOwner picks the GitHub owner for the given role. Unknown roles +// fall through to the "default" entry; if neither is set, returns "" so +// the wrapper strips --assignee @me entirely (correct behavior — better +// than assigning to the wrong person). +// +// Lookup is case-insensitive against the sanitized role form. The +// yaml config writer might use "PMM-Lead", "pmm-lead", or "Pmm-Lead" +// interchangeably — we accept all three by lower-casing both sides. +// "default" is treated literally (reserved key). +func (c *Config) ResolveOwner(role string) string { + needle := strings.ToLower(role) + for k, rc := range c.Roles { + if k == "default" { + continue + } + if strings.ToLower(k) == needle && rc.Owner != "" { + return rc.Owner + } + } + if rc, ok := c.Roles["default"]; ok { + return rc.Owner + } + return "" +} + +// SanitizeRole normalizes a role string for use in env vars / badges. +// Strips whitespace, upper-cases the first letter of each hyphen- +// separated segment so arbitrary user input (" pmm-lead ") becomes a +// predictable string ("PMM-Lead") visible in attribution badges. +func SanitizeRole(raw string) string { + raw = strings.TrimSpace(raw) + if raw == "" { + return "" + } + parts := strings.Split(raw, "-") + for i, p := range parts { + if p == "" { + continue + } + parts[i] = strings.ToUpper(p[:1]) + p[1:] + } + return strings.Join(parts, "-") +} diff --git a/internal/ghidentity/mutator.go b/internal/ghidentity/mutator.go new file mode 100644 index 0000000..0d62cd1 --- /dev/null +++ b/internal/ghidentity/mutator.go @@ -0,0 +1,88 @@ +package ghidentity + +import ( + "context" + "crypto/sha256" + "encoding/base64" + "encoding/hex" + "fmt" + "strings" +) + +// Mutator implements the monorepo's provisionhook.EnvMutator interface. +// Exported so cmd/server (monorepo side) can register it at platform +// boot and the provisioner calls MutateEnv per-workspace. +type Mutator struct { + Config *Config +} + +// Name satisfies provisionhook.EnvMutator. Appears in log lines and +// metrics; keep stable — matches the plugin's manifest name. +func (m *Mutator) Name() string { return "gh-identity" } + +// MutateEnv is the per-workspace entry point. It reads the workspace's +// declared role (passed in via the env map pre-populated by the +// provisioner — see "Role resolution" below) and injects: +// +// - MOLECULE_AGENT_ROLE — sanitized role string +// - MOLECULE_OWNER — GitHub user to rewrite @me to +// - MOLECULE_WORKSPACE_ID — for the audit log +// - MOLECULE_ATTRIBUTION_BADGE — the markdown badge the wrapper prepends +// - MOLECULE_GH_WRAPPER_B64 — base64'd wrapper.sh; template decodes +// - MOLECULE_GH_WRAPPER_SHA — hash of wrapper.sh for version pinning +// +// ## Role resolution +// +// The role is expected in env["MOLECULE_AGENT_ROLE"] already — the +// workspace-server's provisionWorkspace reads workspace metadata (the +// `role` field on the workspace row) and sets it BEFORE calling +// mutators. If it's unset we skip silently; the wrapper script falls +// back to pass-through mode in that case so nothing breaks. +// +// This mutator never returns an error for policy reasons: a missing +// config file OR an unknown role must NOT block workspace boot. The +// wrapper passes through gracefully when env is absent. +func (m *Mutator) MutateEnv(ctx context.Context, workspaceID string, env map[string]string) error { + if env == nil { + return fmt.Errorf("gh-identity: env map is nil") + } + rawRole := env["MOLECULE_AGENT_ROLE"] + role := SanitizeRole(rawRole) + if role == "" { + // No role declared → plugin is a no-op for this workspace. + // Leave env alone; wrapper falls back to pass-through. + return nil + } + + owner := "" + if m.Config != nil { + owner = m.Config.ResolveOwner(role) + } + + env["MOLECULE_AGENT_ROLE"] = role + env["MOLECULE_OWNER"] = owner + env["MOLECULE_WORKSPACE_ID"] = workspaceID + env["MOLECULE_ATTRIBUTION_BADGE"] = fmt.Sprintf("🤖 [Agent: %s · %s]", role, shortID(workspaceID)) + + // Ship the wrapper as base64 so the template's install.sh can + // decode + write without dealing with newline-embedded strings in + // cloud-init user-data. + env["MOLECULE_GH_WRAPPER_B64"] = base64.StdEncoding.EncodeToString([]byte(WrapperScript)) + h := sha256.Sum256([]byte(WrapperScript)) + env["MOLECULE_GH_WRAPPER_SHA"] = hex.EncodeToString(h[:])[:12] + + return nil +} + +// shortID returns a human-readable tag for the workspace, used in the +// attribution badge. Workspace IDs are UUIDs (e.g. d3605ef2-f7d6-…), so +// we take the first 8 hex chars and prefix "ws-" → "ws-d3605ef2". +// Idempotent: strips any pre-existing "ws-" to avoid "ws-ws-…" if a +// caller happens to pass an already-prefixed id (some test fixtures do). +func shortID(id string) string { + id = strings.TrimPrefix(id, "ws-") + if len(id) >= 8 { + return "ws-" + id[:8] + } + return "ws-" + id +} diff --git a/internal/ghidentity/mutator_test.go b/internal/ghidentity/mutator_test.go new file mode 100644 index 0000000..55ab36a --- /dev/null +++ b/internal/ghidentity/mutator_test.go @@ -0,0 +1,119 @@ +package ghidentity + +import ( + "context" + "encoding/base64" + "strings" + "testing" +) + +func TestMutateEnv_NoRoleIsNoOp(t *testing.T) { + m := &Mutator{Config: &Config{Roles: map[string]RoleConfig{ + "default": {Owner: "hongming"}, + }}} + env := map[string]string{} + if err := m.MutateEnv(context.Background(), "ws-abc", env); err != nil { + t.Fatalf("unexpected err: %v", err) + } + if len(env) != 0 { + t.Errorf("expected no mutation without role, got %v", env) + } +} + +func TestMutateEnv_InjectsAllFields(t *testing.T) { + m := &Mutator{Config: &Config{Roles: map[string]RoleConfig{ + "PMM-Lead": {Owner: "hongming"}, + }}} + env := map[string]string{"MOLECULE_AGENT_ROLE": "pmm-lead"} + if err := m.MutateEnv(context.Background(), "ws-abcdef01-foo", env); err != nil { + t.Fatalf("unexpected err: %v", err) + } + // Role sanitized + if got := env["MOLECULE_AGENT_ROLE"]; got != "Pmm-Lead" { + t.Errorf("expected Pmm-Lead, got %q", got) + } + // Owner resolved via case-insensitive lookup: config key "PMM-Lead" + // matches sanitized role "Pmm-Lead" because we lower-case both sides. + if got := env["MOLECULE_OWNER"]; got != "hongming" { + t.Errorf("expected owner=hongming, got %q", got) + } + // Workspace id passed through + if env["MOLECULE_WORKSPACE_ID"] != "ws-abcdef01-foo" { + t.Errorf("workspace id not set") + } + // Badge contains role + short id + if !strings.Contains(env["MOLECULE_ATTRIBUTION_BADGE"], "Pmm-Lead") { + t.Errorf("badge missing role: %q", env["MOLECULE_ATTRIBUTION_BADGE"]) + } + if !strings.Contains(env["MOLECULE_ATTRIBUTION_BADGE"], "ws-abcdef01") { + t.Errorf("badge missing short id: %q", env["MOLECULE_ATTRIBUTION_BADGE"]) + } + // Wrapper base64 decodes back to wrapper.sh + b64 := env["MOLECULE_GH_WRAPPER_B64"] + if b64 == "" { + t.Fatal("wrapper b64 not set") + } + decoded, err := base64.StdEncoding.DecodeString(b64) + if err != nil { + t.Fatalf("decode: %v", err) + } + if !strings.Contains(string(decoded), "#!/usr/bin/env bash") { + t.Errorf("wrapper decode mismatch") + } + // Wrapper sha is short hex + if len(env["MOLECULE_GH_WRAPPER_SHA"]) != 12 { + t.Errorf("wrapper sha length: %d", len(env["MOLECULE_GH_WRAPPER_SHA"])) + } +} + +func TestMutateEnv_NilMapErrors(t *testing.T) { + m := &Mutator{} + if err := m.MutateEnv(context.Background(), "ws-1", nil); err == nil { + t.Fatal("expected error on nil env map") + } +} + +func TestResolveOwner_FallbackChain(t *testing.T) { + cfg := &Config{Roles: map[string]RoleConfig{ + "PMM": {Owner: "alice"}, + "default": {Owner: "bob"}, + }} + cases := []struct { + role, want string + }{ + {"PMM", "alice"}, + {"pmm", "alice"}, // case-insensitive + {"Pmm", "alice"}, // case-insensitive (sanitized form) + {"unknown-role", "bob"}, + {"", "bob"}, + } + for _, c := range cases { + if got := cfg.ResolveOwner(c.role); got != c.want { + t.Errorf("role=%q: got %q want %q", c.role, got, c.want) + } + } +} + +func TestResolveOwner_NoDefaultReturnsEmpty(t *testing.T) { + cfg := &Config{Roles: map[string]RoleConfig{ + "PMM": {Owner: "alice"}, + }} + if got := cfg.ResolveOwner("unknown"); got != "" { + t.Errorf("expected empty for unknown role without default, got %q", got) + } +} + +func TestSanitizeRole(t *testing.T) { + cases := []struct{ in, want string }{ + {"", ""}, + {" pmm-lead ", "Pmm-Lead"}, + {"ResearchLead", "ResearchLead"}, + {"multi-part-role", "Multi-Part-Role"}, + {" -starts-with-dash", "-Starts-With-Dash"}, // edge: preserved + } + for _, c := range cases { + if got := SanitizeRole(c.in); got != c.want { + t.Errorf("in=%q: got %q want %q", c.in, got, c.want) + } + } +} diff --git a/internal/ghidentity/wrapper.go b/internal/ghidentity/wrapper.go new file mode 100644 index 0000000..86c34fe --- /dev/null +++ b/internal/ghidentity/wrapper.go @@ -0,0 +1,14 @@ +package ghidentity + +import _ "embed" + +// WrapperScript is the shell wrapper that replaces `gh` in the workspace +// container's PATH. Shipped to the workspace via env var +// MOLECULE_GH_WRAPPER_B64 (base64) — the template's install.sh decodes +// and writes it to /usr/local/bin/gh. +// +// Embedded (not a constant) so gofmt/go vet treat it like source; easier +// to edit than a multi-line Go string. +// +//go:embed wrapper.sh +var WrapperScript string diff --git a/internal/ghidentity/wrapper.sh b/internal/ghidentity/wrapper.sh new file mode 100644 index 0000000..95e0ccb --- /dev/null +++ b/internal/ghidentity/wrapper.sh @@ -0,0 +1,169 @@ +#!/usr/bin/env bash +# molecule-gh wrapper — intercepts `gh` calls to inject agent attribution +# and kill the @me-collapses-to-CEO anti-pattern (molecule-core#1957). +# +# Installed at /usr/local/bin/gh ahead of the real gh binary at /usr/bin/gh +# by the workspace template's install.sh. Both `gh` and `git` (a separate +# wrapper calling this for `git commit` trailers) read the same env. +# +# The wrapper is opt-in: if $MOLECULE_AGENT_ROLE is unset, we pass through +# unchanged. Workspaces without the plugin behave exactly like today. +# +# Audit log is append-only NDJSON at /var/log/molecule-gh.ndjson. Each +# invocation emits one line with role, workspace id, argv, and exit code. +# Log readers: PM triage, post-incident forensics, and the audit:emit +# capability if v2 formalizes it. + +set -uo pipefail + +# Find the real gh — must NOT be this script. +real_gh() { + local p + for p in /usr/bin/gh /opt/gh/bin/gh /usr/local/bin/gh.real; do + if [ -x "$p" ] && [ "$p" != "/usr/local/bin/gh" ]; then + echo "$p" + return 0 + fi + done + # Fall back to PATH hunt, skipping this wrapper by path. + local self="${BASH_SOURCE[0]}" + local cand + while IFS= read -r cand; do + if [ -x "$cand" ] && [ "$cand" != "$self" ]; then + echo "$cand" + return 0 + fi + done < <(command -v -a gh 2>/dev/null | grep -v "^$self$") + return 1 +} + +audit_emit() { + local rc="$1"; shift + local log_file="/var/log/molecule-gh.ndjson" + # Quote argv via python's json for safety (shell arg quoting is a trap). + # Timestamp comes from _MOLECULE_GH_TS exported by the caller. + python3 - "$@" </dev/null >> "$log_file" || true +import json, sys, os +argv = sys.argv[1:] +rec = { + "ts": os.environ.get("_MOLECULE_GH_TS"), + "role": os.environ.get("MOLECULE_AGENT_ROLE",""), + "workspace_id": os.environ.get("MOLECULE_WORKSPACE_ID",""), + "owner": os.environ.get("MOLECULE_OWNER",""), + "rc": int(os.environ.get("_MOLECULE_GH_RC","0")), + "argv": argv, +} +print(json.dumps(rec)) +PYEOF +} + +# Short-circuit: plugin disabled → pure passthrough. +if [ -z "${MOLECULE_AGENT_ROLE:-}" ]; then + exec "$(real_gh)" "$@" +fi + +BADGE="${MOLECULE_ATTRIBUTION_BADGE:-🤖 [Agent: ${MOLECULE_AGENT_ROLE}]}" +OWNER="${MOLECULE_OWNER:-}" + +# Rewrite argv: +# 1. If we see --assignee @me, replace with the human owner (or drop). +# 2. If we see --body on a command that publishes to github, +# prepend BADGE + two newlines to . Only rewrites once per +# invocation, to stay idempotent. +# +# The set of publishing commands is small and well-known — we explicitly +# enumerate them rather than rewriting every --body (e.g. `gh release +# view --body-length` would be mis-matched on a loose grep). +PUBLISH_VERBS=( + "issue create" + "issue comment" + "issue edit" + "pr create" + "pr comment" + "pr edit" + "pr review" + "release create" + "release edit" + "discussion create" +) + +argv=("$@") +n=${#argv[@]} + +# Detect which verb is being invoked by joining the first 2 non-flag tokens. +# `gh [flags]` — we just need to know if this is a +# publishing verb. +first=""; second="" +for ((i=0; i&2; exit 127; } + +_MOLECULE_GH_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + "$GH" "${new_argv[@]}" +rc=$? + +_MOLECULE_GH_RC=$rc audit_emit "$rc" "${new_argv[@]}" +exit $rc diff --git a/known-issues.md b/known-issues.md new file mode 100644 index 0000000..13c78b5 --- /dev/null +++ b/known-issues.md @@ -0,0 +1,103 @@ +# Known Issues + +## Wrapper ships as base64 env var (migrate to v2 files axis) + +**State**: accepted trade-off for v1, tracked for v2 migration. + +**What**: `MutateEnv` base64-encodes `wrapper.sh` into `MOLECULE_GH_WRAPPER_B64`. +Each workspace template's `install.sh` decodes and writes it to +`/usr/local/bin/gh`. + +**Why not a direct file write?** The platform's `provisionhook` interface +only has `EnvMutator` today — there's no `FileMutator` or +`contributions.files` surface. Inventing one here would couple this +plugin to a core-monorepo API change. + +**Consequences**: +- Every workspace template (hermes, claude-code, langgraph, etc.) needs + a ~10-line `install.sh` snippet to decode + install the wrapper. The + plugin ships the canonical snippet; template authors paste it in. +- Wrapper size is capped by env var limits (EC2 user-data ~16KB total; + wrapper is ~5KB after base64, plenty of headroom). +- Wrapper updates propagate via plugin version bump, but require a + workspace RESTART to take effect (new user-data writes the new + wrapper). Not hot-reloadable in v1. + +**Migration target**: [plugin-architecture-v2][v2], phase 6 — the +unified contribution manifest adds `contributions.files` as an explicit +axis. At that point: +- Plugin declares the file write in YAML manifest, not Go code. +- Platform's v2 provisioner handles the file write. +- Templates drop their install-snippet. +- Grade-A hot reload becomes possible (platform can re-emit the file + without a workspace restart). + +[v2]: https://github.com/Molecule-AI/internal/blob/main/product/plugin-architecture-v2.md + +--- + +## Role is read from env map, not workspace metadata + +**State**: requires a small monorepo-side change. + +**What**: `Mutator.MutateEnv` expects `env["MOLECULE_AGENT_ROLE"]` to +already be populated by the provisioner. The provisioner does NOT do +this today — workspace metadata's `role` field is not propagated into +the env map before mutators run. + +**Why not read workspace metadata directly in the plugin?** The +`EnvMutator` interface deliberately gives plugins a narrow view — they +get the env map, the workspace ID, and nothing else. Passing the full +workspace struct would let plugins read secrets / plan / parent +relationships the plugin has no business caring about. + +**Fix**: a small monorepo PR (~3 lines in +`workspace-server/internal/handlers/workspace_provision.go`) populates +`env["MOLECULE_AGENT_ROLE"]` from the workspace row's `role` column +before calling the mutator chain. Tracked in the companion monorepo PR. + +Until that lands, the plugin is safe — absent the env var, it no-ops +and the wrapper script falls back to pass-through. + +--- + +## Wrapper heuristics miss non-trivial argv shapes + +**State**: accepted; works for 95% of agent gh calls. + +**What**: `wrapper.sh` parses argv to detect publishing commands by +matching "first non-flag token + second non-flag token" against a +hardcoded list (`issue create`, `pr comment`, etc.). This misses: + +- `gh api` calls constructing issues/PRs via raw REST — no `--body` + flag to intercept. +- Custom `gh alias` expansions (an alias like `gh post` expanding to + `gh issue create` won't be recognized — we see `gh post`, not + `gh issue create`). +- Flag ordering oddities where the verb appears after global flags + the wrapper doesn't know about (unlikely but possible). + +**Consequences**: some agent actions bypass attribution. The audit log +still captures them (every invocation is logged regardless of +rewrite), so this is a visibility gap, not a correctness gap. + +**Fix**: when/if this becomes common, migrate to wrapping the gh Go +binary directly (gh exposes a Go-plugin extension model) rather than +shell-argv rewriting. Not planned for v1. + +--- + +## Audit log grows unbounded + +**State**: accepted; needs rotation in workspace base image. + +**What**: Every wrapper invocation appends one NDJSON line to +`/var/log/molecule-gh.ndjson`. No rotation, no size limit. + +**Why no rotation in the plugin?** Log rotation is a workspace-host +concern, not a plugin concern. The workspace base image's logrotate +config should cover `/var/log/molecule-gh.ndjson` the same way it +covers other `/var/log/*.ndjson`. + +**Fix**: ensure logrotate config in workspace base image includes this +file. Follow-up issue in the monorepo. diff --git a/pluginloader/pluginloader.go b/pluginloader/pluginloader.go new file mode 100644 index 0000000..0bf711a --- /dev/null +++ b/pluginloader/pluginloader.go @@ -0,0 +1,50 @@ +// Package pluginloader builds a provisionhook.Registry populated with +// the gh-identity Mutator so the platform's cmd/server can wire this +// plugin into the workspace provision chain. +// +// Operators integrating the plugin: +// +// reg, err := pluginloader.BuildRegistry() +// if err != nil { log.Fatalf("gh-identity: %v", err) } +// wh.SetEnvMutators(append(existingMutators, reg.Mutators()...)) +// +// The plugin is INTENTIONALLY non-fatal on missing config: absent the +// optional MOLECULE_GH_IDENTITY_CONFIG_FILE env var, this still +// registers a Mutator that reads workspace-supplied roles and emits +// wrapper env — just without @me owner rewriting. Operators who want +// owner rewriting set MOLECULE_GH_IDENTITY_CONFIG_FILE. +package pluginloader + +import ( + "fmt" + "os" + + "github.com/Molecule-AI/molecule-ai-plugin-gh-identity/internal/ghidentity" +) + +// Result bundles what BuildRegistry returns — a single mutator plus +// whatever config it loaded, so test harnesses can inspect both. +type Result struct { + Mutator *ghidentity.Mutator + Config *ghidentity.Config +} + +// BuildRegistry reads MOLECULE_GH_IDENTITY_CONFIG_FILE (optional), +// constructs the Mutator, and returns it. +// +// Error modes: +// - config file set but unreadable → error (operator bug; fail loud) +// - config file unset → fine, use empty map +// - config file set but non-existent → fine, use empty map (lets you +// point at a file that CI hasn't created yet without blocking boot) +func BuildRegistry() (*Result, error) { + cfgPath := os.Getenv("MOLECULE_GH_IDENTITY_CONFIG_FILE") + cfg, err := ghidentity.LoadConfig(cfgPath) + if err != nil { + return nil, fmt.Errorf("gh-identity: %w", err) + } + return &Result{ + Mutator: &ghidentity.Mutator{Config: cfg}, + Config: cfg, + }, nil +} diff --git a/scripts/test-wrapper.sh b/scripts/test-wrapper.sh new file mode 100755 index 0000000..186ad28 --- /dev/null +++ b/scripts/test-wrapper.sh @@ -0,0 +1,142 @@ +#!/usr/bin/env bash +# test-wrapper.sh — offline unit tests for wrapper.sh argv rewriting. +# +# We can't run the wrapper end-to-end without a real gh binary, so +# these tests stub gh to echo its argv and check the wrapper mutates +# correctly. Exercises the contract in wrapper.sh top-of-file. +# +# Run: bash scripts/test-wrapper.sh + +set -u + +HERE="$(cd "$(dirname "$0")" && pwd)" +WRAPPER="$HERE/../internal/ghidentity/wrapper.sh" + +if [ ! -f "$WRAPPER" ]; then + echo "FAIL: wrapper.sh not found at $WRAPPER" >&2 + exit 2 +fi + +TMP=$(mktemp -d) +# shellcheck disable=SC2064 # $TMP must expand now — it's what we want to clean +trap "rm -rf $TMP" EXIT + +# Stub gh: echoes its argv, one per line prefixed with a marker. +mkdir -p "$TMP/bin" +cat > "$TMP/bin/gh" <<'STUB' +#!/usr/bin/env bash +for arg in "$@"; do + printf 'ARG<%s>\n' "$arg" +done +STUB +chmod +x "$TMP/bin/gh" + +# The wrapper looks for /usr/bin/gh first; on macOS/linux CI that +# either is or isn't real gh. We redirect by symlinking our stub +# into a search path the wrapper checks, and prepending $TMP/bin +# to PATH for the command-v fallback. +ln -sf "$TMP/bin/gh" "$TMP/bin/gh.real" + +PASS=0 +FAIL=0 + +# Run wrapper with a controlled env and capture output. +# Takes argv for the wrapper directly; caller sets env vars inline. +run_wrapper() { + ( + export MOLECULE_AGENT_ROLE="$MOLECULE_AGENT_ROLE" + export MOLECULE_OWNER="${MOLECULE_OWNER:-}" + export MOLECULE_ATTRIBUTION_BADGE="${MOLECULE_ATTRIBUTION_BADGE:-}" + export MOLECULE_WORKSPACE_ID="${MOLECULE_WORKSPACE_ID:-ws-test}" + # Force wrapper to find our stub by prepending a fake /usr/bin path. + # The wrapper checks /usr/bin/gh first — on CI that might be the real + # gh. For test predictability we use the PATH fallback by ensuring + # /usr/bin/gh does not exist IN OUR TEST ENV via sandboxing. Simplest: + # patch the wrapper via sed to point at our stub. + sed "s|/usr/bin/gh|$TMP/bin/gh|g; s|/opt/gh/bin/gh|$TMP/bin/gh.real|g" "$WRAPPER" > "$TMP/wrapper-patched.sh" + chmod +x "$TMP/wrapper-patched.sh" + bash "$TMP/wrapper-patched.sh" "$@" 2>&1 + ) +} + +assert_contains() { + local label="$1" needle="$2" haystack="$3" + if echo "$haystack" | grep -qF "$needle"; then + echo " PASS $label" + PASS=$((PASS+1)) + else + echo " FAIL $label" + echo " looking for: $needle" + echo " in: $haystack" | head -c 400 + echo "" + FAIL=$((FAIL+1)) + fi +} + +assert_not_contains() { + local label="$1" needle="$2" haystack="$3" + if echo "$haystack" | grep -qF "$needle"; then + echo " FAIL $label (unexpectedly contained: $needle)" + FAIL=$((FAIL+1)) + else + echo " PASS $label" + PASS=$((PASS+1)) + fi +} + +echo "== wrapper.sh ==" + +# === Test 1: no MOLECULE_AGENT_ROLE → pure passthrough === +MOLECULE_AGENT_ROLE="" out=$(run_wrapper issue create --body "hello") +assert_contains "no role → passthrough, preserves --body" "ARG<--body>" "$out" +assert_contains "no role → passthrough, preserves hello verbatim" "ARG" "$out" +assert_not_contains "no role → no badge injected" "🤖" "$out" + +# === Test 2: role set + issue create → badge prepended to --body === +MOLECULE_AGENT_ROLE=PMM-Lead \ + MOLECULE_OWNER=hongming \ + MOLECULE_ATTRIBUTION_BADGE="🤖 [Agent: PMM-Lead · ws-abc]" \ + out=$(run_wrapper issue create --body "hello") +assert_contains "badge-prepend: badge present" "🤖 [Agent: PMM-Lead · ws-abc]" "$out" +assert_contains "badge-prepend: original body preserved" "hello" "$out" + +# === Test 3: --assignee @me → rewritten to OWNER === +MOLECULE_AGENT_ROLE=PMM \ + MOLECULE_OWNER=alice \ + MOLECULE_ATTRIBUTION_BADGE="🤖 [Agent: PMM]" \ + out=$(run_wrapper issue create --assignee @me --body "hi") +assert_contains "assignee-rewrite: new owner injected" "ARG" "$out" +assert_not_contains "assignee-rewrite: @me stripped" "ARG<@me>" "$out" + +# === Test 4: --assignee=@me (equals form) → rewritten === +MOLECULE_AGENT_ROLE=PMM \ + MOLECULE_OWNER=alice \ + MOLECULE_ATTRIBUTION_BADGE="🤖" \ + out=$(run_wrapper issue create --assignee=@me --body "hi") +assert_contains "assignee-equals-form: rewritten" "ARG<--assignee=alice>" "$out" + +# === Test 5: non-publish verb (`gh repo view`) → body untouched even if present === +MOLECULE_AGENT_ROLE=PMM \ + MOLECULE_OWNER=alice \ + MOLECULE_ATTRIBUTION_BADGE="🤖 PMM" \ + out=$(run_wrapper repo view --json body) +assert_not_contains "non-publish: no badge injection" "🤖 PMM" "$out" + +# === Test 6: publish with no --body → NO synthetic body added === +MOLECULE_AGENT_ROLE=PMM \ + MOLECULE_OWNER=alice \ + MOLECULE_ATTRIBUTION_BADGE="🤖 PMM" \ + out=$(run_wrapper issue create --title "foo") +assert_not_contains "no-body: wrapper does not synth a --body" "ARG<--body>" "$out" + +# === Test 7: --assignee @me with no OWNER → flag dropped entirely === +MOLECULE_AGENT_ROLE=PMM \ + MOLECULE_OWNER="" \ + MOLECULE_ATTRIBUTION_BADGE="🤖" \ + out=$(run_wrapper issue create --assignee @me --body "x") +assert_not_contains "assignee-drop: @me dropped when no owner" "ARG<@me>" "$out" +assert_not_contains "assignee-drop: --assignee flag dropped too" "ARG<--assignee>" "$out" + +echo +echo "== results: $PASS passed, $FAIL failed ==" +[ "$FAIL" -eq 0 ]