From f8ab522bef28fe467214a40bb601faeb2e5923d4 Mon Sep 17 00:00:00 2001 From: vikingowl Date: Thu, 4 Jun 2026 11:59:16 +0200 Subject: [PATCH] docs(todo,plans): specs for open features + MiniMax & ACP MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add implementation-ready plans for the in-flight features that lacked one, and two new provider/protocol items: - MiniMax provider (cloud arm + Token Plan billing decision) - Agent Client Protocol (ACP) — dual role: gnoma as ACP agent and as ACP client driving external agents as router arms - Network egress allowlist (Learn/Review/Enforce); note the per-session audit log is already implemented, remaining gap is a viewer command - Cross-platform (Windows/macOS) code touch-points + build-tag pattern - Distribution follow-ups (cosign, brew tap, installer, dockers_v2) Link each plan from its TODO.md entry; mark audit-log item done. --- TODO.md | 101 ++++- .../plans/2026-06-04-agent-client-protocol.md | 375 ++++++++++++++++++ .../plans/2026-06-04-cross-platform.md | 198 +++++++++ .../2026-06-04-distribution-followups.md | 169 ++++++++ .../plans/2026-06-04-egress-allowlist.md | 236 +++++++++++ .../plans/2026-06-04-minimax-provider.md | 224 +++++++++++ 6 files changed, 1297 insertions(+), 6 deletions(-) create mode 100644 docs/superpowers/plans/2026-06-04-agent-client-protocol.md create mode 100644 docs/superpowers/plans/2026-06-04-cross-platform.md create mode 100644 docs/superpowers/plans/2026-06-04-distribution-followups.md create mode 100644 docs/superpowers/plans/2026-06-04-egress-allowlist.md create mode 100644 docs/superpowers/plans/2026-06-04-minimax-provider.md diff --git a/TODO.md b/TODO.md index 05caa45..0bdf9f1 100644 --- a/TODO.md +++ b/TODO.md @@ -4,6 +4,86 @@ Active work, newest first. 
## In flight +- **MiniMax provider — cloud arm + subscription token plan.** Add + MiniMax (api.minimax.io / api.minimaxi.com) as a first-class cloud + provider so it can register as a router arm alongside + anthropic/openai/google/mistral. + + **API surface.** MiniMax ships *two* OpenAI-and-Anthropic-compatible + HTTP surfaces, so this is a base-URL + auth wiring task, not a new + translation layer: + - **OpenAI-compatible** chat-completions at `…/v1` — reusable via + `internal/provider/openaicompat`. Cleanest first cut: add a + `NewMiniMax(cfg)` constructor mirroring `NewOllama` / + `NewLlamaCpp` (`openaicompat/provider.go`) with the MiniMax base + URL baked in, then a `case "minimax"` in + `createProvider` (`cmd/gnoma/main.go:1265`) and the available- + providers usage string (`:1279`). + - **Anthropic-compatible** endpoint (`…/anthropic`) — alternative + backing via the existing `anthropic` provider with a `BaseURL` + override. Decide one canonical path; OpenAI-compat is the lower- + risk default since `openaicompat` is already exercised by the + local backends. + - **Auth.** Bearer API key. `envKeyFor`'s default branch + (`main.go:1199`) already resolves `MINIMAX_API_KEY` with no code + change; add an explicit `case "minimax"` only if we want a + friendlier name or alternates list. + - **Models.** `MiniMax-M2` (agentic/coding, the one to default to), + `MiniMax-M1`, abab6.5 series. Set `Strengths` + `MaxComplexity` + + `CostWeight` on the arm so the selector treats it as a cheap + high-capability cloud tier. + + **Token plan (open question — affects auth + billing UX).** MiniMax + offers a flat-rate **Coding Plan** subscription (token-quota based, + Claude-Max-style) *in addition to* metered pay-as-you-go API + credits. Both authenticate with the same Bearer key, so no adapter + difference — but the router's `CostWeight` math assumes metered + per-token pricing. Under a subscription the marginal cost is ~0 + until the quota is hit, then hard-stops. 
Decisions to make: + - How to model "subscription" cost in the selector — e.g. a + `[provider.minimax].billing = "subscription" | "metered"` knob + that zeroes `CostWeight` while quota remains, vs. real per-token + cost when metered. + - Quota exhaustion handling — surface the 429/quota error cleanly + and let the bandit fail over to the next arm (ties into the + session error-recovery work in `0d3d190`). + - Document both plans + the region split (`api.minimax.io` + international vs `api.minimaxi.com`) in `docs/slm-backends.md` / + provider docs. + + Smallest shippable slice: OpenAI-compat `NewMiniMax` + metered + pricing, registered as a cloud arm. Subscription/quota modelling is + the follow-up once the billing knob lands. Plan: + [`docs/superpowers/plans/2026-06-04-minimax-provider.md`](docs/superpowers/plans/2026-06-04-minimax-provider.md). + +- **Agent Client Protocol (ACP) support.** Run gnoma as an *ACP agent* + (`gnoma acp`) so any ACP-capable editor (Zed, Kiro, OpenCode, …) can + drive it as an external coding agent. ACP is "the LSP for AI coding + agents": JSON-RPC 2.0 over stdio, editor (client) spawns agent + (subprocess). gnoma already owns the hard parts — agentic engine, + tools, permissions, and JSON-RPC-over-stdio (from its MCP-client + side, `internal/mcp/jsonrpc.go`). The fit is symmetric: gnoma is the + JSON-RPC *server* here. No Go SDK exists (official SDKs are + TS/Python/Rust/Kotlin), so gnoma implements the wire protocol + natively against the schema. `session/new` can declare `mcpServers`, + so ACP and gnoma's existing MCP manager wire up in one handshake. + + **Dual role — both directions:** + 1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so + editors drive gnoma. + 2. **gnoma as ACP client** — gnoma spawns *external* ACP agents + (Claude, Gemini CLI, Codex, …) and uses them as router-arm + provider backends. 
This is the same shape as the existing + `internal/provider/subprocess` CLI-agent arms + (`cmd/gnoma/main.go:521-531`, `IsCLIAgent: true`) but over + standardized ACP JSON-RPC — gaining structured tool-call + surfacing, real turn/permission semantics, and cancellation + that the current one-shot stream-json subprocess provider + lacks (it sets `ToolUse:false` for agents without stream-json). + + Upstream: . Plan: + [`docs/superpowers/plans/2026-06-04-agent-client-protocol.md`](docs/superpowers/plans/2026-06-04-agent-client-protocol.md). + - **Config write/merge — silent corruption of layered configs.** `internal/config/write.go:setConfig` reads the existing TOML into a zero-valued `Config` struct, sets one field, and writes the entire @@ -159,11 +239,13 @@ Active work, newest first. with no per-host allowlist or dial-layer interception. Two follow- ups surfaced from the r/SideProject v0.3.0 launch thread (2026-05-24, `u/Secret_Theme3192`): - 1. **Per-session audit log of blocked/redacted events** — - grep-able file at `.gnoma/sessions//audit.jsonl` so the - user can answer "what did the firewall do this session?" in - one command. Today the `slog` output goes to whatever sink is - configured, with no per-session grouping. + 1. **Per-session audit log of blocked/redacted events** — ✅ JSONL + writing **implemented**: `internal/security/audit.go` + + wiring at `cmd/gnoma/main.go:685-691` + (`.gnoma/sessions//audit.jsonl`), recorded from + `firewall.go:152/173/186`. **Remaining gap:** no CLI to *read* + it — a `gnoma firewall audit` viewer is folded into the egress + plan (shares the `gnoma firewall` command surface). 2. **Per-host egress allowlist (HTTP transport layer)** — design refined by `u/HarjjotSinghh` on the r/SideProject thread (2026-05-28). Three-stage rollout, not a single-shot @@ -195,6 +277,9 @@ Active work, newest first. "network egress gated"; corrected in the README scope note and the audit-log commit. + Egress plan (incl. 
the `gnoma firewall audit` viewer for item #1): + [`docs/superpowers/plans/2026-06-04-egress-allowlist.md`](docs/superpowers/plans/2026-06-04-egress-allowlist.md). + - **Cross-platform support — Windows + macOS.** GoReleaser builds static binaries for `linux/darwin/windows × amd64/arm64` every release but only Linux is exercised at all today. Windows and @@ -244,6 +329,9 @@ Active work, newest first. least a TODO-linked acknowledgement in the post body so the thread sees gnoma takes the gaps seriously. + Plan (build-tag scaffolding + concrete code touch-points): + [`docs/superpowers/plans/2026-06-04-cross-platform.md`](docs/superpowers/plans/2026-06-04-cross-platform.md). + - **Tool-router specialization (functiongemma)** — gated on telemetry, not committed. Phase A.2 adds did-switch-rate measurement to the two-stage `select_category` path; Phase A.3 (LoRA fine-tune of @@ -288,7 +376,8 @@ Active work, newest first. from `dockers` + `docker_manifests` to `dockers_v2` in `.goreleaser.yml` (collapses ~45 lines into one block but requires Dockerfile changes for the per-platform binary layout - — deferred to its own commit before v0.3.0). + — deferred to its own commit before v0.3.0). Plan: + [`docs/superpowers/plans/2026-06-04-distribution-followups.md`](docs/superpowers/plans/2026-06-04-distribution-followups.md). ## Stable backlog (not in active phases) diff --git a/docs/superpowers/plans/2026-06-04-agent-client-protocol.md b/docs/superpowers/plans/2026-06-04-agent-client-protocol.md new file mode 100644 index 0000000..93a1415 --- /dev/null +++ b/docs/superpowers/plans/2026-06-04-agent-client-protocol.md @@ -0,0 +1,375 @@ +# Agent Client Protocol (ACP) — 2026-06-04 + +Adds **both directions** of ACP to gnoma: + +1. **gnoma as ACP agent (server)** — `gnoma acp` over stdio so any + ACP-capable editor (Zed, Kiro, OpenCode, …) can drive gnoma as an + external coding agent. +2. 
**gnoma as ACP client** — gnoma spawns *external* ACP agents + (Claude, Gemini CLI, Codex, …) and exposes them as router-arm + provider backends, the standardized successor to the current + `internal/provider/subprocess` CLI-agent arms. + +Adds the TODO.md entry "Agent Client Protocol (ACP) support". + +Upstream: · +spec + +--- + +## Problem + +ACP is "the LSP for AI coding agents": a JSON-RPC 2.0 protocol, spoken +over stdio, that lets editors (clients) spawn agents (subprocesses) and +talk to them in a standard way — eliminating point-to-point editor↔agent +integrations. Zed, Kiro, OpenCode and others are clients; Claude, Gemini +CLI, Codex ship as ACP agents. + +Today gnoma is reachable only via its own TUI and pipe mode. It cannot +plug into an editor's agent panel. Supporting ACP makes gnoma a drop-in +agent inside any ACP client, which is a large distribution surface for +near-zero ongoing cost — the protocol is stable and gnoma already owns +all the hard parts (an agentic engine, tools, permissions, MCP). + +### Why this is a natural fit + +- gnoma already speaks **JSON-RPC over stdio** for MCP + (`internal/mcp/jsonrpc.go` `Request`/`Notification`, + `internal/mcp/transport*.go`) — that machinery is reusable for the + ACP server side (gnoma is the *server* of the JSON-RPC channel here, + the mirror of its MCP-client role). +- The agentic loop is already factored behind + `session.Session` (`internal/session/session.go:54`, + `Local.Send`/`SendWithOptions` at `local.go:80-85`) driving + `engine.Engine` (`internal/engine/engine.go`). ACP `session/prompt` + maps onto one `Send`. +- Permissions already route through a pluggable prompt function + (`permission.NewChecker(mode, rules, promptFn)`, + `cmd/gnoma/main.go:668`). ACP's `session/request_permission` callback + is just another `promptFn` implementation. 
+- ACP `session/new` can declare the `mcpServers` the agent should + connect to — gnoma already has an MCP manager + (`internal/mcp/manager.go`) to honour that in the same handshake. + +### Role decision — both, server first + +Both roles ship under this plan. Sequence them: **agent (server) +first** — it's the larger distribution win and exercises the wire +protocol end-to-end — then **client**, which reuses the same +`internal/acp` protocol/types from the other side. They share the +JSON-RPC framing, content-block translation, and capability structs; +only the dispatch direction differs. + +The client role is the standardized successor to +`internal/provider/subprocess`: that package shells out to CLI agents +with one-shot `--output-format stream-json` (or prompt-augmentation +fallback), runs the agent's *own* loop with `--yolo`/`--trust`, and +cannot surface structured tool calls (it sets `ToolUse:false` for +agents lacking stream-json — see TODO "Native agy JSON output"). ACP +fixes all of that: a persistent JSON-RPC session, structured +`session/update` tool-call events, real permission round-trips, and +cancellation. + +### No Go SDK exists + +Official SDKs are TypeScript, Python, Rust, Kotlin — **no Go**. gnoma +implements the wire protocol natively against the published JSON +schema. Pin the supported `protocolVersion` and the exact method set +against the spec at implementation time (the protocol is young and +still moving). + +--- + +## Non-goals + +- **A full editor UI.** In agent mode gnoma renders nothing; the client + owns the UI. gnoma emits `session/update` notifications and the client + displays them. +- **Replacing the TUI / pipe modes.** ACP agent mode is a third entry + mode alongside them, not a replacement. +- **Replacing `internal/provider/subprocess` outright.** The ACP-client + provider is added alongside it; the stream-json subprocess path stays + for agents that don't (yet) speak ACP. Deprecation is a later call. 
+- **Custom transports.** stdio only (the ACP norm: local agent as a + subprocess). No socket/HTTP transport. +- **gnoma-drives-gnoma over ACP as the default.** gnoma's native + providers/router remain the primary path; ACP-client arms are an + additional backend source. + +--- + +## Design + +The two roles share one package (`internal/acp`): JSON-RPC framing, +content-block translation, and the capability/handshake types are +direction-agnostic. **Part A** is the agent (server) side; **Part B** +is the client side. Build Part A first. + +## Part A — gnoma as ACP agent (server) + +### New entry mode: `gnoma acp` + +Add a third mode beside TUI and pipe (mode is chosen near +`cmd/gnoma/main.go:106-114`). Selected by an explicit `acp` subcommand +(stdio is shared with the JSON-RPC channel, so it can't be +TTY-autodetected the way TUI is). In ACP mode: + +- **No banner, no TUI, no stdout chatter.** stdout/stdin are the + JSON-RPC pipe; all human/diagnostic logging goes to **stderr** only + (the firewall/audit slog sink must not write to stdout). Audit this + carefully — any stray stdout write corrupts the protocol stream. +- Reuse the existing session/engine/router/security construction; only + the front-end loop differs. 
+ +### Package layout + +``` +internal/acp/ + protocol.go // ACP types: handshake, capabilities, content blocks (shared) + jsonrpc.go // framing reused/forked from internal/mcp/jsonrpc.go (shared) + content.go // ContentBlock <-> message.Message translation (shared) + server.go // Part A: stdio JSON-RPC read loop; method dispatch + session.go // Part A: ACP session <-> gnoma session.Session bridge + permission.go // Part A: session/request_permission promptFn + update.go // Part A: gnoma stream events -> session/update + client.go // Part B: spawn external agent, drive the handshake/prompt +``` + +A separate `internal/provider/acp/` holds the **Part B provider** +adapter (mirrors `internal/provider/subprocess/`), depending on +`internal/acp/client.go`. + +Reuse `internal/mcp/jsonrpc.go` framing if it generalises; otherwise +fork the minimal envelope (it's tiny). Keep ACP types separate from MCP +types — they are different protocols that happen to share JSON-RPC. + +### Method handlers (agent side) + +Map each ACP method to existing gnoma machinery. Pin exact shapes to the +spec; the mapping is the contract: + +| ACP method (client→agent) | gnoma handling | +|---|---| +| `initialize` | Reply with `agentCapabilities` (tools, MCP support, prompt streaming, permission modes), `agentInfo` (name "gnoma", `buildVersion`). Negotiate `protocolVersion`. | +| `session/new` | Build a `session.Local` (router, security, tools wired as in main). Honour `cwd` (run it through `safety.ClassifyCWD`), and connect any `mcpServers` the client declares via `internal/mcp/manager.go`. Return a `sessionId`. | +| `session/load` (if advertised) | Rehydrate from `internal/session` store (`SessionStore.Load`). Optional — only if we advertise the capability. | +| `session/prompt` | Translate ACP `ContentBlock`s → `message.Message`, call `Send`/`SendWithOptions`, stream results back as `session/update`, return the stop reason. 
| +| `session/cancel` (notification) | Cancel the in-flight turn's context. | + +Agent→client calls gnoma must make: + +| ACP call (agent→client) | Trigger | +|---|---| +| `session/update` (notification) | Per engine stream event: assistant text deltas, tool-call start/args/result, plan/thoughts, token usage. Map gnoma's stream iterator (`Next/Current`) to update variants. | +| `session/request_permission` | gnoma's `permission.Checker` promptFn — instead of console `Scanln`, send this and await the client's allow/deny (with the ACP "allow once / always" options mapped to gnoma permission modes). | +| `fs/read_text_file`, `fs/write_text_file` | **If** we advertise client-side fs and the client supports it, route the `fs` tools through the client so edits show in the editor's buffers. Otherwise gnoma's own `internal/tool/fs` operates on disk directly. Decide per capability negotiation. | + +### Streaming bridge + +The engine produces a pull-based stream (`Next() / Current() / Err() / +Close()`). The ACP bridge consumes it and emits a `session/update` per +event. Backpressure: ACP is fire-and-forget notifications, so no +blocking — but coalesce text deltas if the client is slow (config knob, +default flush per token). + +### Security & safety interplay + +- The `SafeProvider` firewall boundary and the per-session audit log + apply unchanged — ACP is a front-end, providers/tools sit behind the + same security layer. +- `safety.ClassifyCWD` runs on the `session/new` `cwd`; a `refuse` + classification returns an ACP error rather than starting the session. +- Egress allowlist (`2026-06-04-egress-allowlist.md`) applies as usual. +- Incognito: expose a way to start an ACP session incognito (capability + flag or `session/new` param) so editor-driven sessions can be + non-persistent. 
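Returning to the streaming bridge: the pull-based `Next() / Current() / Err() / Close()` iterator and the text-delta coalescing knob can be sketched as below. This is an illustration under stated assumptions, not gnoma's engine API. The `Event`, `Update`, `bridge`, and `sliceStream` names are hypothetical, the update kinds are placeholders rather than real ACP variants, and the threshold is measured in bytes purely for simplicity. A `minFlush` of 0 reproduces the "flush per token" default.

```go
package main

import (
	"fmt"
	"strings"
)

// EventKind and Event stand in for the engine's stream events (hypothetical).
type EventKind int

const (
	TextDelta EventKind = iota
	ToolCall
)

type Event struct {
	Kind EventKind
	Text string
}

// Stream mirrors the pull-based iterator shape named in the plan.
type Stream interface {
	Next() bool
	Current() Event
	Err() error
	Close() error
}

// Update stands in for one session/update notification payload.
type Update struct {
	Kind string
	Text string
}

// bridge drains s and calls emit once per outgoing update. Text deltas are
// buffered until at least minFlush bytes are pending; any non-text event
// flushes the buffer first so ordering is preserved.
func bridge(s Stream, minFlush int, emit func(Update)) error {
	defer s.Close()
	var buf strings.Builder
	flush := func() {
		if buf.Len() > 0 {
			emit(Update{Kind: "text", Text: buf.String()})
			buf.Reset()
		}
	}
	for s.Next() {
		ev := s.Current()
		switch ev.Kind {
		case TextDelta:
			buf.WriteString(ev.Text)
			if buf.Len() >= minFlush {
				flush()
			}
		default:
			flush()
			emit(Update{Kind: "tool_call", Text: ev.Text})
		}
	}
	flush() // emit any trailing text
	return s.Err()
}

// sliceStream is an in-memory Stream used for the demo.
type sliceStream struct {
	events []Event
	pos    int
}

func (s *sliceStream) Next() bool     { s.pos++; return s.pos <= len(s.events) }
func (s *sliceStream) Current() Event { return s.events[s.pos-1] }
func (s *sliceStream) Err() error     { return nil }
func (s *sliceStream) Close() error   { return nil }

func main() {
	s := &sliceStream{events: []Event{
		{TextDelta, "he"}, {TextDelta, "llo"}, {ToolCall, "ls"}, {TextDelta, "!"},
	}}
	_ = bridge(s, 4, func(u Update) { fmt.Printf("%s: %q\n", u.Kind, u.Text) })
}
```

Flushing before every tool-call event is the design choice worth noting: coalescing may merge adjacent text deltas, but it should never reorder text relative to tool activity.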
+ +### MCP-in-ACP + +When `session/new` lists `mcpServers`, spin them up through the existing +manager so the editor's MCP config and gnoma's converge in one +handshake (this is the headline ACP×MCP integration). gnoma's own +config-level MCP servers still load too; merge, don't replace. + +--- + +## Part B — gnoma as ACP client (external agents as router arms) + +gnoma connects to external ACP agents and exposes each as a router-arm +backend, the standardized successor to `internal/provider/subprocess`. +gnoma plays the *client* (editor) side of the JSON-RPC channel. + +### Provider adapter + +Add `internal/provider/acp/` implementing the `provider.Provider` +contract (`Stream`, `Name`, `Models`, `DefaultModel`) — the same surface +the subprocess provider satisfies +(`internal/provider/subprocess/provider.go:28-62`): + +- **Spawn + handshake.** On first use (or at discovery), spawn the agent + subprocess (`exec.CommandContext`, with the Windows/Unix process-group + handling from `2026-06-04-cross-platform.md`), send `initialize` as the + client, then `session/new` with gnoma's `cwd` and — crucially — + gnoma's *own* MCP servers passed through as the `mcpServers` list so + the external agent shares gnoma's tool surface. +- **`Stream` → `session/prompt`.** Translate the gnoma `Request` + messages into ACP `ContentBlock`s, send `session/prompt`, and turn the + incoming `session/update` notifications back into gnoma's pull-based + stream events (`EventTextDelta`, structured tool-call events, usage). + This is the win over the subprocess provider: tool calls arrive + **structured**, not as opaque `EventTextDelta` text. +- **Permission callbacks.** The external agent sends + `session/request_permission` to gnoma (now the client). Route these + through gnoma's existing `permission.Checker` so the *user's* gnoma + permission policy governs the sub-agent — a strict improvement over + today's `--yolo`/`--trust` subprocess invocations that bypass gnoma's + gate entirely. 
+- **`fs/*` callbacks.** Route the agent's file reads/writes through + gnoma's `internal/tool/fs` guard so the path-safety boundary still + applies. +- **Cancellation.** gnoma's turn-cancel sends ACP `session/cancel`. + +### Discovery & registration + +Mirror the subprocess flow (`cmd/gnoma/main.go:521-531`): + +- Discover ACP agents from config (`[acp.agents]` — command + args + + optional capability hints) and/or a known-agents table analogous to + `subprocess/agent.go:60` (`knownAgents`). +- Register each as a `router.Arm` (a new `IsACPAgent` flag, or reuse + `IsCLIAgent` with a transport discriminant). Set `Capabilities` from + the ACP `initialize` response — notably `ToolUse:true`, which the + subprocess provider often can't claim. +- Wrap in `security.WrapProvider(..., fwRef)` exactly like every other + arm so the firewall + audit + egress boundaries hold. + +### Relationship to the subprocess provider + +Additive. Agents that speak ACP (Claude, Gemini CLI, Codex increasingly +do) get the ACP arm; agents that only do one-shot stream-json keep the +subprocess arm. Where both exist for one binary, prefer ACP. This also +unblocks the "Native agy JSON output" backlog item for any agent that +exposes ACP instead of `--output-format stream-json`. 
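The "prefer ACP where both exist" rule can be sketched as a small dedupe step at discovery time. This is only an illustration: `Candidate`, `Transport`, and `preferACP` are hypothetical names, not fields of `router.Arm`, and the real registration path may key arms on something other than the binary name.

```go
package main

import (
	"fmt"
	"sort"
)

// Transport distinguishes how gnoma would talk to a discovered agent.
type Transport int

const (
	TransportSubprocess Transport = iota // one-shot stream-json CLI arm
	TransportACP                         // persistent ACP JSON-RPC session
)

// Candidate is one discovered agent binary plus the transport it was found on.
type Candidate struct {
	Binary    string
	Transport Transport
}

// preferACP keeps one candidate per binary, choosing the ACP variant when
// both transports were discovered. Output is sorted by binary name so the
// registration order is deterministic.
func preferACP(cands []Candidate) []Candidate {
	best := make(map[string]Candidate)
	for _, c := range cands {
		cur, seen := best[c.Binary]
		if !seen || (c.Transport == TransportACP && cur.Transport != TransportACP) {
			best[c.Binary] = c
		}
	}
	out := make([]Candidate, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Binary < out[j].Binary })
	return out
}

func main() {
	arms := preferACP([]Candidate{
		{"claude", TransportSubprocess},
		{"claude", TransportACP},
		{"agy", TransportSubprocess},
	})
	for _, a := range arms {
		fmt.Println(a.Binary, a.Transport == TransportACP)
	}
}
```

Agents that only speak stream-json fall through unchanged, which matches the additive relationship between the two providers.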
+ +--- + +## Touch-points (file:line) + +**Part A — agent (server):** + +| Change | Location | +|---|---| +| New ACP package | `internal/acp/` | +| Entry mode dispatch | `cmd/gnoma/main.go` (mode select ~`:106`, subcommand dispatch ~`:178`) | +| stdout→stderr log discipline | logger setup (`main.go:100-114`) | +| Session bridge | `internal/session` (`Session`/`Local`) | +| Permission callback | `internal/permission` checker promptFn (`main.go:645-668`) | +| Stream→update | engine stream iterator (`internal/engine`, `internal/stream`) | +| MCP per-session | `internal/mcp/manager.go` | +| JSON-RPC framing reuse | `internal/mcp/jsonrpc.go` | + +**Part B — client (external agents as arms):** + +| Change | Location | +|---|---| +| ACP-client provider | new `internal/provider/acp/` (mirrors `internal/provider/subprocess/`) | +| Client handshake/driver | `internal/acp/client.go` | +| Arm discovery + registration | `cmd/gnoma/main.go:521-531` (subprocess pattern), `[acp.agents]` config | +| Known-agents table | analogous to `internal/provider/subprocess/agent.go:60` | +| Arm flag | `router.Arm` (`IsACPAgent`, or `IsCLIAgent` + transport) | +| Security wrap | `security.WrapProvider(..., fwRef)` | + +--- + +## Testing (TDD — write first) + +- **Protocol unit tests (no real provider):** + - `initialize` handshake: version negotiation, advertised + capabilities are stable and accurate. + - `session/new` → returns a sessionId; honours `cwd`; rejects a + `refuse`-classified cwd with an ACP error. + - `session/prompt` with a stubProvider: ContentBlocks translate in, + `session/update`s stream out in order, correct stop reason. + - `session/cancel` aborts the in-flight turn (context cancellation + observed). + - Permission: a tool call triggers `session/request_permission`; a + "deny" response blocks the tool; "allow always" updates the mode. 
+ - **stdout purity test:** drive a full prompt and assert stdout + contains *only* valid JSON-RPC frames (no banner/log leakage) — this + is the most common ACP-agent bug. +- **Conformance:** run gnoma against the upstream ACP test client / + example client (Rust/TS) in a `//go:build integration` test if one is + available; otherwise a recorded-transcript fixture. +- **MCP-in-ACP:** `session/new` with an `mcpServers` entry spins the + server up and its tools become callable in that session. +- **Part B (client) unit tests** — drive a *fake ACP agent* (a small + in-process JSON-RPC responder, the mirror of the agent-side tests): + - Provider `Stream` performs `initialize`+`session/new`+`session/prompt` + and yields gnoma stream events in order, with **structured** tool-call + events (not opaque text). + - An inbound `session/request_permission` is routed through + `permission.Checker` and a deny blocks the call. + - An inbound `fs/write_text_file` is mediated by the `internal/tool/fs` + guard (a guarded path is refused). + - Turn cancel emits `session/cancel`; the subprocess is reaped (tie to + cross-platform process-group handling). + - Discovery registers a fake ACP agent as an arm with `ToolUse:true`. +- **Round-trip (loopback):** point gnoma's ACP-*client* at a `gnoma acp` + *server* subprocess and run a prompt end-to-end — exercises both parts + over a real stdio pipe. + +### Acceptance criteria + +**Part A (agent/server):** + +1. `gnoma acp` speaks the handshake and a full prompt turn over stdio. +2. gnoma appears and works as an external agent in Zed (manual: add + gnoma to Zed's external-agents config, run a prompt, approve a tool). +3. Tool permission prompts surface in the client and gate execution. +4. stdout carries only JSON-RPC; all logs go to stderr. +5. Cancelling from the editor stops the turn. +6. MCP servers declared by the client in `session/new` are available in + that session. + +**Part B (client):** + +7. 
An external ACP agent configured under `[acp.agents]` appears as a + router arm (`gnoma providers` lists it) with `ToolUse:true`. +8. Routing a task to that arm runs a full turn via ACP, surfacing the + sub-agent's tool calls **structured** in gnoma's stream. +9. The sub-agent's permission requests are gated by the user's gnoma + permission policy (not auto-approved). +10. The sub-agent's file writes pass through gnoma's fs guard. +11. Loopback: `gnoma acp` driven by gnoma's own ACP-client completes a + prompt end-to-end. + +--- + +## Open questions (resolve against the live spec at implementation) + +- Exact `protocolVersion` to target and the precise capability struct + shapes (the schema is the source of truth; pin a version). +- Whether to advertise client-side `fs/*` (edits flow through the + editor's buffers) vs. direct-disk fs tools — depends on parity and on + how gnoma's `internal/tool/fs` guard composes with editor-mediated + writes. +- `session/load` support (needs our session store to round-trip the + ACP transcript shape). +- **(Part B)** How a sub-agent's own model/cost is represented in the + router — an ACP arm's tokens are billed by *that* agent, so + `CostWeight`/`CostPer1k*` are opaque. Likely model it like the + subprocess arms (no metered cost; selection driven by `Strengths`). +- **(Part B)** Lifecycle: spawn-per-session vs. a pooled long-lived + agent process reused across turns; how cancellation and crashes are + recovered (ties to session error-recovery, `0d3d190`). + +--- + +## TODO linkage + +New "Agent Client Protocol (ACP) support" entry in `TODO.md` (In +flight) links here. Covers **both** roles: gnoma as ACP agent (Part A) +and gnoma as ACP client driving external agents as router arms +(Part B). Part B is the standardized successor to +`internal/provider/subprocess` and overlaps the "Native agy JSON +output" backlog item. 
diff --git a/docs/superpowers/plans/2026-06-04-cross-platform.md b/docs/superpowers/plans/2026-06-04-cross-platform.md new file mode 100644 index 0000000..4238b43 --- /dev/null +++ b/docs/superpowers/plans/2026-06-04-cross-platform.md @@ -0,0 +1,198 @@ +# Cross-Platform Support (Windows + macOS) — 2026-06-04 + +Makes the Windows and macOS binaries — which GoReleaser already builds +for `linux/darwin/windows × amd64/arm64` but only Linux exercises — +actually work and stay working. Promotes the TODO.md entry +"Cross-platform support — Windows + macOS" into a phased design with +concrete code touch-points. + +This plan does not restate the TODO's r/devops question map (Phase 2 +table there stands). Its value-add is the **specific code locations** +that need OS-conditional handling and the build-tag pattern to use. + +--- + +## Problem + +Only Linux is tested. The binaries ship for Windows/macOS untested, and +the codebase has several hard Unix assumptions that will fail or +silently misbehave off-Linux. The pattern to follow already exists: +`internal/mcp/transport_{unix,windows}.go` split via build tags. + +--- + +## Non-goals + +- **MSI installer, Authenticode/Gatekeeper signing.** Covered by + `2026-06-04-distribution-followups.md` — those are packaging, not + runtime correctness. +- **Group Policy / Event Viewer integration.** Out of scope per the + TODO; documentation-only. +- **WSL-specific tuning.** WSL is Linux; it works today. + +--- + +## Confirmed Unix-assumption defects (file:line) + +### Critical — break core functionality on Windows + +1. **Bash tool hardcodes `bash -c`.** + `internal/tool/bash/bash.go:117` → + `exec.CommandContext(ctx, "bash", "-c", command)`. No Windows shell. + Alias harvesting (`internal/tool/bash/aliases.go:115,148`) hardcodes + `/bin/bash` and splits the shell path on `/`. +2. **Llamafile SLM startup hardcodes `sh`.** + `internal/slm/manager.go:172` invokes `sh ` (a Wine + binfmt workaround). 
`sh` is absent on native Windows → `gnoma slm + status/setup` fails outright. +3. **MCP process-tree kill is a Windows stub.** + `internal/mcp/transport_windows.go:10-18` — `setProcessGroup` is a + no-op and `killProcessTree` calls `p.Kill()`, leaking any child + processes an MCP server spawns. Unix version uses process groups + (`transport_unix.go:11-18`). + +### High — config/auth land in the wrong place off-Linux + +4. **Config/data dirs assume XDG.** + `internal/config/load.go:52-59` falls back to `~/.config`; + `internal/slm/manager.go:25-35` falls back to `~/.local/share`. On + Windows these should be `os.UserConfigDir()` (`%AppData%`) / + `os.UserCacheDir()`. On macOS, native tools use + `~/Library/Application Support`, though `~/.config` is tolerable; + decide and document. +5. **OAuth credential discovery is Unix-pathed.** + `internal/provider/google/provider.go:188-204` hardcodes + `~/.config/...` and `~/.gemini/...`. `expandHome` (`:114-129`) + already handles `\`, but the path *set* is Unix-centric — Gemini/ + Antigravity creds on macOS/Windows won't be found. +6. **No system-proxy support.** No `http.ProxyFromEnvironment` wiring + found. Go stdlib reads `HTTP(S)_PROXY` env vars but **not** the + Windows system proxy / PAC. Corporate Windows networks rely on these. + +### Medium — usability / safety classifier gaps + +7. **`internal/safety/cwd.go`** macOS system roots + (`:185-210`) miss `/opt`, `/usr/local`; personal-dir detection + (`:221-252`) misses Windows `%TEMP%`/`%APPDATA%` and macOS + `~/Library/...`. +8. **Terminal/ANSI.** TUI uses lipgloss/termenv (auto-detects), so + modern Windows Terminal/PowerShell 7 are fine; legacy `conhost.exe` + may mangle. Verify, don't assume. + +--- + +## Design + +### Phase 0 — build-tag scaffolding + +Adopt the existing `_unix.go` / `_windows.go` split (as in +`internal/mcp`) for each defect that needs divergent behaviour. 
Prefer +`runtime.GOOS` only for small inline branches (as +`internal/safety/cwd.go:201` already does); use build tags when the +implementation genuinely differs (shell selection, process kill). + +### Phase 1 — smoke tests (unblocks the honest "did you test it?" answer) + +Non-blocking GitHub Actions matrix (`windows-latest`, `macos-latest`, +`ubuntu-latest`): + +- `go build ./...` and `go test ./...` per OS (today the release + workflow tests Linux only — `.github/workflows/release.yml`). +- Post-release: download each archive, run `gnoma --version` and a + stubbed `echo hi | gnoma --provider ollama` against a fake endpoint. + Confirms the binary launches and the TUI doesn't crash. + +This is the precondition the TODO names for posting to r/devops. + +### Phase 2 — shell abstraction (defects #1, #2) + +1. Introduce `internal/tool/bash/shell_unix.go` / + `shell_windows.go` exposing `defaultShell() (name string, args + []string)` and a `quoteArg(string) string`: + - Unix: `bash`/`$SHELL`, `-c`, POSIX quoting. + - Windows: prefer `pwsh`/`powershell` with the appropriate + `-Command` invocation and PowerShell quoting rules; fall back to + `cmd /c`. Document the choice. +2. Fix `aliases.go` to use `filepath.Base` instead of splitting on `/`, + and skip alias harvesting on Windows shells that have no equivalent. +3. Llamafile: on Windows, invoke the `.llamafile` (which is a valid + Windows PE as well as a shell script) directly rather than via `sh`; + guard with a build tag. + +### Phase 3 — process management (defect #3) + +Implement Windows job objects via `golang.org/x/sys/windows` in +`transport_windows.go` (and any other subprocess owner — +`internal/provider/subprocess`, `internal/tool/bash`): create a job, +assign the child, `TerminateJobObject` on close to reap the whole tree. +Shared helper so MCP and bash tool both get tree-kill. (This is the +same item the distribution TODO references.) + +### Phase 4 — paths + proxy (defects #4, #5, #6) + +1. 
Replace XDG fallbacks with `os.UserConfigDir()` / `os.UserCacheDir()` + on Windows (keep XDG honoring on Unix). Centralise in one + `configDir()` / `dataDir()` helper so it's not re-derived. +2. Extend the OAuth credential path sets with OS-appropriate locations + (macOS `~/Library/Application Support/...`, Windows `%AppData%/...`). +3. Ensure every `http.Client` uses a transport with + `Proxy: http.ProxyFromEnvironment`. For Windows system-proxy/PAC, + document the env-var workaround now; optionally vendor a PAC-aware + transport (e.g. `github.com/rapid7/go-get-proxied`) later. This + overlaps the shared-client work in + `2026-06-04-egress-allowlist.md` — do the proxy transport once, in + the shared client. + +### Phase 5 — safety classifier + terminal (defects #7, #8) + +Extend `internal/safety/cwd.go` system-root and personal-dir sets per +OS; add a manual verification note for legacy Windows terminals. + +--- + +## Touch-points (file:line) + +| Defect | Location | +|---|---| +| Bash shell | `internal/tool/bash/bash.go:117`, `aliases.go:115,148` | +| Llamafile `sh` | `internal/slm/manager.go:172` | +| MCP kill stub | `internal/mcp/transport_windows.go:10-18` | +| Config/data dirs | `internal/config/load.go:52-59`, `internal/slm/manager.go:25-35` | +| OAuth paths | `internal/provider/google/provider.go:188-204` | +| Proxy | shared `http.Client` (see egress plan) | +| Safety classifier | `internal/safety/cwd.go:185-252` | +| CI matrix | `.github/workflows/` (new test job), `release.yml` | + +--- + +## Testing (TDD — write first) + +- **OS-gated unit tests** (run on each matrix OS): + - `defaultShell()` returns a runnable shell per OS; `quoteArg` + round-trips a value containing spaces/quotes through the real shell. + - `configDir()`/`dataDir()` return the OS-correct base. + - Job-object kill: spawn a child that spawns a grandchild; assert + both are gone after `killProcessTree` (Windows). + - `safety.ClassifyCWD` flags OS-appropriate system/personal dirs. 
+- **Existing tests** that `t.Skip` on Windows + (`internal/tool/fs/guard_test.go`, + `internal/provider/subprocess/stream_test.go`) — audit whether the + skip hides a real gap now that Windows is a target. + +### Acceptance criteria + +1. CI smoke matrix is green on `windows-latest` + `macos-latest`. +2. `gnoma --version` and a stubbed pipe run succeed on a Windows runner. +3. A bash-tool command with quoted args runs on Windows (PowerShell). +4. An MCP server that spawns a child leaves no orphan after shutdown on + Windows. +5. Config lands in `%AppData%\gnoma` on Windows, `~/.config/gnoma` on + Linux. + +--- + +## TODO linkage + +Promotes the "Cross-platform support — Windows + macOS" entry in +`TODO.md`. The Phase-2 r/devops question table stays in the TODO as the +public-facing answer map; link this plan for the implementation detail. diff --git a/docs/superpowers/plans/2026-06-04-distribution-followups.md b/docs/superpowers/plans/2026-06-04-distribution-followups.md new file mode 100644 index 0000000..d649692 --- /dev/null +++ b/docs/superpowers/plans/2026-06-04-distribution-followups.md @@ -0,0 +1,169 @@ +# Distribution Follow-ups — 2026-06-04 + +Hardens and broadens the release pipeline. v0.1.0+ already ships static +archives (GitHub mirror releases) and multi-arch Docker images (GHCR) +via GoReleaser. This plan covers the optional follow-ups listed under +"Distribution — follow-ups" in TODO.md: signed checksums, Homebrew tap, +`curl | sh` installer, release-note automation, and the +`dockers`→`dockers_v2` migration. + +--- + +## Current state (confirmed) + +- **`.goreleaser.yml`:** 6-target build matrix (linux/darwin/windows × + amd64/arm64), CGO disabled, version injected via ldflags + (`-X main.buildVersion/buildCommit/buildDate`; read at + `cmd/gnoma/main.go:55-60`, printed at `:95-98`). Archives: tar.gz + (zip on Windows). Checksums: plain SHA256 `checksums.txt`, + **unsigned**. 
Docker: separate per-arch `dockers` blocks + + `docker_manifests` for the multi-arch manifest. Release published to + GitHub mirror (`release.github` owner `VikingOwl91`). +- **`.github/workflows/release.yml`:** triggers on `v*` tags, sets up + QEMU + Buildx, logs into GHCR with the built-in `GITHUB_TOKEN`, runs + `go test ./...` (Linux only), then `goreleaser release --clean` with + `GORELEASER_CURRENT_TAG` set. **No signing step.** +- **`Dockerfile`:** distroless `static:nonroot`, copies the + GoReleaser-built binary in. Architecture-agnostic (binary built + before `COPY`). +- **No** Homebrew tap, install script, or Makefile release target. + +--- + +## Non-goals + +- **Authenticode (Windows) / Gatekeeper notarization (macOS) code + signing.** These need a paid EV cert / Apple Developer account — + tracked separately (the cross-platform TODO documents the + "right-click → Unblock" workaround). Sigstore/cosign here is for + *checksum* signing, which needs no paid cert. +- **MSI installer.** Lives in the cross-platform plan, gated on demand. +- **Changing the canonical repo flow.** PRs still go to the Gitea + upstream; the GitHub mirror remains the release/CI surface. + +--- + +## Design (independent work items — ship in any order) + +### 1. Signed checksums (cosign / sigstore keyless) + +Add a GoReleaser `signs` block that signs `checksums.txt` with cosign +in **keyless** mode (OIDC via the GitHub Actions token — no stored +private key, no cert cost): + +- Add `cosign` install + `id-token: write` permission to + `release.yml`. +- GoReleaser `signs:` → `cmd: cosign`, `args: sign-blob` producing + `checksums.txt.sig` + `.pem` (cert bundle) as release artifacts. +- Document verification: + `cosign verify-blob --certificate ... --signature ... checksums.txt`. + +Acceptance: a downloaded release verifies offline against the published +signature + Rekor transparency log. + +### 2. 
Homebrew tap + +Create a tap repo (`VikingOwl91/homebrew-tap`) and add GoReleaser's +`brews:` block targeting it. Needs a PAT with `contents:write` on the +tap repo (the default `GITHUB_TOKEN` can't push to a *second* repo) — +store as `HOMEBREW_TAP_TOKEN` secret. Formula installs the darwin/linux +archives. + +Acceptance: `brew install vikingowl91/tap/gnoma` installs a working +binary on macOS + Linuxbrew; `gnoma --version` matches the tag. + +### 3. `curl | sh` installer + +Add `install.sh` (committed at repo root, served via the raw GitHub +mirror) that: + +- Detects OS/arch, maps to the GoReleaser archive name template + (`gnoma___.`). +- Resolves the latest release via the GitHub API (or honours a pinned + `GNOMA_VERSION`). +- Downloads the archive **and** `checksums.txt`, verifies the SHA256 + before extracting (and the cosign signature if cosign is present). +- Installs to `~/.local/bin` (or `$GNOMA_INSTALL_DIR`), prints a PATH + hint. + +Keep it POSIX-sh, no bashisms. Acceptance: +`curl -fsSL /install.sh | sh` yields a runnable `gnoma` on a clean +Linux + macOS box; checksum mismatch aborts. + +### 4. Release-note automation + +GoReleaser already generates a filtered changelog (excludes +docs/test/chore/style). Enrich it: + +- Group commits by Conventional-Commit type + (`changelog.groups` with title regexes for feat/fix/perf/refactor). +- Add a release header template pointing to the upstream Gitea repo and + the install methods (brew / curl | sh / docker). + +Acceptance: a tagged release's GitHub notes show grouped sections + an +install snippet, with no docs/chore noise. + +### 5. `dockers` → `dockers_v2` migration + +Collapse the two per-arch `dockers` blocks + `docker_manifests` into a +single `dockers_v2` block (GoReleaser's newer multi-platform builder). 
+The current `Dockerfile` is architecture-agnostic (binary copied +post-build), so verify whether `dockers_v2`'s expected per-platform +binary layout needs a `Dockerfile` change or a `templates`/`extra_files` +tweak — the TODO flags this as the reason it was deferred. Do it in its +own commit; diff the resulting GHCR manifest against the current one to +prove parity (same tags: `-amd64`, `-arm64`, ``, +`latest`). + +Acceptance: GHCR still publishes a multi-arch manifest with identical +tags + labels; `docker pull --platform linux/arm64` works. + +### 6. (Carry-over) Windows process-tree kill + +Listed in this TODO bullet but it's a *runtime* concern — implemented in +`2026-06-04-cross-platform.md` Phase 3 (job objects). Cross-linked here +only so the TODO bullet's reference resolves. + +--- + +## Touch-points (file:line) + +| Item | Location | +|---|---| +| Signing, brews, changelog groups, dockers_v2 | `.goreleaser.yml` | +| cosign install, `id-token` perm, tap token | `.github/workflows/release.yml` | +| Installer | new `install.sh` (repo root) | +| Dockerfile (if dockers_v2 needs it) | `Dockerfile` | +| Tap repo | new `VikingOwl91/homebrew-tap` | + +--- + +## Testing + +Distribution is config + scripts, so testing is mostly pipeline-level: + +- **Dry run:** `goreleaser release --snapshot --clean` locally must + produce signed checksums, brew formula, and the dockers_v2 manifest + without publishing. +- **install.sh:** a `shellcheck` gate + a CI job that runs it against + the latest release on linux + macos runners and asserts + `gnoma --version`. +- **Checksum/signature negative test:** corrupt the archive → installer + aborts; tampered checksums → cosign verify fails. + +### Acceptance criteria + +1. A tagged release publishes `checksums.txt` + `.sig` + `.pem`, + verifiable with cosign keyless. +2. `brew install vikingowl91/tap/gnoma` works on macOS. +3. `curl -fsSL /install.sh | sh` works on clean Linux + macOS, + with checksum verification. +4. 
Release notes are grouped and carry install instructions. +5. GHCR multi-arch manifest is unchanged after the dockers_v2 swap. + +--- + +## TODO linkage + +Promotes the "Distribution — follow-ups" entry in `TODO.md`. Link this +file; the Windows job-object sub-item points at the cross-platform plan. diff --git a/docs/superpowers/plans/2026-06-04-egress-allowlist.md b/docs/superpowers/plans/2026-06-04-egress-allowlist.md new file mode 100644 index 0000000..b563ad8 --- /dev/null +++ b/docs/superpowers/plans/2026-06-04-egress-allowlist.md @@ -0,0 +1,236 @@ +# Network Egress Allowlist — 2026-06-04 + +Adds a per-host network egress boundary to the security layer via a +Learn → Review → Enforce rollout. Promotes the second half of the +TODO.md entry "Security boundary — egress controls + session audit log" +into a phased design. + +--- + +## Status of the sibling item: per-session audit log — DONE + +The first half of the TODO entry (per-session audit log of +blocked/redacted events) is **already implemented**: + +- `internal/security/audit.go` defines `AuditLogger` / `AuditEvent`, + writing append-only JSONL at mode `0o600`, incognito-gated, + best-effort (write failures never break the scan pipeline). +- `cmd/gnoma/main.go:685-691` wires it to + `/.gnoma/sessions//audit.jsonl`. +- `internal/security/firewall.go` records events at `:152` (unicode + sanitize), `:173` (block), `:186` (redact). + +**Remaining audit-log gap:** there is no CLI surface to *read* it. The +TODO's promise — answer "what did the firewall do this session?" in one +command — needs a `gnoma firewall audit` subcommand (no `firewall` +subcommand exists today; top-level commands are `providers`, `slm`, +`router`, `profile`). That viewer is folded into Phase 3 below since it +shares the `gnoma firewall` command surface with `firewall review`. + +The rest of this plan is the genuinely-unbuilt egress allowlist. 
+ +--- + +## Problem + +The current `Firewall` is a **content** boundary only: it scans +messages and tool results for secrets (regex + Shannon entropy) and +redacts/blocks/warns. It does **not** enforce network egress. Outgoing +HTTP uses stock clients with no per-host allowlist and no dial-layer +interception, so a compromised tool, MCP server, or prompt-injected +provider call can reach any host. + +The README and v0.3.0 launch post oversold "network egress gated"; +this plan makes that claim true. + +### Why this is hard: no egress chokepoint today + +Outgoing HTTP is constructed in many places, none sharing a client: + +- **Provider SDKs** each build their own `http.Client` internally: + - anthropic (`internal/provider/anthropic/provider.go:36`, + `anthropic.NewClient`) + - openai (`internal/provider/openai/provider.go:46`, `oai.NewClient`) + - mistral (`internal/provider/mistral/provider.go:33`, + `mistralgo.NewClient`) + - google genai (`internal/provider/google/provider.go:239,306`) +- **Non-SDK direct calls** using `http.DefaultClient` or ad-hoc + `&http.Client{}`: + - `internal/router/discovery.go` (`:65,141,325,365`) + - `internal/router/probe.go` (`:24,72`) + - `internal/slm/backend.go` (`:266,294,316,343`) + - `internal/slm/download.go` (`:22`) + - `internal/slm/manager.go` (`:273`) + +No custom `http.Client` is injected anywhere today. **But** every SDK +supports injecting one, which is the enabler for a single chokepoint. + +--- + +## Non-goals + +- **TLS interception / MITM.** We allowlist by destination host, not by + inspecting decrypted payloads. Content inspection stays the + firewall's job. +- **Blocking the provider SDKs' own retry/telemetry hosts by default.** + Model-provider hosts are baseline-allowed (see below). +- **Replacing the OS/network firewall.** This is an in-process + application-level guard, defense-in-depth, not a substitute for real + network controls. Document this honestly (the README over-claim is + the cautionary tale). 
+ +--- + +## Design + +### The chokepoint: one shared `http.Client` with a guarded dialer + +Build a single `*http.Client` whose `Transport.DialContext` validates +the destination against the allowlist **before** the connection is +made. `DialContext` receives `host:port` pre-resolution, so host-based +matching works without DNS races. Thread this client everywhere. + +``` +internal/security/egress/ + guard.go // EgressGuard: mode + allowlist + Decide(host) ResultEnum + dialer.go // GuardedDialer wrapping net.Dialer.DialContext + client.go // HTTPClient(guard) *http.Client + store.go // learned-destinations persistence (per project) + baseline.go // curated ship-in-binary allowlist +``` + +**Injection mechanism per SDK** (each differs — enumerate, don't assume): + +| Client | Mechanism | +|---|---| +| anthropic | `option.WithHTTPClient(c)` appended in `anthropic/provider.go` | +| openai | `option.WithHTTPClient(c)` appended in `openai/provider.go` | +| google genai | `genai.ClientConfig{HTTPClient: c}` in `google/provider.go` | +| mistral | **user's own SDK** — add `WithHTTPClient` option if absent (`github.com/VikingOwl91/mistral-go-sdk`), then use it | +| non-SDK paths | replace `http.DefaultClient` with the shared client in `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` | + +Plumb the shared client into providers by adding +`HTTPClient *http.Client` to `provider.ProviderConfig` +(`internal/provider/registry.go:8-16`) and setting it in +`createProvider`. The non-SDK paths take the client via their existing +constructors / a package-level setter. + +> The non-SDK paths are the trap: if any is missed it punches a hole in +> the allowlist. Treat the list above as a checklist; add a grep test +> (Phase 4) that fails if `http.DefaultClient` reappears. 
+ +### Three-stage rollout (not a single "block everything" default) + +**Learn.** First runs log every egress destination per `(project, +agent, tool)` tuple to the per-project store **without blocking**. +Reuse the audit JSONL discipline (atomic, incognito-gated). + +**Review.** `gnoma firewall review` surfaces the captured set; the user +marks each destination `allow | deny | scoped` (scoped = only reachable +by named tool/agent). Persist to `.gnoma/firewall/allowlist.toml` +(project) — subject to the same `omitempty`/atomic-write discipline as +the config-migration plan (`2026-05-24-config-migration.md`) to avoid +the zero-spam corruption class. + +**Enforce.** When mode is `enforce`, unrecognised destinations are +blocked with a clear violation logged to the **same per-session +`audit.jsonl`** (new `AuditEvent.Action = "egress_block"`). Mode is +`[security.egress].mode = "off" | "learn" | "enforce"`, default `off` +(opt-in; shipping `enforce` on by default would break first-run UX). + +### Baseline allowlist (curated, ship-in-binary) + +`baseline.go` seeds the allowlist so Enforce mode is usable immediately: + +- **Package ecosystems:** github.com, registry.npmjs.org, pypi.org, + files.pythonhosted.org, crates.io, static.crates.io, + registry-1.docker.io, proxy.golang.org, sum.golang.org. +- **Model providers:** anthropic, openai, google, mistral, **minimax** + (per `2026-06-04-minimax-provider.md`) — host set derived from the + effective `[provider.endpoints]` map so user-configured local + ollama/llamacpp endpoints are auto-allowed. + +The painful middle ground is SDK egress (sentry, stripe, supabase, +datadog…). These break a naive "block unknown" default, which is +exactly why Learn → Review → Enforce is the only flow that scales. + +### Per-tool scoping + +`scoped` destinations carry an allowed-tool/agent set. 
Enforcement +checks the calling context — the engine already knows which tool is +running (it threads per-tool context for redaction logging today). Pass +the tool/agent identity into `EgressGuard.Decide(host, callerCtx)`. + +--- + +## Interactions + +- **Incognito:** Learn-mode writes are gated by incognito exactly like + the audit log (`IncognitoMode.ShouldLogContent`). Enforcement still + applies in incognito (security is not relaxed); only the *learning* + persistence is suppressed. +- **Config layering:** the allowlist file is a new corruption surface — + follow `2026-05-24-config-migration.md` #1 discipline. +- **SafeProvider:** egress is orthogonal to the content `SafeProvider` + wrap; it lives one layer down at the transport. Both must hold. + +--- + +## Touch-points (file:line) + +| Change | Location | +|---|---| +| New egress package | `internal/security/egress/` | +| `HTTPClient` field | `internal/provider/registry.go:8-16` | +| Provider client injection | `anthropic/provider.go`, `openai/provider.go`, `google/provider.go`, `mistral/provider.go` | +| mistral SDK `WithHTTPClient` | `github.com/VikingOwl91/mistral-go-sdk` (if absent) | +| Non-SDK client swap | `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` | +| `audit.go` egress action | `internal/security/audit.go` (`AuditEvent`) | +| Config `[security.egress]` | `internal/config/config.go` (SecuritySection ~`:280-306`) | +| `gnoma firewall` command | `cmd/gnoma/main.go` subcommand dispatch (~`:178`) | +| Allowlist store | `.gnoma/firewall/allowlist.toml` | + +--- + +## Testing (TDD — write first) + +- **Unit:** + - `EgressGuard.Decide`: off → always allow; learn → allow + record; + enforce → allow baseline/allowlisted, block unknown, scoped host + allowed only for the named tool. 
+ - `GuardedDialer` blocks a non-allowlisted `host:port` before dial + (use a guard with a closed allowlist; assert no connection + attempt — inject a fake inner dialer that records calls). + - Baseline expansion: `[provider.endpoints]` hosts are auto-allowed; + a local ollama URL becomes an allowlist entry. + - Allowlist store round-trips without zero-spam corruption. + - `audit.jsonl` gains an `egress_block` record on a blocked dial. +- **Grep/guard test:** fails if `http.DefaultClient` is used in + provider/router/slm packages (prevents regressions reopening the + hole). +- **Integration (`//go:build integration`):** with mode=enforce and a + minimal allowlist, a provider call to an allowed host succeeds and a + tool fetch to a blocked host fails with a logged violation. + +### Acceptance criteria + +1. `mode="off"` (default) → behaviour identical to today. +2. `mode="learn"` → every outbound host appears in the store; nothing + is blocked. +3. `gnoma firewall review` lists learned hosts and persists + allow/deny/scoped decisions. +4. `mode="enforce"` → baseline + allowlisted hosts reachable; an + un-allowlisted host is blocked with an `egress_block` line in + `.gnoma/sessions//audit.jsonl`. +5. `gnoma firewall audit` prints this session's firewall events + (block/redact/egress) in a grep-friendly form. (Closes the + remaining audit-log gap.) +6. Scoped destination reachable by its named tool only. + +--- + +## TODO linkage + +Replaces the egress half of the "Security boundary — egress controls + +session audit log" entry in `TODO.md`. Update that entry to mark the +audit log implemented and link this file for the egress work. 
diff --git a/docs/superpowers/plans/2026-06-04-minimax-provider.md b/docs/superpowers/plans/2026-06-04-minimax-provider.md new file mode 100644 index 0000000..23b8cd4 --- /dev/null +++ b/docs/superpowers/plans/2026-06-04-minimax-provider.md @@ -0,0 +1,224 @@ +# MiniMax Provider — 2026-06-04 + +Adds MiniMax () as a first-class cloud +provider so it can register as a router arm alongside +anthropic/openai/google/mistral. Promotes the TODO.md entry +"MiniMax provider — cloud arm + subscription token plan" out of +bullet form into a phased design. + +--- + +## Problem + +Gnoma has no MiniMax adapter. MiniMax ships strong, very cheap coding +models (M2 family) that are a natural fit for the cheap-high-capability +cloud tier the router already reasons about via `CostWeight`. Two facts +make the integration cheap: + +1. MiniMax exposes **both** an OpenAI-compatible and an + Anthropic-compatible HTTP surface, so no new translation layer is + needed — gnoma already has both `internal/provider/openaicompat` + (built on the OpenAI SDK) and `internal/provider/anthropic` with a + working `BaseURL` override. +2. `envKeyFor`'s default branch (`cmd/gnoma/main.go:1199-1200`) already + resolves `MINIMAX_API_KEY` for any unknown provider with no code + change. + +The remaining work is wiring (a constructor + switch cases + +enumerations), routing metadata (family defaults, rate limits), and a +**design decision around the subscription billing model** that the +router's metered-cost assumption does not currently handle. + +### External facts (VERIFY at implementation — MiniMax docs move fast) + +These were confirmed 2026-06-04 but the model lineup and pricing are +revised frequently (a pricing overhaul landed 2026-06-02). Re-verify +against the live docs before hardcoding anything: + +- **OpenAI-compatible base URL:** `https://api.minimax.io/v1` + (international). 
A separate region endpoint exists + (`api.minimaxi.com`); confirm the exact host + whether gnoma should + expose a region toggle. Docs: + +- **Anthropic-compatible endpoint:** exists ("two equivalent + endpoints, one mimics OpenAI, one mimics Anthropic"). Confirm the + exact path/host before choosing it over OpenAI-compat. +- **Models (do NOT hardcode a single ID):** MiniMax-M2, M2.1, M2.5, + M2.7 (+ `-highspeed` variants), M3. Coding-relevant default is the + current M2-coding model — at time of writing M2.5 for PAYG, M2.1 for + the subscription plan. **Treat the default as config, not a + constant**, and call `Models(ctx)` to enumerate live. +- **Pricing (PAYG, for `CostPer1k*` metadata):** M2.7 ≈ $0.30 / MTok + input, $1.20 / MTok output; highspeed ≈ 2×. Convert to the EUR + per-1k convention used by the Arm struct. Docs: + +- **Subscription:** "Token Plan" (current; supersedes the former + "Coding Plan"). Flat-rate prompt quota over a rolling window + (published M2.7 limits 1,500–30,000 requests / 5h across tiers). + Same Bearer key as PAYG. + +--- + +## Non-goals + +- **A bespoke MiniMax SDK / translation layer.** We reuse the existing + OpenAI-compat (default) or Anthropic provider via `BaseURL`. If + MiniMax adds non-standard body fields, use the existing + `openai.NewWithStreamOptions` escape hatch (the same one Ollama uses). +- **Region auto-detection.** Ship the international endpoint as the + default; the user can override via `[provider.endpoints]`. A region + toggle is a follow-up if anyone asks. +- **Full subscription-quota accounting.** Phase 2 models subscription + cost as a coarse `CostWeight` zero-out, not a live quota meter. + +--- + +## Decision: OpenAI-compat vs Anthropic-compat backing + +**Default to OpenAI-compat** (`internal/provider/openaicompat`). It is +already exercised by the local backends (ollama/llamacpp), so the +streaming, tool-call, and error paths are battle-tested in this repo. 
+The Anthropic-compat endpoint is a fallback only if a MiniMax feature +(e.g. extended thinking) is exposed solely through it. Keep the option +open by making the backing selectable via config +(`[provider.minimax].api = "openai" | "anthropic"`), defaulting to +`openai`. + +--- + +## Design + +### Phase 1 — provider wiring (smallest shippable slice) + +Goal: `gnoma --provider minimax` works against PAYG with metered +pricing, registered as a cloud arm. + +1. **Constructor.** Add `NewMiniMax(cfg provider.ProviderConfig) + (provider.Provider, error)` to + `internal/provider/openaicompat/provider.go`, mirroring `NewOllama` + / `NewLlamaCpp` (`openaicompat/provider.go:18-49`): + - Default `BaseURL` to `https://api.minimax.io/v1` when unset (but + let `[provider.endpoints].minimax` override). + - Require a real API key (unlike Ollama's dummy key) — return an + error if `cfg.APIKey == ""`. + - Leave `MaxRetries` at the SDK default (cloud failures *are* + transient, unlike the local backends which force `0`). + - Default `cfg.Model` to the current coding model **read from + config**, not a baked constant. + +2. **Construction switch.** Add `case "minimax": return + openaicompat.NewMiniMax(cfg)` to `createProvider` + (`cmd/gnoma/main.go:1265-1280`). If `[provider.minimax].api = + "anthropic"`, route to `anthropicprov.New(cfg)` with `cfg.BaseURL` + set to the anthropic-compat host instead. + +3. **Provider enumerations.** Add `"minimax"` to: + - the known-providers set (`main.go:233-236`), + - the available-providers usage string (`main.go:1279`), + - NOT the local-providers set (it is a cloud arm). + +4. **API key (optional friendliness).** `envKeyFor`'s default already + yields `MINIMAX_API_KEY`. Add an explicit `case "minimax"` in + `envKeyFor` (`main.go:1189-1201`) only if we want alternates (e.g. 
+ `MINIMAX_GROUP_ID` if the account requires a group id header — + VERIFY whether MiniMax needs a group id alongside the key; if so, + thread it through `ProviderConfig.Options`). + +5. **Family defaults.** Add MiniMax model families to + `knownFamilyDefaults` in `internal/router/defaults.go` (pattern at + `defaults.go:212-239`). Cloud arm → no `MaxComplexity` ceiling. Set + `Strengths` (`TaskGeneration`, `TaskRefactor`, `TaskDebug` are the + coding sweet spot) and a low `CostWeight` (~0.8–1.0 — cheap arm, so + the cost penalty is small) plus `CostPer1kInput/Output` from the + verified PAYG pricing. + +6. **Rate limits.** Add a `minimaxDefaults()` entry in + `internal/provider/ratelimits.go` (pattern at the anthropic block + ~`ratelimits.go:109-130`) and wire it into the `DefaultRateLimits` + switch. Use the published PAYG RPM/TPM; allow `[rate_limits.minimax]` + config overrides (the existing override path in `resolveRateLimitPools`). + +### Phase 2 — subscription (Token Plan) billing model + +The router's `CostWeight` math assumes metered per-token pricing. Under +a Token Plan subscription, marginal cost is ≈0 until the quota is hit, +then requests hard-fail. Design: + +1. **Billing knob.** `[provider.minimax].billing = "metered" | + "subscription"` (default `"metered"`). In `subscription` mode, set + the arm's `CostWeight` to 0 (or `CostPer1k*` to 0) so the selector + treats MiniMax as free while quota remains. + +2. **Quota-exhaustion failover.** MiniMax returns a quota/429 error + when the plan is exhausted. Map it to the existing rate-limit + backoff path (`Arm.BackoffUntil`, the 429 handling that already + disables an arm temporarily) so the bandit fails over to the next + arm cleanly. This ties into the session error-recovery work landed + in `0d3d190`. Confirm the exact error shape MiniMax returns and add + a classifier in `internal/provider/errors.go`. + +3. 
**Docs.** Document both plans + the region split in + `docs/slm-backends.md` (or a new provider doc) and the README + provider list. + +--- + +## Touch-points (file:line) + +| Change | Location | +|---|---| +| `NewMiniMax` constructor | `internal/provider/openaicompat/provider.go` (after `:49`) | +| Construction switch case | `cmd/gnoma/main.go:1265-1280` | +| Known-providers set | `cmd/gnoma/main.go:233-236` | +| Usage string | `cmd/gnoma/main.go:1279` | +| `envKeyFor` (optional) | `cmd/gnoma/main.go:1189-1201` | +| Family defaults | `internal/router/defaults.go:212-239` | +| Rate-limit defaults | `internal/provider/ratelimits.go` (+ `DefaultRateLimits` switch) | +| Error classifier (Phase 2) | `internal/provider/errors.go` | +| Config: `[provider.minimax]` | `internal/config/config.go` (provider section) | + +The `Provider` interface contract to satisfy +(`internal/provider/provider.go:136-148`): `Stream`, `Name`, `Models`, +`DefaultModel`. All four come free by delegating to the OpenAI-compat +base provider. + +--- + +## Testing (TDD — write first) + +Per CLAUDE.md: table-driven, `//go:build integration` for anything +hitting the live API. + +- **Unit (no network):** + - `NewMiniMax` defaults: empty `BaseURL` → `https://api.minimax.io/v1`; + empty key → error; `[provider.endpoints].minimax` override wins. + - `createProvider("minimax", …)` returns a non-nil provider; unknown + still errors. + - `envKeyFor("minimax") == "MINIMAX_API_KEY"`. + - `defaults.go`: a MiniMax model family resolves to the expected + `Strengths`/`CostWeight`; `MaxComplexity == 0`. + - `ratelimits.go`: `DefaultRateLimits("minimax").LookupModel(...)` + returns the configured limits; `"*"` fallback works. + - Phase 2: billing=`subscription` → arm `CostWeight == 0`; the + quota/429 error maps to a retryable/backoff classification. 
+- **Integration (`//go:build integration`, real `MINIMAX_API_KEY`):** + a one-shot `Stream` against the cheapest model returns tokens; + `Models(ctx)` enumerates a non-empty list. + +### Acceptance criteria + +1. `MINIMAX_API_KEY=… gnoma --provider minimax -p "hello"` streams a + response in pipe mode. +2. With no `--provider`, MiniMax appears as a selectable router arm and + is chosen for a cheap generation task when `prefer` allows cloud. +3. `gnoma providers` lists `minimax`. +4. Phase 2: with `billing="subscription"`, the selector prefers MiniMax + for eligible tasks; on simulated quota-exhaustion the router fails + over without surfacing an error to the user. + +--- + +## TODO linkage + +Replaces the inline "MiniMax provider" bullet in `TODO.md` (In flight). +Link this file from that entry.
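
For reference, the Phase 1 constructor rules (mandatory key, default base URL, model taken from config) could be sketched as below. This assumes a trimmed, hypothetical `ProviderConfig` and a hypothetical `ResolveMiniMaxConfig` helper; the real `provider.ProviderConfig` has more fields, and the actual `NewMiniMax` would go on to build the OpenAI-compat provider.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderConfig is a hypothetical, trimmed stand-in for
// provider.ProviderConfig; only the fields this sketch needs.
type ProviderConfig struct {
	APIKey  string
	BaseURL string
	Model   string
}

// DefaultMiniMaxBaseURL is the international OpenAI-compatible endpoint
// named in this plan; re-verify against live docs before shipping.
const DefaultMiniMaxBaseURL = "https://api.minimax.io/v1"

// ResolveMiniMaxConfig (hypothetical) applies the Phase 1 defaults:
//   - an empty API key is an error (no dummy key, unlike local backends),
//   - an empty BaseURL falls back to the default, an explicit one wins,
//   - an empty Model falls back to configuredModel, which the caller reads
//     from config rather than from a baked-in constant.
func ResolveMiniMaxConfig(cfg ProviderConfig, configuredModel string) (ProviderConfig, error) {
	if cfg.APIKey == "" {
		return ProviderConfig{}, errors.New("minimax: API key required (set MINIMAX_API_KEY)")
	}
	if cfg.BaseURL == "" {
		cfg.BaseURL = DefaultMiniMaxBaseURL
	}
	if cfg.Model == "" {
		cfg.Model = configuredModel
	}
	return cfg, nil
}

func main() {
	// "model-from-config" is a placeholder, not a real MiniMax model ID.
	cfg, err := ResolveMiniMaxConfig(ProviderConfig{APIKey: "example-key"}, "model-from-config")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.BaseURL, cfg.Model)
}
```

Keeping the defaulting in a pure function like this lets the unit tests listed under Testing run without constructing an SDK client.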