docs(todo,plans): specs for open features + MiniMax & ACP

Add implementation-ready plans for the in-flight features that lacked
one, and two new provider/protocol items:

- MiniMax provider (cloud arm + Token Plan billing decision)
- Agent Client Protocol (ACP) — dual role: gnoma as ACP agent and as
  ACP client driving external agents as router arms
- Network egress allowlist (Learn/Review/Enforce); note the per-session
  audit log is already implemented, remaining gap is a viewer command
- Cross-platform (Windows/macOS) code touch-points + build-tag pattern
- Distribution follow-ups (cosign, brew tap, installer, dockers_v2)

Link each plan from its TODO.md entry; mark audit-log item done.
2026-06-04 11:59:16 +02:00
parent 98daebd359
commit f8ab522bef
6 changed files with 1297 additions and 6 deletions
+95 -6
@@ -4,6 +4,86 @@ Active work, newest first.
## In flight
- **MiniMax provider — cloud arm + subscription token plan.** Add
MiniMax (api.minimax.io / api.minimaxi.com) as a first-class cloud
provider so it can register as a router arm alongside
anthropic/openai/google/mistral.
**API surface.** MiniMax ships *two* compatible HTTP surfaces, one
OpenAI-compatible and one Anthropic-compatible, so this is a
base-URL + auth wiring task, not a new translation layer:
- **OpenAI-compatible** chat-completions at `…/v1` — reusable via
`internal/provider/openaicompat`. Cleanest first cut: add a
`NewMiniMax(cfg)` constructor mirroring `NewOllama` /
`NewLlamaCpp` (`openaicompat/provider.go`) with the MiniMax base
URL baked in, then a `case "minimax"` in
`createProvider` (`cmd/gnoma/main.go:1265`) and the available-
providers usage string (`:1279`).
- **Anthropic-compatible** endpoint (`…/anthropic`) — alternative
backing via the existing `anthropic` provider with a `BaseURL`
override. Decide one canonical path; OpenAI-compat is the lower-
risk default since `openaicompat` is already exercised by the
local backends.
- **Auth.** Bearer API key. `envKeyFor`'s default branch
(`main.go:1199`) already resolves `MINIMAX_API_KEY` with no code
change; add an explicit `case "minimax"` only if we want a
friendlier name or alternates list.
- **Models.** `MiniMax-M2` (agentic/coding, the one to default to),
`MiniMax-M1`, abab6.5 series. Set `Strengths` + `MaxComplexity`
+ `CostWeight` on the arm so the selector treats it as a cheap
high-capability cloud tier.
**Token plan (open question — affects auth + billing UX).** MiniMax
offers a flat-rate **Coding Plan** subscription (token-quota based,
Claude-Max-style) *in addition to* metered pay-as-you-go API
credits. Both authenticate with the same Bearer key, so no adapter
difference — but the router's `CostWeight` math assumes metered
per-token pricing. Under a subscription the marginal cost is ~0
until the quota is hit, then hard-stops. Decisions to make:
- How to model "subscription" cost in the selector — e.g. a
`[provider.minimax].billing = "subscription" | "metered"` knob
that zeroes `CostWeight` while quota remains, vs. real per-token
cost when metered.
- Quota exhaustion handling — surface the 429/quota error cleanly
and let the bandit fail over to the next arm (ties into the
session error-recovery work in `0d3d190`).
- Document both plans + the region split (`api.minimax.io`
international vs `api.minimaxi.com`) in `docs/slm-backends.md` /
provider docs.
Smallest shippable slice: OpenAI-compat `NewMiniMax` + metered
pricing, registered as a cloud arm. Subscription/quota modelling is
the follow-up once the billing knob lands. Plan:
[`docs/superpowers/plans/2026-06-04-minimax-provider.md`](docs/superpowers/plans/2026-06-04-minimax-provider.md).
- **Agent Client Protocol (ACP) support.** Run gnoma as an *ACP agent*
(`gnoma acp`) so any ACP-capable editor (Zed, Kiro, OpenCode, …) can
drive it as an external coding agent. ACP is "the LSP for AI coding
agents": JSON-RPC 2.0 over stdio, editor (client) spawns agent
(subprocess). gnoma already owns the hard parts — agentic engine,
tools, permissions, and JSON-RPC-over-stdio (from its MCP-client
side, `internal/mcp/jsonrpc.go`). The fit is symmetric: gnoma is the
JSON-RPC *server* here. No Go SDK exists (official SDKs are
TS/Python/Rust/Kotlin), so gnoma implements the wire protocol
natively against the schema. `session/new` can declare `mcpServers`,
so ACP and gnoma's existing MCP manager wire up in one handshake.
**Dual role — both directions:**
1. **gnoma as ACP agent (server)**: `gnoma acp` over stdio so
editors drive gnoma.
2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
(Claude, Gemini CLI, Codex, …) and uses them as router-arm
provider backends. This is the same shape as the existing
`internal/provider/subprocess` CLI-agent arms
(`cmd/gnoma/main.go:521-531`, `IsCLIAgent: true`) but over
standardized ACP JSON-RPC — gaining structured tool-call
surfacing, real turn/permission semantics, and cancellation
that the current one-shot stream-json subprocess provider
lacks (it sets `ToolUse:false` for agents without stream-json).
Upstream: <https://github.com/agentclientprotocol>. Plan:
[`docs/superpowers/plans/2026-06-04-agent-client-protocol.md`](docs/superpowers/plans/2026-06-04-agent-client-protocol.md).
- **Config write/merge — silent corruption of layered configs.**
`internal/config/write.go:setConfig` reads the existing TOML into a
zero-valued `Config` struct, sets one field, and writes the entire
@@ -159,11 +239,13 @@ Active work, newest first.
with no per-host allowlist or dial-layer interception. Two follow-
ups surfaced from the r/SideProject v0.3.0 launch thread
(2026-05-24, `u/Secret_Theme3192`):
-1. **Per-session audit log of blocked/redacted events** —
-grep-able file at `.gnoma/sessions/<id>/audit.jsonl` so the
-user can answer "what did the firewall do this session?" in
-one command. Today the `slog` output goes to whatever sink is
-configured, with no per-session grouping.
+1. **Per-session audit log of blocked/redacted events** — ✅ JSONL
+writing **implemented**: `internal/security/audit.go` +
+wiring at `cmd/gnoma/main.go:685-691`
+(`.gnoma/sessions/<id>/audit.jsonl`), recorded from
+`firewall.go:152/173/186`. **Remaining gap:** no CLI to *read*
+it — a `gnoma firewall audit` viewer is folded into the egress
+plan (shares the `gnoma firewall` command surface).
2. **Per-host egress allowlist (HTTP transport layer)** — design
refined by `u/HarjjotSinghh` on the r/SideProject thread
(2026-05-28). Three-stage rollout, not a single-shot
@@ -195,6 +277,9 @@ Active work, newest first.
"network egress gated"; corrected in the README scope note
and the audit-log commit.
Egress plan (incl. the `gnoma firewall audit` viewer for item #1):
[`docs/superpowers/plans/2026-06-04-egress-allowlist.md`](docs/superpowers/plans/2026-06-04-egress-allowlist.md).
- **Cross-platform support — Windows + macOS.** GoReleaser builds
static binaries for `linux/darwin/windows × amd64/arm64` every
release but only Linux is exercised at all today. Windows and
@@ -244,6 +329,9 @@ Active work, newest first.
least a TODO-linked acknowledgement in the post body so the
thread sees gnoma takes the gaps seriously.
Plan (build-tag scaffolding + concrete code touch-points):
[`docs/superpowers/plans/2026-06-04-cross-platform.md`](docs/superpowers/plans/2026-06-04-cross-platform.md).
- **Tool-router specialization (functiongemma)** — gated on telemetry,
not committed. Phase A.2 adds did-switch-rate measurement to the
two-stage `select_category` path; Phase A.3 (LoRA fine-tune of
@@ -288,7 +376,8 @@ Active work, newest first.
from `dockers` + `docker_manifests` to `dockers_v2` in
`.goreleaser.yml` (collapses ~45 lines into one block but
requires Dockerfile changes for the per-platform binary layout
-— deferred to its own commit before v0.3.0).
+— deferred to its own commit before v0.3.0). Plan:
+[`docs/superpowers/plans/2026-06-04-distribution-followups.md`](docs/superpowers/plans/2026-06-04-distribution-followups.md).
## Stable backlog (not in active phases)
@@ -0,0 +1,375 @@
# Agent Client Protocol (ACP) — 2026-06-04
Adds **both directions** of ACP to gnoma:
1. **gnoma as ACP agent (server)**: `gnoma acp` over stdio so any
ACP-capable editor (Zed, Kiro, OpenCode, …) can drive gnoma as an
external coding agent.
2. **gnoma as ACP client** — gnoma spawns *external* ACP agents
(Claude, Gemini CLI, Codex, …) and exposes them as router-arm
provider backends, the standardized successor to the current
`internal/provider/subprocess` CLI-agent arms.
Adds the TODO.md entry "Agent Client Protocol (ACP) support".
Upstream: <https://github.com/agentclientprotocol> ·
spec <https://agentclientprotocol.com>
---
## Problem
ACP is "the LSP for AI coding agents": a JSON-RPC 2.0 protocol, spoken
over stdio, that lets editors (clients) spawn agents (subprocesses) and
talk to them in a standard way — eliminating point-to-point editor↔agent
integrations. Zed, Kiro, OpenCode and others are clients; Claude, Gemini
CLI, Codex ship as ACP agents.
Today gnoma is reachable only via its own TUI and pipe mode. It cannot
plug into an editor's agent panel. Supporting ACP makes gnoma a drop-in
agent inside any ACP client, which is a large distribution surface for
near-zero ongoing cost — the protocol is stable and gnoma already owns
all the hard parts (an agentic engine, tools, permissions, MCP).
### Why this is a natural fit
- gnoma already speaks **JSON-RPC over stdio** for MCP
(`internal/mcp/jsonrpc.go` `Request`/`Notification`,
`internal/mcp/transport*.go`) — that machinery is reusable for the
ACP server side (gnoma is the *server* of the JSON-RPC channel here,
the mirror of its MCP-client role).
- The agentic loop is already factored behind
`session.Session` (`internal/session/session.go:54`,
`Local.Send`/`SendWithOptions` at `local.go:80-85`) driving
`engine.Engine` (`internal/engine/engine.go`). ACP `session/prompt`
maps onto one `Send`.
- Permissions already route through a pluggable prompt function
(`permission.NewChecker(mode, rules, promptFn)`,
`cmd/gnoma/main.go:668`). ACP's `session/request_permission` callback
is just another `promptFn` implementation.
- ACP `session/new` can declare the `mcpServers` the agent should
connect to — gnoma already has an MCP manager
(`internal/mcp/manager.go`) to honour that in the same handshake.
### Role decision — both, server first
Both roles ship under this plan. Sequence them: **agent (server)
first** — it's the larger distribution win and exercises the wire
protocol end-to-end — then **client**, which reuses the same
`internal/acp` protocol/types from the other side. They share the
JSON-RPC framing, content-block translation, and capability structs;
only the dispatch direction differs.
The client role is the standardized successor to
`internal/provider/subprocess`: that package shells out to CLI agents
with one-shot `--output-format stream-json` (or prompt-augmentation
fallback), runs the agent's *own* loop with `--yolo`/`--trust`, and
cannot surface structured tool calls (it sets `ToolUse:false` for
agents lacking stream-json — see TODO "Native agy JSON output"). ACP
fixes all of that: a persistent JSON-RPC session, structured
`session/update` tool-call events, real permission round-trips, and
cancellation.
### No Go SDK exists
Official SDKs are TypeScript, Python, Rust, Kotlin — **no Go**. gnoma
implements the wire protocol natively against the published JSON
schema. Pin the supported `protocolVersion` and the exact method set
against the spec at implementation time (the protocol is young and
still moving).
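A minimal sketch of what "implementing the wire protocol natively" could start from: a JSON-RPC 2.0 envelope plus one-message-per-line encode/decode. The envelope fields follow the JSON-RPC 2.0 specification; the newline-delimited framing and the names `Message`, `DecodeLine` and `EncodeLine` are assumptions for illustration and should be checked against the ACP spec and `internal/mcp/jsonrpc.go` before reuse.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RPCError is the JSON-RPC 2.0 error object.
type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// Message is a JSON-RPC 2.0 envelope. A request carries ID and Method, a
// notification carries Method only, a response carries ID plus Result or Error.
type Message struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method,omitempty"`
	Params  json.RawMessage `json:"params,omitempty"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
}

// IsNotification reports whether m is a call that expects no response.
func (m Message) IsNotification() bool { return m.Method != "" && len(m.ID) == 0 }

// DecodeLine parses one frame (assumed framing: one JSON object per line).
func DecodeLine(line []byte) (Message, error) {
	var m Message
	if err := json.Unmarshal(line, &m); err != nil {
		return Message{}, fmt.Errorf("decode frame: %w", err)
	}
	if m.JSONRPC != "2.0" {
		return Message{}, errors.New(`decode frame: "jsonrpc" must be "2.0"`)
	}
	return m, nil
}

// EncodeLine serialises m as a single line terminated by '\n'.
func EncodeLine(m Message) ([]byte, error) {
	m.JSONRPC = "2.0"
	b, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	req, err := DecodeLine([]byte(`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`))
	if err != nil {
		panic(err)
	}
	// Echo the request ID back in an empty result, as a response would.
	out, err := EncodeLine(Message{ID: req.ID, Result: json.RawMessage(`{}`)})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

Keeping `ID` as raw JSON avoids guessing whether a peer uses numeric or string IDs; the value is simply echoed back.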
---
## Non-goals
- **A full editor UI.** In agent mode gnoma renders nothing; the client
owns the UI. gnoma emits `session/update` notifications and the client
displays them.
- **Replacing the TUI / pipe modes.** ACP agent mode is a third entry
mode alongside them, not a replacement.
- **Replacing `internal/provider/subprocess` outright.** The ACP-client
provider is added alongside it; the stream-json subprocess path stays
for agents that don't (yet) speak ACP. Deprecation is a later call.
- **Custom transports.** stdio only (the ACP norm: local agent as a
subprocess). No socket/HTTP transport.
- **gnoma-drives-gnoma over ACP as the default.** gnoma's native
providers/router remain the primary path; ACP-client arms are an
additional backend source.
---
## Design
The two roles share one package (`internal/acp`): JSON-RPC framing,
content-block translation, and the capability/handshake types are
direction-agnostic. **Part A** is the agent (server) side; **Part B**
is the client side. Build Part A first.
## Part A — gnoma as ACP agent (server)
### New entry mode: `gnoma acp`
Add a third mode beside TUI and pipe (mode is chosen near
`cmd/gnoma/main.go:106-114`). Selected by an explicit `acp` subcommand
(stdio is shared with the JSON-RPC channel, so it can't be
TTY-autodetected the way TUI is). In ACP mode:
- **No banner, no TUI, no stdout chatter.** stdout/stdin are the
JSON-RPC pipe; all human/diagnostic logging goes to **stderr** only
(the firewall/audit slog sink must not write to stdout). Audit this
carefully — any stray stdout write corrupts the protocol stream.
- Reuse the existing session/engine/router/security construction; only
the front-end loop differs.
### Package layout
```
internal/acp/
protocol.go // ACP types: handshake, capabilities, content blocks (shared)
jsonrpc.go // framing reused/forked from internal/mcp/jsonrpc.go (shared)
content.go // ContentBlock <-> message.Message translation (shared)
server.go // Part A: stdio JSON-RPC read loop; method dispatch
session.go // Part A: ACP session <-> gnoma session.Session bridge
permission.go // Part A: session/request_permission promptFn
update.go // Part A: gnoma stream events -> session/update
client.go // Part B: spawn external agent, drive the handshake/prompt
```
A separate `internal/provider/acp/` holds the **Part B provider**
adapter (mirrors `internal/provider/subprocess/`), depending on
`internal/acp/client.go`.
Reuse `internal/mcp/jsonrpc.go` framing if it generalises; otherwise
fork the minimal envelope (it's tiny). Keep ACP types separate from MCP
types — they are different protocols that happen to share JSON-RPC.
### Method handlers (agent side)
Map each ACP method to existing gnoma machinery. Pin exact shapes to the
spec; the mapping is the contract:
| ACP method (client→agent) | gnoma handling |
|---|---|
| `initialize` | Reply with `agentCapabilities` (tools, MCP support, prompt streaming, permission modes), `agentInfo` (name "gnoma", `buildVersion`). Negotiate `protocolVersion`. |
| `session/new` | Build a `session.Local` (router, security, tools wired as in main). Honour `cwd` (run it through `safety.ClassifyCWD`), and connect any `mcpServers` the client declares via `internal/mcp/manager.go`. Return a `sessionId`. |
| `session/load` (if advertised) | Rehydrate from `internal/session` store (`SessionStore.Load`). Optional — only if we advertise the capability. |
| `session/prompt` | Translate ACP `ContentBlock`s → `message.Message`, call `Send`/`SendWithOptions`, stream results back as `session/update`, return the stop reason. |
| `session/cancel` (notification) | Cancel the in-flight turn's context. |
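One way to express the client→agent table is a method-name dispatch map. The sketch below is illustrative only: `Server`, `handler`, `rpcError` and the stub handler bodies are hypothetical, and the real handlers would build sessions as described in the table. The numeric codes are the reserved JSON-RPC 2.0 error codes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Reserved JSON-RPC 2.0 error codes.
const (
	codeMethodNotFound = -32601
	codeInvalidParams  = -32602
	codeInternalError  = -32603
)

// rpcError lets a handler choose the error code it reports.
type rpcError struct {
	Code int
	Msg  string
}

func (e *rpcError) Error() string { return e.Msg }

type handler func(params json.RawMessage) (any, error)

// Server maps ACP method names to handlers (hypothetical type).
type Server struct{ handlers map[string]handler }

// Dispatch returns the result, a JSON-RPC error code (0 on success), and the error.
func (s *Server) Dispatch(method string, params json.RawMessage) (any, int, error) {
	h, ok := s.handlers[method]
	if !ok {
		return nil, codeMethodNotFound, fmt.Errorf("method not found: %s", method)
	}
	res, err := h(params)
	if err != nil {
		var re *rpcError
		if errors.As(err, &re) {
			return nil, re.Code, err
		}
		return nil, codeInternalError, err
	}
	return res, 0, nil
}

func newServer() *Server {
	return &Server{handlers: map[string]handler{
		"session/new": func(p json.RawMessage) (any, error) {
			var req struct {
				Cwd string `json:"cwd"`
			}
			if err := json.Unmarshal(p, &req); err != nil || req.Cwd == "" {
				return nil, &rpcError{Code: codeInvalidParams, Msg: "cwd is required"}
			}
			// Stub: the real handler would classify cwd and build a session.
			return map[string]string{"sessionId": "sess-1"}, nil
		},
	}}
}

func main() {
	s := newServer()
	res, code, err := s.Dispatch("session/new", json.RawMessage(`{"cwd":"/tmp/project"}`))
	fmt.Println(res, code, err)
}
```

A `refuse` classification from `safety.ClassifyCWD` would fit the same shape: the handler returns an `rpcError` and no session is created.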
Agent→client calls gnoma must make:
| ACP call (agent→client) | Trigger |
|---|---|
| `session/update` (notification) | Per engine stream event: assistant text deltas, tool-call start/args/result, plan/thoughts, token usage. Map gnoma's stream iterator (`Next/Current`) to update variants. |
| `session/request_permission` | gnoma's `permission.Checker` promptFn — instead of console `Scanln`, send this and await the client's allow/deny (with the ACP "allow once / always" options mapped to gnoma permission modes). |
| `fs/read_text_file`, `fs/write_text_file` | **If** we advertise client-side fs and the client supports it, route the `fs` tools through the client so edits show in the editor's buffers. Otherwise gnoma's own `internal/tool/fs` operates on disk directly. Decide per capability negotiation. |
### Streaming bridge
The engine produces a pull-based stream (`Next() / Current() / Err() /
Close()`). The ACP bridge consumes it and emits a `session/update` per
event. Backpressure: `session/update` is a notification, so the bridge
never waits for a reply, but a write to stdout can still stall when the
client reads slowly. Coalesce text deltas in that case (config knob,
default flush per token).
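A sketch of the bridge with text-delta coalescing, under stated assumptions: the `Stream` interface mirrors the `Next/Current/Err/Close` shape named above, while `Event`, `EventKind`, `Bridge` and `maxBuffered` are hypothetical stand-ins for gnoma's real stream types and config knob.

```go
package main

import "fmt"

type EventKind int

const (
	EventTextDelta EventKind = iota
	EventToolCall
)

// Event is a simplified stand-in for an engine stream event.
type Event struct {
	Kind EventKind
	Text string
}

// Stream mirrors the pull-based iterator shape described in the plan.
type Stream interface {
	Next() bool
	Current() Event
	Err() error
	Close() error
}

// Bridge drains s and calls emit once per outgoing session/update.
// Text deltas are buffered until maxBuffered bytes accumulate; a value of 0
// flushes every delta. Non-text events flush pending text first so ordering
// is preserved.
func Bridge(s Stream, maxBuffered int, emit func(Event) error) error {
	defer s.Close()
	pending := ""
	flush := func() error {
		if pending == "" {
			return nil
		}
		ev := Event{Kind: EventTextDelta, Text: pending}
		pending = ""
		return emit(ev)
	}
	for s.Next() {
		ev := s.Current()
		if ev.Kind == EventTextDelta {
			pending += ev.Text
			if len(pending) < maxBuffered {
				continue
			}
			if err := flush(); err != nil {
				return err
			}
			continue
		}
		if err := flush(); err != nil {
			return err
		}
		if err := emit(ev); err != nil {
			return err
		}
	}
	if err := flush(); err != nil {
		return err
	}
	return s.Err()
}

// sliceStream is an in-memory Stream used only for the demonstration.
type sliceStream struct {
	evs []Event
	i   int
}

func (s *sliceStream) Next() bool     { s.i++; return s.i <= len(s.evs) }
func (s *sliceStream) Current() Event { return s.evs[s.i-1] }
func (s *sliceStream) Err() error     { return nil }
func (s *sliceStream) Close() error   { return nil }

func main() {
	src := &sliceStream{evs: []Event{
		{EventTextDelta, "Hel"}, {EventTextDelta, "lo"}, {EventToolCall, "fs.read"},
	}}
	_ = Bridge(src, 16, func(e Event) error {
		fmt.Printf("update kind=%d text=%q\n", e.Kind, e.Text)
		return nil
	})
}
```

A size threshold is the simplest coalescing rule; a time-based flush would need a goroutine and is left out of this sketch.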
### Security & safety interplay
- The `SafeProvider` firewall boundary and the per-session audit log
apply unchanged — ACP is a front-end, providers/tools sit behind the
same security layer.
- `safety.ClassifyCWD` runs on the `session/new` `cwd`; a `refuse`
classification returns an ACP error rather than starting the session.
- Egress allowlist (`2026-06-04-egress-allowlist.md`) applies as usual.
- Incognito: expose a way to start an ACP session incognito (capability
flag or `session/new` param) so editor-driven sessions can be
non-persistent.
### MCP-in-ACP
When `session/new` lists `mcpServers`, spin them up through the existing
manager so the editor's MCP config and gnoma's converge in one
handshake (this is the headline ACP×MCP integration). gnoma's own
config-level MCP servers still load too; merge, don't replace.
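"Merge, don't replace" leaves the collision rule open. The sketch below assumes, purely for illustration, that a config-level server wins when the client declares one with the same name, and that the shadowed names are reported so they can be logged. `MCPServer` and `mergeMCPServers` are hypothetical names.

```go
package main

import "fmt"

// MCPServer is a simplified stand-in for an MCP server declaration.
type MCPServer struct {
	Name    string
	Command string
	Args    []string
}

// mergeMCPServers keeps every config-level server, then appends client-declared
// servers whose names are not taken. Assumed policy: config wins on collision.
func mergeMCPServers(fromConfig, fromClient []MCPServer) (merged []MCPServer, shadowed []string) {
	seen := make(map[string]bool, len(fromConfig)+len(fromClient))
	for _, s := range fromConfig {
		if seen[s.Name] {
			continue // duplicate inside the config itself: first entry wins
		}
		seen[s.Name] = true
		merged = append(merged, s)
	}
	for _, s := range fromClient {
		if seen[s.Name] {
			shadowed = append(shadowed, s.Name)
			continue
		}
		seen[s.Name] = true
		merged = append(merged, s)
	}
	return merged, shadowed
}

func main() {
	cfg := []MCPServer{{Name: "git", Command: "mcp-git"}}
	client := []MCPServer{{Name: "git", Command: "other-git"}, {Name: "docs", Command: "mcp-docs"}}
	merged, shadowed := mergeMCPServers(cfg, client)
	fmt.Println(len(merged), shadowed)
}
```

The opposite policy (client wins) is equally defensible; whichever is chosen should be stated in the ACP capability docs.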
---
## Part B — gnoma as ACP client (external agents as router arms)
gnoma connects to external ACP agents and exposes each as a router-arm
backend, the standardized successor to `internal/provider/subprocess`.
gnoma plays the *client* (editor) side of the JSON-RPC channel.
### Provider adapter
Add `internal/provider/acp/` implementing the `provider.Provider`
contract (`Stream`, `Name`, `Models`, `DefaultModel`) — the same surface
the subprocess provider satisfies
(`internal/provider/subprocess/provider.go:28-62`):
- **Spawn + handshake.** On first use (or at discovery), spawn the agent
subprocess (`exec.CommandContext`, with the Windows/Unix process-group
handling from `2026-06-04-cross-platform.md`), send `initialize` as the
client, then `session/new` with gnoma's `cwd` and — crucially —
gnoma's *own* MCP servers passed through as the `mcpServers` list so
the external agent shares gnoma's tool surface.
- **`Stream` → `session/prompt`.** Translate the gnoma `Request`
messages into ACP `ContentBlock`s, send `session/prompt`, and turn the
incoming `session/update` notifications back into gnoma's pull-based
stream events (`EventTextDelta`, structured tool-call events, usage).
This is the win over the subprocess provider: tool calls arrive
**structured**, not as opaque `EventTextDelta` text.
- **Permission callbacks.** The external agent sends
`session/request_permission` to gnoma (now the client). Route these
through gnoma's existing `permission.Checker` so the *user's* gnoma
permission policy governs the sub-agent — a strict improvement over
today's `--yolo`/`--trust` subprocess invocations that bypass gnoma's
gate entirely.
- **`fs/*` callbacks.** Route the agent's file reads/writes through
gnoma's `internal/tool/fs` guard so the path-safety boundary still
applies.
- **Cancellation.** gnoma's turn-cancel sends ACP `session/cancel`.
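The structured tool-call win can be sketched as a decoder from an incoming `session/update` payload to a gnoma-side event. The JSON field names used here (`update`, `sessionUpdate`, `agent_message_chunk`, `tool_call`, `toolCallId`, `title`) reflect my reading of the published ACP schema and are assumptions to verify against the pinned `protocolVersion`; `StreamEvent` and `decodeUpdate` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StreamEvent is a simplified stand-in for gnoma's stream event type.
type StreamEvent struct {
	Kind       string // "text_delta", "tool_call", or "other"
	Text       string
	ToolCallID string
	Title      string
}

// decodeUpdate converts the params of one session/update notification.
func decodeUpdate(params []byte) (StreamEvent, error) {
	var p struct {
		Update struct {
			Kind string `json:"sessionUpdate"`
			// Content stays raw because its shape depends on the update kind.
			Content    json.RawMessage `json:"content"`
			ToolCallID string          `json:"toolCallId"`
			Title      string          `json:"title"`
		} `json:"update"`
	}
	if err := json.Unmarshal(params, &p); err != nil {
		return StreamEvent{}, fmt.Errorf("decode session/update: %w", err)
	}
	switch p.Update.Kind {
	case "agent_message_chunk":
		var block struct {
			Type string `json:"type"`
			Text string `json:"text"`
		}
		if err := json.Unmarshal(p.Update.Content, &block); err != nil {
			return StreamEvent{}, fmt.Errorf("decode content block: %w", err)
		}
		return StreamEvent{Kind: "text_delta", Text: block.Text}, nil
	case "tool_call":
		return StreamEvent{Kind: "tool_call", ToolCallID: p.Update.ToolCallID, Title: p.Update.Title}, nil
	default:
		// Unknown variants are surfaced, not dropped silently.
		return StreamEvent{Kind: "other"}, nil
	}
}

func main() {
	ev, err := decodeUpdate([]byte(`{"sessionId":"s1","update":{"sessionUpdate":"tool_call","toolCallId":"call_1","title":"Read main.go"}}`))
	fmt.Println(ev.Kind, ev.ToolCallID, err)
}
```

Compared with the subprocess provider, the tool call arrives with an ID and a title, so gnoma can gate it or display it without parsing free text.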
### Discovery & registration
Mirror the subprocess flow (`cmd/gnoma/main.go:521-531`):
- Discover ACP agents from config (`[acp.agents]` — command + args +
optional capability hints) and/or a known-agents table analogous to
`subprocess/agent.go:60` (`knownAgents`).
- Register each as a `router.Arm` (a new `IsACPAgent` flag, or reuse
`IsCLIAgent` with a transport discriminant). Set `Capabilities` from
the ACP `initialize` response — notably `ToolUse:true`, which the
subprocess provider often can't claim.
- Wrap in `security.WrapProvider(..., fwRef)` exactly like every other
arm so the firewall + audit + egress boundaries hold.
### Relationship to the subprocess provider
Additive. Agents that speak ACP (Claude, Gemini CLI, Codex increasingly
do) get the ACP arm; agents that only do one-shot stream-json keep the
subprocess arm. Where both exist for one binary, prefer ACP. This also
unblocks the "Native agy JSON output" backlog item for any agent that
exposes ACP instead of `--output-format stream-json`.
---
## Touch-points (file:line)
**Part A — agent (server):**
| Change | Location |
|---|---|
| New ACP package | `internal/acp/` |
| Entry mode dispatch | `cmd/gnoma/main.go` (mode select ~`:106`, subcommand dispatch ~`:178`) |
| stdout→stderr log discipline | logger setup (`main.go:100-114`) |
| Session bridge | `internal/session` (`Session`/`Local`) |
| Permission callback | `internal/permission` checker promptFn (`main.go:645-668`) |
| Stream→update | engine stream iterator (`internal/engine`, `internal/stream`) |
| MCP per-session | `internal/mcp/manager.go` |
| JSON-RPC framing reuse | `internal/mcp/jsonrpc.go` |
**Part B — client (external agents as arms):**
| Change | Location |
|---|---|
| ACP-client provider | new `internal/provider/acp/` (mirrors `internal/provider/subprocess/`) |
| Client handshake/driver | `internal/acp/client.go` |
| Arm discovery + registration | `cmd/gnoma/main.go:521-531` (subprocess pattern), `[acp.agents]` config |
| Known-agents table | analogous to `internal/provider/subprocess/agent.go:60` |
| Arm flag | `router.Arm` (`IsACPAgent`, or `IsCLIAgent` + transport) |
| Security wrap | `security.WrapProvider(..., fwRef)` |
---
## Testing (TDD — write first)
- **Protocol unit tests (no real provider):**
- `initialize` handshake: version negotiation, advertised
capabilities are stable and accurate.
- `session/new` → returns a sessionId; honours `cwd`; rejects a
`refuse`-classified cwd with an ACP error.
- `session/prompt` with a stubProvider: ContentBlocks translate in,
`session/update`s stream out in order, correct stop reason.
- `session/cancel` aborts the in-flight turn (context cancellation
observed).
- Permission: a tool call triggers `session/request_permission`; a
"deny" response blocks the tool; "allow always" updates the mode.
- **stdout purity test:** drive a full prompt and assert stdout
contains *only* valid JSON-RPC frames (no banner/log leakage) — this
is the most common ACP-agent bug.
- **Conformance:** run gnoma against the upstream ACP test client /
example client (Rust/TS) in a `//go:build integration` test if one is
available; otherwise a recorded-transcript fixture.
- **MCP-in-ACP:** `session/new` with an `mcpServers` entry spins the
server up and its tools become callable in that session.
- **Part B (client) unit tests** — drive a *fake ACP agent* (a small
in-process JSON-RPC responder, the mirror of the agent-side tests):
- Provider `Stream` performs `initialize`+`session/new`+`session/prompt`
and yields gnoma stream events in order, with **structured** tool-call
events (not opaque text).
- An inbound `session/request_permission` is routed through
`permission.Checker` and a deny blocks the call.
- An inbound `fs/write_text_file` is mediated by the `internal/tool/fs`
guard (a guarded path is refused).
- Turn cancel emits `session/cancel`; the subprocess is reaped (tie to
cross-platform process-group handling).
- Discovery registers a fake ACP agent as an arm with `ToolUse:true`.
- **Round-trip (loopback):** point gnoma's ACP-*client* at a `gnoma acp`
*server* subprocess and run a prompt end-to-end — exercises both parts
over a real stdio pipe.
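The stdout purity test listed above reduces to one check over the captured output. This sketch assumes one JSON-RPC message per line; `checkStdoutPurity` is a hypothetical helper that a real test would call after driving a full prompt through `gnoma acp`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkStdoutPurity returns an error for the first line of captured stdout
// that is not a JSON-RPC 2.0 frame. Empty lines are ignored so that the
// trailing newline of the last frame does not count as a violation.
func checkStdoutPurity(captured string) error {
	for i, line := range strings.Split(captured, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		var frame struct {
			JSONRPC string `json:"jsonrpc"`
		}
		if err := json.Unmarshal([]byte(line), &frame); err != nil {
			return fmt.Errorf("stdout line %d is not a JSON object: %q", i+1, line)
		}
		if frame.JSONRPC != "2.0" {
			return fmt.Errorf("stdout line %d is not a JSON-RPC 2.0 frame: %q", i+1, line)
		}
	}
	return nil
}

func main() {
	clean := `{"jsonrpc":"2.0","id":1,"result":{}}` + "\n"
	leaky := "gnoma v0.3.0 starting\n" + clean
	fmt.Println(checkStdoutPurity(clean))
	fmt.Println(checkStdoutPurity(leaky))
}
```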
### Acceptance criteria
**Part A (agent/server):**
1. `gnoma acp` speaks the handshake and a full prompt turn over stdio.
2. gnoma appears and works as an external agent in Zed (manual: add
gnoma to Zed's external-agents config, run a prompt, approve a tool).
3. Tool permission prompts surface in the client and gate execution.
4. stdout carries only JSON-RPC; all logs go to stderr.
5. Cancelling from the editor stops the turn.
6. MCP servers declared by the client in `session/new` are available in
that session.
**Part B (client):**
7. An external ACP agent configured under `[acp.agents]` appears as a
router arm (`gnoma providers` lists it) with `ToolUse:true`.
8. Routing a task to that arm runs a full turn via ACP, surfacing the
sub-agent's tool calls **structured** in gnoma's stream.
9. The sub-agent's permission requests are gated by the user's gnoma
permission policy (not auto-approved).
10. The sub-agent's file writes pass through gnoma's fs guard.
11. Loopback: `gnoma acp` driven by gnoma's own ACP-client completes a
prompt end-to-end.
---
## Open questions (resolve against the live spec at implementation)
- Exact `protocolVersion` to target and the precise capability struct
shapes (the schema is the source of truth; pin a version).
- Whether to advertise client-side `fs/*` (edits flow through the
editor's buffers) vs. direct-disk fs tools — depends on parity and on
how gnoma's `internal/tool/fs` guard composes with editor-mediated
writes.
- `session/load` support (needs our session store to round-trip the
ACP transcript shape).
- **(Part B)** How a sub-agent's own model/cost is represented in the
router — an ACP arm's tokens are billed by *that* agent, so
`CostWeight`/`CostPer1k*` are opaque. Likely model it like the
subprocess arms (no metered cost; selection driven by `Strengths`).
- **(Part B)** Lifecycle: spawn-per-session vs. a pooled long-lived
agent process reused across turns; how cancellation and crashes are
recovered (ties to session error-recovery, `0d3d190`).
---
## TODO linkage
New "Agent Client Protocol (ACP) support" entry in `TODO.md` (In
flight) links here. Covers **both** roles: gnoma as ACP agent (Part A)
and gnoma as ACP client driving external agents as router arms
(Part B). Part B is the standardized successor to
`internal/provider/subprocess` and overlaps the "Native agy JSON
output" backlog item.
@@ -0,0 +1,198 @@
# Cross-Platform Support (Windows + macOS) — 2026-06-04
Makes the Windows and macOS binaries actually work and stay working.
GoReleaser already builds `linux/darwin/windows × amd64/arm64`, but
only Linux is exercised. Promotes the TODO.md entry
"Cross-platform support — Windows + macOS" into a phased design with
concrete code touch-points.
This plan does not restate the TODO's r/devops question map (Phase 2
table there stands). Its value-add is the **specific code locations**
that need OS-conditional handling and the build-tag pattern to use.
---
## Problem
Only Linux is tested. The binaries ship for Windows/macOS untested, and
the codebase has several hard Unix assumptions that will fail or
silently misbehave off-Linux. The pattern to follow already exists:
`internal/mcp/transport_{unix,windows}.go` split via build tags.
---
## Non-goals
- **MSI installer, Authenticode/Gatekeeper signing.** Covered by
`2026-06-04-distribution-followups.md` — those are packaging, not
runtime correctness.
- **Group Policy / Event Viewer integration.** Out of scope per the
TODO; documentation-only.
- **WSL-specific tuning.** WSL is Linux; it works today.
---
## Confirmed Unix-assumption defects (file:line)
### Critical — break core functionality on Windows
1. **Bash tool hardcodes `bash -c`.**
`internal/tool/bash/bash.go:117` calls
`exec.CommandContext(ctx, "bash", "-c", command)`. No Windows shell.
Alias harvesting (`internal/tool/bash/aliases.go:115,148`) hardcodes
`/bin/bash` and splits the shell path on `/`.
2. **Llamafile SLM startup hardcodes `sh`.**
`internal/slm/manager.go:172` invokes `sh <llamafile>` (a Wine
binfmt workaround). `sh` is absent on native Windows → `gnoma slm
status/setup` fails outright.
3. **MCP process-tree kill is a Windows stub.**
`internal/mcp/transport_windows.go:10-18`: `setProcessGroup` is a
no-op and `killProcessTree` calls `p.Kill()`, leaking any child
processes an MCP server spawns. The Unix version uses process groups
(`transport_unix.go:11-18`).
### High — config/auth land in the wrong place off-Linux
4. **Config/data dirs assume XDG.**
`internal/config/load.go:52-59` falls back to `~/.config`;
`internal/slm/manager.go:25-35` falls back to `~/.local/share`. On
Windows these should be `os.UserConfigDir()` (`%AppData%`) /
`os.UserCacheDir()`. On macOS, native tools use
`~/Library/Application Support`, though `~/.config` is tolerable;
decide and document.
5. **OAuth credential discovery is Unix-pathed.**
`internal/provider/google/provider.go:188-204` hardcodes
`~/.config/...` and `~/.gemini/...`. `expandHome` (`:114-129`)
already handles `\`, but the path *set* is Unix-centric — Gemini/
Antigravity creds on macOS/Windows won't be found.
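6. **No system-proxy support.** No `http.ProxyFromEnvironment` wiring
found. Go's `http.DefaultTransport` honours the `HTTP_PROXY` /
`HTTPS_PROXY` / `NO_PROXY` env vars, but a custom `http.Transport`
does so only when its `Proxy` field is set, and neither reads the
Windows system proxy / PAC. Corporate Windows networks rely on these.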
### Medium — usability / safety classifier gaps
7. **`internal/safety/cwd.go`**: macOS system roots
(`:185-210`) miss `/opt`, `/usr/local`; personal-dir detection
(`:221-252`) misses Windows `%TEMP%`/`%APPDATA%` and macOS
`~/Library/...`.
8. **Terminal/ANSI.** TUI uses lipgloss/termenv (auto-detects), so
modern Windows Terminal/PowerShell 7 are fine; legacy `conhost.exe`
may mangle. Verify, don't assume.
---
## Design
### Phase 0 — build-tag scaffolding
Adopt the existing `_unix.go` / `_windows.go` split (as in
`internal/mcp`) for each defect that needs divergent behaviour. Prefer
`runtime.GOOS` only for small inline branches (as
`internal/safety/cwd.go:201` already does); use build tags when the
implementation genuinely differs (shell selection, process kill).
### Phase 1 — smoke tests (unblocks the honest "did you test it?" answer)
Non-blocking GitHub Actions matrix (`windows-latest`, `macos-latest`,
`ubuntu-latest`):
- `go build ./...` and `go test ./...` per OS (today the release
workflow tests Linux only — `.github/workflows/release.yml`).
- Post-release: download each archive, run `gnoma --version` and a
stubbed `echo hi | gnoma --provider ollama` against a fake endpoint.
Confirms the binary launches and the TUI doesn't crash.
This is the precondition the TODO names for posting to r/devops.
### Phase 2 — shell abstraction (defects #1, #2)
1. Introduce `internal/tool/bash/shell_unix.go` /
`shell_windows.go` exposing `defaultShell() (name string, args
[]string)` and a `quoteArg(string) string`:
- Unix: `bash`/`$SHELL`, `-c`, POSIX quoting.
- Windows: prefer `pwsh`/`powershell` with the appropriate
`-Command` invocation and PowerShell quoting rules; fall back to
`cmd /c`. Document the choice.
2. Fix `aliases.go` to use `filepath.Base` instead of splitting on `/`,
and skip alias harvesting on Windows shells that have no equivalent.
3. Llamafile: on Windows, invoke the `.llamafile` (which is a valid
Windows PE as well as a shell script) directly rather than via `sh`;
guard with a build tag.
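The shell-selection rule from step 1 can be sketched as one pure function. The plan calls for build-tagged `shell_unix.go` / `shell_windows.go` files; this single-file variant takes the OS as a parameter only so it can be tested anywhere. `shellFor`, `shellName` and the `has` callback are hypothetical; the PowerShell flags `-NoProfile`, `-NonInteractive` and `-Command` are standard, but the final flag set is a design choice still to be documented.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"runtime"
)

// shellFor picks the shell binary and the arguments that precede the command
// string. has reports whether a binary is on PATH.
func shellFor(goos, envShell string, has func(string) bool) (name string, args []string) {
	if goos != "windows" {
		if envShell != "" {
			return envShell, []string{"-c"}
		}
		return "bash", []string{"-c"}
	}
	for _, cand := range []string{"pwsh", "powershell"} {
		if has(cand) {
			return cand, []string{"-NoProfile", "-NonInteractive", "-Command"}
		}
	}
	return "cmd", []string{"/c"}
}

// shellName derives the shell's short name without splitting on "/".
func shellName(path string) string { return filepath.Base(path) }

func main() {
	onPath := func(bin string) bool {
		_, err := exec.LookPath(bin)
		return err == nil
	}
	name, args := shellFor(runtime.GOOS, os.Getenv("SHELL"), onPath)
	fmt.Println(shellName(name), args)
}
```

Quoting is deliberately left out: POSIX, PowerShell and `cmd` each need their own `quoteArg`, which is why the plan tests it by round-tripping through the real shell.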
### Phase 3 — process management (defect #3)
Implement Windows job objects via `golang.org/x/sys/windows` in
`transport_windows.go` (and any other subprocess owner —
`internal/provider/subprocess`, `internal/tool/bash`): create a job,
assign the child, `TerminateJobObject` on close to reap the whole tree.
Shared helper so MCP and bash tool both get tree-kill. (This is the
same item the distribution TODO references.)
### Phase 4 — paths + proxy (defects #4, #5, #6)
1. Replace XDG fallbacks with `os.UserConfigDir()` / `os.UserCacheDir()`
on Windows (keep XDG honoring on Unix). Centralise in one
`configDir()` / `dataDir()` helper so it's not re-derived.
2. Extend the OAuth credential path sets with OS-appropriate locations
(macOS `~/Library/Application Support/...`, Windows `%AppData%/...`).
3. Ensure every `http.Client` uses a transport with
`Proxy: http.ProxyFromEnvironment`. For Windows system-proxy/PAC,
document the env-var workaround now; optionally vendor a PAC-aware
transport (e.g. `github.com/rapid7/go-get-proxied`) later. This
overlaps the shared-client work in
`2026-06-04-egress-allowlist.md` — do the proxy transport once, in
the shared client.
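A sketch of the centralised `configDir()` helper from step 1. The real helper could call `os.UserConfigDir()` on Windows, which is documented to return `%AppData%`; this pure variant reads the same variable through an injected lookup so the logic can be unit-tested on any OS. `configDirFor` is a hypothetical name, and keeping macOS on `~/.config` is one of the two options the plan leaves open, chosen here only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// configDirFor resolves gnoma's config directory.
//   - windows: %AppData%\gnoma
//   - others:  $XDG_CONFIG_HOME/gnoma, falling back to ~/.config/gnoma
func configDirFor(goos string, getenv func(string) string, home string) (string, error) {
	if goos == "windows" {
		if base := getenv("AppData"); base != "" {
			return filepath.Join(base, "gnoma"), nil
		}
		return "", errors.New("%AppData% is not set")
	}
	if base := getenv("XDG_CONFIG_HOME"); base != "" {
		return filepath.Join(base, "gnoma"), nil
	}
	if home == "" {
		return "", errors.New("home directory is unknown")
	}
	return filepath.Join(home, ".config", "gnoma"), nil
}

func main() {
	home, _ := os.UserHomeDir() // an empty home is handled by configDirFor
	dir, err := configDirFor(runtime.GOOS, os.Getenv, home)
	fmt.Println(dir, err)
}
```

A matching `dataDirFor` would follow the same pattern with `XDG_DATA_HOME` and `~/.local/share` on Unix.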
### Phase 5 — safety classifier + terminal (defects #7, #8)
Extend `internal/safety/cwd.go` system-root and personal-dir sets per
OS; add a manual verification note for legacy Windows terminals.
---
## Touch-points (file:line)
| Defect | Location |
|---|---|
| Bash shell | `internal/tool/bash/bash.go:117`, `aliases.go:115,148` |
| Llamafile `sh` | `internal/slm/manager.go:172` |
| MCP kill stub | `internal/mcp/transport_windows.go:10-18` |
| Config/data dirs | `internal/config/load.go:52-59`, `internal/slm/manager.go:25-35` |
| OAuth paths | `internal/provider/google/provider.go:188-204` |
| Proxy | shared `http.Client` (see egress plan) |
| Safety classifier | `internal/safety/cwd.go:185-252` |
| CI matrix | `.github/workflows/` (new test job), `release.yml` |
---
## Testing (TDD — write first)
- **OS-gated unit tests** (run on each matrix OS):
- `defaultShell()` returns a runnable shell per OS; `quoteArg`
round-trips a value containing spaces/quotes through the real shell.
- `configDir()`/`dataDir()` return the OS-correct base.
- Job-object kill: spawn a child that spawns a grandchild; assert
both are gone after `killProcessTree` (Windows).
- `safety.ClassifyCWD` flags OS-appropriate system/personal dirs.
- **Existing tests** that `t.Skip` on Windows
(`internal/tool/fs/guard_test.go`,
`internal/provider/subprocess/stream_test.go`) — audit whether the
skip hides a real gap now that Windows is a target.
### Acceptance criteria
1. CI smoke matrix is green on `windows-latest` + `macos-latest`.
2. `gnoma --version` and a stubbed pipe run succeed on a Windows runner.
3. A bash-tool command with quoted args runs on Windows (PowerShell).
4. An MCP server that spawns a child leaves no orphan after shutdown on
Windows.
5. Config lands in `%AppData%\gnoma` on Windows, `~/.config/gnoma` on
Linux.
---
## TODO linkage
Promotes the "Cross-platform support — Windows + macOS" entry in
`TODO.md`. The Phase-2 r/devops question table stays in the TODO as the
public-facing answer map; link this plan for the implementation detail.
@@ -0,0 +1,169 @@
# Distribution Follow-ups — 2026-06-04
Hardens and broadens the release pipeline. v0.1.0+ already ships static
archives (GitHub mirror releases) and multi-arch Docker images (GHCR)
via GoReleaser. This plan covers the optional follow-ups listed under
"Distribution — follow-ups" in TODO.md: signed checksums, Homebrew tap,
`curl | sh` installer, release-note automation, and the
`dockers` → `dockers_v2` migration.
---
## Current state (confirmed)
- **`.goreleaser.yml`:** 6-target build matrix (linux/darwin/windows ×
amd64/arm64), CGO disabled, version injected via ldflags
(`-X main.buildVersion/buildCommit/buildDate`; read at
`cmd/gnoma/main.go:55-60`, printed at `:95-98`). Archives: tar.gz
(zip on Windows). Checksums: plain SHA256 `checksums.txt`,
**unsigned**. Docker: separate per-arch `dockers` blocks +
`docker_manifests` for the multi-arch manifest. Release published to
GitHub mirror (`release.github` owner `VikingOwl91`).
- **`.github/workflows/release.yml`:** triggers on `v*` tags, sets up
QEMU + Buildx, logs into GHCR with the built-in `GITHUB_TOKEN`, runs
`go test ./...` (Linux only), then `goreleaser release --clean` with
`GORELEASER_CURRENT_TAG` set. **No signing step.**
- **`Dockerfile`:** distroless `static:nonroot`, copies the
GoReleaser-built binary in. Architecture-agnostic (binary built
before `COPY`).
- **No** Homebrew tap, install script, or Makefile release target.
---
## Non-goals
- **Authenticode (Windows) / Gatekeeper notarization (macOS) code
signing.** These need a paid EV cert / Apple Developer account —
tracked separately (the cross-platform TODO documents the
"right-click → Unblock" workaround). Sigstore/cosign here is for
*checksum* signing, which needs no paid cert.
- **MSI installer.** Lives in the cross-platform plan, gated on demand.
- **Changing the canonical repo flow.** PRs still go to the Gitea
upstream; the GitHub mirror remains the release/CI surface.
---
## Design (independent work items — ship in any order)
### 1. Signed checksums (cosign / sigstore keyless)
Add a GoReleaser `signs` block that signs `checksums.txt` with cosign
in **keyless** mode (OIDC via the GitHub Actions token — no stored
private key, no cert cost):
- Add `cosign` install + `id-token: write` permission to
`release.yml`.
- GoReleaser `signs:` → `cmd: cosign`, `args: sign-blob` producing
`checksums.txt.sig` + `.pem` (cert bundle) as release artifacts.
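- Document verification:
`cosign verify-blob --certificate ... --signature ... checksums.txt`
(cosign 2.x keyless verification also requires
`--certificate-identity` and `--certificate-oidc-issuer`).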
Acceptance: a downloaded release verifies offline against the published
signature + Rekor transparency log.
### 2. Homebrew tap
Create a tap repo (`VikingOwl91/homebrew-tap`) and add GoReleaser's
`brews:` block targeting it. Needs a PAT with `contents:write` on the
tap repo (the default `GITHUB_TOKEN` can't push to a *second* repo) —
store as `HOMEBREW_TAP_TOKEN` secret. Formula installs the darwin/linux
archives.
Acceptance: `brew install vikingowl91/tap/gnoma` installs a working
binary on macOS + Linuxbrew; `gnoma --version` matches the tag.
### 3. `curl | sh` installer
Add `install.sh` (committed at repo root, served via the raw GitHub
mirror) that:
- Detects OS/arch, maps to the GoReleaser archive name template
(`gnoma_<ver>_<os>_<arch>.<ext>`).
- Resolves the latest release via the GitHub API (or honours a pinned
`GNOMA_VERSION`).
- Downloads the archive **and** `checksums.txt`, verifies the SHA256
before extracting (and the cosign signature if cosign is present).
- Installs to `~/.local/bin` (or `$GNOMA_INSTALL_DIR`), prints a PATH
hint.
Keep it POSIX-sh, no bashisms. Acceptance:
`curl -fsSL <raw>/install.sh | sh` yields a runnable `gnoma` on a clean
Linux + macOS box; checksum mismatch aborts.
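The installer itself stays POSIX sh, but the checksum step it performs can be illustrated in Go, for example as a helper for the CI negative test. This is a sketch only: `verifyChecksum` and the error names are hypothetical, and it assumes `checksums.txt` uses the usual `<hex digest>  <filename>` line format.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

var (
	errChecksumMismatch = errors.New("checksum mismatch")
	errNoChecksumEntry  = errors.New("no checksums.txt entry for archive")
)

// verifyChecksum hashes the archive bytes and compares the digest with the
// entry for name in a checksums.txt body. Any mismatch or missing entry is
// an error, so the caller can abort before extracting.
func verifyChecksum(archive []byte, name, checksums string) error {
	sum := sha256.Sum256(archive)
	got := hex.EncodeToString(sum[:])

	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || fields[1] != name {
			continue // not the line for this archive
		}
		if !strings.EqualFold(fields[0], got) {
			return errChecksumMismatch
		}
		return nil
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return errNoChecksumEntry
}
```

A missing entry is treated as a failure rather than a pass, which matches the "checksum mismatch aborts" acceptance line.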
### 4. Release-note automation
GoReleaser already generates a filtered changelog (excludes
docs/test/chore/style). Enrich it:
- Group commits by Conventional-Commit type
(`changelog.groups` with title regexes for feat/fix/perf/refactor).
- Add a release header template pointing to the upstream Gitea repo and
the install methods (brew / curl | sh / docker).
Acceptance: a tagged release's GitHub notes show grouped sections + an
install snippet, with no docs/chore noise.
### 5. `dockers` → `dockers_v2` migration
Collapse the two per-arch `dockers` blocks + `docker_manifests` into a
single `dockers_v2` block (GoReleaser's newer multi-platform builder).
The current `Dockerfile` is architecture-agnostic (binary copied
post-build), so verify whether `dockers_v2`'s expected per-platform
binary layout needs a `Dockerfile` change or a `templates`/`extra_files`
tweak — the TODO flags this as the reason it was deferred. Do it in its
own commit; diff the resulting GHCR manifest against the current one to
prove parity (same tags: `<ver>-amd64`, `<ver>-arm64`, `<ver>`,
`latest`).
Acceptance: GHCR still publishes a multi-arch manifest with identical
tags + labels; `docker pull --platform linux/arm64` works.
### 6. (Carry-over) Windows process-tree kill
Listed in this TODO bullet but it's a *runtime* concern — implemented in
`2026-06-04-cross-platform.md` Phase 3 (job objects). Cross-linked here
only so the TODO bullet's reference resolves.
---
## Touch-points (file:line)
| Item | Location |
|---|---|
| Signing, brews, changelog groups, dockers_v2 | `.goreleaser.yml` |
| cosign install, `id-token` perm, tap token | `.github/workflows/release.yml` |
| Installer | new `install.sh` (repo root) |
| Dockerfile (if dockers_v2 needs it) | `Dockerfile` |
| Tap repo | new `VikingOwl91/homebrew-tap` |
---
## Testing
Distribution is config + scripts, so testing is mostly pipeline-level:
- **Dry run:** `goreleaser release --snapshot --clean` locally must
produce signed checksums, brew formula, and the dockers_v2 manifest
without publishing.
- **install.sh:** a `shellcheck` gate + a CI job that runs it against
the latest release on linux + macos runners and asserts
`gnoma --version`.
- **Checksum/signature negative test:** corrupt the archive → installer
aborts; tampered checksums → cosign verify fails.
### Acceptance criteria
1. A tagged release publishes `checksums.txt` + `.sig` + `.pem`,
verifiable with cosign keyless.
2. `brew install vikingowl91/tap/gnoma` works on macOS.
3. `curl -fsSL <raw>/install.sh | sh` works on clean Linux + macOS,
with checksum verification.
4. Release notes are grouped and carry install instructions.
5. GHCR multi-arch manifest is unchanged after the dockers_v2 swap.
---
## TODO linkage
Promotes the "Distribution — follow-ups" entry in `TODO.md`. Link this
file; the Windows job-object sub-item points at the cross-platform plan.
@@ -0,0 +1,236 @@
# Network Egress Allowlist — 2026-06-04
Adds a per-host network egress boundary to the security layer via a
Learn → Review → Enforce rollout. Promotes the second half of the
TODO.md entry "Security boundary — egress controls + session audit log"
into a phased design.
---
## Status of the sibling item: per-session audit log — DONE
The first half of the TODO entry (per-session audit log of
blocked/redacted events) is **already implemented**:
- `internal/security/audit.go` defines `AuditLogger` / `AuditEvent`,
writing append-only JSONL at mode `0o600`, incognito-gated,
best-effort (write failures never break the scan pipeline).
- `cmd/gnoma/main.go:685-691` wires it to
`<projectRoot>/.gnoma/sessions/<sessionID>/audit.jsonl`.
- `internal/security/firewall.go` records events at `:152` (unicode
sanitize), `:173` (block), `:186` (redact).
**Remaining audit-log gap:** there is no CLI surface to *read* it. The
TODO's promise — answer "what did the firewall do this session?" in one
command — needs a `gnoma firewall audit` subcommand (no `firewall`
subcommand exists today; top-level commands are `providers`, `slm`,
`router`, `profile`). That viewer is folded into Phase 3 below since it
shares the `gnoma firewall` command surface with `firewall review`.
The rest of this plan is the genuinely-unbuilt egress allowlist.
---
## Problem
The current `Firewall` is a **content** boundary only: it scans
messages and tool results for secrets (regex + Shannon entropy) and
redacts/blocks/warns. It does **not** enforce network egress. Outgoing
HTTP uses stock clients with no per-host allowlist and no dial-layer
interception, so a compromised tool, MCP server, or prompt-injected
provider call can reach any host.
The README and v0.3.0 launch post oversold "network egress gated";
this plan makes that claim true.
### Why this is hard: no egress chokepoint today
Outgoing HTTP is constructed in many places, none sharing a client:
- **Provider SDKs** each build their own `http.Client` internally:
- anthropic (`internal/provider/anthropic/provider.go:36`,
`anthropic.NewClient`)
- openai (`internal/provider/openai/provider.go:46`, `oai.NewClient`)
- mistral (`internal/provider/mistral/provider.go:33`,
`mistralgo.NewClient`)
- google genai (`internal/provider/google/provider.go:239,306`)
- **Non-SDK direct calls** using `http.DefaultClient` or ad-hoc
`&http.Client{}`:
- `internal/router/discovery.go` (`:65,141,325,365`)
- `internal/router/probe.go` (`:24,72`)
- `internal/slm/backend.go` (`:266,294,316,343`)
- `internal/slm/download.go` (`:22`)
- `internal/slm/manager.go` (`:273`)
No custom `http.Client` is injected anywhere today. **But** every SDK
supports injecting one, which is the enabler for a single chokepoint.
---
## Non-goals
- **TLS interception / MITM.** We allowlist by destination host, not by
inspecting decrypted payloads. Content inspection stays the
firewall's job.
- **Blocking the provider SDKs' own retry/telemetry hosts by default.**
Model-provider hosts are baseline-allowed (see below).
- **Replacing the OS/network firewall.** This is an in-process
application-level guard, defense-in-depth, not a substitute for real
network controls. Document this honestly (the README over-claim is
the cautionary tale).
---
## Design
### The chokepoint: one shared `http.Client` with a guarded dialer
Build a single `*http.Client` whose `Transport.DialContext` validates
the destination against the allowlist **before** the connection is
made. `DialContext` receives `host:port` pre-resolution, so host-based
matching works without DNS races. Thread this client everywhere.
```
internal/security/egress/
guard.go // EgressGuard: mode + allowlist + Decide(host) ResultEnum
dialer.go // GuardedDialer wrapping net.Dialer.DialContext
client.go // HTTPClient(guard) *http.Client
store.go // learned-destinations persistence (per project)
baseline.go // curated ship-in-binary allowlist
```
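A minimal sketch of what `dialer.go` and `client.go` could contain, using only the standard library. The names `guardedDial` and `httpClient` and the 30-second dial timeout are assumptions. One caveat worth checking during implementation: when a proxy is configured, the transport dials the proxy, so the guard would see the proxy's address rather than the final destination.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// dialFunc has the same shape as http.Transport.DialContext.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// guardedDial wraps inner so that allow(host) is consulted before any
// connection attempt, and before DNS resolution of the host.
func guardedDial(allow func(host string) bool, inner dialFunc) dialFunc {
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("egress: bad address %q: %w", addr, err)
		}
		if !allow(host) {
			return nil, fmt.Errorf("egress: host %q is not allowlisted", host)
		}
		return inner(ctx, network, addr)
	}
}

// httpClient builds the shared client. Cloning the default transport keeps
// its TLS, HTTP/2, and proxy settings; only the dialer is replaced.
func httpClient(allow func(host string) bool) *http.Client {
	d := &net.Dialer{Timeout: 30 * time.Second} // timeout is an assumption
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.DialContext = guardedDial(allow, d.DialContext)
	return &http.Client{Transport: t}
}
```

Because `inner` is injectable, the unit test described later can pass a recording fake and assert that a blocked host never reaches it.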
**Injection mechanism per SDK** (each differs — enumerate, don't assume):
| Client | Mechanism |
|---|---|
| anthropic | `option.WithHTTPClient(c)` appended in `anthropic/provider.go` |
| openai | `option.WithHTTPClient(c)` appended in `openai/provider.go` |
| google genai | `genai.ClientConfig{HTTPClient: c}` in `google/provider.go` |
| mistral | **user's own SDK** — add `WithHTTPClient` option if absent (`github.com/VikingOwl91/mistral-go-sdk`), then use it |
| non-SDK paths | replace `http.DefaultClient` with the shared client in `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
Plumb the shared client into providers by adding
`HTTPClient *http.Client` to `provider.ProviderConfig`
(`internal/provider/registry.go:8-16`) and setting it in
`createProvider`. The non-SDK paths take the client via their existing
constructors / a package-level setter.
> The non-SDK paths are the trap: if any is missed it punches a hole in
> the allowlist. Treat the list above as a checklist; add a grep test
> (Phase 4) that fails if `http.DefaultClient` reappears.
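One way the grep test could work is a plain source scan, sketched below. `usesDefaultClient` and `findDefaultClientUses` are hypothetical names. A string match like this would not catch `http.Get`/`http.Post` (which use the default client implicitly) or ad-hoc `&http.Client{}` literals, so the real test may need more patterns.

```go
package main

import (
	"bytes"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// usesDefaultClient reports whether a Go source file mentions
// http.DefaultClient anywhere (comments included; deliberately strict).
func usesDefaultClient(src []byte) bool {
	return bytes.Contains(src, []byte("http.DefaultClient"))
}

// findDefaultClientUses walks root and returns the non-test .go files that
// mention http.DefaultClient. A guard test fails if the result is non-empty.
func findDefaultClientUses(root string) ([]string, error) {
	var hits []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(path, ".go") || strings.HasSuffix(path, "_test.go") {
			return nil
		}
		src, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		if usesDefaultClient(src) {
			hits = append(hits, path)
		}
		return nil
	})
	return hits, err
}
```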
### Three-stage rollout (not a single "block everything" default)
**Learn.** First runs log every egress destination per `(project,
agent, tool)` tuple to the per-project store **without blocking**.
Reuse the audit JSONL discipline (atomic, incognito-gated).
**Review.** `gnoma firewall review` surfaces the captured set; the user
marks each destination `allow | deny | scoped` (scoped = only reachable
by named tool/agent). Persist to `.gnoma/firewall/allowlist.toml`
(project) — subject to the same `omitempty`/atomic-write discipline as
the config-migration plan (`2026-05-24-config-migration.md`) to avoid
the zero-spam corruption class.
**Enforce.** When mode is `enforce`, unrecognised destinations are
blocked with a clear violation logged to the **same per-session
`audit.jsonl`** (new `AuditEvent.Action = "egress_block"`). Mode is
`[security.egress].mode = "off" | "learn" | "enforce"`, default `off`
(opt-in; shipping `enforce` on by default would break first-run UX).
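The three modes reduce to a small decision function. The sketch below assumes illustrative names (`Mode`, `Decision`, `NewEgressGuard`, `Learned`) and keeps learned hosts in memory; the plan's per-project store and the `egress_block` audit write are left out.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

type Mode string

const (
	ModeOff     Mode = "off"
	ModeLearn   Mode = "learn"
	ModeEnforce Mode = "enforce"
)

type Decision int

const (
	Allow Decision = iota
	Block
)

// EgressGuard holds the mode, the effective allowlist (baseline plus
// reviewed entries), and the hosts observed while learning.
type EgressGuard struct {
	mode    Mode
	allowed map[string]bool

	mu      sync.Mutex
	learned map[string]struct{}
}

func NewEgressGuard(mode Mode, allowed []string) *EgressGuard {
	g := &EgressGuard{mode: mode, allowed: map[string]bool{}, learned: map[string]struct{}{}}
	for _, h := range allowed {
		g.allowed[strings.ToLower(h)] = true
	}
	return g
}

// Decide never blocks outside enforce mode; learn mode only records.
func (g *EgressGuard) Decide(host string) Decision {
	host = strings.ToLower(host)
	switch g.mode {
	case ModeEnforce:
		if g.allowed[host] {
			return Allow
		}
		return Block // the caller would also log an "egress_block" audit event
	case ModeLearn:
		g.mu.Lock()
		g.learned[host] = struct{}{}
		g.mu.Unlock()
		return Allow
	default:
		// "off". Assumption: unknown mode strings are rejected at config
		// load, so they never reach this fail-open branch.
		return Allow
	}
}

// Learned returns the recorded hosts in sorted order for review.
func (g *EgressGuard) Learned() []string {
	g.mu.Lock()
	defer g.mu.Unlock()
	out := make([]string, 0, len(g.learned))
	for h := range g.learned {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}
```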
### Baseline allowlist (curated, ship-in-binary)
`baseline.go` seeds the allowlist so Enforce mode is usable immediately:
- **Package ecosystems:** github.com, registry.npmjs.org, pypi.org,
files.pythonhosted.org, crates.io, static.crates.io,
registry-1.docker.io, proxy.golang.org, sum.golang.org.
- **Model providers:** anthropic, openai, google, mistral, **minimax**
(per `2026-06-04-minimax-provider.md`) — host set derived from the
effective `[provider.endpoints]` map so user-configured local
ollama/llamacpp endpoints are auto-allowed.
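Deriving the model-provider hosts from the endpoints map could be as small as the sketch below. It assumes `[provider.endpoints]` decodes to a provider-name to base-URL map, and `hostsFromEndpoints` is a hypothetical name.

```go
package main

import (
	"net/url"
	"sort"
	"strings"
)

// hostsFromEndpoints extracts the unique hostnames from a provider-name to
// base-URL map. Entries that do not parse as absolute URLs are skipped
// rather than guessed at.
func hostsFromEndpoints(endpoints map[string]string) []string {
	seen := map[string]struct{}{}
	for _, raw := range endpoints {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue // e.g. "localhost:11434" without a scheme has no host
		}
		seen[strings.ToLower(u.Hostname())] = struct{}{}
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return hosts
}
```

With this shape a user-configured `http://localhost:11434/v1` contributes `localhost` to the allowlist without any extra configuration.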
The painful middle ground is SDK egress (sentry, stripe, supabase,
datadog…). These break a naive "block unknown" default, which is
exactly why Learn → Review → Enforce is the only flow that scales.
### Per-tool scoping
`scoped` destinations carry an allowed-tool/agent set. Enforcement
checks the calling context — the engine already knows which tool is
running (it threads per-tool context for redaction logging today). Pass
the tool/agent identity into `EgressGuard.Decide(host, callerCtx)`.
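The scoped check itself is a lookup plus a membership test. In the sketch below the `Rule` type, the string verdicts, and `allowedFor` are assumptions about how the reviewed allowlist might be represented, and the caller identity is reduced to a tool name.

```go
package main

// Rule is a hypothetical reviewed allowlist entry for one host.
type Rule struct {
	Verdict string   // "allow", "deny", or "scoped"
	Tools   []string // only consulted when Verdict is "scoped"
}

// allowedFor reports whether tool may reach host under the reviewed rules.
// Hosts without a rule are treated as not allowed (enforce semantics).
func allowedFor(rules map[string]Rule, host, tool string) bool {
	r, ok := rules[host]
	if !ok {
		return false
	}
	switch r.Verdict {
	case "allow":
		return true
	case "scoped":
		for _, t := range r.Tools {
			if t == tool {
				return true
			}
		}
	}
	return false // "deny", unknown verdicts, and scoped misses
}
```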
---
## Interactions
- **Incognito:** Learn-mode writes are gated by incognito exactly like
the audit log (`IncognitoMode.ShouldLogContent`). Enforcement still
applies in incognito (security is not relaxed); only the *learning*
persistence is suppressed.
- **Config layering:** the allowlist file is a new corruption surface —
follow `2026-05-24-config-migration.md` #1 discipline.
- **SafeProvider:** egress is orthogonal to the content `SafeProvider`
wrap; it lives one layer down at the transport. Both must hold.
---
## Touch-points (file:line)
| Change | Location |
|---|---|
| New egress package | `internal/security/egress/` |
| `HTTPClient` field | `internal/provider/registry.go:8-16` |
| Provider client injection | `anthropic/provider.go`, `openai/provider.go`, `google/provider.go`, `mistral/provider.go` |
| mistral SDK `WithHTTPClient` | `github.com/VikingOwl91/mistral-go-sdk` (if absent) |
| Non-SDK client swap | `router/discovery.go`, `router/probe.go`, `slm/backend.go`, `slm/download.go`, `slm/manager.go` |
| `audit.go` egress action | `internal/security/audit.go` (`AuditEvent`) |
| Config `[security.egress]` | `internal/config/config.go` (SecuritySection ~`:280-306`) |
| `gnoma firewall` command | `cmd/gnoma/main.go` subcommand dispatch (~`:178`) |
| Allowlist store | `.gnoma/firewall/allowlist.toml` |
---
## Testing (TDD — write first)
- **Unit:**
- `EgressGuard.Decide`: off → always allow; learn → allow + record;
enforce → allow baseline/allowlisted, block unknown, scoped host
allowed only for the named tool.
- `GuardedDialer` blocks a non-allowlisted `host:port` before dial
(use a guard with a closed allowlist; assert no connection
attempt — inject a fake inner dialer that records calls).
- Baseline expansion: `[provider.endpoints]` hosts are auto-allowed;
a local ollama URL becomes an allowlist entry.
- Allowlist store round-trips without zero-spam corruption.
- `audit.jsonl` gains an `egress_block` record on a blocked dial.
- **Grep/guard test:** fails if `http.DefaultClient` is used in
provider/router/slm packages (prevents regressions reopening the
hole).
- **Integration (`//go:build integration`):** with mode=enforce and a
minimal allowlist, a provider call to an allowed host succeeds and a
tool fetch to a blocked host fails with a logged violation.
### Acceptance criteria
1. `mode="off"` (default) → behaviour identical to today.
2. `mode="learn"` → every outbound host appears in the store; nothing
is blocked.
3. `gnoma firewall review` lists learned hosts and persists
allow/deny/scoped decisions.
4. `mode="enforce"` → baseline + allowlisted hosts reachable; an
un-allowlisted host is blocked with an `egress_block` line in
`.gnoma/sessions/<id>/audit.jsonl`.
5. `gnoma firewall audit` prints this session's firewall events
(block/redact/egress) in a grep-friendly form. (Closes the
remaining audit-log gap.)
6. Scoped destination reachable by its named tool only.
---
## TODO linkage
Replaces the egress half of the "Security boundary — egress controls +
session audit log" entry in `TODO.md`. Update that entry to mark the
audit log implemented and link this file for the egress work.
@@ -0,0 +1,224 @@
# MiniMax Provider — 2026-06-04
Adds MiniMax (<https://platform.minimax.io>) as a first-class cloud
provider so it can register as a router arm alongside
anthropic/openai/google/mistral. Promotes the TODO.md entry
"MiniMax provider — cloud arm + subscription token plan" out of
bullet form into a phased design.
---
## Problem
Gnoma has no MiniMax adapter. MiniMax ships strong, very cheap coding
models (M2 family) that are a natural fit for the cheap-high-capability
cloud tier the router already reasons about via `CostWeight`. Two facts
make the integration cheap:
1. MiniMax exposes **both** an OpenAI-compatible and an
Anthropic-compatible HTTP surface, so no new translation layer is
needed — gnoma already has both `internal/provider/openaicompat`
(built on the OpenAI SDK) and `internal/provider/anthropic` with a
working `BaseURL` override.
2. `envKeyFor`'s default branch (`cmd/gnoma/main.go:1199-1200`) already
resolves `MINIMAX_API_KEY` for any unknown provider with no code
change.
The remaining work is wiring (a constructor + switch cases +
enumerations), routing metadata (family defaults, rate limits), and a
**design decision around the subscription billing model** that the
router's metered-cost assumption does not currently handle.
### External facts (VERIFY at implementation — MiniMax docs move fast)
These were confirmed 2026-06-04 but the model lineup and pricing are
revised frequently (a pricing overhaul landed 2026-06-02). Re-verify
against the live docs before hardcoding anything:
- **OpenAI-compatible base URL:** `https://api.minimax.io/v1`
(international). A separate region endpoint exists
(`api.minimaxi.com`); confirm the exact host + whether gnoma should
expose a region toggle. Docs:
<https://platform.minimax.io/docs/api-reference/text-openai-api>
- **Anthropic-compatible endpoint:** exists ("two equivalent
endpoints, one mimics OpenAI, one mimics Anthropic"). Confirm the
exact path/host before choosing it over OpenAI-compat.
- **Models (do NOT hardcode a single ID):** MiniMax-M2, M2.1, M2.5,
M2.7 (+ `-highspeed` variants), M3. Coding-relevant default is the
current M2-coding model — at time of writing M2.5 for PAYG, M2.1 for
the subscription plan. **Treat the default as config, not a
constant**, and call `Models(ctx)` to enumerate live.
- **Pricing (PAYG, for `CostPer1k*` metadata):** M2.7 ≈ $0.30 / MTok
input, $1.20 / MTok output; highspeed ≈ 2×. Convert to the EUR
per-1k convention used by the Arm struct. Docs:
<https://platform.minimax.io/docs/guides/pricing-token-plan>
- **Subscription:** "Token Plan" (current; supersedes the former
"Coding Plan"). Flat-rate prompt quota over a rolling window
(published M2.7 limits 1,500–30,000 requests / 5h across tiers).
Same Bearer key as PAYG.
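The unit conversion for the pricing bullet is simple arithmetic: $0.30 per million tokens is $0.0003 per 1k tokens, then multiplied by whatever USD to EUR rate is chosen. A sketch follows; `per1kFromPerMTok` is a hypothetical helper, and the exchange rate is deliberately a parameter because the source fixes none.

```go
package main

// per1kFromPerMTok converts a price quoted per million tokens into a
// per-1k-token price. rate is the number of target-currency units per one
// source-currency unit (for example EUR per USD); pass 1 to keep currency.
func per1kFromPerMTok(perMTok, rate float64) float64 {
	return perMTok / 1000 * rate
}
```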
---
## Non-goals
- **A bespoke MiniMax SDK / translation layer.** We reuse the existing
OpenAI-compat (default) or Anthropic provider via `BaseURL`. If
MiniMax adds non-standard body fields, use the existing
`openai.NewWithStreamOptions` escape hatch (the same one Ollama uses).
- **Region auto-detection.** Ship the international endpoint as the
default; the user can override via `[provider.endpoints]`. A region
toggle is a follow-up if anyone asks.
- **Full subscription-quota accounting.** Phase 2 models subscription
cost as a coarse `CostWeight` zero-out, not a live quota meter.
---
## Decision: OpenAI-compat vs Anthropic-compat backing
**Default to OpenAI-compat** (`internal/provider/openaicompat`). It is
already exercised by the local backends (ollama/llamacpp), so the
streaming, tool-call, and error paths are battle-tested in this repo.
The Anthropic-compat endpoint is a fallback only if a MiniMax feature
(e.g. extended thinking) is exposed solely through it. Keep the option
open by making the backing selectable via config
(`[provider.minimax].api = "openai" | "anthropic"`), defaulting to
`openai`.
---
## Design
### Phase 1 — provider wiring (smallest shippable slice)
Goal: `gnoma --provider minimax` works against PAYG with metered
pricing, registered as a cloud arm.
1. **Constructor.** Add `NewMiniMax(cfg provider.ProviderConfig)
(provider.Provider, error)` to
`internal/provider/openaicompat/provider.go`, mirroring `NewOllama`
/ `NewLlamaCpp` (`openaicompat/provider.go:18-49`):
- Default `BaseURL` to `https://api.minimax.io/v1` when unset (but
let `[provider.endpoints].minimax` override).
- Require a real API key (unlike Ollama's dummy key) — return an
error if `cfg.APIKey == ""`.
- Leave `MaxRetries` at the SDK default (cloud failures *are*
transient, unlike the local backends which force `0`).
- Default `cfg.Model` to the current coding model **read from
config**, not a baked constant.
2. **Construction switch.** Add `case "minimax": return
openaicompat.NewMiniMax(cfg)` to `createProvider`
(`cmd/gnoma/main.go:1265-1280`). If `[provider.minimax].api =
"anthropic"`, route to `anthropicprov.New(cfg)` with `cfg.BaseURL`
set to the anthropic-compat host instead.
3. **Provider enumerations.** Add `"minimax"` to:
- the known-providers set (`main.go:233-236`),
- the available-providers usage string (`main.go:1279`),
- NOT the local-providers set (it is a cloud arm).
4. **API key (optional friendliness).** `envKeyFor`'s default already
yields `MINIMAX_API_KEY`. Add an explicit `case "minimax"` in
`envKeyFor` (`main.go:1189-1201`) only if we want alternates (e.g.
`MINIMAX_GROUP_ID` if the account requires a group id header —
VERIFY whether MiniMax needs a group id alongside the key; if so,
thread it through `ProviderConfig.Options`).
5. **Family defaults.** Add MiniMax model families to
`knownFamilyDefaults` in `internal/router/defaults.go` (pattern at
`defaults.go:212-239`). Cloud arm → no `MaxComplexity` ceiling. Set
`Strengths` (`TaskGeneration`, `TaskRefactor`, `TaskDebug` are the
coding sweet spot) and a low `CostWeight` (~0.8–1.0; a cheap arm, so
the cost penalty is small) plus `CostPer1kInput/Output` from the
verified PAYG pricing.
6. **Rate limits.** Add a `minimaxDefaults()` entry in
`internal/provider/ratelimits.go` (pattern at the anthropic block
~`ratelimits.go:109-130`) and wire it into the `DefaultRateLimits`
switch. Use the published PAYG RPM/TPM; allow `[rate_limits.minimax]`
config overrides (the existing override path in `resolveRateLimitPools`).
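The defaulting rules from step 1 can be sketched independently of the real provider types. Here `ProviderConfig` is a trimmed, hypothetical stand-in for `provider.ProviderConfig`, and `resolveMiniMaxConfig` is an illustrative helper; the actual `NewMiniMax` would apply the same rules and then delegate to the OpenAI-compat base provider.

```go
package main

import "errors"

// ProviderConfig is a trimmed stand-in for provider.ProviderConfig; the
// real struct has more fields.
type ProviderConfig struct {
	BaseURL string
	APIKey  string
	Model   string
}

// VERIFY against the live MiniMax docs before hardcoding.
const defaultMiniMaxBaseURL = "https://api.minimax.io/v1"

var errMissingKey = errors.New("minimax: API key required (set MINIMAX_API_KEY)")

// resolveMiniMaxConfig applies the Phase 1 defaults: a real key is
// mandatory, an explicit BaseURL wins over the built-in one, and the
// default model comes from config rather than a baked constant.
func resolveMiniMaxConfig(cfg ProviderConfig, configuredModel string) (ProviderConfig, error) {
	if cfg.APIKey == "" {
		return ProviderConfig{}, errMissingKey
	}
	if cfg.BaseURL == "" {
		cfg.BaseURL = defaultMiniMaxBaseURL
	}
	if cfg.Model == "" {
		cfg.Model = configuredModel
	}
	return cfg, nil
}
```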
### Phase 2 — subscription (Token Plan) billing model
The router's `CostWeight` math assumes metered per-token pricing. Under
a Token Plan subscription, marginal cost is ≈0 until the quota is hit,
then requests hard-fail. Design:
1. **Billing knob.** `[provider.minimax].billing = "metered" |
"subscription"` (default `"metered"`). In `subscription` mode, set
the arm's `CostWeight` to 0 (or `CostPer1k*` to 0) so the selector
treats MiniMax as free while quota remains.
2. **Quota-exhaustion failover.** MiniMax returns a quota/429 error
when the plan is exhausted. Map it to the existing rate-limit
backoff path (`Arm.BackoffUntil`, the 429 handling that already
disables an arm temporarily) so the bandit fails over to the next
arm cleanly. This ties into the session error-recovery work landed
in `0d3d190`. Confirm the exact error shape MiniMax returns and add
a classifier in `internal/provider/errors.go`.
3. **Docs.** Document both plans + the region split in
`docs/slm-backends.md` (or a new provider doc) and the README
provider list.
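The billing knob and the quota failover from steps 1 and 2 could interact roughly as follows. The `Arm` type here is a two-field stand-in for the router's real struct, the function names are hypothetical, and the backoff window is a parameter because the plan does not fix how long an exhausted plan should be avoided.

```go
package main

import "time"

// Arm is a minimal stand-in for the router arm; only the two fields this
// sketch touches are modelled.
type Arm struct {
	CostWeight   float64
	BackoffUntil time.Time
}

// applyBilling zeroes the cost penalty under a subscription so the
// selector treats the arm as free while quota remains.
func applyBilling(arm *Arm, billing string) {
	if billing == "subscription" {
		arm.CostWeight = 0
	}
}

// onQuotaExhausted reuses the rate-limit backoff: the arm is parked until
// now+window so the selector fails over to the next arm.
func onQuotaExhausted(arm *Arm, now time.Time, window time.Duration) {
	arm.BackoffUntil = now.Add(window)
}

// available reports whether the arm may be selected at the given time.
func (a *Arm) available(now time.Time) bool {
	return !now.Before(a.BackoffUntil)
}
```

Metered arms pass through `applyBilling` untouched, so the default behaviour stays the same as today.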
---
## Touch-points (file:line)
| Change | Location |
|---|---|
| `NewMiniMax` constructor | `internal/provider/openaicompat/provider.go` (after `:49`) |
| Construction switch case | `cmd/gnoma/main.go:1265-1280` |
| Known-providers set | `cmd/gnoma/main.go:233-236` |
| Usage string | `cmd/gnoma/main.go:1279` |
| `envKeyFor` (optional) | `cmd/gnoma/main.go:1189-1201` |
| Family defaults | `internal/router/defaults.go:212-239` |
| Rate-limit defaults | `internal/provider/ratelimits.go` (+ `DefaultRateLimits` switch) |
| Error classifier (Phase 2) | `internal/provider/errors.go` |
| Config: `[provider.minimax]` | `internal/config/config.go` (provider section) |
The `Provider` interface contract to satisfy
(`internal/provider/provider.go:136-148`): `Stream`, `Name`, `Models`,
`DefaultModel`. All four come free by delegating to the OpenAI-compat
base provider.
---
## Testing (TDD — write first)
Per CLAUDE.md: table-driven, `//go:build integration` for anything
hitting the live API.
- **Unit (no network):**
- `NewMiniMax` defaults: empty `BaseURL` → `https://api.minimax.io/v1`;
empty key → error; `[provider.endpoints].minimax` override wins.
- `createProvider("minimax", …)` returns a non-nil provider; unknown
still errors.
- `envKeyFor("minimax") == "MINIMAX_API_KEY"`.
- `defaults.go`: a MiniMax model family resolves to the expected
`Strengths`/`CostWeight`; `MaxComplexity == 0`.
- `ratelimits.go`: `DefaultRateLimits("minimax").LookupModel(...)`
returns the configured limits; `"*"` fallback works.
- Phase 2: billing=`subscription` → arm `CostWeight == 0`; the
quota/429 error maps to a retryable/backoff classification.
- **Integration (`//go:build integration`, real `MINIMAX_API_KEY`):**
a one-shot `Stream` against the cheapest model returns tokens;
`Models(ctx)` enumerates a non-empty list.
### Acceptance criteria
1. `MINIMAX_API_KEY=… gnoma --provider minimax -p "hello"` streams a
response in pipe mode.
2. With no `--provider`, MiniMax appears as a selectable router arm and
is chosen for a cheap generation task when `prefer` allows cloud.
3. `gnoma providers` lists `minimax`.
4. Phase 2: with `billing="subscription"`, the selector prefers MiniMax
for eligible tasks; on simulated quota-exhaustion the router fails
over without surfacing an error to the user.
---
## TODO linkage
Replaces the inline "MiniMax provider" bullet in `TODO.md` (In flight).
Link this file from that entry.