

vikingowl f8ab522bef docs(todo,plans): specs for open features + MiniMax & ACP

Add implementation-ready plans for the in-flight features that lacked
one, and two new provider/protocol items:

- MiniMax provider (cloud arm + Token Plan billing decision)
- Agent Client Protocol (ACP) — dual role: gnoma as ACP agent and as
  ACP client driving external agents as router arms
- Network egress allowlist (Learn/Review/Enforce); note the per-session
  audit log is already implemented, remaining gap is a viewer command
- Cross-platform (Windows/macOS) code touch-points + build-tag pattern
- Distribution follow-ups (cosign, brew tap, installer, dockers_v2)

Link each plan from its TODO.md entry; mark audit-log item done.

2026-06-04 11:59:16 +02:00

24 KiB


Gnoma — TODO

Active work, newest first.

In flight

MiniMax provider — cloud arm + subscription token plan. Add MiniMax (api.minimax.io / api.minimaxi.com) as a first-class cloud provider so it can register as a router arm alongside anthropic/openai/google/mistral.

API surface. MiniMax ships two HTTP surfaces, one OpenAI-compatible and one Anthropic-compatible, so this is a base-URL + auth wiring task, not a new translation layer:
- OpenAI-compatible chat-completions at …/v1, reusable via internal/provider/openaicompat. Cleanest first cut: add a NewMiniMax(cfg) constructor mirroring NewOllama / NewLlamaCpp (openaicompat/provider.go) with the MiniMax base URL baked in, then a case "minimax" in createProvider (cmd/gnoma/main.go:1265) and the available-providers usage string (:1279).
- Anthropic-compatible endpoint (…/anthropic): alternative backing via the existing anthropic provider with a BaseURL override. Decide on one canonical path; OpenAI-compat is the lower-risk default since openaicompat is already exercised by the local backends.
- Auth. Bearer API key. envKeyFor's default branch (main.go:1199) already resolves MINIMAX_API_KEY with no code change; add an explicit case "minimax" only if we want a friendlier name or alternates list.
- Models. MiniMax-M2 (agentic/coding, the one to default to), MiniMax-M1, abab6.5 series. Set Strengths + MaxComplexity + CostWeight on the arm so the selector treats it as a cheap high-capability cloud tier.
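A minimal sketch of the region and key wiring a NewMiniMax constructor would need, assuming a region knob selects between the two hosts named in this entry. The region names (`intl`, `cn`), the helper `miniMaxBaseURL`, and the `/v1` path on both hosts are assumptions; `envKeyFor` here only imitates the default-branch behaviour described above and is not gnoma's code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Hypothetical region table. The hosts come from the TODO entry; the "/v1"
// suffix for the OpenAI-compatible surface is an assumption to verify
// against MiniMax's own docs before wiring it into openaicompat.
var miniMaxBaseURLs = map[string]string{
	"intl": "https://api.minimax.io/v1",
	"cn":   "https://api.minimaxi.com/v1",
}

// miniMaxBaseURL picks the base URL for a region, defaulting to the
// international endpoint when none is configured.
func miniMaxBaseURL(region string) (string, error) {
	if region == "" {
		region = "intl"
	}
	u, ok := miniMaxBaseURLs[region]
	if !ok {
		return "", fmt.Errorf("minimax: unknown region %q", region)
	}
	return u, nil
}

// envKeyFor imitates the described default branch ("minimax" ->
// "MINIMAX_API_KEY"). Illustrative only, not gnoma's implementation.
func envKeyFor(provider string) string {
	return strings.ToUpper(provider) + "_API_KEY"
}

func main() {
	baseURL, err := miniMaxBaseURL("")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	_, haveKey := os.LookupEnv(envKeyFor("minimax"))
	fmt.Println(baseURL, envKeyFor("minimax"), "key set:", haveKey)
}
```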
Token plan (open question — affects auth + billing UX). MiniMax offers a flat-rate Coding Plan subscription (token-quota based, Claude-Max-style) in addition to metered pay-as-you-go API credits. Both authenticate with the same Bearer key, so no adapter difference — but the router's CostWeight math assumes metered per-token pricing. Under a subscription the marginal cost is ~0 until the quota is hit, then hard-stops. Decisions to make:
- How to model "subscription" cost in the selector — e.g. a [provider.minimax].billing = "subscription" | "metered" knob that zeroes CostWeight while quota remains, vs. real per-token cost when metered.
- Quota exhaustion handling — surface the 429/quota error cleanly and let the bandit fail over to the next arm (ties into the session error-recovery work in 0d3d190).
- Document both plans + the region split (api.minimax.io international vs api.minimaxi.com) in docs/slm-backends.md / provider docs.
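One way the proposed billing knob could feed the selector, sketched under the assumption that the selector consumes a per-arm CostWeight plus a "selectable" flag. `Billing` and `effectiveCostWeight` are hypothetical names; the point is that a subscription arm looks free until the quota is gone and is then taken out of rotation rather than repriced.

```go
package main

import "fmt"

// Billing mirrors the proposed [provider.minimax].billing knob (hypothetical).
type Billing string

const (
	Metered      Billing = "metered"
	Subscription Billing = "subscription"
)

// effectiveCostWeight returns the CostWeight the selector should apply to an
// arm and whether the arm may be selected at all.
func effectiveCostWeight(b Billing, configured float64, quotaLeft bool) (weight float64, selectable bool) {
	if b == Subscription {
		if !quotaLeft {
			// Quota exhausted: the plan hard-stops, so drop the arm and let
			// the selector fail over to the next one.
			return configured, false
		}
		// Marginal cost is ~0 until the quota runs out.
		return 0, true
	}
	// "metered", empty, or unrecognised values: real per-token pricing.
	return configured, true
}

func main() {
	cases := []struct {
		b         Billing
		quotaLeft bool
	}{{Metered, true}, {Subscription, true}, {Subscription, false}}
	for _, c := range cases {
		w, ok := effectiveCostWeight(c.b, 0.4, c.quotaLeft)
		fmt.Printf("%-12s quota=%-5v weight=%.1f selectable=%v\n", c.b, c.quotaLeft, w, ok)
	}
}
```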
Smallest shippable slice: OpenAI-compat NewMiniMax + metered pricing, registered as a cloud arm. Subscription/quota modelling is the follow-up once the billing knob lands. Plan: docs/superpowers/plans/2026-06-04-minimax-provider.md.
Agent Client Protocol (ACP) support. Run gnoma as an ACP agent (gnoma acp) so any ACP-capable editor (Zed, Kiro, OpenCode, …) can drive it as an external coding agent. ACP is "the LSP for AI coding agents": JSON-RPC 2.0 over stdio, where the editor (client) spawns the agent as a subprocess. gnoma already owns the hard parts: the agentic engine, tools, permissions, and JSON-RPC-over-stdio (from its MCP-client side, internal/mcp/jsonrpc.go). The fit mirrors that MCP role: there gnoma is the JSON-RPC client, here it is the server. No Go SDK exists (official SDKs are TS/Python/Rust/Kotlin), so gnoma implements the wire protocol natively against the schema. session/new can declare mcpServers, so ACP and gnoma's existing MCP manager wire up in one handshake.

Dual role — both directions:
1. gnoma as ACP agent (server) — gnoma acp over stdio so editors drive gnoma.
2. gnoma as ACP client — gnoma spawns external ACP agents (Claude, Gemini CLI, Codex, …) and uses them as router-arm provider backends. This is the same shape as the existing internal/provider/subprocess CLI-agent arms (cmd/gnoma/main.go:521-531, IsCLIAgent: true) but over standardized ACP JSON-RPC — gaining structured tool-call surfacing, real turn/permission semantics, and cancellation that the current one-shot stream-json subprocess provider lacks (it sets ToolUse:false for agents without stream-json).
Upstream: https://github.com/agentclientprotocol. Plan: docs/superpowers/plans/2026-06-04-agent-client-protocol.md.
Config write/merge — silent corruption of layered configs. internal/config/write.go:setConfig reads the existing TOML into a zero-valued Config struct, sets one field, and writes the entire struct back out — so every untouched field gets serialized at its Go zero value (empty strings, zero ints, false bools). On the next load, those explicit zeros overwrite higher-priority layers via toml.Decode's "present field beats absent field" semantics.

Concrete symptom (2026-05-24): user's ~/.config/gnoma/config.toml had [router].prefer = "cloud" but the project-level .gnoma/config.toml had prefer = "" (generated by an earlier gnoma config set ... call), which silently downgraded the effective policy to auto — visible only via the new /router TUI command, with no warning.

The same root cause explains the zero-spammed global config that user has (max_tokens = 0, permission.mode = "", bash_timeout = 0, etc.), all of which overwrite sensible defaults.
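The failure mode can be reproduced in a few lines. This sketch uses `encoding/json` as a stdlib stand-in for the TOML encoder/decoder, since it shares the relevant behaviour (decoding into an existing struct only overwrites keys present in the input), and it models layering as decoding each file into one struct, lowest priority first. That layering model is an assumption, not a copy of gnoma's loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Router has no omitempty: a zero-valued struct serializes prefer = "".
type Router struct {
	Prefer string `json:"prefer"`
}

// RouterOmit skips empty strings, so an untouched field is simply absent.
type RouterOmit struct {
	Prefer string `json:"prefer,omitempty"`
}

// merge decodes each layer into the same struct, lowest priority first.
// Keys present in a later layer win, even when their value is "".
func merge(layers ...[]byte) Router {
	var cfg Router
	for _, l := range layers {
		if err := json.Unmarshal(l, &cfg); err != nil {
			panic(err)
		}
	}
	return cfg
}

func mustMarshal(v any) []byte {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	global := []byte(`{"prefer":"cloud"}`)
	spammed := mustMarshal(Router{})   // {"prefer":""}
	clean := mustMarshal(RouterOmit{}) // {}
	fmt.Printf("%s -> %q\n", spammed, merge(global, spammed).Prefer)
	fmt.Printf("%s -> %q\n", clean, merge(global, clean).Prefer)
}
```

`encoding/json`'s `omitempty` also drops numeric zeros, which is the "unset vs explicitly zero" loss discussed in fix option 1; check the chosen TOML encoder's own tag semantics before relying on either behaviour.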

Fix surface (multi-part, plan-worthy):
1. Stop generating zero-spam. Three options:
  - Tag struct fields with ,omitempty so the BurntSushi encoder skips zero values. Caveat: conflates "unset" with "explicitly zero" for primitive types (a user who wants max_keep = 0 loses it). Safe for strings/maps/slices where empty is never user-intent; lossy for numeric fields.
  - Switch to pelletier/go-toml/v2 and use its document model to edit only the targeted key, preserving everything else byte-for-byte. Cleaner semantics, bigger refactor.
  - Hybrid: omitempty on string/map/slice fields, document-level edit for numerics. Fastest path that doesn't lose intent.
2. gnoma doctor — read-only diagnostic. Scans both global and project configs and reports:
  - Zero-spam fields that would silently shadow defaults or upstream layers.
  - Invalid enum values (e.g. permission.mode = "").
  - Unknown / removed keys from older schema versions.
  - Effective-merged values (so the user sees what gnoma will actually use after layering). No writes. Exits non-zero on findings so it's CI-friendly.
3. gnoma upgrade-config — active migration. For each config file (global, profiles, project):
  - Compute the cleaned form (only fields the user actually set, dropping zeros that match defaults).
  - Write the original to <path>.bak with timestamp suffix.
  - Write the cleaned form to the original path.
  - Print a diff of what changed so the user can verify.
4. Project-level auto-migration on startup. If gnoma detects a zero-spammed project .gnoma/config.toml at launch:
  - Auto-run the upgrade (project-only, never auto-touch the global config).
  - Write .gnoma/config.toml.bak-YYYY-MM-DD-HHMMSS.
  - Surface a one-line notice in the startup safety banner: config: migrated .gnoma/config.toml (see .bak).
  - The auto-migration is non-destructive (.bak preserves original) but still gated behind a [config].auto_migrate toggle, defaulting to true. Global configs require explicit gnoma upgrade-config.
5. Project registry (~/.config/gnoma/projects.json). Today there is no record of which directories gnoma has been launched in — items #2 and #3 can work with a filesystem scan (find ~ -type d -name .gnoma), but a registry makes them significantly faster and unlocks cross-project features. Sketch:
```
{
  "projects": [
    {
      "path": "/home/.../my-repo",
      "first_seen": "2026-04-15T10:30:00Z",
      "last_seen":  "2026-05-24T19:23:00Z",
      "session_count": 47
    }
  ]
}
```
  Update on every successful startup (record project root, bump last_seen + increment session_count). Enables:
  - Fast gnoma doctor --all-projects without a filesystem walk.
  - Cross-project session listing (gnoma sessions --all picker; surface most-recent sessions across the registry).
  - gnoma upgrade-config that can migrate every known project in one invocation.
  - Future local-only aggregate stats (gnoma stats) — still no-phone-home, just a sum across the registry.
  Caveats and design constraints:
  - The registry file becomes another silent-corruption surface — must use the same omitempty / atomic-write discipline as the encoder fix in #1, or it'll exhibit the same class of bug.
  - Stale entries (deleted projects). gnoma doctor should detect and offer to prune; do not auto-delete.
  - Privacy: this is literally a log of directories the user has worked in. Local-only, never sent off-machine (per the no-phone-home positioning), but worth a one-line note in the Security section of the README so users know it exists.
  - Opt-out: [config].project_registry = false for users who don't want this tracked. Default true.
  - Atomic writes (temp file + rename) so a crash mid-write doesn't corrupt the file.
Surfaced from the v0.3.1 launch wave (2026-05-24). Plan: docs/superpowers/plans/2026-05-24-config-migration.md.
Bandit selector — design decisions deferred. The current selector (internal/router/selector.go:scoreArm) is greedy quality-weighted: per-(arm × task-type) EMA scores blended 70/30 with heuristic defaults, divided by CostWeight-adjusted cost. It is not a true multi-armed bandit — no UCB-style exploration bonus, no Thompson sampling. Tracked as a design question rather than a must-implement item because of two open dependencies:
1. Whether to keep numeric EMA at all. The 2026-05-07 roadmap (Phase 4) puts re-evaluating bandit learning on hold until the SLM-driven dispatcher is in production. Three options on the table: keep bandit as feedback for the SLM, retire EMA in favour of qualitative outcome summaries fed to the SLM, or split responsibilities (SLM = intent routing, bandit = cost/quality within a tier). See docs/superpowers/plans/2026-05-07-gnoma-roadmap.md §Phase 4.
2. User-tunable selector knobs. Several constants are hardcoded today: qualityAlpha (EMA smoothing, ~3-sample memory), the 70/30 observed/heuristic blend, strengthScoreBonus for tagged task types, and the DefaultThresholds.Minimum quality floor. Surfacing these as [router.bandit] config keys would let users tune for their workloads (faster alpha for shifting model performance, longer memory for stable fleets) without waiting for the strategic decision in #1.
Surfaced from the r/coolgithubprojects v0.3.1 launch thread (2026-05-24, u/Ha_Deal_5079). The encoder + contextual bandit alternative is now sketched in docs/superpowers/plans/2026-05-25-encoder-bandit-router.md — that plan supersedes #1 above when it ships.
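For weighing the options above, a sketch of the greedy scheme as described: an EMA per (arm × task type), a 70/30 observed/heuristic blend, and division by a cost term. The constant values, the EMA seeding, and the `1 + costWeight*cost` divisor are assumptions (the real formula lives in `scoreArm`); `qualityAlpha = 0.5` follows from the usual `alpha = 2/(N+1)` rule for an EMA with roughly N = 3 samples of memory.

```go
package main

import "fmt"

const (
	qualityAlpha  = 0.5 // hypothetical: 2/(3+1) for a ~3-sample memory
	observedShare = 0.7 // 70/30 observed/heuristic blend
)

// armStats is one (arm × task-type) cell.
type armStats struct {
	name       string
	heuristic  float64 // default quality prior
	ema        float64 // observed quality EMA
	samples    int
	cost       float64 // per-request cost estimate
	costWeight float64
}

func (s *armStats) observe(quality float64) {
	if s.samples == 0 {
		s.ema = quality // seed the EMA with the first observation
	} else {
		s.ema = qualityAlpha*quality + (1-qualityAlpha)*s.ema
	}
	s.samples++
}

// score blends observed and heuristic quality, then divides by a cost term.
func (s armStats) score() float64 {
	q := s.heuristic
	if s.samples > 0 {
		q = observedShare*s.ema + (1-observedShare)*s.heuristic
	}
	return q / (1 + s.costWeight*s.cost)
}

// pickGreedy returns the index of the highest-scoring arm. There is no
// exploration term: an arm that loses early gets no new observations and is
// not tried again unless the leader's score drops below it. A UCB bonus or
// Thompson sampling is what would close that gap.
func pickGreedy(arms []armStats) int {
	best := 0
	for i := range arms {
		if arms[i].score() > arms[best].score() {
			best = i
		}
	}
	return best
}

func main() {
	arms := []armStats{
		{name: "local", heuristic: 0.5, cost: 0, costWeight: 1},
		{name: "cloud", heuristic: 0.8, cost: 1, costWeight: 1},
	}
	arms[1].observe(1.0)
	for _, a := range arms {
		fmt.Printf("%s %.3f\n", a.name, a.score())
	}
	fmt.Println("pick:", arms[pickGreedy(arms)].name)
}
```

In this demo the cloud arm's perfect observation blends to 0.94 but is halved by its cost term (0.470), so the free local arm (0.500) keeps winning; how much that matters depends entirely on the assumed divisor.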
Security boundary — egress controls + session audit log. The current Firewall is a content boundary only (scans messages and tool results for secrets via regex + Shannon entropy, redacts or blocks, logs via log/slog). It does not enforce network egress: outgoing HTTP from tools and providers uses a stock http.Client with no per-host allowlist or dial-layer interception. Two follow-ups surfaced from the r/SideProject v0.3.0 launch thread (2026-05-24, u/Secret_Theme3192):
1. Per-session audit log of blocked/redacted events — ✅ JSONL writing implemented: internal/security/audit.go + wiring at cmd/gnoma/main.go:685-691 (.gnoma/sessions/<id>/audit.jsonl), recorded from firewall.go:152/173/186. Remaining gap: no CLI to read it — a gnoma firewall audit viewer is folded into the egress plan (shares the gnoma firewall command surface).
2. Per-host egress allowlist (HTTP transport layer) — design refined by u/HarjjotSinghh on the r/SideProject thread (2026-05-28). Three-stage rollout, not a single-shot "block everything except X" default:
  - Learn. First run logs every egress destination per (project, agent, tool) tuple without blocking.
  - Review. New gnoma firewall review subcommand surfaces the captured set; user marks each destination as allow / deny / scoped.
  - Enforce. Subsequent runs block unrecognised destinations with a clear violation log (lives alongside the per-session audit log from item #1).
  Default baseline destinations (curated, ship-in-the-binary):
  - Package ecosystems: github.com, npm registry, pypi.org, crates.io, docker hub, golang.org/proxy.golang.org.
  - Model providers: anthropic, openai, google, mistral — plus user-configured local ollama / llamacpp endpoints read from [provider.endpoints].
  The painful middle ground is SDK egress (sentry, stripe, supabase, datadog, …) — these break a "block unknown" default fast, which is why the Learn → Review → Enforce flow is the only thing that scales. Per-tool scoping (bash can only reach hosts X, MCP server Y can only reach hosts Z) is the layer above the project-wide allowlist.
  
  The README and v0.3.0 Reddit post phrasing oversold "network egress gated"; corrected in the README scope note and the audit-log commit.
Egress plan (incl. the gnoma firewall audit viewer for item #1): docs/superpowers/plans/2026-06-04-egress-allowlist.md.

Cross-platform support — Windows + macOS. GoReleaser builds static binaries for linux/darwin/windows × amd64/arm64 every release, but only Linux is exercised today; the Windows and macOS binaries ship untested. Surfaced 2026-05-28 (r/SideProject reply to u/HarjjotSinghh): the answer was "yes, Windows builds ship", but it couldn't honestly claim they're tested. His point was that the r/devops audience will raise predictable questions "within a week"; the table below maps each question to the underlying gnoma-side gap.

Phase 1 — smoke tests (unblock the honest answer)

Non-blocking GitHub Actions matrix job per tag: pull each release archive, run gnoma --version && echo hi | gnoma --provider ollama against a stub provider. Confirms the binary executes and the TUI doesn't crash before any real bug-hunt starts.

Phase 2 — Windows-specific concerns (r/devops question pattern)

Each row is an expected r/devops question, the gnoma-side gap it exposes, and the rough fix scope. Order roughly by "how soon would this come up in a thread":

| Question | Gap | Fix scope |
|---|---|---|
| "Does it work in PowerShell?" | Shell quoting in `internal/tool/bash` assumes POSIX; ANSI escape handling is untested against PowerShell + Windows Terminal | Add a PowerShell quoter following PowerShell's quoting rules (e.g. `Get-Process "$arg"`); test ANSI emission against `Out-Host` and legacy `conhost.exe` |
| "WSL or native?" | Both should work; not documented; corporate-managed Windows VMs often lack WSL | One README line + a smoke test invocation under each |
| "Respects system proxy / corporate proxy?" | Go's default transport (`http.ProxyFromEnvironment`) reads `HTTP_PROXY`/`HTTPS_PROXY`, but not the Windows system proxy registry settings or PAC files, which corporate networks rely on | Either document the env-var workaround, or vendor a PAC-aware transport (e.g. `github.com/rapid7/go-get-proxied`); test path covered by the Phase 1 smoke matrix |
| "Authenticode signed binary?" | Releases are unsigned; SmartScreen warns and some corporate policies block them | GoReleaser supports cosign + signtool integration; needs an EV cert (or Azure Trusted Signing), a non-trivial cost. Document the workaround for now: "right-click → Properties → Unblock" |
| "MSI installer?" | We ship a zip; some shops can't deploy raw zips through SCCM / Intune | Add an `.msi` artifact to GoReleaser via `go-msi` or `wix`. Mid-effort; gated on whether anyone actually asks for it (post the question to the eventual r/devops thread, see who upvotes) |
| "Windows Event Viewer integration?" | Logs go to the slog default sink + the per-session audit log under the project root | Document the audit log location explicitly; add a `--log-format=eventlog` mode later if anyone asks |
| "Group Policy hooks?" | None. Config is per-user TOML. | Out of scope short-term. Document `[provider.endpoints]` + `[router].prefer` as the levers admins would use via login script / config push |
| "Air-gapped install?" | Static binary works; the ollama dependency is the problem (model downloads, runtime updates) | Document the offline flow: pre-download models via `ollama pull` on a connected machine, ship them to the air-gapped network. Not a code change, just a doc gap |
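A sketch of the quoter for the first row. It deliberately emits single-quoted (verbatim) literals rather than the double-quoted `"$arg"` form, because PowerShell expands `$` and backtick escapes inside double quotes. `psQuote` is a hypothetical name, and it covers PowerShell's own parsing only: how a native `.exe` then splits its command line is a separate problem.

```go
package main

import (
	"fmt"
	"strings"
)

// psQuote returns s as a PowerShell single-quoted (verbatim) string literal.
// Inside single quotes PowerShell performs no $-expansion and no backtick
// escaping; a quote character is written by doubling it. PowerShell's
// tokenizer also accepts the typographic quotes U+2018..U+201B as single
// quotes, so those are doubled too.
func psQuote(s string) string {
	var b strings.Builder
	b.WriteByte('\'')
	for _, r := range s {
		switch r {
		case '\'', '\u2018', '\u2019', '\u201A', '\u201B':
			b.WriteRune(r) // double the quote character
		}
		b.WriteRune(r)
	}
	b.WriteByte('\'')
	return b.String()
}

func main() {
	for _, arg := range []string{`it's`, `$env:SECRET`, "a`b; rm -r x"} {
		fmt.Printf("Get-Process %s\n", psQuote(arg))
	}
}
```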

Phase 3 — macOS concerns

Smaller surface; mostly Apple-silicon launch sanity (the arm64 binary works) + Gatekeeper / notarization warning on first run. Same documentation note as Authenticode applies.

Pre-conditions for posting to r/devops

Per next-reddit-post, the security-observation post should land on r/devops eventually. Don't post until Phase 1 is in place so the predictable "did you test it?" question has an honest answer. Phase 2 items don't all need to ship first — but each one needs at least a TODO-linked acknowledgement in the post body so the thread sees gnoma takes the gaps seriously.

Plan (build-tag scaffolding + concrete code touch-points): docs/superpowers/plans/2026-06-04-cross-platform.md.

Tool-router specialization (functiongemma) — gated on telemetry, not committed. Phase A.2 adds did-switch-rate measurement to the two-stage select_category path; Phase A.3 (LoRA fine-tune of functiongemma-270m-it as a dedicated ArmRoleToolRouter) only fires if did-switch rate exceeds 20 %. Three independent external reviews consulted 2026-05-23; consensus is "fits as tool-call router, not chat; fine-tuning mandatory; prove the need first." See docs/superpowers/plans/2026-05-23-tool-router-specialization.md.
Entropy FP reduction (post-SLM Phase F) — F-1 (format-aware pre-extractor) shipped 2026-05-22: [security].entropy_safelist with uuid, sha_hex, iso8601, url; default empty so pre-F-1 behaviour is unchanged. F-2 (SLM-assisted classifier for ambiguous entropy hits) remains gated on F-1 FP-rate telemetry from real workloads plus ≥50 SLM observations. Surfaced from the r/ollama launch thread (2026-05-20); external validation from alterlab.io on the same tiered approach. See docs/superpowers/plans/2026-05-19-post-slm-unlock.md.
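The F-1 idea, running format matchers before the entropy test, can be sketched as below. The regexes are approximations of the four safelist names, and `minTokenLen` / `entropyThreshold` are made-up values; gnoma's actual extractor and thresholds are not shown in this TODO. The `url` pattern intentionally rejects query strings and userinfo, where credentials commonly hide.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// Hypothetical thresholds, for illustration only.
const (
	minTokenLen      = 20
	entropyThreshold = 3.5 // bits per byte
)

// formats approximates the [security].entropy_safelist entries.
var formats = map[string]*regexp.Regexp{
	"uuid":    regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`),
	"sha_hex": regexp.MustCompile(`^(?:[0-9a-f]{40}|[0-9a-f]{64})$`),
	"iso8601": regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$`),
	// No userinfo or query string: tokens often hide there (?token=...).
	"url": regexp.MustCompile(`^https?://[A-Za-z0-9.-]+(?::\d+)?(?:/[A-Za-z0-9._~/-]*)?$`),
}

// shannonEntropy returns the Shannon entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// safelisted reports whether tok matches one of the enabled formats.
func safelisted(tok string, enabled []string) bool {
	for _, name := range enabled {
		if re, ok := formats[name]; ok && re.MatchString(tok) {
			return true
		}
	}
	return false
}

// suspicious skips short and safelisted tokens before the entropy test runs.
// With an empty safelist this reduces to the pre-F-1 behaviour.
func suspicious(tok string, enabled []string) bool {
	if len(tok) < minTokenLen || safelisted(tok, enabled) {
		return false
	}
	return shannonEntropy(tok) >= entropyThreshold
}

func main() {
	id := "550e8400-e29b-41d4-a716-446655440000"
	fmt.Println(suspicious(id, nil), suspicious(id, []string{"uuid"}))
}
```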
Compound tools (post-SLM Phase E) — held until ≥50 SLM observations inform which primitives are worth adding. See docs/superpowers/plans/2026-05-19-post-slm-unlock.md.
Sensitive-content handling — unified policy. Three input paths can introduce sensitive content into the context: pasted images (screenshots may contain secrets, API keys, PII), pasted text (often copied straight from a terminal with credentials), and tool-read files (.env, key files, etc.). Today these are handled inconsistently: incognito gates persistence but content still flows to providers; outgoing-scan firewall covers some patterns but is format-aware only for text. Need a single policy/UI: at-paste warning when the content matches sensitive heuristics, a consent-gated review step, and consistent treatment across the three paths. Cross-cuts with Phase F entropy work and the outgoing-scan firewall. Plan: docs/superpowers/plans/2026-05-24-sensitive-content-policy.md.
Distribution — follow-ups. v0.1.0 shipped (archives on github.com/VikingOwl91/gnoma/releases, multi-arch images on ghcr.io/vikingowl91/gnoma). Still optional: Homebrew tap, curl | sh installer script, signed checksums (cosign/sigstore), release note automation, Windows process-tree kill via golang.org/x/sys/windows job objects (currently os.Process.Kill only — see internal/mcp/transport_windows.go), and migration from dockers + docker_manifests to dockers_v2 in .goreleaser.yml (collapses ~45 lines into one block but requires Dockerfile changes for the per-platform binary layout — deferred to its own commit before v0.3.0). Plan: docs/superpowers/plans/2026-06-04-distribution-followups.md.

Stable backlog (not in active phases)

Thinking mode (disabled / budget / adaptive) — M12.
Structured output with JSON schema validation — M12.
Native agy JSON output — switch the subprocess provider to --output-format stream-json once the agy CLI supports it, replacing the current prompt-augmentation fallback. Until then, agy's ToolUse capability is set to false (see internal/provider/subprocess/agent.go agy entry) — without structured tool-call output, the router would otherwise dispatch tool-needing tasks to agy and the turn would hang on prose hallucinations of tool calls. Flip the capability back to true in the same change that lands stream-json parsing.
SQLite session persistence + serve mode — M10.
Task learning (pattern recognition, persistent tasks) — M11.
Web UI (gnoma web) — M15.
OAuth / keyring — M13.
Observability (feature flags, cost dashboards) — M14.
PE / Mach-O ELF support — future, after ELF Phase 6.

History

Completed initiatives, kept here as pointers to their plan files:

v0.1.0 release — 2026-05-20. First tagged release. GoReleaser pipeline produces six static archives (linux/darwin/windows × amd64/arm64) on the GitHub mirror plus multi-arch Docker images on GHCR. History was rewritten on the same day to migrate authorship to a noreply identity and strip co-author attribution.
Post-audit security hardening — complete 2026-05-19. Three waves plus one ADR closed all 14 findings from the external review:
- Wave 1 — SafeProvider boundary
- Wave 2 — Incognito coherence
- Wave 3 — scanner + path hygiene (rolled out directly without a plan file; see commits leading up to 2026-05-19 on internal/security)
- ADR-004 — PostToolUse hook ordering
Post-SLM unlock — plan. Phases A–D complete (two-stage tool routing, CLI agent binary override, user profiles, per-arm capability tags).
2026-05-07 roadmap — plan. M1–M8 done; SLM classifier (Phase 3) complete; Phase 4 superseded by the post-SLM plan.

Reference

Milestones: docs/essentials/milestones.md
Decisions: docs/essentials/decisions/
ADR-002 (SLM routing, supersedes earlier ADR-009): docs/essentials/decisions/002-slm-routing.md
